van Gog, T., & Sweller, J. (2015). Not New, but Nearly Forgotten: The Testing Effect Decreases or even Disappears as the Complexity of Learning Materials Increases. Educational Psychology Review, 27(2), 247–264.

Review article, coauthored with John Sweller, opening a special issue of the journal on the limitations of the Testing effect for material with high Element interactivity.

The authors point out that almost all the literature on the Testing effect has used quite low-complexity material, even when testing students in classrooms. But studies dating back to Kühn (1914) and Gates (1917) show diminished or absent testing effects for material with higher Element interactivity.

The authors suggest a few possible explanations, informed by some of the relevant experiments:

  1. With low element interactivity material, recall offers more of an opportunity to notice or generate organizational/relational structure than re-studying does. For high element interactivity material, this advantage disappears because the material already exhibits clear organizational structure.
  2. For low element interactivity material, students may judge that they’re already quite familiar with it, so they may not put much effort or attention into re-studying, whereas recall gives them no choice. For high element interactivity material, students know that they aren’t familiar with it, so they pay more attention when re-studying.

Note that the authors aren’t claiming that testing doesn’t work: few of the cited studies find inverted (negative) testing effects. The advantage over re-studying is merely diminished or absent.


The special issue includes two commentaries criticizing Van Gog and Sweller’s conclusions.

Criticism by Karpicke and Aue

Karpicke, J. D., & Aue, W. R. (2015). The testing effect is alive and well with complex materials. Educational Psychology Review, 27, 317–326. https://doi.org/10.1007/s10648-015-9309-3

Karpicke and Aue (K&A) reply in the same issue with several disagreements:

  1. Van Gog and Sweller (VG&S) conflate the complexity of the materials, the learning activity, and the assessment. K&A complain that Element interactivity has no quantitative metric, and claim that as a result VG&S mis-rated many prior experiments as lower in element interactivity than was appropriate.
    • “tasks in which students filled in individual words in isolated sentences were rated as high in element interactivity, while tasks where students freely recalled, produced summaries, answered inferential short-answer questions, or created concept maps were deemed low or, at best, medium in element interactivity”
  2. Element interactivity wasn’t actually a manipulated variable in the cited experiments. (But it was in Karpicke, J. D., & Blunt, J. R. (2011). Retrieval Practice Produces More Learning than Elaborative Studying with Concept Mapping. Science, 331(6018), 772–775, which VG&S rated as low/medium in element interactivity.)
  3. VG&S simply fail to include some important experimental studies.

Finally, K&A argue that the studies VG&S did include as high in element interactivity mostly suffer from methodological problems, but that they nonetheless demonstrate small positive testing effects.

Criticism by Rawson

Rawson, K. A. (2015). The Status of the Testing Effect for Complex Materials: Still a Winner. Educational Psychology Review, 27(2). https://doi.org/10.1007/s10648-015-9308-4

She echoes K+A’s criticism that the the articles which VG+S cite actually have a small positive effect, not a null effect.

Regarding the prior literature, her conclusions are more moderate than K&A’s: she expresses concern that complex problem solving isn’t well represented in the literature, and notes that the experiments which do exist report somewhat conflicting results. But on text material, she argues, the prior literature is on firmer footing (with feedback or high learning performance, g = 0.73!).

Criticism in Roelle, J., & Berthold, K. (2017). Effects of incorporating retrieval into learning tasks: The complexity of the tasks matters. Learning and Instruction, 49, 142–156.

Roelle and Berthold (R&B) discuss van Gog and Sweller’s concerns here. They explain the lack of a testing effect in the cited experiments by pointing out that the conditions didn’t involve the same kind of knowledge construction. The re-study groups “might have had an advantage because their task (rereading) made it possible for them to make connections between the target content items, whereas the learners in the retrieval groups, who were instructed to retrieve keywords missing from the content, probably did not engage in this relational processing.” In R&B’s experiments, the tasks are the same, and retrieval is manipulated by making study open- vs. closed-book. And they do indeed find a Testing effect.

Last updated 2023-07-25.