Roediger, H. L., & Karpicke, J. D. (2006). The Power of Testing Memory: Basic Research and Implications for Educational Practice. Perspectives on Psychological Science, 1(3), 181–210.

A thorough review paper on the Testing effect. It covers the history of experiments on testing in laboratory settings and in more practical classroom settings, and reviews some more recent research by the authors. Henry L. Roediger

In free recall:

Repeatedly studying material is beneficial for tests given soon after learning, but on delayed criterial tests with retention intervals measured in days or weeks, prior testing can produce greater performance than prior studying. In the case of delayed recall, test trials produce a much greater gain than study trials.

Testing reduces forgetting of recently studied material, and multiple tests have a greater effect in slowing forgetting than does a single test.

The same effects are observed for paired-associate learning (i.e., tasks in which students learn pairs of words and are later tested on recall of one member of the pair, using the other as a cue).

On the Spacing effect:

Most of the research has indicated that spaced retrieval practice leads to better retention than massed practice, but the evidence is mixed regarding whether expanding-interval retrieval is a superior form of spaced retrieval. The most recent evidence points to the conclusion that expanding-interval retrieval may not benefit long-term retention, as was originally thought, because the initial test in an expanding schedule appears too soon after study, rendering it ineffective for enhancing learning. Although the efficacy of expanding and equally spaced schedules remains an open issue, the research we have reviewed shows that delaying an initial retrieval attempt and spacing repeated tests often will boost later retention with paired-associate materials.

On question type:

Most evidence points to the conclusion that tests involving production of information (essay and short-answer tests) produce greater benefits on later tests than do multiple-choice tests, which involve recognition of a correct answer among alternatives. The literature is not totally consistent on this point, however, so it remains a hypothesis for further investigation.

On theories of the testing effect:

Can the testing effect simply be due to spending more time with the material? No, because many experiments use restudying as a control condition, and restudying involves just as much or more time with the material (subjects in the testing condition get no extra exposure to material they have forgotten).

Various experiments suggest that increased retrieval effort enhances later retention; this suggests that the retrieval process itself is at play in the testing effect. See also Desirable difficulties, after Bjork.

Maybe effortful retrieval creates more retrieval routes?

For example, McDaniel and Masson (1985) manipulated whether studied words were processed with semantic or phonemic encoding tasks, the typical levels-of-processing manipulation (Craik & Tulving, 1975). Soon after study, subjects were given cued-recall tests with phonemic or semantic cues, and the cues either matched or mismatched the type of initial encoding. Subjects took a final cued-recall test 24 hr later. (There were also conditions in which items were tested only on the second test, to assess the testing effect.) McDaniel and Masson found that the testing effect that appeared on the second test was greater when the cues for the first test mismatched the original encoding and yet successful retrieval occurred than when the cues on the first test and the type of encoding matched. This result can be understood as due to an increase in the types of retrieval routes that permit access to the memory trace (or perhaps a multiplexing of the features of the memory trace itself).

Another candidate is transfer-appropriate processing. This theory suggests that retrieval performance depends on the similarity between the retrieval process and the encoding process. For instance, studying definitions may not help you use terms in context. Through this lens, we might interpret more difficult testing tasks as ones which create more practice with transfer. The theory seems not to be strictly true, since various laboratory experiments don’t show the testing effect depending on the final test’s format.

To summarize:

More specifically, elaboration of encoding, more effortful or deeper encoding, and creation of different routes of access can account for the basic effect.

Jeffrey Karpicke

Last updated 2023-03-29.