Agarwal, P. K., Nunes, L. D., & Blunt, J. R. (2021). Retrieval Practice Consistently Benefits Student Learning: A Systematic Review of Applied Research in Schools and Classrooms. Educational Psychology Review.

A systematic review from Pooja Agarwal and colleagues examining “apples-to-apples” comparisons of Retrieval practice versus “business as usual” practice in classrooms: typically lecture, but including any activity which doesn’t qualify as retrieval (including Concept mapping; see Does concept mapping involve covert retrieval practice?). Highly focused on single-measure quantitative outcomes. Covers 50 experiments across 37 publications.

Included only studies which actually performed retrieval practice on relevant course materials, contra e.g. Spitzer, H. F. (1939). Studies in retention. Journal of Educational Psychology, 30(9), 641–656; see Most (especially early) experimental literature on the spacing effect involves inauthentic learning environments.

Unsurprisingly, the effects are generally quite large:

In other words, 28 out of 49 Cohen’s ds were greater than 0.50.

Roughly {1/3} of studies showed d>0.8; roughly {1/4} showed 0.5<d<0.8; roughly {40}% of studies showed d<0.5.
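To make the cutoffs above concrete, here is a minimal sketch of how a Cohen’s d might be computed and binned. It assumes the common pooled-standard-deviation form of d, which may not match how each study in the review computed its effect size. The function names (`cohens_d`, `bucket`) and the exam scores are hypothetical, invented purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Cohen's d using the pooled sample standard deviation (an assumed form)."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


def bucket(d):
    """Bin an effect size using the 0.5 and 0.8 cutoffs quoted in these notes."""
    if d > 0.8:
        return "large"
    if d > 0.5:
        return "medium"
    return "small or less"


# Hypothetical exam scores (percent correct); not data from the review.
retrieval = [78, 85, 90, 72, 88, 81]
business_as_usual = [70, 75, 82, 68, 79, 74]

d = cohens_d(retrieval, business_as_usual)
print(round(d, 2), bucket(d))

# Share of effect sizes above 0.50, from the counts quoted above (28 of 49).
share_above_half = 28 / 49
print(f"{share_above_half:.0%}")
```

The 28-of-49 share works out to about 57%, which lines up with the rough 1/3 plus 1/4 split above.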

By education level, {middle school} students seemed to receive the greatest gains, and {undergraduates} the least, but there’s a lot of noise in this data.

Includes one study (McDermott, 2014) which has d=2.19!

Retrieval practice and transfer learning: “Effect sizes were generally smaller for experiments with rephrased questions compared to experiments with verbatim or repeated questions.” They didn’t plot or analyze this data, but Adam Comella kindly did (notebook). It looks like a real, though likely “small”, effect.

Follow-up queue

Q. This systematic review includes papers comparing performance of retrieval practice to what?
A. Any kind of non-retrieval “business as usual” practice, including lecture, re-studying, etc.

Q. What finding was made w.r.t. rephrased vs. verbatim test questions?
A. Effects of retrieval practice were somewhat smaller when test questions were rephrased rather than repeated verbatim.

Q. What was this paper’s key finding on retrieval practice modality vs. testing modality?
A. Matching the practice and test modalities produced ~2x effect sizes; the choice of (matched) modality was not terribly important.