Kang, S. H. K., McDermott, K. B., & Roediger, H. L. (2007). Test format and corrective feedback modify the effect of testing on long-term retention. European Journal of Cognitive Psychology, 19(4–5), 528–558.

Experimental test of multiple choice vs. short answer on the testing effect. Henry L. Roediger

They ran two similar experiments on undergraduates. In the first, multiple choice produced better recall on a retention test 3 days later, but the authors noted that this might be because students got a much greater proportion of questions correct on the initial test. So they ran a second experiment, which provided feedback after the initial test.

I’ll summarize just the second experiment, since it’s more relevant to the kind of Spaced repetition memory system design I’m doing. Students who practiced via short-answer questions enhanced their recall the most, relative to both a control condition and a condition in which students simply read the correct answers. Short-answer-practicing students performed better than those who practiced via multiple choice on a multiple-choice post-test (d = 0.41), but not significantly better than MC-practicers on a short-answer post-test.

Looking at the conditional probabilities, we can see that students who practiced with short-answer questions were much more likely to correct wrong answers.

Some weak evidence here for Retrieval practice may be less effective without feedback:

Post hoc comparisons using independent samples t-tests revealed that whereas the provision of feedback during the intervening SA tests led to greater final test performance, t(190) = 2.75, d = 0.40, feedback during the intervening MC tests did not make a difference to final performance, t(190) = -1.06, p = .29.
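
As a rough sanity check on those numbers, here is a minimal sketch of how a Cohen's d can be recovered from an independent-samples t statistic. It assumes two groups of equal size (my assumption, not something the quote states), in which case d = 2t / sqrt(df + 2). The function name `cohens_d_from_t` is my own. The quote doesn't report a d for the MC comparison, so the second value is only my back-of-envelope estimate under the same assumption.

```python
import math


def cohens_d_from_t(t: float, df: int) -> float:
    """Approximate Cohen's d from an independent-samples t statistic.

    Assumption: two groups of equal size n, so df = 2n - 2 and
    d = t * sqrt(1/n + 1/n) = 2t / sqrt(df + 2).
    """
    return 2 * t / math.sqrt(df + 2)


# t statistics and degrees of freedom as reported in the quoted passage
d_sa = cohens_d_from_t(2.75, 190)   # feedback vs. no feedback, SA tests
d_mc = cohens_d_from_t(-1.06, 190)  # feedback vs. no feedback, MC tests

print(f"SA feedback effect: d = {d_sa:.2f}")  # d = 0.40
print(f"MC feedback effect: d = {d_mc:.2f}")  # d = -0.15
```

Under that assumption, the SA figure matches the reported d = 0.40, and the MC feedback effect comes out small and in the opposite direction, consistent with the non-significant p = .29.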

Theoretical questions

Is the best study format dependent on the test format—i.e. studying via multiple choice will produce better results on a multiple choice test, and studying via short answer will produce better results on a short answer test? Theoretically, the authors suggest this would follow the “transfer appropriate processing framework”, which suggests that “memory performance depends on the overlap between encoding and retrieval processes.” It would make the testing effect more of a “test practice effect” than something durably useful. These results suggest that’s not so: matched formats didn’t produce the best performance.

Or is the best study format about retrieval effort (Desirable difficulties, after Bjork)? Is the additional effort involved in the short answer questions the cause of increased memory performance? The data seem to suggest this interpretation, since studying via short answer produced the most reliable retention in any kind of post-test.

On the relationship to the Generation effect:

The effect of SA testing bears noticeable similarity to the generation effect. It has been amply demonstrated that when target items are self-generated by subjects in response to cues provided by the experimenter, those items are better retained than items merely presented to be read (Slamecka & Graf, 1978). Importantly, Slamecka and Fevreiski (1983) found that the generation effect is obtained even when subjects fail to correctly generate an item, if the correct response is presented after the failed generation attempt. This, of course, does not mean that the causal mechanisms behind the generation effect and the SA testing advantage obtained in our study are identical. Investigations into the generation effect have shown that factors other than retrieval can contribute to the effect (e.g., allocation of attentional resources at encoding; Schmidt, 1990).

Last updated 2023-07-13.