In-essay Quantum Country prompts mildly boost performance on first few repetitions (2020 efficacy experiment)

What are the effects of the mnemonic medium’s embedded prompts? If we imagine a simpler alternative medium, in which essays are simply bundled with a downloadable Anki deck to be reviewed sometime later, what are we losing? Do the prompts feel out-of-context? (The mnemonic medium gives structure to normally-atomized spaced repetition memory prompts) Do we lose an important element of the reading experience? (The mnemonic medium’s in-text prompts may support active reading).

In the 2020-01 Quantum Country efficacy experiment, we held out nine prompts from the essay reading experience for some users and re-inserted them into their first review session, five days after reading. The control group saw these prompts in-essay and also reviewed them after five days. We can use this manipulation to begin to answer some of the broader questions above.

I guess I can ask: is there some time-equivalence between these groups? Like, by what repetition number does the manipulated group begin performing roughly the same as the control group? Is there e.g. a two repetition “penalty”? That seems like a decent place to start.

delay5Days, median correct answers out of 9 (IQR); average (N=78) 20210413104653:

  • repetition 1: 5 (4-7); 5.4
  • repetition 2: 8 (7-9); 7.3
  • repetition 3: 8 (7-9); 7.7

control, median correct answers out of 9 (IQR); average (N=122) 20210413105221:

  • in-essay: 7 (5-9); 6.6
  • repetition 1: 7 (6-9); 7.0
  • repetition 2: 8 (7-9); 7.6
  • repetition 3: 9 (7-9); 8.1
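The summary statistics in these lists can be reproduced from per-user correct-answer counts; a minimal sketch, using made-up scores (out of 9) standing in for the real per-user data:

```python
import statistics

def summarize(scores):
    """Median, interquartile range, and mean of per-user correct-answer counts."""
    q1, med, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
    return {"median": med, "iqr": (q1, q3), "mean": statistics.mean(scores)}

# Hypothetical per-user scores for one repetition -- not the real data.
scores = [4, 5, 5, 6, 7, 7, 8, 9]
print(summarize(scores))
# → {'median': 6.5, 'iqr': (5.0, 7.75), 'mean': 6.375}
```

Note that `statistics.quantiles` defaults to the "exclusive" method; a different quartile convention would shift the IQR endpoints slightly.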

So the control group performs better, but the difference isn’t enormous, particularly given that the control group performs an extra repetition (i.e. more work). The control group also enjoys a substantial calendar-time advantage, but that’s not necessarily important.
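The “repetition penalty” framing from above can be made concrete: find the smallest shift at which the manipulated group’s series catches up to the control group’s. A rough sketch over the median series from the two lists (a real analysis would work per-user rather than on medians):

```python
def repetition_penalty(manipulated, control):
    """Smallest shift k such that the manipulated group at repetition i
    matches or beats the control group at repetition i - k."""
    for k in range(len(manipulated)):
        if all(m >= c for m, c in zip(manipulated[k:], control)):
            return k
    return None

# Median correct answers out of 9, from the lists above.
manipulated = [5, 8, 8]  # delay5Days, repetitions 1-3
control = [7, 8, 9]      # control, repetitions 1-3
print(repetition_penalty(manipulated, control))
# → 1
```

On the medians, the penalty comes out to one repetition: the manipulated group’s second repetition matches the control group’s first.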

The difference might have been much larger if the manipulated group hadn’t been reviewing all the other prompts in-essay as well.

We might like to know the long-term impact of embedded prompts on efficiency and retention. Like, over the first six months, what was the “cost” of these prompts for the manipulated and control groups? How many reviews (and how many calendar days) did it take each group of users to reach the two-week retention level? Was the long-term fluency of the manipulated users affected?
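One way to operationalize that “cost” question, assuming we had per-prompt review logs with timestamps and scheduled intervals (the field names here are hypothetical, not Quantum Country’s actual schema):

```python
from dataclasses import dataclass

@dataclass
class Review:
    day: float            # days since the user first read the essay
    interval_days: float  # interval scheduled after this review

def cost_to_two_weeks(reviews):
    """Count reviews and calendar days until a prompt's scheduled
    interval first reaches two weeks; None if it never does."""
    for count, r in enumerate(reviews, start=1):
        if r.interval_days >= 14:
            return {"reviews": count, "days": r.day}
    return None

# Hypothetical review history for one prompt -- not real data.
history = [Review(0, 5), Review(5, 14), Review(19, 30)]
print(cost_to_two_weeks(history))
# → {'reviews': 2, 'days': 5}
```

Aggregating this per-prompt cost across users in each group would give the efficiency comparison the paragraph above asks for.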

One key problem for the efficiency comparisons is that by having both groups review five days later, I’ve handicapped the efficiency of the control group: they complete one extra review no matter what. Their first repetition could probably have come a month or more after reading, whereas I expect the manipulated group needs the withheld prompts introduced relatively soon. Then again, this analysis suggests otherwise: Accuracy rates for withheld Quantum Country questions were roughly equal at 5 days and 1 month (2020 efficacy experiment)