In-essay Quantum Country prompts boost performance on first repetition

What are the effects of the Mnemonic medium’s embedded prompts? If we imagine a simpler alternative medium, in which essays are simply bundled with a downloadable Anki deck to be reviewed sometime later, what are we losing? Do the prompts feel out-of-context? (The mnemonic medium gives structure to normally-atomized spaced repetition memory prompts) Do we lose an important element of the reading experience? (Mnemonic medium prompts are interleaved into the reading experience).

Comparing at one month using 2021-04 data

Comparing across the cohorts of two different experiments, we can get a one-month sample of the impact of in-essay prompts. In the 2020-01 Quantum Country efficacy experiment, we held out nine prompts from the essay reading experience for some users and re-inserted them a month later; the 2021-04 Quantum Country schedule experiment has a one-month initial interval (plus make-up sessions), so its users saw the same prompts in-essay and then first reviewed them a month later.

For the hardest two cards (xiNW1zgeb2ITHGi6uQtg and s8duZcGBbu0dxb4xEAGg), recall rates were 42% without support and 71% / 68% with in-essay practice and make-up sessions.

For the easiest card (h1AXHXVtsGKxkamS8Hb2), recall rates were 89% without support and 91% with in-essay practice and make-up sessions.

So the impact is greatest for “hard” questions. For “easy” questions, in-essay support may not be necessary.
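To make the hard-versus-easy contrast concrete, here is a small sketch that computes the percentage-point gain per card from the figures above. It rests on two assumptions the text does not state outright: that the 42% baseline applies to both hard cards, and that 71% and 68% are listed in the same order as the two card IDs.

```python
# Recall rates in percent: (without in-essay support, with in-essay practice
# and make-up sessions). ASSUMPTION: 42% is the baseline for both hard cards,
# and 71% / 68% follow the order in which the card IDs are listed.
recall_pct = {
    "xiNW1zgeb2ITHGi6uQtg": (42, 71),  # hard
    "s8duZcGBbu0dxb4xEAGg": (42, 68),  # hard
    "h1AXHXVtsGKxkamS8Hb2": (89, 91),  # easy
}

# Percentage-point gain attributable to in-essay support, per card.
gain_points = {
    card: supported - unsupported
    for card, (unsupported, supported) in recall_pct.items()
}

for card, gain in gain_points.items():
    print(f"{card}: +{gain} points")
```

Under those assumptions the hard cards gain roughly 26 to 29 points while the easy card gains 2, which is the pattern described above.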

See Log: Quantum Country analysis at 2022-02-03 and the preceding couple of days for sources and full data.

Comparing at five days using 2020-01 data

In 2020-01 Quantum Country efficacy experiment, we held out nine prompts from the essay reading experience for some users and re-inserted them into their first review session. The control group also reviewed these prompts after five days. We can use this manipulation to begin to answer some of the broader questions above.

One approach is to ask: is there some time-equivalence between these groups? Like, by what repetition number does the manipulated group begin performing roughly the same as the control group? Is there e.g. a two repetition “penalty”? That seems like a decent place to start.

delay5Days, median correct answers out of 9 (IQR); average (N=78) 20210413104653:

repetition 1: 5 (4-7); 5.4
repetition 2: 8 (7-9); 7.3
repetition 3: 8 (7-9); 7.7

control, median correct answers out of 9 (IQR); average (N=122) 20210413105221:

in-essay: 7 (5-9); 6.6
repetition 1: 7 (6-9); 7.0
repetition 2: 8 (7-9); 7.6
repetition 3: 9 (7-9); 8.1

In repetition 1, control does have a clear advantage (7.0 vs. 5.4 correct on average). But it’s not large, particularly given that the control group has performed an extra repetition (i.e. more work) by that point. There’s a substantial calendar-time advantage to the control group, but that’s not necessarily important.

Subsequent repetitions are hard to compare directly. Because the median number of correct answers was 5 in the first repetition for delay5Days, the second repetition of that group has many more prompts at a 5-day interval than the second repetition of the control group, which will be almost entirely at a 2-week interval. So yes, people perform about as well in repetition 2 in both groups, but control is doing a harder task in that repetition.
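As a rough sketch of the time-equivalence question, the snippet below uses only the group averages listed above. It computes the gap at each repetition number and, for each delay5Days repetition, the latest control stage whose average it meets or exceeds. The function names and the convention of labelling the in-essay exposure as stage 0 are mine, for illustration; and because the snippet compares averages only, it ignores the interval differences just described.

```python
# Average correct answers out of 9, copied from the listings above.
delay5days_mean = {1: 5.4, 2: 7.3, 3: 7.7}
# Stage 0 is the in-essay exposure (a labelling convention assumed here).
control_mean = {0: 6.6, 1: 7.0, 2: 7.6, 3: 8.1}


def same_repetition_gap(control, manipulated):
    """Control average minus manipulated average at each shared repetition."""
    return {
        rep: round(control[rep] - manipulated[rep], 1)
        for rep in manipulated
        if rep in control
    }


def matched_control_stage(control, manipulated):
    """For each manipulated repetition, the latest control stage whose
    average it meets or exceeds (None if it trails every control stage)."""
    matches = {}
    for rep, mean in manipulated.items():
        reached = [stage for stage, c in sorted(control.items()) if c <= mean]
        matches[rep] = reached[-1] if reached else None
    return matches


gaps = same_repetition_gap(control_mean, delay5days_mean)
matches = matched_control_stage(control_mean, delay5days_mean)
print(gaps)
print(matches)
```

On these averages the gap shrinks from 1.6 at repetition 1 to 0.3 and 0.4 afterwards, and delay5Days repetitions 2 and 3 line up with control repetitions 1 and 2. That is loosely consistent with a penalty of about one repetition, though the comparison is crude for the reasons given above.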

The difference might have been much larger if the manipulated group hadn’t been reviewing all the other prompts in-essay as well.

We might like to know the long-term impact of embedded prompts on efficiency and retention. Like, over the first six months, what was the “cost” of these prompts for the manipulated and control groups? How many reviews (and how many calendar days) did it take each group of users to reach the two-week level of retention? Was the long-term fluency of the manipulated users affected?

One key problem for the efficiency comparisons is that by having both groups review five days later, I’ve really handicapped the efficiency of the control group. They complete one extra review no matter what. Probably their first repetition could be a month or more out after that, whereas I expect the manipulated group needs the prompts to be introduced relatively soon. Of course, this analysis suggests otherwise: Accuracy rates for withheld Quantum Country questions were roughly equal at 5 days and 1 month (2020 efficacy experiment)