Quantum Country users who forget in-essay exhibit sharp forgetting curves

We begin to see this in the data from the 2021-04 Quantum Country schedule experiment; see Log: Quantum Country analysis on 2021-11-16.

Forgetting in first review session

With the aggressiveStart schedule (used in late 2019 and 2020), questions forgotten in-essay were next reviewed five days later (same as remembered questions). To compensate for the longer intervals added in the 2021-04 Quantum Country schedule experiment, I changed this behavior so that questions forgotten in-essay would be reviewed one day later, before attempting recall across the lengthier interval. So we can see the difference in forgetting by comparing recall rates in these two pools of samples (albeit from different cohorts):

1 day (759 readers, 5169 reviews): 71%
5 days (3315 readers, 25201 reviews): 58%

20220118113917

Readers don’t necessarily complete their reviews on the day they’re assigned. We can see more detail by looking at the number of days which actually elapsed. This plot depicts all data points which had at least 50 users (numbers above points represent # of reviews in sample):

20220119094651

So while the average recall rate for the cohort assigned to repeat the questions after 1 day was 71%, the readers who actually repeated the questions after 1 day scored 85%. It’s the later samples which drag down the average somewhat. Note the difference between 1 day and 2 days: quite striking! Almost a 10pp drop! Yet this is broadly consistent with what I’ve seen in the literature, more so than QC’s data often is.

A substantial take-away from this plot: if you weren’t able to answer something successfully while reading, then you’d better review that again soon if you want to remember it. Retrying until you can remember within the essay isn’t enough. Wait a week, and your recall rate has dropped under 60%.

One might imagine that the declines in these plots are really about conscientiousness: people who complete their reviews on time are more serious about studying, so they’ll get higher scores. But look at what happens at days 5 and 6, where the graphs cross. The scores are quite similar! A reader on the 1-day schedule who reviews at day 5 is four days late, while a reader on the 5-day schedule who reviews at day 5 is exactly on time. So if the conscientiousness hypothesis were true, you’d expect the 1-day curve to dip substantially below the 5-day curve there.

What’s going on with day 4 on the 5-day curve? I’m not sure. Reviewing at 4 days is possible because we round the current day when computing which cards are due. Given that this sample is about 1/10th the size of its neighbor, it could just be noise we’re seeing: assuming a binomial sample, the 95% CI at that point is ±4%. (The 95% CI for the larger sample at 5 days is ±1%.)
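A minimal sketch of that kind of interval calculation, assuming the usual normal approximation to the binomial (the note doesn't say which method was actually used). The function name is hypothetical. It treats every review as an independent trial, which is optimistic, since several reviews from the same reader are likely correlated.

```python
import math


def binomial_ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval
    for an observed proportion p over n trials (z=1.96 for ~95%)."""
    return z * math.sqrt(p * (1 - p) / n)


# Pooled figures quoted earlier in this note (not the per-day points,
# whose sample sizes aren't listed here).
for label, p, n in [("1 day", 0.71, 5169), ("5 days", 0.58, 25201)]:
    half = binomial_ci_half_width(p, n)
    print(f"{label}: {p:.0%} ± {half * 100:.1f}pp (n={n})")
```

For the pooled samples this gives roughly ±1.2pp (1 day) and ±0.6pp (5 days). The per-day points are subsets of these pools, so their intervals are wider. Because the half-width scales with 1/sqrt(n), a sample 1/10th the size has an interval about 3.2 times as wide.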

There’s enough noise here that it’s maybe worth choosing a bucket width larger than 1 day. Here’s the same plot, bucketed at 48 hours—a little cleaner:

And with a 96 hour bucket width:
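For reference, a rough sketch of how this bucketing could be computed, including the "at least 50 users" cutoff used for the per-day plot. The `Review` record and its field names are hypothetical stand-ins, not the actual QC data schema. Buckets here are assigned by flooring elapsed time, which is an assumption and may differ from how the plots above were produced.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    reader_id: str        # hypothetical field names
    elapsed_hours: float  # time since the in-essay attempt
    remembered: bool


def recall_by_bucket(reviews, bucket_hours=24, min_readers=50):
    """Map bucket start (in hours) -> (recall rate, # reviews, # readers),
    dropping buckets with fewer than min_readers distinct readers."""
    outcomes = defaultdict(list)
    readers = defaultdict(set)
    for r in reviews:
        # Floor to the start of the bucket this review falls in.
        bucket = int(r.elapsed_hours // bucket_hours) * bucket_hours
        outcomes[bucket].append(r.remembered)
        readers[bucket].add(r.reader_id)

    result = {}
    for bucket in sorted(outcomes):
        if len(readers[bucket]) < min_readers:
            continue  # too few readers to plot this point
        marks = outcomes[bucket]
        result[bucket] = (sum(marks) / len(marks), len(marks), len(readers[bucket]))
    return result
```

Calling `recall_by_bucket(reviews, bucket_hours=48)` or `bucket_hours=96` would give the wider buckets. Wider buckets trade resolution along the time axis for more samples per point.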

Forgetting after first delayed recall attempt

Following that session, here are recall rates averaged across all first delayed recall attempts, for prompts forgotten during the initial read (20211011120323, figures updated 2022-01-19):

1 week (99 readers, 655 reviews): 86%
2 weeks (80 readers, 543 reviews): 77%
1 month (68 readers, 419 reviews): 71%
2 months (33 readers, 167 reviews): 57%

(https://docs.google.com/spreadsheets/d/1rSrlWq_Fg7xsL1BCe0o4c0lShvsuvmvZiJjluVMirCE/edit#gid=0)

Updated 2022-12-05; includes only people who ~finished a first delayed repetition of QCVC:

Participation drops off very steeply (73 -> 32 -> 14 for 14 -> 30 -> 60 days), so a fairly strong selection effect is probably flattening this curve.

And using data from aggressiveStart, for the first recall attempt after the initially-forgotten card was remembered in a “make-up” review session:

2 weeks (1483 readers, 12818 reviews): 82% 20220118114835

Though the two aren’t directly comparable (no retry), the original schedule had 87% recall in a similar situation at 1 day. 20211116190100

I’ve tried to draw per-day “actual review day” forgetting curves here, as I did for the initial reviews, but the data are too noisy—not enough samples (as of 2022-01-19). 20220119105505

Older data, using aggressiveStart vs. original

I can’t demonstrate this in a strongly controlled fashion, but I can get a sense of the forgetting curve for initially-forgotten questions if I use the differences between the first intervals for the old schedule (1 day) and the new schedule (5 days). We see roughly a 10pp drop in recall rate over the first 5 days (60% -> 50%). 20210408112426

Old schedule:

New (well, late 2019) schedule, without retry:

With retry: 20210408112811