==Update, 2021-04-12==
I don’t really believe this analysis anymore. It’s comparing two different user cohorts. For more recent analysis on lapses, see Demonstrated retention reliably bounds future recall attempts on Quantum Country. On retry, see Retry intervention produces substantial increases in early accuracy on Quantum Country.
If a reader forgets a question in a review session, we shorten that question’s interval so that they’re more likely to remember it during the next session in which it appears.
Question 1: Does shortening lapsed question intervals help readers remember? Or do people often just keep forgetting, over and over again?
Question 2: We recently added a related intervention: when a reader forgets a question during a review session, they’ll review it again in that session. Does that make readers more likely to remember the answer during the next session in which it appears?
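A minimal sketch of the in-session retry described in Question 2, assuming each forgotten card is re-queued once at the end of the current session. The function `run_session`, the `recall` callback, and the one-retry-per-card limit are hypothetical. This note doesn't specify how retries are ordered or whether a retried card can be retried again.

```python
from collections import deque
from typing import Callable


def run_session(
    card_ids: list[str], recall: Callable[[str, bool], bool]
) -> list[tuple[str, bool, bool]]:
    """Review each card once, re-queueing forgotten cards for a retry at the end.

    `recall(card_id, is_retry)` stands in for the reader and returns True if
    they remembered. Returns the review log as (card_id, is_retry, remembered).
    """
    queue = deque((card_id, False) for card_id in card_ids)
    log = []
    while queue:
        card_id, is_retry = queue.popleft()
        remembered = recall(card_id, is_retry)
        log.append((card_id, is_retry, remembered))
        # Assumption: a card is retried at most once per session.
        if not remembered and not is_retry:
            queue.append((card_id, True))
    return log
```

Retry reviews are logged with `is_retry=True`, mirroring the `isRetry` column that Query 1 uses to keep retries out of the sequence of "real" reviews.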
I can answer both of those questions at once. I'll focus on forgotten questions at the 2-week level, since that's the first level which (in our new schedule) will shorten subsequent review intervals when forgotten. If a new-schedule reader forgets a 5-day question, it stays at 5 days. (See Query 1.)
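A minimal sketch of the lapse rule described above, under loudly stated assumptions: the ladder `LEVELS`, the function `next_level`, and the rule that a lapse drops exactly one rung are hypothetical. Only the 5-day and 2-week rungs, and the fact that a 5-day lapse doesn't shorten the interval, come from this note.

```python
from datetime import timedelta

# Hypothetical interval ladder. Only the 5-day and 2-week rungs come from the
# note; the longer rungs are illustrative assumptions.
LEVELS = [timedelta(days=d) for d in (5, 14, 30, 60, 120)]
FLOOR = 0  # index of the 5-day rung; lapses never drop below it


def next_level(level: int, remembered: bool) -> int:
    """Return the ladder index for the next review after a non-retry review."""
    if remembered:
        # Move up one rung, capped at the longest interval.
        return min(level + 1, len(LEVELS) - 1)
    # Lapse (assumed rule): shorten by one rung, but 5 days stays at 5 days.
    return max(level - 1, FLOOR)
```

Under these assumptions, a forgotten 2-week question (`next_level(1, remembered=False)`) comes back at 5 days, while a forgotten 5-day question stays put, which is why the 2-week level is the first one where shortening can show up in the data.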
Here are the accuracy rates on “lapsed” 2-week questions, in their first non-retry review after they were forgotten:
==Update, 2020-03-07==
These accuracy rates are fairly low compared with readers' overall accuracy on questions at the 2-week level, which is 95% (N=71,475; Query 2). We'd expect lapsed-question accuracy to be lower, but that's a lot lower!
The lapsed-question accuracy rates remain similar at longer intervals (can’t compare with versus without retry—too little data):
So, big picture: if readers forget the answer to a question, they're fairly likely (about 1 in 4) to forget it again next time, more or less irrespective of interval.
This analysis doesn’t dig into whether it’s the same questions being forgotten over and over again for a given reader. That’d be interesting to know.
As for Question 2: retry may help a bit.
The confidence intervals overlap slightly, but I'd believe the effect is a couple of percentage points. (Also, the binomial confidence intervals here are wonky, because each reader contributes many lapsed-question samples, and those samples will be highly correlated with one another.)
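To make the correlation caveat concrete, here is a sketch contrasting a Wilson score interval, which treats every review as an independent trial, with a percentile bootstrap that resamples whole readers. The function names and the `(user_id, remembered)` input shape are assumptions for illustration; this is not the method used to produce the intervals in this note.

```python
import math
import random
from collections import defaultdict


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumes independent trials)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def reader_bootstrap_interval(reviews, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval from resampling readers (clusters), not reviews.

    `reviews` is a list of (user_id, remembered) pairs. Resampling whole
    readers keeps each reader's correlated outcomes together.
    """
    by_user = defaultdict(list)
    for user_id, remembered in reviews:
        by_user[user_id].append(remembered)
    users = list(by_user)
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Draw as many readers as we have, with replacement.
        sample = [rng.choice(users) for _ in users]
        outcomes = [r for u in sample for r in by_user[u]]
        stats.append(sum(outcomes) / len(outcomes))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

When a few readers contribute many correlated lapses, the reader-level interval is typically wider than the Wilson interval, which is the sense in which a plain binomial analysis overstates precision.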
Another limitation of this analysis is that new-schedule readers will have reviewed a given question fewer times when they hit the 2-week mark than their old-schedule peers. I’m comparing 2-week lapses to 2-week lapses to control for interval, but I’m not controlling for repetition. That said, there appears to be fairly little variation with repetition count, so it probably doesn’t matter that much.
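One way to control for repetition as well as interval would be to compare accuracy within strata that share a repetition count. A minimal sketch, assuming each lapse follow-up row has been annotated with a hypothetical `repetition_count` (the number of prior non-retry reviews of that card), which Query 1 does not currently compute:

```python
from collections import defaultdict


def accuracy_by_stratum(rows):
    """Group (category, repetition_count, remembered) rows.

    Returns {(category, repetition_count): (n, accuracy)}.
    """
    tallies = defaultdict(lambda: [0, 0])  # key -> [n, n_remembered]
    for category, repetition_count, remembered in rows:
        tally = tallies[(category, repetition_count)]
        tally[0] += 1
        tally[1] += int(remembered)
    return {key: (n, hits / n) for key, (n, hits) in tallies.items()}
```

Comparing `("aggressiveStart", k)` against `("original", k)` for each `k` holds repetition fixed, at the cost of thinner samples per stratum.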
~75% is probably too low to produce long-term confidence here. We’ll probably need a stronger intervention to address lapses reliably.
If retry isn’t terribly helpful, should we remove it? It’s making people do more work. My instinct is that retry is emotionally important. It communicates: “hey! we know you forgot that, but don’t fret: you’ll see it again soon, and we’re keeping track.” Of course, we don’t want to waste people’s time or mislead them—it’d be better to replace retry with a stronger mechanic that also yields this emotional response—but the cost seems low now.
Query 1:
WITH
users AS (
SELECT
userID,
schedule AS category
FROM
`logs.registeredUsers`),
reviewsWithoutRetry AS (
SELECT
*
FROM
`logs.reviews`
WHERE
isRetry IS NOT TRUE
AND sessionID IS NOT NULL),
laggingReviewTimestamps AS (
SELECT
*,
LAG(timestamp) OVER (PARTITION BY userID, cardID ORDER BY timestamp ASC) AS previousReviewTimestamp,
LAG(reviewMarking) OVER (PARTITION BY userID, cardID ORDER BY timestamp ASC) AS previousReviewMarking,
LAG(beforeInterval) OVER (PARTITION BY userID, cardID ORDER BY timestamp ASC) AS previousInterval
FROM
reviewsWithoutRetry),
withRetryInBetween AS (
SELECT
*,
-- TRUE if a retry review of this card happened between the previous non-retry review and this one
EXISTS (
SELECT
1
FROM
`logs.reviews` AS r
WHERE
r.userID = l.userID
AND r.cardID = l.cardID
AND r.timestamp > l.previousReviewTimestamp
AND r.timestamp < l.timestamp
AND r.isRetry IS TRUE) AS didInterveningRetry
FROM
laggingReviewTimestamps AS l),
accuracies AS (
SELECT
previousInterval,
category,
COUNT(*) AS N,
COUNTIF(reviewMarking="remembered") AS countCorrect,
COUNTIF(reviewMarking="remembered")/COUNT(*) AS accuracy
FROM
withRetryInBetween
JOIN
users
USING
(userID)
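WHERE
previousReviewMarking = "forgotten"
-- intervals are in milliseconds: keep only lapses at intervals longer than 5 days
AND previousInterval > 1000*60*60*24*5
-- new-schedule readers who got a retry vs. original-schedule readers who didn't
AND ((category = "aggressiveStart"
AND didInterveningRetry)
OR (category = "original"
AND NOT didInterveningRetry))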
GROUP BY
category,
previousInterval
)
SELECT
*
FROM
accuracies
ORDER BY
category,
previousInterval
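For readers less familiar with BigQuery window functions, here is a rough in-memory Python equivalent of Query 1's core logic: order each reader's reviews of a card by time, skip retry reviews when pairing each review with its predecessor, and note whether a retry happened in between. The `Review` record and `lapse_followups` function are hypothetical, and the sketch ignores Query 1's `sessionID IS NOT NULL` filter and the join against `logs.registeredUsers`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Review:
    user_id: str
    card_id: str
    timestamp: int        # ms since epoch
    marking: str          # "remembered" or "forgotten"
    before_interval: int  # ms
    is_retry: bool


def lapse_followups(reviews, min_interval_ms=5 * 24 * 60 * 60 * 1000):
    """Yield (previous_interval, remembered, did_intervening_retry) for each
    non-retry review whose previous non-retry review of the same card was
    forgotten at an interval longer than min_interval_ms."""
    by_card = defaultdict(list)
    for r in reviews:
        by_card[(r.user_id, r.card_id)].append(r)
    for history in by_card.values():
        history.sort(key=lambda r: r.timestamp)
        prev = None               # previous non-retry review (like LAG)
        retry_since_prev = False  # like didInterveningRetry
        for r in history:
            if r.is_retry:
                if prev is not None:
                    retry_since_prev = True
                continue
            if (prev is not None and prev.marking == "forgotten"
                    and prev.before_interval > min_interval_ms):
                yield prev.before_interval, r.marking == "remembered", retry_since_prev
            prev = r
            retry_since_prev = False
```

Grouping these tuples by interval and retry flag gives the same kind of accuracy table Query 1 produces, which can be handy for checking the query on a small hand-built sample.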
Query 2:
SELECT
COUNTIF(reviewMarking="remembered")/COUNT(*) AS accuracy,
COUNT(*) AS N
FROM
`logs.reviews`
WHERE
-- 1209600000 ms = 14 days, i.e. the 2-week level
beforeInterval = 1209600000
AND isRetry IS NOT TRUE