A spaced repetition algorithm can only work well if readers’ self-graded feedback reliably reflects their knowledge state. Do people “cheat” when self-grading prompts? In what situations? Do they lie more often when a prompt is particularly old?
Quantum Country data suggests that lying isn’t totally ubiquitous. We can see, for instance, that half of all long-term Quantum Country lapses come from just 12% of its questions. So even if there’s a lot of lying, we may lack absolute measures of accuracy, but we at least have relative measures. And we can still compare two conditions to see relative differences in behavior, so long as we’d expect the same distribution of cheating in both conditions.
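Concretely, that concentration figure is the kind of relative measure I mean. A minimal sketch of how one might compute it, assuming a flat log of lapse events keyed by question ID (a hypothetical schema, not Quantum Country’s actual data model):

```python
from collections import Counter

def lapse_concentration(lapse_question_ids):
    """Fraction of distinct questions accounting for half of all lapses.

    A small value (e.g. ~0.12) means lapses are concentrated in a few
    questions, i.e. readers are not uniformly rubber-stamping everything
    as remembered."""
    counts = Counter(lapse_question_ids).most_common()  # sorted descending
    total = sum(c for _, c in counts)
    running = 0
    for k, (_, c) in enumerate(counts, start=1):
        running += c
        if running * 2 >= total:  # these k questions cover half the lapses
            return k / len(counts)
    return 1.0
```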
That said, the in-essay recall rate is supposedly 100% at the 75th percentile. I don’t really buy that; many of those readers are probably lying. 20220127110634. If you include people who completed a first repetition, the figure falls to something like 95%, which is more plausible (about 6 questions forgotten). 20220127111101
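To pin down what that percentile claim means: compute each reader’s in-essay recall rate (fraction of prompts self-graded as remembered), then take the 75th percentile across readers. A sketch, again assuming a hypothetical per-reader grade log:

```python
import math

def recall_rate_percentile(grades_by_reader, pct=75):
    """grades_by_reader: reader ID -> list of booleans, True where the
    reader marked an in-essay prompt as remembered (assumed schema).

    Returns the pct-th percentile (nearest-rank) of per-reader recall
    rates. A return value of 1.0 is the suspicious "100% at the 75th
    percentile" figure."""
    rates = sorted(
        sum(gs) / len(gs) for gs in grades_by_reader.values() if gs
    )
    if not rates:
        raise ValueError("no graded readers")
    idx = math.ceil(pct / 100 * len(rates)) - 1  # nearest-rank index
    return rates[max(idx, 0)]
```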
But one problem even for the relative approach is that cheating may be time-dependent. Maybe people start out mostly honest but eventually just “phone it in” and increasingly mark stuff as remembered.
We do know that this time-dependent lying isn’t total, since when we surreptitiously insert questions readers have never seen before, they’re much more likely to mark those questions as forgotten: Without review, most Quantum Country readers forget at least a third of the material after a month (2020 efficacy study). So we can certainly still get some signal; the problem is that if there’s a time-dependent effect, we must be very careful when comparing accuracy numbers across conditions.
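One way to quantify how much signal survives over time: compare the “forgotten” rate on planted never-seen questions against ordinary reviews, bucketed by how long the reader has been using the system. A sketch under an assumed log format (the field names are made up):

```python
from collections import defaultdict

def honesty_gap_by_tenure(reviews):
    """reviews: iterable of (tenure_bucket, is_planted, marked_remembered)
    tuples, where tenure_bucket is e.g. the reader's month of usage
    (assumed schema).

    Returns tenure_bucket -> (planted "forgotten" rate minus normal
    "forgotten" rate). If readers rubber-stamped everything as remembered,
    the gap would collapse toward zero; a gap that shrinks with tenure
    would be evidence of time-dependent lying."""
    tallies = defaultdict(lambda: [[0, 0], [0, 0]])  # bucket -> [normal, planted]
    for bucket, is_planted, remembered in reviews:
        tally = tallies[bucket][int(is_planted)]
        tally[0] += int(not remembered)  # forgotten count
        tally[1] += 1                    # total reviews
    gaps = {}
    for bucket, (normal, planted) in sorted(tallies.items()):
        if normal[1] and planted[1]:
            gaps[bucket] = planted[0] / planted[1] - normal[0] / normal[1]
    return gaps
```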