IRT parameters predict reader and question performance no better than simple recall rates

In analyses like QCVC questions are initially forgotten at very different rates, I’ve written about “difficult” vs. “easy” questions, and similarly about “high-performing” vs. “low-performing” readers. Giacomo Randazzo has nudged me to try using item response theory (IRT) difficulty/ability parameters for this rather than simple recall rates, since these paint a more holistic picture and should account better for stochastic variation.

Do these make a difference?

Looks like: not really, no. Ability/recall correlation is 0.94, and easiness/recall correlation is lower, at 0.49. But I think that lower value is mostly a matter of the handful of outliers we’re seeing, which are almost all at or near a recall rate of 100%. That makes sense: it’s hard to distinguish the easiness of questions that all have perfect or near-perfect recall rates. They’re not discriminative.
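To make the ceiling effect concrete, here is a minimal sketch. It assumes a Rasch-style logistic model in which easiness lives on a log-odds scale; that is an assumption for illustration, not necessarily what the psych fits use. The helper names `logit` and `logit_shift` are hypothetical. Under that assumption, the same one-point change in recall rate moves the log-odds far more near 100% than near 50%, so near-ceiling questions can land far from a straight line through the rest.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a recall rate p, where 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logit_shift(p: float, delta: float = 0.01) -> float:
    """Change in log-odds when the recall rate moves from p to p + delta."""
    return logit(p + delta) - logit(p)


# Illustrative recall rates only; these are not values from the analysis.
mid_shift = logit_shift(0.50)      # 50% -> 51%
ceiling_shift = logit_shift(0.98)  # 98% -> 99%

print(f"50% -> 51%: {mid_shift:.2f} logits")
print(f"98% -> 99%: {ceiling_shift:.2f} logits")
```

At a recall rate of exactly 100% the log-odds is unbounded, so any fitted easiness there depends heavily on how the fitting procedure handles the edge.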

OK, but… do the IRT easiness parameters predict first-repetition behavior better than dumb in-essay recall rates? … nope! Correlations of 0.39 (IRT easiness) and 0.65 (in-essay recall rate).

What about ability parameters? … nope! They’re about the same. Correlations of 0.78 (IRT ability) and 0.80 (in-essay recall rate).
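The comparisons above come down to correlating each candidate predictor with first-repetition performance and seeing which correlation is higher. A minimal sketch of that comparison follows, assuming plain Pearson correlation (the note doesn’t say which coefficient was used). The real analysis lives in the R Markdown file referenced below; the `pearson` helper, the variable names, and all the numbers here are made-up placeholders, not data from the analysis.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-question values, invented purely for illustration.
in_essay_recall = [0.55, 0.70, 0.82, 0.90, 0.97]   # simple recall rate
irt_easiness = [-0.4, 0.3, 0.9, 1.6, 3.1]          # hypothetical IRT parameter
first_rep_recall = [0.60, 0.72, 0.80, 0.85, 0.93]  # outcome to predict

r_recall = pearson(in_essay_recall, first_rep_recall)
r_irt = pearson(irt_easiness, first_rep_recall)

print(f"recall rate vs. first repetition:  r = {r_recall:.2f}")
print(f"IRT easiness vs. first repetition: r = {r_irt:.2f}")
```

The same comparison applies per reader, with ability parameters in place of easiness.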

See qc-analysis.Rmd, IRT Fits (psych)