The first paper to suggest the Worked example effect (AFAICT), by John Sweller and Graham Cooper. The paper predates Cognitive load theory, but its claims are clearly starting to point in that direction.
Two main inspirations:
The authors suggest that students may be better off studying worked examples instead. They run five experiments to test this effect. Controlling for study time, students who study worked examples take roughly half as long to solve similar problems on a post-test, and they make one-fifth as many errors. But the effect doesn’t seem to extend to post-test problems that vary from the ones studied.
Q. High-level findings of experiment 5?
A. Algebra students who study worked examples take roughly half as long to solve similar problems on a post-test, and they make one-fifth as many errors. But the effect doesn’t seem to extend to post-test problems that vary from the ones studied.
Q. Key limitation of findings?
A. The effect doesn’t seem to extend to post-test problems with even modest variations from the problems studied.