In prompt generation, choosing reinforcement targets and writing prompts for those targets are two separate problems

For the problem of Using machine learning to generate good spaced repetition prompts from explanatory text, most people seem to point the language model at a passage or a highlighted phrase and tell it “write some prompts about this!” The results rarely seem to be very good, except when the input text more or less fully determines what the prompt should be about (“Carbon’s atomic number is 6.”).

My impression, just intuitively, is that choosing what to write prompts about is a much more difficult problem for a language model than writing a prompt to reinforce a specific detail. See Framing prompt generation as a filtering problem on reinforcement targets for more on this.

One reason for this is that the model cannot know—at least not without a lot of other external information—what you’re interested in, what you already know, and what’s important relative to your goals in reading this material. This is already a problem when an author is writing prompts for a generic reader (see The mnemonic medium should give readers control over the prompts they collect), but at least in that case, the author has the structure of the whole book in mind, and strong opinions about what’s most important to learn as part of the subject.

Another reason for this relative difficulty is that we can give the model a lot of specific advice about how to write effective prompts. For instance, if we give the model the principles from my “How to write good prompts” and ask it to verify that each prompt it writes conforms to those principles, it does a much better job than if simply asked to write a prompt about a particular detail. By contrast, we don’t yet have a corresponding list of principles about target selection. We just tell it to, e.g., “write prompts about the most important details.” Really, we need to encode some kind of theory of knowledge. Some progress is probably possible here, but I suspect we’ll always need user input, even if only to refine the model’s selections. (Related, on a smaller scale: In prompt generation, LLMs lack prompt-writing patterns for complex conceptual material)
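
Concretely, the prompt-writing half can look something like the sketch below. This is a minimal illustration, not any particular implementation: `complete` is a hypothetical stand-in for whatever language-model call you’re using, and the principle list is heavily abridged.

```python
# Sketch: write a prompt for a *given* reinforcement target, then ask the
# model to check its own output against explicit prompt-writing principles.
# `complete(text) -> str` is a hypothetical stand-in for any LLM call.

PRINCIPLES = """\
- Focused: each prompt should reinforce one detail.
- Precise: the question should pin down exactly what's being asked.
- Tractable: the reader should usually be able to answer correctly.
- Effortful: the answer shouldn't be inferable from the question alone.
"""  # abridged; see "How to write good prompts" for the full list

def write_prompt(passage: str, target: str, complete) -> str:
    draft = complete(
        f"Passage:\n{passage}\n\n"
        f"Write one spaced repetition prompt reinforcing this detail: {target}\n\n"
        f"Follow these principles:\n{PRINCIPLES}"
    )
    # The verification pass: have the model check each principle and revise.
    return complete(
        f"Check the prompt below against each principle and rewrite it if any "
        f"are violated. Return only the final prompt.\n\n"
        f"Principles:\n{PRINCIPLES}\n\nPrompt:\n{draft}"
    )
```

Notice that the hard part, choosing `target`, happens entirely outside this function.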

I see this difficulty as basically fine. It’s very natural to use a highlighter while reading to point at the parts which seem most important or interesting (Deciding to remember something with a spaced repetition system is (aspirationally) a lightweight gesture). That mostly does the job here. Going back through and writing prompts, given those highlights, is quite effortful and time-consuming (Writing good spaced repetition memory prompts is hard); if models can help with that, that’s great. Even highlights aren’t quite enough: In prompt generation, LLMs often need extra hints about what angle to reinforce.
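
Mechanically, that division of labor is simple. Here’s a sketch building on `write_prompt` and the hypothetical `complete` above: the reader supplies highlights (and, when needed, a hint about the angle), and the model only does the writing.

```python
# Sketch: the reader chooses targets by highlighting (optionally with a hint
# about what angle to reinforce); the model handles only the writing step.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Highlight:
    passage: str                # surrounding context from the text
    text: str                   # the highlighted span itself
    hint: Optional[str] = None  # e.g. "reinforce *why*, not the definition"

def prompts_for_highlights(highlights: list[Highlight], complete) -> list[str]:
    prompts = []
    for h in highlights:
        target = h.text if h.hint is None else f"{h.text} (angle: {h.hint})"
        prompts.append(write_prompt(h.passage, target, complete))
    return prompts
```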


A few inklings led me here over the past couple of years. Ozzie Kirkby and I noticed (2021-06-10) that models generate much better prompts from sentences in personal notes which have already distilled what you care about from a longer passage. Likewise, GPT-3 can transform cloze deletion prompts into question-answer prompts; that’s a prompt generation task where the reinforcement target has already been specified quite precisely.
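
That cloze-to-question transformation is narrow enough that a few-shot prompt specifies it almost completely. A sketch, again with the hypothetical `complete` call (the examples here are illustrative, not from the original experiment):

```python
# Sketch: a cloze deletion already pins down the reinforcement target, so a
# short few-shot prompt suffices to convert it to question/answer form.

FEW_SHOT = """\
Cloze: Carbon's atomic number is {6}.
Q: What is carbon's atomic number?
A: 6

Cloze: {Chlorophyll} gives plant leaves their green color.
Q: Which pigment gives plant leaves their green color?
A: Chlorophyll
"""

def cloze_to_qa(cloze: str, complete) -> str:
    return complete(
        f"Convert the cloze deletion into a question/answer prompt, "
        f"following the examples.\n\n{FEW_SHOT}\nCloze: {cloze}\nQ:"
    )
```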

Last updated 2023-07-13.