A technique for prompting large language models: ask them to “think step by step” (or similar) so that they emit a line of reasoning before producing the desired output. This often improves performance, sometimes quite substantially.
Wei, J., et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
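As a concrete illustration, here is a minimal sketch of a few-shot chain-of-thought prompt in the style of the Wei et al. paper. The exemplar is adapted from the paper’s Figure 1; the `complete` callable is a placeholder assumption for whatever text-completion API is in use, not part of the paper.

```python
from typing import Callable

# A worked exemplar (adapted from Figure 1 of Wei et al. 2022) whose answer
# spells out its reasoning before stating the result.
EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n\n"
)

def answer_with_cot(question: str, complete: Callable[[str], str]) -> str:
    """Prepend the exemplar so the model imitates its step-by-step pattern,
    emitting a line of reasoning before the final answer."""
    prompt = f"{EXEMPLAR}Q: {question}\nA:"
    # Model output follows the exemplar: reasoning first, then "The answer is ..."
    return complete(prompt)
```

The same idea also works zero-shot: instead of supplying a worked exemplar, append a cue like “Let’s think step by step.” to the question itself.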
This is an interesting case study in AI risk because the GPT-3 paper was published 19 months earlier. It is thus an instance of us discovering, a year and a half after release, that a publicly released model was in fact much more capable than we had thought. This (and related instances) suggests that it may not be possible to reliably “evaluate models’ capabilities before public release”. See Bowman, S. R. (2023). Eight Things to Know about Large Language Models for more discussion of this point.