DSPy

An attempt to create an optimizable, programmable system for language models. The idea is that users construct “programs” from large language models by describing the desired inputs and outputs of various modules, which are connected with traditional programming techniques (conditional branching, loops, computation, etc.). Then a “compiler” (optimizer) tries to find the best prompt (mostly few-shot demos, but in some cases also wording) given a trainset and devset.
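
To make the "modules connected by ordinary code" idea concrete, here is a toy sketch in plain Python. It is not the DSPy API: `ToyModule`, `rewrite`, `answer`, `run_program`, and the `LM` callable are all hypothetical names invented for illustration, and the 8-word threshold is arbitrary. The `demos` list is the part an optimizer would fill in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

LM = Callable[[str], str]  # hypothetical stand-in: prompt text in, completion out


@dataclass
class ToyModule:
    """Declared only by input/output field names. Not the DSPy API."""
    inputs: List[str]
    output: str
    demos: List[Dict[str, str]] = field(default_factory=list)  # filled by an optimizer

    def prompt(self, **values: str) -> str:
        lines: List[str] = []
        for demo in self.demos:  # few-shot demos come first
            lines += [f"{name}: {demo[name]}" for name in self.inputs + [self.output]]
            lines.append("")
        lines += [f"{name}: {values[name]}" for name in self.inputs]
        lines.append(f"{self.output}:")
        return "\n".join(lines)

    def __call__(self, lm: LM, **values: str) -> str:
        return lm(self.prompt(**values))


rewrite = ToyModule(inputs=["question"], output="short_question")
answer = ToyModule(inputs=["question"], output="answer")


def run_program(lm: LM, question: str) -> str:
    # Ordinary Python control flow connects the modules.
    if len(question.split()) > 8:
        question = rewrite(lm, question=question)
    return answer(lm, question=question)
```

Nothing in `run_program` mentions prompt wording. Changing `answer.demos` changes the prompt without touching the program logic, which is the separation the compiler relies on.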

First author is Omar Khattab, at Stanford.

stanfordnlp/dspy: DSPy: The framework for programming—not prompting—foundation models

How does BootstrapFewShotWithRandomSearch work?

It’s built from LabeledFewShot and BootstrapFewShot.
LabeledFewShot provides few-shot demos by simply grabbing a random slice of the trainset.
- Because these demos come from the training data, they aren’t “augmented”.
- For example, if you use Retrieval-augmented generation or Chain-of-thought reasoning, those intermediate outputs won’t be included in the few-shot demos.
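
A minimal sketch of what LabeledFewShot does as described above, in plain Python rather than DSPy itself. The function name `labeled_few_shot`, the `seed` argument, and the example data are assumptions for illustration.

```python
import random


def labeled_few_shot(trainset, k=16, seed=0):
    """Pick up to k raw training examples to use as demos (toy sketch)."""
    rng = random.Random(seed)
    return rng.sample(trainset, min(k, len(trainset)))


trainset = [
    {"question": "2 + 2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
    {"question": "Opposite of hot?", "answer": "cold"},
]

demos = labeled_few_shot(trainset, k=2)
# Each demo carries only the labeled fields. Nothing was executed to
# produce it, so there is no retrieved context or reasoning field.
```
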
BootstrapFewShot uses a “teacher” program to train a “student” program.
- If a teacher program is not provided, one is synthesized by running LabeledFewShot on the student program.
- Then the compiler attempts to bootstrap full augmented traces by repeatedly evaluating the teacher on individual examples of the training set, keeping any traces in which the output satisfies a metric.
- The augmented traces are (optionally) supplemented with some of the unaugmented demos from LabeledFewShot, or from the validation set.
- BootstrapFewShot is designed for boolean metrics. Traces are kept if the metric evaluates to a truthy value. Only a few traces are stored as demos in the resulting program, and no attempt is made to use the traces which had the highest metric value.
- Key parameters:
  - max_bootstrapped_demos=4: the number of augmented traces it tries to produce. Execution stops once this many examples have been bootstrapped.
  - max_labeled_demos=16: the number of “raw” demos the synthetic LabeledFewShot teacher will use; also, the total number of demos desired in the compiled program. When max_labeled_demos > max_bootstrapped_demos, extra unaugmented demos will be emitted.
  - max_rounds=1: the number of times it will attempt to bootstrap each input example
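
The bootstrapping loop described above can be sketched roughly as follows. This is a simplified plain-Python illustration under stated assumptions, not DSPy's implementation: the teacher is modeled as a function returning a dict that includes intermediate fields, and `bootstrap_few_shot`, `toy_teacher`, and `exact_match` are hypothetical names.

```python
import random


def bootstrap_few_shot(trainset, teacher, metric, max_bootstrapped_demos=4,
                       max_labeled_demos=16, max_rounds=1, seed=0):
    bootstrapped = {}  # trainset index -> augmented trace
    for idx, example in enumerate(trainset):
        if len(bootstrapped) >= max_bootstrapped_demos:
            break  # stop as soon as enough traces have been collected
        for _ in range(max_rounds):
            trace = teacher(example)
            if metric(example, trace):  # any truthy value keeps the trace
                bootstrapped[idx] = trace
                break
    # Top up with raw examples that were not bootstrapped.
    leftovers = [ex for i, ex in enumerate(trainset) if i not in bootstrapped]
    n_raw = min(max(max_labeled_demos - len(bootstrapped), 0), len(leftovers))
    raw = random.Random(seed).sample(leftovers, n_raw)
    return list(bootstrapped.values()) + raw


def toy_teacher(example):
    a, b = example["a"], example["b"]
    # Deliberately wrong when a is odd, so some traces fail the metric.
    guess = a + b if a % 2 == 0 else a + b + 1
    return {"a": a, "b": b, "rationale": f"{a} plus {b}", "answer": guess}


def exact_match(example, trace):
    return example["answer"] == trace["answer"]


trainset = [{"a": i, "b": 1, "answer": i + 1} for i in range(10)]
demos = bootstrap_few_shot(trainset, toy_teacher, exact_match,
                           max_bootstrapped_demos=2, max_labeled_demos=5)
```

Note that the loop takes the first traces that pass, in trainset order, and never ranks them by metric value. This is why the order of the training set matters for the random-search variant.
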
BootstrapFewShotWithRandomSearch creates many candidate programs by shuffling the training set, and picks the best one.
- It begins by adding a few “simple” candidate programs to the pool:
  - the student program, as-is
  - the student program compiled by LabeledFewShot
  - the student program compiled by BootstrapFewShot
- Then it creates a pool of random candidates by shuffling the training set, choosing a different number of bootstrapped samples to try, and compiling the student with BootstrapFewShot.
- Each program is evaluated on the validation set; the program with the highest score is returned.
- Key parameters:
  - num_candidate_programs=16: the number of randomly permuted programs to create with BootstrapFewShot
  - (other parameters are as in BootstrapFewShot)
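
A rough sketch of the search loop described above, again in plain Python rather than DSPy. Here a "program" is just a list of demos, and `random_search`, `toy_labeled`, `toy_bootstrap`, and `toy_score` are hypothetical stand-ins. The lower bound of 1 on the random demo count is an assumption.

```python
import random


def random_search(student, trainset, valset, labeled, bootstrap, score,
                  num_candidate_programs=16, max_bootstrapped_demos=4, seed=0):
    rng = random.Random(seed)
    # The three "simple" candidates.
    pool = [
        student,
        labeled(student, trainset),
        bootstrap(student, trainset, max_bootstrapped_demos),
    ]
    # Random candidates: shuffled trainset, random number of bootstrapped demos.
    for _ in range(num_candidate_programs):
        shuffled = list(trainset)
        rng.shuffle(shuffled)
        size = rng.randint(1, max_bootstrapped_demos)
        pool.append(bootstrap(student, shuffled, size))
    # Keep whichever candidate scores best on the validation set.
    return max(pool, key=lambda program: score(program, valset))


def toy_labeled(student, trainset):
    return student + trainset[:2]


def toy_bootstrap(student, trainset, n):
    return student + trainset[:n]


def toy_score(program, valset):
    # Fraction of validation items that happen to appear among the demos.
    return sum(x in program for x in valset) / len(valset)


trainset = list(range(10))
valset = [7, 8, 9]
best = random_search([], trainset, valset, toy_labeled, toy_bootstrap, toy_score)
```

Because BootstrapFewShot keeps the first passing traces in order, shuffling the training set is what makes each candidate end up with a different set of demos.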

Last updated 2024-01-30.