Operant conditioning

Key explanatory framework of Behaviorism, pioneered by B. F. Skinner. Unlike the behaviors in Classical conditioning (“respondent behaviors”), operant conditioning involves “operant behaviors” which are voluntary and affected by their consequences (the provision/withdrawal of rewards/punishments), possibly in the presence of some discriminative stimulus (like a light being on).

Thorndike’s puzzle box was a classic experiment in this vein: put a cat in a box that it can escape by pressing a lever, with a food reward sitting outside, and observe that it takes less time to escape over successive trials. This experiment suggested that classical conditioning couldn’t explain all behavior, since it doesn’t explain how a cat’s reward would cause it to learn to operate the lever. Thorndike called the underlying principle the “law of effect”: behaviors followed by pleasant consequences become more likely; behaviors followed by unpleasant consequences become less likely.

Skinner wanted to extend this work and study the effect more precisely. He constructed “Skinner boxes”: controlled chambers used to study operant conditioning. Animals are taught to perform an action (e.g. pressing a lever) in response to a stimulus (e.g. a light or sound). When the animal performs the action, a reward is delivered or a punishment is avoided.

Reinforcement and consequences

In operant conditioning, there are two categories of consequences: a consequence that follows a behavior is a {reinforcer} if {the future probability of the behavior increases}, and a {punisher} if {it decreases}.

Either type of consequence can be positive or negative. Here “positive” means a stimulus is added and “negative” means one is taken away, not that the outcome is good or bad.

Positive reinforcement is when a desirable stimulus is presented after a behavior (increasing that behavior).
Negative reinforcement is when an aversive stimulus is removed or avoided after a behavior (also increasing that behavior).
Conversely, positive punishment is when an aversive stimulus is applied after a behavior, and negative punishment is when a desirable stimulus is taken away or withheld (both decreasing that behavior).
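
The four combinations can be written out as a small lookup. This is only an illustrative sketch: it assumes the standard reading in which “positive” means a stimulus is added and “negative” means one is taken away, and the names `Consequence` and `classify` are made up for this note.

```python
from enum import Enum


class Consequence(Enum):
    POSITIVE_REINFORCEMENT = "positive reinforcement"
    NEGATIVE_REINFORCEMENT = "negative reinforcement"
    POSITIVE_PUNISHMENT = "positive punishment"
    NEGATIVE_PUNISHMENT = "negative punishment"


def classify(stimulus_added: bool, behavior_increases: bool) -> Consequence:
    """Name a consequence from two yes/no questions (hypothetical helper).

    stimulus_added:     True if something was presented after the behavior,
                        False if something was taken away or withheld.
    behavior_increases: True if the behavior becomes more likely afterwards.
    """
    if behavior_increases:
        # Anything that makes the behavior more likely is a reinforcer.
        if stimulus_added:
            return Consequence.POSITIVE_REINFORCEMENT
        return Consequence.NEGATIVE_REINFORCEMENT
    # Anything that makes the behavior less likely is a punisher.
    if stimulus_added:
        return Consequence.POSITIVE_PUNISHMENT
    return Consequence.NEGATIVE_PUNISHMENT


# A loud noise stops when the lever is pressed, and lever pressing goes up:
# something was removed and the behavior increased.
example = classify(stimulus_added=False, behavior_increases=True)
# example is Consequence.NEGATIVE_REINFORCEMENT
```

Note that the classification depends only on what happens to the behavior and whether a stimulus was added or removed; whether the stimulus seems pleasant is not an input.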

Q. What is negative punishment?
A. When a desirable stimulus is taken away or withheld after a behavior (hence decreasing that behavior).

Q. What does operant conditioning call training which relies on disciplining a student when they do something wrong?
A. Positive punishment.

Q. What does operant conditioning call training which relies on ceasing aversive stimuli when the student behaves as desired?
A. Negative reinforcement.

Reinforcement schedules

Different schedules of reinforcement produce different patterns of operant learning. For instance, reinforcement might follow a set number of responses (fixed ratio) or the first response after a set amount of time (fixed interval), or an unpredictable number of responses (variable ratio) / amount of time (variable interval). Variable ratio, like a slot machine, typically produces the highest rate of response. Fixed ratio is next, then the interval schedules.
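
To make the four schedules concrete, here is a rough sketch of the rule each one uses to decide whether a given response earns a reinforcer. It models only the delivery rule, not how an animal’s response rate changes. The function names, the uniform distributions used for the “variable” schedules, and the convention that time starts at 0 are all assumptions made for illustration.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response."""
    count = 0

    def respond(_time):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reinforce after an unpredictable number of responses averaging mean_n."""
    def draw():
        # Assumption: requirement drawn uniformly from 1 .. 2*mean_n - 1.
        return rng.randint(1, 2 * mean_n - 1)

    count, target = 0, draw()

    def respond(_time):
        nonlocal count, target
        count += 1
        if count >= target:
            count, target = 0, draw()
            return True
        return False

    return respond


def fixed_interval(seconds):
    """Reinforce the first response at least `seconds` after the last reinforcer."""
    last = 0.0

    def respond(time):
        nonlocal last
        if time - last >= seconds:
            last = time
            return True
        return False

    return respond


def variable_interval(mean_seconds, rng):
    """Like fixed_interval, but the required wait is redrawn after each reinforcer."""
    def draw():
        # Assumption: wait drawn uniformly from 0 .. 2*mean_seconds.
        return rng.uniform(0, 2 * mean_seconds)

    last, wait = 0.0, draw()

    def respond(time):
        nonlocal last, wait
        if time - last >= wait:
            last, wait = time, draw()
            return True
        return False

    return respond


# Ten responses, one per second, on a fixed-ratio-5 schedule:
fr5 = fixed_ratio(5)
fr5_outcomes = [fr5(t) for t in range(10)]
# Only the 5th and 10th responses are reinforced.
```

Ratio schedules ignore the clock and count responses, so responding faster earns reinforcers faster. Interval schedules ignore the count and wait for time to pass, so extra responses before the interval is up earn nothing.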

Q. What reinforcement schedule for operant conditioning produces the highest rate of response?
A. Variable ratio

Q. In a variable ratio schedule of operant conditioning, when does reinforcement occur?
A. After a variable number of responses since the previous reinforcement.

Last updated 2023-07-13.