occurs when a behavior is paired with a consequence, a process called operant conditioning. Though B. F. Skinner originated the term operant conditioning (also known as instrumental conditioning), his approach to studying animal behavior was largely based on the work of Edward L. Thorndike. As a graduate student, Thorndike studied how success and failure affect behavior (i.e., trial-and-error learning) by putting cats (among other species) inside a “puzzle box.” The cats had an incentive to leave the box: they were hungry, and there was food outside the box that entrapped them. The box could be opened from the inside, but only if the cat pressed a lever, pulled a string, and lifted a latch. Naturally, a cat with no experience would struggle haphazardly to get out of the box. During its struggle, it would accidentally press the lever, pull the string, and lift the latch, and voilà, the door would open. At first, the cats were slow and unsystematic when trying to open the box. However, Thorndike observed that the cats opened the box faster with more practice. Based on these observations, Thorndike developed the “law of effect,” which states that behaviors resulting in a pleasant consequence are likely to be repeated, and those resulting in an unpleasant consequence are likely to stop.
B. F. Skinner found Thorndike’s experimental setup to be lacking, mainly because the experimenter had to place the cat back in the puzzle box after every successful escape. Thus, Skinner looked to create new equipment. The apparatus he made was an operant chamber, a box in which a pigeon could peck an illuminated disk or a rat could press a lever to earn food (see Figure 3.1). With this apparatus, Skinner was able to control exactly when the animals would be rewarded and didn’t have to take the animal out after every trial. Furthermore, data from the operant chamber were collected electronically on a device called a cumulative recorder. He ran a series of experiments in which he tested how an animal’s response rate increased or decreased as a result of the frequency of reward. Skinner used the term operant behavior to differentiate the behaviors in his operant chamber from reflexes.
Figure 3.1 A pigeon in a modern, touch‐screen‐equipped operant chamber.
Unlike responses learned through respondent conditioning, operant behaviors are those that “operate” or act on their environment to produce consequences. A key distinction between respondent behaviors and operant behaviors is that operant behaviors are strengthened and weakened by consequences. For example, if the key is turned then the car starts; if the tail is pulled then the dog bites; if the target is touched then food is delivered; if a leash is pulled then the dog is choked; if the electric fence is touched then the animal is shocked. With operant conditioning, the consequence only occurs if the animal engages in a particular behavior; the consequence impacts the likelihood that the behavior occurs again.
Through his research, Skinner demonstrated the effects of reinforcement and punishment. He found that behavior can be changed by its consequences and went on to distinguish between two types of consequences based on how they affect behavior. Behaviors that are followed by reinforcement are strengthened and more likely to occur again in the future. Thorndike’s cat that pressed the lever, pulled a string, and lifted a latch to leave the box was likely to repeat that sequence and even get faster at it because there was food available after escaping. On the other hand, behaviors that are followed by punishment are weakened and less likely to occur again. If, instead of getting food after escaping, the cat had experienced an electric shock, it would have been less likely to repeat the sequence needed to escape the box. It is important to note that reinforcement and punishment are defined functionally. This means that it doesn’t matter what the consequence is; it could be food, a sound, or an object. As long as a consequence increases the behavior it follows, it is reinforcement, and as long as it decreases that behavior, it is punishment.
Table 3.1 The four contingencies in operant conditioning.
| | Increases behavior (reinforcement) | Decreases behavior (punishment) |
|---|---|---|
| Stimulus is added (positive) | Positive reinforcement | Positive punishment |
| Stimulus is removed (negative) | Negative reinforcement | Negative punishment |
Skinner (1938, 1953) identified four basic arrangements by which operant conditioning occurs (see Table 3.1). In this context, the words “positive” and “negative” are used in a mathematical sense: “positive” means adding a stimulus to the situation, and “negative” means taking away a stimulus. Adding or removing a stimulus can increase or decrease behavior, depending on the situation. To train a dog to sit, a trainer might offer a dog a treat after she sits down. This would be an instance of positive reinforcement because the consequence consisted of a treat added to the dog’s environment, resulting in an increased likelihood of sitting in the future. A cat owner might use a spray bottle to reduce furniture scratching. If scratching decreases, this would be an instance of positive punishment because the consequence (the water spray) was added to the cat’s environment and decreased scratching.
In negative reinforcement, a response results in the removal of an aversive event, and the response increases. The negative reinforcer is ordinarily something the animal tries to avoid or escape, such as a shock from an electric fence. For example, consider training a dog to sit. Instead of offering the dog a treat, a trainer might put pressure on the dog’s bottom to get the dog to sit and then release the pressure once the dog is sitting. Assuming the behavior of sitting increases, the behavior of sitting was negatively reinforced. The response (sitting) results in the removal of an event (pressure from the trainer’s hand) and the likelihood of the response increases (sitting when a hand is on the dog’s bottom). A second example of negative reinforcement is a guard dog barking at a fence as a person walks by. If that person then leaves the dog’s sight, the dog is more likely to bark at the next person who comes to the fence. The response (barking) results in the removal of an event (seeing a person) and the likelihood of the response increases (barking when a person walks by).
The last basic arrangement is negative punishment. In this case, the removal of a stimulus decreases the target behavior. For example, if a dog jumps on their owner to get the person’s attention, the owner might remove that attention by walking away or turning their back to the dog in an attempt to decrease the behavior. If the jumping up behavior decreases when attention is removed, this is an example of negative punishment. Negative punishment occurs when a behavior results in the removal of a pleasant stimulus, causing a decrease in the behavior’s occurrence in the future.
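The four arrangements in Table 3.1 follow from two yes-or-no questions: was a stimulus added or removed, and did the behavior increase or decrease? The short Python sketch below encodes that lookup as an illustration. The names `StimulusChange`, `BehaviorChange`, and `classify_contingency` are hypothetical and introduced only for this example; the sketch assumes the change in behavior has already been observed, since the contingencies are defined functionally.

```python
from enum import Enum


class StimulusChange(Enum):
    """What happens to a stimulus as a consequence of the behavior."""
    ADDED = "positive"     # a stimulus is added to the situation
    REMOVED = "negative"   # a stimulus is taken away


class BehaviorChange(Enum):
    """Observed effect on how often the behavior occurs afterward."""
    INCREASES = "reinforcement"
    DECREASES = "punishment"


def classify_contingency(stimulus: StimulusChange, behavior: BehaviorChange) -> str:
    """Return the name of the cell in Table 3.1 for this combination."""
    return f"{stimulus.value} {behavior.value}"


# The chapter's own examples, described by the two observations (illustrative only).
examples = {
    "treat given after sitting; sitting increases":
        (StimulusChange.ADDED, BehaviorChange.INCREASES),
    "water sprayed after scratching; scratching decreases":
        (StimulusChange.ADDED, BehaviorChange.DECREASES),
    "hand pressure released after sitting; sitting increases":
        (StimulusChange.REMOVED, BehaviorChange.INCREASES),
    "attention withdrawn after jumping; jumping decreases":
        (StimulusChange.REMOVED, BehaviorChange.DECREASES),
}

for description, (stimulus, behavior) in examples.items():
    print(f"{description}: {classify_contingency(stimulus, behavior)}")
```

Note that the function takes the observed change in behavior as an input rather than predicting it. The same stimulus (for example, a water spray) could fall in a different cell for a different animal if the behavior it follows increases instead of decreasing.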
3.4 Effectiveness of Consequences
Two major factors determine the effectiveness of reinforcement and punishment: when and how often the consequences occur. Remember that operant conditioning takes place when a behavior is paired or associated with a consequence. It becomes increasingly difficult for an association to form as the delay between the behavior and the consequence grows (Wilkenfield et al. 1992). Therefore, timing (the when) is one important factor for the effectiveness of consequences during the acquisition of new behaviors.
Browne et al. (2013) demonstrated the importance of timing by attempting to teach dogs to sniff the inside of one of two containers with either an immediately delivered reinforcer or a reinforcer delayed by 1 second. Most dogs (86%) were able to learn the behavior within 20 minutes when treats were delivered immediately. In contrast, only 40% of dogs learned the behavior when treats were delayed by 1 second. In fact, if a consequence is delayed from the moment of the target behavior,