L5 - Operant Conditioning PDF
Summary
This document discusses operant conditioning, a type of learning that focuses on how consequences influence behavior. It examines various aspects of operant conditioning, including flooding and desensitization techniques. The document uses real-life examples to illustrate these concepts.
Full Transcript
Peter had an intense fear of rabbits. Jones had a rabbit gradually brought closer to Peter while he munched candy and cookies. Jones first placed the rabbit in a far corner of the room while Peter munched and crunched. Peter cast a wary eye, but he continued to consume the treats. Over a couple of months, the animal was brought closer until Peter simultaneously ate and touched the rabbit. Jones theorized that the joy of eating was incompatible with fear and thus counterconditioned it.

[Photo caption] As a follow-up to Watson and Rayner’s experiment with Little Albert, Mary Cover Jones showed that fears could be counterconditioned by associating the feared object with pleasant experiences. She famously fed a two-year-old boy cookies and candy while a feared rabbit was brought gradually closer. (Photo: jsolpietro/Shutterstock.com)

5-3d FLOODING AND SYSTEMATIC DESENSITIZATION
If Mary Cover Jones had simply plopped the rabbit on Peter’s lap rather than bring it gradually closer, she would have been using the method of flooding. Flooding, like counterconditioning, is a behavior therapy method for reducing fears. It is based on the classical conditioning principle of extinction (Green et al., 2017). In flooding, the client is exposed to the fear-evoking stimulus until fear is extinguished. Little Albert, for example, might have been placed in close contact with a rat until his fear had become extinguished. In extinction, the CS (in this case, the rat) is presented repeatedly in the absence of the UCS (the clanging of the steel bars) until the CR (fear) is no longer evoked.

Although flooding is usually effective, it is unpleasant. (When you are fearful of rats, being placed in a room with one is no picnic.) For this reason, behavior therapists frequently prefer to use systematic desensitization, in which the client is gradually exposed to fear-evoking stimuli under circumstances in which he or she remains relaxed (Kaskas et al., 2017). For example, while feeling relaxed, Little Albert might have been given an opportunity to look at photos of rats or to see rats from a distance before they were brought closer. Systematic desensitization takes longer than flooding but is not as unpleasant.

In any event, people can learn by means of simple association. In terms of the evolutionary perspective, organisms that can learn by several routes, including conditioning and conscious reflection, would stand a greater chance of survival than organisms whose learning is limited to conditioning.

flooding: a behavioral fear-reduction technique based on principles of classical conditioning; fear-evoking stimuli (CSs) are presented continuously in the absence of actual harm so that fear responses (CRs) are extinguished
systematic desensitization: a behavioral fear-reduction technique in which a hierarchy of fear-evoking stimuli is presented while the person remains relaxed

5-4 OPERANT CONDITIONING: LEARNING WHAT DOES WHAT TO WHAT
Through classical conditioning, we learn to associate stimuli. As a result, a simple, usually passive response made to one stimulus is then made in response to the other. In the case of Little Albert, clanging noises were associated with a rat. As a result, the rat came to elicit the fear caused by the clanging. However, classical conditioning is only one kind of learning that occurs in these situations. After Little Albert acquired his fear of the rat, his voluntary behavior changed: he tried to avoid the rat as a way of reducing his fear. Thus, Little Albert engaged in another kind of learning: operant conditioning.

CHAPTER 5: Learning 129 Copyright 2020 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
In operant conditioning, organisms learn to do things, or not to do things, because of the consequences of their behavior. For example, I avoided buttered popcorn to prevent nausea. But we also seek fluids when we are thirsty, sex when we are aroused, and an ambient temperature of 68° to 70°F when we feel too hot or too cold. Classical conditioning focuses on how organisms form anticipations about their environments. Operant conditioning focuses on what they do about them. Let’s look at the contributions of Edward L. Thorndike and B. F. Skinner to operant conditioning.

5-4a EDWARD L. THORNDIKE AND THE LAW OF EFFECT
In the 1890s, stray cats were mysteriously disappearing from the alleyways of Harlem. Some of them, it turned out, were being brought to the quarters of Columbia University doctoral student Edward L. Thorndike (1874–1939). Thorndike was using them as subjects in experiments on the effects of rewards and punishments on learning.

Thorndike placed a cat in a “puzzle box.” If it pulled a dangling string, a latch would be released, allowing it to jump out and reach a bowl of food. When first placed in a puzzle box, a cat would claw and bite at the bars and wire. Through such random behavior, it might take three to four minutes for the cat to chance upon the response of pulling the string. When placed back in the cage, it might again take several minutes for the cat to pull the string. But with repetition, it took less time, and after seven or eight tries, the cat might pull the string immediately.

Thorndike explained the cat’s learning to pull the string in terms of his law of effect: a response (such as string pulling) would be, to use Thorndike’s term, “stamped in” (i.e., strengthened) in a particular situation (such as being inside a puzzle box) by a reward (escaping from the box and eating). But punishments, using Thorndike’s terminology once again, “stamp out” responses. That is, organisms would learn not to behave in ways that bring on punishment. Later, we shall see that the effects of punishment on learning are not so certain.

5-4b B. F. SKINNER AND REINFORCEMENT
When it comes to unusual war stories, few will top that of B. F. Skinner (1904–1990). One of Skinner’s wartime efforts was “Project Pigeon.” During World War II, Skinner proposed that pigeons be trained to guide missiles to their targets. In their training, the pigeons would be reinforced with food pellets for pecking at targets projected onto a screen (see Figure 5.3). Once trained, the pigeons would be placed in missiles. Their pecking at similar targets displayed on a screen would correct the missile’s flight path, resulting in a “hit” and a sacrificed pigeon. However, plans for building the necessary missile (for some reason called the Pelican and not the Pigeon) were scrapped. The pigeon equipment was too bulky, and Skinner’s suggestion was not taken seriously.

Fig. 5.3 PROJECT PIGEON
During World War II, B. F. Skinner proposed the use of pigeons that had been trained to peck at images of military targets to guide missiles to them. It never happened.

T/F: During World War II, psychologist B. F. Skinner proposed that pigeons be trained to guide missiles to their targets.
It is true that B. F. Skinner proposed that pigeons be used to guide missiles to their target during World War II. However, the proposal never got off the ground.

Project Pigeon may have been scrapped, but the principles of learning Skinner applied to the project have found wide application. Skinner taught pigeons and other animals to engage in operant behavior, behavior that operates on, or manipulates, the environment. In classical conditioning, involuntary responses such as salivation or eyeblinks are often conditioned. In operant conditioning, voluntary responses such as pecking at a target, pressing a lever, or skills required for playing tennis are acquired, or conditioned.

Operant conditioning is therefore defined as a simple form of learning in which an organism learns to engage in certain behavior because of the effects of that behavior. In operant conditioning, we learn to engage in operant behaviors, also known simply as operants, that result in presumably desirable outcomes such as food, a hug, an A on a test, attention, or social approval. For example, some children learn to conform to social rules to earn the attention and approval of their parents and teachers. Ironically, other children may learn to “misbehave” because misbehavior also gets attention. In particular, children may learn to be “bad” when their “good” behavior is routinely ignored. Some children who do not do well in school seek the approval of deviant peers (Wentzel & Muenks, 2016).

law of effect: Thorndike’s view that pleasant events stamp in responses, and unpleasant events stamp them out
reinforce: to follow a response with a stimulus that increases the frequency of the response
operant behavior: behavior that operates on, or manipulates, the environment
operant conditioning: a simple form of learning in which an organism learns to engage in behavior because it is reinforced
operant: the same as an operant behavior

5-4c METHODS OF OPERANT CONDITIONING
Skinner (1938) made many theoretical and technological innovations. Among them was his focus on discrete behaviors, such as lever pressing, as the unit, or type, of behavior to be studied. Other psychologists might focus on how organisms think or “feel.” Skinner focused on measurable things they do. Many psychologists have found these kinds of behavior inconsequential, especially when it comes to explaining and predicting human behavior. But Skinner’s supporters point out that focusing on discrete behavior creates the potential for helpful changes. For example, in helping people combat depression, one psychologist might focus on their “feelings.” A Skinnerian would focus on cataloging (and modifying) the types of things that “depressed people” do. Directly modifying depressive behavior might also brighten clients’ self-reports about their “feelings of depression.”

To study operant behavior, Skinner devised an animal cage (or “operant chamber”) that has been dubbed the Skinner box. (Skinner himself repeatedly requested that his operant chamber not be called a Skinner box, but history has thus far failed to honor his wishes.) Such a box is shown in Figure 5.4. The cage is ideal for laboratory experimentation because experimental conditions can be carefully introduced and removed, and their effects on laboratory animals can be observed.

Fig. 5.4 A RAT IN A “SKINNER BOX” (labeled parts: water, light, screen, food pellet dispenser, food tray, lever)
Skinner used the conditioning of rats and pigeons as “models” for much human learning. His boxes allowed him to control the environments of the animals and demonstrate how the environments determined the animals’ behavior. Skinner’s approach left no roles for thinking and decision making.

The rat in Figure 5.4 was deprived of food and placed in a Skinner box with a lever at one end. At first it sniffed its way around the cage and engaged in random behavior. The rat’s first pressing of the lever was accidental. However, because of this action, a food pellet dropped into the cage. The arrival of the food pellet increased the probability that the rat would press the lever again. The pellet is thus said to have reinforced lever pressing.

In operant conditioning, it matters little why or how the first “correct” response is made. The animal can happen on it by chance or be physically guided to make the response. You may command your dog to “Sit!” and then press its backside down until it is sitting. Finally, you reinforce sitting with food or a pat on the head and a kind word. Animal trainers use physical guiding or coaxing to bring about the first “correct” response. Can you imagine how long it would take to train your dog if you waited for it to sit or roll over and then seized the opportunity to command it to sit or roll over?

People, of course, can be verbally guided into desired responses when they are learning tasks such as spelling, adding numbers, or operating a machine. But they need to be informed when they have made the correct response. Often, knowledge of results is all the reinforcement people need to learn new skills.

5-4d TYPES OF REINFORCERS
Any stimulus that increases the probability that responses preceding it (whether pecking a button in a Skinner box or studying for a quiz) will be repeated serves as a reinforcer. Reinforcers include food pellets when an animal has been deprived of food, water when it has been deprived of liquid, the opportunity to mate, and the sound of a tone that has previously been associated with eating. Skinner distinguished between positive and negative reinforcers and primary and secondary reinforcers.

[Photo caption] Skinner himself repeatedly requested that his operant chamber not be called a Skinner box, but history has thus far failed to honor his wishes. (Photo: Nina Leen/Time Life Pictures/Getty Images)

POSITIVE AND NEGATIVE REINFORCERS Positive reinforcers increase the probability that a behavior will occur when they are applied. Food and approval usually serve as positive reinforcers. Negative reinforcers increase the probability that a behavior will occur when the reinforcers are removed (see Figure 5.5). People often learn to plan ahead so that they need not fear that things will go wrong. In such cases, fear acts as a negative reinforcer because removal of fear increases the probability that the behaviors preceding it (such as planning ahead) will be repeated.

Fig. 5.5 POSITIVE VERSUS NEGATIVE REINFORCERS
Procedure: use of positive reinforcement. Behavior: studying. Consequence: positive reinforcer (teacher approval) is presented when student studies. Change in behavior: frequency of behavior increases (student studies more).
Procedure: use of negative reinforcement. Behavior: studying. Consequence: negative reinforcer (teacher disapproval) is removed when student studies. Change in behavior: frequency of behavior increases (student studies more).

positive reinforcer: a reinforcer that when presented increases the frequency of an operant
negative reinforcer: a reinforcer that when removed increases the frequency of an operant

IMMEDIATE VERSUS DELAYED REINFORCERS Immediate reinforcers are more effective than delayed reinforcers. Therefore, the short-term consequences of behavior often provide more of an incentive than the long-term consequences. For example, some students socialize when they should be studying because the pleasure of socializing is immediate. Studying may not pay off until the final exam or graduation. (This is why younger students do better with frequent tests.) It is difficult to quit smoking cigarettes because the reinforcement of nicotine is immediate and the health consequences of smoking are more distant. Focusing on short-term reinforcement is also connected with risky sex, such as engaging in sexual activity with a stranger or failing to prevent pregnancy (Rettenberger et al., 2016; Shuper et al., 2010). One of the aspects of being human is the ability to foresee the long-range consequences of one’s behavior and to make choices. But immediate reinforcers, such as those cookies staring in the face of the would-be dieter, can be powerful temptations indeed.

[Photo caption] Nicotine Creates Short-Term Reinforcement. One of the difficulties in quitting smoking cigarettes is that the reinforcement of nicotine is strong and immediate, whereas the health consequences of smoking are a distant and uncertain punishment or threat. (Photo: C. Sherburne/Photolink/Getty Images)

PRIMARY AND SECONDARY REINFORCERS We can also distinguish between primary and secondary, or conditioned, reinforcers. Primary reinforcers are effective because of the organism’s biological makeup. For example, food, water, warmth (positive reinforcers), and pain (a negative reinforcer) all serve as primary reinforcers. Secondary reinforcers acquire their value through being associated with established reinforcers. For this reason they are also termed conditioned reinforcers. We may seek money because we have learned that it may be exchanged for primary reinforcers.

primary reinforcer: an unlearned reinforcer whose effectiveness is based on the biological makeup of the organism and not on learning
secondary reinforcer: a stimulus that gains reinforcement value through association with established reinforcers
conditioned reinforcer: another term for a secondary reinforcer

5-4e EXTINCTION AND SPONTANEOUS RECOVERY IN OPERANT CONDITIONING
Keisha’s teacher writes “Good” on all of her homework assignments before returning them. One day, her teacher no longer writes anything on the assignments: the reinforcement ends. Reinforcers are used to strengthen responses. What happens when reinforcement stops?

In Pavlov’s experiment, the meat powder was the event that followed and confirmed the appropriateness of salivation. In operant conditioning, the ensuing events are reinforcers. The extinction of learned responses results from the repeated performance of operant behavior without reinforcement. Keisha might stop doing her homework if she is not reinforced for completing it. In other words, reinforcers maintain operant behavior or strengthen habitual behavior in operant conditioning. With humans, fortunately, people can reinforce themselves for desired behavior by telling themselves they did a good job. In Keisha’s case, she may tell herself that she is doing the right thing regardless of whether her teacher recognizes it.

Spontaneous recovery of learned responses occurs in operant conditioning as well as in classical conditioning. Spontaneous recovery is adaptive in operant conditioning as well as in classical conditioning. Reinforcers may once again become available after time elapses, just as there are new tender sprouts on twigs when the spring arrives.

5-4f REINFORCERS VERSUS REWARDS AND PUNISHMENTS
Reinforcers are defined as stimuli that increase the frequency of behavior. Skinner distinguished between reinforcers, on the one hand, and rewards and punishments, on the other. Reinforcers are known by their effects, whereas rewards and punishments are more known by how they feel. It may be that most reinforcers (food, hugs, having the other person admit to starting the argument, and so on) feel good, or are pleasant events. Yet things that we might assume would feel bad, such as a slap on the hand, disapproval from a teacher, even suspensions and detention, may be positively reinforcing to some people, perhaps because such experiences confirm negative feelings toward teachers or one’s belonging within a deviant subculture (Atkins et al., 2002; Chaplain, 2017).

Skinner preferred the concept of reinforcement to that of reward because reinforcement does not suggest trying to “get inside the head” of an organism (whether a human or lower animal) to guess what it would find pleasant or unpleasant. A list of reinforcers is arrived at scientifically and empirically, that is, by observing what sorts of stimuli increase the frequency of the behavior.

Whereas reinforcers (even negative reinforcers) increase the frequency of the behavior they follow, punishments decrease it (see Figure 5.6). Punishment can rapidly suppress undesirable behavior (Marchant et al., 2013) and may be warranted in “emergencies,” such as when a child tries to run into the street.

Psychologists distinguish between positive punishments and negative punishments. Both kinds of punishments are aversive events, and both decrease the frequency of the behavior they follow. Positive punishment is the application of an aversive stimulus to decrease unwanted behavior, such as spanking, scolding, or a parking ticket. Negative punishment is the removal of a pleasant stimulus, such as removing a student’s opportunity to talk with friends in class by seating them apart, or removing a student’s opportunity to mentally escape from class by taking his or her smart phone or tablet computer. “Time out” is a form of negative punishment because it places a misbehaving child in an environment in which she or he cannot experience rewards.
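The four procedures described in this part of the chapter (positive and negative reinforcement, positive and negative punishment) differ along two dimensions only: whether a stimulus is presented or removed, and whether the behavior it follows becomes more or less frequent. The following is a minimal sketch of that two-by-two classification. The names `Consequence` and `classify_consequence` are hypothetical and chosen for illustration; they do not come from the chapter.

```python
from enum import Enum


class Consequence(Enum):
    """The four procedures named in the text (labels are illustrative)."""
    POSITIVE_REINFORCEMENT = "positive reinforcement"
    NEGATIVE_REINFORCEMENT = "negative reinforcement"
    POSITIVE_PUNISHMENT = "positive punishment"
    NEGATIVE_PUNISHMENT = "negative punishment"


def classify_consequence(stimulus_presented: bool,
                         behavior_increases: bool) -> Consequence:
    """Classify a procedure by what happens to the stimulus and the behavior.

    stimulus_presented: True if a stimulus is applied after the behavior,
        False if a stimulus is removed after the behavior.
    behavior_increases: True if the behavior becomes more frequent,
        False if it becomes less frequent.
    """
    if behavior_increases:
        # Anything that increases the frequency of behavior is a reinforcer.
        if stimulus_presented:
            return Consequence.POSITIVE_REINFORCEMENT
        return Consequence.NEGATIVE_REINFORCEMENT
    # Anything that decreases the frequency of behavior is a punishment.
    if stimulus_presented:
        return Consequence.POSITIVE_PUNISHMENT
    return Consequence.NEGATIVE_PUNISHMENT


if __name__ == "__main__":
    # Teacher approval is presented and studying increases.
    print(classify_consequence(True, True).value)
    # Teacher disapproval is removed and studying increases.
    print(classify_consequence(False, True).value)
    # Detention is presented and talking in class decreases.
    print(classify_consequence(True, False).value)
    # A phone is taken away and inattention decreases.
    print(classify_consequence(False, False).value)
```

Note that the classification depends on the observed change in behavior, not on how the stimulus feels, which mirrors Skinner's point that reinforcers are known by their effects.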
Fig. 5.6 NEGATIVE REINFORCERS VERSUS PUNISHMENTS
Procedure: use of negative reinforcement. Behavior: studying. Consequence: negative reinforcer (teacher disapproval) is removed when student studies. Change in behavior: frequency of behavior increases (student studies more).
Procedure: use of punishment. Behavior: talking in class. Consequence: punishment (detention) is presented when student talks in class. Change in behavior: frequency of behavior decreases (student talks less in class).

5-4g DISCRIMINATIVE STIMULI
Skinner might not have succeeded in getting his pigeons into the drivers’ seats of missiles, but he had no problem training them to respond to traffic lights. Imagine yourself trying the following experiment: place a pigeon in a Skinner box with a button on the wall. Deprive it of food for a while. Drop a food pellet into the cage whenever it pecks the button. Soon it will learn to peck the button. Now you place a small green light in the cage and turn it on and off intermittently throughout the day. Reinforce button pecking with food whenever the green light is on, but not when the light is off. It will not take long for the pigeon to learn that it will gain as much by grooming itself or cooing and flapping around as it will by pecking the button when the light is off.

The green light has become a discriminative stimulus. Discriminative stimuli, such as green or red lights, indicate whether behavior (in the case of the pigeon, pecking a button) will be reinforced (by a food pellet being dropped into the cage).

Behaviors (or operants) that are not reinforced tend to be extinguished. For the pigeon in our experiment, the behavior of pecking the button when the light is off is extinguished.

A moment’s reflection will suggest many ways in which discriminative stimuli influence our behavior. Isn’t it more efficient to answer the telephone when it is ringing? Do you think it is wise to ask someone for a favor when she or he is displaying anger and disapproval toward you?

We noted that a pigeon learns to peck a button if food drops into its cage when it does so. What if you want the pigeon to continue to peck the button, but you’re running out of food? Do not despair. As we see in the following section, you can keep that bird pecking away indefinitely, even as you hold up on most of the food.

“Education is what survives when what has been learned has been forgotten.”
B. F. Skinner, American Psychologist (1904–1990)

discriminative stimulus: in operant conditioning, a stimulus that indicates that reinforcement is available
continuous reinforcement: a schedule of reinforcement in which every correct response is reinforced
partial reinforcement: one of several reinforcement schedules in which not every correct response is reinforced

5-4h SCHEDULES OF REINFORCEMENT
In operant conditioning, some responses are maintained by means of continuous reinforcement. You probably become warmer every time you put on heavy clothing. You probably become less thirsty every time you drink water. Yet if you have ever watched people toss their money down the maws of slot machines, you know that behavior can also be maintained by means of partial reinforcement.

Folklore about gambling is based on solid learning theory. You can get a person “hooked” on gambling by fixing the game to allow heavy winnings at first. Then you gradually space out the winnings (reinforcements) until gambling is maintained by infrequent winning, or even no winning at all. Partial reinforcement schedules can maintain gambling, like other behavior, for a great deal of time, even though it goes unreinforced (James et al., 2017).

Responses that have been maintained by partial reinforcement are more resistant to extinction than responses that have been maintained by continuous reinforcement (Yeung et al., 2014). From the cognitive perspective, we could suggest that organisms that have experienced partial reinforcement do not expect
have experienced partial reinforcement do not expect 134 CHAPTER 5: Learning Copyright 2020 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. WCN 02-200-203 41055_ch05_hr_120-145.indd 134 6/14/18 9:06 AM reinforcement every time they engage in a response. Therefore, they are more likely to persist in the absence of reinforcement. T F Slot-machine players pop coins into the machines most There are four basic reinforcement schedules: fixed- rapidly when they have no idea when they might win. interval, variable-interval, fixed-ratio, and variable-ratio. It is true that slot-machine players pop INTERVAL SCHEDULES In a fixed-interval schedule, coins into the machines most rapidly a fixed amount of time—say, a minute—must elapse when they have no idea when they before the correct response will result in a reinforcer. With might win. Uncertainty of reinforcement a fixed-interval schedule, an organism’s response rate falls can lead to rapid repetition of behavior. off after each reinforcement and then picks up again as the time when reinforcement will occur approaches. For example, in a one-minute fixed-interval schedule, a rat is reinforced with, say, a food pellet for the first operant—for Reinforcement is more unpredictable in a variable- example, the first pressing of a lever—that occurs after interval schedule. Therefore, the response rate is one minute has elapsed. lower, but it is also steadier. If the boss calls us in for The rat’s rate of lever pressing slows down after each a weekly report, we probably work hard to pull things reinforcement, but as the end of the one-minute interval together just before the report is to be given, just as we draws near, lever pressing increases in frequency, as sug- might cram the night before a weekly quiz. But if we know gested in Figure 5.7. 
It is as if the rat has learned that it must that the boss might call us in for a report on the progress wait awhile before it is reinforced. The resultant record on of a certain project at any time (variable-interval), we are the cumulative recorder shows a typical series of upward likely to keep things in a state of reasonable readiness at waves, or scallops, which are called fixed-interval scallops. all times. However, our efforts are unlikely to have the Car dealers use fixed-interval reinforcement sched- intensity they would in a fixed-interval schedule (e.g., a ules when they offer incentives for buying up the remain- weekly report). Similarly, we are less likely to cram for der of the year’s line in summer and fall. In a sense, they unpredictable pop quizzes than to study for regularly are suppressing buying at other times, except for consum- scheduled quizzes. But we are likely to do at least some ers whose current cars are in their death throes or those studying on a regular basis in preparation for pop quizzes. with little self-control. Similarly, you learn to check your Likewise, if you receive email from your correspondent email only at a certain time of day if your correspondent irregularly, you are likely to check your email regularly writes at that time each day. for his or her communications, but with less eagerness. RATIO SCHEDULES In a fixed-ratio Fig.5.7 THE FIXED-INTERVAL SCALLOP schedule, reinforcement is provided after a fixed number of correct responses have been made. 
In a variable-ratio schedule, reinforcement is provided Cumulative frequency of responses Response rate picks up as time of availability of fixed-interval schedule a schedule reinforcement approaches in which a fixed amount of time must elapse between the previous and subsequent times that reinforcement is available variable-interval schedule a schedule in which a variable amount of time must elapse between the previous and subsequent times that Response rate slacks reinforcement is available off after reinforcement fixed-ratio schedule a schedule in which reinforcement is provided after a fixed number of correct responses 0 variable-ratio schedule a schedule in Time which reinforcement is provided after a variable Fixed-interval schedule number of correct responses CHAPTER 5: Learning 135 Copyright 2020 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. WCN 02-200-203 41055_ch05_hr_120-145.indd 135 6/14/18 9:06 AM
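The four schedules defined above can each be read as a small rule that decides, response by response, whether a reinforcer is delivered. The sketch below is one way to express those rules. It is an illustration under stated assumptions, not a model from the chapter. The class names, the `respond` method, and the helper `count_reinforcers` are hypothetical. The variable schedules draw their requirement from a uniform distribution around the stated mean, which is an arbitrary choice made here. Intervals are timed from the most recent reinforcement.

```python
import random


class FixedRatio:
    """Reinforce every n-th correct response."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        # t (time of the response) is ignored: only the response count matters.
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after a variable number of responses averaging mean_n."""

    def __init__(self, mean_n, rng):
        self.mean_n = mean_n
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform over 1 .. 2*mean_n - 1, whose mean is mean_n.
        return self.rng.randint(1, 2 * self.mean_n - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response made once `interval` has elapsed."""

    def __init__(self, interval):
        self.interval = interval
        self.available_at = interval

    def respond(self, t):
        if t >= self.available_at:
            # Assumption: the next interval is timed from this reinforcement.
            self.available_at = t + self.interval
            return True
        return False


class VariableInterval:
    """Reinforce the first response after a variable wait averaging mean_interval."""

    def __init__(self, mean_interval, rng):
        self.mean_interval = mean_interval
        self.rng = rng
        self.available_at = self._draw()

    def _draw(self):
        # Assumption: uniform over 0 .. 2*mean_interval, whose mean is mean_interval.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


def count_reinforcers(schedule, response_times):
    """Feed a sequence of response times to a schedule and count reinforcers."""
    return sum(schedule.respond(t) for t in response_times)


if __name__ == "__main__":
    # A rat pressing a lever every 10 seconds for 5 minutes on a
    # one-minute fixed-interval schedule.
    presses = range(10, 301, 10)
    print(count_reinforcers(FixedInterval(60), presses))  # 5
    # Twenty presses on a fixed-ratio schedule of 5.
    print(count_reinforcers(FixedRatio(5), range(20)))  # 4
```

In this sketch, responding faster on an interval schedule earns no extra reinforcers, whereas on a ratio schedule every additional response counts toward the next one. The sketch only decides when reinforcement is delivered; it does not simulate how an organism's response rate changes.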