CHAPTER 5
Instrumental Conditioning: Foundations

Early Investigations of Instrumental Conditioning
Modern Approaches to the Study of Instrumental Conditioning
    Discrete-Trial Procedures
    Free-Operant Procedures
Instrumental Conditioning Procedures
    Positive Reinforcement
    Punishment
    Negative Reinforcement
    Omission Training or Negative Punishment
Fundamental Elements of Instrumental Conditioning
    The Instrumental Response
    The Instrumental Reinforcer
    The Response–Reinforcer Relation
Sample Questions
Key Terms

CHAPTER PREVIEW

This chapter begins our discussion of instrumental conditioning and goal-directed behavior. This is the type of conditioning that is involved in training a quarterback to throw a touchdown or a child to skip rope. In this type of conditioning, obtaining a goal or reinforcer depends on the prior occurrence of a designated response. I will first describe the origins of research on instrumental conditioning and the investigative methods used in contemporary research. This discussion lays the groundwork for the following section in which the four basic types of instrumental conditioning procedures are described. I will conclude the chapter with discussions of three fundamental elements of the instrumental conditioning paradigm: the instrumental response, the reinforcer or goal event, and the relation between the instrumental response and the goal event.

In the preceding chapters, I discussed various aspects of how responses are elicited by discrete stimuli. Studies of habituation, sensitization, and classical conditioning are all concerned with the mechanisms of elicited behavior. Because of this emphasis, the procedures used in experiments on habituation, sensitization, and classical conditioning do not require the participant to make a particular response to obtain food or other unconditioned or conditioned stimuli. Classical conditioning reflects how organisms adjust to events in their environment that they do not directly control. In this chapter, we turn to the analysis of learning situations in which the stimuli an organism encounters are a result or consequence of its behavior. Such behavior is commonly referred to as goal-directed or instrumental because responding is necessary to produce a desired environmental outcome. By studying hard, a student can earn a better grade; by turning the car key in the ignition, a driver can start the engine; by putting a coin in a vending machine, a child can obtain a piece of candy. In all these cases, some aspect of the individual's behavior is instrumental in producing a significant stimulus or outcome. Furthermore, the behavior occurs because similar actions produced the same type of outcome in the past. Students would not study if doing so did not yield better grades; drivers would not turn the ignition key if this did not start the engine; and children would not put coins in a vending machine if they did not get a candy in return.
Behavior that occurs because it was previously effective in producing certain consequences is called instrumental behavior. The fact that the consequences of an action can determine whether you make that response again is obvious to everyone. If you happen to find a dollar bill when you glance down, you will keep looking at the ground as you walk. How such a consequence influences future behavior is not so readily apparent. Many of the upcoming chapters of this book are devoted to the mechanisms responsible for the control of behavior by its consequences. In this chapter, I will describe some of the history, basic techniques, procedures, and issues in the experimental analysis of instrumental, or goal-directed, behavior.

How might one investigate instrumental behavior? One way would be to go to the natural environment and look for examples. However, this approach is not likely to lead to definitive results because factors responsible for goal-directed behavior are difficult to isolate without experimental manipulation. Consider, for example, a dog sitting comfortably in its yard. When an intruder approaches, the dog starts to bark vigorously, and the intruder goes away. Because the dog's barking is followed by the departure of the intruder, we may conclude that the dog barked to produce this outcome—that barking was goal-directed. However, an equally likely possibility is that barking was elicited by the novelty of the intruder and persisted as long as this eliciting stimulus was present. The departure of the intruder may have been incidental to the dog's barking. Deciding between such alternatives is difficult without experimental manipulations of the relation between barking and its consequences. (For an experimental analysis of a similar situation in a fish species, see Losey & Sevenster, 1995.)

Early Investigations of Instrumental Conditioning

Laboratory and theoretical analyses of instrumental conditioning began in earnest with the work of the American psychologist E. L. Thorndike. Thorndike's original intent was to study animal intelligence (Thorndike, 1898, 1911; for a more recent commentary, see Lattal, 1998). As I noted in Chapter 1, the publication of Darwin's theory of evolution encouraged scientists to think about the extent to which human intellectual capacities were present in animals. Thorndike pursued this question through empirical research.

He devised a series of puzzle boxes for his experiments. His training procedure consisted of placing a hungry animal (often a young cat) in the puzzle box with some food left outside in plain view of the animal. The task for the animal was to learn how to get out of the box and get the food. Different puzzle boxes required different responses to get out. Some were easier than others. Figure 5.1 illustrates two of the easier puzzle boxes. In Box A, the required response was to pull a ring to release a latch that blocked the door on the outside.
In Box I, the required response was to push down a lever, which released a latch. Initially, the cats were slow to make the correct response, but with continued practice on the task, their latencies became shorter and shorter. Figure 5.2 shows the latencies of a cat to get out of Box A on successive trials. The cat took 160 seconds to get out of Box A on the first trial. Its shortest latency later on was 6 seconds (Chance, 1999).

FIGURE 5.1 Two of Thorndike's puzzle boxes, A and I. In Box A, the participant had to pull a loop to release the door. In Box I, pressing down on a lever released a latch on the other side. (Left: Based on "Thorndike's Puzzle Boxes and the Origins of the Experimental Analysis of Behavior," by P. Chance, 1999, Journal of the Experimental Analysis of Behavior, 72, pp. 433–440. Right: Based on Thorndike, Animal Intelligence: Experimental Studies, 1898.)

FIGURE 5.2 Latencies to escape from Box A during successive trials. The longest latency was 160 seconds; the shortest was 6 seconds. (Notice that the axes are not labeled, as in Thorndike's original report.)

Thorndike's careful empirical approach was a significant advance in the study of animal intelligence. Another important contribution was Thorndike's strict avoidance of anthropomorphic interpretations of the behavior he observed. Although he titled his treatise Animal Intelligence, to Thorndike many aspects of behavior seemed rather unintelligent. He did not think that his animals got faster in escaping from a puzzle box because they gained insight into the task or figured out how the release mechanism was designed. Rather, he interpreted the results of his studies as reflecting the learning of a new S–R association. When a cat was initially placed in a box, it displayed a variety of responses typical of a confined animal. Eventually, some of these responses resulted in opening the door. Thorndike believed that such successful escapes led to the learning of an association between the stimuli of being in the puzzle box (S) and the effective escape response (R). As the association, or connection, between the box cues and the successful response became stronger, the animal came to make that response more quickly. The consequence of a successful escape response strengthened the association between the box stimuli and that response.

On the basis of his research, Thorndike formulated the law of effect. The law of effect states that if a response R in the presence of a stimulus S is followed by a satisfying event, the association between the stimulus S and the response R becomes strengthened. If the response is followed by an annoying event, the S–R association is weakened. It is important to stress here that, according to the law of effect, what is learned is an association between the response and the stimuli present at the time of the response. Notice that the consequence of the response is not one of the elements in the association.
The satisfying or annoying consequence simply serves to strengthen or weaken the association between the preceding stimulus and response. Thus, Thorndike's law of effect involves S–R learning. Thorndike's law of effect and S–R learning continue to be of considerable interest more than 100 years since these ideas were first proposed. A key feature of Thorndike's S–R mechanism is that it compels the organism to make response R whenever stimulus S occurs. This feature has made the law of effect an attractive mechanism to explain compulsive habits that are difficult to break, such as biting one's nails, snacking, or smoking cigarettes. Once you start on a bucket of popcorn while watching a movie, you cannot stop eating because the sight and smell of the popcorn (S) compels you to grab some more popcorn and eat it (R). The compulsive nature of eating popcorn is such that you continue to eat beyond the point of enjoying the taste. Once learned, habitual responses occur because they are triggered by an antecedent stimulus and not because they result in a desired consequence (Everitt & Robbins, 2005; Wood & Neal, 2007). A habitual smoker who knows that smoking is harmful will continue to smoke because S–R mechanisms compel lighting a cigarette independent of the consequences of the response.
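Thorndike stated the law of effect verbally, but its logic is easy to express as an update rule. The following Python sketch is not from the text; the learning rates and function names are invented for illustration. Note that the satisfying consequence appears only as a signal that adjusts the S–R strength; consistent with the law of effect, it is never stored as part of the association.

```python
import random

# A minimal computational caricature of the law of effect (hypothetical
# names and learning rates; not from the text).
s_r_strength = {}   # (stimulus, response) -> associative strength
ALPHA = 0.2         # strengthening rate after a satisfying event (assumed)
BETA = 0.1          # weakening rate after an annoying event (assumed)

def choose_response(stimulus, responses):
    """Pick a response with probability proportional to its S-R strength."""
    weights = [s_r_strength.get((stimulus, r), 0.1) for r in responses]
    return random.choices(responses, weights=weights)[0]

def apply_law_of_effect(stimulus, response, satisfying):
    """Strengthen the S-R bond after a satisfier; weaken it otherwise."""
    key = (stimulus, response)
    old = s_r_strength.get(key, 0.1)
    s_r_strength[key] = old + ALPHA * (1 - old) if satisfying else old * (1 - BETA)

# A cat in Box A: only pulling the ring opens the door (the satisfying event).
for trial in range(100):
    r = choose_response("box A cues", ["pull ring", "scratch", "meow", "push wall"])
    apply_law_of_effect("box A cues", r, satisfying=(r == "pull ring"))

print(max(s_r_strength, key=s_r_strength.get))   # -> ('box A cues', 'pull ring')
```

Run repeatedly, the strength of the (box cues, pull ring) connection climbs while unreinforced responses fade, mirroring the shortening escape latencies in Figure 5.2.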
BOX 5.1 E. L. Thorndike: Biographical Sketch

Edward Lee Thorndike was born in 1874 and died in 1949. As an undergraduate at Wesleyan University, he became interested in the work of William James, who was then at Harvard. Thorndike himself entered Harvard as a graduate student in 1895. During his stay, he began his research on instrumental behavior, at first using chicks. Because there was no laboratory space in the psychology department at the university, he set up his project in William James's cellar. Soon after that, he was offered a fellowship at Columbia University. This time, his laboratory was located in the attic of psychologist James Cattell. Thorndike received his Ph.D. from Columbia in 1898, for his work entitled Animal Intelligence: An Experimental Analysis of Associative Processes in Animals. This included the famous puzzle-box experiments. Thorndike's dissertation has turned out to be one of the most famous dissertations in more than a century of modern psychology. After obtaining his Ph.D., Thorndike spent a short stint at Western Reserve University in Cleveland and then returned to Columbia, where he served as professor of educational psychology in the Teachers College for many years. Among other things, he worked to apply to children the principles of trial-and-error learning he had uncovered with animals. He also became interested in psychological testing and became a leader in that newly formed field. By his retirement, he had written 507 scholarly works (without a computer or word processor), including about 50 books (Cumming, 1999). Several years before his death, Thorndike returned to Harvard as the William James Lecturer, a fitting honor considering the origins of his interests in psychology.

Modern Approaches to the Study of Instrumental Conditioning

Thorndike used 15 different puzzle boxes in his investigations. Each box required different manipulations for the cat to get out. As more scientists became involved in studying instrumental learning, the range of tasks they used became smaller. A few of these became "standard" and have been used repeatedly to facilitate comparison of results obtained in different experiments and laboratories.

Discrete-Trial Procedures

Discrete-trial procedures are similar to the method Thorndike used in that each training trial begins with putting the animal in the apparatus and ends with removal of the animal after the instrumental response has been performed. Discrete-trial procedures these days usually involve the use of some type of maze. The use of mazes in investigations of learning was introduced at the turn of the twentieth century by the American psychologist W. S. Small (1899, 1900). Small was interested in studying rats and was encouraged to use a maze by an article he read in Scientific American describing the complex system of underground burrows that kangaroo rats build in their natural habitat. Small reasoned that a maze would take advantage of the rats' "propensity for small winding passages."

Figure 5.3 shows two mazes frequently used in contemporary research. The runway, or straight-alley, maze contains a start box at one end and a goal box at the other. The rat is placed in the start box at the beginning of each trial. The barrier separating the start box from the main section of the runway is then raised. The rat is allowed to make its way down the runway until it reaches the goal box, which usually contains a reinforcer, such as food or water.

FIGURE 5.3 Top view of a runway and a T-maze. S is the start box; G is the goal box.

Behavior in a runway can be quantified by measuring how fast the animal gets from the start box to the goal box. This is called the running speed.
The running speed typically increases with repeated training trials. Another common measure of behavior in runways is response latency. The latency is the time it takes the animal to leave the start box and begin running down the alley. Typically, latencies become shorter as training progresses.

Another maze that has been used in many experiments is the T maze, shown on the right in Figure 5.3. The T maze consists of a start box and alleys arranged in the shape of a T. A goal box is located at the end of each arm of the T. Because the T maze has two choice arms, it can be used to study more complex questions. For example, Panagiotaropoulos and colleagues (2009) were interested in whether rats that are less than two weeks old (and still nursing) could learn where their mother is located in contrast with another female. To answer this question, they placed the mother rat in the goal box on the right arm of a T maze and a virgin female rat in the goal box on the left arm of the T. The rat pups learned to turn to the right rather than the left arm of the maze with successive trials. Furthermore, this conditioned preference persisted when the pups were tested at the end of training without a female in either goal box. The results show that nursing rat pups can distinguish their mother from a virgin female and can learn to go where their mother is located.

Free-Operant Procedures

In a runway or a T maze, after reaching the goal box, the animal is removed from the apparatus for a while before being returned to the start box for its next trial. Thus, the animal has limited opportunities to respond, and those opportunities are scheduled by the experimenter. By contrast, free-operant procedures allow the animal to repeat the instrumental response over and over again, without constraint, and without being taken out of the apparatus until the end of an experimental session. The free-operant method was invented by B. F. Skinner (1938) to study behavior in a more continuous manner than is possible with mazes.

Skinner (Figure 5.4) was interested in analyzing in the laboratory a form of behavior that would be representative of all naturally occurring ongoing activity. However, he recognized that before behavior can be experimentally analyzed, a measurable unit of behavior must be defined. Casual observation suggests that ongoing behavior is continuous; one activity leads to another. Behavior does not fall neatly into units, as do molecules of a chemical solution or bricks on a sidewalk. Skinner proposed the concept of the operant as a way of dividing behavior into meaningful measurable units.

FIGURE 5.4 B. F. Skinner (1904–1990).

Figure 5.5 shows a typical Skinner box used to study free-operant behavior in rats. (A Skinner box used to study pecking in pigeons was presented in Figure 1.8.) The box is a small chamber that contains a lever that the rat can push down repeatedly. The chamber also has a mechanism that can deliver a reinforcer, such as food or water, into a cup.

FIGURE 5.5 A Skinner box equipped with a response lever and food-delivery device. Electronic equipment is used to program procedures and record responses automatically.
The lever is electronically connected to the food-delivery system so that when the rat presses the lever, a pellet of food automatically falls into the food cup.

An operant response, such as the lever press, is defined in terms of the effect that the behavior has on the environment. Activities that have the same environmental effect are considered to be instances of the same operant response. Behavior is not defined in terms of particular muscle movements but in terms of how the behavior operates on the environment. The lever-press operant is typically defined as sufficient depression of the lever to activate a recording sensor. The rat may press the lever with its right paw, its left paw, or its tail. These different muscle movements constitute the same operant if they all depress the lever sufficiently to trigger the sensor and produce a food pellet. Various ways of pressing the lever are assumed to be functionally equivalent because they all have the same effect on the environment.

We perform numerous operants during the course of our daily lives. In opening a door, it does not matter whether we use our right hand or left hand to turn the door knob. The operational outcome (opening the door) is the critical measure of success. Similarly, in basketball or baseball, it's the operational outcome that counts—getting the ball in the basket or hitting the ball into the outfield—rather than the way the task is accomplished. With an operational definition of behavioral success, one does not need a sophisticated judge to determine whether the behavior has been successfully accomplished. The environmental outcome keeps the score. This contrasts with behaviors such as figure skating or gymnastics. In those cases, the way something is performed is just as important as the environmental impact of the behavior. Getting a ball into the basket is an operant behavior. Performing a graceful dismount from the parallel bars is not. However, any response that is required to produce a desired consequence is an instrumental response because it is "instrumental" in producing a particular outcome.
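The operational definition of an operant translates naturally into code. The sketch below is hypothetical (the threshold value and names are invented, not from the text): a press is counted whenever the lever crosses the sensor criterion, and nothing about the movement that produced the displacement enters the computation.

```python
SENSOR_THRESHOLD_MM = 2.0   # assumed sensor criterion, not from the text

def count_lever_presses(displacements_mm):
    """Count lever presses as threshold crossings of lever displacement.

    The topography of the movement (right paw, left paw, tail) never enters
    the computation; only the environmental effect (lever depression) does.
    """
    presses = 0
    was_down = False
    for d in displacements_mm:
        is_down = d >= SENSOR_THRESHOLD_MM
        if is_down and not was_down:   # new crossing = one operant response
            presses += 1
        was_down = is_down
    return presses

# Two different movement topographies, same environmental effect:
right_paw = [0.0, 1.1, 2.4, 2.6, 0.3, 0.0]    # one press
tail_swipe = [0.0, 3.0, 0.1, 2.2, 2.9, 0.0]   # two presses
print(count_lever_presses(right_paw), count_lever_presses(tail_swipe))
```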
Magazine Training and Shaping

When children first attempt to toss a basketball in the basket, they are not very successful. Many attempts end with the ball bouncing off the backboard or not even landing near the basket. Similarly, a rat placed in a Skinner box will not press the lever that produces a pellet of food right away. Successful training of an operant or instrumental response often requires carefully designed training steps that move the student from the status of a novice to that of an expert. This is clearly the case with something like championship figure skating, which requires hours of daily practice under the careful supervision of an expert coach. Most parents do not spend money hiring an expert coach to teach a child basketball. However, even there, the child moves through a series of training steps that may start with a small ball and a Fisher Price basketball set that is much lower than the standard one and is easier to reach. The training basket is also adjustable so that it can be gradually raised as the child becomes more proficient.

There are also preliminary steps for establishing lever-press responding in a laboratory rat. First, the rat has to learn when food is available in the food cup. This involves classical conditioning: The sound of the food-delivery device is repeatedly paired with the release of a food pellet into the cup. The food-delivery device is called the food magazine. After enough pairings of the sound of the food magazine with food delivery, the sound elicits a classically conditioned approach response: The animal goes to the food cup and picks up the pellet. This preliminary phase of conditioning is called magazine training.

After magazine training, the rat is ready to learn the required operant response. At this point, food is given if the rat does anything remotely related to pressing the lever. For example, at first the rat may be given a food pellet each time it gets up on its hind legs anywhere in the experimental chamber. Once the rearing response has been established, the food pellet may be given only if the rat makes the rearing response over the response lever. Rearing in other parts of the chamber would no longer be reinforced. Once rearing over the lever has been established, the food pellet may be given only if the rat touches and depresses the lever. Such a sequence of training steps is called response shaping.

As the preceding examples show, the shaping of a new operant response requires training components or approximations to the final behavior. Whether you are trying to teach a child to throw a ball into a basket, or a rat to press a response lever, at first any response that remotely approximates the final performance can be reinforced. Once the child becomes proficient at throwing the ball into a basket placed at shoulder height, the height of the basket can be gradually raised. As the shaping process continues, more and more is required, until the reinforcer is given only if the final target response is made.

Successful shaping of behavior involves three components. First, you have to clearly define the final response you want the trainee to perform. Second, you have to clearly assess the starting level of performance, no matter how far it is from the final response you are interested in. Third, you have to divide the progression from the starting point to the final target behavior into appropriate training steps or successive approximations. The successive approximations make up your training plan. The execution of the training plan involves two complementary tactics: reinforcement of successive approximations to the final behavior and withholding reinforcement for earlier response forms.

Although the principles involved in shaping behavior are not difficult to understand, their application can be tricky. If the shaping steps are too far apart, or you spend too much time on one particular shaping step, progress may not be satisfactory. Sports coaches, piano teachers, and driver education instructors are all aware of how tricky it can be to design the most effective training steps or successive approximations. The same principles of shaping are involved in training a child to put on his or her socks or to drink from a cup without spilling, but the training in those cases is less formally organized. (For a study of shaping drug-abstinence behavior in cocaine users, see Preston, Umbricht, Wong, & Epstein, 2001.)
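The three components and two tactics of shaping can be laid out as a short training loop. This Python sketch is illustrative only; the step criteria, the advancement rule, and the response probabilities are invented assumptions (a real rat's response probabilities would also shift as training proceeds, which this toy omits).

```python
import random

def deliver_pellet():
    """Stand-in for the food magazine (hypothetical)."""
    pass

def observe_response():
    """Stand-in for watching the rat; returns a random response description."""
    return {
        "rearing": random.random() < 0.5,
        "near_lever": random.random() < 0.4,
        "lever_depressed": random.random() < 0.2,
    }

# Hypothetical training plan: each step is a criterion the response must meet.
TRAINING_PLAN = [
    ("rears anywhere", lambda r: r["rearing"]),
    ("rears over the lever", lambda r: r["rearing"] and r["near_lever"]),
    ("depresses the lever", lambda r: r["lever_depressed"]),
]

def shape(advance_after=10):
    """Reinforce successive approximations; withhold reinforcement otherwise."""
    for name, criterion in TRAINING_PLAN:
        successes = 0
        while successes < advance_after:   # advance after enough reinforced responses
            if criterion(observe_response()):
                deliver_pellet()           # reinforce the current approximation
                successes += 1
            # earlier response forms are simply no longer reinforced
        print("criterion met:", name)

shape()
```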
Shaping and New Behavior

Shaping procedures are often used to generate new behavior, but exactly how new are those responses? Consider, for example, a rat's lever-press response. To press the bar, the rat has to approach the bar, stop in front of it, raise its front paws, and then bring the paws down on the bar with sufficient force to push it down. All of these response components are things the rat is likely to have done at one time or another in other situations (while exploring its cage, interacting with another rat, or handling materials to build a nest). In teaching the rat to press the bar, we are not teaching new response components. Rather, we are teaching the rat how to combine familiar responses into a new activity. Instrumental conditioning often involves the construction, or synthesis, of a new behavioral unit from preexisting response components that already occur in the organism's repertoire (Balsam et al., 1998).

Instrumental conditioning can also be used to produce responses unlike anything the trainee ever did before. Consider, for example, throwing a football 60 yards down the field. It takes more than putting familiar behavioral components together to achieve such a feat. The force, speed, and coordination involved in throwing a football 60 yards are unlike anything an untrained individual might do. It is an entirely new response. Expert performances in sports, in playing a musical instrument, or in ballet all involve such novel response forms. Such novel responses are also created by shaping (Figure 5.6).

FIGURE 5.6 Shaping is required to learn special skills, such as the pole vault.

The creation of new responses by shaping depends on the inherent variability of behavior. If a new shaping step requires a trainee to throw a football 30 yards, each throw is likely to be somewhat different. The trainee may throw the ball 25, 32, 29, or 34 yards on successive attempts. This variability permits the coach to set the next successive approximation at 33 yards. With that new target, the trainee will start to make longer throws. Each throw will again be different, but more of the throws will now be 33 yards and longer. The shift of the distribution to longer throws will permit the coach to again raise the response criterion, perhaps to 36 yards this time. With gradual iterations of this process, the trainee will make longer and longer throws, achieving distances that he or she would have never performed otherwise. The shaping process takes advantage of the variability of behavior to gradually move the distribution of responses away from the trainee's starting point and toward responses that are entirely new in the trainee's repertoire. Through this process, spectacular new feats of performance are learned in sports, dancing, or the visual arts. (For laboratory studies of shaping, see Deich, Allan, & Zeigler, 1988; and Stokes, Mechner, & Balsam, 1999.)
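The role of variability can be simulated directly. The numbers below (starting distribution, learning step, criterion rule) are invented assumptions rather than data from the text; the simulation reinforces throws that meet the current criterion, drifts the thrower's distribution toward reinforced throws, and raises the criterion as the distribution shifts.

```python
import random

# Invented parameters: a throwing distribution that drifts toward reinforced throws.
mean_yards, spread = 30.0, 3.0
criterion = 33.0                     # the next successive approximation

for attempt in range(200):
    throw = random.gauss(mean_yards, spread)
    if throw >= criterion:                           # reinforce throws meeting the criterion
        mean_yards += 0.2 * (throw - mean_yards)     # distribution shifts upward
        criterion = max(criterion, mean_yards + 1.0) # the coach raises the bar

print(f"final mean throw {mean_yards:.1f} yards, criterion {criterion:.1f}")
```

Without the scatter produced by random.gauss, no throw would ever exceed the criterion and the distribution could never move; the variability is what gives the coach something new to reinforce.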
Response Rate as a Measure of Operant Behavior

In contrast to discrete-trial techniques for studying instrumental behavior, free-operant methods permit continuous observation of behavior over long periods. With continuous opportunity to respond, the organism, rather than the experimenter, determines the frequency of its instrumental response. Hence, free-operant techniques provide a special opportunity to observe changes in the likelihood of behavior over time. How might we take advantage of this opportunity and measure the probability of an operant response? Measures of response latency and speed that are commonly used in discrete-trial procedures do not characterize the likelihood of repetitions of a response. Skinner proposed that the rate of occurrence of operant behavior (e.g., frequency of the response per minute) be used as a measure of response probability. Highly likely responses occur often and have a high rate. In contrast, unlikely responses occur seldom and have a low rate. Response rate has become the primary measure in studies that employ free-operant procedures.
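Response rate is easy to compute from the event record a Skinner box produces. A minimal sketch, assuming the recorded data are simply the times (in seconds) at which the sensor registered a press:

```python
def response_rate_per_minute(press_times_s, session_length_s):
    """Overall response rate: presses per minute across the session."""
    return len(press_times_s) / (session_length_s / 60.0)

def rate_by_bin(press_times_s, session_length_s, bin_s=60):
    """Local rates in successive bins, to track changes in likelihood over time."""
    n_bins = int(session_length_s // bin_s)
    counts = [0] * n_bins
    for t in press_times_s:
        if t < session_length_s:
            counts[int(t // bin_s)] += 1
    return [c * (60.0 / bin_s) for c in counts]   # per-minute rate in each bin

# Example: 5 presses in a 3-minute session
presses = [12.4, 30.1, 95.7, 100.2, 170.9]
print(response_rate_per_minute(presses, 180))   # about 1.67 per minute
print(rate_by_bin(presses, 180))                # [2.0, 2.0, 1.0]
```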
Instrumental Conditioning Procedures

In all instrumental conditioning situations, the participant makes a response and thereby produces an outcome or consequence. Paying the boy next door for mowing the lawn, yelling at a cat for getting on the kitchen counter, closing a window to prevent the rain from coming in, and revoking a teenager's driving privileges for staying out late are all forms of instrumental conditioning. Two of these examples involve pleasant events (getting paid, driving a car), whereas the other two involve unpleasant stimuli (the sound of yelling and rain coming in the window). A pleasant event is technically called an appetitive stimulus. An unpleasant stimulus is technically called an aversive stimulus. The instrumental response may produce the stimulus, as when mowing the lawn results in getting paid. Alternatively, the instrumental response may turn off or eliminate a stimulus, as in closing a window to stop the incoming rain. Whether the result of a conditioning procedure is an increase or a decrease in the rate of responding depends on whether an appetitive or aversive stimulus is involved and whether the response produces or eliminates the stimulus. Four basic instrumental conditioning procedures are described in Table 5.1.

Positive Reinforcement

A father gives his daughter a cookie when she puts her toys away; a teacher praises a student for handing in a good report; an employee receives a bonus check for performing well on the job. These are all examples of positive reinforcement. Positive reinforcement is a procedure in which the instrumental response produces an appetitive stimulus. If the response occurs, the appetitive stimulus is presented; if the response does not occur, the appetitive stimulus is not presented. Thus, there is a positive contingency between the instrumental response and the appetitive stimulus. Positive reinforcement procedures produce an increase in the rate of responding. Requiring a hungry rat to press a response lever to obtain a food pellet is a common laboratory example of positive reinforcement.

Punishment

A mother reprimands her child for running into the street; your boss criticizes you for being late to a meeting; a teacher gives you a failing grade for answering too many test questions incorrectly. These are examples of punishment (also called positive punishment). In a punishment procedure, the instrumental response produces an unpleasant, or aversive, stimulus. There is a positive contingency between the instrumental response and the stimulus outcome (the response produces the outcome), but the outcome is aversive. Effective punishment procedures produce a decrease in the rate of instrumental responding.

TABLE 5.1 TYPES OF INSTRUMENTAL CONDITIONING PROCEDURES

Positive Reinforcement
    Response-Outcome Contingency: Positive: Response produces an appetitive stimulus
    Result of Procedure: Reinforcement, or increase in response rate

Punishment (Positive Punishment)
    Response-Outcome Contingency: Positive: Response produces an aversive stimulus
    Result of Procedure: Punishment, or decrease in response rate

Negative Reinforcement (Escape or Avoidance)
    Response-Outcome Contingency: Negative: Response eliminates or prevents the occurrence of an aversive stimulus
    Result of Procedure: Reinforcement, or increase in response rate

Omission Training (DRO) or Negative Punishment
    Response-Outcome Contingency: Negative: Response eliminates or prevents the occurrence of an appetitive stimulus
    Result of Procedure: Punishment, or decrease in response rate

Negative Reinforcement

Opening an umbrella to stop the rain from getting you wet, putting on a seatbelt to silence the chimes in your car, and putting on your sunglasses to shield you from bright sunlight are examples of negative reinforcement. In all of these cases, the instrumental response turns off an aversive stimulus. Hence, there is a negative contingency between the instrumental response and the aversive stimulus. Negative reinforcement procedures increase instrumental responding. You are more likely to open an umbrella because it stops you from getting wet when it is raining.

People tend to confuse negative reinforcement and punishment. An aversive stimulus is used in both procedures. However, the relation of the instrumental response to the aversive stimulus is drastically different. In punishment procedures, the instrumental response produces the aversive event, whereas in negative reinforcement, the response terminates the aversive event. This difference in the response–outcome contingency produces very different results. Instrumental behavior is decreased by punishment and increased by negative reinforcement.
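The logic of Table 5.1 reduces to two binary questions: is the outcome appetitive or aversive, and does the response produce the outcome (a positive contingency) or eliminate/prevent it (a negative contingency)? A minimal sketch of that two-by-two classification (the function and its labels are illustrative, not from the text):

```python
def classify_procedure(stimulus, contingency):
    """Map (stimulus valence, response-outcome contingency) to the procedure
    in Table 5.1 and the predicted change in response rate."""
    table = {
        ("appetitive", "positive"): ("positive reinforcement", "response rate increases"),
        ("aversive", "positive"): ("punishment (positive punishment)", "response rate decreases"),
        ("aversive", "negative"): ("negative reinforcement (escape/avoidance)", "response rate increases"),
        ("appetitive", "negative"): ("omission training (DRO) / negative punishment", "response rate decreases"),
    }
    return table[(stimulus, contingency)]

# Closing a window eliminates an aversive stimulus (incoming rain):
print(classify_procedure("aversive", "negative"))
# Mowing the lawn produces an appetitive stimulus (payment):
print(classify_procedure("appetitive", "positive"))
```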
Omission Training or Negative Punishment

In omission training or negative punishment, the instrumental response results in the removal of a pleasant or appetitive stimulus (Sanabria, Sitomer, & Killeen, 2006). Omission training is being used when a child is given a time-out (e.g., Donaldson & Vollmer, 2011) or told to go to his or her room after doing something bad. There is nothing aversive about the child's room. Rather, by sending the child to the room, the parent is withdrawing sources of positive reinforcement, such as playing with friends or watching television. Suspending someone's driver's license for drunken driving also constitutes omission training or negative punishment (withdrawal of the pleasure and privilege of driving). Omission training or negative punishment involves a negative contingency between the response and an environmental event (hence the term "negative") and results in a decrease in instrumental responding (hence the term "punishment"). Negative punishment is often preferred over positive punishment as a method of discouraging human behavior because it does not involve delivering an aversive stimulus.

BOX 5.2 DRO as Treatment for Self-Injurious Behavior and Other Behavior Problems

Self-injurious behavior is a problematic habit that is evident in some individuals with developmental disabilities. Bridget was a 50-year-old woman with profound mental retardation whose self-injurious behavior was hitting her body and head and banging her head against furniture, walls, and floors. Preliminary assessments indicated that her head banging was maintained by the attention she received from others when she banged her head against a hard surface. To discourage the self-injurious behavior, an omission training procedure, or DRO, was put into place (Lindberg, Iwata, Kahng, & DeLeon, 1999). The training procedure was implemented in 15-minute sessions. During the omission training phase, Bridget was ignored when she banged her head against a hard surface but received attention periodically if she was not head banging. The attention consisted of the therapist talking to Bridget for 3–5 seconds and occasionally stroking her arm or back.

The results of the study are presented in Figure 5.7. During the first 19 sessions, when Bridget received attention for her self-injurious behavior, the rate of head banging fluctuated around six responses per minute. The first phase of DRO training (sessions 20–24) resulted in a rapid decline in head banging. The self-injurious behavior returned during sessions 25–31, when the baseline condition was reintroduced. DRO training was resumed in session 32 and remained in effect for the remainder of the study. These results show that self-injurious behavior was decreased by the DRO procedure.

FIGURE 5.7 Rate of Bridget's self-injurious behavior (responses per minute) during baseline sessions (1–19 and 25–31) and during sessions in which a DRO contingency was in effect (20–24 and 32–72) (based on Lindberg et al., 1999).
encourage the misbehavior. As with everything the elephant did wrong.… The study with Bridget illustrates Bridget, the best approach is to ignore Progressive animal trainers reward the several general principles. One is that the disruptive behavior and pay behavior they want and, equally attention is a very powerful reinforcer attention when the child is doing importantly, ignore the behavior they for human behavior. People do all something else. Deliberately reinfor- don’t” (p. 59). In her engaging book, sorts of things for attention. As with cing other behavior is not easy and What Shamu Taught Me About Life, Bridget, even responses that are requires conscious effort on the part Love, and Marriage, Amy Sutherland injurious to the individual can of the parent. argues that one can profitably use the develop if these responses result in No one questions the need for such same principles to achieve better results attention. Unfortunately, some conscious effort in training complex with one’s spouse by not nagging them responses are difficult to ignore, but responses in animals. As Amy about leaving their dirty socks on the in attending to them, one may be Sutherland (2008) pointed out, animal floor but by providing attention and actually encouraging them. A child “trainers did not get a sea lion to salute social reinforcement for responses misbehaving in a store or restaurant by nagging. Nor did they teach a other than the offending habits. Omission-training procedures are also called differential reinforcement of other behavior (DRO). This term highlights the fact that in omission training, the individual periodically receives the appetitive stimulus provided he or she is engaged in behavior other than the response specified by the procedure. Making the target response results in omission of the reinforcer that would have been delivered had the individual performed some other behavior. Thus, omission training involves the reinforcement of other behavior. Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 134 Chapter 5: Instrumental Conditioning: Foundations Fundamental Elements of Instrumental Conditioning As we will see in the following chapters, analysis of instrumental conditioning involves numerous factors and variables. However, the essence of instrumental behavior is that it is controlled by its consequences. Thus, instrumental conditioning fundamentally involves three elements: the instrumental response, the outcome of the response (the reinforcer), and the relation or contingency between the response and the outcome. In the remainder of this chapter, I will describe how each of these elements influences the course of instrumental conditioning. The Instrumental Response The outcome of instrumental conditioning procedures depends in part on the nature of the response being conditioned. Some responses are more easily modified than others. In Chapter 10, I will describe how the nature of the response influences the outcome of negative reinforcement (avoidance) and punishment procedures. 
Fundamental Elements of Instrumental Conditioning

As we will see in the following chapters, analysis of instrumental conditioning involves numerous factors and variables. However, the essence of instrumental behavior is that it is controlled by its consequences. Thus, instrumental conditioning fundamentally involves three elements: the instrumental response, the outcome of the response (the reinforcer), and the relation or contingency between the response and the outcome. In the remainder of this chapter, I will describe how each of these elements influences the course of instrumental conditioning.

The Instrumental Response

The outcome of instrumental conditioning procedures depends in part on the nature of the response being conditioned. Some responses are more easily modified than others. In Chapter 10, I will describe how the nature of the response influences the outcome of negative reinforcement (avoidance) and punishment procedures. This section describes how the nature of the response determines the results of positive reinforcement procedures.

Behavioral Variability Versus Stereotypy

Thorndike described instrumental behavior as involving the stamping in of an S–R association, while Skinner wrote about behavior being strengthened or reinforced. Both of these pioneers emphasized that reinforcement increases the likelihood that the instrumental response will be repeated in the future. This emphasis encouraged the belief that instrumental conditioning produces repetitions of the same response—that it produces uniformity or stereotypy in behavior. Stereotypy in responding does develop if that is allowed or required by the instrumental conditioning procedure (e.g., Schwartz, 1988). However, that does not mean that instrumental conditioning cannot be used to produce creative or variable responses.

We are accustomed to thinking about the requirement for reinforcement being an observable action, such as pressing a lever or hitting a baseball. Interestingly, however, the criteria for reinforcement can also be defined in terms of more abstract dimensions of behavior, such as its novelty. The behavior required for reinforcement can be defined as doing something unlike what the participant did on the preceding four or five trials. To satisfy this requirement, the participant has to perform differently on each trial. In such a procedure, response variability is the basis for instrumental reinforcement.

Numerous experiments with laboratory rats, pigeons, and human participants have shown that response variability increases if variability is the response dimension required to earn reinforcement (Neuringer, 2004; Neuringer & Jensen, 2010). In one study, college students were asked to draw rectangles on a computer screen (Ross & Neuringer, 2002). They were told they had to draw rectangles to obtain points but were not told what kind of rectangles they should draw. For one group of participants, a point was dispensed if the rectangle drawn on a given trial differed from other rectangles the student previously drew. The new rectangle had to be novel in size, shape, and location on the screen. This group was designated VAR for the variability requirement. Students in another group were paired up or yoked to students in group VAR and received a point on each trial on which their partners in group VAR were reinforced. However, the YOKED participants had no requirements about the size, shape, or location of their rectangles. The results of the experiment are shown in Figure 5.8.
Students in group VAR showed considerably greater variability in the rectangles they drew than participants in group YOKED. This shows that response variability can be increased if the instrumental reinforcement procedure requires variable behavior.

FIGURE 5.8 Degree of response variability (U-value) along three dimensions of drawing a rectangle (size, shape, and location) for human participants who were reinforced for varying the type of rectangles they drew (VARY) or who received reinforcement on the same trials but without any requirement to vary the nature of their drawings (YOKED). Higher values of U indicate greater variability in responding (based on Ross & Neuringer, 2002).

Another experiment by Ross and Neuringer (2002) demonstrated that different aspects of drawing a rectangle (the size, shape, and location of the rectangle) can be controlled independently of one another by contingencies of reinforcement. For example, participants who are required to draw rectangles of the same size will learn to do that but will vary the location and shape of the rectangles they draw. In contrast, participants required to draw the same shape of rectangle will learn to do that while they vary the size and location of their rectangles.

These experiments show that response variability can be increased with instrumental conditioning. Such experiments have also shown that in the absence of explicit reinforcement of variability, responding becomes more stereotyped with continued instrumental conditioning (e.g., Page & Neuringer, 1985). Thus, Thorndike and Skinner were partially correct in saying that responding becomes more stereotyped with continued instrumental conditioning. However, they were wrong to suggest that this is an inevitable outcome. Novel response forms can be readily produced by instrumental conditioning if response variation is a requirement for reinforcement.

BOX 5.3 Detrimental Effects of Reward: More Myth Than Reality

Reinforcement procedures have become commonplace in educational settings as a way to encourage students to read and do their assignments. However, some have been concerned that reinforcement may actually undermine a child's intrinsic interest and willingness to perform a task once the reinforcement procedure is removed. Similar concerns have been expressed about possible detrimental effects of reinforcement on creativity and originality. Extensive research on these questions has produced inconsistent results. However, more recent meta-analyses of the results of numerous studies indicate that under most circumstances reinforcement increases creative responses without reducing intrinsic motivation (Akin-Little et al., 2004; Byron & Khazanchi, 2012; Cameron, Banko, & Pierce, 2001). Research with children also indicates that reinforcement makes children respond with less originality only under limited circumstances (e.g., Eisenberger & Shanock, 2003). As in experiments with pigeons and laboratory rats, reinforcement can increase or decrease response variability, depending on the criterion for reinforcement. If highly original responding is required to obtain reinforcement, originality increases, provided that the reinforcer is not so salient as to distract the participant from the task. (For a more general discussion of creativity, see Stokes, 2006.)
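Two computational pieces recur in this research: the reinforcement rule (a lag contingency that reinforces a response only if it differs from the last few responses) and the U-value plotted in Figure 5.8. The U-value is a normalized-entropy statistic of the kind used in this literature: 1.0 when all response categories occur equally often, 0.0 when a single category is repeated. The sketch below is illustrative; the category scheme and names are assumptions, not the exact measures Ross and Neuringer used.

```python
import math
from collections import Counter

def u_value(responses, n_categories):
    """Normalized entropy: 1.0 = maximally variable, 0.0 = perfectly stereotyped."""
    counts = Counter(responses)
    total = len(responses)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(n_categories)

def lag_reinforced(response, recent, lag=4):
    """Lag contingency: reinforce only if the response differs from the last
    `lag` responses (cf. 'unlike the preceding four or five trials')."""
    return response not in recent[-lag:]

# Perfectly stereotyped vs. highly variable sequences over 8 categories:
print(u_value(["a"] * 20, 8))                     # 0.0
print(u_value(list("abcdefgh") * 3, 8))           # 1.0
print(lag_reinforced("c", ["a", "b", "c", "d"]))  # False: "c" occurred recently
```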
Relevance or Belongingness in Instrumental Conditioning

As the preceding section showed, instrumental conditioning can act on abstract dimensions of behavior, such as its variability. How far do these principles extend? Are there any limitations on the types of new behavioral units or response dimensions that may be modified by instrumental conditioning? A growing body of evidence indicates that there are important limitations.

In Chapter 4, I described how classical conditioning occurs at different rates depending on the combination of conditioned and unconditioned stimuli used. For example, rats readily learn to associate tastes with sickness, but associations between tastes and shock are not so easily learned. For conditioning to occur rapidly, the CS has to belong with the US or be relevant to the US. Analogous belongingness and relevance relations occur in instrumental conditioning. As Jozefowiez and Staddon (2008) commented, "A behavior cannot be reinforced by a reinforcer if it is not naturally linked to that reinforcer in the repertoire of the animal" (p. 78).

This type of natural linkage was first observed by Thorndike. In many of his puzzle-box experiments, the cat had to manipulate a latch or string to escape from the box. However, Thorndike also tried to get cats to scratch or yawn to be let out of a puzzle box. The cats could learn to make these responses. However, the form of the responses changed as training proceeded. At first, the cat would scratch itself vigorously to be let out of the box. On later trials, it would only make aborted scratching movements. It might put its hind leg to its body but would not make a true scratch response. Similar results were obtained in attempts to condition yawning. As training progressed, the animal would open its mouth, but it would not give a bona fide yawn.

Thorndike used the term belongingness to explain his failures to train scratching and yawning as instrumental responses. According to this concept, certain responses naturally belong with the reinforcer because of the animal's evolutionary history. Operating a latch and pulling a string are manipulatory responses that are naturally related to release from confinement. By contrast, scratching and yawning characteristically do not help animals escape from confinement and therefore do not belong with release from a puzzle box.

The concept of belongingness in instrumental conditioning is nicely illustrated by a more recent study involving a small fish species, the three-spined stickleback (Gasterosteus aculeatus). During the mating season each spring, male sticklebacks establish territories in which they court females but chase away and fight other males. Sevenster (1973) used the presentation of another male or a female as a reinforcer in instrumental conditioning of male sticklebacks. One group of fish was required to bite a rod to obtain access to the reinforcer. When the reinforcer was another male, biting behavior increased; access to another male was an effective reinforcer for the biting response.
By contrast, biting did not increase when it was reinforced with the presentation of a female fish. However, the presentation of a female was an effective reinforcer for other