L2 Operant Conditioning 2023 Lecture Notes PDF

Operant Conditioning
Mind, Brain, & Behaviour 1 PSYC10003
Learning & Cognition
Week 2, Lecture 2
Associate Professor Meredith McKague

Last lecture we talked about Classical Conditioning. This lecture we focus on another form of conditioning, or associative learning, called operant conditioning. We focus on the work of B.F. Skinner - an influential behaviourist whose work during the 50s, 60s and 70s gave us our understanding of the ways in which our environments shape our voluntary behaviours. Although Classical Conditioning is a very important and powerful form of learning, the Behaviourists recognised that they also needed to explain how stimuli in the environment shape voluntary behaviours through learned associations. The shaping of voluntary behaviours by association with their consequences is the focus of operant conditioning.

The title slide image is a painting representing B.F. Skinner (1904-1990) about to load a trained kamikaze pigeon into the nose-cone of a missile (B.F. Skinner With Project Pigeon, 1986, Anton van Dalen). During WW2, the US Navy required a weapon effective against the German Bismarck class battleships. Although missile and TV technology existed, the size of the primitive guidance systems available rendered any weapon ineffective. Project Pigeon was potentially an extremely simple and effective solution, but despite an effective demonstration it was abandoned when more conventional solutions became available. The project involved dividing the nose cone of a missile into three compartments, and encasing a pigeon in each. Each compartment used a lens to project an image of what was in front of the missile onto a screen. The pigeons would peck toward the object, thereby directing the missile. Skinner complained "our problem was no one would take us seriously." The point is perhaps best explained in terms of human psychology (i.e., few people would trust a pigeon to guide a missile no matter how reliable it proved).
I have provided Skinner’s account of Project Pigeon in the reading folder.

Learning Outcomes for Operant Conditioning
• Understand the processes of reinforcement and punishment, including:
• Positive and negative reinforcement
• The effects of continuous and partial reinforcement schedules
• Extinction of reinforcement
• Shaping of complex behaviours
• Positive and negative punishment
• Factors that increase the effectiveness of punishment (the three C’s)
• Drawbacks of, and alternatives to, punishment
• The role of antecedent and discriminant stimuli in controlling operantly conditioned behaviours.

This slide outlines the learning outcomes for today’s lecture. Use this slide to help guide your study towards key concepts, theories and studies that are discussed.

“Behaviour operates on the environment to generate consequences.” B.F. Skinner (1904-1990)

This is a photo of Burrhus Frederic Skinner, who took the School of Behaviorism beyond classical conditioning by investigating the processes by which voluntary behaviours are shaped by their consequences. The quotation sums up Skinner's approach and hints at why the approach is called operant conditioning. In Skinner’s view, we learn by producing behaviors as ‘operants’ that generate consequences to inform our future behaviors. We are more likely to reproduce behaviours that have been rewarded and suppress behaviors that have been punished.

Operant Conditioning
• Behaviour is shaped by the learner’s history of experiencing rewards and punishments for their actions.

According to Skinner, our behaviours are shaped by our history of experiencing rewards and punishments as consequences.

Studying operant conditioning: The Skinner Box
• Skinner developed the Skinner Box as a ‘microworld’ in which he could control the animal’s experience of reinforcement and punishment.
• Pressing the lever was the target behavior, which could be strengthened through reinforcement and weakened through punishment.

On this slide, we have a diagram of Skinner’s primary experimental apparatus – the Skinner Box or Operant Chamber. During the 50s, 60s and 70s, Psychology departments in universities across the world were filled with Skinner boxes. You can see one from the University of Melbourne on display in the Melbourne Museum Mind Exhibition. The Skinner Box provided Skinner with a controlled environment in which to study the behavior and learning of laboratory animals, such as rats and pigeons. The box was wired up to computers so that the animals’ behaviours, and the stimuli produced in the box, were all recorded.

The box is equipped for providing consequences for behaviors. For example, the rat might receive a food pellet each time it presses the lever. In this case, we would say that the lever-pressing has been positively reinforced by receiving a rewarding consequence. Or it might be that the consequence of lever pressing is to terminate an unpleasant stimulus, such as an electric shock that is produced under the rat’s feet via the electric grid. In this case, we would say that the behaviour has been negatively reinforced by removing an unpleasant stimulus as a consequence.

You’ll notice that the box is also equipped for providing visual and auditory stimuli that can become associated with particular behaviour-contingent consequences. For example, a flash of the red light might reliably signal that a food pellet will be delivered from the food dispenser if the rat presses the lever when it sees the red light. Or it could be that a tone from the speaker signals the likelihood of an electric shock. The animal can learn to avoid the electric shock if it presses the lever.

Reinforcement
• A behavior is reinforced (strengthened) whenever a desirable outcome is the consequence.
• Behaviours that are reinforced are more likely to be repeated.
• A reinforcer is any consequence of a behaviour that makes that behaviour more likely to recur in future.
• Reinforcers can be either positive (+) or negative (-).

Skinner was particularly interested in the effects of rewarding consequences as “reinforcing”, or strengthening, behaviours. That is, behaviours that are followed by desirable outcomes are reinforced, making them more likely to be repeated in the future. In the language of OC – a reinforcer is any consequence of a behaviour that makes that behaviour more likely to recur in future. Reinforcers can be either positive or negative.

Positive and negative reinforcement
• Positive Reinforcement
  • An animal will learn to produce a behaviour if the consequence of doing so is receiving something pleasant.
• Negative Reinforcement
  • An animal will learn to produce a behavior if the consequence is that something unpleasant will stop.
• Positive reinforcer
  • something pleasant that is added to increase behavior
• Negative reinforcer
  • something unpleasant that is removed to increase behaviour

Continuous vs. Partial Reinforcement
• Continuous reinforcement rarely occurs in natural environments
• Behaviour is usually reinforced on a partial “schedule”.
• Partial reinforcement leads to more persistent learning because the learner is used to the fact that reinforcement occurs on some occasions and not others.
• Continuous reinforcement leads to rapid extinction once the reinforcer is withheld.
• Why is gambling so addictive?
• See Skinner’s explanation here

Importantly, outside of the Skinner box, behaviours can rarely be reinforced continuously, that is, on every occurrence. So, it is important to understand how different patterns of partial reinforcement influence behaviour.
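To make the contrast between continuous and partial reinforcement concrete, here is a minimal Python sketch of one kind of partial schedule, in which a reinforcer arrives after an unpredictable number of responses. It is an illustration under stated assumptions rather than anything from Skinner's procedures: the function names, the `mean_ratio` parameter, and the choice to draw each requirement uniformly from 1 to 2 × mean_ratio − 1 are all hypothetical.

```python
import random


def variable_ratio_schedule(mean_ratio, n_responses, seed=0):
    """Return one boolean per response: True if that response is reinforced.

    Assumption for illustration: after each reinforcer, the number of
    responses needed for the next one is drawn uniformly from
    1 .. 2*mean_ratio - 1, so it averages mean_ratio.
    """
    rng = random.Random(seed)

    def next_requirement():
        return rng.randint(1, 2 * mean_ratio - 1)

    outcomes = []
    required = next_requirement()
    since_last = 0  # responses made since the last reinforcer
    for _ in range(n_responses):
        since_last += 1
        if since_last >= required:
            outcomes.append(True)   # reinforcer delivered
            since_last = 0
            required = next_requirement()
        else:
            outcomes.append(False)  # response goes unrewarded
    return outcomes


def continuous_schedule(n_responses):
    """Continuous reinforcement: every single response is reinforced."""
    return [True] * n_responses


if __name__ == "__main__":
    vr = variable_ratio_schedule(mean_ratio=10, n_responses=50)
    positions = [i + 1 for i, reinforced in enumerate(vr) if reinforced]
    print("Reinforced responses:", positions)
```

With `mean_ratio=1` the sketch reduces to continuous reinforcement. With larger values the learner cannot tell from any single unrewarded response whether reinforcement has stopped, which is one way to think about why partially reinforced behaviour persists.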
Skinner talked about Schedules of Reinforcement and was interested in how different frequencies of reinforcement, either in terms of time (intervals) or in terms of number of behaviours (ratios), resulted in different, but highly predictable, patterns of behaviour. Pages 229-231 of the Burton et al. text cover the four different schedules of reinforcement described by Skinner, and the patterns of behaviour that they produce. The ratio or interval by which reinforcers are given can be either fixed or variable. The form I will highlight here is what Skinner referred to as a variable-ratio schedule. The VR schedule occurs when a behaviour is rewarded unpredictably after varying numbers of performances (for example, on average once every ~10 times the behaviour is produced - sometimes after 5 times, 15 times, 9 times, etc.). The VR schedule is what underlies the way rewards are distributed in gambling. The player knows that there are rewards for their behaviour, but they don’t know how many times they will need to ‘roll the dice’ to receive a reward. This keeps the player playing ‘just one more time’, testing whether their next attempt will bring the reward.

Extinction of a reinforced behaviour
• Extinction of an operantly conditioned behaviour occurs when reinforcement is withheld.
• Not immediate - sometimes there is a brief increase in responding, referred to as an extinction burst, followed by a decrease in the trained behaviour.
• Responses that were reinforced partially will be harder to extinguish than those reinforced continuously.

Shaping behaviour: Pigeons playing pingpong
Shaping reinforces successive approximations to the desired behaviour (reinforcing small steps).
• Start by reinforcing a high frequency component of the desired response.
• Then drop this reinforcement – behaviour becomes more variable again.
• Await a response that is still closer to the desired response – then reintroduce the reinforcer.
• Keep cycling through as closer and closer approximations to the desired behaviour are achieved.
• Enables the molding of a response that is not normally part of an animal’s repertoire.
• See shaping in action in Dr Chris Groot’s visit behind the scenes of the seal exhibit at the Melbourne Zoo

Pigeons playing ping pong video with Skinner: https://www.youtube.com/watch?v=vGazyH6fQQ4

How can we teach more complex behaviours than lever pressing? Skinner described a process he called Shaping. This involves using reinforcement to reward small steps towards a desired response. Consider the example of teaching a child the sequence of behaviors that need to be coordinated to brush their teeth, or how you might train a complex behaviour in an animal. This slide walks you through the sequence of steps it takes to reinforce ‘successive approximations’ towards a desired complex behaviour. Chapter 6 on learning from the Burton text covers shaping on pages 230-231.

Punishment
• A behavior is punished (weakened) whenever the learner experiences an undesirable consequence for that behaviour.
• Behaviours that are followed by punishment are less likely to be repeated.
• A punisher is any consequence of a behavior that makes that behaviour less likely to recur in future
• Punishers can also be either positive (+) or negative (-).

Whereas reinforcing consequences serve to promote the production of behaviours, punishing consequences lead to the suppression of behaviours. Skinner also investigated the effects of unpleasant consequences on behavior in his study of punishment. He found that there are many drawbacks to using punishment to change behaviour, and preferred to work with reinforcement. We will discuss the reasons for this shortly.

• Positive Punishment
  • An animal will stop producing a behaviour if the consequence is the presentation of an unpleasant stimulus.
• Negative Punishment (response cost)
  • An animal will stop producing a behavior if the consequence is that something desirable is taken away.
• Positive punisher
  • An unpleasant stimulus that weakens behaviour when added as a consequence of the behaviour
• Negative punisher
  • A pleasant stimulus that weakens behaviour when removed as a consequence of the behaviour

Lever pressing will be suppressed by experiencing an unpleasant stimulus as a consequence, or by experiencing the loss of something desirable as a consequence. These consequences are referred to as positive and negative punishment, respectively. Negative punishment is also referred to as ‘response cost’.

When is punishment effective? The three Cs
• Contingency – the relationship between the behavior and the punisher must be clear
• Contiguity – the punisher must follow the behavior swiftly
• Consistency – the punisher needs to occur for every occurrence of the behaviour

Sometimes positive punishment can be used to draw attention to and stop a dangerous behaviour. For example, yelling and grabbing a child to pull them back from a busy road (giving them a shock) is effective for suppressing the behavior in the short-term and may have a longer-term impact in making the child more attentive when approaching busy roads. However, in many situations, punishment is not effective for promoting long-term behaviour change. This is because it is difficult to administer punishment effectively given how sensitive it is to what we will call the ‘three Cs’.

Why must punishment be consistent? Because, assuming the undesirable behavior has a desirable consequence, partial punishment actually produces a partial reinforcement schedule: on every occasion that the punisher is missed, the behaviour still earns its reward. Why do people do undesirable things? Because they are rewarding (e.g., I am tempted to speed when driving because the payoff is getting to my destination faster - I misbehave in class because I get the attention of my peers - I throw a tantrum until I get what I want.).
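The four kinds of consequence introduced so far can be told apart by two questions: was a stimulus added or removed, and did the behaviour become more or less likely? The following is a minimal Python sketch of that two-by-two scheme. The function name and the string labels are made up for illustration and are not part of the lecture material.

```python
def classify_consequence(stimulus_change, behaviour_change):
    """Name the operant process from two observations.

    stimulus_change:  "added" or "removed" (what happened after the behaviour)
    behaviour_change: "increases" or "decreases" (future likelihood of the behaviour)
    """
    # "Positive" means a stimulus is added, "negative" means one is removed;
    # neither word means good or bad.
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    # Reinforcement strengthens behaviour, punishment weakens it.
    process = {"increases": "reinforcement", "decreases": "punishment"}[behaviour_change]
    return f"{sign} {process}"


if __name__ == "__main__":
    # Examples taken from the notes above.
    print(classify_consequence("added", "increases"))    # food pellet follows lever press
    print(classify_consequence("removed", "increases"))  # lever press switches off the shock
    print(classify_consequence("added", "decreases"))    # unpleasant stimulus follows lever press
    print(classify_consequence("removed", "decreases"))  # something desirable is taken away (response cost)
```

The sketch makes one point explicit: the label depends on the observed effect on behaviour, not on whether the consequence was intended as a reward or a penalty.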
Drawbacks of Punishment
• Positive punishment rarely works for long-term behaviour change.
• It tends to only suppress behaviour.
• It does not teach a more desirable behaviour.
• If the threat of punishment is removed, the behaviour returns.
• Produces negative feelings in the learner, which do not promote new learning.
• Harsh punishment may teach the learner to use such behaviour towards others (social learning).

It is better to focus on rewarding the desirable behaviour than on paying attention to the undesirable behaviour. Removing the rewarding consequence of the behavior is often more effective than applying some form of positive punishment. Consider time-out for misbehavior in class. Misbehavior is often rewarded by the attention that the child receives, even when that attention is positive punishment (e.g., being reprimanded). It can be more effective to remove the child from the environment that is providing reinforcement for the behavior. That is, imposing a response cost (losing something desirable, here attention) works more effectively than positive punishments such as verbal reprimanding.

Alternatives to punishment
• Stop reinforcing the problem behaviour (extinction).
• Reinforce an alternative behaviour that is both constructive and incompatible with the undesirable behaviour.
• Reinforce the non-occurrence of the undesirable behaviour.
• Generate your own examples for each of these.

Problematic, undesirable behaviors are usually being maintained through their reinforcing consequences. For this reason, if you want to change the behavior of someone else (an annoying colleague, or a problematic behavior in a child or pet), it is important to analyse the situation and work out what is reinforcing the behavior. Indeed, sometimes we inadvertently reward behaviors in others that we would prefer to see eliminated. Once you have identified the reinforcer, you need to work out how to remove it.
Another way to approach behavior change is to take the attention off the problematic behavior and focus instead on reinforcing an alternative desirable behavior that is incompatible with the undesirable behavior. For example, with a child that is whining to get what they want, attend to the child only when they speak in a normal tone of voice. Another approach is to reinforce the nonoccurrence of the behavior – that is, reward self-control. For example, if you have two children who fight a lot, you could set a period of time in which, if they do not fight, you will provide them with a reward (extra screen time or something). See if you can generate some more examples that are relevant to your experiences.

Antecedents: The relationship between classical and operant responses
• Stimuli in the environment can become antecedents for operantly conditioned behaviours
• An antecedent is a ‘cue’ that signals the availability of a reinforcer.
• Note that the antecedent-reinforcer relationship is based on a classically conditioned association.
• Classically conditioned associations become cues for operant behaviours.
• For example, the sight of my mobile-phone is associated with the rewarding consequences of scrolling through social media
• The phone becomes a cue (antecedent) for the behaviour of scrolling social media and its attendant rewards.
• “ABC model” of operant conditioning
• Antecedent → Behaviour → Consequence

Discriminant stimuli (antecedents) are covered on page 231 of the Burton text.

Antecedents, habits and addictions
• Antecedent stimuli drive habitual behaviours:
• The sight of my favourite café is associated with the rewarding consequences of my morning coffee
• the café is an antecedent for the behaviour of buying a coffee.
• The sign for the pokies is an antecedent for gambling behaviour. It is associated with the rewards of winning.
• I can structure the environment with antecedent stimuli to encourage positive behaviours
• For example, leaving my running shoes beside the bed each night can become an antecedent for my morning run.
• Or, if I want to reduce social media use before going to sleep, I can remove the phone from the bedroom
• Watch the Dwight Conditioning Video and explain in terms of conditioning processes - Is this an example of classical conditioning, operant conditioning, or both? https://youtu.be/iTWopzBJFyY

Discriminant stimuli
• An antecedent becomes a discriminative stimulus when it signals which of two or more potential behaviours is appropriate in a context.
• For example, swearing is punished in some contexts and is associated with rewarding outcomes in others - the context allows us to discriminate between situations associated with rewards or punishments for a particular behaviour.
• In a Skinner box, a green light may signal food availability whereas a red light may signal impending foot-shock.
• Receiving the food or avoiding the foot-shock may then be contingent on pressing a lever or moving to the opposite side of the cage, respectively.
• Note, the discriminant stimulus-reward relationship is based on a classically conditioned association.

Discriminant stimuli
• Animal training involves learning discriminant signals for different behaviours
• Different hand signals and/or verbal commands signal which behaviour to produce for a reward.
• Skinner taught pigeons to turn circles counter-clockwise to receive a reward when in one box, and clockwise to receive a reward in another box
• The pigeons learned that each box provided a distinct discriminant stimulus for each behaviour.
• How much of your voluntary behaviour is controlled by antecedent and discriminant stimuli in the environment?
• Watch this video, made by our own Dr Chris Groot, to see principles of Classical and Operant Conditioning in action with the seals at the Melbourne Zoo.
https://www.youtube.com/watch?v=wBhcnwf9Jus&feature=youtu.be

Homework
• Slides from Classical Conditioning - answer the questions.
• Map out the Dwight Conditioning video for elements that relate to classical and/or operant conditioning.
• Think of examples of your ‘voluntary behaviour’ that are driven by antecedent stimuli.
• I will also post these activities in the “tutorial preparation” section of the Week 2 page on Canvas, and Elektra will be in touch with a reminder on Friday.
