Operant Conditioning Lecture Notes PDF

Operant Conditioning
PSYU2236/PSYX2236 Biopsychology & Learning
Lecturer: Dr Patrick Nalepka (he/him)

Updates
- Tutorial 4 (Learning in Artificial Agents) continues. Week 11 is Stream B.
- Congratulations to those who turned in their research report and written reflection. The marking team is busy assessing them.
- Students waiting for Special Consideration approval should submit when ready, as extensions are applied from the original due date, not from the date you receive the approval.
- Students who need to replace their uploaded paper must send it to the unit convenor via the Microsoft Contact Form.
- The next lecture topic is "Extinction", presented by Dr Gaurav Patil. Reading: revise Mazur textbook Chapters 5–6, "Basic Principles of Operant Conditioning" and "Reinforcement Schedules: Experimental Analyses and Applications". Extinction is discussed on pages 64–66 and 126.
- The final exam will be 60 multiple-choice questions (30 Biopsychology-related and 30 Learning-related) in a 2-hour exam (i.e., 2 minutes per question).
- Make use of our PSYUX2236-2024-S2 Team for unit discussions: ask questions, or help answer questions from your peers!

Outline
1. The difference between operant and classical conditioning
2. What can a reinforcer be?
3. The effect of reinforcement schedules on behaviour
4. Changing behaviour

Classical vs. Operant Conditioning
Classical conditioning involves involuntary responses (reflexes) to a conditioned stimulus (e.g., a bell causing a dog to salivate).
Operant conditioning involves voluntary behaviours in a given situation, where the likelihood of these behaviours can increase or decrease due to their consequences (e.g., being presented with a reward or a punishment).

In the beginning … Thorndike
Thorndike's Law of Effect: If, in a specific situation, a response is followed by a reinforcer, the response will become associated with that situation and will be more likely to occur again in that situation.
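The Law of Effect can be illustrated with a toy simulation in which one response in a situation is always followed by a reinforcer. This is a minimal sketch under stated assumptions: the three response names, the learning rate `alpha`, and the Bush–Mosteller-style linear-operator update are illustrative choices, not something given in these notes. It only shows the qualitative claim that a reinforced response becomes more likely in that situation.

```python
import random


def law_of_effect(trials: int = 200, alpha: float = 0.1, seed: int = 1) -> dict[str, float]:
    """Toy simulation of Thorndike's Law of Effect in a single situation.

    Only "press_lever" is followed by a reinforcer. The response names, the
    learning rate `alpha`, and the linear-operator update are illustrative
    assumptions, not part of the lecture.
    """
    rng = random.Random(seed)
    # Probability of emitting each response in this situation; starts uniform.
    probs = {"press_lever": 1 / 3, "groom": 1 / 3, "explore": 1 / 3}
    for _ in range(trials):
        # Emit one response, chosen according to the current probabilities.
        response = rng.choices(list(probs), weights=list(probs.values()))[0]
        if response != "press_lever":
            continue  # no reinforcer follows, so nothing changes
        # Reinforced: move p(press_lever) a fraction alpha towards 1 and shrink
        # the other responses proportionally, so the probabilities still sum to 1.
        for r in probs:
            if r == response:
                probs[r] += alpha * (1 - probs[r])
            else:
                probs[r] *= 1 - alpha
    return probs


if __name__ == "__main__":
    for response, p in law_of_effect().items():
        print(f"{response:12s} {p:.3f}")
```

Running it shows p(press_lever) climbing towards 1 while the unreinforced responses become rarer. Because nothing in this sketch ever weakens a response, it models acquisition only, not extinction.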
Skinner Box
OPERANT CONDITIONING: The organism operates on its environment in some way to achieve some desirable outcome. Behaviour is associated with consequences.

Skinner's Operant Box
Key features of Skinner's operant box:
- Some behaviour that can be performed to obtain a reward; its rate is measured by the experimenter.
- A dispenser of food or liquid used as a reinforcer (reward).
- Tones or lights to signal the availability of an opportunity for reward or a pending punishment; these are used in discrimination and generalisation studies.

Example: measure the rate of bell rings made to gain food (the consequence). The consequence is pleasant, so the probability of repeating the action increases ∴ positive reinforcement.

Shaping or successive approximations
Shaping is the reinforcement of successive approximations of a desired behaviour. Specifically, when using a shaping technique, each demonstrated behaviour that approximates the desired behaviour is reinforced, while behaviours that are not approximations of the desired behaviour are not reinforced. E.g., the cat's owner may have first rewarded them when they lifted their paw, then when they reached towards the bell, then when they touched the bell whether or not it rang, and finally only when they rang the bell.
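Shaping can be thought of as a moving criterion: reinforce whatever meets the current approximation, then raise the bar. A minimal sketch, assuming the owner advances the criterion after a fixed number of reinforced responses; the function name `shape`, the threshold `successes_to_advance`, and the behaviour labels (taken from the cat example) are hypothetical.

```python
# Hypothetical criteria from the cat example, ordered from first to final approximation.
APPROXIMATIONS = ["lift paw", "reach towards bell", "touch bell", "ring bell"]


def shape(observed: list[str], successes_to_advance: int = 3) -> list[tuple[str, bool]]:
    """Decide, behaviour by behaviour, whether the trainer reinforces it.

    A behaviour is reinforced only if it is at least as close to the target as
    the current criterion. After `successes_to_advance` reinforced responses
    (an assumed, illustrative threshold) the criterion moves one step closer.
    """
    step = 0        # index of the current criterion in APPROXIMATIONS
    successes = 0   # reinforced responses at the current criterion
    decisions = []
    for behaviour in observed:
        meets = (behaviour in APPROXIMATIONS
                 and APPROXIMATIONS.index(behaviour) >= step)
        decisions.append((behaviour, meets))
        if meets:
            successes += 1
            last_step = len(APPROXIMATIONS) - 1
            if successes >= successes_to_advance and step < last_step:
                step += 1       # raise the criterion: earlier approximations no longer pay off
                successes = 0
    return decisions


if __name__ == "__main__":
    session = ["lift paw", "lift paw", "reach towards bell", "lick fur",
               "touch bell", "reach towards bell", "ring bell"]
    for behaviour, reinforced in shape(session, successes_to_advance=1):
        print(f"{behaviour:20s} {'reinforce' if reinforced else 'ignore'}")
```

In this sketch a behaviour that overshoots the current criterion (e.g., ringing the bell early) is still reinforced, but the criterion only advances one step at a time, and a behaviour that is not an approximation at all ("lick fur") is never reinforced.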
What can a reinforcer be?

Consequences of our Behaviour
| Something is…                | The behaviour increases                     | The behaviour decreases |
|------------------------------|---------------------------------------------|-------------------------|
| Added to the environment     | Positive reinforcement                      | Positive punishment     |
| Removed from the environment | Negative reinforcement (avoidance learning) | Negative punishment     |

Consequences: Positive Reinforcement
[Figure: Behaviour → Consequence → Outcome]

Consequences: Positive Punishment
Positive punishment: something is added to the environment that causes the behaviour to decrease in frequency ∴ that something must have been unpleasant.
Example: being made to do chores is positive punishment.

Consequences: Negative Reinforcement
Negative reinforcement: something is removed from the environment that causes the behaviour to increase in frequency ∴ that something must have been unpleasant.
Behaviour: arriving late to a meeting. Negative reinforcer: avoiding the boring meeting.
Negative reinforcement is often about avoidance learning: learning how to avoid unpleasant situations, e.g., applying sunscreen to prevent sunburn.

Consequences: Negative Punishment
Negative punishment: something is removed from the environment that causes the behaviour to decrease in frequency ∴ that something must have been pleasant.
Also known as response cost or omission training. Regardless of the name, these all involve the removal of a stimulus that the person values/desires/enjoys, following the targeted behaviour. To facilitate the process, the person may also be reinforced for exhibiting another, more desirable behaviour (DRO: Differential Reinforcement of Other behaviour).
Negative punishment (aka response cost or omission training): if the person makes the "wrong" response → they will lose something of value. So they should learn to inhibit or omit the "wrong" behaviour (omission learning).
So what should the mother/father also do?
Implement the threat, and reward the child for tidying the toys.

Behaviour-Consequence Relationships: Emotions
[Figure: emotions associated with each type of consequence, from mild to intense]
- Application of a pleasant stimulus (positive reinforcement), happiness: pleasure, elation, ecstasy.
- Removal of a pleasant stimulus (negative punishment, omission learning), anger: frustration, anger, rage; also sadness and grief.
- Application of an unpleasant stimulus (positive punishment), fear: apprehension, fear, terror.
- Removal of an unpleasant stimulus (negative reinforcement): relief.

The effect of reinforcement schedules on behaviour

Schedules of Reinforcement

Continuous Schedule
Behaviour is followed by a consequence each time it occurs.
- Excellent for getting a new behaviour started.
- Behaviour stops quickly when reinforcement stops.

Partial schedules (for resistance to extinction)
Ratio schedules (responses/actions): the outcome follows once a predetermined number of responses has been made.
Interval schedules (time lapse): the outcome follows the first response made after the specified time has elapsed.
Fixed schedules (set number/time): e.g., every 5 responses (ratio) or every 5 min (interval) → outcome, i.e., a predictable schedule.
Variable schedules (random, around an average): e.g., every 2–5 responses (ratio) or every 2–5 min (interval) → outcome, i.e., an unpredictable schedule.
Combining these gives four schedules: Fixed-Ratio, Variable-Ratio, Fixed-Interval, and Variable-Interval.

Fixed-Ratio Schedule
Behaviour/reinforcement: a fixed number of responses per reinforcer (e.g., 100/1 or 15/1).
Response rate: higher ratio = faster responding.
Behaviour: tends to work hard (ratio run), receive reinforcement, then take a brief post-reinforcement pause (see point A on the figure), then work hard again….
Resistance to extinction: Low.

Behaviour on a Fixed Ratio
High rates of responding → a pause after receiving the reward (post-reinforcement pause, PRP) → then onwards for the next reward.
Make the number of responses too high → ratio strain: a disruption in responding due to an overly demanding response requirement.

Behaviour on a Fixed Ratio: Ratio Strain
- A result of abrupt increases in ratio requirements.
- Characteristics include avoidance, aggression, and unpredictable pauses in responding.
- Ratio strain is the point at which too much energy is expended in exchange for too little in return. It is the point of "If things don't get better, I don't know how much longer I can keep doing this."

Behaviour on a Fixed Ratio: Ratio Run
Note also that the closer they get to their target number of responses, the faster they press the bar: this is known as a ratio run.

Goal-gradient hypothesis
Hull (1932, 1934) wrote papers with titles like "The Goal-Gradient Hypothesis and Maze Learning" to explore the effect of goal distance and speed of goal progress. As Hull put it, "...animals in traversing a maze will move at a progressively more rapid pace as the goal is approached."
Hull, C. L. (1934). The rat's speed-of-locomotion gradient in the approach to food. Journal of Comparative Psychology, 17, 393–422.

Variable-Ratio Schedule
Behaviour/reinforcement: a random/unpredictable number of responses between reinforcements.
Response rate: fast.
Behaviour: works hard and at a steady rate.
Resistance to extinction: High.

What if the rewards came on a temporal basis? How would that affect the rate of responding? It would reduce it: there is no point working harder if it does not make the rewards come any faster.

Button pressing at a Sydney pedestrian crossing is an example of a fixed-interval schedule!
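The four partial schedules differ only in the rule that decides whether a given response is reinforced. The Python sketch below encodes each rule and counts reinforcers for a subject responding at a perfectly steady rate, which makes the temporal-basis question concrete: doubling the response rate doubles the payoff on a ratio schedule but barely changes it on an interval schedule. The class names, the helper `reinforcers_earned`, the steady-responding assumption, and the uniform draws for the variable schedules are illustrative choices; the parameter values (FR 5, VR 2–5, FI 5 min, VI 2–5 min) follow the examples above. The sketch models only the schedule rules, not real response patterns such as post-reinforcement pauses or scallops.

```python
import random


class FixedRatio:
    """FR n: every n-th response is reinforced."""

    def __init__(self, n: int):
        self.n = n
        self.count = 0

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR: reinforce after an unpredictable number of responses (uniform lo..hi)."""

    def __init__(self, lo: int, hi: int, seed: int = 0):
        self.lo, self.hi = lo, hi
        self.rng = random.Random(seed)
        self.count = 0
        self.required = self.rng.randint(lo, hi)

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self.rng.randint(self.lo, self.hi)
            return True
        return False


class FixedInterval:
    """FI: reinforce the first response made once `interval` seconds have elapsed."""

    def __init__(self, interval: float):
        self.interval = interval
        self.available_at = interval

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = t + self.interval  # timer restarts at reinforcement
            return True
        return False


class VariableInterval:
    """VI: like FI, but each interval is drawn at random (uniform lo..hi seconds)."""

    def __init__(self, lo: float, hi: float, seed: int = 0):
        self.lo, self.hi = lo, hi
        self.rng = random.Random(seed)
        self.available_at = self.rng.uniform(lo, hi)

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = t + self.rng.uniform(self.lo, self.hi)
            return True
        return False


def reinforcers_earned(schedule, responses_per_min: int, session_min: int = 60) -> int:
    """Count reinforcers for a subject responding at a perfectly steady rate."""
    gap = 60 / responses_per_min                  # seconds between responses
    n_responses = session_min * responses_per_min
    return sum(schedule.respond((i + 1) * gap) for i in range(n_responses))


if __name__ == "__main__":
    for rate in (10, 20):  # responses per minute
        print(
            f"{rate} resp/min:",
            "FR5", reinforcers_earned(FixedRatio(5), rate),
            "VR2-5", reinforcers_earned(VariableRatio(2, 5), rate),
            "FI300s", reinforcers_earned(FixedInterval(300), rate),
            "VI120-300s", reinforcers_earned(VariableInterval(120, 300), rate),
        )
```

Over a 60-minute session, FR 5 pays out 120 reinforcers at 10 responses/min and 240 at 20, whereas FI 300 s pays out 12 at either rate; VR and VI show the same qualitative split.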
Sydney CBD traffic lights generally operate on an automated cycle from Monday to Saturday, 7am–7pm. Only on Sundays and after hours does pushing the button make the lights change.

Fixed-Interval Schedule
Behaviour/reinforcement: the first response after a fixed amount of time has elapsed is reinforced.
Response pattern: scalloped.
Behaviour: high rate just before reinforcement, long pause after it; the lowest rate of responding.
Resistance to extinction: Low.

Variable-Interval Schedule
Behaviour/reinforcement: the first response after an "average" time period has elapsed is reinforced.
Response rate: slow.
Behaviour: works at a steady rate.
Resistance to extinction: High.
E.g., petrol prices follow a variable-interval schedule of reinforcement.

[Figure: cumulative responding under the four schedules]
Ratio schedules: the highest, consistent rates of responding (the higher the ratio, the higher the rate of responding).
Interval schedules: the lowest rates of responding (shorter intervals → higher rates).

Part 1 Summary
- In operant conditioning, a relationship between a behaviour and its consequences is learned.
- Both negative and positive reinforcers increase the likelihood of that behaviour.
- Both negative and positive punishment decrease the likelihood of that behaviour.
- Different schedules of reinforcement can be used to produce different patterns of behaviour that are more or less resistant to extinction.

Break

Part 2: Changing behaviour
Schedules for reducing the frequency of unwanted behaviours:
- Differential Reinforcement of Incompatible behaviour (DRI)
- Differential Reinforcement of Alternative behaviour (DRA)
- Differential Reinforcement of Other behaviour (DRO)
- Differential Reinforcement of Low rates of responding (DRL)
Operationalising the use of reinforcers and discrimination training:
- Magnitude of reinforcement
- Delay of reinforcement
- Contingency
- Secondary reinforcement, e.g., money, tokens, clickers
- Activity reinforcers: Premack's principle

Want to reduce or eliminate a behaviour? Punishment? Extinction? These raise ethical issues, not to mention questions about their effectiveness, so….
Differential Reinforcement
Four procedures that incorporate reinforcement to address and treat disruptive behaviours are:
1. Differential Reinforcement of Other behaviour (DRO)
2. Differential Reinforcement of Low rates of responding (DRL)
3. Differential Reinforcement of Incompatible behaviour (DRI)
4. Differential Reinforcement of Alternative behaviour (DRA), where the alternative is not necessarily incompatible

Differential Reinforcement of Other Behaviour (DRO)
In DRO, the subject periodically receives the positive reinforcer provided it is engaged in other behaviours (e.g., sitting quietly). This omission training thus involves reinforcing "other" behaviour.
Whole-interval DRO (WIDRO): if the target behaviour has not occurred throughout the entire period (e.g., 1 minute) → positive reinforcer.

[Figure: rate of hand-flapping (responses per minute) by a student with an intellectual disability during baseline conditions, when reinforcement was not provided, and during omission training phases (DRO), when reinforcement was provided for 1-minute periods without hand-flapping. From "Maintenance of Therapeutic Change by Momentary DRO", by L. E. Barton, A. R. Brulle, and A. C. Repp, Journal of Applied Behavior Analysis, 1986, 19, 277–282.]

Examples of other differential reinforcement
Differential Reinforcement of Low rates of responding (DRL): A teacher wants the child to wash their hands, but not more than once before lunch. Using DRL, the teacher would reward the child by allowing them to be first in line for lunch if they avoid washing their hands more than once.
Differential Reinforcement of Incompatible behaviour (DRI): A teacher wants the child to remain in their seat. Each time the child leaves their seat, the behaviour is ignored. However, when the child remains seated, the teacher rewards them with a sticker.
Differential Reinforcement of Alternative behaviour (DRA): Each time the child makes a demand, their parents ignore them. Only when the child asks politely do the parents turn, acknowledge them, and satisfy their request.

Establishing Operations (EOs)
Establishing operations (EOs) are factors that affect the effectiveness of reinforcers. The intensity, amount, and type (i.e., quality) of a reinforcer determine its effectiveness.

Reinforcer Quality / Reinforcer Magnitude
A rat's rate of responding (number of bar presses per minute) increases with higher concentrations of sucrose in the water, i.e., with the magnitude of the reward.
Guttman, N. (1954). Equal-reinforcement values for sucrose and glucose solutions compared with equal-sweetness values. Journal of Comparative and Physiological Psychology, 47(5), 358–361.

Reinforcer Magnitude
- The larger the reward, the faster the acquisition of learning.
- The reinforcer must be of sufficient magnitude for it to be worth making the response.

Reinforcer Magnitude
Reward magnitude is often "in the eyes of the beholder": it need not be the absolute size that matters, but how it is perceived. E.g., two groups of rats were reinforced with the same amount of food, and the rats ran faster when that same amount of food was broken up into more pieces (Christopher, 1988). Similar studies show that many small reinforcers are generally more effective than a few large ones (Schneider, 1973; Todorov et al., 1984; Pryor, 1984).
Magnitude is relative to the person: a very large reinforcer to a 5-year-old may be a very weak reinforcer to a 25-year-old.

Contrast Effects
All a matter of what you have been used to!
Contrast Effects
- Shifting the value of the reward "mid-stream" is also effective in changing behaviour; this is known as a contrast effect.
- Reinforcer magnitude is all a matter of relativity.
- Contrast effects are also obtained when the quality of the reinforcer is switched.
- Responding is influenced by the reinforcement characteristics that an organism has come to expect.

Delay of Reinforcement
Gradient of delay: a delay decreases the contiguity between response and outcome. Temporal contiguity is an important factor in the effectiveness of operant conditioning. E.g., this golden retriever's obedience training will be much more effective if the owner rewards the dog with a treat straight after the desired response.

Delay of Reinforcement
A delay decreases the contiguity between response and outcome:
- Long delays make it difficult for the person/animal to see the relationship between their response and the consequence.
- A delay allows time for other behaviours to occur during the interval → superstitious reinforcement of them.
- The deleterious effects of delay can be reduced by providing a signal that the reward is coming, e.g., a clicker.

[Figure: acquisition of a running response as a function of the length of time reward is delayed on each trial (from E. J. Capaldi, 1978).] If the arrival of food is delayed by 10 seconds, rats won't run nearly as fast as when the food is already in the goal box waiting for them!

Contiguity and Reinforcement
A rat breaks a photocell beam near the ceiling, and either 0, 4, or 10 s later food appears in the dispenser.
[Figure: mean cumulative responses when reinforcement was immediate, delayed 4 seconds, and delayed 10 seconds; the panels contrast conditions labelled "Learn!" and "Don't learn!"]
Schlinger, H. D., & Blakely, E. (1994). The effects of delayed reinforcement and a response-produced auditory stimulus on the acquisition of operant behaviour in rats.
The Psychological Record, 391–409.

Contiguity and Reinforcement
But add a tone 0.25 s after the rat breaks the photocell beam, and the effects of delay are reduced: the tone provides feedback that helps bridge the 4 s and 10 s delays to reinforcement → ↑ responding.

Gradient of Delay
Delay-of-reinforcement gradient: the effectiveness of the reinforcement decreases as the delay increases.
When there is no bridging cue, just how long can the interval be? With a 30 s delay, a rat can still learn the association!
[Figure: the mean rate of pressing a lever by a single rat when food was presented 30 seconds after a response. Bar-pressing rates are low because reinforcement is delayed by 30 s, but the rat does "get" it! Adapted from Lattal, K. A., & Gleeson, S. (1990). Response acquisition with delayed reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 16, 27–39.]

Speed of reward
Addiction is linked to the speed of reward.
Why is checking social apps more rewarding than finishing a psych report?
This is why Internet pornography is much more addictive than pornography contained in a magazine or video, and exactly why modern poker machines are much more addictive than older pokies (the "one-armed bandits"). Modern pokies increase the gambling "dosage" to much higher levels: all this speed means more bets, more bets mean more excitement, and more excitement means more dopamine. More dopamine means the modern pokie is more addictive than its predecessors.

Response-Reinforcer Contingency
- The reinforcer must be the result of some response.
- The greater the consistency between the reinforcer and the response, the quicker/more effective the conditioning.
- GOALS MUST BE SET AND MET BEFORE A REWARD IS GIVEN.

Response Control
Two-month-old infants who can make a mobile move smile and coo at it, while those who have no control over its motion stop smiling.
[Figure: for infants with control, moving the head closed a switch in the pillow, which activated the mobile for a couple of seconds; for the other infants, the mobile was turned with a frequency matched to the infants who had control over it.]
Watson, J. S. (1967). Memory and "contingency analysis" in infant learning. Merrill-Palmer Quarterly, 13, 55–76.

Reinforcers: Primary & Secondary
A primary reinforcer is a stimulus that is reinforcing even without previous training. Primary reinforcers are biologically relevant stimuli or events, i.e., they have survival value. Examples include food, water, and sex.
A conditioned (secondary) reinforcer is an arbitrary event (such as a tone, clicker, or token) that increases the frequency of an operant response. Events that have been associated with rewarding experiences acquire reinforcing power: they are reinforcing because they permit an organism to obtain a primary reinforcer.

Reinforcers: Primary & Secondary
There are a variety of classes of reinforcers that can be subsumed under the heading of secondary reinforcers:
- Tokens: money!
- Social: the word "Thanks" or a smile → feel good
- Activity reinforcers: screen time (Premack principle)
- Covert reinforcers: giving yourself a quiet slap on the back for sticking to your self-control goals

Functions of Conditioned Reinforcement
Conditioned reinforcers:
- Tell the organism it has done the right thing
- Tell the organism what to do next
- Bridge long periods between unconditioned reinforcers

Conditioned Reinforcement: Clicker training
Used for training animals, from dogs to horses to dolphins (Karen Pryor). Pair a hand-held clicker with food through straightforward classical conditioning. The sound of the clicker can then reinforce other behaviours, such as "Sit".

Clicker Training: Advantages
- A clicker sounds the same no matter how you are feeling when you press it.
- A clicker is easier to discriminate from everything else we say to our dogs!
- Split-second timing is possible with the clicker, thereby reinforcing the precise behaviour.
- Using a primary reinforcer, such as food, can cause the dog to become focused on the food, and the food giver, rather than on the behaviour. The clicker can reinforce the behaviour immediately.
Pryor, K. (1995). A Dog and A Dolphin: An Introduction to Click and Treat (TM) Training. Sunshine Books.
Pryor, K. (1984, 1999). Don't Shoot the Dog. Bantam Books.
Pryor, K. (1999). Clicker Training for Dogs. Sunshine Books.

So how powerful is a secondary reinforcer?
A number of variables affect the strength of a secondary reinforcer:
1. The magnitude of the primary reinforcer
2. The number of pairings with the primary reinforcer
3. The time elapsing between the presentation of the secondary reinforcer and the primary reinforcer

Token Economies
Token economies were pioneered by Ted Ayllon and Nate Azrin.
- They were used to teach and maintain normal behaviour in psychotic residents of a psychiatric institution, who were suffering from severe problems with verbal and social behaviour.
- Those included in the token economy earned little metal tokens by making responses.
Ayllon, T., & Azrin, N. H. (1965). The measurement and reinforcement of behavior of psychotics. Journal of the Experimental Analysis of Behavior, 8, 357–383.

Use of Tokens with Humans (Ayllon & Azrin, 1965)
[Figure: reinforcement in a token economy.] This graph shows the effects of using tokens to reward socially desirable behaviour in a mental hospital ward. Desirable behaviour was defined as cleaning, bed making, attending therapy sessions, and so forth. Tokens earned could be exchanged for basic amenities such as meals, snacks, coffee, game-room privileges, or weekend passes. The graph shows more than 24 hours per day because it represents the total number of hours of desirable behaviour performed by all patients in the ward. Adapted from Ayllon & Azrin (1965).
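Mechanically, a token economy is a ledger: tokens are delivered straight after target behaviours and later exchanged for backup reinforcers. A minimal sketch, assuming hypothetical token values and prices; the class name `TokenAccount` and every number below are illustrative, not Ayllon and Azrin's actual figures.

```python
from dataclasses import dataclass, field

# Hypothetical token values and prices: the notes do not give Ayllon & Azrin's actual figures.
TOKENS_EARNED = {"bed making": 1, "cleaning": 2, "attending therapy": 3}
BACKUP_PRICES = {"coffee": 1, "snack": 2, "game-room privileges": 4, "weekend pass": 10}


@dataclass
class TokenAccount:
    balance: int = 0
    history: list[str] = field(default_factory=list)

    def earn(self, behaviour: str) -> int:
        """Deliver tokens straight after a target behaviour (the conditioned reinforcer)."""
        tokens = TOKENS_EARNED.get(behaviour, 0)  # non-target behaviour earns nothing
        if tokens:
            self.balance += tokens
            self.history.append(f"+{tokens} for {behaviour}")
        return tokens

    def exchange(self, item: str) -> bool:
        """Trade tokens for a backup reinforcer if the balance covers its price."""
        price = BACKUP_PRICES[item]
        if self.balance < price:
            return False
        self.balance -= price
        self.history.append(f"-{price} for {item}")
        return True


if __name__ == "__main__":
    account = TokenAccount()
    for behaviour in ["bed making", "cleaning", "watching TV", "attending therapy"]:
        account.earn(behaviour)
    print(account.balance)                   # 6: watching TV is not a target behaviour
    print(account.exchange("weekend pass"))  # False: not enough tokens yet
    print(account.exchange("snack"))         # True
    print(account.balance)                   # 4
```

Because the token is handed over immediately, it bridges the delay to the backup reinforcer, which is one of the functions of conditioned reinforcement described earlier.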
Activity as a secondary reinforcer: Premack's Theory
"What is a reinforcer?" David Premack (1965) supplied a way out of the dilemma: just look at what an organism does, because that is what is important.
"reinforcement involves a relation, typically between two responses, one that is being reinforced and another that is responsible for the reinforcement. This leads to the following generalization: Of any two responses, the more probable response will reinforce the less probable one" (1965, p. 132).
This generalization, known as the Premack Principle, is usually stated somewhat more simply: high-probability behaviour reinforces low-probability behaviour.
Premack, D. (1965). Reinforcement theory. In D. Levine (Ed.), Nebraska Symposium on Motivation (Vol. 13). Lincoln, NE: University of Nebraska Press.

Premack's Theory of Reinforcement…
If you eat your veggies, then you get to eat ice-cream
OR
If you study first, then you get to watch TV

…and Punishment
The theory also states that punishment occurs when the instrumental behaviour leads to a less-preferred response. So if you're naughty… then you get to do household chores.

Summary
- Different reinforcement schedules can also be used to reduce unwanted behaviours. The undesirable behaviour is not reinforced; instead, other, incompatible, or alternative responses are reinforced, or lower rates of the targeted unwanted behaviour are reinforced.
- The magnitude, delay, and contingency of the reinforcer impact its effectiveness.
- Reinforcers can be either primary (e.g., food) or secondary (e.g., money).

Summary
- Premack's principle proposes that activities can be reinforcing.
- High-probability behaviours (behaviours that the organism frequently does when given the opportunity) reinforce low-probability behaviours (behaviours that the organism would be unlikely to do on its own).

Have a good day!
