Lecture 4: Associative Learning
Western University
Summary
This document provides lecture notes on the topic of associative learning. It covers classical and instrumental conditioning, and includes discussion of Ivan Pavlov's work and influential figures.
Full Transcript
Lecture 4
Monday, January 6, 2025, 1:25 PM

Associative Learning: Beyond Habituation and Sensitization:
- For the most part, habituation and sensitization involve learning about just one stimulus, and bring an organism "in tune" with its environment.
○ Events in the world, however, often do not occur in isolation.
- If humans and non-human animals were limited to these kinds of behavioural mechanisms, they would be very limited in the kinds of things they could do.
- Learning which stimuli occur together can help us interact more effectively with our environment!

Associative Learning:
- Two kinds of associative learning have dominated the psychological literature:
- A. Classical (or "Pavlovian") Conditioning
- B. Instrumental (or "Operant") Conditioning

A - Classical Conditioning:
- Also known as Pavlovian conditioning.
- Classical conditioning is the simplest mechanism whereby organisms learn about the relations between one event and another.
- Classical conditioning enables both humans and non-human animals to take advantage of orderly sequences in the world:
○ Your car does not run unless the ignition has been turned on.
○ You cannot walk through a door until it is opened.
○ It does not rain unless there are clouds in the sky.

History of Classical Conditioning:
- Most famously associated with Russian scientist Ivan Pavlov (1849-1936).
- Simultaneously discovered by Edwin Twitmyer (1873-1943).
- Twitmyer tested knee-jerk reflexes in college students by sounding a bell 0.5 seconds before striking the patellar tendon just below the kneecap (after several trials, the bell alone would elicit the knee-jerk response).

Ivan Pavlov:
- Not a psychologist.
- His studies of classical conditioning were an extension of his original research on digestion, for which he won a Nobel Prize.
- Initially maintained a strong belief in nervism.
- As part of his digestive research, Pavlov developed artificial fistulae that collected stomach secretions.
- Lab technicians observed that dogs produced stomach secretions merely at the sight of food, or even at the sight of the person who normally fed them.
- The lab assistants referred to these as psychic secretions, because they appeared to occur at the mere thought of food.
- Pavlov realized that the dogs' drooling in the mere presence of food was a simple but important form of learning.
- He began pairing the food with other, neutral stimuli in order to study this form of learning.
- Learning: The process of acquiring, through experience, new and relatively enduring information or behaviours.

The Classical Conditioning Paradigm:
- Pavlov's procedure involved two stimuli.
- The first was a light or a tone that does not elicit salivation at the beginning of the experiment (a neutral stimulus).
- The second was food (typically meat powder) or the taste of a sour solution placed in the dog's mouth. This would elicit vigorous salivation even the first time it was presented.
- The tone/light is the conditioned stimulus (CS).
- The food/sour taste is the unconditioned stimulus (US).
- The salivation that eventually came to be elicited by the tone/light is the conditioned response (CR).
- The salivation that was always elicited by the food/sour taste is the unconditioned response (UR).
- CS: A stimulus that does not elicit a particular response initially, but comes to do so as a result of becoming associated with a US.
- US: A stimulus that elicits a particular response without the necessity of prior training.
- CR: The response that comes to be made to the CS as a result of classical conditioning.
- UR: A response that occurs to a stimulus without the necessity of prior training.

Contemporary Studies of Pavlovian Conditioning:
- Contemporary studies of Pavlovian conditioning use many different species, including humans, rats, mice, rabbits, pigeons, and quail.
- These procedures were developed primarily by North American scientists during the second half of the twentieth century.
- Behaviorism was the dominant school in North America during this time.

Fear Conditioning:
- Watson & Rayner (1920) believed that infants are at first limited in their emotional reactivity...they were interested particularly in the conditioning of emotion.
- Assumed "there must be some simple method by means of which the range of stimuli which can call out these emotions and their compounds is greatly increased."
- The "simple method" they spoke of is Pavlovian conditioning.
- Conditioned fear response to a white rat in baby Albert.
Little Albert:
- John Watson conditioned Little Albert to fear a rat through associative conditioning.
○ Generalized to other white furry objects/animals.
- It turns out that "Albert" was not healthy (possibly neurologically impaired) and died at age 6.
○ Douglas Merritte was thought to be the true identity (Fridlund & Beck, 2012).
○ Albert Barger, a man who died in 2007, has since been suggested.
- Proper debriefing never occurred.
- His mother may have been an employee of the hospital where the research took place, raising the possibility of coercion.

Fear in Non-Human Animals:
- Fear and anxiety are sources of considerable human discomfort, and in severe cases can lead to serious psychological problems.
- Scientists are currently working to better understand the neural mechanisms of fear and anxiety, and how these behaviours are acquired.
- Many of these questions cannot be addressed with human subjects, for ethical reasons.

Freezing in Rats:
- The aversive US in these studies is a controlled shock delivered through the floor of a cage.
- The CS in these studies is typically a discrete stimulus (ex: light/tone).
- Rats show fear by freezing. Freezing is a common defense response that occurs in a variety of species in anticipation of aversive stimulation.
- Freezing is defined as immobility of the body (except for breathing) and the absence of movement of the whiskers associated with sniffing.
- Freezing probably evolved as a defensive behaviour: animals are less likely to be caught by predators if they are motionless.

Conditioned Suppression:
- When animals freeze, they stop lever pressing. The number of lever presses (LP) during the CS vs. the number of presses during a non-CS period is a measure of conditioned fear.
- Suppression Ratio: LP during CS / (LP during CS + LP during an equal period of time preceding the CS)

Other Measures of Fear-Induced Immobility:
- Conditioned suppression procedures involve the suppression of ongoing behaviours.
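The suppression ratio defined above can be expressed as a short function. A minimal sketch; the function and variable names are illustrative, not from the lecture:

```python
def suppression_ratio(presses_during_cs: int, presses_pre_cs: int) -> float:
    """Suppression ratio = LP during CS / (LP during CS + LP in an equal pre-CS period).

    0.0 -> complete suppression (strong conditioned fear)
    0.5 -> no suppression (equal responding with and without the CS)
    """
    total = presses_during_cs + presses_pre_cs
    if total == 0:
        raise ValueError("no lever presses in either period")
    return presses_during_cs / total

# A rat that pressed 5 times during the CS but 20 times in the
# preceding equal interval shows strong conditioned suppression:
print(suppression_ratio(5, 20))   # 0.2
print(suppression_ratio(0, 20))   # 0.0 -> complete suppression
print(suppression_ratio(20, 20))  # 0.5 -> no conditioned fear
```

Note that lower values mean *more* fear: freezing subtracts presses only from the CS period, driving the ratio toward zero.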
- Lick-suppression procedure: The ongoing behaviour is licking a drinking spout. If a fear CS (e.g., a tone) is presented, licking behaviour is suppressed, and it takes the animal longer to make a specified number of licks.
- Bar-press suppression: Rats can be trained to press a lever for food reward. A tone or light is then paired with brief shock. As subjects acquire the conditioned fear, they suppress lever pressing during the CS.

Sign Tracking:
- Pavlov's research originally dealt with salivation and highly reflexive responses, which encouraged the belief that classical conditioning occurs only in reflexive response systems.
- This restrictive view of Pavlovian conditioning has since been abandoned, in part because of more complex paradigms like sign tracking (aka autoshaping).
- Animals often approach and contact stimuli that signal the availability of food.
○ Ex: A squirrel can predict the availability of acorns based on the leaves and shape of oak trees.
- By approaching these contact stimuli, the animal is likely to be rewarded with food.
- Sign tracking is investigated in the laboratory by presenting a discrete, localized visual stimulus just before the delivery of food.

Burns & Domjan (2000) - "long-box" procedure:
- Subjects were male domesticated quail.
- CS = a wood block lowered from the ceiling 30 seconds before a female quail was introduced.
- The CS and the female were presented at opposite ends of an 8-foot chamber.
- Male quail actually went to the CS, rather than the side where the female would be presented.

Is Sign Tracking Always Observed in Pavlovian Conditioning?
- No.
- Individual differences in sign tracking have been attributed to individual differences in impulsivity and vulnerability to drug use (Tomie, Grimes, & Pohorecky, 2008).
- Greater activation of dopamine reward circuits.
- Sign tracking is thus a valuable model for studying learning processes and neural mechanisms that contribute to drug addiction.

Sign Tracking vs.
Goal Tracking:
- Individual differences in sign tracking have been shown in rats.
- Rats are placed in a chamber with a food cup in the middle of the wall. A lever is inserted through slots on either side of the cup.
- Lever = CS; food delivered to the cup = US.
- For each conditioning trial, the lever is inserted/withdrawn, followed by delivery of food. Rats are then tested in trials in which no food is delivered.
- Sign tracking is measured by contact with the lever (CS).
- Goal tracking is measured by contact with the food cup (US).
- Flagel, Akil & Robinson (2009)

Learned Taste Aversions:
- When we eat, the sight, taste and smell of food are experienced before the food is swallowed/digested.
- The sensory aspects of food are the CS (ex: the wing pattern in the monarch butterfly).
○ Exploited by the viceroy, a form of mimicry.
- These become associated with the consequences of eating (good or bad), which are the US.
- Learned taste aversions can occur even if illness does not occur for several hours.
○ There are differences amongst species, though!

Long-Delay Taste Aversions (Smith & Roll, 1967):
- Rats were put on a water deprivation schedule, so that they were thirsty.
- Given water flavoured with saccharin, then exposed to radiation from an x-ray machine to induce sickness.
- A control group was taken to the x-ray machine, but not irradiated ("sham-irradiated").
- Following radiation/sham-irradiation, rats were given a choice of normal or saccharin water to drink for two days.

Dietary Generalists vs. Specialists:
- Rats forage for multiple foods and cannot clear their system of toxins by vomiting.
○ Neophobic: a rat takes only a small bit of anything new.
○ If it gets sick, it avoids that food in the future.
- Vampire bats, unlike insectivorous bats, are dietary specialists, who consume blood.

Common Pavlovian Conditioning Procedures:
- To measure the effectiveness of conditioning, we can measure the magnitude, probability or latency of the response.
- There is no "best" procedure, per se.
- When the CS occurs matters, not just the CS-US relationship.
- Delayed, simultaneous, trace, and backwards conditioning can all result in strong learning and vigorous conditioned responding in certain instances.
- Long-delayed conditioning occurs in learned taste aversions.
- Fear conditioning results in freezing with a short-delayed procedure, but escape with a simultaneous procedure.
- Backwards conditioning produces mixed results.
○ Can be inhibitory.

Mechanisms of Associative Learning:
1. Temporal Contiguity
2. Stimulus Salience
3. Informativeness (blocking, latent inhibition)
4. Extinction

1 - Temporal Contiguity:
- A US that immediately follows a CS, and a response that immediately produces a reinforcer, induce robust conditioning.
- Similarly, when there is a long delay before receiving reinforcement, animals are more likely to make associations with extraneous stimuli (Dickinson, 1980).
- There are exceptions, though: long-delay taste aversion.
- Temporal contiguity does not account for everything in associative learning: "superstition" in pigeons.
- Timberlake (1984): When rats are reinforced after 5 s, they develop a CR of gnawing at the CS. When reinforcement occurs after more than 5 s, a foraging CR develops.
2 - Stimulus Salience:
- Animals are much quicker to acquire responses to stimuli that are salient, and responding is increased when the stimuli have biological significance to the animal.

3 - Informativeness - Latent Inhibition:
- Latent Inhibition: Previous exposure to the CS, in the absence of the US, hampers subsequent conditioning to the CS.
- Basically...a familiar stimulus takes longer to acquire meaning as a CS than a new stimulus.
- This is because the organism first learns that the CS has no motivational significance...it must then overcome this information in order to learn the association between the CS and the US.

The Blocking Effect:
- The blocking effect involves interference with the conditioning of a novel stimulus because of the presence of a previously conditioned stimulus.
- Phase 1: The experimental group receives pairings of Stimulus A with the US until an association is formed (ex: bread pudding CS, illness US).
- Phase 2: Stimulus B is then presented together with Stimulus A (ex: bread pudding with sauce) and paired with the US (illness).
- ***Very little responding to Stimulus B will happen when it is presented on its own, even though it was paired with the US.***
- Bread Pudding (CS1) = Illness (US)
- Bread Pudding (CS1) + Sauce (CS2) = Illness (US)
- Sauce alone (CS2) = No conditioned aversion
So the sauce alone elicits no learned aversion, even though it was paired with illness. The sauce is "blocked" by the previous pudding-illness association.

The Rescorla-Wagner Model:
- A model of Pavlovian conditioning in which the animal is theorized to learn from a discrepancy between what is expected to happen and what actually happens.
- A mathematical model of conditioning in which the prediction of the US on a trial is represented as the sum of the associative strengths of all CSs present during the trial.
- Utilizes the notion of the importance of "surprise" in the US.
- One of the most influential models of learning.
- If you expect a new pair of shoes for your birthday and instead get a car, this would be an unexpectedly large US.
- If you expect a car and get a pair of shoes, this would be an unexpectedly small US.
- Rescorla & Wagner assumed that the level of surprise (i.e., the effectiveness of the US) depended on how different the US was from what the subject expected.
○ Strong conditioning = strong expectation of the US
○ Weak conditioning = weak expectation of the US

How it Works:
- Learning on a given conditioning trial is the change in the associative value of a stimulus, represented by ΔV.
- (λ – V) is the level of surprise: the difference between what occurs (λ) and what is expected (V).
- k is a constant related to the salience of the CS and US.
- The idea that learning depends on the level of surprise at the US is therefore expressed as follows: ΔV = k(λ – V)

The Delta Rule:
- ΔV = k(λ – V)
- This is the fundamental equation of the Rescorla-Wagner model, sometimes known as the Delta Rule.
- The Delta Rule indicates that the amount of learning ΔV is proportional to how far predictions of the US differ from what actually occurs (λ – V). The prediction error (λ – V) is very large at first but gradually becomes smaller.

How Does Rescorla-Wagner Explain the Blocking Effect?
- Recall that Stimulus A in the blocking effect receives extensive conditioning in Phase 1, so that it reliably predicts the US. In other words, Stimulus A has already reached the asymptote of learning (λ).
- In Phase 2, Stimulus B is presented together with Stimulus A, and the two CSs are paired with the US, so V = VA + VB.
- Because of Phase 1 training, VA = λ at the start of Phase 2, but VB starts at 0.
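Both the acquisition curve and the blocking effect fall out of simply iterating the Delta Rule. A minimal sketch; the parameter values k = 0.3 and λ = 1, and the function name, are illustrative assumptions, not from the lecture:

```python
def train(trials, k=0.3, lam=1.0, V_init=None):
    """Iterate the Rescorla-Wagner Delta Rule: dV = k * (lam - V_total).

    `trials` is a list of lists naming the CSs present on each trial;
    every CS present on a trial shares the same prediction error.
    """
    V = dict(V_init or {})
    for present in trials:
        V_total = sum(V.get(cs, 0.0) for cs in present)  # summed prediction
        error = lam - V_total                            # surprise (lam - V)
        for cs in present:
            V[cs] = V.get(cs, 0.0) + k * error           # dV = k(lam - V)
    return V

# Phase 1: A alone is paired with the US until VA approaches asymptote (lam = 1).
V = train([["A"]] * 30)
print(round(V["A"], 3))           # ~1.0: A fully predicts the US

# Phase 2: A+B compound. The US is already fully predicted, so the error
# is ~0 and B gains almost no associative value: the blocking effect.
V = train([["A", "B"]] * 30, V_init=V)
print(round(V["B"], 3))           # ~0.0: B is "blocked"
```

Note the design choice that makes blocking work: the error term uses the *summed* prediction of all CSs present, not each CS's own strength.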
So V = λ + 0, or λ.
- Therefore, Stimulus B takes on no associative value in Phase 2, which results in the blocking effect.

4 - Extinction:
- In classical conditioning, extinction involves repeated presentations of the CS without the US.
- It appears to be the reverse of acquisition...but it's not!

B - Operant Conditioning:
- Operant conditioning is a method of learning that occurs through rewards and punishments for behavior. It encourages the subject to associate desirable or undesirable outcomes with certain behaviors.
- Also known as Instrumental Conditioning.
- Whereas classical conditioning concerns how animals adjust their behaviour to elements of the environment that they do NOT control, operant conditioning focuses explicitly on their goal-directed or instrumental behaviour.

Early Investigations of Operant Conditioning:
- E.L. Thorndike: Puzzle Boxes.
- Different boxes required different responses to get out.

Thorndike's Law of Effect:
- States that if a response R in the presence of a stimulus S is followed by a satisfying event, the association between the stimulus S and the response R becomes strengthened.
- If, on the other hand, the response is followed by an annoying event, the S-R association will be weakened.
- ***A key feature of this mechanism is that it compels the organism to make response R whenever stimulus S occurs. This explains many compulsive behaviours.***
○ Ex: the smell of popcorn (S) entices you to eat popcorn (R).

Watson's "Behaviourist Manifesto" (1913):
1. Psychology should be purely objective, with any interpretation of conscious experience being removed, thus leading to psychology as the "science of behaviour."
2. The goals of psychology should be to predict and control behaviour (as opposed to describe and explain conscious mental states).
3. There is no notable distinction between human and non-human behaviour.

B.F. Skinner (1904-1990):
- Led the "radical behaviourism" movement in psychology.
- Creator of the operant conditioning chamber (also known as the "Skinner Box").
- Also, the "air crib"...but no, it wasn't a baby prison!
- Considered one of the most influential psychologists of the 20th century, and the "father" of operant conditioning.

Operant Conditioning Chamber (Skinner Box):
- A laboratory device used to study animal behavior through operant conditioning.
- The box contains a lever or key that the animal can press to receive a reward or avoid punishment.

Types of Operant Conditioning Procedures:
- Positive Reinforcement
- Positive Punishment (or "punishment" in the textbook)
- Negative Reinforcement
- Negative Punishment (or "omission" in the textbook)
- Often referred to as the "four quadrants" of operant conditioning.
- Positive = something added
- Negative = something taken away
- Reinforcement = increase in behaviour
- Punishment = decrease in behaviour

Four Quadrants of Operant Conditioning:
                   | Stimulus added         | Stimulus removed
Behaviour increase | Positive Reinforcement | Negative Reinforcement
Behaviour decrease | Positive Punishment    | Negative Punishment

Modern Approaches to the Study of Operant Conditioning:
- Involve Discrete-Trial Procedures or Free-Operant Procedures.
- Discrete-Trial Procedures: Similar to Thorndike's procedure in that each trial begins with putting the animal in the apparatus, and ends when it completes the instrumental response.
○ Often involves maze learning.
- Behaviour is typically measured using running speed or latency.
- Free-Operant Procedures: allow the animal to repeat the instrumental response over and over again without constraint, without being removed from the experimental apparatus until the experimental session is complete.
- Invented by B.F. Skinner (1938) to study behaviour in a more continuous manner than is possible with mazes.
- Allows the experimenter to observe variations in responding across time.

Measuring Operant Behaviour:
- Unlike discrete-trial techniques for studying operant behaviour, free-operant methods permit continuous observation over long periods of time.
- Therefore the subject, not the experimenter, determines the frequency of response. This allows us to observe changes in the likelihood of response over time.
- The relationship between responding and reinforcement is determined by the reinforcement schedule.

Reinforcement Schedules - Ratio:
- Reinforcement is based on the number of responses.
- Fixed Ratio (FR): A set number of responses is required to obtain reinforcement.
- Variable Ratio (VR): The number of required responses varies around a mean value.
- Progressive Ratio (PR): The animal must make an increasing number of responses.

Reinforcement Schedules - Interval:
- Reinforcement is based on how much time has elapsed.
- Fixed Interval (FI): The time the subject must wait before a response can result in reinforcement is the same across trials.
- Variable Interval (VI): Responding is reinforced after an average time interval has passed. As with VR schedules, the average time required to set up the reinforcer is used to label the schedule (so if reinforcement occurs at 1 min, 3 min, and 2 min, it is a VI 2 schedule).
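The contrast between ratio and interval schedules can be made concrete with two small predicates that decide whether a given response earns reinforcement. A minimal sketch; the function names and parameter choices are illustrative, not from the lecture:

```python
def fixed_ratio(n):
    """FR-n: every n-th response since the last reinforcer is reinforced."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:           # n-th response -> reinforcer delivered
            count = 0
            return True
        return False
    return respond

def fixed_interval(interval):
    """FI-t: the first response after `interval` time units is reinforced."""
    last_reinforced = 0.0
    def respond(t):
        nonlocal last_reinforced
        if t - last_reinforced >= interval:
            last_reinforced = t
            return True
        return False
    return respond

fr5 = fixed_ratio(5)
# Only the 5th response pays off, regardless of timing:
print([fr5() for _ in range(5)])     # [False, False, False, False, True]

fi10 = fixed_interval(10)
# Responses at t = 3 and t = 7 earn nothing; the first response after
# 10 time units does, no matter how few responses preceded it:
print([fi10(3), fi10(7), fi10(12)])  # [False, False, True]
```

The difference in what each predicate checks (a response count vs. elapsed time) is what produces the distinct response patterns described in the next section: ratio schedules reward responding itself, so extra responses pay off, while interval schedules only require one well-timed response.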
Comparison of Ratio and Interval Schedules:
- While it may seem that interval and ratio schedules influence behaviour in the same way, they are in fact very different!
- With both FR and FI schedules, there is a post-reinforcement pause.
- Both FR and FI schedules result in increased responding right before delivery of the reinforcer.
- In contrast, both VR and VI schedules maintain steady rates of responding without predictable pauses.
- Reynolds (1975): Compared the rate of pecking in pigeons on VR and VI schedules.

Reinforcement Schedules:
- Animals are also sensitive to the payoff of a given scenario.
- Acquisition is more rapid, and declines more quickly, with continuous reinforcement than with partial reinforcement.
- Fixed schedules result in rapid responding up to the presentation of the reinforcer (the ratio run), with a post-reinforcement pause after delivery of reinforcement.
- Ratio strain: an extremely large FR may result in the animal stopping the response altogether.

Superstition in Pigeons:
- Skinner put pigeons in separate chambers and had food delivered every 15 seconds regardless of what the bird was doing.
- Skinner noted that birds would turn counter-clockwise, "toss" their heads, and make other random movements, even though they were being reinforced for nothing.
- Skinner thought these behaviours had accidentally been reinforced and referred to them as "superstitious."
- Many researchers at the time felt that temporal contiguity was the main factor for learning in operant conditioning...these results from Skinner appeared to support them.
Staddon & Simmelhag (1971):
- Replicated Skinner's superstition experiment, but made more systematic and extensive observations.
- Defined a variety of responses and recorded when they occurred:
○ Orienting to the food hopper
○ Pecking the response key
○ Wing flapping
○ Turning in circles
○ Preening
- The data showed that certain behaviours clearly occurred predominantly at the end of the interval between successive reinforcers.
○ These were labeled terminal responses.
- Others occurred in the middle of the interval between food deliveries.
○ These were labeled interim responses.
- Staddon & Simmelhag's data suggest that what Skinner observed weren't accidental behaviours at all.
- To draw on Behaviour Systems Theory (Timberlake & Lucas, 1985), the periodic deliveries of food most likely activate species-typical foraging and feeding responses.
○ As the time for food approaches, the animal engages in focal search behaviours.
○ Immediately after food is received, it engages in post-food focal search.
○ In between, it engages in general search.

Differences Between Operant and Pavlovian Conditioning:
- One of the simplest ways to remember the differences between classical and operant conditioning is to focus on whether the behavior is involuntary or voluntary.
- Classical conditioning involves making an association between an involuntary response and a stimulus, while operant conditioning is about making an association between a voluntary behaviour and a consequence.
- In operant conditioning, the learner is also rewarded with incentives, while classical conditioning involves no explicit enticements.
- Classical conditioning is passive on the part of the learner, while operant conditioning requires the learner to actively participate and perform some type of action in order to be rewarded or punished.

Extinction:
- In classical conditioning, extinction involves repeated presentations of the CS without the US.
- In operant conditioning, extinction involves no longer providing reinforcement when the operant response occurs.
- In both cases, the conditioned behaviour will decline until it disappears. It therefore appears to be the reverse of acquisition...but it's not!
- The association is never "erased" or lost, however...a CR will reappear after extinction if there is a delay. This is known as spontaneous recovery.
- If a novel stimulus is introduced during extinction, the extinguished response can temporarily reappear. This is disinhibition.
○ Ex: A dog with a salivation CR to a bell CS is given extinction training. The bell is presented without food, and salivation declines. When the bell is presented together with a new light stimulus, salivation recovers.
- Extinction trials are context specific: if extinction trials are conducted in a new context, the response declines but will re-emerge when the CS is presented in the original context. This is Response Renewal.
- Renewal typically occurs as follows:
○ Acquisition training is conducted in the presence of contextual cue A.
○ The participant is then moved to context B for extinction training.
○ When participants are moved back to context A, conditioned responding returns.
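Under the Rescorla-Wagner model, extinction is just the Delta Rule run with λ = 0 (no US), which predicts a simple mirror image of acquisition. Spontaneous recovery, disinhibition, and renewal are exactly the findings this prediction misses, which is why extinction "appears to be the reverse of acquisition...but it's not." A minimal sketch; k = 0.3 and the trial counts are illustrative assumptions:

```python
def delta_rule(V, lam, k=0.3, trials=30):
    """Iterate dV = k * (lam - V); lam = 1 models CS-US pairings, lam = 0 models extinction."""
    history = []
    for _ in range(trials):
        V += k * (lam - V)
        history.append(round(V, 3))
    return V, history

# Acquisition: CS paired with the US (lam = 1); V climbs toward asymptote.
V, acq = delta_rule(0.0, lam=1.0)
print(acq[:3])   # rising: [0.3, 0.51, 0.657]

# Extinction: CS alone (lam = 0); V decays back toward zero, a mirror
# image of acquisition. Spontaneous recovery and renewal show the
# association is not actually erased, so this account is incomplete.
V, ext = delta_rule(V, lam=0.0)
print(ext[:3])   # falling
```

The simulation makes the lecture's point concrete: the model's V really does return to zero, yet real animals respond again after a delay or a context shift, so extinction must involve new learning layered on top of the old association rather than erasure.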