PSYCH110 Exam 2 Notes – UCLA
Summary
These notes from PSYCH110 cover the factors influencing Pavlovian learning, including CS-US intervals and stimulus salience. They examine how conditioning shapes the nature of CRs, and review theories like S-R and S-S* of what is actually learned in this form of learning. Key concepts covered in these notes include contiguity and the Rescorla-Wagner model.
WEEK 4

Lecture 6: Pavlovian Conditioning – Factors Controlling

Overview
- Factors controlling the efficacy of Pavlovian learning:
  - CS-US interval
  - Stimulus salience
  - Latent inhibition/CS pre-exposure
  - US pre-exposure
  - CS-US belongingness
- Factors controlling the nature of the CR:
  - CS-US interval
  - US form
  - CS form

Factors controlling efficacy of Pavlovian learning

CS-US Interval
- Common conditioning procedures
  - Delay and trace conditioning activate different brain regions
  - The hippocampus is crucial for trace conditioning, but not for delay conditioning
- Relationship between the CS-US interval (time between CS onset and US onset) and strength of learning
  - 5 different conditional responses plotted; y-axis = strength of response, x-axis = CS-US interval
  - Learning is poor at very short or very long CS-US intervals – there is a sweet spot in the middle
  - The ideal CS-US interval depends on the specific learning being measured (shown by the different x-axis scales for each response) – this is functional, not random
  - Eyeblink conditioning has very short CS-US intervals; flavor aversion has very long CS-US intervals

CS-US: ITI Ratio
- A shorter CS-US interval relative to the ITI is ideal for learning

Summary – CS-US Interval
- Inverted U curve for optimal CR acquisition
- Preparation-specific timing curves
- CS-US: ITI ratio – CS-US time relative to the ITI matters
- Neural circuits

Stimulus Salience
- US magnitude
  - 0.28 mA shock – the animals don't care
  - 0.49 mA – animals care, but become less scared over time
  - Closer to 1 mA – complete suppression
  - Stronger shocks = more fear = more suppression, and the fear takes longer to get rid of
- US biological significance
  - The hungrier the animals are, the more quickly they learn that the tone predicts food and make the response
- CS intensity
  - The louder the tone, the faster they learn to fear it

CS Pre-exposure / Latent Inhibition
- Pre-exposure to a CS makes it harder to learn about
- Animals pre-exposed to a tone learn about it more slowly than when the tone is novel – the more novel something is, the more attention it gets
- This is latent inhibition

CS-US Belongingness
- A US can condition some CSs better than others – some CSs are easier to pair with some USs than others; this is belongingness
- CSs: taste, light, sound
  - Half the animals get shocked when they lick the spout (which also sets off a light and a tone); half get sickness
  - Sickness paired with the cues – they developed a taste aversion to the sweet water, but did not care about the light or noise
  - If the US was shock, there was no taste aversion – but there was suppression during the light and noise
- This is a functional relationship due to our different behavior systems
  - Illness: flavors are good cues; lights/sounds are bad cues
  - Pain: lights/sounds are good cues; flavors are bad cues

Factors Controlling the CR
- Not stimulus substitution – the CS enables preparation for the US by eliciting multi-faceted conditional responses

CS-US Interval
- If the CS and US are paired simultaneously, the response that develops to the shock is different than if there is a trace interval
  - Simultaneous conditioning does not create a freezing response
  - Unpaired – they also do not freeze much
  - Forward – they freeze often
  - This does not mean the simultaneous and unpaired groups did not learn
  - Simultaneous-conditioning animals are, instead, running around (flight)
- Conditioned fear
  - Modeling cues that predict predators and contact with predators
  - CS-US interval of 1 minute – measuring CS approach: animals approach the CS a lot, spending about 80% of the minute near it
  - CS-US interval of 20 minutes – measuring general locomotor behavior: they pace back and forth
  - A different response is appropriate for each CS-US interval – and the responses are functional

US Form
- The nature of the peck depends on the nature of the US
- Closed vs. open beak depending on whether the US is food or water

CS Form
- Cues paired with food – but the CS was another rat
- Social affiliative behavior – they start to play with the rat that predicts food
- They perform social behaviors when the rat is paired with food (CS+ group)
- They do not perform social behaviors when the rat is not paired with food (CS- group)

In all cases…
- Preparation for the anticipated US
- CR relevance to the US – but also CR relevance to the CS
- Absence of evidence is not evidence of absence!
- Just because you don't see a CR does not mean learning did not occur – you might have missed the most appropriate CR

WEEK 5

Lecture 7: Pavlovian Conditioning – Processes

Mechanisms of Pavlovian Conditioning
- What is learned in Pavlovian conditioning?
  - S-R theory and evidence
  - S-S*/S-O theory and evidence
- So far, we have described situations and procedures that result in Pavlovian conditioning… but what is actually being learned?
- The CS-US association is externally imposed… but that need not be what one actually learns!
- What association is actually encoded in the brain? I.e., what are the psychological processes that support Pavlovian learning?

What is learned during Pavlovian conditioning?
- Nodes in the brain that might respond to or represent stimuli and responses
- Different node patterns: US → UR; CS → US; CS → CR
- Which association is important for guiding behavior? Which one changes as a result of learning?
S-R (Stimulus-Response) Theory
- Based on Pavlov's findings and work on reflex arcs
- CS → US → UR: the UR is performed after the US
- Eventually this leads to CS → CR (the CR happens in the absence of the US)
- Stimulus (S) and response (R) neurons in the brain
  - When they are active at the same time, the connection between the two gets stronger
  - Reflexes are innate, and everything else is a learned stimulus-response reflex in which a previously weak association gets stronger
- The CS activates the motor program originally generated by the US
- The CS becomes linked to the response generated during conditioning
- The US strengthens (reinforces) the S-R association
- The US is not encoded in the learned association
- Appeal of the theory:
  - Behaviorism – focus on observable variables
  - Simplicity
  - The CR often matches the UR
- But this is not always the case… the CR doesn't always match the UR
  - E.g., fear conditioning: if S-R were correct, CS → jumping, BUT CS → freezing
  - E.g., drug conditioning: if S-R were correct, CS → analgesia (feel less pain), BUT CS → hyperalgesia (feel more pain) – a compensatory response
  - E.g., higher-order conditioning: the CR to CS1 does not always match the CR to CS2
    - How can an S-R association form if the R wasn't present to be linked to the S during learning?
Higher-Order Conditioning
- The CR to a 2nd-order CS does not always recapitulate the CR to the 1st-order CS
- If S-R were correct, LightY → goal tracking, BUT LightY → sign tracking
- S-R theories would say:
  - Light → Food, and food → pecking, so you learn light → pecking
  - LightX → Light, and Light → pecking, so you learn LightX → pecking
- But there is no pecking to a Tone when it is paired with food – this confuses S-R theorists

Sensory Pre-Conditioning
- Phase 1: take two CSs and pair them together a few times (CS2 → CS1)
- Phase 2: pair one CS with illness (CS1 → illness US)
- Observe first-order conditioning to CS1 (taste aversion)
- S-R theorists would expect no taste aversion for CS2
- But we do see taste aversion for CS2 – in the same way we see taste aversion for CS1
- CS2 elicits the CR without ever having the chance to be linked to the CR – a problem for S-R theorists
- Sensory pre-conditioning: CS2 → CS1, then CS1 → US
- Second-order conditioning: CS1 → US, then CS2 → CS1
- These two are similar, but the phases occur in different orders
- Another example of sensory pre-conditioning: if Snoop Dogg does something you like (like supporting UCLA), you may come to like Martha Stewart by association – since you originally associated the two together

S-S* (S-O) Theory
- Stimulus-outcome theory
- CS → S* → CR: when they see the CS, they think about the US – this triggers the appropriate preparatory response
- The CS gets linked to the US node, which causes the UR/CR
- The CS becomes linked with a representation of the US (S*)
- The CS can then trigger this US memory (S*)
- The S* elicits the appropriate CR (flexible based on circumstances)
- The US is encoded in the learned association

S-R vs. S-S* Theory
- First teach the animals to like the US (cheese), then devalue the US – make them disgusted by the cheese
- S-R theorists would expect them to salivate to the bell no matter what
- S-S* theorists would expect them to think about the cheese in response to the bell, but not salivate (since they no longer like it) – they should respond less to things they don't like
- This is what we see: animals no longer goal-track to the cue once the US has been devalued
- Support for the S-S* theory
- Another study:
  - Two flavors paired with two USs (sugar and salt)
  - Animals learn to like the cue predicting sugar
  - Animals were trained either in homeostasis or while sodium-depleted
  - When depleted of sodium, animals liked the cue predicting salt water – even though they don't like this cue when in homeostasis
  - When salt is more valuable, they treat the CS predicting salt as good
- Changing the value of the US after learning affects responding

S-S* Theory – Evidence
- Post-training manipulations of the value of the US modify the CR
- Learning occurs without any opportunity for the CS to be linked to the new CR
- At least this is the primary learned association…

Lecture 8: Pavlovian Conditioning – Mechanisms
- How is the CS-US association learned?
  - Contiguity?
  - Discrepancy?
  - Rescorla-Wagner Model

Contiguity
- Defined: whenever a CS is closely followed (in time) by a US, the connection between them is strengthened
- This depends on the specific type of learning – e.g., in taste aversion the CS and US can be separated by up to 12 hours and animals still learn
- So it seems contiguity is not necessary for learning
- Sufficient vs. necessary: gas is sufficient to run a hybrid car, but not necessary
- Kamin (1968) – Blocking
  - Interested in the light – will the animals learn about the light?
  - Control groups:
    - Group C – very afraid of the light
    - Group O – afraid of the light, but not as much as Group C (suppression CR)
    - Group B – not afraid of the light at all (no suppression CR)
  - Knowing that the noise predicted shock BLOCKED future learning in Phase 2
  - SO CONTIGUITY IS NOT SUFFICIENT

Discrepancy
- Defined: whenever a CS is followed by an unexpected event, the connection is strengthened (i.e., learning comes from the discrepancy between what is obtained and what is expected)
- You can only learn about things that surprise you or that you are attending to!
- Blocking
  - Animals that previously had the tone predicting food don't learn anything about the light
  - Animals that had the noise predicting food do learn about the light (since it's new) – they did not previously associate the tone with food
- Another example:
  - Early experience (Phase 1): pickles → illness
  - Later experience (Phase 2): pickles + jalapeño → illness
  - You attach the illness to the pickles – you don't also attach it to the jalapeño
- Blocking procedure:
  - Phase 1: A → US
  - Phase 2: AX → US
  - At test, CRs to A are high, but CRs to X are low
- Discrepancy is necessary for learning
- Magnitude unblocking
  - Do something in Phase 2 to add a discrepancy – the unblocking effect
  - Unblocking effect: changing the magnitude or identity of the US unblocks learning when you would otherwise expect the blocking effect
  - Can do this by changing the magnitude of the expected US
- Identity unblocking
  - Can also add discrepancy by changing the identity of the expected US

Surprise is important!
- Learning = discrepancy between what was obtained (how much US happened) and what was expected (how much US you expected)
- Change in associative strength driven by:
  - US surprise / prediction error: (observed − expected)
  - Stimulus salience

Rescorla-Wagner Model
- ΔV = αβ(λ − ΣV): the change in associative strength on a trial equals the saliences (α for the CS, β for the US) times the prediction error (λ, the US obtained, minus ΣV, the US expected given all CSs present)
- The model is agnostic about which association is formed – it doesn't choose S-R or S-S* – but either way, learning follows these rules
- Important for:
  - Machine learning
  - Reinforcement learning
  - Computational psychiatry
  - Neural correlates of learning
- CS-US acquisition
  - First trial: expected 0 US but got 1 – so the animal learns αβ × (1 − 0)
  - On every successive trial, as the US is predicted more and more, a little less is learned – eventually learning is complete
  - Learning continues until the US is accurately predicted
  - At what point in learning is surprise highest? The 1st trial!
- Prediction-error neural correlates
  - Dopamine neurons fire action potentials to reward – but only when they are not predicting (expecting) the reward
  - They also change their firing when a reward is expected but not delivered
  - More surprise/discrepancy = more dopamine
- CS salience
  - The more salient something is, the more you learn about it
  - CS salience is represented by α – changes in learning may be explained by changes in α
- US salience
  - The same applies to the US, but with β instead of α
- Blocking explanation
  - Group Tone will not learn – since they already expect the US after the tone
  - Group Noise will learn about the tone and light – since they do not expect the US after the tone
- Overexpectation
  - 2 grandmas, both good predictors of cookies – so you expect 2 cookies when you are with both grandmas
  - Discrepancy if you only get 1 cookie
  - The animals expect 2 food pellets (since they got 1 for the light and 1 for the tone in Phase 1), but only get 1 pellet in Phase 2 – a prediction error of −1
  - They eventually learn to expect less – correcting the initial discrepancy
  - Negative prediction error – reduces expectation over time
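As a rough sketch (not from the lecture itself), the Rescorla-Wagner update rule and the blocking effect it explains can be simulated in a few lines of Python. The salience and learning-rate values below are arbitrary illustrative assumptions:

```python
# Rescorla-Wagner sketch: delta-V = alpha * beta * (lambda - V_total),
# where V_total sums associative strength over all CSs present on a trial.
# All parameter values here are illustrative assumptions.

def rw_update(V, present, alpha, beta, lam):
    """One trial: update associative strength V[cs] for each CS present."""
    V_total = sum(V[cs] for cs in present)   # combined US prediction
    error = lam - V_total                    # prediction error (surprise)
    for cs in present:
        V[cs] += alpha[cs] * beta * error    # salience-weighted learning
    return error

V = {"tone": 0.0, "light": 0.0}
alpha = {"tone": 0.5, "light": 0.5}

# Phase 1 (acquisition): tone alone predicts the US (lambda = 1).
for _ in range(10):
    rw_update(V, ["tone"], alpha, beta=0.5, lam=1.0)

# Phase 2 (blocking): tone + light compound predicts the same US.
for _ in range(10):
    rw_update(V, ["tone", "light"], alpha, beta=0.5, lam=1.0)

print(round(V["tone"], 2), round(V["light"], 2))  # → 0.97 0.03
```

Because the tone already predicts the US by Phase 2, the prediction error on compound trials is near zero, so the light gains almost no associative strength – the blocking result described above.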
Conditioned Inhibition
- Excitatory learning (trial type A) + inhibitory learning (trial type B)
- CS+ alone → predicts something – positive excitatory value
- CS+ & CS− together → predict nothing (the excitatory and inhibitory values cancel each other out)
- CS− alone → predicts the absence of the US – negative predictive value

Rescorla-Wagner Model – Problems
- Does not predict all well-known Pavlovian learning phenomena
  - E.g., it can't explain latent inhibition and other attention-based phenomena, can't explain belongingness, and makes predictions about extinction learning that aren't correct (we will revisit extinction)
- Pearce-Hall & Mackintosh models
  - Updates to the RW model that incorporate attention and CS associability
- Does not explain CS-US timing and timing effects
- Temporal Coding Hypothesis (Balsam, Gallistel)
  - Updated/related models that revolve around information processing and predictions about timing

Lecture 9: Instrumental Conditioning – Introduction

Pavlovian vs. Instrumental
- Bekhterev (1913)
  - A contemporary of Pavlov, he tried to study the learning of new reflexes in dogs – transferring a locomotor reflex from a shock to a tone
  - Training: Tone (CS) → Shock to paw (US)
  - Result: CS → active leg flexion (the dogs started lifting the paw to the tone)
  - He thought this was similar to what Pavlov was reporting; it is now understood as avoidance learning.
- Miller & Konorski (1928; see 1969)
  - Training: CSa (tone) + CSb (passive leg flexion) → Food (US)
    - Tone + passively lifting the dog's paw → then give food
  - Result: CSa → active leg flexion (dogs picked up the paw in response to the tone)
  - Type II conditioning
- Skinner (1932)
  - Training: Lever press → Food
  - Result: rats learn to press a lever to earn food
  - Type II conditioning

Associative Learning
- Pavlovian conditioning
  - The environment predicts events; one can learn these associations to predict events in a given situation and respond accordingly
  - The environment (and the cues therein) controls behavior
  - Describes how one adjusts to events in the environment that cannot be directly controlled
- Instrumental conditioning
  - Learning the relation between actions and their consequences – how voluntary actions arise
  - The individual's actions predict events; the individual's behavior controls the environment
  - Describes how one controls the environment to maximize rewarding events and avoid aversive ones

Instrumental/Operant Conditioning
- Definitions
  - Those actions whose acquisition and maintenance depend on their consequences; the action/behavior is instrumental in causing some outcome (Dickinson, 1994)
  - The likelihood that a behavior changes as a result of its consequences (Colwill & Rescorla, 1986)
  - The capability to learn to modify behavior and control responses instrumental to gaining access to sources of benefit and avoiding events that can maim or kill (Balleine, 2011)
- Prediction & control
  - A pervasive, fundamental component of how we interact with our environment and the objects and individuals in it
  - A major component of our decision making
- Examples:
  - Studying improves test performance
  - Working at a job earns money
  - Throwing a switch activates a light
  - Learning skills and procedures
  - Development of behavior

Pavlovian CRs do not equal instrumental responses
- Hershberger (1986) – "Through the Looking Glass" study
  - In a long, straight runway, an experiment was set up as follows:
    - A chick was placed near one end of the runway, with a food bowl set in the middle, near the other end
    - If the chick moved toward the food, the food moved away at twice the rate
    - If the chick moved away from the food, the food moved closer at twice the rate – so the chick could eat it
  - This puts approach – the Pavlovian response – at odds with the behavior being studied
  - Chicks persistently chased the food cup away – they could not learn to move away from it
  - They could not override the Pavlovian reflexive response
- Similar effects have been shown with pigeons in sign-tracking experiments
  - Williams and Williams (1969) – negative automaintenance
    - Keylight CS → Food US, but pecks to the CS prevent/omit the food US
    - Pigeons are often unable to stop pecking at the light, and so do not receive the food
    - It is hard for pigeons to override Pavlovian responses if they are very hungry or the CS is very salient
    - But it is possible to instrumentally learn to inhibit Pavlovian responses – by making the pigeon less hungry, for example

Early investigations of instrumental conditioning
- Thorndike: Law of Effect
  - Behavior changes because of its consequences – the cat got better at escaping over time
  - "Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond." (Thorndike)
  - Trial and error (not insight)
  - Associative – associate the environment (S) with a behavior (R)
  - Bidirectional – good things strengthen S-R associations; bad things weaken them
  - Sensitive to outcome magnitude
  - Methodically studied the acquisition of new behaviors: new responses or skills acquired by trial and error
  - The Law of Effect describes how the consequences of a behavior can increase or decrease the future probability of that behavior – learning of an S-R association
  - Puzzle Box example
- Skinner: Instrumental Conditioning & Shaping
  - Reinforcement of successive approximations – start small
    - Identify the starting point of the behavior
    - Identify the terminal goal
    - Divide the difference into shaping steps
  - Magazine training – reinforce successively closer behaviors to train the animal to eat food out of the magazine (food cup)
- Skinner: Terms and Procedures of Instrumental Learning
  - Two factors of instrumental conditioning:
    - Response-outcome contingency: positive (response causes outcome) or negative (response prevents outcome)
    - Valence of the outcome: appetitive (food, sex, desirable outcomes) or aversive (pain, shock, undesirable outcomes)
  - Positive response-outcome contingency
    - Positive reinforcement → addition of a pleasant event; increases the instrumental response
      - E.g., kids doing chores to get a toy; saying thank you so people will smile at you; training a pigeon to peck a disk to receive food
    - Positive punishment → addition of an unpleasant event; decreases the instrumental response
      - E.g., getting scolded for bad behavior; extra chores for bad behavior; a footshock if the rat presses the lever
  - Negative response-outcome contingency
    - Negative reinforcement → removal of an unpleasant event; increases the instrumental response
      - E.g., brushing teeth to avoid cavities; putting on a seat belt to turn off the seatbelt alarm; training a rat to lever press to turn off a footshock
    - Negative punishment → removal of a pleasant event; decreases the instrumental response
      - E.g., going to prison for bad behavior; grounding; time out; stopping free food delivery if the pigeon pecks the disk
- Skinner: Operant Conditioning (another way to think about it)
  - Reward/reinforcer: increases performance of an instrumental action
    - Positive reinforcement → addition of a pleasant event
    - Negative reinforcement → removal of an unpleasant event
  - Punishment: reduces performance of an instrumental action
    - Positive punishment → addition of an unpleasant event
    - Negative punishment → omission of a pleasant event
  - We see these phenomena in our use of social media and apps

Reinforcement
- Thorndike: any event that strengthens an S-R bond
- Skinner: any event following a behavior that increases the probability of that behavior happening again
- Reinforcers tend to be biologically significant events: food, social contact, water, warmth, shelter, information, etc.
  - Also: removal of an aversive event, an appetitive CS+, other preferred behaviors
- Conditioned reinforcement – e.g., money, an A+, clicker training for dogs
  - An appetitive CS can serve to reinforce behavior: Action → appetitive CS+
  - A study: a tone becomes a reinforcer because of Pavlovian conditioning

WEEK 6

Lecture 10: Instrumental Conditioning – Procedures and Schedules

Instrumental Conditioning Procedures
- Discrete trial
  - A discrete problem or choice; the instrumental response is performed only once
  - Usually determined by the experimenter, based on presentation of a stimulus or context
  - E.g., ordering Postmates – you have one chance
- Free operant
  - The subject is free to explore the environment and interact with it at will
  - Naturally occurring, ongoing activity
  - E.g., going to a casino – you can play poker, blackjack, slots, etc.
Discrete Trial
- Thorndike's Puzzle Box
  - Put the cat in and measure how long it takes to get out (latency to escape)
- Maze runway
  - Put the rat at the start and measure how long it takes to get to the end (speed/velocity/latency)
- T-maze
  - Measure choice behavior

Free Operant
- Skinner Box
  - Skinner developed the free-operant procedure and the Skinner box (aka operant chamber)
  - An automated way to measure behavior in an animal
  - Skinner/conditioning boxes can be used to program all sorts of instrumental contingencies, including discrete trials
  - E.g., food, shocks, reinforcement, punishment, etc. – you can do anything
- Free-operant lever-press training
  - Magazine training
    - First train the subject to approach and eat from the food magazine (also called a food cup or hopper)
    - Non-contingent reinforcer delivery
    - Pavlovian association between the chamber/context and the reinforcer
  - Shaping
    - Reinforcement of successive approximations – build up the behavior in small increments to eventually reach more complicated behavior
    - Identify the starting point of the behavior, identify the terminal goal, and divide the difference into shaping steps
    - We see shaping in many everyday behaviors – e.g., learning language, learning humor
- Cumulative record: a pen moves with the animal's behavior, drawing a curve that shows learning over time (more modern graphs are made digitally)

Schedules of Reinforcement
- How and when a behavioral consequence (i.e., reinforcer) will be presented during free-operant instrumental conditioning
- Schedules are defined by reinforcement rules – what behavior, and when, is necessary to get a reinforcer?
- Schedules determine the rate and pattern of behavior – consistent across different types of reinforcers and different species

Continuous vs. Partial
- Continuous reinforcement examples: unlocking a car, turning a light on/off, smoking a cigarette

Ratio vs. Interval
- Ratio schedules are response based.
  Reinforcement is delivered after a set number of responses.
- Interval schedules are time based. Reinforcement is delivered to the first response after a certain amount of time has passed.
- Fixed vs. Variable:
  - Fixed schedule: the reinforcer is delivered at the same ratio or interval every time
  - Variable schedule: the reinforcer is delivered after a variable ratio or interval

Schedules of Reinforcement
- Fixed Ratio (FR)
  - Reinforcers are delivered after a set number of responses
  - Continuous reinforcement (CRF) is an FR-1 – every response earns a reinforcer
  - An FR-3 means every third response earns a reinforcer
  - FR schedules produce a run of responses, followed by reinforcement, then a pause in responding (pause-run pattern)
  - Higher ratios produce faster response rates and longer pauses
  - E.g., a froyo stamp card
- Variable Ratio (VR)
  - Reinforcers are delivered after a variable number of responses – the number required varies from trial to trial
  - A VR-3 means that, on average, every third response earns a reinforcer
  - VR schedules produce a high, steady rate of responding with no significant post-reinforcer pause
  - E.g., will I make the shot? Will I get a lot of likes? Will I win at this slot machine?
- Fixed Interval (FI)
  - A response is reinforced after a set amount of time has elapsed
  - An FI-30s means the first response to occur at least 30 s after the last reinforcer earns another reinforcer
  - FI schedules produce a gradually increasing response rate throughout the interval (scalloped pattern)
  - E.g., checking the oven to see if the cookies are ready
- Variable Interval (VI)
  - A response is reinforced after a variable (thus unpredictable) amount of time has elapsed
  - A VI-30s means the first response to occur, on average, 30 s after the last reinforcer earns another reinforcer
  - VI schedules produce a steady, low rate of responding with no post-reinforcer pause or scallop
  - E.g., waiting for the cable guy ("he'll be there between 8 and 5"); waiting on hold

Summary
- Steady response rates for VR and VI
- Post-reinforcer pauses for FR and FI
- High response rates for VR and FR; lower response rates for FI and VI
- Interval schedules
  - There is a limit on reinforcer rate under interval schedules
  - Interval schedules reinforce 'waiting'/the inter-response interval
  - Ratio schedules reinforce bursts of responses

Interval vs. Ratio Schedules in Humans
- Much faster response rate for VR than VI

Fixed Interval Schedule: Human Infants

Lecture 11: Instrumental Conditioning – Choice & Determining Factors

Determining Factors in Instrumental Conditioning
- Factors that control instrumental learning and the performance of instrumental action:
  - Response-reinforcer contiguity and contingency
    - Both are important for learning – but neither is necessary nor sufficient
  - Response-reinforcer relevance
  - Reinforcer magnitude
  - Reinforcer contrast/shifts

Instrumental Conditioning Factors – Response-Reinforcer Contiguity and Contingency
- Contiguity: how close in time the reinforcer follows the response (a temporal relation)
  - Delay of reinforcement decreases response-reinforcer contiguity, which results in poorer instrumental responding
- Contingency: to what degree delivery of the reinforcer depends on the occurrence of the instrumental response (a causal relation)

The Credit Assignment Problem
- Responses occur in an ongoing stream
- If R1 causes delivery of a reinforcer, R1 should be reinforced
- If the reinforcer is not delivered until after some delay, such as just after R6, then R6 may be reinforced (due to contiguity) rather than R1
- But R6 did not cause reinforcer delivery (i.e., it is noncontingent), and so instrumental learning is hampered
- How can impairments caused by delay of reinforcement be overcome?
  - One method is the marking procedure: mark each instance of the choice response (e.g., with an auditory beep, a brief flash of light, etc.)
  - E.g., clicker training for dogs
  - Marking helps animals pick up on a contingency that exists even when contiguity is lacking

Marking Procedure
- It is important to pair the cue with the behavior – it helps the animal attend to the behavior, which improves learning
- Why does it work?
  - Increases the salience (surprise?) of the response
  - Allows the subject to make the response-reinforcer connection over longer delays
- Contingency is important in instrumental learning, but it can be difficult to assign causal relationships when reinforcement is delayed

Contingency in Learning: Contingency Degradation
- Animals are quickly sensitive to changes in contingency – contingency is important
- If you start giving free reinforcers, animals will stop lever pressing

Instrumental Conditioning Factors – Response-Reinforcer Relevance
- Food was given whenever the animals performed a target behavior, to try to reinforce it
- They learn to scrabble, dig, and rear to get food
- But they won't learn to groom or pee to get food
- This is response-reinforcer relevance – some behaviors have different goals/relevance than others

Misbehavior
- Breland & Breland (1961) attempted to condition response chains in various animals for entertainment in zoos
- Some responses, however, would change from the reinforced form to an unwanted form
- E.g., a raccoon reinforced with food for dropping coins into a piggy bank.
Eventually, the raccoon would dunk the coin repeatedly in a container but not let go - Raccoon started rubbing coins together and wouldn’t stop – despite no reinforcement - Can't let go of food predictive cues (it’s instinctive) – racoons rub food together in the wild and dunk in lake - Instinctive drift: behaviors would “drift” into a form that was more natural for the response-reinforcer relationship Instrumental Conditioning Factors – Reinforcer Magnitude - If you reward an animal with more food, it will work harder - Animals get faster overtime – how fast they run depends on how many food pellets they get - Eg. you will choose to get 2 cookies over 1 – if all else is the same Instrumental Conditioning Factors – Reinforcer Contrast/Shifts - 16 pellets for both phases – they don’t change behavior - Got 1 or 4 and now 16 – they treat 16 like it’s a lot better than those that got 16 for both phases - Positive contrast - Got 16 and then 1 or 4 – they treat 1 or 4 worse than those who always got 1 or 4 - Negative contrast - Respond based on how magnitude of current reinforcer compares to past reinforcer Instrumental Choice Behavior Concurrent Schedule Choice - Matching Law - B = Behavior (# of response) - r = reinforcement Rate (max # of reinforcers per time) - Animals will perform behaviors at relative rates which match the relative rates of potential reinforcement of those behaviors - In other words: - Animals will perform behaviors more if they are more rewarding - Animals will perform behaviors less if there are other sources of reward - If lever is twice as reinforcing, we expect twice as much behavior (lever presses) - Everyday examples: - Social conversation - More likely to tell more jokes if you get reinforced (laughs) - Partner selection - Substance misuse - Athletic choices - Match behavior rate to reinforcement rate – Steph Curry should shoot 3s Instrumental Choice Behavior – Reward Value - Response choice determined by relative rate and value of earned 
rewards
- Balancing value and risk
- Value is influenced by:
  - Reinforcer magnitude
  - Current biological significance
- Value is weighed against cost
- Cost is influenced by:
  - Effort (physical or cognitive)
  - Delay
  - Risk (punishment, loss)

Instrumental Choice Behavior – Delay Discounting
- Delay discounting: the decline in the value of a reward with delay to its receipt
  - E.g., $10 right now is more valuable to you than $10 in 2 years
  - You perceive reinforcers as more valuable if they are closer in time
- A standard formulation is hyperbolic discounting: V = A / (1 + kD), where V is discounted value, A is reward amount, D is delay, and k is an individual discounting rate
- Left graph: choose between $10 after some delay and a smaller amount (shown in the graph) right now
  - The value of $10 declines as it gets further away in time
- The right graph shows the same thing in rats
  - Rats would rather take 1 food pellet immediately than wait 60 seconds for 4 food pellets
- Ainslie (1975) & Rachlin (1974) describe how delay discounting can explain self-control and impulsiveness
  - The choice between a small immediate reward vs. a large delayed reward depends on which is more valuable to you at that particular time
  - E.g., partying (immediate rewards) vs. studying (delayed rewards)
  - Spontaneous decisions are more likely to be biased by immediate rewards
  - So make decisions ahead of time to favor more informed, delayed-reward choices

Instrumental Choice Behavior – Risk
- Risk: to get food from lever pressing, rats had to cross a grid that might deliver a footshock
- Risk of danger decreased food-motivated behavior – rats took a few big meals instead of many snacks
- As risk goes up, animals are less likely to make that choice

WEEK 7 Lecture 12: Actions & Habits
Overview:
- S-R theory of instrumental action
- R-O theory of instrumental action
  - Contingency criterion
  - Goal value criterion
- Goal-directed instrumental action
- Habits

Content of instrumental learning
- What is learned during instrumental conditioning?
- What association(s) is/are formed in the brain?
- What guides and produces instrumental behavior?
- Thorndike: Law of Effect
  - Methodically studied the acquisition of new behaviors
  - New responses or skills are acquired by trial and error
  - The Law of Effect describes how the consequences of a behavior can increase or decrease the future probability of that behavior
  - This is an S-R theory – learning of an S-R association

What is learned during Instrumental Conditioning: S-R Theory
- Stimulus-Response (S-R) Theory
  - Learn an association between the stimuli present (S) and the response (R)
  - The purpose of the reinforcer is to strengthen this S-R bond
  - After learning, the stimuli present (S) come to automatically elicit the response (R)
    - Because of the strengthened S-R bond – not due to thinking about the reinforcer
  - Information about the reinforcer/outcome is NOT encoded in the learned association
    - This is the assumption that devaluation experiments test – and the results provide evidence against S-R theory

What is learned during Instrumental Conditioning: R-O Theory
- Response-Outcome (R-O) Theory
  - Goal-directed action – perform a response to get the reinforcer (the animal thinks about the reinforcer)
  - Learn an association between responses (R) and their outcomes (O)
  - After learning, one considers the outcome when deciding whether to perform a response (cognitive)
    - If you want it – perform the action to get it
  - Information about the reinforcer/outcome IS encoded in the learned association

S-R vs. R-O Theory
- Evaluate by changing the value of the US
  - S-R theory says it shouldn't matter
  - R-O theory says it should

Goal-directed Instrumental Action
- Goal-directed instrumental actions require belief + desire
  - Must demonstrate both belief and desire to demonstrate goal-directed action
  - Belief: going to Starbucks gets you coffee (response-outcome association)
  - Desire: Starbucks coffee is delicious (incentive value)
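The belief + desire account above can be sketched as a tiny decision rule. This is my illustration, not a model from the lecture: the response names and values are hypothetical, with "belief" as an R→O lookup and "desire" as the current incentive value of each outcome.

```python
# Minimal sketch of goal-directed (R-O) choice: belief + desire.
# All names and values here are hypothetical illustrations.

# Belief: response -> outcome associations (R-O)
beliefs = {
    "go_to_starbucks": "coffee",
    "stay_home": "nothing",
}

# Desire: current incentive value of each outcome
desires = {"coffee": 5.0, "nothing": 0.0}

def choose(beliefs, desires):
    """Pick the response whose expected outcome is currently most valued."""
    return max(beliefs, key=lambda r: desires[beliefs[r]])

print(choose(beliefs, desires))  # coffee is valued -> "go_to_starbucks"

# Outcome devaluation: once coffee is no longer desired, the goal-directed
# agent immediately changes its response (sensitivity to outcome value)
desires["coffee"] = -1.0
print(choose(beliefs, desires))  # -> "stay_home"
```

Note how the choice changes as soon as the desire changes, even though the belief is untouched – this sensitivity to outcome value is exactly what the devaluation tests below look for.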
- Criteria for goal-directedness of actions:
  - Contingency
  - Goal value

Testing Contingency: Contingency Degradation
- Lever press to get food – but sometimes free food is delivered
- Animals lever press a lot if pressing makes food more likely
- But if food is equally likely without pressing, they will stop lever pressing
- Animals are sensitive to contingency!

Testing Desire: S-R vs. R-O
- The animal learns to lever press for food – then we teach it to dislike the food (make it sick)
- Will it still lever press for the food?
  - If learning is S-R, devaluation shouldn't matter
  - If learning is R-O, the animal should stop lever pressing

Testing Desire: S-R vs. R-O: Devaluation (Illness)
- Devalued animals press less for food compared to the group without devaluation

Testing Desire: S-R vs. R-O: Devaluation (Satiety)
- None of the animals are hungry – they will still lever press if they got sucrose beforehand, but not if they had chocolate
- All animals showed the devaluation effect – each line is a different rat
- What happens when behavior isn't driven by belief + desire?
  - You don't always think about every choice you make

Goal-directed actions vs. habits
- Goal-directed
  - Action-outcome, cognitive, reflective, prospective evaluation of action consequences, deliberative, flexible
  - E.g., first learning to ride a bike – you think about balancing/everything you are doing
- Habit
  - Automatic, reflexive, based on past experience, inflexible, resource efficient
  - E.g., once you know how to ride a bike, you no longer think about balancing

Goal-directed action vs.
habitual responses
- Goal-directed action
  - R → O
  - Cognitive, deliberative, prospective evaluation of action consequences, flexible
- Stimulus-response habit
  - S → R
  - Automatic, based on past experience, inflexible, resource efficient

Habits
- Rats received 100 or 500 training trials – then sucrose was devalued (or not)
- 100 trials: no responding if sucrose was devalued; responding if it was not
- 500 trials: extended training produced a habit that was not sensitive to devaluation
  - Behavior was not guided by thinking about sucrose and its value
  - This behavior is guided by an S-R association
- There is evidence for both S-R and R-O learning in instrumental conditioning
  - For Pavlovian conditioning, there is only evidence for S-S* learning
- Habits: after extended training, instrumental behavior becomes guided by a habitual S-R relationship rather than an R-O relationship
  - Insensitive to changes in O value
  - Instrumental response automatically triggered by antecedent events
  - Promotes efficient use of resources

Actions & Habits
- The balance between goal-directed actions and habits promotes behavior that is both adaptive and efficient
- But the balance can be disrupted to promote overreliance on habit
  - Disrupted as a result of chronic stress or addictive substances
  - But individual differences are important
- Goal-directed
  - Action-outcome, cognitive, reflective, prospective evaluation of action consequences, deliberative, flexible
- Habit
  - Automatic, reflexive, based on past experience, inflexible, resource efficient
- The two systems (R-O and S-R) work independently and in parallel, all the time
  - R-O guides behavior early in training
  - S-R guides behavior with overtraining

Bella is goal directed
Pumpkin is habitual
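As a study aid, the logic of the 100- vs. 500-trial devaluation experiment can be sketched in code. This is my illustration, not the lecture's model: the habit threshold and the numbers are hypothetical, chosen only to show why an S-R habit is insensitive to outcome value while an R-O action tracks it.

```python
# Sketch of goal-directed (R-O) vs. habitual (S-R) control under devaluation.
# The 300-trial habit threshold is a hypothetical, illustrative parameter.

def response_rate(training_trials, outcome_value, habit_threshold=300):
    # S-R habit strength grows with training and ignores outcome value
    habit_strength = min(training_trials / habit_threshold, 1.0)
    if habit_strength >= 1.0:
        # Extended training: the stimulus alone triggers the response,
        # regardless of what the outcome is currently worth
        return 1.0
    # Moderate training: the R-O association guides behavior,
    # so responding tracks the outcome's current value
    return max(outcome_value, 0.0)

# 100 trials: devaluation (value -> 0) abolishes responding
print(response_rate(100, outcome_value=0.0))  # 0.0
print(response_rate(100, outcome_value=1.0))  # 1.0
# 500 trials: responding persists despite devaluation (habit)
print(response_rate(500, outcome_value=0.0))  # 1.0
```

The key design choice mirrors the lecture's point: the habitual branch never reads `outcome_value`, which is what "insensitive to changes in O value" means operationally.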