PSY2304 Biological Basis of Behaviour: Instrumental Learning
Ian McLaren
Summary
These lecture notes cover instrumental learning, comparing it with Pavlovian conditioning. They review experiments suggesting that instrumental learning is not a single phenomenon, but divides into goal-directed actions and S->R habits.
Full Transcript
PSY2304 Biological Basis of Behaviour
Instrumental Learning
Ian McLaren

Two of the questions we want to be able to answer today…
• Can instrumental learning be distinguished from Pavlovian conditioning?
• Is instrumental learning a unitary phenomenon?

Contents
• Difference between Pavlovian conditioning and instrumental learning
• Actions and Habits

Instrumental learning: Can it be explained as a form of Pavlovian conditioning?
• US = reward, e.g. food, freedom
• UR = natural response, e.g. eating, approach
• CS = starting condition, e.g. start box of maze, inside of puzzle box, sight of lever
• CR = approach
• So when the rat "learns to press" the lever it may simply find the lever attractive (stimulus substitution) and bump into it because of this. Is the apparent learning of the response simply an artifact brought about by Pavlovian conditioning?

Omission schedule: Distinguishing between Pavlovian and instrumental conditioning
[Figure: stimulus, response and reinforcer relations compared under the Omission and Unpaired conditions.]

Grindley's bidirectional control
The fact that the animals will learn to turn their heads left or right when the buzzer has the same relationship with reward is evidence that this is not simple Pavlovian conditioning.
[Figure: old response vs. new response across sessions.]

A contemporary issue
• Actions and Habits - is all instrumental learning the same? Is it all one thing?
• We shall see that the answer is no.
• In some circumstances the S->R account seems to be the correct one.
• In others, there is clear evidence that the animal has some expectancy of an outcome and modifies its behaviour accordingly.

Adams' work (along with that of Tony Dickinson) provided some of the earliest evidence that animals have some representation of the outcome in instrumental learning. If the outcome is made aversive, they respond less in extinction. In fact, Adams was able to get both results: he could show that animals responded less in extinction for an outcome that had been made aversive, but only if they had not been overtrained (100 reinforcers); if they had been (500 reinforcers), they continued to respond. The overtrained animals are exhibiting what Adams and Dickinson called a habit, something that an S->R account would expect, where the current outcome value has no impact on the probability of making a response in the presence of the discriminative stimulus. But what about the other group of animals?

Design: Instrumental training -> Outcome devaluation (LiCl) -> Test (in extinction)

The evidence suggests that some representation of the outcome is involved in determining their performance. An elegant experiment by Colwill and Rescorla confirms this suspicion.

Colwill and Rescorla (1990)
Training:      S1: R1 -> O1; R2 -> O2      S2: R1 -> O2; R2 -> O1
Devaluation:   O2 paired with LiCl
Test:          S1: R1 > R2                 S2: R1 < R2
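To make the action/habit distinction concrete, here is a minimal sketch of the logic behind these devaluation experiments. It is purely illustrative, not a model from the lecture or from the original papers, and the function names and numbers are invented: a goal-directed controller weights a response by the current value of its expected outcome, whereas an S->R habit carries only accumulated reinforcement history.

```python
# Illustrative sketch only: why outcome devaluation separates actions from habits.
# All names and numbers are invented; nothing here is fitted to the experiments above.

def goal_directed_strength(current_outcome_value):
    """An 'action': response strength tracks the CURRENT value of the expected outcome."""
    return max(current_outcome_value, 0.0)

def habit_strength(reinforcement_history):
    """A 'habit' (S->R): response strength reflects past reinforcement only;
    the current value of the outcome plays no role."""
    return reinforcement_history

# During training, lever pressing earns a valued food outcome.
outcome_value = 1.0           # how much the animal currently values the food
reinforcement_history = 1.0   # S->R strength built up over training

print("Before devaluation:",
      goal_directed_strength(outcome_value),    # 1.0
      habit_strength(reinforcement_history))    # 1.0

# Pairing the food with LiCl collapses its current value,
# but leaves the history of reinforcement untouched.
outcome_value = 0.0

print("After devaluation:",
      goal_directed_strength(outcome_value),    # 0.0 -> responding drops (moderately trained animals)
      habit_strength(reinforcement_history))    # 1.0 -> responding persists (overtrained animals)
```

On this way of carving things up, Adams' overtraining result marks the point at which control of responding passes from the first kind of process to the second.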
As a consequence of results of this kind, Tony Dickinson suggested that there were two kinds of instrumental learning: Actions, which require knowledge of the expected outcome, and Habits of the S->R kind. He then set about testing this idea by proposing what he now terms the "Castaway's Dilemma". In this, someone who is cast away on a desert island is hungry but manages to find and eat coconuts. Then they become thirsty and there is no water available - what do they do? The answer is pretty obvious - they drink coconut milk - but would an animal have the ability to learn this?

Castaway's Dilemma - transferred to the laboratory
• Train when hungry, test when thirsty.
• Both outcomes are rewarding. Both actions are performed.
• What will happen now the animal's drive state has changed?
• The test is in extinction, so there is no further training.

Irrelevant incentive effect: general activation
Training (hunger):  Lp -> pellet, Cp -> sugar H2O    or    Lp -> sugar H2O, Cp -> pellet   (Lp = lever press, Cp = chain pull)
Test (thirst):      Lp -> 0, Cp -> 0 (extinction)
[Figure: actions per minute for the pellet and sugar H2O responses across three 5-min test periods.]
The first time they tried it, they (Dawson and Dickinson) found no difference in the performance of the two actions. Both actions were performed more than in a control group who had not been made thirsty, but they interpreted this as general activation of the available responses by thirst (which seems perfectly reasonable). There was no sign of any outcome-specific activation of an action. But then they realised that they had missed something out, so they added it in, and then...

Irrelevant incentive effect: Dickinson and Watt (1997)
Training (hunger):  Lp -> pellet, Cp -> sugar H2O    or    Lp -> sugar H2O, Cp -> pellet
Test (thirst):      Lp -> 0, Cp -> 0 (extinction)
[Figure: actions per minute (0-6) for the pellet and sugar H2O responses across three 5-min test periods.]
They can solve the Castaway's Dilemma! The animals now respond more for the sugar water under thirst. But only if you let the animal learn that one of the reinforcers (in this case sugar water) is valuable under the new drive state (thirst) before the test (not shown in the design above). This was the new idea they incorporated in their revised design. Here's the full experiment...

Irrelevant incentive effect: full design
Group     Pretraining (hunger)    Pretraining (thirst)    Training (hunger)                                Test (thirst)
Thirst    ---                     pellet & sugar H2O      Lp -> pellet, Cp -> sugar H2O (or the reverse)   extinction
Hunger    pellet & sugar H2O      ---                     Lp -> pellet, Cp -> sugar H2O (or the reverse)   extinction
[Figure: actions per minute (0-6) for the sugar H2O and pellet responses across three 5-min test periods, shown separately for the Thirst and Hunger pretraining groups.]

Analysis
• It appears that incentive learning is needed to support drive-related action on the basis of the available outcomes.
• On the basis of these results, Tony Dickinson has argued for a model of instrumental performance that requires inference. Thus the animal is postulated to reason:
• 1. I'm thirsty.
• 2. If I pull the chain I get sugar water.
• 3. Sugar water is good when I'm thirsty.
• 4. I'll pull the chain then.
• For this inference to be possible, each step in the chain has to be available, so the animal must know that sugar water is valued under thirst.
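The four-step inference above can be read as a small piece of practical reasoning. The sketch below is again only an illustration under assumed names and data structures (the dictionaries, the choose_action function and the drive-state labels are all invented, not from the lecture or the papers); its single point is that the chain only goes through if the incentive-learning step, knowing that sugar water is valuable under thirst, is actually in place.

```python
# Illustrative sketch of the inference in the Analysis above.
# The data structures and names are invented for the example.

action_outcomes = {"pull chain": "sugar water", "press lever": "pellet"}

# What the animal has learned about outcome values under each drive state.
# The ("sugar water", "thirsty") entry is only present if the reinforcer was
# experienced while thirsty, i.e. if incentive learning has taken place.
known_values = {
    ("sugar water", "thirsty"): True,
    ("sugar water", "hungry"): True,
    ("pellet", "hungry"): True,
}

def choose_action(drive_state):
    """Select an action only if every step of the inference is available:
    (1) I am in this drive state, (2) this action yields outcome O,
    (3) O is known to be valuable in this drive state, so (4) perform the action."""
    for action, outcome in action_outcomes.items():
        if known_values.get((outcome, drive_state), False):
            return action
    return None  # step 3 unavailable: nothing favours one action over the other

print(choose_action("thirsty"))   # 'pull chain' -- the chain of premises is complete

# Without the pretraining (incentive learning) phase, step 3 is missing and the
# inference breaks down, leaving only general activation of both responses.
del known_values[("sugar water", "thirsty")]
print(choose_action("thirsty"))   # None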
Conclusions I
• Instrumental learning cannot be explained purely in terms of Pavlovian conditioning (though there is evidence that both are often involved in the control of behaviour).
• It appears that there are two forms of instrumental learning: one where the animal has knowledge of the consequences of its actions, and one where an S->R reflex supports habitual responding. Overtraining leads to the latter.
• Why is this important? Consider addiction, and the role of reinforcement in maintaining drug-seeking behaviour. The implication is that over time this could lead to habit formation, causing the drug-seeking behaviour to become independent of the value of the drug - an automatic response that is quite literally out of control.

Conclusions II
• Instrumental performance that is not habit (i.e. S->R) based may well differ from habits and Pavlovian conditioning in a number of important respects.
• If the animal knows the consequences of its actions (i.e. what outcome it can expect), then it must also represent that outcome and its relationship to the action performed.
• And it would appear that it can use this knowledge to make inferences in combination with other knowledge.

If the animal knows that an outcome is valuable under a certain drive state, and also knows that that outcome is produced by a given action (one that has never been performed under that drive state), then it can combine this knowledge productively to give the appropriate response. This goes beyond simple association - and it is our first encounter with something that does.

Reading
Additional readings:

Books:
Dickinson, A. (1980). Contemporary Animal Learning Theory. CUP. Esp. pages 102-116.

Some key experiments:
Colwill, R. M., & Rescorla, R. A. (1990). Evidence for the hierarchical structure of instrumental learning. Animal Learning and Behavior, 18, 71-82.
Adams, C. D. (1982). Variations in the sensitivity of instrumental responding to reinforcer devaluation. Quarterly Journal of Experimental Psychology, 34, 77-98.
Balleine, B. (1992). Instrumental performance following a shift in primary motivation depends on incentive learning. Journal of Experimental Psychology: Animal Behavior Processes, 18, 236-250.