Learning Psychology (PSYC-2502) Lecture Notes PDF
Dr. Emel Erdogdu
Summary
These lecture notes cover learning psychology, specifically operant (instrumental) conditioning. They briefly contrast it with classical conditioning, then detail Thorndike's puzzle-box experiments, Skinner's free-operant procedures, magazine training and shaping, reinforcement and punishment, and the role of consequences in behavior.
Full Transcript
Learning Psychology PSYC-2502
Dr. Emel Erdogdu ([email protected], [email protected])

06 – Operant Conditioning (Instrumental Conditioning)

Classical conditioning reflects how organisms adjust to events in their environment that they do not directly control
→ association of events → CS followed by US

Operant conditioning is about...
→ ...learning situations in which the stimuli (outcomes) an organism encounters are a result or consequence of its behavior
→ such behavior is commonly referred to as goal-directed or instrumental, because responding is necessary to produce a desired environmental outcome
→ behavior produces the outcome!
→ behavior occurs because similar actions produced the same type of outcome in the past

Indeed, most behaviors that concern us each day are motivated by some consequence. Behavior that occurs because it was effective in producing certain consequences is called instrumental behavior.

Fill in the blanks:
1. Operant behaviors are influenced by their ......................
2. Elicited behavior is a function of what (precedes/follows) .................. it; operant behavior is a function of what (precedes/follows) it.

Early Investigations of Instrumental Conditioning: E. L. Thorndike
Original intent: to study animal intelligence (based on Darwin's evolutionary theory)
Puzzle boxes:
- Hungry young cat in the box, food in plain view
- How to get out of the box? Pull a ring or press a lever
- Measure the escape latency (in seconds)
- The cat gets better (faster) with each trial, from 160 s down to 6 s!
→ various behaviors in the beginning
→ behavior that is successful increases; useless behavior decreases
Box stimuli (S) and effective escape response (R): Thorndike interpreted the results of his studies as reflecting the learning of a new S–R association.

Law of Effect
The consequence of a successful escape response strengthened the association between the box stimuli and that response. What was the consequence? → Food
If a response R in the presence of a stimulus S is followed by a satisfying event, the association between the stimulus S and the response R is strengthened. If the response is followed by an annoying event, the S–R association is weakened.
A key feature of Thorndike's S–R mechanism is that it compels the organism to make response R whenever stimulus S occurs. This feature has made the law of effect an attractive mechanism to explain compulsive habits that are difficult to break, such as biting one's nails, snacking, or smoking cigarettes. Once you start on a bucket of popcorn while watching a movie, you cannot stop eating because the sight and smell of the popcorn (S) compel you to grab more popcorn and eat it (R). The compulsive nature of eating popcorn is such that you continue to eat beyond the point of enjoying the taste. Once learned, habitual responses occur because they are triggered by an antecedent stimulus, not because they result in a desired consequence.

Modern Approaches to the Study of Instrumental Conditioning
A: Discrete-trial procedures
B: Free-operant procedures

A: Discrete-Trial Procedures
- A training trial begins with putting the animal in the apparatus and ends with removing the animal after the instrumental response has been performed.
- Apparatus: mazes, such as a runway or straight alley; start box → barrier removal → goal box (reinforcer such as food or water)
- Latency: time to leave the start box
- Running speed: based on the time from start box to goal box; gets faster with trials

B: Free-Operant Procedures
- Animals are not taken out after a successful trial
- They are allowed to repeat the instrumental response
- Allows behavior to be studied in a more continuous manner
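A quick numerical illustration (not from the lecture; all numbers invented) of the dependent measures the two procedures afford: latency and running speed for discrete trials, response rate for free-operant sessions.

```python
# Toy numbers, not data from the lecture: escape latencies and start-to-goal
# times across discrete trials, plus a timestamped log of lever presses from
# one free-operant session.

trial_latencies = [160, 90, 45, 20, 12, 6]          # s to leave start / escape, per trial
start_to_goal_times = [40, 22, 15, 9, 6, 4]         # s from start box to goal box, per trial
lever_press_times = [12, 30, 41, 55, 63, 70, 84]    # s into a free-operant session
session_length = 120                                # s

# Discrete-trial measure 1: latency per trial and how it changes across trials.
improvement = trial_latencies[0] - trial_latencies[-1]
print(f"Latency dropped from {trial_latencies[0]} s to {trial_latencies[-1]} s "
      f"({improvement} s faster)")

# Discrete-trial measure 2: running speed = runway length / time to the goal box
# (a hypothetical 2 m straight alley is assumed here).
runway_length_m = 2.0
speeds = [runway_length_m / t for t in start_to_goal_times]
print(f"Running speed on the last trial: {speeds[-1]:.2f} m/s")

# Free-operant measure: response rate over a continuous session.
rate_per_min = len(lever_press_times) / (session_length / 60)
print(f"Response rate: {rate_per_min:.1f} presses per minute")
```

Response rate only makes sense when the animal is free to respond repeatedly, which is exactly what the free-operant procedure provides.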
B. F. Skinner (1938): the Skinner box
- A lever can be pushed to deliver food
- Paw, tail, ... no specific movement is required; the outcome is what matters!

Magazine Training and Shaping
How to get from novice to expert?
1. Learn when food is available
   - Classical conditioning: repeatedly pair the sound of the food-delivering device (the food magazine) with food pellets in the cup
   - The sound comes to elicit an approach response
   - This preliminary phase of conditioning is called magazine training
2. Response shaping
   - Food is given after certain responses, brought progressively closer to the desired response
   - (1) reinforcement of successive approximations, and (2) withholding reinforcement for earlier response forms

Shaping and new behavior
A: Often it is not really new behavior you want to teach; it is more the synthesis of already existing behaviors to perform a new task
   - Pressing a lever (a combination of behaviors the rat performs in other situations)
B: Genuinely new behavior: gradual iterations of behavior that has not been performed before
   - Throwing a ball (first 10 m, then 15 m, ...); sports, dancing, ...

Differences between the two techniques
- Discrete-trial techniques: discrete observations; the animal is put in and taken out; measures are latency and running speed
- Free-operant techniques: continuous observation of behavior over long periods; provides a special opportunity to observe changes in the likelihood of behavior over time; the measure is response rate as a measure of operant behavior

Instrumental Conditioning Procedures
- Appetitive stimuli: pleasant events (money, a car)
- Aversive stimuli: unpleasant events (the sound of rain, yelling)
→ The instrumental response may produce or prevent such stimuli

The likelihood of a behavior is increased via reinforcement and decreased via punishment.
Reinforcement can be positive or negative:
- Positive reinforcement: reward – money for good grades will increase studying
- Negative reinforcement: removal of an aversive stimulus – studying more to avoid scolding
Punishment: introduction of an aversive stimulus (or removal of an appetitive one) to reduce unwanted behavior
- Positive punishment: the behavior is followed by an aversive stimulus (a slap)
- Negative punishment: the behavior is followed by removal of an appetitive stimulus (no TV)
Omission-training procedures are also called differential reinforcement of other behavior (DRO).
Negative punishment → remove the appetitive stimulus

DRO as Treatment for Self-Injurious Behavior and Other Behavior Problems
- Self-injurious behavior is a problematic habit that is evident in some individuals with developmental disabilities
- The head banging of the subject was maintained by the attention she received
- To discourage it → omission-training procedure → ignore the behavior
- Attention is a very powerful reinforcer for human behavior!

Fundamental Elements of Instrumental Conditioning
Instrumental conditioning fundamentally involves three elements:
1. the instrumental response
2. the outcome of the response (the reinforcer)
3. the relation or contingency between the response and the outcome

1 – The Instrumental Response: Behavioral Variability Versus Stereotypy
If response variation is a requirement for reinforcement, novel response forms can be readily produced by instrumental conditioning.
Figure: degree of response variability along three dimensions of drawing a rectangle (size, shape, and location) for human participants who were reinforced for varying the type of rectangles they drew (VARY) or who received reinforcement on the same trials but without any requirement to vary their drawings (YOKED). Higher values of U indicate greater variability in responding.
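The notes do not give a formula for U. In the response-variability literature a U value is usually a normalized entropy computed over the relative frequencies of the different response forms; the sketch below assumes that definition and uses invented response sequences.

```python
import math
from collections import Counter

def u_value(responses):
    """Normalized entropy of a sequence of response categories.
    0 = complete stereotypy (one form only), values near 1 = high variability.
    This assumes the usual entropy-based U index; the lecture notes do not
    state the formula."""
    counts = Counter(responses)
    n = len(counts)
    if n < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(n)

# Invented example: rectangle types drawn by a VARY vs. a YOKED participant.
vary  = ["tall", "wide", "square", "tall", "tiny", "wide", "huge", "square"]
yoked = ["square"] * 8

print(f"U (VARY):  {u_value(vary):.2f}")   # high -> varied responding
print(f"U (YOKED): {u_value(yoked):.2f}")  # 0.00 -> stereotyped responding
```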
Relevance or Belongingness in Instrumental Conditioning
- Natural behavior is learned more easily, depending on the reinforcer
- Biting as an operant response is readily learned when the reinforcer is the sight of a male fish, but not when it is a female → biting belongs to territorial behavior
- Swimming through rings is learned better when the reinforcer is a female fish than when it is a male fish
- "The Misbehavior of Organisms," K. Breland and M. Breland (1961): it is difficult to condition raccoons with food reinforcement to drop a coin into a slot
- Instinctive drift: as the term implies, the extra responses that developed in these food-reinforcement situations were activities the animals instinctively perform when obtaining food

2 – The Instrumental Reinforcer
Quantity and quality of the reinforcer
- In straight-alley runways, rats run faster for larger and more palatable reinforcers
- Quantity and quality also influence the rate of free-operant responding
- Chad, a 5-year-old boy with autism: pressing a button for 2, 5, 10, 20, 30, or 40 seconds produced a click, and the reward for the click was social attention (a hug, a song, play, etc.). Different reinforcer durations were compared (10, 105, and 120 s); the longer reinforcer was much more effective in maintaining instrumental responding.
- So the magnitude of the reinforcer is important, too
- Individuals who are addicted to drugs can be treated successfully using principles of instrumental conditioning (Higgins, Heil, & Sigmon, 2012): abstinence from drug use, as verified by drug tests, is reinforced with vouchers that can be exchanged for money. Vouchers worth more than 10 dollars are more successful!

Shifts in Reinforcer Quality or Quantity
- Previous experience influences the value of a reinforcer: what if you received 50 dollars at the beginning but later only 25 dollars?
- This is very similar to the Rescorla–Wagner model: if the US is larger (or more intense) than expected, it supports excitatory conditioning; if it is smaller (or weaker) than expected, it supports inhibitory conditioning.
→ Positive and negative behavioral contrast effects
→ A greater reward than expected → better learning; a smaller reward than before → poorer learning
- Negative behavioral contrast example: two groups of rats, licking behavior (drinking time) measured, sugar water at 4% or 32%; the 32% group is shifted to 4% after 10 days.

3 – The Response–Reinforcer Relation
Is the action really important for the reinforcer to occur?
- Work hard for the sun to rise? → no
- Work hard for an exam to get good grades? → yes
- Call a friend several times before he answers the phone
- Wear a lucky charm to get good grades? → probably not
Efficient instrumental behavior requires sensitivity to the response–reinforcer relation.
- Temporal relation: the time between response and reinforcer. A special form is temporal contiguity (the reinforcer follows immediately after the response).
- Causal relation, or response–reinforcer contingency: the extent to which the instrumental response is necessary and sufficient to produce the reinforcer.
The time between response and reinforcer is important. Effects of delayed reinforcement on learning to press a response lever in laboratory rats (Dickinson et al., 1992): rats pressed a lever for food with a short delay (2–4 sec) or a long delay (64 sec); learning was much poorer with the long delay.
Why is instrumental conditioning so sensitive to a delay of reinforcement?
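One way to see the difficulty before the notes name it: during a long delay the animal keeps emitting other responses, so the behavior closest in time to the food is usually not the lever press that earned it. A toy simulation (entirely invented numbers, not the Dickinson et al. data):

```python
import random

random.seed(0)

def credit_hits(delay, n_trials=1000, other_rate=0.2):
    """Toy model: after a lever press, food arrives `delay` seconds later.
    Meanwhile the rat emits irrelevant responses (grooming, rearing, sniffing)
    with probability `other_rate` each second. We count how often the lever
    press is still the most recent response when food is delivered, i.e. the
    response a purely contiguity-based learner would credit. All numbers are
    invented for illustration."""
    hits = 0
    for _ in range(n_trials):
        intervening = sum(random.random() < other_rate for _ in range(int(delay)))
        if intervening == 0:
            hits += 1
    return hits / n_trials

for delay in (2, 4, 30, 64):
    print(f"delay {delay:>2} s: lever press credited on "
          f"{credit_hits(delay):.0%} of food deliveries")
```

Under these made-up assumptions, a contiguity-based learner credits the correct response on most deliveries at a 2–4 s delay but almost never at 64 s.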
In free-operant situations, a delayed reinforcer creates a credit-assignment problem: which of the many responses emitted before the reinforcer actually produced it?

Secondary (conditioned) reinforcers
- A conditioned reinforcer can fix the credit-assignment problem
- It is created by pairing the primary reinforcer with another stimulus
- Example: clicker-training methodology. Click – food; the click can then be delivered immediately after a desired response even if the primary food reinforcer is delayed
- Coaches do the same: "very good, keep going"

Marking procedure: learning with a delayed reinforcer
Acquisition of lever pressing in rats with a 30-second delay of reinforcement. For the marking group, a light was presented for 5 seconds at the beginning of the delay interval, right after the instrumental response. For the blocking group, the light was introduced at the end of the delay interval, just before the delivery of food (based on Williams, 1999).

Another way to look at the response–reinforcer contingency: "learned helplessness"
- Triadic design with two phases: an exposure phase and a conditioning phase
- Group E has a button to terminate the shock during phase 1; group Y does not
- The learned-helplessness effect is that the impact of aversive stimulation during the exposure phase depends on whether or not the shock is escapable
- Animals exposed to inescapable shocks learn that no behavior can control the shocks or help them escape
- They learn that reinforcers are independent of behavior and cannot be influenced

What are the major differences between classical and operant (instrumental) conditioning?

Some questions to think about:
1. What is the difference between free-operant and discrete-trial methods?
2. What are the similarities and differences between positive and negative reinforcement?
3. How does the current status of a reinforcer depend on prior experience with that or other reinforcers?
4. What are the effects of a delay of reinforcement on instrumental learning, and what causes these effects?