L&M Lecture 7 Operant Conditioning PDF
Summary
This document describes operant conditioning, a type of learning where behavior is influenced by consequences. It discusses concepts like reinforcement, punishment, and shaping. It includes examples and comparisons to classical conditioning.
Full Transcript
Operant Conditioning

Learning what to do when
○ Even in the absence of explicit instruction, we use feedback signals (reward and punishment) to learn how to behave
○ Operant conditioning: the process whereby organisms learn to act in order to obtain or avoid important consequences
    Also referred to as instrumental conditioning: behaviors are instrumental in producing an outcome

Edward Thorndike
○ Discovered the foundations of operant conditioning through measurement of trial-and-error learning
    Law of effect: actions that are rewarded become more likely, actions that are punished become less likely

Thorndike’s law of effect
○ Cats in puzzle boxes initially explore by trying out many different behaviors
○ Actions that led to positive outcomes tended to be repeated in the future

Operant conditioning
○ Learning is driven by feedback
    Reinforcement: the process of providing outcomes for a behavior that increase the probability of that behavior occurring again in the future
    Punishment: the process of providing outcomes for a behavior that decrease the probability of that behavior occurring again in the future
○ Discriminative stimulus (Sd): the set of environmental cues (the situation the organism is in)
○ Response (R): actions taken in the presence of the Sd
○ Outcome (O):
    Primary: stimuli that have intrinsic biological value, either positive (primary reinforcers) or negative (primary punishers)
    Secondary: stimuli that have no intrinsic value but which predict or provide access to primary outcomes (e.g., money)
○ Learning specific associations between stimulus, response, and outcome
    Reinforcement of a response doesn’t increase the rate of that behavior in every situation, only in the presence of the right stimulus cues
    Expectation of a specific outcome: if the outcome changes, the rate of response will change with it (switching to a less desirable food reward will lead to a reduction in responding)
○ How is it different from classical conditioning?
    Classical conditioning: the outcome (US) occurs regardless of whether the agent performs a response
    Operant conditioning: the outcome depends on whether the agent performs the response

Building complex behaviors
○ Shaping: an operant conditioning technique in which successive approximations to a desired response are reinforced
    First reinforce a simple behavior that is spontaneously produced by the agent
    When that behavior is consistent, start reinforcing a more complex version
    Continue until the agent is performing a complex response (one that they wouldn’t produce on their own)
○ Chaining: organisms are gradually trained to execute complicated sequences of discrete responses
    Decompose a complex behavior into action steps
    Start by reinforcing the first action, then the first two actions, etc.

Types of operant conditioning
○ Positive reinforcement: add a pleasant outcome to increase/maintain behavior
    Cleaning room leads to getting an allowance
○ Positive punishment: add an aversive outcome to decrease behavior
    Teasing sibling leads to scolding
○ Negative reinforcement: remove an aversive outcome to increase/maintain behavior
    Taking aspirin leads to headache going away
○ Negative punishment: remove a pleasant outcome to decrease behavior
    Fighting with classmate leads to time-out from play

Reinforcement vs. punishment
○ Although punishment can be effective in reducing an unwanted behavior, it often has unwanted side effects and/or is difficult to implement effectively
    Leads to more variable behavior (switching to other undesirable actions)
    Can be undermined by concurrent reinforcement (attention, social feedback)
    More severe forms can lead to unhealthy behavioral patterns (decreased self-esteem, modeling of aggressive behavior, resentment)
○ Differential reinforcement of alternative behaviors (DRA): a method to decrease the frequency of unwanted behaviors by instead reinforcing preferred alternative behaviors
    Desired alternative behavior should be incompatible with the unwanted behavior (DRI)

Timing of reinforcement/punishment
○ When should desirable behavior be reinforced?
○ Immediate reinforcement leads to faster acquisition of a response than delayed reinforcement

Reinforcement schedules
○ Reinforcement schedule: a schedule determining how often reinforcement is delivered in an operant conditioning paradigm
○ Continuous reinforcement: every response is followed by a reinforcer
○ Partial reinforcement: some responses are not followed by a reinforcer
○ Continuous reinforcement leads to faster acquisition than partial reinforcement, and faster extinction
    Under partial reinforcement, behavior persists for a longer period of time in the absence of reinforcement

Reinforcement schedules: by ratio
○ Fixed ratio schedule: a specific number of responses is required before a reinforcer is delivered
○ Variable ratio schedule: a certain number of responses, on average, is required before a reinforcer is delivered
○ Higher ratio leads to higher response rate

Reinforcement schedules: by interval
○ Fixed interval schedule: the first response after a certain period of time is reinforced
○ Variable interval schedule: the first response after a certain period of time, on average, is reinforced
○ Shorter intervals lead to higher response rates

Reinforcement schedule examples
○ Fixed ratio: being paid by the amount produced
○ Variable ratio: slot machine
○ Fixed interval: being paid by the hour
○ Variable interval: fishing

Reinforcement schedules
○ Compared to continuous reinforcement, partial reinforcement leads to slower acquisition, but more resistance to extinction
○ Within partial reinforcement schedules, increased variability leads to the most consistent rate of response and slowest extinction

From voluntary to habitual responding
○ Operant conditioning is sometimes referred to as voluntary behavior (because the outcome depends on choosing to take an action)
○ But for strongly conditioned behaviors, are we making an active, voluntary choice?
○ Habits emerge from repeated reinforcement of behaviors in a given context
○ For well learned habits, execution of the behavior is automatic: triggered directly by environmental cues without need for attention or deliberation
    People fall back on well learned habits even when they conflict with current intentions or goals (action slips)
    Habits are unaffected by time pressure, stress, fatigue
    A change in context can disrupt seemingly well learned habits (eating habits at home vs. during travel)
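
The four types of operant conditioning listed earlier follow from two yes/no questions: is a stimulus added or removed, and does the behavior become more or less likely? As a minimal sketch (the function name `classify` and its parameters are hypothetical, chosen only for illustration), the 2x2 scheme can be written as:

```python
def classify(stimulus_added: bool, behavior_increases: bool) -> str:
    """Name the type of operant conditioning from the two defining questions."""
    # "Positive" means an outcome is added; "negative" means one is removed.
    sign = "positive" if stimulus_added else "negative"
    # Reinforcement increases/maintains behavior; punishment decreases it.
    kind = "reinforcement" if behavior_increases else "punishment"
    return f"{sign} {kind}"


# The four examples from the notes, encoded as (added?, behavior increases?).
examples = {
    "cleaning room -> allowance": (True, True),
    "teasing sibling -> scolding": (True, False),
    "taking aspirin -> headache goes away": (False, True),
    "fighting with classmate -> time-out from play": (False, False),
}

for description, (added, increases) in examples.items():
    print(f"{description}: {classify(added, increases)}")
```

Note that "positive" and "negative" describe only whether something is added or removed, not whether the outcome is pleasant.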
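
The ratio and interval schedules defined earlier are rules for deciding whether a given response earns a reinforcer, so they can be sketched in code. The following is an illustration only, not a model from the lecture: all function names are hypothetical, the variable ratio schedule is assumed to reinforce each response with probability 1/mean (so the required number of responses averages to the mean), and the variable interval schedule is assumed to draw its waiting times from an exponential distribution.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response."""
    count = 0

    def respond(t):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reinforce each response with probability 1/mean_n (assumption:
    this yields mean_n responses per reinforcer on average)."""

    def respond(t):
        return rng.random() < 1.0 / mean_n

    return respond


def fixed_interval(period):
    """Reinforce the first response made at least `period` time units
    after the previous reinforcer (or after time 0)."""
    last = 0.0

    def respond(t):
        nonlocal last
        if t - last >= period:
            last = t
            return True
        return False

    return respond


def variable_interval(mean_period, rng):
    """Like fixed_interval, but the required wait is redrawn after each
    reinforcer (assumption: exponential with mean `mean_period`)."""
    last = 0.0
    wait = rng.expovariate(1.0 / mean_period)

    def respond(t):
        nonlocal last, wait
        if t - last >= wait:
            last = t
            wait = rng.expovariate(1.0 / mean_period)
            return True
        return False

    return respond


# Usage: one response per time unit, at times t = 1, 2, ..., 30.
fr5 = fixed_ratio(5)
fi10 = fixed_interval(10)
print(sum(fr5(t) for t in range(1, 31)))   # reinforcers earned under FR-5
print(sum(fi10(t) for t in range(1, 31)))  # reinforcers earned under FI-10
```

Under the ratio schedules, responding faster earns reinforcers faster; under the interval schedules, extra responses before the interval has elapsed earn nothing, which is the key structural difference between the two families.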