# CHAPTER 14

## Differential-Reinforcement Procedures

### Jeffrey H. Tiger and Gregory P. Hanley

The term reinforcement describes the process in which a behavior strengthens when its occurrence is followed by some improvement in the environment. By strengthened, we mean that the behavior is more likely to occur in the future in similar environmental conditions. The process of reinforcement is fundamental to the way people interact with and learn from their environment. For instance, children repeat phrases that made their parents laugh; teenagers wear the same clothes that made their friends take notice; and adults swing a golf club with a particular form when doing so has produced long and accurate drives. We can understand much of early human learning by acknowledging the regular, natural, and often accidental reinforcement and punishment contingencies that infants experience (Bijou, 1996; Schlinger, 1995). For example, an infant girl may experience reinforcers for grasping her food only when she applies the appropriate amount of grip strength. Grasping too hard will squash the food or cause it to slip from her hands. Grasping too softly will not capture the food. Such gradual and natural reinforcement processes may at least partially account for learning to reach, grasp, and then chew, and other important behaviors such as babbling, standing, and walking.

Although natural contingencies may account for much of human learning, alone they may change behavior in a slow and inefficient manner, particularly when reinforcers for engaging in important behavior are delayed or intermittent, or when a chain of behavior is necessary to produce reinforcement. Imagine trying to learn to drive a manual-transmission car based solely on the natural consequences of that behavior. Two distinguishing capacities of humans are the abilities to relay personal learning histories to other people through verbal behavior, such as speech and writing, and to arrange contingencies to develop and refine important behaviors in others. Thus we can increase the speed at which important behavior develops and eventually contacts natural reinforcement contingencies. In this regard, differential reinforcement is applicable as a procedural term to describe the act of increasing the occurrence of a desirable behavior in others by arranging for improvements to follow such behavior. By arranging for reinforcers to occur more often following one behavior than following another, differential reinforcement has two effects: It strengthens the target behavior and weakens other behavior that is functionally similar. Given this latter effect, investigators have used differential reinforcement to reduce problem behavior (see Vollmer, Athens, & Fernand, Chapter 19, this volume). By many accounts, differential-reinforcement procedures have revolutionized the educational and care practices for young children, especially children with intellectual developmental disorder and severe problem behaviors (Risley, 2005). However, the accelerative effects of differential reinforcement are also valuable for designing teaching and habilitative environments, and our chapter focuses primarily on the use of differential reinforcement to develop and refine new behavior and to maintain this behavior in many settings.
Differential reinforcement as a procedure is deceptively simple: Identify a behavior you would like to occur more often, arrange reinforcers to follow the occurrence of the behavior or features of the behavior, and do not present these same reinforcers following occurrences of other behaviors. Socially important behavior change, however, is often not that simple. Behavior analysts have developed a comprehensive technology for increasing desirable behavior through differential reinforcement and have used this technology since the inception of the field in the early 1960s. We review those technological developments in this chapter. Specifically, we provide descriptions and examples of features of behavior that behavior analysts may strengthen through differential reinforcement and highlight considerations for analysts designing differential-reinforcement-based interventions. In addition, we highlight the diverse array of applications with differential reinforcement at their core.

## FEATURES OF BEHAVIOR TO TARGET WITH DIFFERENTIAL REINFORCEMENT

In this section, we define features of behavior that are sensitive to differential reinforcement and provide illustrative examples of how differential reinforcement has modified these features.

### Topography

Common uses of differential reinforcement involve reinforcement of appropriate behavior in lieu of problem behavior. We often refer to this procedure as differential reinforcement of alternative behavior (DRA). Pinkston, Reese, LeBlanc, and Baer (1973) provided an example of DRA for appropriate peer interactions in lieu of aggression. In baseline, teachers typically responded to instances of peer aggression with reprimands (e.g., "You can't do that here!") and responded infrequently to appropriate social interaction, resulting in relatively high rates of aggression. The investigators then taught the teachers to withhold attention following aggression and to provide attention when the children engaged in desirable peer interactions. This simple manipulation produced increased appropriate peer interactions and decreased occurrences of aggression.

In **differential reinforcement of other behavior (DRO)**, by contrast, reinforcement is arranged for periods in which target behavior does not occur, and this may produce shifts from one topography to another. For instance, Protopopova, Kisten, and Wynne (2016) delivered food remotely, after periods in which no barking occurred, to dogs that historically engaged in high rates of nuisance barking. This DRO schedule eliminated nuisance barking for four of five dogs. DRO is not as precise as DRA for strengthening target behavior, and response topographies that the omission contingency did not target may emerge and be strengthened (Jessel, Borrero, & Becraft, 2015; Jessel & Ingvarsson, 2016).

### Rate

Rate is the number of responses emitted in a certain period. Some responses must occur repeatedly in a period to be useful or functional (e.g., typing speed, answering math facts). Much differential-reinforcement research focuses on increasing the rate of various socially important behaviors. In a recent creative example, Stasolla et al. (2017) used automated differential reinforcement to increase the ambulation rate of two girls with multiple disabilities. When optic sensors detected a forward step, the automated device provided brief access to music, lights, or tactile vibration, and this arrangement produced large increases in ambulation rate. Furthermore, these girls showed higher indices of happiness when ambulation produced reinforcement than when the same reinforcers were available noncontingently. These findings are like those of studies in which children demonstrated a preference for differential over noncontingent reinforcement during concurrent-chain schedules (Hanley, Piazza, Fisher, Contrucci, & Maglieri, 1997; Luczynski & Hanley, 2009, 2010).
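An automated contingency like the one Stasolla et al. described reduces to a simple event loop: detect the target response, deliver a brief reinforcer, and arrange no programmed consequence for anything else. The following minimal Python sketch simulates that logic; the event labels, the 3-second reinforcer duration, and the `deliver_stimulation` function are hypothetical stand-ins for the optic sensors and stimulation hardware.

```python
# Minimal sketch of an automated differential-reinforcement loop.
# The "step" events and deliver_stimulation() are hypothetical stand-ins
# for the optic sensor and the music/lights/vibration hardware.

def deliver_stimulation(duration_s):
    print(f"Delivering {duration_s}-s access to music/lights/vibration")

def run_contingency(sensor_events, reinforcer_duration_s=3.0):
    """Deliver brief stimulation contingent on each detected step;
    other detected movements produce no programmed consequence."""
    for event in sensor_events:
        if event == "step":
            deliver_stimulation(reinforcer_duration_s)

# Simulated sensor stream: only forward steps produce the reinforcer.
run_contingency(["step", "arm_wave", "step", "step"])
```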
When the base rate of a behavior is insufficient, **differential reinforcement of high-rate behavior (DRH)** can accelerate it. A DRH schedule arranges reinforcement delivery if the participant emits a minimum number of responses before the end of a specified interval. Ingham and Andrews (1973) used a procedure to treat stuttering that we can conceptualize as a DRH schedule. The investigators treated participants for stuttering with auditory feedback in which a tone sounded when the participant stuttered. This treatment produced stutter-free speech, but the speech was slow and unnatural, according to the investigators. Ingham and Andrews then delivered token reinforcement for progressively higher rates of spoken words. This DRH schedule maintained stutter-free speech and increased the rate and naturalness of the spoken words.

In other cases, certain behaviors are socially acceptable only when they occur at moderate to low rates. For instance, recruiting teacher attention is a common and desirable behavior of young children and is a common target for children who do not demonstrate this skill (e.g., Stokes, Fowler, & Baer, 1978). However, children who make frequent bids for attention can be disruptive to typical classroom environments. In **differential reinforcement of low-rate behavior (DRL)**, reinforcement is arranged when a behavior occurs below a certain threshold. Investigators have used DRL schedules to maintain moderate or low rates of behavior, frequently as an initial treatment for problem behavior. For instance, Austin and Bevan (2011) described what they called a full-session DRL procedure with three elementary school students. The classroom teacher set a maximum-response criterion for each student, such as nine responses in a 20-minute session, and each student who made fewer requests than his or her individualized maximum received a point that he or she could use in the classroom behavior management system. This procedure reduced requesting behavior to levels more appropriate for the classroom. Unlike forms of differential reinforcement that target zero or near-zero levels of a behavior, DRL schedules may maintain behavior at low rates; see Jessel and Borrero (2014) and Becraft, Borrero, Davis, Mendres-Smith, and Castillo (2018) for laboratory-based studies in which variations of DRL schedules produced response maintenance rather than response suppression.
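In code, a full-session DRL contingency like Austin and Bevan's amounts to counting responses across the session and delivering the reinforcer only when the count stays under the criterion. A minimal sketch, assuming the 20-minute session and the illustrative nine-response maximum described above:

```python
def full_session_drl(response_times_s, session_length_s=20 * 60, criterion=9):
    """Full-session DRL: deliver the reinforcer (here, a point) only if
    fewer than `criterion` responses occurred during the session."""
    count = sum(1 for t in response_times_s if t <= session_length_s)
    point_earned = count < criterion
    return count, point_earned

# Six requests in 20 minutes falls under the nine-response maximum.
count, earned = full_session_drl([30, 200, 410, 700, 950, 1150])
print(f"{count} requests -> point earned: {earned}")
```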
### Duration

Duration is the amount of time a participant performs a behavior. For behaviors such as completing homework, exercising, or reading, the number of instances of behavior is less informative than the amount of time a participant performs the behavior. For instance, knowing that a student studied for 3 hours in the past week is likely more informative than knowing that the student studied on three occasions, particularly if the three occasions lasted only 30 seconds each. Thus response duration may be a more important target than response frequency in these cases.

Our previous examples of differential reinforcement target increased frequency or speed of behaviors, but behavior analysts can also use differential reinforcement to sustain responding. Miller and Kelley (1994) taught parents to use differential reinforcement to sustain the homework engagement of four school-age children. After the parent and child set a goal for such engagement, the parent provided access to preferred activities when the child met or exceeded the goal. Investigators also have used DRO schedules to increase the duration of other important behavior. For instance, Cox, Virues-Ortega, Julio, and Martin (2017) arranged DRO schedules to reduce the excessive motion of children with autism spectrum disorder during magnetic resonance imaging (MRI). The DRO schedule arranged reinforcement for movement-free intervals, and participants learned to lie still for up to 5 minutes during mock MRI sessions. These findings are important because excessive movement produces unusable MRI results, with the subsequent need to repeat this expensive procedure to obtain usable results.

### Intensity

Intensity is the physical force or magnitude of the target response. For instance, the volume at which an individual emits speech is integral to a conversation partner's ability to respond. An individual who speaks too softly may not be heard, and excessively loud speech may be aversive to the listener. Fleece et al. (1981) demonstrated the use of differential reinforcement of response intensity to increase the speech volume of two preschool children with intellectual developmental disorder. The investigators used a sound-sensitive apparatus, calibrated to respond to participant vocalizations that exceeded a minimum threshold, which produced red and green lights in the shape of a Christmas tree, a presumed reinforcer. The investigators increased the minimum threshold for reinforcement as the children successfully activated the device. The speech volume of the participants increased but did not exceed the speech volume of their peers.

### Latency

Latency is the amount of time that passes between the occurrence of some event and the completion of behavior. For instance, we might define latency to awakening as the time between an alarm clock's sounding and a person's getting out of bed. Tiger, Bouxsein, and Fisher (2007) used differential reinforcement of response latencies with an adult with Asperger syndrome who displayed delayed responding to questions. The investigators asked the participant for information, such as his siblings' names and their addresses, during baseline. The participant responded accurately, but mean response latency was 24 seconds. The investigators then arranged a differential-reinforcement contingency in which the participant earned tokens exchangeable for access to a movie for each question he answered within an identified latency. By the third differential-reinforcement session, the participant's mean latency to respond was 5 seconds.

### Interresponse Time

Interresponse time is the time between two instances of a response. Differential reinforcement of short interresponse times produces rapid responding (i.e., short pausing between similar responses), and differential reinforcement of longer interresponse times produces slow responding (i.e., greater pausing between similar responses). For instance, Lennox, Miltenberger, and Donnelly (1987) reduced the rapid eating of three adults with profound intellectual developmental disorder by differentially reinforcing long interresponse times between consumption of bites of food. The investigators used baseline data to set a target interresponse time of 15 seconds. They blocked participants' attempts to place food in the mouth more often than once every 15 seconds. Lennox et al. also prompted the participants to engage in an incompatible response during the 15-second interval, and participants' rate of bite consumption decreased.
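The spaced-responding contingency Lennox et al. used can be expressed as a simple decision rule: permit (and thereby reinforce, with food) a bite only when at least 15 seconds have elapsed since the last permitted bite, and block earlier attempts. A minimal sketch of that rule; the function name and the example times are ours:

```python
def classify_bites(bite_times_s, min_irt_s=15.0):
    """Permit a bite only if at least `min_irt_s` seconds have elapsed
    since the last permitted bite; block earlier attempts."""
    last_permitted = None
    decisions = []
    for t in bite_times_s:
        if last_permitted is None or t - last_permitted >= min_irt_s:
            decisions.append((t, "permit"))
            last_permitted = t
        else:
            decisions.append((t, "block"))
    return decisions

# Attempts at 0, 5, and 20 s: the 5-s attempt is blocked (IRT < 15 s).
print(classify_bites([0, 5, 20]))
```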
## CONSIDERATIONS FOR DIFFERENTIAL-REINFORCEMENT PROCEDURES

Behavior analysts can implement differential reinforcement in many ways. Several parameters of the response-reinforcer relation affect the likely effectiveness of differential reinforcement. These include the effort of the target response and the immediacy, schedule, magnitude, type, and quality of reinforcement. We discuss each of these parameters below.

### Response Effort

Response effort is likely to affect the rate at which an individual learns a response. Individuals acquire responses with lower effort more quickly than those with higher effort, and simple responses more quickly than complex responses. Horner, Sprague, O'Brien, and Heathfield (1990) showed the importance of response effort when teaching alternative communicative responses to two participants who engaged in socially mediated problem behavior. Acquisition was slow and incomplete, and problem behavior persisted, when the investigators required participants to type a full sentence on an augmentative-communication device to access reinforcement. Participants learned and maintained a less effortful alternative (pressing a key to generate the same sentence) more quickly, and problem behavior decreased and remained low. When speed of acquisition is critical, decreasing response effort is an important tactic to consider. The behavior analyst may still teach more effortful and complex responses by first arranging differential-reinforcement contingencies for less effortful or simpler responses, and then gradually increasing the response effort and response complexity required to access reinforcement (see Hernandez, Hanley, Ingvarsson, & Tiger, 2007, for an example of this strategy).

### Immediacy of Reinforcers

Reinforcer immediacy, or reinforcer contiguity, is the time between an instance of behavior and reinforcement delivery (Vollmer & Hackenberg, 2001). Individuals may acquire responses when considerable time expires between the response and a reinforcing event (i.e., acquisition under delayed-reinforcement conditions; Gleeson & Lattal, 1987), and short delays may sometimes increase response persistence for primary reinforcers (Leon, Borrero, & DeLeon, 2016). The acquisition process is usually substantially longer or incomplete, however, even with brief delays (Carroll, Kodak, & Adolf, 2016; Gleeson & Lattal, 1987). The contingency-weakening effects of delayed reinforcement are well documented (Fisher, Thompson, Hagopian, Bowman, & Krug, 2000; Hanley, Iwata, & Thompson, 2001), and sometimes a single instance of immediate reinforcement will strengthen a response (Skinner, 1948). Thus ensuring the immediate delivery of reinforcement following a target behavior is critical for rapidly increasing the behavior through differential reinforcement (Hanley et al., 2001).
Delays to social and tangible reinforcement are inevitable outside of highly resourced teaching conditions, however. Differential reinforcement is still essential for generating and maintaining important behavior under these conditions. In an early example, Lalli, Casey, and Kates (1995) used differential reinforcement of progressively increasing chains of responses to strengthen task completion and maintain functional communication, despite consistent delays to reinforcement. Ghaemmaghami, Hanley, and Jessel (2016) extended this work by showing that socially important behavior such as functional communication, tolerance, and compliance with instructions maintained despite long delays to reinforcement when they (1) provided immediate reinforcement for each behavior type at least intermittently, and (2) progressively strengthened chains of appropriate behavior with contingent, rather than time-based, termination of the delay. This process strengthens initial behaviors in the response chain, even though initial behaviors do not contact much immediate reinforcement. The process also mitigates resurgence of problem behavior during delays by strengthening appropriate behavior during the delay. Thus the appropriate behavior that occurs during the delay is available for reinforcement when the delay ends. Investigators also have shown that procedures that develop behavior chains mitigate the untoward effects of delays to automatic reinforcement. For instance, Slaton and Hanley (2016) showed that chained schedules produced more consistent item engagement and lower levels of stereotypy.

### Reinforcement Schedules

Reinforcement schedules specify the rules for reinforcement delivery: the number and type of responses required to produce reinforcement, or the time that must elapse before reinforcement is available. Because Mace, Pritchard, and Penney (Chapter 4, this volume) describe reinforcement schedules more fully, we only briefly review them here.

#### Ratio Schedules

Ratio schedules arrange reinforcement delivery based on the number of responses emitted, and the response requirement may be constant, variable, or progressive. In a fixed-ratio (FR) schedule, the number of responses required to produce reinforcement remains constant. For instance, every response produces a reinforcer in an FR 1 schedule; every fifth response produces a reinforcer in an FR 5 schedule. Behavior analysts commonly use FR 1 schedules to establish and strengthen behavior. FR schedules may produce a pause-and-run pattern in which responding occurs at consistent high rates until reinforcement delivery; the organism then pauses for a period before high-rate responding resumes (Ferster & Skinner, 1957; Orlando & Bijou, 1960). Variable-ratio (VR) schedules arrange reinforcement delivery around a mean number of responses that changes from trial to trial. For instance, reinforcement delivery would occur after a mean of five responses in a VR 5 schedule; thus, reinforcement delivery might occur after one, three, five, seven, or nine responses. VR schedules tend to produce high response rates without postreinforcement pauses, and behavior analysts often use them for response maintenance (Ferster & Skinner, 1957; Schlinger, Derenne, & Baron, 2008).
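The counting logic of FR and VR schedules is easy to express in code. The sketch below is an illustrative simulation rather than any study's implementation: each response increments a counter, and meeting the current requirement delivers a reinforcer and draws the next requirement.

```python
import random

class RatioSchedule:
    """FR/VR counter: deliver a reinforcer when the response count
    meets the current requirement, then draw the next requirement."""

    def __init__(self, mean, variable=False):
        self.mean = mean
        self.variable = variable
        self.count = 0
        self.requirement = self._next_requirement()

    def _next_requirement(self):
        if self.variable:
            # e.g., a VR 5 requirement might be anywhere from 1 to 9,
            # averaging 5 responses per reinforcer.
            return random.randint(1, 2 * self.mean - 1)
        return self.mean

    def record_response(self):
        self.count += 1
        if self.count >= self.requirement:
            self.count = 0
            self.requirement = self._next_requirement()
            return True  # reinforcer delivered
        return False

fr5 = RatioSchedule(mean=5)  # FR 5: every fifth response is reinforced
print([fr5.record_response() for _ in range(10)])
# -> [False, False, False, False, True, False, False, False, False, True]
```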
Progressive-ratio schedules arrange reinforcement delivery on a schedule that changes across reinforcer deliveries. These schedules progress either by the addition of a fixed number of responses (arithmetic increases) or by multiplying each progressive-schedule value by a constant (geometric increases). For instance, an investigator might use a geometric progressive-ratio schedule in which the number of responses required to produce reinforcement doubles after each reinforcer delivery. Investigators use progressive-ratio schedules to compare the strength of two or more stimuli as reinforcers (see DeLeon et al., Chapter 7, this volume). For instance, Roane, Lerman, and Vorndran (2001) demonstrated that progressive-ratio schedules were more sensitive to differences in the reinforcing efficacy of stimuli than traditional preference assessments.

#### Interval Schedules

Interval schedules arrange reinforcement delivery for the first response occurring after a specified interval and may be either fixed or variable. In a fixed-interval (FI) schedule, reinforcement delivery occurs for the first response after an interval whose length remains constant. For instance, the first response after 60 seconds will produce a reinforcer in an FI 60-second schedule. FI schedules may generate high rates of responding, especially with low-effort responses and relatively small schedule values (e.g., Hanley et al., 2001). These schedules tend to produce a scalloped behavior pattern in which little responding occurs early in the interval, but responding gradually accelerates as the interval progresses (Ferster & Skinner, 1957; Weiner, 1969). A variable-interval (VI) schedule arranges reinforcement for the first response occurring after a specified interval that varies around a defined mean. For instance, reinforcement delivery might occur for the first response after 10, 30, 80, or 90 seconds in a VI 60-second schedule. VI schedules tend to produce steady response rates with little pausing (Orlando & Bijou, 1960).
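Interval schedules can be sketched the same way: a reinforcer becomes available once the interval elapses, the first response after that point produces it, and that response starts the next interval. Again, this is an illustrative simulation only.

```python
import random

class IntervalSchedule:
    """FI/VI timer: the first response after the interval elapses
    produces the reinforcer and starts the next interval."""

    def __init__(self, mean_s, variable=False):
        self.mean_s = mean_s
        self.variable = variable
        self.available_at = self._next_interval(start=0.0)

    def _next_interval(self, start):
        if self.variable:
            # e.g., a VI 60-s interval might run from 10 to 110 s,
            # averaging 60 s between reinforcer setups.
            return start + random.uniform(10, 2 * self.mean_s - 10)
        return start + self.mean_s

    def record_response(self, t_s):
        if t_s >= self.available_at:
            self.available_at = self._next_interval(start=t_s)
            return True  # reinforcer delivered
        return False

fi60 = IntervalSchedule(mean_s=60)  # FI 60 s
print([fi60.record_response(t) for t in (30, 59, 61, 70, 125)])
# -> [False, False, True, False, True]
```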
### Reinforcer Magnitude

Reinforcer magnitude is the amount or duration of a reinforcer. Social or practical constraints often influence a reinforcer's magnitude, such as when a teacher is available only for 5 minutes, or when someone wants to limit the amount of candy a child consumes. These constraints, however, may influence the efficacy of differential-reinforcement procedures. For instance, Trosclair-Lasserre, Lerman, Call, Addison, and Kodak (2008) showed that larger amounts of attention and toys maintained responding at higher schedule values than did smaller amounts for three children diagnosed with autism spectrum disorder. Like most functional relations, there are relevant boundary conditions. For instance, delivering copious amounts of reinforcement may produce reinforcer satiation and limit the effectiveness of the differential-reinforcement procedure. Therefore, behavior analysts should base their selection of reinforcement amount or magnitude on practicality, social acceptability, and effectiveness.

### Types of Reinforcers

Behavior analysts generally distinguish between positive and negative reinforcement and between social and nonsocial reinforcement. A positive reinforcer is one that a behavior analyst presents contingent on a response, which increases the future probability of the response. A negative reinforcer is one the behavior analyst removes contingent on a response, which increases the future probability of the response. Social reinforcers are ones that we can control, such as saying, "Nice work!" or giving a child a cookie. By contrast, nonsocial or automatic reinforcers are events that occur as a direct result of the behavior (e.g., obtaining a cookie from a vending machine; Vaughan & Michael, 1982).

#### Positive versus Negative Reinforcers

Most reported applied-behavior-analytic studies of differential reinforcement have used positive reinforcers, such as vocal and physical attention, edible items, or leisure activities. Although investigators use differential negative reinforcement less often (Iwata, 1987), examples include studies by Piazza et al. (1997) and Lalli et al. (1999), in which investigators provided negative reinforcement in the form of a break when the participant complied with a task demand. Error correction is a common differential-negative-reinforcement procedure that behavior analysts incorporate into teaching programs. Error correction involves prompting additional responding when a learner makes an error. For instance, in an error-correction trial for receptive identification, the teacher might point to the correct picture, say, "That is the elephant," and then prompt the child to "Point to the elephant." Research has shown that learners will acquire novel skills to avoid these additional prompts (Kodak et al., 2016; McGhan & Lerman, 2013; Rodgers & Iwata, 1991).

#### Automatic Reinforcers

Most studies of differential reinforcement in the literature have used social reinforcers. Nevertheless, programming nonsocial or automatic reinforcers following the occurrence of target behavior is possible and may be useful. For instance, Linscheid, Iwata, Ricketts, Williams, and Griffin (1990) described a device to treat severe self-injurious behavior that could detect occurrences of head banging and deliver a preferred event, such as music or visual stimulation, when head banging did not occur for a specified period. Behavior analysts also can arrange differential automatic negative reinforcement for target responses. For instance, Azrin, Ruben, O'Brien, Ayllon, and Roll (1968) engineered a device that emitted a quiet tone, followed in 3 seconds by a loud tone, when participants engaged in slouching. Participants could correct their posture after the quiet tone and avoid the loud tone, or could remain erect and avoid both tones.

Investigators have shown that providing access to automatically reinforced stereotypical behavior can function as reinforcement for other target responses, such as academic discriminations and play skills (Charlop-Christy & Haymes, 1996; Charlop, Kurtz, & Casey, 1990; Wolery, Kirk, & Gast, 1985). For instance, Hanley, Iwata, Thompson, and Lindberg (2000) showed that participants' stereotypic behavior persisted in the absence of social consequences during an experimental functional analysis, suggesting that the consequences produced by the behavior functioned as automatic reinforcement. Hanley et al. then provided access to stereotypic behaviors contingent on play with leisure materials, which increased participants' play with leisure materials. Potter, Hanley, Augustine, Clay, and Phelps (2013) used a similar arrangement to teach complex, multistep play to adolescents with autism spectrum disorder: the participants could engage in stereotypy by completing progressively complicated play routines. Slaton and Hanley (2016) taught participants to inhibit stereotypy and engage appropriately with items, using access to stereotypy as reinforcement. A chained schedule of reinforcement produced higher levels of item engagement and stimulus control of stereotypy than a schedule in which access to stereotypy was time-based.
Using automatic reinforcers in differential-reinforcement contingencies may be desirable for several additional reasons. First, the delivery of social reinforcers commonly requires a caregiver to continuously monitor and document participant behavior; the procedures described by Linscheid et al. (1990) and Azrin et al. (1968) require neither, which may increase their utility. Second, automated delivery of reinforcers is likely to be more precise and immediate than delivery of reinforcers by humans, provided the device functions properly. Third, individuals may acquire skills more readily when the consequence of responding results directly from the behavior (Thompson & Iwata, 2000).

### Reinforcer Quality

Reinforcer quality is a participant's subjective valuation of a reinforcing stimulus. Results of multiple studies have shown that attention to quality improves the efficacy of reinforcement programs (e.g., Johnson, Vladescu, Kodak, & Sidener, 2017; Mace, Neef, Shade, & Mauro, 1996) and their acceptability to participants (e.g., Johnson et al., 2017). Presumably, effective procedures rely on reinforcers of sufficient quality; reinforcer value is idiosyncratic and may change over time.

## STRATEGIES TO INCREASE OR MAINTAIN THE EFFECTIVENESS OF REINFORCERS

### Motivating Operations

A motivating operation is an event that alters the effectiveness of a stimulus as reinforcement. There are two broad categories of motivating operations (Laraway, Snycerski, Michael, & Poling, 2003): those that temporarily increase the value of a reinforcer, called establishing operations, and those that temporarily diminish the value of a reinforcer, called abolishing operations. The most common establishing operation is deprivation, and the most common abolishing operation is satiation. Control and manipulation of establishing operations can increase the effectiveness of differential-reinforcement procedures. For instance, Goh, Iwata, and DeLeon (2000) showed that no participants acquired a novel mand when the reinforcer for the mand was available on a dense schedule of noncontingent reinforcement (NCR). When the investigators made the NCR schedule progressively leaner, participants acquired the novel mand, presumably because the relevant establishing operation strengthened as time-based reinforcer deliveries decreased.

Satiation is a serious challenge when a behavior analyst is arranging reinforcement contingencies, because each reinforcer delivery serves as an abolishing operation for the reinforcer that subsequent responses produce. For instance, each sip of water decreases the establishing operation for subsequent sips of water over the near term. Using the smallest amount of reinforcement necessary to maintain responding is one way to mitigate satiation. Another is to restrict access to the reinforcer outside the contingency arrangement. For instance, Roane, Call, and Falcomata (2005) demonstrated that responding persisted more when they restricted the reinforcer to the progressive-ratio-schedule arrangement than when the reinforcer was also available outside that arrangement. In some cases, a behavior analyst cannot ethically or legally restrict a potential reinforcer. In these cases, the behavior analyst can schedule training in ways that maximize the effectiveness of reinforcers.
For instance, the analyst might schedule a training session before the participant's regularly scheduled lunch and use food as a reinforcer (e.g., North & Iwata, 2005; Vollmer & Iwata, 1991).

### Reinforcer Variation

Varying reinforcers for responding may delay satiation and prolong the effectiveness of differential reinforcement (Bowman, Piazza, Fisher, Hagopian, & Kogan, 1997; Egel, 1981; Koehler, Iwata, Roscoe, Rolider, & O'Steen, 2005). For instance, Bowman et al. (1997) showed that five of seven participants preferred varied delivery of three lesser preferred items to constant delivery of a more preferred item, and Egel (1981) showed that varying reinforcers produced more stable levels of correct responding and on-task behavior for several children diagnosed with autism spectrum disorder.

### Reinforcer Choice

Providing a choice of reinforcers may be a simple yet highly effective means of improving the efficacy of differential-reinforcement procedures (Ackerlund Brandt, Dozier, Juanico, Laudont, & Mick, 2015; Dunlap et al., 1994; Dyer, Dunlap, & Winterling, 1990; Fisher, Thompson, Piazza, Crosland, & Gotjen, 1997; Sellers et al., 2013; Thompson, Fisher, & Contrucci, 1998; Tiger, Hanley, & Hernandez, 2006; Toussaint, Kodak, & Vladescu, 2016). Choice making may be effective because it produces reinforcer variation, which minimizes satiation secondary to the repeated delivery of the same reinforcer. In addition, choice making capitalizes on establishing operations that produce momentary fluctuations in the value of reinforcers, because the participant can choose the reinforcer he or she prefers at that moment. There also is evidence that the opportunity to choose adds value to differential reinforcement beyond the value of obtaining the most preferred item. For instance, we (Tiger & Hanley, 2006) showed that six of seven preschoolers preferred to engage in academic seatwork when they could choose a single edible from an identical edible array for correct responding, rather than when the teacher provided the same amount and type of edible from the same type of array for correct responding. Note that reinforcer amount, quality, and type were identical in the two conditions; the only difference was the choice component. We also showed that children engaged in 12 to 16 times more academic work in the choice condition. These data show that programming opportunities to make choices may enhance the efficacy of differential reinforcement.

### Token Reinforcement Systems

Using conditioned reinforcers that an individual can trade later for preferred items, known as backup reinforcers, is another strategy to decrease satiation. Token economies, for instance, involve providing arbitrary items, such as tickets, tokens, stickers, or points, following the occurrence of target behaviors. Later the individual can exchange the tokens for preferred items (see Reitman, Boerke, & Vassilopoulos, Chapter 22, this volume, or reviews by Hackenberg, 2018; Kazdin, 1982; Kazdin & Bootzin, 1972). Token systems allow caregivers to deliver multiple reinforcers contingent on desirable behavior without adversely affecting the value of the primary or backup reinforcers, and without interrupting learning tasks or complex behaviors for reinforcer consumption. For instance, Krentz, Miltenberger, and Valbuena (2016) used token reinforcement to increase the distance walked by overweight and obese adults with intellectual developmental disorder at an adult day training center.
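A token system reduces to a simple ledger: tokens are credited contingent on the target behavior and debited later in exchange for backup reinforcers. The backup items and prices in this minimal sketch are hypothetical; in Krentz et al. (2016), tokens followed distance walked.

```python
class TokenEconomy:
    """Credit tokens contingent on target behavior; exchange them
    later for backup reinforcers at posted prices."""

    def __init__(self, prices):
        self.prices = prices  # backup reinforcer -> token cost
        self.balance = 0

    def credit(self, tokens=1):
        self.balance += tokens

    def exchange(self, item):
        cost = self.prices[item]
        if self.balance >= cost:
            self.balance -= cost
            return True   # backup reinforcer delivered
        return False      # not enough tokens yet

# Hypothetical prices; one token is credited per lap walked.
economy = TokenEconomy(prices={"music break": 3, "snack": 5})
for _lap in range(4):
    economy.credit()
print(economy.exchange("music break"), economy.balance)  # -> True 1
```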
## COMPLEMENTARY PROCEDURES TO DEVELOP NEW BEHAVIOR

Although differential reinforcement alone can produce new behavior, combining it with other procedures when teaching new behavior is often more effective. This section describes procedures that complement differential reinforcement in developing new behavior.

### Prompting

Behavior analysts often pair prompting with differential reinforcement to teach new behavior. The general sequence involves prompting the individual to engage in a response, providing reinforcement for the prompted response, and gradually eliminating the prompt over time. The behavior analyst can provide prompts in many forms (such as vocal, visual, or physical-response prompts; within-stimulus prompts; or extrastimulus prompts) and can choose the prompt based on the modality of the target behavior and the individual's capabilities. For instance, Thompson, McKerchar, and Dancho (2004) used delayed physical prompts and differential reinforcement to teach three infants to emit the manual signs Please and More with food as the reinforcer. By contrast, behavior analysts cannot physically prompt nonmotor target behavior, such as vocalizations; therefore, they must pair alternative prompting procedures with differential reinforcement. Bourret, Vollmer, and Rapp (2004) used vocal and model prompts to teach vocalizations to two children with autism spectrum disorder. The therapist vocally prompted the participant to emit the target vocalization (e.g., "Say tunes"). Correct vocalizations produced access to music. If the participant did not emit the target vocalization, the therapist modeled progressively shorter vocalizations (e.g., changing "Say tunes" to "Say tuh"). As the participant successfully emitted the modeled vocalization, the therapist required the participant to emit a vocalization that more closely approximated the target vocalization before receiving reinforcement.

One disadvantage is that prompting may produce prompt dependence. The behavior analyst can pair differential reinforcement with various tactics to fade and eventually eliminate prompts (see Halle, 1987, for a discussion of spontaneity). Thompson et al. (2004) and Bourret et al. (2004) eliminated prompts by increasing the delay between the presentation of the evocative event (such as placing a toy in a participant's reach) and the prompts, so that reinforcement was more immediate for independent responses. Other tactics include withholding reinforcement for prompted responses (Touchette & Howard, 1984) or decreasing the physical intensity of the prompts (see Wolery & Gast, 1984).

### Shaping and Percentile Schedules

When prompting is not appropriate to increase responding, we recommend shaping as an alternative tool. Shaping involves differential reinforcement of successive approximations of a behavior. To initiate a shaping procedure, a behavior analyst must (1) identify a behavior the individual currently emits that approximates the target behavior; (2) provide reinforcement for that behavior; and (3) require closer approximations to the terminal behavior, such as more complex forms or different rates or durations of behavior, for reinforcer delivery. Investigators have used shaping to teach many complex behaviors, including eye contact in children with autism spectrum disorder (e.g., McConnell, 1967), vocal speech in mute adults diagnosed with psychosis (Sherman, 1965), and limb use in patients after strokes (Taub et al., 1994).
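The three-step shaping sequence above can be sketched as a loop in which the reinforcement criterion advances once the learner reliably meets the current approximation. The criterion steps and the advance-after-three-successes rule below are illustrative assumptions, not a prescription.

```python
def shape(responses, criteria, advance_after=3):
    """Reinforce responses meeting the current criterion; move to the
    next (closer) approximation after `advance_after` consecutive successes."""
    level, streak, log = 0, 0, []
    for r in responses:  # r: a measured response (e.g., duration in s)
        if r >= criteria[level]:
            log.append((r, "reinforce"))
            streak += 1
            if streak >= advance_after and level < len(criteria) - 1:
                level, streak = level + 1, 0  # require a closer approximation
        else:
            log.append((r, "withhold"))
            streak = 0
    return log

# Criterion steps from 2 s toward a 10-s terminal duration (assumed values).
print(shape([2, 3, 4, 4, 5, 6, 9, 10], criteria=[2, 4, 8, 10]))
```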
Although shaping is among behavior analysts' oldest and most celebrated tools, there are few formalized rules for shaping. Galbicka (1994) described a formalized shaping system using percentile schedules, and investigators have published studies in which percentile schedules are the cornerstones of their behavior-change procedures (Athens, Vollmer, & St. Peter Pipkin, 2007; Lamb, Kirby, Morral, Galbicka, & Iguchi, 2004; Lamb, Morral, Kirby, Iguchi, & Galbicka, 2004). Percentile schedules dictate rules for the timing of reinforcement delivery, and these rules can be adjusted based on recent or local rates, durations, or types of responding. To arrange percentile schedules for complex behavior, the behavior analyst rank-orders responses from the simplest to the most complex, keeps a running record of the temporal order and form of the behavior with a focus on the most recent responses, and delivers a reinforcer for a response if it exceeds the formal qualities of the most recent subset of responses.

Behavior analysts may use percentile schedules to shape higher rates or durations of responding. For instance, Athens et al. (2007) used percentile schedules to increase the academic-task engagement of four students with intellectual developmental disorder. The investigators measured the duration of task engagement for each participant. During the percentile-schedule phase, engagement produced a token exchangeable for food if the engagement duration exceeded the median duration of the previous 5, 10, or 20 engagement durations, depending on the experimental condition. Thus the reinforcement criterion shifted constantly with the participant's recent engagement durations. The percentile schedule produced increased engagement durations, with the largest increases in the conditions that based the momentary reinforcement criterion on more of the participant's previous behavior. For instance, Athens et al. observed higher engagement durations when they used the previous 20 rather than the previous 5 engagement durations to determine the reinforcement criterion.
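Expressed in code, the contingency Athens et al. arranged reinforces the current response when it exceeds the median of the most recent responses, so the criterion tracks the learner's own recent performance. A minimal sketch using a 5-response window:

```python
from collections import deque
from statistics import median

def percentile_schedule(durations_s, m=5):
    """Reinforce an engagement duration if it exceeds the median of
    the previous `m` durations (the criterion shifts with performance)."""
    recent = deque(maxlen=m)
    outcomes = []
    for d in durations_s:
        # No reinforcement until a full window of prior responses exists.
        reinforced = len(recent) == m and d > median(recent)
        outcomes.append((d, reinforced))
        recent.append(d)
    return outcomes

# As durations drift upward, the running-median criterion rises with them.
print(percentile_schedule([10, 12, 9, 15, 14, 16, 20, 13]))
```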
### Response Chaining and Task Analysis

Commonly taught behaviors are often not single, unitary responses; instead, they include a series of topographically distinct behaviors that a participant must complete in sequence. Behavior analysts often refer to these behaviors as response chains and to each component behavior as a link in the chain. Providing reinforcement for a single response in a chain or for the entire response chain may not be an efficient or effective way to shape behavior. Therefore, behavior analysts typically use prompting and differential reinforcement or shaping to establish individual links of the response chain, then differentially reinforce sequences of links until a participant produces the entire response chain. Behavior analysts use one of two general procedures, called forward chaining and backward chaining, to teach response chains. Forward chaining involves teaching the response chain in the same order in which the participant will ultimately perform it. That is, the behavior analyst differentially reinforces emission of the first behavior in the chain, then the first and second behaviors, and so forth. By contrast, in backward chaining, the behavior analyst provides differential reinforcement for the last behavior in the chain and adds the remaining behaviors of the chain to the differential-reinforcement contingency in reverse order.

Task analysis, or identifying the individual behaviors in a response chain, is necessary before teaching a complex behavior. For instance, Neef, Parrish, Hannigan, Page, and Iwata (1989) demonstrated the importance of task analysis. They taught self-catheterization skills to two young girls with spina bifida by identifying each step of self-catheterization and then partitioning the task into 6 to 11 component steps. They taught each step to each participant, using prompting and differential reinforcement, until the two girls could independently self-catheterize (see also Noell, Call, Ardoin, & Miller, Chapter 15, this volume).

## RESPONSE MAINTENANCE AND SCHEDULE THINNING

Although immediate, dense schedules of reinforcement are important for establishing responses, caregivers may have difficulty implementing such schedules with high integrity over the long term. Therefore, thinning of a reinforcement schedule is an important part of response maintenance. One method of reinforcement-schedule thinning is to deliver reinforcement intermittently by progressively increasing the response requirements for reinforcement. For instance, Van Houten and Nau (1980) used FR- and VR-like reinforcement schedules to increase the attending behaviors of