articles

Neuronal basis of sequential foraging decisions in a patchy environment

Benjamin Y Hayden1,2, John M Pearson1,2 & Michael L Platt1–3

1Department of Neurobiology, Duke University School of Medicine, Durham, North Carolina, USA. 2Center for Cognitive Neuroscience, Duke University, Durham, North Carolina, USA. 3Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, USA. Correspondence should be addressed to B.Y.H.

Received 9 November 2010; accepted 28 March 2011; published online 5 June 2011; doi:10.1038/nn.2856

nature NEUROSCIENCE VOLUME 14 | NUMBER 7 | JULY 2011 933
© 2011 Nature America, Inc. All rights reserved.

Deciding when to leave a depleting resource to exploit another is a fundamental problem for all decision makers. The neuronal mechanisms mediating patch-leaving decisions remain unknown. We found that neurons in primate (Macaca mulatta) dorsal anterior cingulate cortex, an area that is linked to reward monitoring and executive control, encode a decision variable signaling the relative value of leaving a depleting resource for a new one. Neurons fired during each sequential decision to stay in a patch and, for each travel time, these responses reached a fixed threshold for patch-leaving. Longer travel times reduced the gain of neural responses for choosing to stay in a patch and increased the firing rate threshold mandating patch-leaving. These modulations more closely matched behavioral decisions than any single task variable. These findings portend an understanding of the neural basis of foraging decisions and endorse the unification of theoretical and experimental work in ecology and neuroscience.

Resources are rarely distributed uniformly in the environment. Food, water and other vital commodities more often occur in spatially localized and temporally ephemeral patches1. Patchy environments force animals to balance the benefits of staying in a depleting patch and leaving for a richer one2. According to the marginal value theorem (MVT) of behavioral ecology, animals should leave patches when their intake rate diminishes to the average intake rate for the overall environment2,3. Organisms as diverse as worms, bees, wasps, spiders, fish, birds, seals and even plants obey the MVT3–6. Ethnographic evidence demonstrates that human subsistence foragers also obey the predictions of the MVT in their hunting behavior7, and laboratory findings suggest that monkeys may do so as well8. The generality of the MVT solution to the patch-leaving problem suggests that the underlying mechanism is fundamental to the way organisms make decisions4. The neuronal basis of patch-leaving decisions, however, remains unknown.

Building on recent progress toward understanding the neuronal mechanisms mediating perceptual decisions9, we hypothesized that the brain maintains a decision variable specifying the current relative value of leaving a patch. Conceptually, a decision variable is an analog quantity that incorporates all sources of information (in this case, reward size, handling time, search time and travel time) evaluated by the decision policy to generate a behavioral choice9. The hypothesized decision variable gives rise to a decision via comparison with a specific threshold. For simplicity, we assume that this process is analogous, although not necessarily isomorphic, to the neural integrate-to-threshold processes thought to mediate perceptual judgments9–13. We further conjecture that travel time between patches influences leaving decisions by changing the rate at which the decision variable grows, the threshold or both10,11.

To test these hypotheses, we developed a virtual foraging task in which rhesus monkeys chose one of two targets. One target corresponded to remaining in the patch, and choosing it yielded a juice reward that declined each time it was chosen. The other target corresponded to leaving the patch, and choosing it yielded only a delay before the opportunity to choose again at a replenished patch. Monkeys' behavior closely matched the predictions of the MVT. We then recorded activity of neurons in the dorsal anterior cingulate cortex (dACC) while they performed the task.

dACC has been linked to reward outcome monitoring and behavioral adjustment14–16, as well as to signaling reward outcomes and predicting changes in behavior17–24. Notably, ACC dysfunction attends clinical disorders that are associated with difficulty in abandoning maladaptive patterns of behavior or cognition, including depression, addiction, obsessive-compulsive disorder and Tourette Syndrome25–27.

We found that dACC neurons responded each time monkeys made a choice and that these responses increased with time spent in the current patch. Monkeys abandoned a patch when neuronal responses reached a threshold associated with a particular travel time. When travel time between patches was long, the gain of neuronal responses with each decision to remain in the patch was smaller and the threshold for patch abandonment was higher than when travel time was short. Overall, neuronal response gain and threshold jointly predicted patch-leaving decisions. These findings suggest that dACC mediates patch-leaving decisions using a common integrate-to-threshold mechanism.

RESULTS

For each choice there were two options (Fig. 1a). Choosing the stay (short blue) target led to a juice reward in 0.4 s (handling time). The value of this reward declined by 19 µl ± ε (s.e.m., ε = 1.9 µl) each time it was chosen, mirroring the diminishing returns common to patchy foraging environments (Fig. 1b). Choosing the leave (tall gray) target led to no reward and a long delay that was fixed in a patch and varied between patches, and reset the value of the stay target to its initial high value (306 µl). We defined search time as any additional time spent in the patch not explicitly waiting, and residence time as the total time from arrival at a new patch, including handling times, search time and intertrial intervals (ITIs). Travel time was explicitly cued on all trials by the height of the gray bar and was reset to a new random value each time it was chosen (0.5 to 10.5 s, uniform distribution). The blue and gray bars alternated sides each time the leave target was chosen; any potential laterality in neural tuning functions was assumed to average out (Supplementary Data 1). In contrast to some natural foraging decisions, there was no physical travel during the travel time, nor was any action required during this delay; the only explicit cost of delay was opportunity cost. These simplifications are, we believe, not critical, as most other laboratory foraging tasks eschew effort requirements (for example, see refs. 8,28).

[Figure 1 image: (a) task timeline: fixation hold (500 ms), saccade to the stay or leave target, handling-time delay (0.4 s) or travel-time delay (0.5–10.5 s), ITI (1 s); (b) reward harvested (ml juice) versus time in patch (s); (c) reward harvest rate (ml s−1) versus time in patch (s) for short, medium and long travel times, with the maximal rate marked.]

Figure 1 Patch-leaving task. (a) Task design. After fixation, two eccentric targets, a large gray and a small blue rectangle, appear. The monkey chooses one of the two targets by shifting gaze to it. Choice of the blue rectangle (stay in patch) yields a short delay (0.4 s, handling time) and a reward whose value diminishes by 19 µl per trial. Choice of the gray rectangle (leave patch) yields no reward and a long delay (travel time) whose duration is indicated by the height of the bar, and resets the value of the blue rectangle to 306 µl. Travel time varies randomly from patch to patch and ranges from 0.5 to 10.5 s. (b) Plot of the cumulative reward available in this task as a function of time in patch, given the search times associated with animals' performance in the task (black line). Data are generated on the basis of average times associated with performance. (c) Plot of reward intake rate derived from a range of patch residence times (x axis: range of residence times). Data are shown for each of ten travel times (1-s intervals from 0.5 to 10.5 s). Rate-maximizing time in patch (the curves' maxima, shown by the black line) increases with increasing travel time. Data are generated based on average times associated with actual animal performance.

Monkeys approximate rate maximization according to MVT

As travel time between patches grows, so does the rate-maximizing residence time (Fig. 1c). Consistent with the MVT, monkeys' patch-residence times rose with increasing travel time and were nearly rate maximizing (P < 0.0001, β = 1.247, regression of residence time (s) against travel time (s); Fig. 2a). These effects were found in both monkeys individually (P < 0.0001 for both individuals, β = 1.11 for monkey E, β = 1.47 for monkey O; Supplementary Data 2 and Supplementary Fig. 1). Overall, both monkeys obtained 97.2% of the reward obtained by the best-fit rate-maximization algorithm (note that this is a measure of reward obtained versus maximal obtainable, not a measure of variance in behavior explained). Both monkeys remained in patches slightly longer than predicted by the MVT (mean 2.2 s longer, P < 0.01, Student's t test). This slight over-staying may reflect a weak preference for immediate small rewards over delayed large rewards29–31, a slight overestimate of travel times or even a status quo bias32. Leaving time was not influenced by travel time on the previous patch (regression of residence time against previous travel time, P = 0.44; Supplementary Data 3 and Supplementary Fig. 2).

Monkeys attempting to maximize local intake rates over the long term should consider handling time as well as travel time2,3. To confirm that monkeys do so, we performed an additional behavioral experiment in which handling times, but not travel times, were varied from patch to patch (11 sessions, 6 in monkey E, 5 in monkey O). In each patch, handling time took one of ten values: 0.1, 0.4, 0.6, 1.1, 1.6, 2.1, 2.6, 3.1, 3.6 or 4.1 s. We cued handling time by varying the height of the blue rectangle; travel time was held constant at 5 s. We performed these experiments after the monkeys had learned the task, but before we began physiological data collection; monkeys received one full day of training each. As predicted, patch residence times declined with increased handling time (regression, β = −3.71 for both monkeys combined, −3.81 for E and −3.62 for O, P < 0.001 in all cases; Fig. 2b). Leaving times did not differ systematically from the rate-maximizing predictions at any of the ten points (P > 0.05 in all cases, Student's t test). As an additional control, during these sessions we interleaved standard patches with fixed handling time and variable travel times. The average residence time on these trials was consistent with those obtained in the handling time control (Fig. 2b).

[Figure 2 image: (a) time in patch (s) versus travel time (s), observed and optimal (n = 94 sessions); (b) time in patch (s) versus handling time (s), observed average, observed fit and optimal.]

Figure 2 Monkeys obey the marginal value theorem in a virtual patchy foraging task. (a) Monkeys remain in the patch longer as travel time rises, as predicted by the marginal value theorem (MVT). Each dot indicates a single patch-leaving decision (n = 2,834 patch-leaving events). The time at which the monkey chose to leave the patch (y axis) was defined relative to the beginning of foraging in that patch. Travel time was kept constant in a patch (x axis). Data from both monkeys are shown. Average behavior (blue line) closely followed the rate-maximizing leaving time (red line), albeit delayed by 0–2 s. (b) Performance of two monkeys on the handling time variant of the patch-leaving task. In this control experiment, travel time was held constant (5 s) and handling time was randomly reset between patches to one of ten values. Patch residence time fell as handling time rose, consistent with the MVT. Observed times are shown with black dots; averages are shown with a solid blue line. The best-fit line (dashed blue line) is nearly identical to the rate-maximizing line (red line). Average patch residence time on the interleaved standard travel time version of the task was consistent with this curve as well (red dots).

[Figure 3 image: (a) reward-aligned PSTH for cell E090926 (trial start, end of saccade, reward); (b) PSTHs for four ranges of time in patch (0–7.5, 7.5–15, 15–22.5 and 22.5–30 s); (c) firing rate (spikes per s) versus time in patch (±2.5 s) for cell E090926 (n = 802 trials); (d) normalized population response versus time in patch (n = 102 cells); (e) histogram of regression coefficients (number of neurons, significant versus not significant).]

Figure 3 Firing rates of dACC neurons integrate patch residence time and travel time in computations occurring over multiple actions. (a) Average reward-aligned peri-stimulus time histograms (PSTHs) for example cell. Neuronal responses were briefly enhanced around the time of saccades and then fell to a baseline level between trials. Time zero indicates end of saccade, indicating choice. Dark gray box, pre-saccadic epoch. Light gray box, post-saccadic epoch. Black rectangle indicates the average duration of the trial. (b) The firing rate during the peri-saccadic period rose with time in patch. Each panel indicates responses selected from one range of patch residence times. (c,d) Average responses of example neuron (c) and population of neurons (d) occurring in a series of 5-s analysis epochs (gray box in a). Firing rates increased as time in patch increased. Error bars represent s.e.m. (e) Histogram of regression coefficients relating firing rate in pre-saccadic epoch to time in patch for each neuron in the population (n = 102). Significant effects are indicated with gray boxes (P < 0.05).

A natural question is whether the monkeys' foraging behavior may be explained in a delay-discounting framework30,33. In such a framework, each leave/stay decision is regarded as a choice between a smaller-sooner stay reward and a larger-later leave reward (that is, the first reward in the new patch). Such a decision model would naturally account for the monkeys' observed tendency to stay longer in patches when faced with longer travel times, as the delay associated with patch-leaving would lead to discounting of the larger-later reward. To test this idea, we compared an empirically derived sequential foraging model inspired by the MVT against a standard delay-discounting model in which the hyperbolic discount parameter k was estimated by maximum likelihood (best fit, k = 1.26 s−1).
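
The two accounts compared here can be made concrete with a small simulation. The sketch below is a hypothetical illustration, not the authors' fitted models: it uses the task parameters given in the text (306 µl initial reward, 19 µl decrement per stay, 0.4 s handling time, 1 s ITI, 500 ms fixation hold, 0.5–10.5 s travel times) and the reported k = 1.26 s−1, but it assumes zero search time and treats the fixation hold and ITI as the only per-trial overhead. The function names (`mvt_optimal_stays`, `discounting_stays`) are our own. The MVT rule picks the number of stays per patch that maximizes long-run intake rate; the discounting rule keeps staying as long as the hyperbolically discounted next reward, A/(1 + kD), is at least the discounted first reward of a new patch.

```python
"""Toy comparison of an MVT rate-maximizing rule and a hyperbolic
delay-discounting rule for the patch-leaving task.

Assumptions (not taken from the authors' analysis): search time is zero,
each trial carries a 0.5 s fixation hold and a 1 s ITI, and the delay to
the first reward in a new patch is travel + ITI + fixation + handling.
"""

R0_UL = 306.0        # initial stay reward in a fresh patch (µl)
DECREMENT_UL = 19.0  # reward drop after each stay choice (µl)
HANDLING_S = 0.4     # delay between a stay choice and its reward (s)
ITI_S = 1.0          # intertrial interval (s)
FIXATION_S = 0.5     # fixation hold before each choice (s); assumed overhead
K_PER_S = 1.26       # hyperbolic discount parameter reported in the text (1/s)


def stay_reward(n_prior_stays: int) -> float:
    """Reward for the next stay choice after n_prior_stays stays (floored at 0)."""
    return max(R0_UL - DECREMENT_UL * n_prior_stays, 0.0)


def mvt_optimal_stays(travel_s: float, max_stays: int = 40) -> int:
    """Number of stays per patch that maximizes long-run intake rate (µl/s)."""
    stay_cost_s = HANDLING_S + ITI_S + FIXATION_S
    leave_cost_s = travel_s + ITI_S + FIXATION_S

    def long_run_rate(n: int) -> float:
        # Total reward from n stays divided by the time to harvest them
        # plus the time spent travelling to the next patch.
        total_reward = sum(stay_reward(i) for i in range(n))
        return total_reward / (n * stay_cost_s + leave_cost_s)

    return max(range(1, max_stays + 1), key=long_run_rate)


def hyperbolic_value(amount: float, delay_s: float, k: float = K_PER_S) -> float:
    """Hyperbolically discounted value A / (1 + k * D)."""
    return amount / (1.0 + k * delay_s)


def discounting_stays(travel_s: float, max_stays: int = 40) -> int:
    """Stays taken when each choice compares the discounted next stay reward
    (smaller-sooner) with the discounted first reward of a new patch
    (larger-later)."""
    leave_delay_s = travel_s + ITI_S + FIXATION_S + HANDLING_S
    leave_value = hyperbolic_value(R0_UL, leave_delay_s)
    n = 0
    while n < max_stays and hyperbolic_value(stay_reward(n), HANDLING_S) >= leave_value:
        n += 1
    return n


if __name__ == "__main__":
    for travel in (0.5, 2.5, 5.5, 10.5):
        print(f"travel {travel:4.1f} s: MVT stays = {mvt_optimal_stays(travel)}, "
              f"discounting stays = {discounting_stays(travel)}")
```

Under these assumptions both rules stay longer as travel time grows (the sketch gives 5 versus 9 stays for the MVT rule and 11 versus 15 for the discounting rule at 0.5 versus 10.5 s of travel), so the qualitative trend alone cannot distinguish them; a formal comparison of fitted models is needed.
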
The Akaike weights for the two models (a measure of goodness of fit that accounts for different numbers of parameters in different models; wMVT = weight for MVT model, wDD = weight for delay-discounting model) clearly favored the sequential-trial foraging model, endorsing the MVT account of decision making in our task (wMVT/wDD = 1.31 × 10^52; Supplementary Data 4 and Supplementary Fig. 3).

Neurons integrate information over multiple actions

We recorded the activity of 102 single neurons in dACC in two monkeys performing this task (52 neurons in monkey E, 50 in monkey O; Fig. 3). For an example neuron, neural activity was aligned to the end of the choice saccade (time zero; Fig. 3a). Firing rate rose to a peak around the time of the choice saccade and then returned to a baseline value between trials. Such brief, peri-saccadic responses, often modulated by reward size and task context, are characteristic of neurons in dACC18,21,22,24. We focused on neuronal activity in the 500-ms epoch preceding saccade onset (pre-saccadic epoch). For most analyses, we focused on neural data associated with choosing to remain in the patch and excluded neural data associated with choosing to leave the patch (exceptions are noted). Data for individual subjects matched the combined data (Supplementary Data 5 and Supplementary Fig. 4).

We next examined the responses of the example neuron from four time periods relative to the beginning of foraging in the patch (t < 7.5 s, 7.5 < t < 15, 15 < t < 22.5, and 22.5 < t; Fig. 3b). For this neuron, responses rose with cumulative time spent foraging in the patch. To quantify this enhancement, we measured pre-saccadic responses in a series of non-overlapping 5-s time bins (Fig. 3c). We included in each time bin all of the decisions in which the end of the saccade occurred in that bin. We found that firing rates rose with increasing patch residence time (β = 0.31, P < 0.0001, linear regression of firing rate (spikes per s) against time in patch (s)). The same effects were observed in the population average firing rates (β = 0.18, P < 0.0001, regression; Fig. 3d, Supplementary Data 6 and Supplementary Fig. 5). We observed a significant (P < 0.05) positive regression coefficient in 49 neurons (average β = 0.24 in significantly modulated cells), a significant negative slope in 10 (average β = −0.09) and no significant slope in the remainder (P > 0.05, n = 43, average β = 0.041). The 49 neurons with positive slopes constitute the focus of subsequent analyses (Supplementary Data 7 and Supplementary Figs. 6 and 7).

We next performed the same analysis on two later epochs. In the post-saccadic epoch, we measured firing rates during the 400-ms handling time period beginning at saccade termination and ending with the reward. In the ITI epoch, we measured firing rates during the 1-s period beginning after reward delivery and ending when the next set of choice options was presented to the monkey. For the post-saccadic epoch, we observed a positive regression coefficient in 44 neurons (P < 0.05, average β = 0.15 in significantly modulated cells), a negative coefficient in 8 (average β = −0.1), and no significant effect in the remainder (P > 0.05, 50 neurons, average β = 0.010). Nor was there much evidence of an effect in the ITI epoch at the population level; we observed a significant correlation between firing rates in the ITI and patch residence time in only four neurons (average β = 0.02), all positive in sign. These numbers are similar to what would be expected by chance (Supplementary Data 8). This result suggests that dACC does not maintain a representation of time spent foraging in a patch across multiple actions; the locus of this trace in the brain remains to be determined. Overall, we observed weak or no effect of saccade direction on responses (Supplementary Data 1).

Threshold-crossing of dACC firing predicts patch-leaving

The gradual rise in neural responses across decisions to stay in a patch resembles the within-trial rise-to-threshold processes observed in lateral intraparietal area, frontal eye fields (FEFs) and superior colliculus during motor preparation and decision-making10–12,34,35. We wondered whether a similar rise-to-threshold model might also account for the relationship between firing rates in dACC and patch-leaving decisions. To test this idea, we performed an analysis modeled on a previously developed method11 for probing the relationship between the firing rates of FEF neurons and saccade initiation. Although FEF firing rates in that study11 rose gradually to a fixed threshold on a single trial, the analogous rise in our study was associated with changes in firing rate occurring in discrete bouts over multiple actions. Moreover, firing rates of FEF neurons gradually rise to threshold over tens of milliseconds, whereas the amplitudes of discrete dACC neuronal responses in our task increased over tens of seconds, orders of magnitude longer. Nonetheless, these analytical methods in principle generalize readily to our task. Our analysis asked two questions. First, does variability in patch-leaving times correlate with variability in the rate at which neural activity rises? Second, do firing rates rise to the same level regardless of the precise time monkeys choose to leave the patch for a given travel time?

[Figure 4 image: (a) time in patch (s) versus travel time (s) for latest, late, early and earliest departures (n = 94 sessions, 2,834 blocks); (b) PSTH for cell E090921b aligned to saccade and reward; (c,d) firing rate versus time in patch (s) for cell E090921b and for the population (n = 49 cells); (e) normalized firing rate versus trials before switch for earliest, early, late and latest departures.]

Figure 4 Firing rates of dACC neurons rise to a threshold associated with patch abandonment. (a) Plot of patch-leaving times, separated by whether they were earlier or later than the average leaving time. We divided patch-leaving decisions into four categories: earliest (black), early (red), late (cyan) and latest (magenta). These variables are independent of travel time and time in patch, meaning that, for example, earliest trials are equally likely to occur at any travel time (x axis) and any time in patch (y axis). (b) PSTH for an example neuron separated by earliness level. dACC neurons responded sooner and more strongly on earlier trials than on later trials. Black rectangle indicates the average duration of the trial. (c,d) Average firing rates of example neuron (c) and population (d) separated by earliness level. Firing rates rose faster for earlier patches but asymptoted at the same level. Error bars represent s.e.m. (e) Plot of average firing rate of population of neurons, aligned to final trial in patch (x = 0 on graph) and showing the final three trials before switch (x = 1, 2 and 3). Firing rates rose to the same level on final trial, as well as preceding trials. Error bars represent s.e.m.

We first divided all patch-leaving choices into residence-time quartiles for each travel time (medians of each set for the aggregate response: earliest = 14.1 s, early = 19.2 s, late = 23.5 s, latest = 32.2 s; Fig. 4a). We refer to this classification of trials as the 'earliness' of each patch. We repeated the classification of leaving times into four different earliness bins for each neuron separately. By design, earliness is orthogonal to travel time, and so neural correlates of earliness and travel time are independent. For our example neuron, firing rates were significantly higher for the earliest than for the early patches (difference = 1.8 spikes per s, P = 0.02; Fig. 4b), significantly higher for the early than for the late patches (difference = 2.0 spikes per s, P < 0.01) and significantly higher for the late than for the latest patches (difference = 4.7 spikes per s, P < 0.01).

We next calculated the average neural responses in a series of 5-s time bins encompassing multiple choices separated by earliness level. For both the example neuron and the focal population of neurons (n = 49 of 102, see above), we found that the rate of rise of firing rates was positively correlated with earliness (Fig. 4c,d). We quantified these effects by calculating the regression weight for firing rate as a function of patch residence time separately for each of the four earliness bins. The slope for the earliest patches (β = 0.71) was greater than the slope for the early patches (β = 0.52, P < 0.01, bootstrap test), which was greater than the slope for the late patches (β = 0.44, P < 0.01), which was in turn greater than the slope for the latest patches (β = 0.39, P < 0.01). The same effects were observed in the population average responses (P < 0.005 for each comparison). Slopes decreased monotonically from earliest to latest quartiles for 32 of 49 neurons (65%). The remainder (n = 17) showed no such effect.

We then examined whether firing rates rose to the same threshold in each of the four earliness bins. For this analysis, we only examined firing rates for patch-leaving choices, which we had ignored in previous analyses. We took these firing rates to be a proxy for the patch-leaving threshold. Firing rates did not depend on earliness for the example neuron (regression of firing rate against earliness level, P = 0.45; Fig. 4c) or the population (P = 0.88; Fig. 4d), consistent with the threshold hypothesis. We observed no relationship between earliness and threshold for the population average response (regression, P = 0.79). We observed a significant effect of earliness on threshold for only a small number of cells, 6 out of 49 significantly modulated neurons (12.2%, P < 0.05). Finally, we found that neuronal responses aligned to the patch-leaving trial for different earliness levels overlapped (Fig. 4e). For the population of neurons, firing rates rose to approximately the same level on the last decision to stay in the patch before choosing to leave the patch (regression, β = −0.003, P = 0.51).

Travel time influences gain and threshold of neuronal responses

Assuming that patch-leaving decisions are governed by a threshold process, variation in travel times should influence the threshold. There are three basic mechanisms by which travel time could influence the accumulation-to-threshold process (Fig. 5a). First, travel time could change the rate of rise of the decision variable. Second, travel time could adjust the threshold level. Third, travel time could influence the baseline. Our next analysis was designed to determine which of these processes are implemented in dACC.

We first examined the relationship between travel time and response gain (Fig. 5b,c). For each neuron, we divided patches into ten travel time deciles (equal-sized sequentially classified bins, 1 to 10 ± 0.5 s). In each decile, we calculated the slope of firing rate versus time in patch. We found a significant negative correlation between travel time and regression slope for our example neuron (regression, β = −0.04, P < 0.01; Fig. 5b) and for the 49 cells in the analysis population (β = −0.033, P < 0.01; Fig. 5c). We observed significant negative effects in 29 of 49 of the focal population of neurons, positive correlations in 6, and no effect in 14 (P < 0.05). Across the entire population, we found a significant negative correlation in 41 of 102 neurons (P < 0.05), a positive correlation in 7 neurons and no effect in 54 neurons.

[Figure 5 image: (a) schematic of possible mechanisms (faster rate of rise, reduced threshold, elevated baseline) on axes of firing rate versus time; (b,c) beta weight versus travel time (s) for cell E090904a1 and for the population (n = 49 neurons); (d,e) firing rate on switch trial (spikes per s) versus travel time (±0.5 s) for cell E090904a1 and normalized firing rate for the population (n = 49 neurons).]

Figure 5 Travel time governs both neuronal response gain and threshold. (a) Schematic of three possible mechanisms by which exogenous factors may govern a rise-to-threshold process. Shorter travel times can hasten patch-leaving (leftward movement on x axis) by increasing the rate of rise, reducing the threshold or elevating the baseline. (b,c) Evidence that travel times change rate of rise. Example neuron (b) and population (c) average regression slopes (beta weights) for firing rate as a function of time in patch. Beta weights fell as travel times rose, indicating that shorter travel time increases neuronal response gain. Error bars represent s.e.m. (d,e) Evidence that travel time influences firing threshold for patch abandonment. Firing rate on patch-leaving trial was taken as a proxy for threshold level. Example neuron (d) and population (e) show increasing firing rates on patch-leaving trial as travel time increases (black dots). Firing rate on penultimate trial also rose with travel time, consistent with a multi-trial integration process. Error bars represent s.e.m.

We next examined the relationship between travel time and threshold (Fig. 5d,e). As above, we assumed that the firing rate in the epoch immediately preceding patch-leaving provides a proxy for threshold. For our example neuron, the threshold rose with travel time (β = 1.53, P < 0.01, regression of firing rates (spikes per s) against travel time (s); Fig. 5d). There was also a significant, but weaker, correlation between travel time and firing rate on the last choice to stay in the patch before patch-leaving, consistent with the idea of a gradual rise-to-threshold process (β = 1.32, P < 0.025); the whole population showed similar effects (Fig. 5e). For the focal population (n = 49), threshold rose with travel time (β = 0.92, P < 0.01), as did previous-trial responses (β = 0.69, P < 0.001). We observed a rise in threshold in 37 of 102 neurons (regression, P < 0.05), a negative effect in one neuron and no effect in the rest.

If travel time influences threshold levels, do thresholds actually remain constant for different leaving times, after accounting for travel time? To answer this question, we measured neural responses on patch-leaving trials in five groups of travel times (0.5 to 2.5, 2.5 to 4.5, 4.5 to 6.5, 6.5 to 8.5 and 8.5 to 10.5 s). Two facts suggest that thresholds were constant across earliness levels (see above) after accounting for travel time. First, in each of these travel time groups, earliness did not influence firing rate in the pre-saccadic epoch of the patch-leaving trial (P > 0.2 in all cases). Second, there was no significant interaction between earliness and travel time (P = 0.67, two-way ANOVA, four levels of earliness and five travel times, main effect of earliness = 0.06 spikes per s per bin).

Finally, we considered whether travel times influence baseline firing. We reasoned that such an effect would be most apparent on the first few choices in a new patch. We found no effect of travel time on firing rates in the first 5 s of a patch (ANOVA of normalized firing rate against travel time, P = 0.41) and a significant effect in only nine neurons individually (P < 0.05). The frequency of these effects is not much more than would be expected by chance (expected n = 5.1 neurons, P = 0.067, binomial test), and the size of the effect was weak (average difference between 1- and 10-s travel times was 0.09 spikes per s). Moreover, of these nine neurons, six showed increasing firing rates with longer travel times. Our data therefore do not endorse the idea that travel times influence baseline neuronal firing rates in ACC.

DISCUSSION

Choosing when to leave a depleting resource patch is a ubiquitous natural decision problem that is central to foraging theory and behavioral ecology2,3. Although the brains of animals have undoubtedly been shaped by evolutionary pressures for foraging efficiency, the neural processes that mediate the simple decision to give up on one patch and move to another remain obscure. Our findings suggest that, during foraging, the primate brain computes a decision variable whose magnitude corresponds to the relative value of leaving a patch, that this value rises to a threshold associated with patch leaving, and that travel time between patches governs both the threshold and the rate at which this decision variable rises. This decision variable is represented in the firing rates of neurons in the dACC, a frontal lobe structure associated with reward monitoring and behavioral adjustment14–16.

If the thresholding process that we propose were to occur at the level of dACC or its inputs, we would expect to see the outcome of the threshold process in the responses of dACC neurons. Instead, we observed a signal that varied continuously with time in patch, suggesting that the thresholding process occurs downstream of dACC, perhaps in FEFs or some other premotor structure. Notably, our results imply that a downstream neuron cannot judge whether it is time to leave the patch solely by querying the output of dACC; it needs to have information about travel time, likely in the form of the change in its threshold. We speculate that such control may be implemented through neuromodulatory inputs to the dACC, perhaps via dopamine or norepinephrine36,37.

Responses of dACC neurons do not uniquely represent any single variable in the MVT equations. The MVT is a description of foraging behavior at the computational level, whereas our data support a particular mechanism by which the decision process could be implemented38. Thus, although we claim that our hypothesized decision variable encodes the relative value of leaving a patch, we could just as easily argue that it encodes the negative value of staying (indeed, this would be consistent with error-related theories of ACC function16). Because these functional mechanisms resemble those known to support basic perceptual and mnemonic decisions10–13,34, our findings endorse the idea that the brain uses a small suite of common mechanisms to solve diverse problems in multiple domains.

The broad applicability of the MVT to such a wide array of organisms underscores the fact that dACC is unlikely to be the sole neural locus of the decision process, even in organisms with brains similar to ours. Indeed, these mechanisms may not even be limited to brains. Many organisms that lack brains, including amoebas, slime molds and plants, exhibit behavior that is consistent with the MVT6,39,40. We conjecture that such organisms solve the patchy foraging problem in much the same way that monkeys do, namely by maintaining and controlling a representation of the relative value of leaving a patch. Thus, diverse organisms may solve common problems using similar algorithms that are implemented very differently38. Even in organisms with brains similar to ours, given the high redundancy of decision signals across brain areas in primates41, we predict that similar signals might be observed in other regions, including the dorsolateral prefrontal cortex, lateral intraparietal area and posterior cingulate cortex, although such signals might be convolved with other information such as target location or movement metrics.

Relation to previous studies

Our findings are broadly consistent with prior studies showing that dACC monitors reward information from many sources and signals the need to adjust behavior in some manner17,18,22,23,42–46.

…ences likely reflect task design. We used eight targets in the previous study, but only two saccade targets here, thus weakening our sensitivity to spatial selectivity, especially the bimodal tuning common in dACC21. Also, the task used in the prior study demanded that monkeys carefully distinguish adjacent, physically similar targets to evaluate the associated reward outcomes, whereas the two targets were widely separated and physically distinct in the current study. We hypothesize that the greater demand for attentional resources associated with spatial locations in that task accentuated spatial tuning in the earlier study.

Conclusion

Our virtual foraging task is an ersatz idealization of a real patchy foraging environment. Given that foraging often involves physical
Our results cor- effort, these results are only a first step on the path to understanding roborate these earlier results and extend them in four important ways. real foraging decisions. Because of the clear links between ACC func- First, we found that dACC responses vary continuously with the extent tion and effortful choices, dACC seems particularly well positioned to which circumstances favor the decision to move on, even if leaving to guide real-world foraging choices and is likely involved in these does not occur. This observation supports the idea that dACC neurons choices. Thus, we believe that our results provide a useful advance represent a scalar decision variable reflecting the relative value of leav- toward understanding natural value–based decisions and forge a criti- ing. The relative value of switching behavior was not manipulated in cal link between systems neurobiology and behavioral ecology. © 2011 Nature America, Inc. All rights reserved. a previous study in which all non-switch trials were, in all important Animals’ bodies are have evolved to efficiently exploit the resources respects, the same23. Second, we found and measured a specific thresh- in their environments. Natural selection has also acted on the nervous old at which leaving, and by extension switching, occurs. Third, we systems of these animals to enable the adaptive action of their bod- identified two mechanisms by which exogenous factors govern patch- ies. Few studies have linked neural computations to specific types of leaving behavior. Finally, we found that neuronal activity in dACC naturally occurring foraging decisions. Our study portends a more promotes disengagement in a relatively natural task that is directly general understanding of prey selection, diet selection and more modeled on real-world foraging situations. These results directly link complex foraging problems3,47. 
Ultimately, these results endorse the dACC neuronal activity to behavior in a naturalistic context and may unification of theoretical and experimental work in the ecological extend to situations outside the laboratory in which dACC dysfunc- and neural sciences48. tion has been implicated, including addiction, depression, obsessive- compulsive disorder and Tourette syndrome25–27. Methods At first glance, our results appear to contradict those obtained by Methods and any associated references are available in the online an earlier study that reported increasing firing rates of dACC neurons version of the paper at http://www.nature.com/natureneuroscience/. with increasing proximity to reward24, whereas we found increasing Note: Supplementary information is available on the Nature Neuroscience website. firing rates in anticipation of sequentially smaller rewards. We believe that the two sets of findings are fully concordant. In our study, firing Acknowledgments rates rose as the monkey approached the decision to abandon the We thank S. Heilbronner for comments on design, analysis and writing. This research was supported by US National Institutes of Health grant R01EY013496 current patch for a new one. In the earlier study, firing rates increased (M.L.P.), a Fellowship from the Tourette Syndrome Association (B.Y.H.) and US as the monkey neared the rewarded action. In both cases, firing rates National Institutes of Health grant K99 DA027718-01 (B.Y.H.). of dACC neurons marked progression through a sequence of actions toward a salient behavioral event: the reward in their task, patch- AUTHOR CONTRIBUTIONS B.Y.H. designed the experiment and collected the data. B.Y.H. and J.M.P. leaving in ours. Together, our results suggest a broader view, namely contributed to data analysis. B.Y.H., J.M.P. and M.L.P. wrote the manuscript. 
that dACC neurons do not signal reward value per se, but rather that their responses encode an abstract decision variable that is suitable COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests. for guiding a variety of different modifications in behavior, whether generated endogenously or exogenously. Published online at http://www.nature.com/natureneuroscience/. Our findings may also initially appear to contradict results from our Reprints and permissions information is available online at http://www.nature.com/ earlier studies of dACC neurons21,22. Previously, we found that the reprints/index.html. firing rates of dACC neurons reflected both real and fictive rewards, and generally did so with higher firing rates for larger rewards22. In 1. Prins, H.H.T. Ecology and Behavior of the African Buffalo: Social Inequality and that study, however, large rewards, both real and fictive, promoted Decision Making (Chapman and Hall, London, 1996). 2. Charnov, E.L. Optimal foraging, the marginal value theorem. Theor. Popul. Biol. 9, a behavioral strategy that led to potentially larger rewards. Indeed, 129–136 (1976). trial-to-trial variations in firing rate in that study positively covaried 3. Stephens, D.W. & Krebs, J.R. Foraging Theory (Princeton University Press, Princeton, with likelihood of adjusting behavior on the next trial. In other words, NJ, 1986). 4. Bendesky, A., Tsunozaki, M., Rockman, M.V., Kruglyak, L. & Bargmann, C.I. Catecholamine higher firing rates in both studies predicted the likelihood that the receptor polymorphisms affect decision-making in C. elegans. Nature 472, monkey would successfully incorporate new information about the 313–318 (2011). 5. Thompson, D. & Fedak, M.A. How long should a dive last? A simple model of world into an ongoing decision to change behavior. foraging decisions by breath-hold divers in a patchy environment. Anim. Behav. 
61, In another study, we found that neuronal activity in dACC showed 287–296 (2001). weak, but significant, selectivity for saccade direction, in addition to 6. McNickle, G.G. & Cahill, J.F. Jr. Plant root growth and the marginal value theorem. Proc. Natl. Acad. Sci. USA 106, 4747–4751 (2009). anticipated real and fictive reward size21. In contrast, here we found 7. Smith, E.A. & Winterhalder, B. Evolutionary Ecology and Human Behavior (de no evidence for spatial selectivity in neuronal responses. These differ- Gruyer, New York, 1992). 938 VOLUME 14 | NUMBER 7 | JULY 2011 nature NEUROSCIENCE a r t ic l e s 8. Agetsuma, N. Simulation of patch use by monkeys using operant conditioning. 30. Kim, S., Hwang, J. & Lee, D. Prefrontal coding of temporally discounted values J. Ethology 16, 49–55 (1999). during intertemporal choice. Neuron 59, 161–172 (2008). 9. Gold, J.I. & Shadlen, M.N. The neural basis of decision making. Annu. Rev. 31. Louie, K. & Glimcher, P.W. Separating value from choice: delay discounting activity Neurosci. 30, 535–574 (2007). in the lateral intraparietal area. J. Neurosci. 30, 5498–5507 (2010). 10. Gold, J.I. & Shadlen, M.N. Banburismus and the brain: decoding the relationship 32. Kahneman, D., Knetsch, J.L. & Thaler, R.H. Anomalies: the endowment effect, loss between sensory stimuli, decisions, and reward. Neuron 36, 299–308 (2002). aversion, and the status quo bias. J. Econ. Perspect. 5, 193–206 (1991). 11. Hanes, D.P. & Schall, J.D. Neural control of voluntary movement initiation. Science 274, 33. Mazur, J.E. An adjusting procedure for studying delayed reinforcement. in 427–430 (1996). Quantitative Analyses of Behavior, vol 5. The Effect of Delay and Intervening Events 12. Schall, J.D. On building a bridge between brain and behavior. Annu. Rev. Psychol. 55, on Reinforcement Value (eds. Commons, M.L., Mazur, J.E., Nevin, J.A. & Rachlin, H.) 23–50 (2004). (Erlbaum, Mahway, New Jersey, 1987). 13. Carpenter, R.H.S. 
Movements of the Eyes (Pion, London, 1988). 34. Roitman, J.D. & Shadlen, M.N. Response of neurons in the lateral intraparietal 14. Kennerley, S.W., Walton, M.E., Behrens, T.E., Buckley, M.J. & Rushworth, M.F. Optimal area during a combined visual discrimination reaction time task. J. Neurosci. 22, decision making and the anterior cingulate cortex. Nat. Neurosci. 9, 940–947 (2006). 9475–9489 (2002). 15. Rushworth, M.F. & Behrens, T.E. Choice, uncertainty and value in prefrontal and 35. Horwitz, G.D., Batista, A.P. & Newsome, W.T. Representation of an abstract cingulate cortex. Nat. Neurosci. 11, 389–397 (2008). perceptual decision in macaque superior colliculus. J. Neurophysiol. 91, 16. Holroyd, C.B. & Coles, M.G. The neural basis of human error processing: reinforcement 2281–2296 (2004).
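The baseline-firing analysis compares nine significant neurons with the number expected by chance. A few lines of arithmetic can check that comparison. The sketch below assumes the denominator is the 102 neurons named in the threshold analysis and that each neuron is tested independently at α = 0.05. Under those assumptions the expected count is 102 × 0.05 = 5.1, and the sketch computes a one-sided upper-tail binomial probability of nine or more significant neurons. The function name `binomial_upper_tail` is illustrative. The exact test variant used in the study is not stated in this section, so the result need not match the reported P = 0.067 digit for digit.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed inputs: 102 neurons, each tested at alpha = 0.05; 9 observed significant.
n_neurons = 102
alpha = 0.05
observed = 9

# Expected number of false positives if no neuron truly responds to travel time.
expected_by_chance = n_neurons * alpha
# One-sided probability of seeing at least `observed` significant neurons by chance.
p_value = binomial_upper_tail(observed, n_neurons, alpha)

print(f"expected by chance: {expected_by_chance:.1f} neurons")
print(f"P(X >= {observed}) = {p_value:.3f}")
```

Summing the pmf with `math.comb` avoids a SciPy dependency. `scipy.stats.binomtest(9, 102, 0.05, alternative='greater').pvalue` computes the same one-sided tail.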

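The Discussion proposes a mechanism in which a decision variable rises with each choice to stay, and the animal leaves the patch once the variable reaches a threshold. Longer travel times lower the rate of rise (gain) and raise the threshold. A toy simulation can illustrate this. The baseline rate, the gain function, the linear threshold and every parameter value below are hypothetical. They were chosen for illustration only and are not fitted to the recorded neurons. The travel times are the midpoints of the five travel-time groups used in the threshold analysis.

```python
def response_gain(travel_time_s: float) -> float:
    """Hypothetical gain (spikes/s added per stay choice); falls as travel time rises."""
    return 2.0 / (1.0 + 0.2 * travel_time_s)


def leaving_threshold(travel_time_s: float) -> float:
    """Hypothetical firing-rate threshold (spikes/s); rises linearly with travel time."""
    return 20.0 + 1.0 * travel_time_s


def stays_before_leaving(travel_time_s: float, baseline_rate: float = 10.0) -> int:
    """Count stay choices made while the decision variable is below threshold.

    The rate starts at `baseline_rate` on the first choice in a new patch and rises
    by one gain step after every choice to stay. The first choice whose rate is at
    or above threshold is treated as the patch-leaving choice.
    """
    threshold = leaving_threshold(travel_time_s)
    gain = response_gain(travel_time_s)
    rate = baseline_rate
    stays = 0
    while rate < threshold:
        stays += 1
        rate += gain
    return stays


# Midpoints (s) of the five travel-time groups (0.5-2.5, ..., 8.5-10.5 s).
travel_times = [1.5, 3.5, 5.5, 7.5, 9.5]
for t in travel_times:
    print(f"travel time {t:4.1f} s -> {stays_before_leaving(t)} stays before leaving")
```

In this toy version, both manipulations push in the same direction. A flatter rise and a higher bar each delay leaving, so patch residence grows with travel time, the qualitative pattern the MVT predicts. Because the linear forms are assumptions, the sketch says nothing about how much gain and threshold each contribute in the recorded data.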