The Now-or-Never Bottleneck: A Fundamental Constraint on Language PDF

BEHAVIORAL AND BRAIN SCIENCES (2016), Page 1 of 72 doi:10.1017/S0140525X1500031X, e62 The Now-or-Never bottleneck: A fundamental constraint on language Morten H. Christiansen...

BEHAVIORAL AND BRAIN SCIENCES (2016), doi:10.1017/S0140525X1500031X, e62 The Now-or-Never bottleneck: A fundamental constraint on language Abstract: Memory is ﬂeeting. New material rapidly obliterates continual deluge of linguistic input? We argue that, to recode linguistic input as rapidly as possible. This observation language system must “eagerly” recode and compress the language system must build a multilevel linguistic information predictively to ensure that local linguistic there is no way for the language system to recover. This in the here and now, which implies that language acquisition perspective provides a cognitive foundation for grammaticalization also helps explain a variety of core properties of language, This approach promises to create a direct relationship framework within which to integrate often disconnected change and evolution. Keywords: chunking; grammaticalization; incremental interpretation; online learning; prediction; processing bottleneck; psycholingui 1. Introduction Language is ﬂeeting. As we hear a sentence unfold, we rapidly lose our memory for preceding material. Speakers, too, soon lose track of the details of what they have just said. Language processing is therefore “Now-or-Never”: If lin- guistic information is not processed rapidly, that informa- tion is lost for good. Importantly, though, while fundamentally shaping language, the Now-or-Never bottle- neck1 is not speciﬁc to language but instead arises from general principles of perceptuo-motor processing and memory. The existence of a Now-or-Never bottleneck is relatively uncontroversial, although its precise character may be debated. However, in this article we argue that the conse- quences of this constraint for language are remarkably far-reaching, touching on the following issues: 1. The multilevel organization of language into sound- based units, lexical and phrasal units, and beyond; 2. The prevalence of local linguistic relations (e.g., in phonology and syntax); 3. The incrementality of language processing; 4. The use of prediction in language interpretation and production; © Cambridge University Press 2016 0140-525X/16 https://doi.org/10.1017/S0140525X1500031X Published online by Cambridge Christiansen & Chater: The Now-or-Never be incremental on functional grounds, to extract the linguis- tic message as rapidly as possible. Such counterfactuals of course, difﬁcult to evaluate. By contrast, the Now-or-Never bottleneck arise from basic information processing limitations that are directly testable by experi- ment. Moreover, the Now-or-Never bottleneck should, we suggest, have methodological priority to the extent that it provides an integrated framework for explaining many aspects of language structure, acquisition, processing, and evolution that have previously been treated separately. In Figure 1, we illustrate the overall structure of the argu- ment in this article. We begin, in the next section, by brieﬂy making the case for the Now-or-Never bottleneck as a general constraint on perception and action. We then discuss the implications of this constraint for language pro- cessing, arguing that both comprehension and production involve what we call “Chunk-and-Pass” processing: incre- mentally building chunks at all levels of linguistic structure as rapidly as possible, using all available information predic- tively to process current input before new information arrives (sect. 3). From this perspective, language acquisition involves learning to process: that is, learning and use chunks appropriately for the language being learned (sect. 4). Consequently, short-term language change and longer-term processes of language evolution arise through variation in the system of chunks and their composition, sug- gesting an item-based theory of language change (sect. 5). This approach points to a processing-based interpretation of construction grammar, in which constructions corre- spond to chunks, and where grammatical structure is funda- mentally the history of language processing operations within the individual speaker/hearer (sect. 6). We conclude by brieﬂy summarizing the main points of our argument. MORTEN H. CHRISTIANSEN is Professor of and Co-Director of the Cognitive Science Cornell University as well as Senior Scientist at the Haskins Labs and Professor of Child Language at the Interacting Minds Centre at Aarhus University. He is the author of more than 170 scientiﬁc papers and has written or edited ﬁve books. His research focuses on the interaction of biological and environmental con- straints in the processing, acquisition, and evolution of language, using a combination of computational, behav- ioral, and cognitive neuroscience methods. He is a Fellow of the Association for Psychological and he delivered the 2009 Nijmegen Lectures. NICK CHATER is Professor of Behavioural Science at Warwick Business School, United Kingdom. He is the author of more than 250 scientiﬁc publications in psychology, philosophy, linguistics, and cognitive science, and he has written or edited ten books. He has served as Associate Editor for Cognitive Science, Psychological Review, Psychological Science, and Man- agement Science. His research explores the cognitive and social foundations of human rationality, on formal models of inference, choice, He is a Fellow of the Cognitive Science Society, the Association for Psychological Science, and the British Academy. 2 BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) https://doi.org/10.1017/S0140525X1500031X Published online by Cambridge Christiansen & Figure 1. The structure of our argument, in which implicational bottleneck provides a fundamental constraint on perception hence outside the diamond in the ﬁgure). Speciﬁc implications bottleneck’s necessitating of Chunk-and-Pass language Now-or-Never bottleneck on both processing and acquisition claims concerning Chunk-and-Pass processing, acquisition shaded upper triangle) combine to shape the structure of multiple inputs arriving in quick succession. Similar limita- tions apply to the production of behavior: The cognitive system cannot plan detailed sequences of movements – a long sequence of commands planned far in advance would lead to severe interference and be forgotten before it could be carried out (Cooper & Shallice 2006; Miller et al. 1960). However, the cognitive system adopts several processing strategies to ameliorate the effects of the Now-or-Never bottleneck on perception and action. First, the cognitive system engages in eager processing: It must recode the rich perceptual input as it arrives to capture the key elements of the sensory information as eco- nomically, and as distinctively, as possible (e.g., Brown et al. 2007; Crowder & Neath 1991); and it must do so rapidly, before new input overwrites or interferes with the sensory information. This notion is a traditional one, dating back to early work on attention and sensory memory (e.g., Broadbent 1958; Coltheart 1980; Haber 1983; Sperling 1960; Treisman 1964). The resulting com- pressed representations are lossy: They provide only an ab- stract summary of the input, from which the rich sensory input cannot be recovered (e.g., Pani 2000). Evidence from the phenomena of change and inattentional blindness suggests that these compressed representations can be very selective (see Jensen et al. 2011 for a review), as exempli- ﬁed by a study in which half of the participants failed to notice that someone to whom they were giving directions, https://doi.org/10.1017/S0140525X1500031X Published online by Cambridge Christiansen & Chater: The Now-or-Never (Cowan 2000), into a single unit (corresponding to running times, dates, or human ages), and then grouped sequences of three to four of those chunks into larger chunks. Inter- estingly, SF also verbally produced items in overtly discern- ible chunks, interleaved with pauses, indicating how action also follows the reverse process (e.g., Lashley 1951; Miller 1956). The case of SF further demonstrates that low-level information is far better recalled when organized into higher-level structures than merely coded as an unorga- nized stream. Note, though, that lower-level information is typically forgotten; it seems unlikely that even SF could recall the speciﬁc visual details of the digits with which he was presented. More generally, the notion that percep- tion and action involve representational recoding at a suc- cession of distinct representational levels also ﬁts with a long tradition of theoretical and computational models in cognitive science and computer vision (e.g., Bregman 1990; Marr 1982; Miller et al. 1960; Zhu et al. 2010; see Gobet et al. 2001 for a review). Our perspective on repeat- ed multilevel compression is also consistent with data from functional magnetic resonance imaging (fMRI) and intra- cranial recordings, suggesting cortical hierarchies across vision and audition – from low-level sensory to high-level perceptual and cognitive areas – integrating information at progressively longer temporal windows (Hasson et al. 2008; Honey et al. 2012; Lerner et al. 2011). Third, to facilitate speedy chunking and hierarchical compression, the cognitive system employs anticipation, using prior information to constrain the recoding of current perceptual input (for reviews see Bar 2007; Clark 2013). For example, people see the exact same collection of pixels either as a hair dryer (when viewed as part of a bathroom scene) or as a drill (when embedded in a picture of a workbench) (Bar 2004). Therefore, using prior information to predict future input is likely to be es- sential to successfully encoding that future input (as well as helping us to react faster to such input). Anticipation allows faster, and hence more effective, recoding when on- coming information creates considerable time urgency. Such predictive processing will be most effective to the extent that the greatest possible amount of available infor- mation (across different types and levels of abstraction) is integrated as fast as possible. Similarly, anticipation is im- portant for action as well. For example, manipulating an Table 1. Summary of the Strategies Mechanisms Eager processing Lossy chunking Multiple levels of Hierarchical compression representation Anticipation Predictive processing 4 BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) https://doi.org/10.1017/S0140525X1500031X Published online by Cambridge Christiansen & to speciﬁc processing limitations, such as the Now-or- Never bottleneck (for this style of approach, see, e.g., Chater et al. 1998; Levy 2008). Here, though, our focus is primarily on mechanism rather than rationality. 3. Chunk-and-Pass language processing The ﬂeeting nature of linguistic input, in combination with the impressive speed with which words and signs are pro- duced, imposes a severe constraint on the language system: the Now-or-Never bottleneck. Each new incoming word or sign will quickly interfere with previous heard and seen input, providing a naturalistic version of the masking used in psychophysical experiments. How, then, is language comprehension possible? Why doesn’t interference between successive sounds (or signs) obliterate linguistic input before it can be understood? The answer, we suggest, is that our language system rapidly recodes this input into chunks, which are immediately passed to a higher level of linguistic representation. The chunks at this higher level are then themselves subject to the same Chunk-and-Pass procedure, resulting in progressively larger chunks of increasing linguistic abstraction. Crucially, given that the chunks recode increasingly larger stretches of input from lower levels of representation, the chunking process enables input to be maintained over ever-larger temporal windows. It is this repeated chunking of lower- level information that makes it possible for the language system to deal with the continuous deluge of input that, if not recoded, is rapidly lost. This chunking process is also what allows us to perceive speech at a much faster rate than nonspeech sounds (Warren et al. 1969): We have learned to chunk the speech stream. Indeed, we can easily understand (and sometimes even repeat back) sentences consisting of many tens of phonemes, despite our severe memory limitations for sequences of nonspeech sounds. What we are proposing is that during comprehension, the language system – similar to SF – must keep on chunk- ing the incoming information into increasingly abstract levels of representation to avoid being overwhelmed by the input. That is, the language system engages in eager processing when creating chunks. Chunks must be built right away, or memory for the input will be obliterated by interference from subsequent material. If a phoneme or syllable is recognized, then it is recoded as a chunk and passed to a higher level of linguistic abstraction. And once recoded, the information is no longer subject to inter- ference from further auditory input. A general principle of perception and memory is that interference arises primarily between overlapping representations (Crowder & Neath 1991; Treisman & Schmidt 1982); crucially, recoding avoids such overlap. For example, phonemes interfere with each other, but phonemes interfere very little with words. At each level of chunking, information from the pre- vious level(s) is compressed and passed up as chunks to the next level of linguistic representation, from sound-based chunks up to complex discourse elements.3 As a conse- quence, the rich detail of the original input can no longer be recovered from the chunks, although some key informa- tion remains (e.g., certain speaker characteristics; Nygaard et al. 1994; Remez et al. 1997). In production, the process is reversed: Discourse-level chunks are recursively broken down into subchunks of https://doi.org/10.1017/S0140525X1500031X Published online by Cambridge Christiansen & Chater: The Now-or-Never example, do classic models of parsing ﬁt within this frame- work? A wide range of psychologically inspired models in- volves some degree of incrementality of syntactic analysis, which can potentially support incremental interpretation (e.g., Phillips 1996; 2003; Winograd 1972). For example, the sausage machine parsing model (Frazier & Fodor 1978) proposes that a preliminary syntactic analysis is carried out phrase-by-phrase, but in complete isolation from semantic or pragmatic factors. But for a right-branch- ing language such as English, chunks cannot be built left- to-right, because the leftmost chunks are incomplete until later material has been encountered. Frameworks from Kimball (1973) onward imply “stacking up” incomplete constituents that may then all be resolved at the end of the clause. This approach runs counter to the memory con- straints imposed by the Now-or-Never bottleneck. Recon- ciling right-branching with incremental chunking and processing is one motivation for the ﬂexible constituency of combinatory categorial grammar (e.g., Steedman 1987; 2000; see also Johnson-Laird 1983). With respect to comprehension, considerable evidence supports incremental interpretation, going back more than four decades (e.g., Bever 1970; Marslen-Wilson 1975). The language system uses all available information to rapidly integrate incoming information as quickly as pos- sible to update the current interpretation of what has been said so far. This process includes not only sentence-internal information about lexical and structural biases (e.g., Farmer et al. 2006; MacDonald 1994; Trueswell et al. 1993), but also extra-sentential cues from the referential and pragmatic context (e.g., Altmann & Steedman 1988; Thornton et al. 1999) as well as the visual environment and world knowledge (e.g., Altmann & Kamide 1999; Tanenhaus et al. 1995). As the incoming acoustic informa- tion is chunked, it is rapidly integrated with contextual in- formation to recognize words, consistent with a variety of data on spoken word recognition (e.g., Marslen-Wilson 1975; van den Brink et al. 2001). These words are then, in turn, chunked into larger multiword units, as evidenced by recent studies showing sensitivity to multiword sequenc- es in online processing (e.g., Arnon & Christiansen 2007b; Siyanova-Chanturia et al. 2011; Trem- blay & Baayen 2010; Tremblay et al. 2011), and subse- quently further integrated with pragmatic context into discourse-level structures. Turning to production, we start by noting the powerful intuition that we speak “into the void” – that is, that we plan only a short distance ahead. Indeed, experimental studies suggest that, for example, when producing an utter- ance involving several noun phrases, people plan just one (Smith & Wheeldon 1999), or perhaps two, noun phrases ahead (Konopka 2012), and they can modify a message during production in the light of new perceptual input (Brown-Schmidt & Konopka 2015). Moreover, speech- error data (e.g., Cutler 1982) reveal that, across representa- tional levels, errors tend to be highly local: Phonological, morphemic, and syntactic errors apply to neighboring chunks within each level (where material may be moved, swapped, or deleted). Consequently, speech planning appears to involve just a small number of chunks – the number of which may be similar across linguistic levels – but which covers different amounts of time depending on the linguistic level in question. For example, planning in- volving chunks at the level of intonational bursts stretches 6 BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) https://doi.org/10.1017/S0140525X1500031X Published online by Cambridge Christiansen & for example, by increased reading times (e.g., Trueswell et al. 1994) and distinctive patterns of brain activity (as measured by ERPs; Swaab et al. 2003). Accordingly, when the input is ambiguous, the language system may require later input to recognize previous elements of the speech stream successfully. The Now-or-Never bottleneck requires that such online “right-context effects” be highly local because raw perceptual input will be lost if it is not rapidly identiﬁed (e.g., Dahan 2010). Right-context effects may arise where the language system can delay res- olution of ambiguity or use underspeciﬁed representations that do not require resolving the ambiguity right away. Sim- ilarly, cataphora, in which, for example, a referential pronoun occurs before its referent (e.g., “He is a nice guy, that John”) require the creation of an underspeciﬁed entity (male, animate) when he is encountered, which is re- solved to be coreferential with John only later in the sen- tence (e.g., van Gompel & Liversedge 2003). Overall, the Now-or-Never bottleneck implies that the processing system will build the most abstract and complete represen- tation that is justiﬁed, given the linguistic input.6 Of course, outside of experimental studies, background knowledge, visual context, and prior discourse will provide powerful cues to help resolve ambiguities in the signal, allowing the system rapidly to resolve many apparent ambiguities without incurring a substantial danger of “garden-pathing.” Indeed, although syntactic and lexical ambiguities have been much studied in psycholinguistics, increasing evidence indicates that garden paths are not a major source of processing difﬁculty in practice (e.g., Fer- reira 2008; Jaeger 2010; Wasow & Arnold 2003).7 For example, Roland et al. (2006) reported corpus analyses showing that, in naturally occurring language, there is gen- erally sufﬁcient information in the sentential context before Figure 2. Chunk-and-Pass processing across a variety of increasingly abstract levels of linguistic representations which information can be maintained increases, as indicated This process is reversed in production planning, in which units, from a discourse-level message to the motor commands representations correspond to longer chunks of linguistic abstraction. Production processes may further serve as down information in comprehension. (Note that the names https://doi.org/10.1017/S0140525X1500031X Published online by Cambridge Christiansen & Chater: The Now-or-Never next level up: Local groups of words must be chunked into larger units, possibly phrases or other forms of multiword sequences. Subsequent chunking then recodes these repre- sentations into higher-level discourse structures (that may themselves be chunked further into even more abstract representational structures beyond that). Similarly, produc- tion requires running the process in reverse, starting with the intended message and gradually decoding it into in- creasingly more speciﬁc chunks, eventually resulting in the motor programs necessary for producing the relevant speech or sign output. As we discuss in section 3.3, the pro- duction process may further serve as the basis for predic- tion during comprehension (allowing higher-level information to inﬂuence the processing of current input). More generally, our account is agnostic with respect to the speciﬁc characterization of the various levels of linguis- tic representation8 (e.g., whether sound-based chunks take the form of phonemes, syllables, etc.). What is central for the Chunk-and-Pass account: some form of sound-based level of chunking (or visual-based in the case of sign lan- guage), and a sequence of increasingly abstract levels of chunked representations into which the input is continually recoded. A key theoretical implication of Chunk-and-Pass pro- cessing is that the multiple levels of linguistic representa- tion, typically assumed in the language sciences, are a necessary by-product of the Now-or-Never bottleneck. Only by compressing the input into chunks and passing them to increasingly abstract levels of linguistic representa- tion can the language system deal with the rapid onslaught of incoming information. Crucially, though, our perspective also suggests that the different levels of linguistic represen- tations do not have a true part–whole relationship with one another. Unlike in the case of SF, who learned strategies to perfectly unpack chunks from within chunks to reproduce the original string of digits, language comprehension typi- cally employs lossy compression to chunk the input. That is, higher-level chunks will not in general contain complete copies of lower-level chunks. Indeed, as speech input is encoded into ever more abstract chunks, increasing amounts of low-level information will typically be lost. Instead, as in perception (e.g., Haber 1983), there is greater representational underspeciﬁcation with higher levels of representation because of the repeated process of lossy compression.9 Thus, we would expect a growing in- volvement of extralinguistic information, such as perceptu- al input and world knowledge, in processing higher levels of linguistic representation (see, e.g., Altmann & Kamide 2009). Whereas our account proposes a lossy hierarchy across levels of linguistic representation, only a very small number of chunks are represented within a level: other- wise, information is rapidly lost due to interference. This has the crucial implication that chunks within a given level can interact only locally. For example, acoustic infor- mation must rapidly be coded in a non-acoustic form, say, in terms of phonemes; but this is only possible if phonemes correspond to local chunks of acoustic input. The process- ing bottleneck therefore enforces a strong pressure toward local dependencies within a given linguistic level. Impor- tantly, though, this does not imply that linguistic relations are restricted only to adjacent elements but, instead, that they may be formed between any of the small number of elements maintained at a given level of representation. 8 BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) https://doi.org/10.1017/S0140525X1500031X Published online by Cambridge Christiansen & 3.3. Implications of Strategy 3: Predictive language processing We have already noted that, to be able to chunk incoming information as fast and as accurately as possible, the lan- guage system exploits multiple constraints in parallel across the different levels of linguistic representation. Such cues may be used not only to help disambiguate pre- vious input, but also to generate expectations for what may come next, potentially further speeding up Chunk-and-Pass processing. Computational considerations indicate that simple statistical information gleaned from sentences provides powerful predictive constraints on language com- prehension and can explain many human processing results (e.g., Christiansen & Chater 1999; Christiansen & MacDonald 2009; Elman 1990; Hale 2006; Jurafsky 1996; Levy 2008; Padó et al. 2009). Similarly, eye-tracking data suggest that comprehenders routinely use a variety of sources of probabilistic information – from phonological cues to syntactic context and real-world knowledge – to an- ticipate the processing of upcoming words (e.g., Altmann & Kamide 1999; Farmer et al. 2011; Staub & Clifton 2006). Results from event-related potential experiments indicate that rather speciﬁc predictions are made for upcoming input, including its lexical category (Hinojosa et al. 2005), grammatical gender (Van Berkum et al. 2005; Wicha et al. 2004), and even its onset phoneme (DeLong et al. 2005) and visual form (Dikker et al. 2010). Accordingly, there is a growing body of evidence for a substantial role of prediction in language processing (for reviews, see, e.g., Federmeier 2007; Hagoort 2009; Kamide 2008; Kutas et al. 2014; Pickering & Garrod 2007) and evidence that such language prediction occurs in children as young as 2 years of age (Mani & Huettig 2012). Importantly, as well as exploiting statistical relations within a representational level, predictive processing allows top-down information from higher levels of linguistic representation to rapidly constrain the processing of the input at lower levels.10 From the viewpoint of the Now-or-Never bottleneck, prediction provides an opportunity to begin Chunk-and- Pass processing as early as possible: to constrain represen- tations of new linguistic material as it is encountered, and even incrementally to begin recoding predictable linguistic input before it arrives. This viewpoint is consistent with recent suggestions that the production system may be pressed into service to anticipate upcoming input (e.g., Pickering & Garrod 2007; 2013a). Chunk-and-Pass pro- cessing implies that there is practically no possibility for going back once a chunk is created because such backtrack- ing tends to derail processing (e.g., as in the classic garden path phenomena mentioned above). This imposes a Right- First-Time pressure on the language system in the face of linguistic input that is highly locally ambiguous.11 The con- tribution of predictive modeling to comprehension is that it facilitates local ambiguity resolution while the stimulus is still available. Only by recruiting multiple cues and integrat- ing these with predictive modeling is it possible to resolve local ambiguities quickly and correctly. Right-First-Time parsing ﬁts with proposals such as that by Marcus (1980), where local ambiguity resolution is delayed until later disambiguating information arrives, and models in which aspects of syntactic structure may be underspeciﬁed, therefore not requiring the ambiguity to be resolved (e.g., Gorrell 1995; Sturt & Crocker 1996). https://doi.org/10.1017/S0140525X1500031X Published online by Cambridge Christiansen & Chater: The Now-or-Never abstract levels of representation, can deal with the on- slaught of linguistic input in the face of the severe memory constraints of the Now-or-Never bottleneck. We suggest that a productive line of future work is to consider the extent to which existing models of language are compat- ible with these constraints, and to use these properties to guide the creation of new theories of language processing. 4. Acquisition is learning to process If speaking and understanding language involves and-Pass processing, then acquiring a language learning how to create and integrate the right rapidly, before current information is overwritten by new input. Indeed, the ability to quickly process linguistic input – which has been proposed as an indicator of chunk- ing ability (Jones 2012) – is a strong predictor of language acquisition outcomes from infancy to middle childhood (Marchman & Fernald 2008). The importance of this process is also introspectively evident to anyone acquiring a second language: Initially, even segmenting the speech stream into recognizable sounds can be challenging, let alone parsing it into words or processing morphology and grammatical relations rapidly enough to build a seman- tic interpretation. The ability to acquire and rapidly deploy a hierarchy of chunks at different linguistic scales is parallel to the ability to chunk sequences of motor movements, numbers, or chess positions: It is a skill, built up by contin- ual practice. Viewing language acquisition as continuous with other types of skill learning is very different from the standard formulation of the problem of language acquisition in lin- guistics. There, the child is viewed as a linguistic theorist who has the goal of inferring an abstract grammar from a corpus of example sentences (e.g., Chomsky 1957; 1965) and only secondarily learning the skill of generating and un- derstanding language. But perhaps the child is not a mini- linguist. Instead, we suggest that language acquisition is nothing more than learning to process: to turn meanings into streams of sound or sign (when generating language), and to turn streams of sound or sign back into meanings (when understanding language). If linguistic input is available only ﬂeetingly, then any learning must occur while that information is present; that is, learning must occur in real time, as the Chunk- and-Pass process takes place. Accordingly, any modiﬁca- tions to the learner’s cognitive system in light of processing must, according to the Now-or-Never bottleneck, occur at the time of processing. The learner must learn to chunk the input appropriately – to learn to recode the input at succes- sively more abstract linguistic levels; and to do this requires, of course, learning the structure of the language being spoken. But how is this structure learned? We suggest that, in language acquisition, as in other areas of perceptual-motor learning, people learn by processing, and that past processing leaves traces that can facilitate future processing. What, then, is retained, so that language processing gradually improves? We can consider various possibilities: For example, the weights of a connectionist network can be updated online in the light of current pro- cessing (Rumelhart et al. 1986a); in an exemplar-based model, traces of past examples can be reused in the future (e.g., Hintzman 1988; Logan 1988; Nosofsky 10 BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) https://doi.org/10.1017/S0140525X1500031X Published online by Cambridge Christiansen & processing time: Phonological information that is not chunked at the morphological level and beyond will be obliterated by oncoming phonological material.12 So, if learning is shaped by the Now-or-Never bottle- neck, then linguistic input must, when it is encountered, be recoded successively at increasingly abstract linguistic levels if it is to be retained at all – a constraint imposed, we argue, by basic principles of memory. Crucially, such in- formation is not, therefore, in a suitably “neutral” format to allow for the discovery of previously unsuspected linguistic regularities. In a nutshell, the lossy compression of the lin- guistic input is achieved by applying the learner’s current model of the language. But information that would point toward a better model of the language (if examined in ret- rospect) will typically be lost (or, at best, badly obscured) by this compression, precisely because those regularities are not captured by the current model of the language. Suppose, for example, that we create a lossy encoding of language using a simple, context-free phrase structure grammar that cannot handle, say, noun-verb agreement. The lossy encoding of the linguistic input produced using this grammar will provide a poor basis for learning a more sophisticated grammar that includes agreement – precisely because agreement information will have been thrown away. So the Now-or-Never bottleneck rules out the possibility that the learner can survey a neutral database of linguistic material, to optimize its model of the language. The emphasis on online learning does not, of course, rule out the possibility that any linguistic material that is re- membered may subsequently be used to inform learning. But according to the present viewpoint, any further learn- ing requires reprocessing that material. So if a child comes to learn a poem, song, or story verbatim, the child might extract more structure from that material by mental re- hearsal (or, indeed, by saying it aloud). The online learning constraint is that material is learned only when it is being processed – ruling out any putative learning processes that involve carrying out linguistic analyses or compiling statistics over a stored corpus of linguistic material. If this general picture of acquisition as learning-to- process is correct, then we should expect the exploitation of memory to require “replaying” learned material, so that it can be re-processed. Thus, the application of memory itself requires passing through the Now-or- Never bottleneck – there is no way of directly interrogating an internal database of past experience; indeed, this view- point ﬁts with our subjective sense that we need to “bring to mind” past experiences or rehearse verbal material to process it further. Interestingly, there is now also substan- tial neuroscientiﬁc evidence that replay does occur (e.g., in rat spatial learning, Carr et al. 2011). Moreover, it has long been suggested that dreaming may have a related function (here using “reverse” learning over “ﬁctional” input to elim- inate spurious relationships identiﬁed by the brain, Crick & Mitchison 1983; see Hinton & Sejnowki 1986, for a closely related computational model). Deﬁcits in the ability to replay material would, in this view, lead to consequent def- icits in memory and inference; consistent with this view- point, Martin and colleagues have argued that rehearsal deﬁcits for phonological pattern and semantic information may lead to difﬁculties in the long-term acquisition and re- tention of word forms and word meanings, respectively, and their use in language processing (e.g., Martin & He 2004; Martin et al. 1994). In summary, then, language https://doi.org/10.1017/S0140525X1500031X Published online by Cambridge Christiansen & Chater: The Now-or-Never By contrast, the principle of local learning is respected by other approaches. For example, item-based (Tomasello 2003), connectionist (e.g., Chang et al. 1999; Elman 1990; MacDonald & Christiansen 2002),15 exemplar- based (e.g., Bod 2009), and other usage-based (e.g., Arnon & Snider 2010; Bybee 2006) accounts of language acquisition tie learning and processing together – and assume that language is acquired piecemeal, in the absence of an underlying Bauplan. Such accounts, based on local learning, provide a possible explanation of the fre- quency effects that are found at all levels of language pro- cessing and acquisition (e.g., Bybee 2007; Bybee & Hopper 2001; Ellis 2002; Tomasello 2003), analogous to exemplar- based theories of how performance speeds up with practice (Logan 1988). The local nature of learning need not, though, imply that language has no integrated structure. Just as in perception and action, local chunks can be deﬁned at many different levels of abstraction, including highly abstract patterns, for example, governing subject, verb, and object; and gen- eralizations from past processing to present processing will operate across all of these levels. Therefore, in generating Page 1 of 72 Morten H. Christiansen Department of Psychology, Cornell University, Ithaca, NY 14853 The Interacting Minds Centre, Aarhus University, 8000 Aarhus C, Denmark Haskins Laboratories, New Haven, CT 06511 [email protected] Nick Chater Behavioural Science Group, Warwick Business School, University of Warwick, Coventry, CV4 7AL, United Kingdom [email protected] previous material. How, then, can the brain deal successfully with the deal with this “Now-or-Never” bottleneck, the brain must compress and has strong implications for the nature of language processing: (1) the linguistic input; (2) as the bottleneck recurs at each new representational level, representation; and (3) the language system must deploy all available ambiguities are dealt with “Right-First-Time”; once the original input is lost, is “Chunk-and-Pass” processing. Similarly, language learning must also occur is learning to process, rather than inducing, a grammar. Moreover, this and other aspects of language change. Chunk-and-Pass processing including its multilevel representational structure and duality of patterning. between psycholinguistics and linguistic theory. More generally, we outline a inquiries into language processing, language acquisition, and language language acquisition; language evolution; language processing; stics 5. The nature of what is learned during language acquisition; 6. The degree to which language acquisition involves item-based generalization; 7. The degree to which language change proceeds item- by-item; 8. The connection between grammar and lexical knowledge; 9. The relationships between syntax, semantics, and pragmatics. Thus, we argue that the Now-or-Never bottleneck has fundamental implications for key questions in the language sciences. The consequences of this constraint are, more- over, incompatible with many theoretical positions in lin- guistic, psycholinguistic, and language acquisition research. Note, however, that arguing that a phenomenon arises from the Now-or-Never bottleneck does not necessarily undermine alternative explanations of that phenomenon (although it may). Many phenomena in language may simply be overdetermined. For example, we argue that incrementality (point 3, above) follows from the Now-or- Never bottleneck. But it is also possible that, irrespective of memory constraints, language understanding would still 1 University Press bottleneck: A fundamental constraint on language 2. The Now-or-Never bottleneck are, the properties of Language input is highly transient. Speech sounds, like other auditory signals, are short-lived. Classic speech per- ception studies have shown that very little of the auditory trace remains after 100 ms (Elliott 1962), with more recent studies indicating that much acoustic information already is lost after just 50 ms (Remez et al. 2010). Similar- ly, and of relevance for the perception of sign language, studies of visual change detection suggest that the ability to maintain visual information beyond 60–70 ms is very limited (Pashler 1988). Thus, sensory memory for language input is quickly overwritten, or interfered with, by new in- coming information, unless the perceiver in some way pro- cesses what is heard or seen. The problem of the rapid loss of the speech or sign signal is further exacerbated by the sheer speed of the incoming linguistic input. At a normal speech rate, speakers produce about 10–15 phonemes per second, corresponding to roughly 5–6 syllables every second or 150 words per minute (Studdert-Kennedy 1986). However, the resolution rapidly to create of the human auditory system for discrete auditory events is only about 10 sounds per second, beyond which the sounds fuse into a continuous buzz (Miller & Taylor 1948). Conse- quently, even at normal rates of speech, the language system needs to work beyond the limits of auditory tempo- ral resolution for nonspeech stimuli. Remarkably, listeners can learn to process speech in their native language at up to twice the normal rate without much decrement in compre- hension (Orr et al. 1965). Although the production of signs appears to be slower than the production of speech (at least when comparing the production of ASL signs and spoken English; Bellugi & Fischer 1972), signed words are still very brief visual events, with the duration of an ASL syllable being about a quarter of a second (Wilbur & Nolkn 1986).2 Making matters even worse, our memory for sequences of auditory input is also very limited. For example, it has been known for more than four decades that naïve listeners are unable to correctly recall the temporal order of just four Psychology Program at distinct sounds – for example, hisses, buzzes, and tones – even when they are perfectly able to recognize and label each individual sound in isolation (Warren et al. 1969). Our ability to recall well-known auditory stimuli is not sub- stantially better, ranging from 7 ± 2 (Miller 1956) to 4 ± 1 (Cowan 2000). A similar limitation applies to visual memory for sign language (Wilson & Emmorey 2006). The poor memory for auditory and visual information, com- bined with the fast and ﬂeeting nature of linguistic input, imposes a fundamental constraint on the language Science, system: the Now-or-Never bottleneck. If the input is not processed immediately, new information will quickly over- write it. Importantly, the Now-or-Never bottleneck is not unique to language but applies to other aspects of perception and action as well. Sensory memory is rich in detail but decays rapidly unless it is further processed (e.g., Cherry 1953; Coltheart 1980; Sperling 1960). Likewise, short-term memory for auditory, visual, and haptic information is also limited and subject to interference from new input focusing and language. (e.g., Gallace et al. 2006; Haber 1983; Pavani & Turatto 2008). Moreover, our cognitive ability to respond to sensory input is further constrained in a serial (Sigman & Dehaene 2005) or near-serial (Navon & Miller 2002) manner, severely restricting our capacity for processing University Press Chater: The Now-or-Never bottleneck: A fundamental constraint on language relations between claims are denoted by arrows. The Now-or-Never and action that is independent of its application to the language system (and for language (indicated inside the diamond) stem from the Now-or-Never processing, with key consequences for language acquisition. The impact of the together further shapes language change. All three of these interlinked as processing, and item-based language change (grouped together in the language itself. face-to-face, was surreptitiously exchanged for a complete- ly different person (Simons & Levin 1998). Information not encoded in the short amount of time during which the sensory information is available will be lost. Second, because memory limitations also apply to recoded representations, the cognitive system further chunks the compressed encodings into multiple levels of representation of increasing abstraction in perception, and decreasing levels of abstraction in action. Consider, for example, memory for serially ordered symbolic infor- mation, such as sequences of digits. Typically, people are quickly overloaded and can recall accurately only the last three or four items in a sequence (e.g., Murdock 1968). But it is possible to learn to rapidly encode, and recall, long random sequences of digits, by successively chunking such sequences into larger units, chunking those chunks into still larger units, and so on. Indeed, an extended study of a single individual, SF (Ericsson et al. 1980), showed that repeated chunking in this manner makes it possible to recall with high accuracy sequences containing as many as 79 digits. But, crucially, this strategy requires learning to encode the input into multiple, successive, and distinct levels of representations – each sequence of chunks at one level must be shifted as a single chunk to a higher level before more chunks interfere with or overwrite the initial chunks. Indeed, SF chunked sequences of three or four digits, the natural chunk size in human memory BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 3 University Press bottleneck: A fundamental constraint on language object requires anticipating the grip force required to deal with the loads generated by the accelerations of the object. Grip force is adjusted too rapidly during the manip- ulation of an object to rely on sensory feedback (Flanagan & Wing 1997). Indeed, the rapid prediction of the sensory consequences of actions (e.g., Poulet & Hedwig 2006) sug- gests the existence of so-called forward models, which allow the brain to predict the consequence of its actions in real time. Many have argued (e.g., Wolpert et al. 2011; see also Clark 2013; Pickering & Garrod 2013a) that forward models are a ubiquitous feature of the computational ma- chinery of motor control and more broadly of cognition. The three processing strategies we mention here – eager processing, computing multiple representational levels, and anticipation – provide the cognitive system with impor- tant means to cope with the Now-or-Never bottleneck. Next, we argue that the language system implements similar strategies for dealing with the here-and-now nature of linguistic input and output, with wide-reaching and fundamental implications for language processing, ac- quisition and change as well as for the structure of language itself. Speciﬁcally, we propose that our ability to deal with sequences of linguistic information is the result of what we call “Chunk-and-Pass” processing, by which the lan- guage system can ameliorate the effects of the Now-or- Never bottleneck. More generally, our perspective offers a framework within which to approach language compre- hension and production. Table 1 summarizes the impact of the Now-or-Never bottleneck on perception/action and language. The style of explanation outlined here, focusing on pro- cessing limitations, contrasts with a widespread interest in rational, rather processing-based, explanations in cognitive science (e.g., Anderson 1990; Chater et al. 2006 Grifﬁths & Tenenbaum 2009; Oaksford & Chater 1998; 2007; Tenen- baum et al. 2011), including language processing (Gibson et al. 2013; Hale 2001; 2006; Piantadosi et al. 2011). Given the fundamental nature of the Now-or-Never bottle- neck, we suggest that such explanations will be relevant only for explaining language use insofar as they incorporate processing constraints. For example, in the spirit of rational analysis (Anderson 1990) and bounded rationality (Simon 1982), it is natural to view aspects of language processing and structure, as described below, as “optimal” responses Now-or-Never bottleneck’s implications for perception/action and language Perception and action Language Chunking in memory and action (Lashley Incremental interpretation (Bever 1970) 1951; Miller 1956); lossy descriptions and production (Meyer 1996); multiple (Pani 2000) constraints satisfaction (MacDonald et al. 1994) Hierarchical memory (Ericsson et al. Multiple levels of linguistic structure 1980), action (Miller et al. 1960), (e.g., sound-based, lexical, phrasal, problem solving (Gobet et al. 2001) discourse); local dependencies (Hawkins 2004) Fast, top-down visual processing (Bar Syntactic prediction (Jurafsky 1996); 2004); forward models in motor multiple-cue integration (Farmer et al. control (Wolpert et al. 2011); 2006); visual world (Altmann & predictive coding (Clark 2013) Kamide 1999) University Press Chater: The Now-or-Never bottleneck: A fundamental constraint on language decreasing linguistic abstraction until the system arrives at chunks with sufﬁcient information to drive the articula- tors (either the vocal apparatus or the hands). As in com- prehension, memory is limited within a given level of representation, resulting in potential interference between the items to be produced (e.g., Dell et al. 1997). Thus, higher-level chunks tend to be passed down immedi- ately to the level below as soon as they are “ready,” leading to a bias toward producing easy-to-retrieve utterance com- ponents before harder-to-retrieve ones (e.g., Bock 1982; MacDonald 2013). For example, if there is a competition between two possible words to describe an object, the word that is retrieved more ﬂuently will immediately be passed on to lower-level articulatory processes. To further facilitate production, speakers often reuse chunks from the ongoing conversation, and those will be particularly rapidly available from memory. This phenomenon is re- ﬂected by the evidence for lexical (e.g., Meyer & Schvane- veldt 1971) and structural priming (e.g., Bock 1986; Bock & Loebell 1990; Pickering & Branigan 1998; Potter & Lom- bardi 1998) within individuals as well as alignment across conversational partners (Branigan et al. 2000; Pickering & Garrod 2004); priming is also extensively observed in text corpora (Hoey 2005). As noted by MacDonald (2013), these memory-related factors provide key constraints on the production of language and contribute to cross-linguis- tic patterns of language use.4 A useful analogy for language production is the notion of “just-in-time”5 stock control, in which stock inventories are kept to a bare minimum during the manufacturing process (Ohno & Mito 1988). Similarly, the Now-or-Never bottle- neck requires that, for example, low-level phonetic or artic- ulatory decisions not be made and stored far in advance and then reeled off during speech production, because any buffer in which such decisions can safely be stored would quickly be subject to interference from subsequent materi- al. So the Now-or-Never bottleneck requires that once de- tailed production information has been assembled, it be executed straightaway, before it can be obliterated by the oncoming stream of later low-level decisions, similar to what has been suggested for motor planning (Norman & Shallice 1986; see also MacDonald 2013). We call this pro- posal Just-in-Time language production. 3.1. Implications of Strategy 1: Incremental processing Chunk-and-Pass processing has important implications for comprehension and production: It requires that both take place incrementally. In incremental processing, representa- tions are built up as rapidly as possible as the input is en- countered. By contrast, one might, for example, imagine a parser that waits until the end of a sentence before begin- ning syntactic analysis, or that meaning is computed only once syntax has been established. However, such process- ing would require storing a stream of information at a single level of representation, and processing it later; but given the Now-or-Never bottleneck, this is not possible because of severe interference between such representa- tions. Therefore, incremental interpretation and produc- tion follow directly from the Now-or-Never constraint on language. To get a sense of the implications of Chunk-and-Pass processing, it is interesting to relate this perspective to spe- ciﬁc computational principles and models. How, for BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 5 University Press bottleneck: A fundamental constraint on language over considerably longer periods of time than planning at the syllabic level. Similarly, processes of reduction to facil- itate production (e.g., modifying the speech signal to make it easier to produce, such as reducing a vowel to a schwa, or shortening or eliminating phonemes) can be observed across different levels of linguistic representation, from in- dividual words (e.g., Gahl & Garnsey 2004; Jurafsky et al. 2001) to frequent multiword sequences (e.g., Arnon & Cohen Priva 2013; Bybee & Schiebman 1999). Some may object that the Chunk-and-Pass perspective’s strict notion of incremental interpretation and production leaves the language system vulnerable to the rather sub- stantial ambiguity that exists across many levels of linguistic representation (e.g., lexical, syntactic, pragmatic). So-called garden path sentences such as the famous “The horse raced past the barn fell” (Bever 1970) show that people are vul- nerable to at least some local ambiguities: They invite com- prehenders to take the wrong interpretive path by treating raced as the main verb, which leads them to a dead end. Only when the ﬁnal word, fell, is encountered does it become clear that something is wrong: raced should be in- terpreted as a past participle that begins a reduced relative clause (i.e., the horse [that was] raced past the barn fell). The difﬁculty of recovery in such garden path sentences in- dicates how strongly the language system is geared toward incremental interpretation. Viewed as a processing problem, garden paths occur when the language system resolves an ambiguity incorrect- ly. But in many cases, it is possible for an underspeciﬁed representation to be constructed online, and for the ambi- guity to be resolved later when further linguistic input arrives. This type of case is consistent with Marr’s (1976) proposal of the “principle of least commitment,” that the perceptual system resolves ambiguous perceptual input only when it has sufﬁcient data to make it unlikely that such decisions will subsequently have to be reversed. Given the ubiquity of local ambiguity in language, such underspeciﬁcation may be used very widely in language processing. Note, however, that because of the severe con- straints the Now-or-Never bottleneck imposes, the lan- Snider 2010; Reali & guage system cannot adopt broad parallelism to further minimize the effect of ambiguity (as in many current prob- abilistic theories of parsing, e.g., Hale 2006; Jurafsky 1996; Levy 2008). Rather, within the Chunk-and-Pass account, the sole role for parallelism in the processing system is in deciding how the input should be chunked; only when con- ﬂicts concerning chunking are resolved can the input be passed on to a higher-level representation. In particular, we suggest that competing higher-level codes cannot be ac- tivated in parallel. This picture is analogous to Marr’s prin- ciple of least commitment of vision: Although there might be temporary parallelism to resolve conﬂicts about, say, correspondence between dots in a random-dot stereogram, it is not possible to create two conﬂicting three-dimensional surfaces in parallel, and whereas there may be parallelism over the interpretation of lines and dots in an image, it is not possible to see something as both a duck and a rabbit simultaneously. More broadly, higher-level representations are constructed only when sufﬁcient evidence has accrued that they are unlikely later to need to be replaced (for stimuli outside the psychological laboratory, at least). Maintaining, and later resolving, an underspeciﬁed rep- resentation will create local memory and processing demands that may slow down processing, as is observed, University Press Chater: The Now-or-Never bottleneck: A fundamental constraint on language the occurrence of an ambiguous verb to specify the correct interpretation of that verb. Moreover, eye-tracking studies have demonstrated that dialogue partners exploit both con- versational context and task demands to constrain interpre- tations to the appropriate referents, thereby side-stepping effects of phonological and referential competitors (Brown-Schmidt & Konopka 2011) that have otherwise been shown to impede language processing (e.g., Allo- penna et al. 1998). These dialogue-based constraints also mitigate syntactic ambiguities that might otherwise disrupt processing (Brown-Schmidt & Tanenhaus 2008). This information may be further combined with other probabilistic sources of information such as prosody (e.g., Kraljic & Brennan 2005; Snedeker & Trueswell 2003) to resolve potential ambiguities within a minimal temporal window. Finally, it is not clear that undetected garden path errors are costly in normal language use, because if communication appears to break down, the listener can repair the communication by requesting clariﬁcation from the dialogue partner. 3.2. Implications of Strategy 2: Multiple levels of linguistic structure The Now-or-Never bottleneck forces the language system to compress input into increasingly abstract chunks that cover progressively longer temporal intervals. As an example, consider the chunking of the input illustrated in Figure 2. The acoustic signal is ﬁrst chunked into higher- level sound units at the phonological level. To avoid interference between local sound-based units, such as pho- nemes or syllables, these units are further recoded as rapidly as possible into higher-level units such as mor- phemes or words. The same phenomenon occurs at the linguistic levels in spoken language. As input is chunked and passed up to in comprehension, from acoustics to discourse, the temporal window over by the shaded portion of the bars associated with each linguistic level. chunks are broken down into sequences of increasingly short and concrete for producing a speciﬁc articulatory output. More-abstract material, with greater look-ahead in production at higher levels of the basis for predictions to facilitate comprehension and thus provide top- and number of levels are for illustrative purposes only.) BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 7 University Press bottleneck: A fundamental constraint on language Such representational locality is exempliﬁed across differ- ent linguistic levels by the local nature of phonological pro- cesses from reduction, assimilation, and fronting, including more elaborate phenomena such as vowel harmony (e.g., Nevins 2010), speech errors (e.g., Cutler 1982), the imme- diate proximity of inﬂectional morphemes and the verbs to which they apply, and the vast literature on the processing difﬁculties associated with non-local dependencies in sen- tence comprehension (e.g., Gibson 1998; Hawkins 2004). As noted earlier, the higher the level of linguistic represen- tation, the longer the limited time window within which in- formation can be chunked. Whereas dealing with just two center-embeddings at the sentential level is prohibitively difﬁcult (e.g., de Vries et al. 2011; Karlsson 2007), we are able to deal with up to four to six embeddings at the multi-utterance discourse level (Levinson 2013). This is because chunking takes place at a much longer time course at the discourse level compared with the sentence level, providing more time to resolve the relevant depend- ency relations before they are subject to interference. Finally, as indicated by Figure 2, processing within each level of linguistic representation takes place in parallel – but with a clear temporal component – as chunks are passed between levels. Note that, in the Chunk-and-Pass frame- work, it is entirely possible that linguistic input can simulta- neously, and perhaps redundantly, be chunked in more than one way. For example, syntactic chunks and intona- tional contours may be somewhat independent (Jackendoff 2007). Moreover, we should expect further chunking across different “channels” of communication, including visual input such as gesture and facial expressions. The Chunk-and-Pass perspective is compatible with a number of recent theoretical models of sentence compre- hension, including constraint-based approaches (e.g., Mac- Donald et al. 1994; Trueswell & Tanenhaus 1994) and certain generative accounts (e.g., Jackendoff’s paral- lel architecture). Intriguingly, fMRI data from adults (Dehaene-Lambertz et al. 2006a) and infants (Dehaene- Lambertz et al. 2006b) indicate that activation responses to a single sentence systematically slows down when moving away from the primary auditory cortex, either back toward Wernicke’s area or forward toward Broca’s area, consistent with increasing temporal windows for chunking when moving from phonemes to words to phrases. Indeed, the cortical circuits processing auditory input, from lower (sensory) to higher (cognitive) areas, follow different temporal windows, sensitive to more and more abstract levels of linguistic information, from pho- nemes and words to sentences and discourse (Lerner et al. 2011; Stephens et al. 2013). Similarly, the reverse process, going from a discourse-level representation of the intended message to the production of speech (or sign) across parallel linguistic levels, is compatible with several current models of language production (e.g., Chang et al. 2006; Dell et al. 1997; Levelt 2001). Data from intracranial recordings during language production are consistent with different temporal windows for chunk decoding at the word, morphemic, and phonological levels, separated by just over a tenth of a second (Sahin et al. 2009). These results are compatible with our proposal that incremental processing in comprehension and produc- tion takes place in parallel across multiple levels of linguis- tic representation, each with a characteristic temporal window. University Press Chater: The Now-or-Never bottleneck: A fundamental constraint on language It also parallels Marr’s (1976) principle of least commit- ment, as we mentioned earlier, according to which the per- ceptual system should, as far as possible, only resolve perceptual ambiguities when sufﬁciently conﬁdent that they will not need to be undone. Moreover, it is compatible with the ﬁne-grained weakly parallel interactive model (Altmann & Steedman 1988) in which possible chunks are proposed, word-by-word, by an autonomous parser and one is rapidly chosen using top-down information. To facilitate chunking across multiple levels of represen- tation, prediction takes place in parallel across the different levels but at varying timescales. Predictions for higher-level chunks may run ahead of those for lower-level chunks. For example, most people simply answer “two” in response to the question “How many animals of each kind did Moses take on the Ark?” – failing to notice the semantic anomaly (i.e., it was Noah’s Ark, not Moses’ Ark) even in the absence of time pressure and when made aware that the sentence may be anomalous (Erickson & Matteson 1981). That is, anticipatory pragmatic and communicative considerations relating to the required response appear to trump lexical semantics. More generally, the time course of normal conversation may lead to an emphasis on more temporally extended higher-level predictions over lower- level ones. This may facilitate the rapid turn-taking that has been observed cross-culturally (Stivers et al. 2009) and which seems to require that listeners make quite spe- ciﬁc predictions about when the speaker’s current turn will ﬁnish (Magyari & De Ruiter 2012), as well as being able to quickly adapt their expectations to speciﬁc linguistic environments (Fine et al. 2013). We view the anticipation of turn-taking as one instance of the broader alignment that takes place between dialogue partners across all levels of linguistic representation (for a review, see Pickering & Garrod 2004). This dovetails with fMRI analyses indicating that although there are some comprehension- and production-speciﬁc brain areas, spa- tiotemporal patterns of brain activity are in general closely coupled between speakers and listeners (e.g., Silbert et al. 2014). In particular, Stephens et al. (2010) ob- served close synchrony between neural activations in speak- ers and listeners in early auditory areas. Speaker activations preceded those of listeners in posterior brain regions (in- cluding parts of Wernicke’s area), whereas listener activa- tions preceded those of speakers in the striatum and anterior frontal areas. In the Chunk-and-Pass framework, the listener lag primarily derives from delays caused by the chunking process across the various levels of linguistic representation, whereas the speaker lag predominantly re- ﬂects the listener’s anticipation of upcoming input, espe- cially at the higher levels of representation (e.g., pragmatics and discourse). Strikingly, the extent of the lis- tener’s anticipatory brain responses were strongly correlat- ed with successful comprehension, further underscoring the importance of prediction-based alignment for language processing. Indeed, analyses of real-time interactions show that alignment increases when the communicative task becomes more difﬁcult (Louwerse et al. 2012). By decreas- ing the impact of potential ambiguities, alignment thus makes processing as well as production easier in the face of the Now-or-Never bottleneck. We have suggested that only an incremental, predictive language system, continually building and passing on new chunks of linguistic material, encoded at increasingly BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 9 University Press bottleneck: A fundamental constraint on language 1986). Whatever the appropriate computational frame- work, the Now-or-Never bottleneck requires that language acquisition be viewed as a type of skill learning, such as learning to drive, juggle, play the violin, or play chess. Such skills appear to be learned through practicing the skill, using online feedback during the practice itself, al- though the consolidation of learning occurs subsequently (Schmidt & Wrisberg 2004). The challenge of language ac- quisition is to learn a dazzling sequence of rapid processing operations, rather than conjecturing a correct “linguistic theory.” Chunk- requires 4.1. Implications of Strategy 1: Online learning chunks The Now-or-Never bottleneck implies that learning can depend only on material currently being processed. As we have seen, this implication requires a processing strat- egy according to which modiﬁcation to current representa- tions (in this context, learning) occurs right away; in machine-learning terminology, learning is online. If learn- ing does not occur at the time of processing, the represen- tation of linguistic material will be obliterated, and the opportunity for learning will be gone forever. To facilitate such online learning, the child must learn to use all avail- able information to help constrain processing. The integra- tion of multiple constraints – or cues – is a fundamental component of many current theories of language acquisi- tion (see, e.g., contributions in Golinkoff et al. 2000; Morgan & Demuth 1996; Weissenborn & Höhle 2001; for a review, see Monaghan & Christiansen 2008). For example, second-graders’ initial guesses about whether a novel word refers to an object or an action are affected by that word’s phonological properties (Fitneva et al. 2009); 7-year-olds use visual context to constrain online sentence interpretation (Trueswell et al. 1999); and pre- schoolers’ language production and comprehension is con- strained by pragmatic factors (Nadig & Sedivy 2002). Thus, children learn rapidly to apply the multiple constraints used in incremental adult processing (Borovsky et al. 2012). Nonetheless, online learning contrasts with traditional approaches in which the structure of the language is learned ofﬂine by the cognitive system acquiring a corpus of past linguistic inputs and choosing the grammar or other model of the language that best ﬁts with those inputs. For example, in both mathematical and theoretical analysis (e.g., Gold 1967; Hsu et al. 2011; Pinker 1984) and in grammar-induction algorithms in machine learning and cognitive science, it is typically assumed that a corpus of language can be held in memory, and that the candidate grammar is successively adjusted to ﬁt the corpus as well as possible (e.g., Manning & Schütze 1999; Pereira & Schabes 1992; Redington et al. 1998). However, this ap- proach involves learning linguistic regularities (at, say, the morphological level), by storing and later surveying rele- vant linguistic input at a lower level of analysis (e.g., involv- ing strings of phonemes); and then attempting to determine which higher-level regularities best ﬁt the data- base of lower-level examples. There are a number of difﬁ- culties with this type of proposal – for example, that only a very rich lower-level representation (perhaps combined with annotations concerning relevant syntactic and seman- tic context) is likely to be a useful basis for later analysis. But more fundamentally, the Now-or-Never bottleneck re- quires that information be retained only if it is recoded at University Press Chater: The Now-or-Never bottleneck: A fundamental constraint on language acquisition involves learning to process, and generalizations can only be made over past processing episodes. 4.2. Implications of Strategy 2: Local learning Online learning faces a particularly acute version of a general learning problem: the stability-plasticity dilemma (e.g., Mermillod et al. 2013). How can new information be acquired without interfering with prior information? The problem is especially challenging because reviewing prior information is typically difﬁcult (because recalling earlier information interferes with new input) or impossible (where prior input has been forgotten). Thus, to a good ap- proximation, the learner can only update its model of the language in a way that responds to current linguistic input, without being able to review whether any updates are inconsistent with prior input. Speciﬁcally, if the learner has a global model of the entire language (e.g., a traditional grammar), the learner runs the risk of overﬁtting that model to capture regularities in the momentary lin- guistic input at the expense of damaging the match with past linguistic input. Avoiding this problem, we suggest, requires that learning be highly local, consisting of learning about speciﬁc rela- tionships between particular linguistic representations. New items can be acquired, with implications for later pro- cessing of similar items; but learning current items does not thereby create changes to the entire model of the language, thus potentially interfering with what was learned from past input. One way to learn in a local fashion is to store individ- ual examples (this requires, in our framework, that those examples have been abstractly recoded by successive Chunk-and-Pass operations, of course), and then to gener- alize, piecemeal, from these examples. This standpoint is consistent with the idea that the “priority of the speciﬁc,” as observed in other areas of cognition (e.g., Jacoby et al. 1989), also applies to language acquisition. For example, children seem to be highly sensitive to multiword chunks (Arnon & Clark 2011; Bannard & Matthews 2008; see Arnon & Christiansen, submitted, for a review13). More generally, learning based on past traces of processing will typically be sensitive to details of that processing, as is ob- served across phonetics, phonology, lexical access, syntax, and semantics (e.g., Bybee 2006; Goldinger 1998; Pierre- humbert 2002; Tomasello 1992). That learning is local provides a powerful constraint, in- compatible with typical computational models of how the child might infer the grammar of the language – because these models typically do not operate incrementally but range across the input corpus, evaluating alternative gram- matical hypotheses (so-called batch learning). But, given the Now-or-Never bottleneck, the “unprocessed” corpus, so readily available to the linguistic theorist, or to a comput- er model, is lost to the human learner almost as soon as it is encountered. Where such information has been memo- rized (as in the case of SF’s encoding of streams of digits), recall and processing is slow and effortful. More- over, because information is encoded in terms of the current encoding, it becomes difﬁcult to neutrally review that input to create a better encoding, and cross-check past data to test wide-ranging grammatical hypotheses.14 So, as we have already noted, the Now-or-Never bottleneck seems incompatible with the view of a child as a mini- linguist. BEHAVIORAL AND BRAIN SCIENCES, 39 (2016) 11 University Press bottleneck: A fundamental constraint on language which embodies these principles is the simple recurrent network (Altmann 2002; Christiansen & Chater 1999; Elman 1990), which learns to map from the current input on to the next element in a continuous sequence of linguis- tic (or other) input; and which learns, online, by adjusting its parameters (the “weights” of the network) to reduce the observed prediction error, using the back-propagation learning algorithm. Using a very different framework, in the spirit of construction grammar (e.g., Croft 2001; Gold- berg 2006), McCauley and Christiansen (2011) recently de- veloped a psychologically based, online chunking model of incremental language acquisition and processing , incorpo- rating prediction to generalize to new chunk combinations. Exemplar-based analogical models of language acquisition and processing may also be constructed, which build and predict language structure online, by incrementally creat- ing a database of possible structures, and dynamically using online computation of similarity to recruit these structures to process and predict new linguistic input. Importantly, prediction allows for top-down information to inﬂuence current processing across different levels of linguistic representation, fr

The Now-or-Never Bottleneck: A Fundamental Constraint on Language PDF

Document Details

Tags

Related

Summary

Full Transcript

Upgrade to continue