Psychology of Language – Summary for the Midterm Exam


Summary

This document provides a comprehensive summary of the psychology of language, covering the levels of language analysis (phonetics, phonology, morphology, semantics, syntax, and pragmatics), the concept of intentional communication, and the role of mental models in communication. It also highlights different types of communication, from verbal to non-verbal cues, and the importance of context in understanding language.


Psychology of Language summary midterm Introduction Different levels Language can be studied in a number of different ways, at different levels with different aims. - Phonetics (the study of raw speech sounds) - Phonology (the more abstract study of sound categories in a language) - Morphology (the study of words and word formation) - Semantics (the study of meaning) - Syntax (the study of grammatical properties) - Pragmatics (the study of language use) - Discourse studies (the study of language in interaction) Definition of language “A system of form-meaning pairings that can be used to intentionally communicate meaning” - System: there is a structure to the madness - Form-meaning pairings: of different sizes, at various levels of specificity - Use: different modalities, production and perception - Intentionally: producer wants to achieve something - Communicate meaning: almost anything can be expressed Form-meaning pairings - For instance: words (like tree) - But also smaller elements (like the -s in elements and words) Language use - Language is spoken and heard, signed and seen, and written and read. - Language is acquired, learned, and sometimes forgotten or lost. - We know that we can use language - We don’t necessarily know how we do it: there is much about which we are unaware Intentional communication - We use language to exchange information, to express emotions, to get others to do something, etc. - This makes language very relevant for students of communication sciences. Communicate meaning - We can communicate almost anything - Language makes it possible to make a thought travel from my mind to yours and communicate meaning Basic assumptions Key aspects for understanding language and its use: - Humans are embodied creatures. While they are communicating, they use their body (their mouth, ears, eyes, hands, torso) extensively. - People are embedded in different social contexts. - Each human has its own mental model (i.e. a mental representation or interpretation of an external situation). - Incrementality (or incremental processing). Embodiedness Humans process language by way of their bodies: - Our body offers us different channels - These channels come with limitations o Articulation differs o Speaking and online chatting are faster than writing a book or typing a letter o Intonation differs (e.g., capital letters in print) - We typically use different channels at the same time - There is non-verbal information that is conveyed through the body, for example, via posture, clothes, smell, and facial expressions. - Information travels from brain to brain via the world and senses Non-illustrated examples are smell (soap, perfume) and touch (a handshake or high five). Embeddedness The language we use is typically embedded in a larger context. People constantly formulate and update (their “mental models”), the perspective of their bodies and personalities is in a central position, but it interacts heavily with the environment. This makes their language and models of reality embedded. Non-linguistic context Physical - Spoken utterances are affected by bodily limitations, such a the shape and momentum of the speech organs. - Assimilation and co-articulation imply that sounds may change depending on the context they occur in. ▪ E.g., pronouncing words ‘want to’ as ‘wanna’ and ‘handbag’ as ‘hambag’. - Our sentences and words may refer to people, objects, and actions in the present environment: o ‘This road’, ‘she is running’. 
Thus, the meanings and world knowledge we use can be related to this physical context and this time. o Our interpretation is also often context dependent: the word ‘here’ typically refers to this specific place and time depending on the conversation. Social and cultural context - For instance, using a special type of child directed language, called ‘baby talk’, ‘parentese’, or ‘motherese’. - The phenomenon of ‘cultural frame shifting’. o An impact of the cultural norms of a particular language community on the speakers’ personality as they themselves or others perceive it. In simple terms, bilinguals may feel they have two somewhat divergent personalities one for each of their languages. - Not only verbal aspects of communication play a role, so too do non-verbal aspects. o For instance, eye contact is important to establish whether the listener is following and understanding what you are talking about. o Body language may convey information about the relationship between speaker and listener, motivation and power, attentiveness and interest, etc. Linguistic context Our knowledge about the different levels of the language in use (e.g. syntax or semantics). - Sounds or letters are typically part of larger linguistic units such as words. - Words often appear in the context of a larger phrase or sentence. - Sentences are often part of larger stretches of text or discourse. Incremental processing Pieces of information are processed incrementally, i.e. as a series of different elements that follow one another in time and build on each other. - Language reaches your eyes and/or ears step by step - In enters your brain via your senses piece by piece - Language also leaves your mouth bit by bit Efficient processing implies that receivers do not wait until the utterance is finished, but process all information they can at each moment in time and predict what is coming next. This implies that syntax and semantics must be interwoven: as soon as the first words of an utterance come in, both their (lexical) meanings and their role in the sentence are established. Mental models As a hum you try to build a coherent internal model of the world, a simplified approximation of aspects that are relevant to you for physical and social survival. In the most simple mental model, your current situation is represented in terms of various aspects: - Physical: being aware of gravity, colour, resistance of material objects and loudness - Biological: perceiving things emotionally, via your different bodily senses, and by movement - Psychological: what is happening has an emotional value or an abstract meaning - Sociological: noticing your role in the ongoing dialog dependent on your background and the empathic relation between you and the person you speak with All these levels are intertwined: - E.g. you may speak more loudly (physical) when you are angry (biological and psychological), because your dialog partner just ridiculed your favourite soccer team (sociological). Mental models will contain both abstract and embodied information. - E.g. when the Chinese flag is mentioned in a news item, the mental model for the situation might add the colours red and yellow to its content. Information travels from the mental model of the sender to the mental model of the receiver. Consequences of this view on language These assumptions stress that humans make use of a variety of bodily channels to communicate about aspects of the rich meaning structures present in their minds. 
As such, this approach breaks with a tradition that conceptualizes language as a purely abstract system of rules and representations. Language and communication The term ‘communication’ has a broader application than the term ‘language’. Communication has been loosely defined as the exchange of information, ideas, or feelings. Communication can be seen as a sort of transaction: the participants in the dialog (technically called ‘interlocutors’) are creating meaning together, here affective meaning. Communication can also take place in order to influence other, to develop relationships, and to fulfil social obligations. Verbal and non-verbal communication There is no ‘hard’ distinction between verbal and non-verbal aspects of a message. Our cognitive systems are non-modular. In both the production and perception of communicative messages, information (e.g. visual, auditory) from different modalities is often combined to form composite signals. E.g., pointing and saying “the train station is over there”. Types of hand gestures - Iconic gestures: resemble what they mean, it looks like what it means o The word ‘beep’ sounds a bit like an actual beep: its spoken form resembles what the speaker means when saying it. o A speaker may bring her hand to her mouth as if holding a glass while talking about her night out at the pub. - Emblems: certain gestures that have a certain meaning in certain cultures (for example the thumbs up sign might mean something else in a different culture) - Pointing gestures: shifts somebody’s attention to something else o Such a relation between a sign and its referent is called indexical. - Beat gestures: rhythmic movements people make while speaking (aligned with the melody of the speech) Besides this, many spoken and written words are verbal signs that convey their meaning to a large extent in a symbolic way. The word form ‘tree’ does not look like or sound like a tree. Using the body for communication - Language as multi-modal and multi-channel in nature - No clear distinction between verbal and non-verbal cues - Segregation and binding (finding out which signal has meaning and which doesn’t) o Segregation: a addressee needs to segregate communicatively intended information from non-communicative signal during language comprehension. o Binding: the addressee needs to be able to ‘bind’ or integrate information that is communicated by a speaker through different modalities in an online fashion. - Example: lip movements and beat gestures The McGurk effect A perceptual phenomenon where what we see influences what we hear. When the visual input (lip movements) mismatches the auditory input (speech sounds), our brain integrates both, often creating a third, distinct perception. For example, hearing "ba" while seeing a person mouth "ga" may result in perceiving the sound as "da." This effect demonstrates the strong interaction between auditory and visual speech cues in shaping perception. - Evidence that people bind different signals (verbal and non-verbal) - Binding of lip movements and sounds Manual McGurk effect DEMrof versus demROF PERmit versus permit - Stress on different syllables in combination with beat gestures - Depending on where the beat gesture was, people heard different words (either PERmit or perMIT) - So beat gestures influence how you hear words Language and communication in context When communicating, the participants heavily rely on mutual, shared knowledge or common ground. Communication often takes place in noisy situations. 
Noise is any stimulus, external or internal, that disrupts the sharing of meaning. - It can be external, as when distractions happen, - Or internal, for instance, daydreaming or being on edge. - Semantic noise: unintended meanings that the perceived utterances evoke, when I say that we had a ‘nice date’, you might think that I found it only so-so, while, in fact, I really enjoyed it. Encoding: when ideas and feelings are turned into messages. Decoding: turning messages into ideas and feelings. Back-channelling: verbal and non-verbal reactions to messages to indicate if the message is seen, heard and/or understood, so that communication can proceed or be adapted. This is context-dependent and sometimes ambiguous. For instance, when someone nods this could mean ‘go on’, but also ‘I agree’. Displacement: talking about entities that are no longer present in our immediate environment. Sender-receiver model The medium and the message The medium is the message Most of the time, we pay attention to the content of a message (e.g. what is said), but the form of the message (e.g. how it is said) is an important co-determinant of how much and which information is transferred. In fact, the form of the message is therefore a part of the message. - Radio versus television - E-mail versus video calls Language extends our senses - Motor behaviour: sentences can be seen as representing actions (‘I fell’). - Perception: for instance by pointing out what the body sees at a distance (‘look at that star’). - Emotion: expressing what the body is feeling (‘I feel sad’). - Memory: retrieving what happened in the past ( ‘Plato said…’). The medium is the massage When a new medium is invented, it sometimes brings about radical changes in human society. The new media create specific environments, changing old concepts and putting new rules on information transfer. A new medium ‘massages’. - Effect of smartphones on society - New media prioritize some of our senses over others (these days our ears become more important than our eyes, e.g. listening instead of reading) - Attention span and reading skills Language without sound: sign language - Handshape - Location - Orientation - Movement Alternative: fingerspelling (relies on spoken language, for each letter there is a different sign) Learning phonemes in a sign language - Very similar language acquisition through sounds - Sign “babble”: different from other non-linguistic hand movements - Difference real sign vs. pantomime (iconic gesturing that is done for communicative purposes in the absence of speech) - First comprehensive signs seem to be acquired slightly earlier than the first spoken words - Note: cochlear implants (implant in brain to hear) Evolution of language - Languages are/keep evolving - Languages can have a common ancestor (Germanic, Celtic, Slavic, etc.) Gestural origins of language? Some people say there was gestural language before spoken language: - Understandable under noisy conditions - Silent themselves - Iconic form-meaning mappings Vocal origins of language? 
Other researchers say there are vocal origins of language: - Communication in the dark - Communication over large distances - To some extent: iconicity Pointing Pointing gestures must have been super important in early stages of language evolution: - Important for joint attention following an urge to communicate - Joint attention is important for creating link between word and referent - A universal property of human communication Animal communication research - Comparing human language with how animals communicate - Maybe that is what our language looked like before we started developing it - Basic conclusion about animal communication: almost all of them have something that we also see in human communication Cross-species comparisons - E.g. Apes vs. humans - What is similar? What is different? - They are using language as a social basis Bonobo Kanzi - Kanzi makes different vocalizations in the context of different objects (e.g. juice vs. bananas vs. grapes) - Language or communication? Behaviour ≠ knowledge - More and more evidence of human-like nonlanguage behaviour in animals (e.g., mourning elephants), so humans may not be so special after all. - Should we actually be taking such an anthropocentric view? Language User Framework How does language work in the human mind? - Language comprehension - Language production - Memory - Language versus thought - Conceptual system would be the location of your mental model (the way that you think etc.) According to the book, four major types of tasks: - Memory retrieval: finding different linguistic representations in Long Term Memory - Processing representations: stepwise recoding of internally represented linguistic input into thought, or vice versa - Using Working Memory: temporarily storing linguistic half-products during recoding - Exerting cognitive control: managing and monitoring ongoing processes (e.g. by checking, shifting attention, or inhibiting representations) Language comprehension - Input - Signal recognizer - Word recognizer - Sentence processor - Text and discourse comprehension - Mental model - Bottom-up or data-driven and signal-driven Signal recognizer Small units (e.g. sounds or letters) must be recognized in the signal and represented faithfully. In spoken language the signal is speech: - This signal is continuous o For written language there are short breaks (e.g. gaps between words and sentences). Therefore written language is not continuous. Recognizing what is in the signal - Knowing which sounds exist in the language at hand - Looking for meaningful sound distinctions - Making use of phonotactic constraints o Str versus sbr o Sequence of sounds o You know that there are existing words starting with str (e.g. street), but not with sbr. Therefore you know that after the s a new word starts with br. - Pattern example: final devoicing o Bed versus bed o E.g. in Dutch you wouldn’t pronounce the letter d, in English you would - Statistical learning o You have learned stuff overtime, because you’ve got a lot of input from people (everyone speaks all the time) o You start to notice certain patterns Word recognizer On the basis of the represented speech or print, words and all their properties must be looked up in the ‘mental lexicon’, the store of lexicon-semantic knowledge in Long Term Memory; this is handled by a Word Recognizer. Sentence Processor The syntactic structure of sentences must be established by a Sentence Processer. The semantic (meaning) structure of the sentence message must also be determined. 
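The 'statistical learning' point above (noticing patterns after lots of input) can be made concrete with a classic toy case: finding word boundaries in a continuous syllable stream by tracking transitional probabilities between syllables. A minimal Python sketch, with an invented three-word vocabulary rather than real course material:

    import random
    from collections import Counter

    random.seed(0)
    words = ["bidaku", "padoti", "golabu"]                  # invented vocabulary
    stream = [random.choice(words) for _ in range(300)]     # continuous, unsegmented input
    syllables = [w[i:i + 2] for w in stream for i in range(0, len(w), 2)]

    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])

    def transitional_probability(s1, s2):
        """P(s2 | s1): how predictable syllable s2 is, given that s1 was just heard."""
        return pair_counts[(s1, s2)] / first_counts[s1]

    print(transitional_probability("bi", "da"))   # 1.0: within-word transition
    print(transitional_probability("ku", "pa"))   # ~0.33: transition spans a word boundary

Within a word each syllable strongly predicts the next; across a word boundary predictability drops, and that drop is one cue a Signal Recognizer could exploit for segmentation.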
- Syntactic (grammatical roles) o The word order o Is the sentence active or passive? - Semantic (thematic roles) and pragmatic (intended meaning) - Syntactic and semantic integration o Putting the different elements into a sentence Language comprehension: reading Letter-to-sound mapping - A.k.a. grapheme-to-phoneme mapping For many languages: no one-to-one mapping between written letters and spoken sounds! - E.g. if the gh sound in enough is pronounced f and the o in women makes the short i sound and the ti in nation is pronounced sh. Then the word GHOTI is pronounced just like FISH. Reading versus listening - Parallel (reading) versus sequential (listening) processing - From signal to word to sentence recognition Production Focus on production after the midterm - Mental model - Text and discourse production - Grammatical encoder - Phonological encoder - Articulator - Output - Top-down and knowledge-driven Order of language production - Thoughts and intentions (i.e. semantic structures) are formulated within the communicative context by the Conceptual System. - Appropriate word units and syntactic structures that convey the message must be built by a Grammatical Encoder. - The selected words and sentences must be specified with respect to their sounds by a Phonological Encoder. - Utterances must be articulated by an Articulator in close synchronization with the articulation of communicative signals through modalities other than speech (e.g. co- speech gestures). Parallel and sequential processing For instance, when we listen to the speech signal, the Signal Recognizer could process phonemes on the basis of which the Word Recognizer might activate words and their meaning. It could be that during listening, the Word Recognizer is still processing the previous word, while the Signal Recognizer is already at the same time trying to determine the phonetic characteristics of the following one. The flow of information during comprehension is mainly bottom-up and signal-driven, whereas in production it is predominantly top-down and concept-driven. Nevertheless, comprehension is helped by top-down predictions. There is room for predictions because of incremental processing. Long Term Memory - Conceptual memory - Syntax - Lexicon and morphology - Phonetics and phonology - (Hand gestures) - (Language membership; multilinguals) Linguistic representations in Long Term Memory Phonetics Focusses on the characteristics of raw speech sounds (phones). It is a discipline concerned with how people perceive these sounds (acoustic phonetics) and how they produce them (articulatory phonetics). Phonology Considers which abstract categories of sounds must be discerned in the sound repertoires of different languages. Abstract sound categories are referred to as phonemes. When two words differ in one phoneme ( a so-called ‘minimal pair’), this phoneme makes a difference to their meaning. E.g. ‘bad – pad’, in which /b/ and /p/ are different phonemes that change word meaning. Different sounds within a phoneme category are called allophones. - The difference between the sound /p/ at the beginning or end of a word; or sound differences occurring because the speaker has the flu. - They are not phoneme changes and do not change meaning. Phonemes can be (combined into) syllables. Syllables consist of vowels (V) and/or consonants (C). The V syllable /a/ consists of just one phoneme; the CVC syllable /baed/ is a word consisting of three phonemes. It is said to have an onset /b/, a nucleus /ae/, and a coda /d/. 
Its rhyme is /aed/. Lexicology Lexicology studies different properties of words, word representations. The mental lexicon contains all words a language user knows and specifies all these different properties: - Phonetic and phonological: about the sounds and phonemes a word is built of - Orthographic: about the letters and graphemes that make up a written or printed word - Morphological: which meaningful smaller parts can be discerned in a word - Semantic: what a word means - Syntactic: what the syntactic category of a word is, what gender, etc. - Articulation: how a word is pronounced (cf. articulatory phonetics) - Motor: what writing or typing movements are required to produce a word on paper A word’s lexical properties are significant determinants of how fast the word can be looked up in Long Term Memory. The most important properties of words are their duration or length, the frequency with which they are used, the diversity of contexts in which this happens, and the specific combinations of sounds and letters that the words consist of. How many other words there are similar to the target word, and how frequent these are. Morphology How complex words can be built on the basis of more simple elements in different languages. - E.g. ‘book’ can become ‘bookshop’ or ‘books’, ‘booking’, etc. Syntax Syntax describes the coherent system of syntactic structures (grammar) both in a language and across languages. The syntactic structure of a sentence can have consequences for its meaning. - Sentence structure, word order, word class. - E.g. ‘boy sees shark’ has a different meaning than ‘shark sees boy’. Semantics The meaning of sentences, words, and texts. Instead of semantic, psycholinguists often use the term conceptual. Pragmatics How meaning depends on the context in which utterances occur. Semantics represents the meaning from this utterance, while pragmatics notes its intention. Cognitive control and working memory - Working memory (< 1 min). - Words are briefly stored here. Because different levels of processing are involved simultaneously in language, ongoing computations must be monitored (checked), controlled, and updated. There is a dedicated Cognitive Control System for this: - Decide/choose between relevant or selected alternatives (e.g. responses) - Suppress unwanted alternatives (e.g. other responses) - Switch or shift to the most relevant task aspects - Update information in Working Memory Language User Framework and the brain Language processing makes use of substantial parts of the brain, including areas located in the frontal, temporal, parietal and occipital loves. Components of the Language User Framework do not one-to-one map onto brain regions. Rather, networks of brain regions dynamically interact to allow for efficient and successful communication. - E.g., word reading involves several different subtasks: finding the letters of a word, locating the word form in Long Term Memory, activating sound forms an d meaning, etc. Language Research Techniques On-line and off-line On-line On-line tasks: measuring mental (on neurophysiological) processes as they occur. - E.g. eye movements while reading sentences with typical agent and patient roles (the cheese was eaten by the mouse) vs. atypical agent and patient roles (the mouse was eaten by the cheese). On-line behavioral reaction time and neuroscientific studies can be used to investigate how language users retrieve words from their lexicon, following different stages, triggered by an incoming signal. 
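As a minimal illustration of how such lexical properties can be quantified, the Python sketch below computes length, log frequency, and orthographic neighbourhood density (same-length words differing in exactly one letter) for a target word; the mini lexicon and its frequency counts are invented:

    import math

    lexicon = {  # hypothetical frequency counts per million words
        "bit": 120, "bat": 90, "but": 400, "big": 300, "bid": 15,
        "quiz": 8, "quit": 40, "quip": 2, "quill": 1, "wind": 80, "find": 500,
    }

    def neighbours(word):
        """Words of the same length that differ from `word` in exactly one position."""
        return [w for w in lexicon
                if w != word and len(w) == len(word)
                and sum(a != b for a, b in zip(w, word)) == 1]

    def lexical_profile(word):
        return {
            "length": len(word),
            "log_frequency": round(math.log10(lexicon[word]), 2),  # frequency effects are roughly logarithmic
            "neighbourhood_density": len(neighbours(word)),
        }

    print(lexical_profile("bit"))    # short and frequent, with many neighbours (bat, but, big, bid)
    print(lexical_profile("quiz"))   # infrequent, with only a couple of neighbours (quit, quip)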
Off-line Off-line tasks: measuring the content of (long-term) memory. - E.g. studying 40 action verbs items in an artificial language (vimmi) with left- or right- handed pictures and testing how many participants remember after 1 hour and again after 1 week. Off-line memory studies (word recognition and word recall) are used to gain insights about how words are represented in our mental lexicon and how the lexicon develops during acquisition. Memory - Word recognition: participants first learn words and are later asked to recognize items, which they might have learned before or not, from a presented stimulus list. - Word recall: participants are asked to recall (produce) as many words as possible from a set of words they have learned before. o Free recall: list as many action verbs in Vimmi as you remember o Cued recall: the English translation of luko is…. On-line behavioural techniques Lexical decision In lexical decision, participants must decide as soon and as accurately as possible if a presented item is an existing word or not. In visual lexicon decision, the stimulus is presented in a printed form, in auditory lexical decision in spoken form. Some factors that determine reaction time (RT): - The age at which you first acquired the word (younger is faster) - The frequency of the word (more frequent is faster) - The length of the word (shorter is faster) o These three are highly correlated: words that you learn at a young age tend to be frequent, and frequent words tend to be short. Neighbourhood density: the more words that look alike exist in the language, the slower the RT (‘bit’ has many neighbours [bat, but, bot, big, bid, etc.], quiz only has a few [ quit, quip, quill, …?]). Imageability: the easier it is to evoke a mental image, the faster the RT (‘apple’ vs. ‘truth’). Orthographic regularity: the more regular the spelling, the faster the RT (‘mint’, ‘hint’ ‘lint’ vs. ‘pint’). In progressive demasking, the presentation of a target word is alternated with that of a mask. During this alternation process, the target item is presented for a longer and longer time, while that of the mask decreases. Participant are asked to push the button as soon as they identify the target word, and then to type it in. Priming Lexical decision has often been combined with a priming technique. The target item is then preceded by a word or sentence in the same or a different modality (e.g., auditory sentence followed by a visual target word) that has a particular relation to that target. - E.g. the target word ‘nurse’ may be preceded by the semantically related word ‘doctor’. - This is called semantic priming. Cross-modal priming: the prime is presented in a different modality than the target. Masked priming: in priming with a forward mask, the prime word is preceded by a mask and then presented for a very short time. Backward masks (following the prime) are also sometimes used. Word association: related but off-line task. People are presented with a cue word (e.g., shark) and are asked to give as many associations they can come up with (e.g., animal, blood, fish, …). Why? Lexical decision has been used to study… - How many words people actually know (including non-words to prevent cheating). - How the mental lexicon is organized. o Including: if this changes during aging, if different languages influence each other in bilinguals, etc. - How written word recognition works. o Including: how we cope with irregularly spelled words (e.g. yacht). 
- How sentence context guides the interpretation of ambiguous words o E.g. we went down to the river and sat at the bank for a while. Word naming Word naming: participants read aloud visually presented words or pseudowords (i.e., pronounceable letter strings that are not words) as quickly and accurately as they can. - This removes the metalinguistic component (in lexical decision) and corresponds more closely to real-life linguistic behaviour. Record the response with a microphone and determine the exact moment at which the participant starts saying the word. - This is not as trivial as it may seem! What if a participant says: o ‘Uhhh lemon?’ Or ‘melon uhmm lemon?’ o Due to the way we produce speech sounds, certain first letters can be pronounced faster than others, regardless of how long it took us to read or recognize the word they occur in. In order to name a word, the orthographic representation derived from the printed letter string must be converted into a phonological-phonetic representation that can be used for articulation. This could be done in two different ways: - Lexical route: look up the orthographic representation of the word (e.g., lemon) in the mental lexicon directly. The spoken word form could be derived from this representation or from its associated meaning (lemon is pronounced a particular way). - Sublexical route: the letters of a word (e.g., l-e-m-o-n) could first be recoded into individual sounds. This process has been called grapheme-to-phoneme conversion. Assembling the different sounds into a whole word again would then indirectly result in the retrieval of the word’s phonological representation. Picture naming To name a presented picture correctly, the depicted object must first be identified, resulting in an activation of (parts of) semantic representation or concept. Next, the associated object name is looked up and the spoken word form can be retrieved and produced. Going from semantics to phonology could be a similar process in word naming and picture naming. However, in word naming there are also active lexical and sublexical routes from its orthographic representation to its phonological representation. Self-paced reading Participants are presented with a word (or a phrase) and press a button as soon as they have read it. The time it takes to press the button is a reflection of word and sentence properties. There are also other factors contributing to the response, they may develop a response rhythm. A. While Susan was dressing herself the baby played on the floor. B. While Susan was dressing the baby played on the floor. played on the floor = called ‘disambiguating region’ because it helps resolve the temporary ambiguity in sentence B. Eye tracking Eye tracking: an eye tracker device uses illuminators that throw a pattern of near-infrared light on the eyes of the reader. Perceptual span: how many characters we can see. Saccades: eyes make little jumps from one word to a later one. Fixation duration: the time the eye rests on a particular word. Gaze duration: the total of the first and later fixations. Regressions: eyes move back to earlier positions in a sentence. - These backward jumps probably occur when not all is clear. The reader’s behaviour can therefore be considered as incremental. Visual world paradigm Listening to spoken instructions or descriptions while viewing objects (on a monitor or actual physical objects). This can also reveal predictions that participants make while listening. 
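The eye-tracking measures above lend themselves to a small worked example. The Python sketch below computes gaze duration and detects regressions from a hypothetical sequence of fixations (the word indices and durations are invented):

    # Each fixation: (index of the fixated word in the sentence, duration in ms)
    fixations = [(0, 210), (1, 180), (1, 95), (3, 250), (2, 160), (4, 230)]

    def gaze_duration(word_index, fixations):
        """Sum of the consecutive first-pass fixations on word_index."""
        total, in_first_pass = 0, False
        for idx, dur in fixations:
            if idx == word_index:
                total += dur
                in_first_pass = True
            elif in_first_pass:          # the eyes have moved away: first pass is over
                break
        return total

    def regressions(fixations):
        """Saccades that land on an earlier word than the previous fixation."""
        return [(a, b) for (a, _), (b, _) in zip(fixations, fixations[1:]) if b < a]

    print(gaze_duration(1, fixations))   # 180 + 95 = 275 ms
    print(regressions(fixations))        # [(3, 2)]: a backward jump from word 3 to word 2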
Virtual reality - Experimental control and ecological validity are usually thought of as ends of a continuum; to get a little more of the one, you need to lose some of the other. - Virtual reality allows us to treat these as two separate constructs and get the best of both worlds. - Allows recording of behavioural, eye-tracking, motion, and/or EEG data in a realistic environment, while keeping behaviour of virtual agents constant. Neurophysiological techniques fMRI fMRI detects changes in the degree of oxygen in the blood in different brain areas. When a particular cognitive activity must be carried out by the brain, the involved brain areas require more ‘energy’ in terms of oxygen. This induces an increased blood flow to the active area. - Very useful for finding which anatomical structures are involved in performing a particular task (e.g. reading, listening, picture naming, word repetition, etc.) - But also at a much more fine-grained level, e.g. understanding action verbs (kick) vs. abstract verbs (think). - Activity is usually compared with a baseline condition; choosing an appropriate baseline is not trivial. Frontal lobe: associated with planning, control, motor behaviour, and short-term memory. Parietal lobe: related to receiving and processing somatosensory information. Occipital lobe: processing visual information. Temporal lobe: processing auditory information, memory, and meaning. Limitations - BOLD response is always delayed with respect to brain activity: it tells you which regions were active approximately 4 seconds ago o Not enough precision to give insight into the timing of linguistic processes - Limits the range of tasks subjects can perform (e.g. no normal conversation) - Expensive - Not necessarily a limitation, but beware of reverse inferencing EEG EEG measures electrical activity arising in the brain. We can’t really ‘decode’ all the waveforms, but there are certain signature waveforms that show up consistently in response to certain types of events (e.g. an unexpected word). Waveforms that are time-locked to the presentation of a stimulus are called event-related potentials (ERPs). N400 Negative voltage peak about 400 ms after stimulus onset, larger when word is more unexpected. - A more negative N400 reflects more processing effort. For instance, pseudowords and nonwords. P600 Positive voltage peak about 600 ms after stimulus onset, when there is a syntactic violation. Strengths Very precise information about timing of language-related processes in the brain. - This is useful, because language processing is fast and occurs at small timescales. No overt response required (similar for fMRI): participants can read or listen to stimuli, without having to press buttons, answer questions, etc. Limitations Poor localizations: we only measure electric potentials t the scalp, not knowing (very exactly) where in the brain they originate. Experiments require many trials of the same/similar stimuli to achieve a clear averaged signal; the waveforms of single trials are extremely noisy. Recognizing Spoken Words Prelexical processing The Spoken Signal Recognizer segments the raw speech wave into acoustic-phonetic representations (phones) that can be turned into abstract representations (e.g., phonemes or allophones) serving as input to the Spoken Word Recognizer. We call this auditory preprocessing of segmentation and classification prelexical, because it takes place before whole words are available or before the mental lexicon in Long Term Memory is contacted. 
Following this stage, the Spoken Word Recognizer turns the phonemic representations of speech signal into larger units in order to look up the matching spoken words and their meanings in Long Term Memory. Such representations can be larger than phonemes and smaller than the word and are called sublexical. Spoken Word Recognizer must also integrate information of suprasegmental nature. Suprasegmental information includes various acoustic cues for prosodic structures that cover long stretches of speech, for instance, syllables, lexical stress patterns, and phrases. Note: the process of spoken (or ‘auditory’) word recognition is incremental: when we are listening, spoken word information comes in gradually over time. Categorization Every spoken language has a set of distinct sounds: - English: 26 letters, but about 40 different sounds! - Vowels and consonants - Phonemes: change the meaning of a word o /b/ and /p/ are different phonemes (‘back’ vs. ‘pack’) - Allophones: do not change the meaning of the word o /k/ in ‘kill’ vs. ‘cool’ vs. ‘skill’ - Different languages, different sound categories o Dutch ‘man’ (/man/) vs. ‘maan’ (/ma:n/, English ‘bad’ (/baed/ vs. ‘bed’ /bed/) Vowels When we produce vowels, there is a flow of air from the lungs to the mouth that passes the vocal cords in a unobstructed way. Vowels produce a lot of energy (they come out a relatively strong amplitude) that can be sustained or stable over a relatively long period of time. This is why we call vowels voiced. - Vowel height o Position of the tongue - Vowel backness o Part of the tongue (front, back, etc.) - Lip rounding - Tenseness Acoustic triangle: how closed or opened the mouth is when the vowel is produced, and whether the vowel is produced in the back or front of the mouth. Diphthongs: e.g., au, vowels transitioning into each other. Consonants Consonant sounds, in contrast, characteristically create some sort of constriction in the flow of air. In the production of plosives, for instance, b, d, g, p, t, and k, the air pressure is gradually built up and then suddenly released. In contrast, fricatives, for example, s, z, f, and v are produced with a partially closed articulatory channel, resulting in some friction. - Place of articulation - Manner of articulation - Voicedness Need to understand these concepts, not the exact terminology! Place of articulation Where in the mouth a sound is produced or which parts of the articulatory apparatus are used, where on the vocal track stops the air flow? - E.g., in the case of a ‘p’ sound, both lips are used; hence the sound is called bilabial. Manner of articulation Indicates how the sound is produced. - Stop (or ‘plosive’): blocking the sound, then letting go - Fricative: narrowing the flow of sound with your tongue - Affricate: combining an oral stop an a fricative (a little bit of both: plosive and fricative) - Liquid: letting the air flow over the side of your tongue - Glide: only mild obstruction Voice / voicedness - Voiced versus voiceless consonants o Bet versus pet - A voiced sound has vibrating vocal folds - All vowels are voiced by definition Voice Onset Time (VOT): The delay between opening of the vocal tract and vibration of the vocal cords. Nasality Whether during the production of a sound the nose cavity plays a role or not. Categorical perception With phonemes, a small change in VOT does not change your perception until that change crosses a boundary, this causes a large change in your perception. This is called categorical perception. 
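To make categorical perception concrete, the sketch below labels a Voice Onset Time continuum with a logistic identification function. The boundary location and steepness are assumed values for illustration, not measured data:

    import math

    BOUNDARY_MS = 25   # assumed /b/-/p/ category boundary
    STEEPNESS = 0.8    # how abrupt the perceptual switch is (assumed)

    def prob_heard_as_p(vot_ms):
        """Probability of categorising a token as /p/, as a logistic function of VOT."""
        return 1 / (1 + math.exp(-STEEPNESS * (vot_ms - BOUNDARY_MS)))

    for vot in range(0, 61, 10):
        bar = "#" * int(20 * prob_heard_as_p(vot))
        print(f"VOT {vot:2d} ms  P(/p/) = {prob_heard_as_p(vot):.2f}  {bar}")

The printed identification curve stays near 0 below the boundary and near 1 above it: equal-sized steps in VOT are barely noticed within a category, while the same-sized step across the boundary flips perception from /b/ to /p/.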
The place of articulation of the sound goes in a physically continuous way, but the perception of the sound is relatively categorical. - For example, when you listen to sounds that fall between a /b/ and a /p/ (like in "bat" vs. "pat"), your brain doesn’t hear a blend—it hears either /b/ or /p/, with nothing in between. Even though the difference between the two sounds is small, you hear them as completely separate categories. - /b/ (as in "bat") has a short VOT, meaning the vocal cords start vibrating almost immediately after the lips release the sound. - /p/ (as in "pat") has a longer VOT, meaning there’s a slight delay before the vocal cords start vibrating. Ganong effect Lexical context affects perception of (ambiguous) speech sounds. Specifically, when listeners hear a sound that is unclear or falls between two categories (like a sound between /t/ and /d/), they are more likely to interpret it as the sound that creates a real word in their language, rather than a non-word. - If someone hears a sound between /t/ and /d/ in the context of “_ask,” they are more likely to hear it as /t/ (making “task”) because "task" is a real word, while "dask" is not. Similarly, if the ambiguous sound occurs in "_ash," people might hear it as /d/ (making "dash") because "dash" is a real word, while "tash" is not. Phonemic restoration When listeners hear sentences like ‘they saw the *un shining on the beach’ in which the sound /s/ has been replaced by a cough, they often do not notice the absence of the /s/ sound and even have problems indicating where in the sentence the cough was presented. Thus, they easily restore the missing phoneme. - The sounds we perceive are not necessarily the sounds that are there; information from the word level affects perception. Co-articulation Speech sounds overlap in time. You are pronouncing words in a flow, not letter by letter! In connected speech, any speech gesture is affected by its preceding and following gesture. - Not only within, but alco across words (‘sandhi’): o Lean bacon -> leam bacon o Take that -> tage that - Creates a smooth speech signal. - Creates redundancy: individual segments provide clues about preceding and following segments. o Evidence from (artificial) ‘silent center vowels’: when you edit the /ae/ out of /baeg/, the result /b_g/ still sound like ‘bag’, not like ‘bug’ or ‘big’). - Creates variability: the mapping between phones (acoustic signals) and phonemes (abstract mental categories) is many-to-one. Challenges for the spoken signal recognizer Speech is variable - Same meaning can be represented by different sounds, depending on person, timing, and situation. ➔ Many-to-one mapping Speech is continuous - No pauses between words (the spaces we use in writing often aren’t there, or in different places). - Nopausesbe tweenwords butsometimeswi thinwords ➔ Continuous-to-discrete mapping Speech is ambiguous - The speech signal may be consistent with multiple meaning; lexical embeddings are frequent. - ‘to recognize speech’ vs. ‘to wreck a nice beach’ ➔ One-to-many mapping Models of spoken word recognition Cohort and TRACE Both are models of recognition of word forms: matching the spoken/written input to an existing entry in the mental lexicon. These models do not explicitly deal with activation of word meaning. Both models are explicitly incremental. Cohort: words are processed from left to right: - Perceiving input activates a (potentially very large) set of candidates: a cohort. 
- These candidates are activated in parallel, at very little cognitive cost. - As the input unfolds, candidates that are not consistent with the new input are ruled out. Cohort Hearing the first phoneme of a word triggers parallel activation of all words starting with that phoneme in the mental lexicon. This set of initially activated words is called the word initial cohort. (Cohort 1: all or nothing; Cohort 2: graded by frequency). The exact moment of recognition would depend on a complex set of factors: - Physical properties (word duration, stimulus quality) - Intrinsic lexical properties of the word (word frequency) - The number of other words similar to the target (the so-called cohort member of competitors) - The efficiency of the selection process Uniqueness point (UP): the moment when enough phonetic information has been received to uniquely identify the word. At this point, the word no longer matches any other potential word candidates in the mental lexicon. It signals when the listener can confidently identify the word even before hearing the entire word. Recognition point (RP): the moment when the listener successfully recognizes or identifies the word. Nonword point (NWP): when the item’s cohort no longer contains any candidates. TRACE The TRACE model is a localist and symbolic connectionist model for spoken word recognition. It is called ‘localist’ because in its network each node represents a specific symbol or concept. Based on a (small) neural network that learns mappings between phonetic features, phonemes, and words. Activation based on overall similarity between input and word form in long-term memory rather than exact match at beginning of a word. TRACE allows for top-down influences, meaning that context and word expectations can help shape perception. Empirical studies Visual world paradigm Participants looked at visual displays on a computer screen while listening to spoken instructions, e.g. ‘pick up the beaker; now put it below the diamond’. Phoneme monitoring The task of listening for a particular target phoneme and pressing a button as soon as it is detected. Does it matter where in the word this phoneme occurs, and whether it is a real word or a pseudoword? - facilitation after uniqueness point in real words, hinting at feedback from lexical level. First, you could check a prelexical (phonetic) code, based on a direct analysis of the speech signal. Second, you could check whether there is a /p/ in the stored lexical representation of ‘telescope’. This would be a postlexical (phonemic) code. Now suppose someone asks whether there is a p-sound in the word ‘paradise’. In this case, the prelexical code would be available much earlier, because the /p/ is situated before the UP at which the lexical code can be determined with certainty. Thus, the prelexical code becomes available already before the word is recognized, but the postlexical code only after it is recognized. The role of context in word recognition - Autonomous models: word recognition is strictly bottom-up. - Interactive models: contextual knowledge can affect word recognition, but only after an initial set of candidates has been set up based on the bottom-up input. o I took the new car for a spin o Sentence contexts helps select between spill, spider, spin, etc. faster - Predictive pre-activation models: word candidates can be predicted based on context, even before word has started. 
o I took the new car for a spin
o Sentence context boosts activation of spin

Visual context
The larger visual context in which a spoken word (in a sentence) is encountered influences how fluently a listener processes that word. Not only does the non-linguistic, visual context influence spoken word processing, but the preceding sentence context also plays a role.

Mental model of the speaker
Listeners will typically construe a mental model of the speaker they are listening to. To what extent can such a mental model of the speaker influence spoken word recognition in the listener? The mental model we have of a speaker will influence the mapping of incoming spoken word forms onto meaning representations.

Embodiedness
The speaker's body is important as a source of information for the listener. Both auditory and visual types of information contribute to what words we hear.
Evidence from cross-modal priming:
- Neutral context: 'They mourned the loss of their…'
- Biasing context: 'With dampened spirit, the men stood around the grave. They mourned the loss of their…'
- Schip (ship) or kapitein (captain)?

Recognizing printed and written words
Writing systems and scripts
Writing system: a set of scripts
- Logographic (symbols map onto units of meaning, e.g. morphemes)
- Syllabic (symbols map onto syllables)
- Alphabetic (symbols map onto individual phonemes)
Script: a system for writing language
- Logographies: Chinese characters (incorporated in Japanese as Kanji, in Korean as Hanja)
- Alphabets: English alphabet, Cyrillic alphabet, Thai alphabet, etc.
Many modern scripts incorporate features of multiple writing systems:
- Numerals (1, 2, 500) used across languages are in fact logographs
- Chinese characters may contain markings that indicate their pronunciation
- Japanese is a mixture of syllabic (katakana, hiragana) and logographic scripts

Orthographies
Shallow orthography: writing systems with a close one-to-one relationship between sounds and letters. E.g. in Finnish, one phoneme is represented by just one letter unit or 'grapheme'.
Deep orthography: relationships between letters and phonemes are (at times) irregular, opaque, many-to-many. E.g. in English, the pronunciation of -ough depends on the preceding letters: tough, though, bough.
- Italian: a few exceptions, e.g. one-to-many: c -> /k/ or /t̠ʃ/ (concerto)
- Dutch: more exceptions, e.g. one-to-many: c -> /k/ or /s/ (concert); many-to-one: au, ou -> /au/, ei, ij -> /ɛi/; final devoicing: d at the end of a word -> /t/ (David)
- English: chaos! tough, bough, though, yacht, aisle, colonel
A grapheme is not a letter. A grapheme is an abstract representation corresponding to a letter; it covers all different instances of that letter. Phones are actual speech sounds.

Representations
Letter features and letters
Print consists of different font types that come in different sizes, boldness, and italics. So in the visual modality, too, we encounter a problem of variability. This problem is less severe for printed messages than for handwritten messages, which vary not only across language users but also within an individual.

Morphemes
In addition to features and letters, words can be segmented into somewhat larger units such as syllables and morphemes. The term morpheme refers to the smallest meaningful unit of a word. E.g. 'tables' contains an -s morpheme (here the -s makes the meaning plural). The majority of the words we encounter consist of two or more morphemes!
Some common morphological operations:
- Inflection: adding markers to indicate number, tense, aspect, etc.
(verbs) or number (nouns) without changing the syntactic category of the word. o Talk -> talk-s, talk-ed, talk-ing - Derivation: changing the syntactic category of a word (noun-verb, verb-adjective, etc.) o Talk (V) -> talk-er (N); talk (V) -> talk-ative (A) - Compounding: combining stems to form a new word o Baby + talk -> babytalk; talk + show -> talkshow In the Printed sublexical representations we might represent the words as multiple morphemes. We might access those morphemes before we access the actual word. Are multimorphemic words stored and retrieved as one whole or in several pieces? Three theoretical possibilities 1. Full storage o All words, including multimorphemic ones, are stored as single units in the mental lexicon. 2. Full decomposition o All morphemes (un-talk-ative, talk-show) are stored and retrieved separately and then combined to form the full word. 3. Hybrid or dual-route o High-frequency words tend to be stored and retrieved as wholes (happiness), whereas low-frequency words tend to be decomposed (un-happi-er). o Regular inflections (e.g., -ed for past tense) are more likely to be decomposed than irregular ones (ran). Processes The visual word recognition process during reading starts as the incoming signal has been analyzed and abstractly represented. The Print and Written Signal Recognizer encode the raw scribbles on paper as actual letter representations (concrete printed symbols, called graphs or glyphs) that can be turned into abstract representations (graphemes) serving as input to the Word Recognizer. The Word Recognizer uses these grapheme representations and their combinations to look up the printed words in memory, where their meaning and other properties are found. Because printed words are clearly separated by blanks, the word segmentation process in the visual modality is relatively simple. One by one or parallel? Neighbourhood: the set of word categories that differ in only one letter position from a target word. For instance, neighbours of the word ‘wind’ are words such as find, wand, wild and wing. One possibility it that the different word candidates would be checked and searched in the mental lexicon in an order determined by their degree of activation. Depending on similarity to the input and the frequency of usage of the word. The word ‘find’ is more frequently used than ‘wand’ and ‘wind’ in everyday communication; and it may have been experienced in more contexts and thus have a higher contextual diversity or semantic diversity. However, another model proposed that many word candidates could be considered for recognition simultaneously. The activation of possible word candidates can be updated for all units in parallel when new information comes in. Models of visual word recognition Interactive Activation (IA) model The Interactive Activation Model (IA or IAC) is a localist connectionist network model with symbolic representations for letter features, letters, and words. The IA model assumes that the Signal Recognizer identifies possible letters in the input word by their visual features. The activated letter representations activate the words of which they are part and inhibit words they are not compatible with (e.g. ‘w’ activates ‘wind’, but not ‘find’ and not even ‘flower’, because the ‘w’ is in the wrong position). The IA model assumes parallel activation of word candidates. At the word level: active words compete: they reduce the activation of other activated words. This is called lateral inhibition. Each word has a ‘resting level’. 
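A minimal sketch of the interactive-activation dynamics just described: position-specific letter support, lateral inhibition between word nodes, and resting levels. The four-word lexicon and all weights are illustrative assumptions, not the model's actual parameters:

    lexicon = {"wind": 0.00, "find": 0.05, "wand": -0.05, "wild": 0.00}  # word: resting level
    activation = dict(lexicon)

    EXCITE, INHIBIT, LATERAL, DECAY = 0.12, 0.10, 0.04, 0.05
    input_word = "wind"

    for step in range(5):
        new_activation = {}
        for word, act in activation.items():
            # Bottom-up: +EXCITE per letter matching in position, -INHIBIT per mismatch
            bottom_up = sum(EXCITE if a == b else -INHIBIT
                            for a, b in zip(word, input_word))
            # Lateral inhibition: other active word nodes push this one down
            competition = LATERAL * sum(max(other, 0.0)
                                        for w, other in activation.items() if w != word)
            # Decay pulls activation back toward the word's resting level
            decay = DECAY * (act - lexicon[word])
            new_activation[word] = act + bottom_up - competition - decay
        activation = new_activation

    for word, act in sorted(activation.items(), key=lambda x: -x[1]):
        print(f"{word}: {act:+.2f}")

With 'wind' as input, 'wind' ends up most active; 'find', 'wild', and 'wand' each share three letters in the right position, so they receive some support but are held down by the mismatching letter and by lateral inhibition.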
If a unit has not been activated for a longer period of time, its activation level decreases (this is called decay). Bottom-up activation flow from letters to words, but also top-down feedback from words to letters. Word superiority effect Remember that phoneme monitoring is easier when the phoneme is part of a real vs. pseudoword. Similar for visual word recognition: letters (W) are recognized more quickly in words (WIND) than in random letter sequences (WNDF). A pseudoword superiority effect exists too: letters (W) are recognized more quickly in pronounceable pseudowords (WUND) than in unpronounceable pseudowords (WNDF). Explained by IA model as feedback from the word level to the letter level. Limitations The model considers only orthographic representations, but not phonological, semantic, or morphological representations. For instance, it fail to notice pronunciation differences in ‘pint’ and ‘mint’, their similarities in ‘bough’ and ‘cow’, and the presence of two morphemes in ‘baker’ (bake + r) but not in ‘corner’. The model also assumes absolute letter position coding. This implies that letters in words are assumed to be in a particular absolute position in the word. Spatial Coding Model The Spatial Coding model is a localist connectionist model that aims to explain the subprocesses involved in printed word recognition. It also assumes that there are distinct levels of representation for letter features, letters, and words, that these levels are connected, and that specific features, letters, and word are implemented as nodes in a large network. Readers need to: 1. Determine the identity of each letter and the order of the letters. 2. Look up word representations in the mental lexicon that match the input. 3. Select the best matching candidate. Relative position of letters (e.g., A before T) is more important than absolute position (e.g., A = letter 2). The model suggests that the reader encounters two sources of uncertainty during reading: - Letter position uncertainty: the reader is not completely sure about the relative position of the letter in the word relative to where their eyes fixate. Letter position uncertainty is greater the further away from the reader’s fixation letters are. - Letter identity uncertainty: depends on the degree of perceptual evidence in favour of a particular letter at a certain moment in time. The more certain readers are about the presence of a letter, the more they will activate or ‘excite’ that letter. Dual Route Cascaded model The Dual Route Cascaded model incorporates several important insights with respect to visual word recognition and word naming. Just as the other two models, it is assumed that in both alphabetic and non-alphabetic scripts, a direct orthographic route into the mental lexicon is possible. Direct route - Input letter string activates letters and orthographic word form that is then addressed in the mental lexicon. - The word’s phonological form is then retrieved from the mental lexicon, allowing for its pronunciation. Indirect route - Input letters and letter combinations activate their corresponding phonemes, which are then ‘assembled’ into a phonological representation of the word. - Assemblage takes time, making this a slower route. Third route There is actually a third route that could be followed from input to articulation: from the printed (orthographic) representation of the word to its spoken (phonological) representation via word meaning (semantics). 
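A hypothetical sketch of the first two naming routes described above: whole-word (lexical) lookup versus grapheme-to-phoneme assembly. The mini lexicon, the conversion rules, and the phoneme notation are invented for illustration; the actual Dual Route Cascaded model is far more detailed:

    mental_lexicon = {"pint": "paɪnt", "mint": "mɪnt", "yacht": "jɒt"}   # stored pronunciations
    gpc_rules = {"p": "p", "i": "ɪ", "n": "n", "t": "t", "m": "m",
                 "y": "j", "a": "æ", "c": "k", "h": "h", "v": "v", "b": "b"}

    def lexical_route(word):
        """Direct route: whole-word lookup; typically fast for known, frequent words."""
        return mental_lexicon.get(word)

    def sublexical_route(word):
        """Indirect route: letter-by-letter grapheme-to-phoneme conversion, then assembly."""
        return "".join(gpc_rules.get(letter, "?") for letter in word)

    for word in ["mint", "pint", "yacht", "vib"]:   # 'vib' is an unknown pseudoword
        print(word, "| lexical:", lexical_route(word), "| sublexical:", sublexical_route(word))

For 'mint' the two routes agree; for 'pint' and 'yacht' they conflict, which is where irregular words slow down; for the pseudoword 'vib' only the sublexical route can produce a pronunciation.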
Route 2 can be faster than route 1 when we see a word that we don't know yet or that is low in frequency.
- Activity spreads across both routes at the same time.
- For high-frequency words, route 1 is usually fastest, therefore:
o Irregularly spelled high-frequency words (two, who, friend) can be named quickly.
- For low-frequency words, route 2 may be faster, therefore:
o Regularly spelled words (intoxication, splurge) are named somewhat slower.
o Irregularly spelled words (yacht, aisle, quay) may lead to conflicting information from routes 1 and 2, causing even longer naming times.
- When learning to read as a child or learning a foreign language as an adult, most written word forms are (subjectively) low-frequency, hence the extensive use of route 2.

Evidence of activation of phonology in reading:
- Masked priming studies: when the prime is presented for a very short period (14–29 ms), orthographic priming is observed (mert – MERE), but no phonological priming (mair – MERE).
- When the prime is presented for a longer period (43 ms), both orthographic and phonological priming are observed (mert – MERE and mair – MERE).
Evidence of activation of orthography during listening:
- Phoneme monitoring (e.g., /k/) in spoken stimuli is faster when the carrier word uses the primary spelling (e.g., Dutch 'paprika') than the secondary spelling (e.g., Dutch 'replica').

Multilink model
Developed to account for word recognition in multilingual populations. The model assumes that words from different languages are stored together in an integrated mental lexicon. Similarity between the input and stored representations is computed on the basis of Levenshtein distance. This is the number of operations (deletion, addition, or substitution) that must be performed to transcode one letter string into another:
- E.g. changing the Dutch word 'tomaat' into English 'tomato' requires the removal of one 'a' and the addition of an 'o'. Levenshtein distance of 2 = 2 operations needed (a small code sketch of this computation follows below).
Cognates: translation equivalents with form overlap (e.g. 'work' and 'werk' have approximately the same meaning).

Ambiguity
Homophones: words that sound the same (when spoken) but have separate, non-overlapping meanings.
- See–sea, know–no, main–mane
Homographs: words that have the same spelling (and sometimes pronunciation), but separate, non-overlapping meanings.
- Bow, down, content
Polysemous words: words that have multiple different but related meanings.
- Wood (noun) – the material that trees are made of
- Wood (noun) – an area covered with trees
- Film (noun) – a movie
- Film (noun) – carrier of photographic images
- Film (verb) – to record moving images of something
Initially, all meanings for a given word are activated, but irrelevant meanings are quickly suppressed. We know this from cross-modal priming tasks.
- But it is actually more complicated, depending on:
o How dominant one lexical meaning is relative to another (pitcher vs. mint).
o How strongly the context is biased toward one of the meanings.
Equibiased: both meanings equally dominant (pitcher).
Non-equibiased: one meaning more dominant than the other (mint).

Sentence processing
The essence
Language processing proceeds sequentially and incrementally: word by word, or, in many languages, "from left to right". Yet processing is not just sequential, because different types of information can to some extent be conveyed in parallel.
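Below is the Levenshtein sketch referenced under the Multilink model above: a standard dynamic-programming implementation, applied to the word pairs used in these notes:

    def levenshtein(a, b):
        """Minimum number of deletions, additions, and substitutions to turn a into b."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1,                 # delete ca
                                curr[j - 1] + 1,             # add cb
                                prev[j - 1] + (ca != cb)))   # substitute (cost 0 if equal)
            prev = curr
        return prev[-1]

    print(levenshtein("tomaat", "tomato"))   # 2: Dutch-English cognates with high form overlap
    print(levenshtein("werk", "work"))       # 1
    print(levenshtein("boom", "tree"))       # 4: translation equivalents without form overlap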
Ambiguity
Homophones: words that sound the same (when spoken) but have separate, non-overlapping meanings.
- See-sea, know-no, main-mane
Homographs: words that have the same spelling (and sometimes pronunciation), but separate, non-overlapping meanings.
- Bow, down, content
Polysemous words: words that have multiple different but related meanings.
- Wood (noun) – material that trees are made of
- Wood (noun) – an area covered with trees
- Film (noun) – a movie
- Film (noun) – carrier of photographic images
- Film (verb) – to record moving images of something
Initially, all meanings of a given word are activated, but irrelevant meanings are quickly suppressed. We know this from cross-modal priming tasks.
- But it is actually more complicated, depending on:
o How dominant one lexical meaning is relative to another (pitcher vs. mint).
o How strongly the context is biased toward one of the meanings.
Equibiased: both meanings equally dominant (pitcher). Non-equibiased: one meaning more dominant than the other (mint).
Sentence processing
The essence
Language processing proceeds sequentially and incrementally: word by word, or, in many languages, “from left to right”. Processing is not purely sequential, though, because different types of information can to some extent be conveyed in parallel. For instance, on the presentation of the word ‘Englishman’, we find out over time that it consists of particular letters/sounds, has a particular pronunciation, is a noun, and has a particular meaning, depending on world knowledge, the actual situation, and possibly the sentence or discourse context.
Syntactic parsing: the process of assigning a syntactic structure to the incoming words of a sentence during language comprehension. The Sentence Processor takes the form and meaning of individual words as its input, so that the listener or reader can combine these into larger syntactic and semantic structures at the sentence and text/discourse levels.
Roles
- Grammatical/syntactic roles: subject, direct object, indirect object, etc.
- Thematic/semantic roles: agent, patient, recipient, instrument, location, etc.
Word order
- In English, word order is the most important cue to the syntactic role of words
o The default order is Subject-Verb-Object (SVO)
o Exceptions are possible, e.g., “You I don’t hate!” (OSV)
- In other languages, word order is more flexible, because other cues (e.g., case marking) can also be used to mark grammatical roles
o E.g., in German, these two sentences have the same meaning:
o Der Fuchs frisst gleich den Hasen (Lit.: The fox[NOM] eats soon the hare[ACC]) -> SVO
o Den Hasen frisst gleich der Fuchs (The hare[ACC] eats soon the fox[NOM]) -> OVS; less common, but acceptable
- Across the world, languages with SOV (48%) and SVO (41%) are the most common; some languages have VSO order; VOS and OVS are rare; and OSV is even rarer
- Why is this? Three linguistic principles:
o Verb-object bonding principle: a stronger semantic bond between verbs and objects than between verbs and subjects (intuition: actions need an object to ‘complete’ their meaning) keeps V and O next to each other
o Topic-first principle: starting the sentence with the topic (and then adding new information) favors putting the subject first
o Animate-first principle: starting with a reference to living things favors putting the subject first
Difficult sentences
What makes sentences difficult to process? Among other things:
- Embedding: putting phrases inside other phrases; this taxes our working memory capacity
o Paintings are expensive
o Paintings that artists make are expensive
o Paintings that artists that billionaires pay make are expensive
o Understanding them might rely more on semantics and world knowledge than on a full syntactic parse (see good-enough parsing)
- Embedding can be central, as above, but branching can also be asymmetrical:
o ‘Tools made by man’: right-branching (the most important information (tools) comes first)
o ‘Man-made tools’: left-branching (sentences are more compact and the most important information can be immediately continued and expanded)
- Ambiguity: words can have more than one meaning; selecting the wrong word meaning can cause a sentence structure to be built that does not align with the intended message
o E.g., fire as noun or verb in New technology helps fire managers
o E.g., raced as past simple or past participle in The horse raced past the barn fell
▪ The sentence can be paraphrased as ‘the horse, which was raced past the barn, fell’.
▪ Garden-path sentence: easier to understand if semantic information is added.
Parsing: theories
Garden-path model
Principles
- Incrementality: people do as much interpretive work as they can, based on partial information, and making possibly incorrect assumptions, rather than waiting until they have all the information to make a correct decision.
- Serial processing: only one structure is considered at a time.
o Stage 1: identify syntactic categories and build an initial structure
o Stage 2: assess the outcome of stage 1 against semantic plausibility, prior knowledge, and discourse context; attempt a different parse if the outcome does not make sense (semantically/pragmatically)
- Simplicity: no unnecessary structure; build the least complex representation.
Processing heuristics
Two strategies for keeping the structure as simple as possible:
Late closure: if possible, continue to work on the same phrase or clause as long as possible.
Minimal attachment: when more than one structure is consistent with the input, build the structure with the fewest nodes (the simplest syntactic structure).
Constraint-based models
- Parallel processing: simultaneous activation of all structures that are consistent with the input.
- The processor uses multiple sources of information (a.k.a. constraints), including syntax, semantics, discourse, and frequency.
- Different structures compete for activation; the structure most consistent with the constraints receives the highest activation.
- Garden-path effects (i.e., slowdown during reading) arise when constraints conflict with each other and no structure easily ‘wins’ the competition.
Unlike the two-stage (syntax before semantics) Garden Path model, constraint-based models posit that all possible information can be used immediately to guide interpretation of an incoming sentence. In case of ambiguity, several possible structures are activated to different degrees. According to a one-stage model like this, longer reading times for garden-path sentences are not a consequence of (serial) reinterpretation. Competition rather than reanalysis is proposed to explain processing difficulties.
Subcategory preferences
A. The student realized his dream of graduating cum laude was out of reach.
B. The student realized that his dream of graduating cum laude was out of reach.
- According to the garden-path model, the disambiguating region (‘was out of reach’) should be more difficult to process in sentence A than in sentence B, because the processing heuristics favor a structure (with ‘his dream’ as direct object) that turns out to be incorrect.
- But evidence shows that sentence A is not more difficult to process than sentence B. Why?
- According to constraint-based models, this is because we keep track of which verb tends to go with which syntactic structure; the verb realized appears with a sentence complement (rather than a direct object) about 90% of the time.
Reduced relative (RR)
- In English relative clauses, the function word (complementizer) that (or which) is optional
o I didn’t see the match (that/which was) discussed in the football talk show.
o The man (that was) approached by the police officer ran away.
o The horse (which was) raced past the barn fell.
Word meaning
1a. The man recorded on the tape could hardly be understood.
1b. The man that was recorded on the tape could hardly be understood.
2a. The message recorded on the tape could hardly be understood.
2b. The message that was recorded on the tape could hardly be understood.
- According to the garden-path model, sentences 1a and 2a should be more difficult to process than 1b and 2b, respectively (they are difficult because the reduced relative is initially misparsed as a main clause).
- According to constraint-based models, comprehenders immediately use their knowledge that inanimate nouns (‘message’) are unlikely agents but likely patients. They should therefore favor a reduced relative interpretation. Hence, sentence 2a should not be more difficult to process than sentence 2b.
Note: this parsing difficulty is triggered by lexical ambiguity. Recorded can be both past simple and past participle. (Not all verbs have this property, e.g., ate-eaten, stole-stolen, drank-drunk, etc.) In this example, language users cannot use the word form as a cue to decide between the main-verb and the reduced-relative interpretation.
Example: a with-phrase completing a sentence, with four possible final nouns that differ in expectedness and thematic role:
- broom: expected lexical item, expected thematic role (instrument), minimal attachment (+++)
- solvent: unexpected lexical item, expected thematic role (instrument), minimal attachment (-++)
- manager: unexpected lexical item, unexpected thematic role (accompaniment), minimal attachment (--+)
- odor: unexpected lexical item, unexpected thematic role (attribute), non-minimal attachment (---)
There is no additional slowdown for ‘odor’ vs. ‘manager’: semantics matters (at least) as much as syntax!
Referential context
The burglar blew up the safe with the rusty lock.
- Processing difficulty for the sentence-final NP (‘the rusty lock’) when read in isolation, because it violates minimal attachment.
The burglar was planning his next job. He knew that the warehouse had two safes. Although one was brand new from the factory, the other one had been sitting out in the rain for ten years. The burglar blew up the safe with the rusty lock.
- Processing difficulty for the sentence-final NP (‘the rusty lock’) when read in isolation, because it violates minimal attachment.
- When read in context, the processing difficulty disappears.
- Inconsistent with serial models (garden-path model), consistent with interactive models (referential theory, constraint-based models).
Visual context
Prosodic cues
When Roger leaves the house is dark.
- When Roger leaves, the house is dark.
- When Roger leaves the house, it’s dark.
- Evidence: prosodic cues (tones, breaks, etc.) that mark constituent boundaries do help to reduce ambiguity.
Referential + visual context + prosody
Constraint-based models: summary
- A constraint-based parser can activate multiple syntactic structures simultaneously.
- It ranks different structures based on how much evidence is available for each in the input.
- Evidence for a given structure and its semantic interpretation can come from multiple sources, including referential context, visual context, subcategory frequency information, and the semantic properties of specific words.
Integration vs. prediction
When reading or listening to a sentence, incoming words are incrementally incorporated or integrated within the meaning representation of the unfolding sentence. Each new incoming item can be fit into the syntactic and event structures corresponding to the unfolding sentence. This is a clear bottom-up processing strategy. At the same time, it is possible to make predictions about what is going to follow next on the basis of past experiences with processing sentences and the situation at hand. For instance, hearing a sentence such as ‘the boy will eat the …’ may lead you to predict that something edible will follow. Actively predicting what type of information is going to follow, and potentially even pre-activating expected linguistic representations, is a clear top-down processing strategy. During sentence processing, bottom-up integration and top-down prediction likely both play a role, for instance allowing listeners to start preparing a response to an incoming sentence before the end of the speaker’s turn.
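The ranking-by-evidence idea in the constraint-based summary above can be made concrete with a toy calculation in which weighted constraints vote for two competing structures. The constraint names, weights, and evidence scores below are invented purely for illustration; real constraint-based models estimate such quantities from corpora and experiments.

```python
# Toy constraint-based competition between two candidate parses of an
# ambiguous fragment (e.g., main-verb vs. reduced-relative reading).
def rank_parses(evidence, weights):
    """Combine weighted evidence per parse and normalize to activation levels."""
    raw = {
        parse: sum(weights[c] * score for c, score in scores.items())
        for parse, scores in evidence.items()
    }
    total = sum(raw.values())
    return {parse: value / total for parse, value in raw.items()}

weights = {"subcategory_frequency": 0.4, "animacy": 0.3, "referential_context": 0.3}

evidence = {
    "main_verb":        {"subcategory_frequency": 0.8, "animacy": 0.9, "referential_context": 0.3},
    "reduced_relative": {"subcategory_frequency": 0.2, "animacy": 0.1, "referential_context": 0.7},
}

# The structure with the highest activation wins; near-ties between structures
# are what predicts garden-path-like slowdowns in this framework.
print(rank_parses(evidence, weights))   # {'main_verb': 0.68, 'reduced_relative': 0.32}
```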
Good-enough parsing
- People do not always perform full or detailed syntactic parsing and mostly rely on semantic information to construct an interpretation.
- “The mouse was eaten by the cheese”
o When asked to transcribe the sentence or put it into the active voice (“The cheese ate the mouse”), people often come up with the wrong interpretation (such that the mouse ate the cheese)
- “While the hunter was stalking the deer in the zoo drank from the puddle”
o When asked if the hunter stalked the deer, people tend to say “yes”
- What we parse in detail depends on what is important
Q: Who orders a taxi after the party?
A: After the party order the rather angry guests a taxi
A: After the party orders the rather angry guests a taxi
C (‘the dog and the foot move above the kite’)
- Implication: people don’t always plan utterances in full, but rather in chunks
- In C you have more formulation to do than in B
Syntactic prominence
- Active prime: one of the fans punched the referee
o People are more likely to describe the picture as: lightning struck the church
- Passive prime: the referee was punched by one of the fans
o People are more likely to describe the picture as: the church was struck by lightning
Note: effect sizes for this experiment were not very large
Lexical accessibility
Semantic priming
Prime: TIGER
- People would describe the picture as: A lion is approaching a man
Prime: WOMAN
- People would describe the picture as: A man is being approached by a lion
Articulation
APPROACH (lion, man)
- A man is being approached by a lion
o We plan this in syllables rather than in phonemes
Evidence for stages in production
Tip-of-the-tongue (TOT) state
- Diary methods
- Prospecting
- During most TOT experiences, speakers:
o Accurately predict whether they will come up with the correct word soon
o Report the correct number of syllables
o Accurately report the first phoneme
o Are more accurate about the beginning and end phonemes than the middle
o Report words that sound like the target
- In general, speakers:
o Have more TOTs for less frequent words
o Resolve about 40% of TOTs within a few seconds to a few minutes
- TOTs suggest we first retrieve a concept from long-term memory and then ‘fill in’ the sounds (i.e., phonological encoding as a separate stage)
Speech errors: common types
- Substitution: a unit is replaced by an ‘intruder’
o Can you hand me the fork? -> Can you hand me the table?
o At the top of the stack of books -> At the bottom – I mean top of the stack of books
- Exchange: two units change places
o A computer in our own laboratory -> A laboratory in our own computer
o Turn the corner -> Torn the kerner
- Anticipation: a unit is produced too early, sometimes replacing another unit
o Bed and breakfast -> Bre(a)d and breakfast
o Such observations -> Sub – such observations
- Perseveration: a unit is repeated, sometimes replacing another unit
o Irreplaceable -> Irrepraceable
Spoonerisms
- The initial sounds or letters of two or more words in a phrase are swapped, often creating humorous or nonsensical results.
- Three cheers for our queer old dean!
o Intended: Three cheers for our dear old queen
- The Lord is a shoving leopard
o Intended: The Lord is a loving shepherd
SLIP technique
- Spoonerisms of Laboratory-Induced Predisposition
- Trick the speaker into producing errors by giving phonological primes (‘interference sets’)
- A participant might end up saying ‘cosy nooks’.
Speech errors: observations
1. Speech errors occur at all levels of the language production process
a. Concepts: this is my dog -> this is my cat
b. Morphemes: naturalness of rules -> nationalness of rules
c. Phonemes: keep a tape -> teep a kape
2. Not all speech errors that one can think of actually occur in real speech
a. Slip -> flip occurs, but slip -> tlip does not
3. Units that ‘interact’ in speech errors are of the same linguistic type
a. Nouns with nouns, verbs with verbs, phonemes with phonemes, etc.
4. Units involved in speech errors have characteristics in common
a. Semantic similarity: this is my dog -> this is my cat
b. Phonological similarity: keep a tape -> teep a kape
Speech errors: implications
1. Speech production processes are sequentially ordered
a. A weekend for maniacs -> a maniac for weekends
i. The final -s in weekends is pronounced as [z], suggesting we first specify the morphemes (e.g., stem + plural marker) in the word, and then the sounds in the morphemes
2. There is interactivity between different language production processes
a. Mixed errors are very frequent: substitutions that involve both semantic and phonological similarity
i. Stop -> start is more common than stop -> begin or stop -> shop
b. Lexical bias effect: speech errors more often than chance lead to existing words
i. Big feet -> fig beet is more common than big horse -> hig borse
Interactivity: lower levels ‘talk back’ to higher levels
Attraction errors
- Finish this sentence:
o The keys to the cabinet… is on the table
o The key to the cabinets… are on the table
o The label on the bottles… are hard to read
o The baby on the blankets... is crying
▪ (‘are crying’ is less likely here: many bottles imply many labels (100 bottles also means 100 labels), but many blankets do not imply many babies)
The morphosyntactic process of constructing number agreement uses grammatical information but also conceptual information from the mental model!
Picture naming
- Different concepts are activated at the same speed.
a. Picture recognition speed is constant, regardless of word frequency.
b. Picture naming speed depends on word frequency: less frequent names take longer, suggesting it is the retrieval of the corresponding word form that takes longer.
In picture-word interference experiments, a distractor word is presented together with the picture that has to be named. The distractor affects the amount of time needed to initiate the naming process. The time between the onset of the picture and the onset of the distractor is called the Stimulus Onset Asynchrony (SOA). The influence of the distractor on the picture naming RT depends both on the SOA and on the type of relation between distractor and target. When the distractor is semantically related to the picture, interference is assumed to occur (only) when both the target word and the distractor are in a semantic processing stage. When distractor and target are phonologically related, the distractor influences target processing only when both are in a phonological processing stage.
- Semantically related distractors will result in a slower RT (relative to unrelated distractors) at an early SOA.
o This is due to competition during conceptual selection.
- Phonologically related distractors will result in a faster RT (relative to unrelated distractors) at a late SOA.
o Rather than competing for selection, the distractor contains the same sounds, which makes phonological encoding easier. But this only helps later on, when speakers have already decided to go with (for example) ‘apple’.
- In contrast, if semantic and phonological processes interacted, both effects would occur at the same SOA.
Interpretation: speakers retrieve the lexical concepts for the entire utterance before they start speaking, but plan the sounds only a few words ahead.
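The logic of the SOA manipulation can be illustrated with a small analysis over made-up naming latencies. All numbers below are invented; only the predicted pattern (semantic interference at the early SOA, phonological facilitation at the late SOA) mirrors the text above.

```python
# Hypothetical picture-word interference data: naming RTs (ms) per distractor
# type and SOA, three observations each. Values are invented for illustration.
rts = {
    ("semantic",     -150): [672, 685, 660],   # early SOA
    ("phonological", -150): [646, 651, 643],
    ("unrelated",    -150): [640, 652, 648],
    ("semantic",      150): [641, 648, 652],   # late SOA
    ("phonological",  150): [610, 622, 615],
    ("unrelated",     150): [642, 650, 646],
}

def mean(values):
    return sum(values) / len(values)

for soa in (-150, 150):
    baseline = mean(rts[("unrelated", soa)])
    for distractor in ("semantic", "phonological"):
        effect = mean(rts[(distractor, soa)]) - baseline   # positive = slower
        if effect > 5:
            label = "interference"
        elif effect < -5:
            label = "facilitation"
        else:
            label = "no clear effect"
        print(f"SOA {soa:+d} ms, {distractor} distractor: {effect:+.1f} ms ({label})")
```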
Self-monitoring and self-repair
- Self-repair happens after an overt error; self-monitoring helps prevent overt errors.
- When speakers make an error, they often replace the error with the correct word with no delay, or nearly no delay.
- Because speech planning takes time, the plan for the correction must be undertaken while the error is being produced; therefore, the error must have been detected before it was spoken.
- Monitoring pays special attention to possibly embarrassing (‘taboo’) outputs:
o ‘Tool kits’ is less likely to lead to a sound exchange error than ‘pool kits’
o Galvanic skin response is higher during (correct) production of ‘tool kits’ than ‘pool kits’
- Error detection is best when the planning load is lightest, i.e., at the ends of phrases and clauses.
Gestures and speech: interface hypothesis
Human communication is multimodal in nature: speakers combine visual and auditory signals to get their message across.
Communication planner: decides where gestures are needed and where speech alone suffices.
- The distribution of information over the auditory and the visual modality may depend on context: in a noisy bar you may rely more on hand gestures than on speech.
The Message Generator comes up with a proposition that can be verbally formulated and is therefore concerned with the linguistic information that will be conveyed. At the same time, an Action Generator selects and uses information from the environment and/or Working Memory to generate a gesture. The model assumes that the spoken and gestural parts of a message are in contact with each other. Spatial and to-be-spoken information interface: the Message Generator and Action Generator exchange information bi-directionally. The Message Generator may learn from the Formulator that certain information is not easily expressible in speech, after which that part of the message may be taken up by the Action Generator and conveyed visually.
According to the model, what a hand gesture looks like exactly depends on three variables:
- The communicative intention of the language user.
- Information taken from the environment and/or Working Memory.
- Information or ‘online feedback’ provided by the Message Generator.
The model underlines that human language is non-modular. Language, perception, and action are tightly interwoven.
Contextual errors
Freudian speech errors can be considered as effects on language production from systems ‘outside’ language, in particular the emotional system. Due to the placement of fake electrodes and some remarks about quirky apparatus, participants became slightly anxious and produced speech errors like ‘damn shock’ more often when given a word pair like ‘sham dock’ than under neutral conditions. The presence of an attractive assistant led to more errors on a word pair like ‘past fashion’, which was produced as ‘fast passion’. Contextually relevant concepts and emotions can pre-activate conceptual units used for speaking.
Audience design
Audience design: the specific message people produce is tailored to the (assumed) knowledge state of their addressee. Both the knowledge state and the age (child vs. adult) of a recipient are taken into account when people design and produce their multimodal messages.
- Explain how to make coffee to an imagined (a) adult with some knowledge about coffee making, (b) novice adult, or (c) child.
- More words are used in explanations for (b) and (c) than for (a).
- There is also a higher density of iconic gestures for (c) than for (a).
Multilingualism
Why study multilingualism?
- More than half of the world’s population speaks more than one language
- Multilingualism is the norm
- Multilingualism is said to postpone the onset of dementia by 4-5 years
- Language switching and language control keep the brain in good shape
Some definitions
- Bilinguals versus multilinguals versus polyglots
o Polyglots: people who master a lot of languages
- Early versus late bilinguals
o Early: growing up with 2 languages
o Late: growing up with 1 language and later learning a second language (e.g., in school)
- Language proficiency
o How good you are at speaking, listening, and reading in a language
o Early bilinguals are very proficient
o Late bilinguals can be proficient
- Another distinction is between unbalanced and balanced bilinguals
o Unbalanced: more proficient in one language (their dominant language)
o Balanced: mastering two languages at a similar level of proficiency
- Unimodal versus bimodal bilinguals
o Unimodal: bilinguals who master two spoken languages
o Bimodal: bilinguals who master one spoken language and one sign language
▪ When you are bimodal you can use two languages at the same time (speaking and signing)
Bilingual children learning language
Do monolingual children learn a language more quickly than bilingual children?
- Vocabulary size depends on the amount of input
- Initially: smaller vocabulary per language
- Eventually: larger vocabulary overall
- Metalinguistic awareness
o If you grow up with two languages you realize there are multiple words for one item (e.g., the Dutch, English, and French words for ‘table’)
o Such children are more aware of how language works
Multilingual mind
- How does language work in the multilingual mind?
- Concepts (largely language-independent)
- Words (in L1, L2, L3, etc.)
- Sounds (idem)
- Grammatical structures (idem)
- Gestures (idem)
The Language User Framework
- Long-Term Memory
o When you’re bilingual, more goes into your Long-Term Memory
- Language Comprehension
- Language Production
o How do you make sure you select the right word (in the right language) in the right context?
- Conceptual System
o Language-independent
o Many concepts exist in many different cultures (languages) in the world
Revised Hierarchical Model
- Focus on changes over time
o If you get good at a language, you might no longer need your L1 to understand a word or sentence
- Intuitive:
o It matches how you experience things
o If you learn a new word in L2, you do this via your L1
▪ E.g., fraise -> strawberry -> meaning and concept of the word
- Not very detailed
- Outdated: the model suggests separate lexicons
o However, there is not one store for each language (all words are stored together)
- Concepts are language-independent (e.g., the concept of a tree, lecture, teacher, etc.)
- This model is for unbalanced bilinguals
- Solid line (in the model diagram): strong link
- Dotted line: weaker link (less likely to go in that direction)
Mechanics of the multilingual mind
- When you read or hear a word from one language, word candidates from your other language become active as well and compete for recognition.
- Language non-selective access to the multilingual lexicon.
o All words of any learned language become activated and compete.
Cognates: words that more or less look identical and also mean the same thing (e.g., tomato and tomaat).
- You recognize these words faster.
Neighbors: words that more or less look identical but do not mean the same thing (e.g., neus and news).
- You recognize these words more slowly, because it takes a bit more time to distinguish between them.
Translation equivalents: words that do not look identical but mean the same thing.
- These words do not become activated at the same time, because they do not look identical.
Unrelated word pairs: words that do not look identical and do not mean the same thing.
- These words do not activate each other at all.
False friends: word pairs that have a large (often coincidental) form overlap between languages, but no meaning overlap.
Experimental evidence: cognates
- French-English unbalanced bilinguals
- Visual lexical decision in English
- Identical cognates (like “taxi”)
- The word frequency of the cognates in both languages plays a role
o ‘Assassin’ is not as frequent in French as it is in English
o High-frequency words are recognized most quickly if they are equally frequent in both languages
- It is impossible to completely “switch off” a language you do not need at a given moment
If there were separate lexical stores for the two languages, then cognate words would not be recognized faster than non-cognate words.
Experimental evidence: cross-linguistic neighbors
When a Dutch-English bilingual reads the English word ‘pork’, not only its English neighbors ‘cork’ and ‘park’ become activated, but also the Dutch words ‘vork’ and ‘pook’. Increasing the number of orthographic neighbors slows down response times for target words.
The Bilingual Interactive Activation + (BIA+) model
This model focuses on the recognition of orthographic word forms.
- Orthography
- Phonology
- Semantics
- Language membership
Language nodes: every word you know gets a tag (L1 or L2).
The model works the same way for monolinguals; for bilinguals there is simply more competition.
The BIA model extended the monolingual IA model by combining all words from the first and second language into a shared lexicon and adding representations specifying the language of each word. The model was not able to simulate the recognition of cognates, due to the absence of implemented semantic representations. It is also not equipped for simulating L2 word learning.
The Multilink model
This model not only incorporates implementations for phonological and semantic lexical representations, but also provides some (simple) implementations of the control structure necessary to perform different tasks. For instance, the degree of lexical competition between neighbor words from the same or other languages under different task situations can be fine-tuned by varying the degree of ‘lateral inhibition’ in the model.
A Multilink simulation of language processing
A computational model
- A Dutch-English bilingual reading the word “dog”
- Translation into Dutch
- Representations become active over time
- Orthography before phonology before semantics
You first read a word, then the word form becomes active, and finally the word meaning is activated.
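The distinctions above (cognates, neighbors, translation equivalents, false friends) boil down to crossing form overlap with meaning overlap. The toy sketch below makes that explicit; the similarity measure (Python’s difflib) and the 0.5 cut-off are arbitrary illustrative choices, not the measures used by BIA+ or Multilink.

```python
# Toy classification of cross-language word pairs by form and meaning overlap.
from difflib import SequenceMatcher

def form_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()   # 1.0 = identical letter strings

def classify(word_a: str, word_b: str, same_meaning: bool) -> str:
    similar_form = form_similarity(word_a, word_b) >= 0.5   # arbitrary threshold
    if similar_form and same_meaning:
        return "cognate"                   # form overlap + meaning overlap
    if similar_form and not same_meaning:
        return "false friend / neighbor"   # form overlap, no meaning overlap
    if same_meaning:
        return "translation equivalent"    # meaning overlap, no form overlap
    return "unrelated"

print(classify("tomato", "tomaat", same_meaning=True))      # cognate
print(classify("room", "room", same_meaning=False))         # false friend (EN/NL)
print(classify("strawberry", "fraise", same_meaning=True))  # translation equivalent
```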
The bilingual brain
- More or less the same brain areas and networks are involved in both languages
- Need for cognitive control: inhibiting the contextually irrelevant language
Language control
- As a multilingual, you need to make sure you express your thoughts in the right (context-appropriate) language
- You also need to activate the right meaning of a false friend (like “room”)
- Monitoring and control
Bilingual language switching
- Bilinguals often switch between languages
o Within sentences
o Between speakers
- How do bilinguals select the right language to speak?
- What are the control mechanisms involved in bilingual language production?
Experiment (Meuter & Allport, 1999)
- Participants: 16 unbalanced bilinguals
- Task: name numerals (1-9) in L1 or L2 depending on the color of a rectangle (or flag, background color, etc.)
o Blue: first language
o Yellow: second language
- The classic result
o Overall, switching takes a bit longer (higher RT)
o Switching into your L1 actually takes longer than switching into your L2
▪ Why? To speak in the second language (L2), bilinguals must strongly suppress their dominant language (L1) to avoid interference; since L1 is so strong and automatic, a lot of mental effort is required to inhibit it. Switching back into L1 then requires overcoming that strong suppression, which takes longer than switching into the only mildly suppressed L2.
- These are called asymmetrical switch costs
- Reactive inhibition
Valid criticism of the paradigm
- Language switching in the lab does not reflect everyday intentional language switching.
- Switch cues (colors, flags, rectangles, etc.) are artificial cues.
o Language switching in everyday life is not induced by artificial cues.
- Cue-induced switch costs might not reflect basic mechanisms involved in bilingual language production.
A virtual reality approach (Peeters & Dijkstra, 2018)
The aim was to make the paradigm a bit more natural.
Experiment 1: the baseline experiment
- 24 Dutch-English unbalanced bilinguals
- Cued picture naming
- 40 trials per condition
- Picture naming latencies
Results experiment 1
- Switch costs: switching takes longer than not switching
- Reversed language dominance
o L2 English is faster in this task than L1 Dutch
o All of a sudden the Dutch natives were faster in English than in Dutch
Two mechanisms proposed
- Reactive inhibition
o Evidence: switch costs
- Proactive inhibition of the stronger L1
o Evidence: reversed language dominance
Experiment 2: the VR baseline experiment
Results experiment 2
- Switch costs
- Reversed language dominance
The same task in VR produced exactly the same pattern of results.
Experiment 3: towards improved ecological validity
VR with animated persons (avatars) involved
Results experiment 3
- Switch costs
- Reversed language dominance
The results were still exactly the same.
Experiment 4: a replication of experiment 3
The exact same results.
Overview of the results
- Stable results across four samples of the same bilingual population.
- Proactive inhibition of the stronger, native language to allow for more fluent language production in the less proficient L2.
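Switch costs and their (a)symmetry are simply differences between condition means. The sketch below computes them over made-up naming latencies; all numbers are invented and merely mimic the classic asymmetric pattern described above.

```python
# Hypothetical mean naming latencies (ms) per language and trial type.
mean_rt = {
    ("L1", "repeat"): 620, ("L1", "switch"): 745,
    ("L2", "repeat"): 680, ("L2", "switch"): 755,
}

costs = {}
for lang in ("L1", "L2"):
    # Switch cost = how much slower naming is right after a language switch.
    costs[lang] = mean_rt[(lang, "switch")] - mean_rt[(lang, "repeat")]
    print(f"Switch cost into {lang}: {costs[lang]} ms")

# Asymmetry: a larger cost for switching into the dominant L1 is taken as
# evidence that L1 was strongly suppressed while L2 was being spoken.
print(f"Asymmetry (L1 cost - L2 cost): {costs['L1'] - costs['L2']} ms")
```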
Valid criticism of the paradigm
- Confound between cue switching and language switching
o Maybe the switch cost had to do with the listener (switching from one person to the other) and not with the language
Experiment 5: the four-avatar experiment
- 48 Dutch-English unbalanced bilinguals
- Cued picture naming
- 40 trials per condition
- Picture naming latencies (+EEG)
Results experiment 5
- Switching languages comes at a cost, over and above the significant cost of switching listeners
Ongoing work: increased realism
Graphics have improved considerably, making the paradigm even more ecologically valid.
Recent developments (last chapter)
Chapter 1: basic assumptions
Basic assumption 1: embodiedness
- We use our body to communicate
- Our body offers us different channels
- These channels come with limitations
- We typically use different channels at the same time
- Information travels from brain to brain via the world and the senses
The MultiModal Language User Framework
The original Framework might be limited: where are the hand gestures, facial expressions, etc.? It was mostly just about language.
- The framework was extended to account for bodily signals (e.g., hand gestures, facial expressions)
- Language comprehension: from signal to sign
- Language production as signal production
A gesture, facial expression, spoken word, or emoticon in a text message could all be a sign.
- So maybe we should talk about sign recognition instead, going beyond just seeing language as words.
Basic assumption 2: embeddedness
Language is always embedded in something.
Embeddedness: linguistic context
Words are embedded in sentences.
- You can start predicting or integrating
- Supra-segmental representations: a segment is an individual speech sound; supra-segmental information (such as intonation) spans multiple segments
- Speech is embedded in gestures, non-verbal signals (posture, smell), and non-communicative behavior (e.g., walking around while speaking)
Basic assumption 3: incremental processing
- Language reaches your eyes and/or ears step by step
- It enters your brain via your senses piece by piece
- Language also leaves your mouth bit by bit
- So it must also be processed incrementally
Basic assumption 4: mental models
- Language does not work independently from other ‘cognitive faculties’
- Language
