Psychology of Language summary midterm

Introduction

Different levels
Language can be studied in a number of different ways, at different levels, with different aims.
- Phonetics (the study of raw speech sounds)
- Phonology (the more abstract study of sound categories in a language)
- Morphology (the study of words and word formation)
- Semantics (the study of meaning)
- Syntax (the study of grammatical properties)
- Pragmatics (the study of language use)
- Discourse studies (the study of language in interaction)

Definition of language
"A system of form-meaning pairings that can be used to intentionally communicate meaning"
- System: there is a structure to the madness
- Form-meaning pairings: of different sizes, at various levels of specificity
- Use: different modalities, production and perception
- Intentionally: the producer wants to achieve something
- Communicate meaning: almost anything can be expressed

Form-meaning pairings
- For instance: words (like tree)
- But also smaller elements (like the -s in elements and words)

Language use
- Language is spoken and heard, signed and seen, and written and read.
- Language is acquired, learned, and sometimes forgotten or lost.
- We know that we can use language.
- We don't necessarily know how we do it: there is much about which we are unaware.

Intentional communication
- We use language to exchange information, to express emotions, to get others to do something, etc.
- This makes language very relevant for students of communication sciences.

Communicate meaning
- We can communicate almost anything.
- Language makes it possible to make a thought travel from my mind to yours and communicate meaning.

Basic assumptions
Key aspects for understanding language and its use:
- Humans are embodied creatures. While they are communicating, they use their body (their mouth, ears, eyes, hands, torso) extensively.
- People are embedded in different social contexts.
- Each human has their own mental model (i.e. a mental representation or interpretation of an external situation).
- Incrementality (or incremental processing).

Embodiedness
Humans process language by way of their bodies:
- Our body offers us different channels.
- These channels come with limitations:
  o Articulation differs
  o Speaking and online chatting are faster than writing a book or typing a letter
  o Intonation differs (e.g., capital letters in print)
- We typically use different channels at the same time.
- There is non-verbal information that is conveyed through the body, for example via posture, clothes, smell, and facial expressions.
- Information travels from brain to brain via the world and the senses.
Non-illustrated examples are smell (soap, perfume) and touch (a handshake or high five).

Embeddedness
The language we use is typically embedded in a larger context. People constantly formulate and update their "mental models"; the perspective of their bodies and personalities is in a central position, but it interacts heavily with the environment. This makes their language and models of reality embedded.

Non-linguistic context
Physical
- Spoken utterances are affected by bodily limitations, such as the shape and momentum of the speech organs.
- Assimilation and co-articulation imply that sounds may change depending on the context they occur in.
  ▪ E.g., pronouncing 'want to' as 'wanna' and 'handbag' as 'hambag'.
- Our sentences and words may refer to people, objects, and actions in the present environment:
  o 'This road', 'she is running'.
Thus, the meanings and world knowledge we use can be related to this physical context and this time.
  o Our interpretation is also often context dependent: the word 'here' typically refers to a specific place and time depending on the conversation.

Social and cultural context
- For instance, using a special type of child-directed language, called 'baby talk', 'parentese', or 'motherese'.
- The phenomenon of 'cultural frame shifting'.
  o An impact of the cultural norms of a particular language community on the speakers' personality as they themselves or others perceive it. In simple terms, bilinguals may feel they have two somewhat divergent personalities, one for each of their languages.
- Not only verbal aspects of communication play a role; so too do non-verbal aspects.
  o For instance, eye contact is important to establish whether the listener is following and understanding what you are talking about.
  o Body language may convey information about the relationship between speaker and listener, motivation and power, attentiveness and interest, etc.

Linguistic context
Our knowledge about the different levels of the language in use (e.g. syntax or semantics).
- Sounds or letters are typically part of larger linguistic units such as words.
- Words often appear in the context of a larger phrase or sentence.
- Sentences are often part of larger stretches of text or discourse.

Incremental processing
Pieces of information are processed incrementally, i.e. as a series of different elements that follow one another in time and build on each other.
- Language reaches your eyes and/or ears step by step.
- It enters your brain via your senses piece by piece.
- Language also leaves your mouth bit by bit.
Efficient processing implies that receivers do not wait until the utterance is finished, but process all information they can at each moment in time and predict what is coming next. This implies that syntax and semantics must be interwoven: as soon as the first words of an utterance come in, both their (lexical) meanings and their role in the sentence are established.

Mental models
As a human you try to build a coherent internal model of the world, a simplified approximation of aspects that are relevant to you for physical and social survival. In the most simple mental model, your current situation is represented in terms of various aspects:
- Physical: being aware of gravity, colour, resistance of material objects, and loudness
- Biological: perceiving things emotionally, via your different bodily senses, and by movement
- Psychological: what is happening has an emotional value or an abstract meaning
- Sociological: noticing your role in the ongoing dialog, dependent on your background and the empathic relation between you and the person you speak with
All these levels are intertwined:
- E.g. you may speak more loudly (physical) when you are angry (biological and psychological), because your dialog partner just ridiculed your favourite soccer team (sociological).
Mental models will contain both abstract and embodied information.
- E.g. when the Chinese flag is mentioned in a news item, the mental model for the situation might add the colours red and yellow to its content.
Information travels from the mental model of the sender to the mental model of the receiver.

Consequences of this view on language
These assumptions stress that humans make use of a variety of bodily channels to communicate about aspects of the rich meaning structures present in their minds.
As such, this approach breaks with a tradition that conceptualizes language as a purely abstract system of rules and representations.

Language and communication
The term 'communication' has a broader application than the term 'language'. Communication has been loosely defined as the exchange of information, ideas, or feelings. Communication can be seen as a sort of transaction: the participants in the dialog (technically called 'interlocutors') are creating meaning together, here affective meaning. Communication can also take place in order to influence others, to develop relationships, and to fulfil social obligations.

Verbal and non-verbal communication
There is no 'hard' distinction between verbal and non-verbal aspects of a message. Our cognitive systems are non-modular. In both the production and perception of communicative messages, information (e.g. visual, auditory) from different modalities is often combined to form composite signals. E.g., pointing and saying "the train station is over there".

Types of hand gestures
- Iconic gestures: resemble what they mean; the gesture looks like what it means.
  o The word 'beep' sounds a bit like an actual beep: its spoken form resembles what the speaker means when saying it.
  o A speaker may bring her hand to her mouth as if holding a glass while talking about her night out at the pub.
- Emblems: gestures that have a conventional meaning in certain cultures (for example, the thumbs-up sign might mean something else in a different culture).
- Pointing gestures: shift somebody's attention to something else.
  o Such a relation between a sign and its referent is called indexical.
- Beat gestures: rhythmic movements people make while speaking (aligned with the melody of the speech).
Besides this, many spoken and written words are verbal signs that convey their meaning to a large extent in a symbolic way. The word form 'tree' does not look like or sound like a tree.

Using the body for communication
- Language is multi-modal and multi-channel in nature.
- There is no clear distinction between verbal and non-verbal cues.
- Segregation and binding (finding out which signal has meaning and which doesn't):
  o Segregation: an addressee needs to segregate communicatively intended information from non-communicative signals during language comprehension.
  o Binding: the addressee needs to be able to 'bind' or integrate information that is communicated by a speaker through different modalities in an online fashion.
- Example: lip movements and beat gestures.

The McGurk effect
A perceptual phenomenon where what we see influences what we hear. When the visual input (lip movements) mismatches the auditory input (speech sounds), our brain integrates both, often creating a third, distinct perception. For example, hearing "ba" while seeing a person mouth "ga" may result in perceiving the sound as "da." This effect demonstrates the strong interaction between auditory and visual speech cues in shaping perception.
- Evidence that people bind different signals (verbal and non-verbal)
- Binding of lip movements and sounds

Manual McGurk effect
DEMrof versus demROF; PERmit versus perMIT
- Stress on different syllables in combination with beat gestures
- Depending on where the beat gesture was, people heard different words (either PERmit or perMIT)
- So beat gestures influence how you hear words

Language and communication in context
When communicating, the participants heavily rely on mutual, shared knowledge or common ground. Communication often takes place in noisy situations.
Noise is any stimulus, external or internal, that disrupts the sharing of meaning.
- It can be external, as when distractions happen,
- Or internal, for instance, daydreaming or being on edge.
- Semantic noise: unintended meanings that the perceived utterances evoke. When I say that we had a 'nice date', you might think that I found it only so-so, while, in fact, I really enjoyed it.

Encoding: when ideas and feelings are turned into messages.
Decoding: turning messages into ideas and feelings.
Back-channelling: verbal and non-verbal reactions to messages to indicate whether the message is seen, heard and/or understood, so that communication can proceed or be adapted. This is context-dependent and sometimes ambiguous. For instance, when someone nods this could mean 'go on', but also 'I agree'.
Displacement: talking about entities that are no longer present in our immediate environment.

Sender-receiver model

The medium and the message
The medium is the message
Most of the time, we pay attention to the content of a message (e.g. what is said), but the form of the message (e.g. how it is said) is an important co-determinant of how much and which information is transferred. In fact, the form of the message is therefore a part of the message.
- Radio versus television
- E-mail versus video calls

Language extends our senses
- Motor behaviour: sentences can be seen as representing actions ('I fell').
- Perception: for instance by pointing out what the body sees at a distance ('look at that star').
- Emotion: expressing what the body is feeling ('I feel sad').
- Memory: retrieving what happened in the past ('Plato said…').

The medium is the massage
When a new medium is invented, it sometimes brings about radical changes in human society. New media create specific environments, changing old concepts and putting new rules on information transfer. A new medium 'massages'.
- Effect of smartphones on society
- New media prioritize some of our senses over others (these days our ears become more important than our eyes, e.g. listening instead of reading)
- Attention span and reading skills

Language without sound: sign language
- Handshape
- Location
- Orientation
- Movement
Alternative: fingerspelling (relies on spoken language; for each letter there is a different sign).

Learning phonemes in a sign language
- Acquisition is very similar to language acquisition through sounds.
- Sign "babble": different from other non-linguistic hand movements.
- Difference between a real sign and pantomime (iconic gesturing that is done for communicative purposes in the absence of speech).
- The first comprehensive signs seem to be acquired slightly earlier than the first spoken words.
- Note: cochlear implants (devices implanted in the inner ear that provide a sense of hearing).

Evolution of language
- Languages are/keep evolving.
- Languages can have a common ancestor (Germanic, Celtic, Slavic, etc.).

Gestural origins of language?
Some people say there was gestural language before spoken language:
- Understandable under noisy conditions
- Silent themselves
- Iconic form-meaning mappings

Vocal origins of language?
Other researchers say there are vocal origins of language:
- Communication in the dark
- Communication over large distances
- To some extent: iconicity

Pointing
Pointing gestures must have been very important in early stages of language evolution:
- Important for joint attention, following an urge to communicate
- Joint attention is important for creating the link between word and referent
- A universal property of human communication

Animal communication research
- Comparing human language with how animals communicate
- Maybe that is what our language looked like before we started developing it
- Basic conclusion about animal communication: almost all species have something that we also see in human communication

Cross-species comparisons
- E.g. apes vs. humans
- What is similar? What is different?
- They are using language as a social basis

Bonobo Kanzi
- Kanzi makes different vocalizations in the context of different objects (e.g. juice vs. bananas vs. grapes)
- Language or communication?

Behaviour ≠ knowledge
- More and more evidence of human-like non-language behaviour in animals (e.g., mourning elephants), so humans may not be so special after all.
- Should we actually be taking such an anthropocentric view?

Language User Framework
How does language work in the human mind?
- Language comprehension
- Language production
- Memory
- Language versus thought
- The conceptual system would be the location of your mental model (the way that you think, etc.)

According to the book, there are four major types of tasks:
- Memory retrieval: finding different linguistic representations in Long Term Memory
- Processing representations: stepwise recoding of internally represented linguistic input into thought, or vice versa
- Using Working Memory: temporarily storing linguistic half-products during recoding
- Exerting cognitive control: managing and monitoring ongoing processes (e.g. by checking, shifting attention, or inhibiting representations)

Language comprehension
- Input
- Signal recognizer
- Word recognizer
- Sentence processor
- Text and discourse comprehension
- Mental model
- Bottom-up, data-driven, and signal-driven

Signal recognizer
Small units (e.g. sounds or letters) must be recognized in the signal and represented faithfully. In spoken language the signal is speech:
- This signal is continuous.
  o For written language there are short breaks (e.g. gaps between words and sentences). Therefore written language is not continuous.

Recognizing what is in the signal
- Knowing which sounds exist in the language at hand
- Looking for meaningful sound distinctions
- Making use of phonotactic constraints
  o Str versus sbr
  o Sequence of sounds
  o You know that there are existing words starting with str (e.g. street), but not with sbr. Therefore you know that after the s a new word starts with br.
- Pattern example: final devoicing
  o Bed versus bed
  o E.g. in Dutch the final d is pronounced as a /t/; in English it keeps its /d/ sound.
- Statistical learning
  o You have learned patterns over time, because you have received a lot of input from people (everyone speaks all the time).
  o You start to notice certain patterns.

Word recognizer
On the basis of the represented speech or print, words and all their properties must be looked up in the 'mental lexicon', the store of lexical-semantic knowledge in Long Term Memory; this is handled by a Word Recognizer.

Sentence Processor
The syntactic structure of sentences must be established by a Sentence Processor. The semantic (meaning) structure of the sentence message must also be determined.
- Syntactic (grammatical roles)
  o The word order
  o Is the sentence active or passive?
- Semantic (thematic roles) and pragmatic (intended meaning)
- Syntactic and semantic integration
  o Putting the different elements together into a sentence
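Before moving on to reading: the phonotactic-constraint idea from the Signal Recognizer above (str versus sbr) can be made concrete with a toy sketch. The small onset list below is an illustrative assumption, not an exhaustive description of English phonotactics.

```python
# Toy sketch: using phonotactic constraints to posit a word boundary.
# LEGAL_ONSETS is a tiny illustrative subset of English onset clusters.
LEGAL_ONSETS = {"str", "spr", "br", "tr", "pl", "st"}

def boundary_after_first_segment(cluster: str) -> bool:
    """True if the full cluster cannot begin an English word but its remainder can,
    so a word boundary is hypothesized after the first segment."""
    return cluster not in LEGAL_ONSETS and cluster[1:] in LEGAL_ONSETS

print(boundary_after_first_segment("sbr"))  # True: 'sbr' is illegal, 'br' is fine, so a new word starts at 'br'
print(boundary_after_first_segment("str"))  # False: 'str' is a legal onset (street), no boundary needed
```

Statistical learning over lots of input is what gives listeners this kind of knowledge about which sequences can start a word in the first place.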
Language comprehension: reading
Letter-to-sound mapping
- A.k.a. grapheme-to-phoneme mapping
For many languages there is no one-to-one mapping between written letters and spoken sounds!
- E.g. if the gh in enough is pronounced /f/, the o in women makes the short i sound, and the ti in nation is pronounced sh, then the word GHOTI is pronounced just like FISH.

Reading versus listening
- Parallel (reading) versus sequential (listening) processing
- From signal to word to sentence recognition

Production
Focus on production after the midterm.
- Mental model
- Text and discourse production
- Grammatical encoder
- Phonological encoder
- Articulator
- Output
- Top-down and knowledge-driven

Order of language production
- Thoughts and intentions (i.e. semantic structures) are formulated within the communicative context by the Conceptual System.
- Appropriate word units and syntactic structures that convey the message must be built by a Grammatical Encoder.
- The selected words and sentences must be specified with respect to their sounds by a Phonological Encoder.
- Utterances must be articulated by an Articulator in close synchronization with the articulation of communicative signals through modalities other than speech (e.g. co-speech gestures).

Parallel and sequential processing
For instance, when we listen to the speech signal, the Signal Recognizer could process phonemes on the basis of which the Word Recognizer might activate words and their meaning. It could be that during listening, the Word Recognizer is still processing the previous word, while the Signal Recognizer is already at the same time trying to determine the phonetic characteristics of the following one.
The flow of information during comprehension is mainly bottom-up and signal-driven, whereas in production it is predominantly top-down and concept-driven. Nevertheless, comprehension is helped by top-down predictions. There is room for predictions because of incremental processing.

Long Term Memory
- Conceptual memory
- Syntax
- Lexicon and morphology
- Phonetics and phonology
- (Hand gestures)
- (Language membership; multilinguals)

Linguistic representations in Long Term Memory
Phonetics
Focuses on the characteristics of raw speech sounds (phones). It is a discipline concerned with how people perceive these sounds (acoustic phonetics) and how they produce them (articulatory phonetics).

Phonology
Considers which abstract categories of sounds must be discerned in the sound repertoires of different languages. Abstract sound categories are referred to as phonemes. When two words differ in one phoneme (a so-called 'minimal pair'), this phoneme makes a difference to their meaning. E.g. 'bad – pad', in which /b/ and /p/ are different phonemes that change word meaning.
Different sounds within a phoneme category are called allophones.
- The difference between the sound /p/ at the beginning or end of a word, or sound differences occurring because the speaker has the flu.
- They are not phoneme changes and do not change meaning.
Phonemes can be (combined into) syllables. Syllables consist of vowels (V) and/or consonants (C). The V syllable /a/ consists of just one phoneme; the CVC syllable /baed/ is a word consisting of three phonemes. It is said to have an onset /b/, a nucleus /ae/, and a coda /d/.
Its rhyme is /aed/.

Lexicology
Lexicology studies the different properties of words and word representations. The mental lexicon contains all words a language user knows and specifies all these different properties:
- Phonetic and phonological: about the sounds and phonemes a word is built of
- Orthographic: about the letters and graphemes that make up a written or printed word
- Morphological: which meaningful smaller parts can be discerned in a word
- Semantic: what a word means
- Syntactic: what the syntactic category of a word is, what gender, etc.
- Articulation: how a word is pronounced (cf. articulatory phonetics)
- Motor: what writing or typing movements are required to produce a word on paper
A word's lexical properties are significant determinants of how fast the word can be looked up in Long Term Memory. The most important properties of words are their duration or length, the frequency with which they are used, the diversity of contexts in which this happens, and the specific combinations of sounds and letters that the words consist of. Also relevant: how many other words there are similar to the target word, and how frequent these are.

Morphology
How complex words can be built on the basis of more simple elements in different languages.
- E.g. 'book' can become 'bookshop' or 'books', 'booking', etc.

Syntax
Syntax describes the coherent system of syntactic structures (grammar) both in a language and across languages. The syntactic structure of a sentence can have consequences for its meaning.
- Sentence structure, word order, word class.
- E.g. 'boy sees shark' has a different meaning than 'shark sees boy'.

Semantics
The meaning of sentences, words, and texts. Instead of semantic, psycholinguists often use the term conceptual.

Pragmatics
How meaning depends on the context in which utterances occur. Semantics represents the meaning of an utterance, while pragmatics notes its intention.

Cognitive control and working memory
- Working memory (< 1 min).
- Words are briefly stored here.
Because different levels of processing are involved simultaneously in language, ongoing computations must be monitored (checked), controlled, and updated. There is a dedicated Cognitive Control System for this:
- Decide/choose between relevant or selected alternatives (e.g. responses)
- Suppress unwanted alternatives (e.g. other responses)
- Switch or shift to the most relevant task aspects
- Update information in Working Memory

Language User Framework and the brain
Language processing makes use of substantial parts of the brain, including areas located in the frontal, temporal, parietal, and occipital lobes. Components of the Language User Framework do not map one-to-one onto brain regions. Rather, networks of brain regions dynamically interact to allow for efficient and successful communication.
- E.g., word reading involves several different subtasks: finding the letters of a word, locating the word form in Long Term Memory, activating sound forms and meaning, etc.

Language Research Techniques
On-line and off-line

On-line
On-line tasks: measuring mental (or neurophysiological) processes as they occur.
- E.g. eye movements while reading sentences with typical agent and patient roles (the cheese was eaten by the mouse) vs. atypical agent and patient roles (the mouse was eaten by the cheese).
On-line behavioural reaction time and neuroscientific studies can be used to investigate how language users retrieve words from their lexicon, following different stages, triggered by an incoming signal.
Off-line
Off-line tasks: measuring the content of (long-term) memory.
- E.g. studying 40 action verb items in an artificial language (Vimmi) with left- or right-handed pictures and testing how many participants remember after 1 hour and again after 1 week.
Off-line memory studies (word recognition and word recall) are used to gain insights into how words are represented in our mental lexicon and how the lexicon develops during acquisition.

Memory
- Word recognition: participants first learn words and are later asked to recognize items, which they might or might not have learned before, from a presented stimulus list.
- Word recall: participants are asked to recall (produce) as many words as possible from a set of words they have learned before.
  o Free recall: list as many action verbs in Vimmi as you remember
  o Cued recall: the English translation of luko is…

On-line behavioural techniques
Lexical decision
In lexical decision, participants must decide as quickly and as accurately as possible whether a presented item is an existing word or not. In visual lexical decision, the stimulus is presented in printed form; in auditory lexical decision, in spoken form.
Some factors that determine reaction time (RT):
- The age at which you first acquired the word (younger is faster)
- The frequency of the word (more frequent is faster)
- The length of the word (shorter is faster)
  o These three are highly correlated: words that you learn at a young age tend to be frequent, and frequent words tend to be short.
Neighbourhood density: the more words that look alike exist in the language, the slower the RT ('bit' has many neighbours [bat, but, bot, big, bid, etc.], quiz only has a few [quit, quip, quill, …?]).
Imageability: the easier it is to evoke a mental image, the faster the RT ('apple' vs. 'truth').
Orthographic regularity: the more regular the spelling, the faster the RT ('mint', 'hint', 'lint' vs. 'pint').

In progressive demasking, the presentation of a target word is alternated with that of a mask. During this alternation process, the target item is presented for a longer and longer time, while the presentation time of the mask decreases. Participants are asked to push the button as soon as they identify the target word, and then to type it in.

Priming
Lexical decision has often been combined with a priming technique. The target item is then preceded by a word or sentence in the same or a different modality (e.g., an auditory sentence followed by a visual target word) that has a particular relation to that target.
- E.g. the target word 'nurse' may be preceded by the semantically related word 'doctor'.
- This is called semantic priming.
Cross-modal priming: the prime is presented in a different modality than the target.
Masked priming: in priming with a forward mask, the prime word is preceded by a mask and then presented for a very short time. Backward masks (following the prime) are also sometimes used.
Word association: a related but off-line task. People are presented with a cue word (e.g., shark) and are asked to give as many associations as they can come up with (e.g., animal, blood, fish, …).

Why?
Lexical decision has been used to study…
- How many words people actually know (including non-words to prevent cheating).
- How the mental lexicon is organized.
  o Including: if this changes during aging, if different languages influence each other in bilinguals, etc.
- How written word recognition works.
  o Including: how we cope with irregularly spelled words (e.g. yacht).
- How sentence context guides the interpretation of ambiguous words.
  o E.g. we went down to the river and sat at the bank for a while.
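The neighbourhood density factor above lends itself to a small worked example. The sketch below uses a made-up toy lexicon (an assumption for illustration; real studies use large frequency-tagged word databases) and counts orthographic neighbours, i.e. words differing from the target in exactly one letter position.

```python
# Toy sketch: counting orthographic neighbours (words differing in exactly
# one letter position from the target). The tiny lexicon is made up for
# illustration only.

def is_neighbour(word: str, candidate: str) -> bool:
    """True if candidate differs from word in exactly one letter position."""
    if len(word) != len(candidate) or word == candidate:
        return False
    mismatches = sum(1 for a, b in zip(word, candidate) if a != b)
    return mismatches == 1

def neighbourhood(target: str, lexicon: list[str]) -> list[str]:
    return [w for w in lexicon if is_neighbour(target, w)]

toy_lexicon = ["bit", "bat", "but", "big", "bid", "bin", "quiz", "quit", "quip", "quill"]

print(neighbourhood("bit", toy_lexicon))   # ['bat', 'but', 'big', 'bid', 'bin'] -> dense neighbourhood
print(neighbourhood("quiz", toy_lexicon))  # ['quit', 'quip'] -> sparse neighbourhood
```

'bit' comes out with a dense neighbourhood and 'quiz' with a sparse one, the contrast that is associated with slower vs. faster lexical decision RTs as described above.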
Word naming
Word naming: participants read aloud visually presented words or pseudowords (i.e., pronounceable letter strings that are not words) as quickly and accurately as they can.
- This removes the metalinguistic component (present in lexical decision) and corresponds more closely to real-life linguistic behaviour.
The response is recorded with a microphone and the exact moment at which the participant starts saying the word is determined.
- This is not as trivial as it may seem! What if a participant says:
  o 'Uhhh lemon?' or 'melon uhmm lemon?'
  o Due to the way we produce speech sounds, certain first letters can be pronounced faster than others, regardless of how long it took us to read or recognize the word they occur in.
In order to name a word, the orthographic representation derived from the printed letter string must be converted into a phonological-phonetic representation that can be used for articulation. This could be done in two different ways:
- Lexical route: look up the orthographic representation of the word (e.g., lemon) in the mental lexicon directly. The spoken word form could be derived from this representation or from its associated meaning (lemon is pronounced a particular way).
- Sublexical route: the letters of a word (e.g., l-e-m-o-n) could first be recoded into individual sounds. This process has been called grapheme-to-phoneme conversion. Assembling the different sounds into a whole word again would then indirectly result in the retrieval of the word's phonological representation.

Picture naming
To name a presented picture correctly, the depicted object must first be identified, resulting in an activation of (parts of) a semantic representation or concept. Next, the associated object name is looked up and the spoken word form can be retrieved and produced. Going from semantics to phonology could be a similar process in word naming and picture naming. However, in word naming there are also active lexical and sublexical routes from the word's orthographic representation to its phonological representation.

Self-paced reading
Participants are presented with a word (or a phrase) and press a button as soon as they have read it. The time it takes to press the button is a reflection of word and sentence properties. There are also other factors contributing to the response; for instance, participants may develop a response rhythm.
A. While Susan was dressing herself the baby played on the floor.
B. While Susan was dressing the baby played on the floor.
'played on the floor' is called the 'disambiguating region' because it helps resolve the temporary ambiguity in sentence B.

Eye tracking
Eye tracking: an eye tracker device uses illuminators that throw a pattern of near-infrared light on the eyes of the reader.
Perceptual span: how many characters we can see.
Saccades: the eyes make little jumps from one word to a later one.
Fixation duration: the time the eye rests on a particular word.
Gaze duration: the total of the first and later fixations.
Regressions: the eyes move back to earlier positions in a sentence.
- These backward jumps probably occur when not all is clear.
The reader's behaviour can therefore be considered incremental.

Visual world paradigm
Listening to spoken instructions or descriptions while viewing objects (on a monitor or actual physical objects). This can also reveal predictions that participants make while listening.
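To make the eye-tracking measures above concrete, here is a minimal sketch over hypothetical fixation data (word index plus duration in ms). It sums fixation time per word, following this summary's definition of gaze duration as the total of first and later fixations, and counts regressions as backward jumps.

```python
# Minimal sketch (hypothetical data): deriving reading measures from a
# sequence of fixations. Each fixation is (word_index, duration_ms).
fixations = [(0, 210), (1, 185), (3, 250), (2, 190), (3, 220), (4, 300)]

gaze_duration = {}   # word index -> summed fixation time on that word
regressions = 0      # number of backward jumps

previous_word = None
for word_idx, duration in fixations:
    gaze_duration[word_idx] = gaze_duration.get(word_idx, 0) + duration
    if previous_word is not None and word_idx < previous_word:
        regressions += 1  # the eyes moved back to an earlier position
    previous_word = word_idx

print(gaze_duration)  # {0: 210, 1: 185, 3: 470, 2: 190, 4: 300}
print(regressions)    # 1 (the jump from word 3 back to word 2)
```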
Virtual reality
- Experimental control and ecological validity are usually thought of as ends of a continuum; to get a little more of the one, you need to lose some of the other.
- Virtual reality allows us to treat these as two separate constructs and get the best of both worlds.
- It allows recording of behavioural, eye-tracking, motion, and/or EEG data in a realistic environment, while keeping the behaviour of virtual agents constant.

Neurophysiological techniques
fMRI
fMRI detects changes in the degree of oxygen in the blood in different brain areas. When a particular cognitive activity must be carried out by the brain, the involved brain areas require more 'energy' in terms of oxygen. This induces an increased blood flow to the active area.
- Very useful for finding which anatomical structures are involved in performing a particular task (e.g. reading, listening, picture naming, word repetition, etc.)
- But also at a much more fine-grained level, e.g. understanding action verbs (kick) vs. abstract verbs (think).
- Activity is usually compared with a baseline condition; choosing an appropriate baseline is not trivial.

Frontal lobe: associated with planning, control, motor behaviour, and short-term memory.
Parietal lobe: related to receiving and processing somatosensory information.
Occipital lobe: processing visual information.
Temporal lobe: processing auditory information, memory, and meaning.

Limitations
- The BOLD response is always delayed with respect to brain activity: it tells you which regions were active approximately 4 seconds ago.
  o Not enough precision to give insight into the timing of linguistic processes.
- Limits the range of tasks subjects can perform (e.g. no normal conversation).
- Expensive.
- Not necessarily a limitation, but beware of reverse inferencing.

EEG
EEG measures electrical activity arising in the brain. We can't really 'decode' all the waveforms, but there are certain signature waveforms that show up consistently in response to certain types of events (e.g. an unexpected word). Waveforms that are time-locked to the presentation of a stimulus are called event-related potentials (ERPs).

N400
Negative voltage peak about 400 ms after stimulus onset, larger when a word is more unexpected.
- A more negative N400 reflects more processing effort, for instance for pseudowords and nonwords.

P600
Positive voltage peak about 600 ms after stimulus onset, when there is a syntactic violation.

Strengths
Very precise information about the timing of language-related processes in the brain.
- This is useful, because language processing is fast and occurs at small timescales.
No overt response is required (similar for fMRI): participants can read or listen to stimuli without having to press buttons, answer questions, etc.

Limitations
Poor localization: we only measure electric potentials at the scalp, without knowing (very exactly) where in the brain they originate.
Experiments require many trials of the same/similar stimuli to achieve a clear averaged signal; the waveforms of single trials are extremely noisy.
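The need for many trials can be illustrated with a short, purely hypothetical simulation (made-up numbers, not real EEG): a small N400-like dip is buried in noise on every single trial, but averaging across 100 trials recovers it.

```python
# Toy simulation of ERP averaging (hypothetical data, not real EEG).
import numpy as np

rng = np.random.default_rng(0)
time = np.arange(0, 800, 2)                                      # 0-798 ms in 2 ms steps
true_erp = -3.0 * np.exp(-((time - 400) ** 2) / (2 * 60 ** 2))   # N400-like dip, peak -3 at 400 ms

n_trials = 100
noise = rng.normal(0, 10, size=(n_trials, time.size))            # single-trial noise is much larger than the dip
trials = true_erp + noise

idx_400 = int(np.argmin(np.abs(time - 400)))                     # sample closest to 400 ms
print(round(float(trials[0, idx_400]), 1))                       # one trial at 400 ms: dominated by noise
print(round(float(trials[:, idx_400].mean()), 1))                # average of 100 trials: close to the true -3.0
```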
Recognizing Spoken Words
Prelexical processing
The Spoken Signal Recognizer segments the raw speech wave into acoustic-phonetic representations (phones) that can be turned into abstract representations (e.g., phonemes or allophones) serving as input to the Spoken Word Recognizer. We call this auditory preprocessing of segmentation and classification prelexical, because it takes place before whole words are available or before the mental lexicon in Long Term Memory is contacted.
Following this stage, the Spoken Word Recognizer turns the phonemic representations of the speech signal into larger units in order to look up the matching spoken words and their meanings in Long Term Memory. Such representations can be larger than phonemes and smaller than the word and are called sublexical.
The Spoken Word Recognizer must also integrate information of a suprasegmental nature. Suprasegmental information includes various acoustic cues for prosodic structures that cover long stretches of speech, for instance syllables, lexical stress patterns, and phrases.
Note: the process of spoken (or 'auditory') word recognition is incremental: when we are listening, spoken word information comes in gradually over time.

Categorization
Every spoken language has a set of distinct sounds:
- English: 26 letters, but about 40 different sounds!
- Vowels and consonants
- Phonemes: change the meaning of a word
  o /b/ and /p/ are different phonemes ('back' vs. 'pack')
- Allophones: do not change the meaning of the word
  o /k/ in 'kill' vs. 'cool' vs. 'skill'
- Different languages, different sound categories
  o Dutch 'man' (/man/) vs. 'maan' (/ma:n/); English 'bad' (/baed/) vs. 'bed' (/bed/)

Vowels
When we produce vowels, there is a flow of air from the lungs to the mouth that passes the vocal cords in an unobstructed way. Vowels produce a lot of energy (they come out with a relatively strong amplitude) that can be sustained or stable over a relatively long period of time. This is why we call vowels voiced.
- Vowel height
  o Position of the tongue
- Vowel backness
  o Part of the tongue (front, back, etc.)
- Lip rounding
- Tenseness
Acoustic triangle: how closed or open the mouth is when the vowel is produced, and whether the vowel is produced in the back or front of the mouth.
Diphthongs: e.g., au; vowels transitioning into each other.

Consonants
Consonant sounds, in contrast, characteristically create some sort of constriction in the flow of air. In the production of plosives, for instance b, d, g, p, t, and k, the air pressure is gradually built up and then suddenly released. In contrast, fricatives, for example s, z, f, and v, are produced with a partially closed articulatory channel, resulting in some friction.
- Place of articulation
- Manner of articulation
- Voicedness
You need to understand these concepts, not the exact terminology!

Place of articulation
Where in the mouth a sound is produced, or which parts of the articulatory apparatus are used; where along the vocal tract is the airflow stopped?
- E.g., in the case of a 'p' sound, both lips are used; hence the sound is called bilabial.

Manner of articulation
Indicates how the sound is produced.
- Stop (or 'plosive'): blocking the airflow, then letting go
- Fricative: narrowing the airflow with your tongue
- Affricate: combining an oral stop and a fricative (a little bit of both: plosive and fricative)
- Liquid: letting the air flow over the side of your tongue
- Glide: only mild obstruction

Voice / voicedness
- Voiced versus voiceless consonants
  o Bet versus pet
- A voiced sound has vibrating vocal folds
- All vowels are voiced by definition
Voice Onset Time (VOT): the delay between the opening of the vocal tract and the vibration of the vocal cords.

Nasality
Whether or not the nose cavity plays a role during the production of a sound.

Categorical perception
With phonemes, a small change in VOT does not change your perception until that change crosses a boundary; then it causes a large change in your perception. This is called categorical perception.
The underlying acoustic dimension (e.g. VOT or place of articulation) varies in a physically continuous way, but the perception of the sound is relatively categorical.
- For example, when you listen to sounds that fall between a /b/ and a /p/ (like in "bat" vs. "pat"), your brain doesn't hear a blend: it hears either /b/ or /p/, with nothing in between. Even though the difference between the two sounds is small, you hear them as completely separate categories.
- /b/ (as in "bat") has a short VOT, meaning the vocal cords start vibrating almost immediately after the lips release the sound.
- /p/ (as in "pat") has a longer VOT, meaning there's a slight delay before the vocal cords start vibrating.
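A toy way to picture categorical perception along the VOT continuum: choose a category boundary (the 25 ms value below is an illustrative assumption, not a measured constant) and map continuous VOT values onto discrete percepts. Equal-sized physical steps change nothing until the boundary is crossed.

```python
# Toy sketch of categorical perception along a VOT continuum.
# The 25 ms boundary is an illustrative assumption for English /b/ vs /p/.
VOT_BOUNDARY_MS = 25

def perceived_category(vot_ms: float) -> str:
    """Map a continuous VOT value onto a discrete phoneme category."""
    return "/b/" if vot_ms < VOT_BOUNDARY_MS else "/p/"

# A physically continuous continuum from 0 to 60 ms in equal 10 ms steps:
for vot in range(0, 61, 10):
    print(vot, "ms ->", perceived_category(vot))
# 0-20 ms are all heard as /b/, 30-60 ms all as /p/: only the step that
# crosses the boundary changes the percept.
```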
Ganong effect
Lexical context affects perception of (ambiguous) speech sounds. Specifically, when listeners hear a sound that is unclear or falls between two categories (like a sound between /t/ and /d/), they are more likely to interpret it as the sound that creates a real word in their language, rather than a non-word.
- If someone hears a sound between /t/ and /d/ in the context of "_ask," they are more likely to hear it as /t/ (making "task") because "task" is a real word, while "dask" is not. Similarly, if the ambiguous sound occurs in "_ash," people might hear it as /d/ (making "dash") because "dash" is a real word, while "tash" is not.

Phonemic restoration
When listeners hear sentences like 'they saw the *un shining on the beach' in which the sound /s/ has been replaced by a cough, they often do not notice the absence of the /s/ sound and even have problems indicating where in the sentence the cough was presented. Thus, they easily restore the missing phoneme.
- The sounds we perceive are not necessarily the sounds that are there; information from the word level affects perception.

Co-articulation
Speech sounds overlap in time. You are pronouncing words in a flow, not letter by letter! In connected speech, any speech gesture is affected by its preceding and following gesture.
- Not only within, but also across words ('sandhi'):
  o Lean bacon -> leam bacon
  o Take that -> tage that
- Creates a smooth speech signal.
- Creates redundancy: individual segments provide clues about preceding and following segments.
  o Evidence from (artificial) 'silent center vowels': when you edit the /ae/ out of /baeg/, the result /b_g/ still sounds like 'bag', not like 'bug' or 'big'.
- Creates variability: the mapping between phones (acoustic signals) and phonemes (abstract mental categories) is many-to-one.

Challenges for the spoken signal recognizer
Speech is variable
- The same meaning can be represented by different sounds, depending on person, timing, and situation.
➔ Many-to-one mapping
Speech is continuous
- No pauses between words (the spaces we use in writing often aren't there, or are in different places).
- Nopausesbe tweenwords butsometimeswi thinwords
➔ Continuous-to-discrete mapping
Speech is ambiguous
- The speech signal may be consistent with multiple meanings; lexical embeddings are frequent.
- 'to recognize speech' vs. 'to wreck a nice beach'
➔ One-to-many mapping

Models of spoken word recognition
Cohort and TRACE
Both are models of recognition of word forms: matching the spoken/written input to an existing entry in the mental lexicon. These models do not explicitly deal with activation of word meaning. Both models are explicitly incremental.
Cohort: words are processed from left to right:
- Perceiving input activates a (potentially very large) set of candidates: a cohort.
- These candidates are activated in parallel, at very little cognitive cost.
- As the input unfolds, candidates that are not consistent with the new input are ruled out.

Cohort
Hearing the first phoneme of a word triggers parallel activation of all words starting with that phoneme in the mental lexicon. This set of initially activated words is called the word-initial cohort. (Cohort 1: all or nothing; Cohort 2: graded by frequency.)
The exact moment of recognition would depend on a complex set of factors:
- Physical properties (word duration, stimulus quality)
- Intrinsic lexical properties of the word (word frequency)
- The number of other words similar to the target (the so-called cohort members or competitors)
- The efficiency of the selection process
Uniqueness point (UP): the moment when enough phonetic information has been received to uniquely identify the word. At this point, the word no longer matches any other potential word candidates in the mental lexicon. It signals when the listener can confidently identify the word, even before hearing the entire word.
Recognition point (RP): the moment when the listener successfully recognizes or identifies the word.
Nonword point (NWP): the moment when the item's cohort no longer contains any candidates.
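A minimal sketch of the Cohort idea, using a made-up five-word lexicon and letters standing in for phonemes: candidates inconsistent with the input heard so far are ruled out, and the uniqueness point is the position at which only one candidate remains.

```python
# Toy sketch of Cohort-style incremental narrowing (made-up mini lexicon).
# Phonemes are approximated by letters for simplicity.
lexicon = ["captain", "capital", "captive", "cabin", "candle"]

def cohort_steps(word: str, lexicon: list[str]) -> None:
    """Print the shrinking cohort and report the uniqueness point (UP)."""
    cohort = list(lexicon)
    for i in range(1, len(word) + 1):
        # Rule out candidates inconsistent with the input heard so far.
        cohort = [w for w in cohort if w[:i] == word[:i]]
        print(f"after '{word[:i]}': {cohort}")
        if len(cohort) == 1:
            print(f"uniqueness point reached after {i} segments")
            return
    print("no uniqueness point before word offset")

cohort_steps("captain", lexicon)
# after 'c', 'ca': all five words remain in the word-initial cohort
# after 'cap': captain, capital, captive; after 'capt': captain, captive
# after 'capta': only 'captain' remains -> UP at segment 5
```

Real models operate over phonemes and weight the remaining candidates by frequency (Cohort 2), but the ruling-out logic is the same.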
TRACE
The TRACE model is a localist and symbolic connectionist model for spoken word recognition. It is called 'localist' because in its network each node represents a specific symbol or concept. It is based on a (small) neural network that learns mappings between phonetic features, phonemes, and words. Activation is based on the overall similarity between the input and the word form in long-term memory, rather than an exact match at the beginning of a word. TRACE allows for top-down influences, meaning that context and word expectations can help shape perception.

Empirical studies
Visual world paradigm
Participants looked at visual displays on a computer screen while listening to spoken instructions, e.g. 'pick up the beaker; now put it below the diamond'.

Phoneme monitoring
The task of listening for a particular target phoneme and pressing a button as soon as it is detected. Does it matter where in the word this phoneme occurs, and whether it is a real word or a pseudoword?
- Facilitation after the uniqueness point in real words, hinting at feedback from the lexical level.
First, you could check a prelexical (phonetic) code, based on a direct analysis of the speech signal. Second, you could check whether there is a /p/ in the stored lexical representation of 'telescope'. This would be a postlexical (phonemic) code. Now suppose someone asks whether there is a p-sound in the word 'paradise'. In this case, the prelexical code would be available much earlier, because the /p/ is situated before the UP at which the lexical code can be determined with certainty. Thus, the prelexical code becomes available already before the word is recognized, but the postlexical code only after it is recognized.

The role of context in word recognition
- Autonomous models: word recognition is strictly bottom-up.
- Interactive models: contextual knowledge can affect word recognition, but only after an initial set of candidates has been set up based on the bottom-up input.
  o I took the new car for a spin
  o Sentence context helps select between spill, spider, spin, etc. faster
- Predictive pre-activation models: word candidates can be predicted based on context, even before the word has started.
  o I took the new car for a spin
o I took the new car for a spin o Sentence context boosts activation of spin Visual context The larger visual context in which a spoken word (in a sentence) is encountered influences how fluently a listener processes that word. Not only does the non-linguistic, visual context influence spoken word processing, but also preceding sentence context plays a role. Mental model of the speaker Listeners will typically construe a mental model of the speaker they are listening to. To what extent can such a mental model of the speaker influence spoken word recognition in the listener? The mental model we have of a speaker will influence the mapping of incoming spoken word forms onto meaning representations. Embodiedness The speaker’s body is important as a source of information for the listener. Both auditory and visual types of information contribute to what words we hear. Evidence from cross-modal priming - Neutral context: ‘They mourned the loss of their…’ - Biasing context: ‘With dampened spirit, the men stood around the grave. They mourned the loss of their…’ - Schip of kapitein? Recognizing printed and written words Writing systems and scripts Writing system: a set of scripts - Logographic (symbols map onto units of meaning, e.g. morphemes) - Syllabic (symbols map onto syllables) - Alphabetic (symbols map onto individual phonemes) Script: a system for writing language - Logographies: Chinese characters (incorporated in Japanese as Kanji, in Korean as Hanja) - Alphabets: English alphabet, Cyrillic alphabet, Thai alphabet, etc. Many modern scripts incorporate features of multiple writing systems - Numerals (1, 2 500) used across languages are in fact logographs - Chinese characters may contain markings that indicate their pronunciation - Japanese is a mixture of syllabic (katana, hiragana) and logographic Orthographies Shallow orthography: writing systems with a close one-to-one relationship between sounds and letters, one-to-many. E.g. in Finnish, one phoneme is represented by just one letter unit or ‘grapheme’. Deep orthography: relationships between letters and phonemes are (at times) irregular, opaque, many-to-many. E.g., English, -ough depends on preceding letters: tough, though, bough. Italian: a few exceptions, e.g., one-to-many: c -> /k/ or /t̠ʃ/ (concerto) Dutch: more exceptions, e.g., one-to-many: c -> /k/ or /s/ (concert), many-to-one: au, ou -> /au/; ei, ij -> /ɛi/, final devoicing: d at end of word -> /t/ (David) English: chaos! tough, bough, though, yacht, aisle, colonel A grapheme is not a letter. Grapheme is a abstract representation corresponding to a letter. It covers all different instances of a letter. Phones are actual speech sounds. Representations Letter features and letters Print consists of different font types that come in different sizes, boldness, and italics. Also in the visual modality we encounter a problem of variability. This problem is less severe for printed messages than for handwritten messages, which vary not only across language users but also within an individual. Morphemes In addition to features and letters, words can be segmented into somewhat larger units such as syllables and morphemes. The term morpheme refers to the smallest meaningful unit of a word. E.g. ‘tables’ contains an -s morpheme (here the -s makes the meaning plural). The majority of the words we encounter consist of two or more morphemes! Some common morphological operations: - Inflection: adding markers to indicate number, tense, aspect, etc. 
- Inflection: adding markers to indicate number, tense, aspect, etc. (verbs) or number (nouns) without changing the syntactic category of the word.
  o Talk -> talk-s, talk-ed, talk-ing
- Derivation: changing the syntactic category of a word (noun-verb, verb-adjective, etc.)
  o Talk (V) -> talk-er (N); talk (V) -> talk-ative (A)
- Compounding: combining stems to form a new word
  o Baby + talk -> babytalk; talk + show -> talkshow
In the printed sublexical representations we might represent words as multiple morphemes. We might access those morphemes before we access the actual word. Are multimorphemic words stored and retrieved as one whole or in several pieces?

Three theoretical possibilities
1. Full storage
  o All words, including multimorphemic ones, are stored as single units in the mental lexicon.
2. Full decomposition
  o All morphemes (un-talk-ative, talk-show) are stored and retrieved separately and then combined to form the full word.
3. Hybrid or dual-route
  o High-frequency words tend to be stored and retrieved as wholes (happiness), whereas low-frequency words tend to be decomposed (un-happi-er).
  o Regular inflections (e.g., -ed for past tense) are more likely to be decomposed than irregular ones (ran).

Processes
The visual word recognition process during reading starts once the incoming signal has been analyzed and abstractly represented. The Print and Written Signal Recognizer encodes the raw scribbles on paper as actual letter representations (concrete printed symbols, called graphs or glyphs) that can be turned into abstract representations (graphemes) serving as input to the Word Recognizer. The Word Recognizer uses these grapheme representations and their combinations to look up the printed words in memory, where their meaning and other properties are found. Because printed words are clearly separated by blanks, the word segmentation process in the visual modality is relatively simple.

One by one or parallel?
Neighbourhood: the set of word candidates that differ in only one letter position from a target word. For instance, neighbours of the word 'wind' are words such as find, wand, wild, and wing.
One possibility is that the different word candidates would be checked and searched in the mental lexicon in an order determined by their degree of activation, depending on similarity to the input and the frequency of usage of the word. The word 'find' is more frequently used than 'wand' and 'wind' in everyday communication, and it may have been experienced in more contexts and thus have a higher contextual diversity or semantic diversity.
However, another model proposed that many word candidates could be considered for recognition simultaneously. The activation of possible word candidates can then be updated for all units in parallel when new information comes in.

Models of visual word recognition
Interactive Activation (IA) model
The Interactive Activation Model (IA or IAC) is a localist connectionist network model with symbolic representations for letter features, letters, and words. The IA model assumes that the Signal Recognizer identifies possible letters in the input word by their visual features. The activated letter representations activate the words of which they are part and inhibit words they are not compatible with (e.g. 'w' activates 'wind', but not 'find' and not even 'flower', because the 'w' is in the wrong position). The IA model assumes parallel activation of word candidates. At the word level, active words compete: they reduce the activation of other activated words. This is called lateral inhibition. Each word has a 'resting level'.
If a unit has not been activated for a longer period of time, its activation level decreases (this is called decay). There is bottom-up activation flow from letters to words, but also top-down feedback from words to letters.

Word superiority effect
Remember that phoneme monitoring is easier when the phoneme is part of a real word vs. a pseudoword. Something similar holds for visual word recognition: letters (W) are recognized more quickly in words (WIND) than in random letter sequences (WNDF). A pseudoword superiority effect exists too: letters (W) are recognized more quickly in pronounceable pseudowords (WUND) than in unpronounceable pseudowords (WNDF). This is explained by the IA model as feedback from the word level to the letter level.

Limitations
The model considers only orthographic representations, but not phonological, semantic, or morphological representations. For instance, it fails to notice the pronunciation differences in 'pint' and 'mint', the similarities in 'bough' and 'cow', and the presence of two morphemes in 'baker' (bake + r) but not in 'corner'. The model also assumes absolute letter position coding. This implies that letters in words are assumed to be in a particular absolute position in the word.

Spatial Coding Model
The Spatial Coding model is a localist connectionist model that aims to explain the subprocesses involved in printed word recognition. It also assumes that there are distinct levels of representation for letter features, letters, and words, that these levels are connected, and that specific features, letters, and words are implemented as nodes in a large network.
Readers need to:
1. Determine the identity of each letter and the order of the letters.
2. Look up word representations in the mental lexicon that match the input.
3. Select the best matching candidate.
The relative position of letters (e.g., A before T) is more important than the absolute position (e.g., A = letter 2). The model suggests that the reader encounters two sources of uncertainty during reading:
- Letter position uncertainty: the reader is not completely sure about the relative position of the letter in the word relative to where their eyes fixate. Letter position uncertainty is greater the further away letters are from the reader's fixation.
- Letter identity uncertainty: depends on the degree of perceptual evidence in favour of a particular letter at a certain moment in time. The more certain readers are about the presence of a letter, the more they will activate or 'excite' that letter.

Dual Route Cascaded model
The Dual Route Cascaded model incorporates several important insights with respect to visual word recognition and word naming. Just as in the other two models, it is assumed that in both alphabetic and non-alphabetic scripts, a direct orthographic route into the mental lexicon is possible.
Direct route
- The input letter string activates letters and an orthographic word form that is then addressed in the mental lexicon.
- The word's phonological form is then retrieved from the mental lexicon, allowing for its pronunciation.
Indirect route
- Input letters and letter combinations activate their corresponding phonemes, which are then 'assembled' into a phonological representation of the word.
- Assemblage takes time, making this a slower route.
Third route
There is actually a third route that could be followed from input to articulation: from the printed (orthographic) representation of the word to its spoken (phonological) representation via word meaning (semantics).
The 2nd (indirect) route can be faster than the 1st when we see a word that we don't know yet, i.e. a low-frequency word.
- Activity spreads across both routes at the same time.
- For high-frequency words, route 1 is usually fastest; therefore:
  o Irregularly spelled high-frequency words (two, who, friend) can be named quickly.
- For low-frequency words, route 2 may be faster; therefore:
  o Regularly spelled words (intoxication, splurge) are named somewhat slower.
  o Irregularly spelled words (yacht, aisle, quay) may lead to conflicting information from routes 1 and 2, causing even longer naming times.
- When learning to read as a child or learning a foreign language as an adult, most written word forms are (subjectively) low-frequency; therefore route 2 is used extensively.
Evidence of activation of phonology in reading:
- Masked priming studies: when the prime is presented for a very short period (14-29 ms), orthographic priming is observed (mert – MERE), but no phonological priming (mair – MERE).
- When the prime is presented for a longer period (43 ms), both orthographic and phonological priming are observed (mert – MERE and mair – MERE).
Evidence of activation of orthography during listening:
- Phoneme monitoring (e.g., /k/) in spoken stimuli is faster when the carrier word uses the primary spelling (e.g., Dutch word 'paprika') than the secondary spelling (e.g., Dutch word 'replica').

Multilink model
Developed to account for word recognition in multilingual populations. The model assumes that words from different languages are stored together in an integrated mental lexicon. Similarity between the input and stored representations is computed on the basis of Levenshtein distance. This is the number of operations (deletion, addition, or substitution) that must be performed to transcode one letter string into another:
- E.g. to change the Dutch word 'tomaat' into English 'tomato' requires removal of one 'a' and addition of an 'o'. A Levenshtein distance of 2 = 2 operations needed.
Cognates: translation equivalents with form overlap (e.g. 'work' and 'werk' have approximately the same meaning).
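As a minimal sketch of the similarity metric just mentioned, the standard dynamic-programming version of Levenshtein distance is shown below; it reproduces the distance of 2 for the 'tomaat'/'tomato' example.

```python
# Minimal sketch of Levenshtein distance (deletions, additions, substitutions)
# via the standard dynamic-programming table.
def levenshtein(a: str, b: str) -> int:
    # dist[i][j] = distance between the first i letters of a and the first j letters of b
    dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dist[i][0] = i                      # delete all i letters of a
    for j in range(len(b) + 1):
        dist[0][j] = j                      # add all j letters of b
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            substitution_cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,                      # deletion
                             dist[i][j - 1] + 1,                      # addition
                             dist[i - 1][j - 1] + substitution_cost)  # substitution
    return dist[len(a)][len(b)]

print(levenshtein("tomaat", "tomato"))  # 2: remove one 'a', add an 'o'
print(levenshtein("work", "werk"))      # 1: cognates come out as highly similar
```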
Ambiguity
Homophones: words that sound the same (when spoken) but have separate, non-overlapping meanings.
- See-sea, know-no, main-mane
Homographs: words that have the same spelling (and sometimes pronunciation), but separate, non-overlapping meanings.
- Bow, down, content
Polysemous words: words that have multiple different but related meanings.
- Wood (noun) – material that trees are made of
- Wood (noun) – an area covered with trees
- Film (noun) – a movie
- Film (noun) – carrier of photographic images
- Film (verb) – to record moving images of something
Initially, all meanings of a given word are activated, but irrelevant meanings are quickly suppressed. We know this from cross-modal priming tasks.
- But it is actually more complicated, depending on:
  o How dominant one lexical meaning is relative to another (pitcher vs. mint).
  o How strongly the context is biased toward one of the meanings.
Equibiased: both meanings equally dominant (pitcher).
Non-equibiased: one meaning more dominant than the other (mint).