G2 Perception of Language PDF

PERCEPTION OF LANGUAGE
Group 2, Psychology of Language

PERCEPTION OF LANGUAGE: DEFINITION AND SCOPE
Language perception encompasses the processes by which humans recognize, interpret, and understand spoken and written language. It involves isolating and segmenting words, phrases, and longer units, and attributing meaning to them.

CONTENT
1. The Structure of Speech: Prosodic Factors; Articulatory Phonetics; Acoustic Phonetics
2. Perception of Isolated Speech Segments: Levels of Speech Processing; Speech as a Modular System; The Motor Theory of Speech Perception
3. Perception of Continuous Speech: Prosodic Factors in Speech Recognition; Semantic and Syntactic Factors in Speech Recognition

THE STRUCTURE OF SPEECH: A LINGUISTIC LEGO
Complexity of Speech Perception: At first glance, speech perception seems straightforward: listeners simply sort sounds into the classes that exist in their language. However, this task is extraordinarily complex, for two major reasons.

ENVIRONMENTAL INTERFERENCE
The environmental context often interferes with the speech signal. Under normal listening conditions, speech competes with other stimuli for our limited processing capacity. Other auditory signals, such as conversations across the room or someone sneezing or burping, can reduce the fidelity of the speech signal. Visual signals can also serve as sources of distraction.

VARIABILITY OF THE SPEECH SIGNAL
Even in ideal environmental conditions, speech perception faces another major problem: the variability of the speech signal itself. There is no one-to-one correspondence between the characteristics of the acoustic stimulus and the speech sound we hear. Several factors influence or distort the acoustic stimulus that reaches our ears, including:
1. The voice of the speaker (e.g., high versus low pitch).
2. The rate at which the speaker is producing speech.
3. The phonetic context.
Achieving Stable Phonetic Perception: Despite competition from other stimuli and the inherent variability of the signal, we achieve stable phonetic perception. The ease with which we recognize phonetic segments suggests that listeners make a series of adjustments during perceptual recognition, some of them based on implicit knowledge of how speech sounds are produced.

PROSODIC FACTORS: THE MELODY OF MEANING
Ferreira (2003) defines prosody as “a general term that refers to the aspects of an utterance’s sound that are not specific to the words themselves.” Prosodic factors influence the overall meaning of an utterance: changing the stress or intonational pattern can create entirely different meanings.
Prosodic factors are sometimes called suprasegmentals, meaning that they lie above the individual speech segments (phones) and provide a musical accompaniment to speech. The same word or sentence can be expressed prosodically in different ways, and these differences are important cues to the speaker’s meaning and emotional state.

PROSODIC FACTORS: STRESS
Stress refers to the emphasis given to syllables in a sentence and corresponds closely with loudness. For example, in one pronunciation of “chimpanzee,” -zee receives the greatest stress, -pan- the least, and chimp- is intermediate (Ferreira, 2003). Shifting the stress to another word can change the meaning of a sentence.

PROSODIC FACTORS: INTONATION
Intonation is the use of pitch to signal different meanings; the pitch pattern of a sentence is called its intonational contour. An example from men’s restrooms: “We aim to please. You aim too, please” (Fromkin & Rodman, 1974). In English, intonation rises at the end of yes/no questions (e.g., “Are you coming?”) but not at the end of wh- questions (e.g., “Who is coming?”) or declarative sentences (e.g., “I am coming”).
PROSODIC FACTORS: RATE
Rate refers to the speed at which speech is articulated; speakers modify it by altering the number and length of pauses and the time spent articulating speech segments. The rate of speech can itself convey meaning, as in the difference between a slowly spoken “Take your time” and a rapid “We’ve got to get going!” (Bolinger, 1975). The rate at which individual words are produced can also vary with their syntactic role in a sentence. For example, “walk” is longer in “Bill wants to walk but Mary wants to drive” than in “Bill wants to walk to the store” (Ferreira, 2003).

ARTICULATORY PHONETICS: THE SPEECH SCULPTOR
The study of speech sounds is called phonetics, and the more specific study of how speech sounds are pronounced is called articulatory phonetics. All sounds of a language can be described in terms of the movements of the physical structures of the vocal tract. Speech sounds differ mainly in whether the airflow is obstructed and, if so, at what point and in what way.

VOICING
One distinction among consonants concerns whether the vocal cords are together or apart when air from the lungs passes over them. The opening between the vocal cords is called the glottis. If the cords are together, the airstream must force its way through the glottis, causing the vocal cords to vibrate and producing a voiced speech sound, as in [z]. If the cords are apart, the air passes unobstructed, producing a voiceless sound, as in [s].

PLACE AND MANNER OF ARTICULATION
Place and manner of articulation refer to where and how speech sounds are produced in the vocal tract. Place of articulation identifies where the airflow is restricted, such as at the lips ([p], [b]), at the ridge just behind the upper teeth ([t], [d]), at the soft palate ([k], [g]), or in the throat. Manner of articulation describes how the airflow is restricted or modified as it passes through the vocal tract: for example, it is stopped completely and then released in [t], but only narrowed enough to create turbulence in [s].
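The three articulatory dimensions above (voicing, place, and manner) can be written down as a small feature table, which also anticipates the idea of distinctive features that follows. This is a hypothetical Python sketch: the table covers only a handful of consonants, and the names `FEATURES` and `differing_features` are made up for illustration rather than taken from any phonetics library.

```python
# Hypothetical feature table: each consonant is described by
# (voicing, place of articulation, manner of articulation).
FEATURES = {
    "p": ("voiceless", "bilabial", "stop"),
    "b": ("voiced", "bilabial", "stop"),
    "t": ("voiceless", "alveolar", "stop"),
    "d": ("voiced", "alveolar", "stop"),
    "k": ("voiceless", "velar", "stop"),
    "g": ("voiced", "velar", "stop"),
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced", "alveolar", "fricative"),
}

FEATURE_NAMES = ("voicing", "place", "manner")


def differing_features(a: str, b: str) -> list[str]:
    """Return the names of the features on which two consonants differ."""
    return [
        name
        for name, feature_a, feature_b in zip(FEATURE_NAMES, FEATURES[a], FEATURES[b])
        if feature_a != feature_b
    ]


if __name__ == "__main__":
    for pair in [("p", "b"), ("s", "z"), ("t", "s"), ("p", "z")]:
        print(pair, differing_features(*pair))
```

Pairs such as [p]/[b] and [s]/[z] differ only in voicing, while [p]/[z] differ on all three dimensions; this is the kind of economical description of relationships among sounds that a feature system is meant to provide.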
ARTICULATION AND PHONETIC SEGMENTS
Describing speech sounds in terms of their articulation suggests that it might be possible to describe the entire inventory of phonetic segments by constituent features based on their mode of production. The utility of such distinctive features lies in their ability to describe the relationships among speech sounds economically.

ACOUSTIC PHONETICS
Acoustic phonetics is the study of the physical properties of speech; it analyzes the sound waves produced in speech in terms of their frequencies, amplitudes, and durations. One way to analyze the acoustic properties of speech sounds is to look at a waveform.

Spectrograms: One of the most common ways of describing the acoustic energy of speech sounds is the sound spectrogram. It is produced by presenting a sample of speech to a device known as a sound spectrograph, which consists of a set of filters that analyze the sound and then project it onto a moving belt of phosphor, producing the spectrogram. Each spectrogram contains a series of dark bands, called formants, at various frequency levels.

[Figure: sound spectrogram]

Two aspects of formants have been found to be important in speech perception. Formant transitions are the large rises or drops in formant frequency that occur over short durations of time. In the word card, for example, the first formant rises and the second falls in frequency near the end of the word. These transitions nearly always occur at the beginning or the end of a syllable. In between is the formant’s steady state, during which formant frequency is relatively stable. It is a bit oversimplified but basically correct to say that the transitions correspond to the consonantal portion of the syllable and the steady state to the vowel.

PARALLEL TRANSMISSION
We are now in a position to examine some of the acoustic properties of the speech signal.
One, called parallel transmission, refers to the fact that different phonemes of the same syllable are encoded into the speech signal simultaneously. There is no sharp physical break between adjacent sounds in a syllable.

CONTEXT-CONDITIONED VARIATION
A related characteristic, context-conditioned variation, is the phenomenon that the exact spectrographic appearance of a given phone is related to (or conditioned by) the speech context. The clearest example is the way the spectrogram of a consonant is conditioned by the following vowel: the second-formant transition for [d], for instance, rises before [i] but falls before [u]. The production of more than one speech sound at the same time is called coarticulation; it reveals the important point that production, like the physical signal that results from it, tends to vary with the phonetic context.

PERCEPTION OF ISOLATED SPEECH SEGMENTS
Levels of Speech Processing
We may roughly divide speech perception into three levels:
1. Auditory level: the signal is represented in terms of its frequency, intensity, and temporal attributes (as shown, for example, on a spectrogram), as with any auditory stimulus.
2. Phonetic level: we identify individual phones by a combination of acoustic cues, such as formant transitions.
3. Phonological level: the phonetic segment is converted into a phoneme, and phonological rules are applied to the sound sequence.
These levels may be construed as successive discriminations that we apply to the speech signal.

SPEECH AS A MODULAR SYSTEM
Fodor (1983) defines a modular system in cognitive psychology as one that is domain-specific (a speech module, for instance, would be dedicated to speech processing and not, say, to vision), mandatory, fast, and unaffected by feedback.

LACK OF INVARIANCE
The “lack of invariance” problem refers to the difficulty created by the absence of a one-to-one relationship between acoustic signals and the way they are perceived.
Speech sounds vary with their context, yet listeners still identify them reliably, which points to flexible perceptual processing. For example, [m] and [n] are distinguished partly by their own acoustic properties, but the transitions into the adjacent vowel also contribute to perceiving the difference.

CATEGORICAL PERCEPTION IN SPEECH
Research shows that listeners tend to group acoustically similar speech sounds into the same category, and that they are much better at telling apart two sounds from different categories than two equally different sounds from the same category. Sounds such as [p] and [b] are sorted into distinct categories, which helps us identify them quickly despite their physical similarity.

VOICE ONSET TIME (VOT)
Voice onset time (VOT) is a key cue for distinguishing voiced from voiceless consonants, as in [ba] versus [pa]. VOT is the time delay between the release of the consonant and the onset of vocal-cord vibration.

CONSONANT VS. VOWEL PERCEPTION
Consonants tend to be perceived categorically, whereas vowels are perceived more continuously, owing to differences in their acoustic characteristics.

MEMORY AND CATEGORICAL PERCEPTION
Memory plays a bigger role in vowel perception than in consonant perception because vowels leave a stronger auditory “echo.”

THE MOTOR THEORY OF SPEECH PERCEPTION
Developed by Liberman and colleagues (1967).
- Motor theory: a theory linking speech perception to speech production mechanisms.
- Speech perception: the process by which spoken language is heard, interpreted, and understood.
- Articulatory knowledge: implicit understanding of how speech sounds are physically produced.

RATIONALE BEHIND THE MOTOR THEORY
The motor theory aims to explain the lack of invariance in speech perception. It argues that perception is more directly tied to articulation than to the acoustic signal alone.
- Economy of shared mechanisms: the idea that the brain uses overlapping systems for producing and perceiving speech.
- Lack of invariance: the challenge that speech sounds do not always have the same acoustic properties across different contexts.
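The categorical perception and VOT findings described earlier in this section can be illustrated with a toy identification function. This Python sketch rests on assumptions that are not in the presentation: the logistic form, the 25 ms boundary, and the slope are illustrative values chosen only to make identification look nearly all-or-none, and `predicted_discrimination` encodes the idealization that listeners tell two stimuli apart only when they label them differently. All function names are hypothetical.

```python
import math

# Illustrative assumption: the [ba]/[pa] boundary sits at 25 ms of VOT.
BOUNDARY_MS = 25.0
# Illustrative assumption: a steep slope makes labeling nearly all-or-none.
SLOPE = 0.8


def p_voiceless(vot_ms: float, boundary_ms: float = BOUNDARY_MS, slope: float = SLOPE) -> float:
    """Toy logistic probability of labeling a bilabial stop as voiceless [pa]."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))


def identify(vot_ms: float, boundary_ms: float = BOUNDARY_MS) -> str:
    """Label a stimulus with the more probable category."""
    return "pa" if p_voiceless(vot_ms, boundary_ms) >= 0.5 else "ba"


def predicted_discrimination(vot_a: float, vot_b: float) -> bool:
    """Idealized categorical perception: two stimuli are discriminated
    only if they receive different category labels."""
    return identify(vot_a) != identify(vot_b)


if __name__ == "__main__":
    for vot in range(0, 61, 10):
        print(f"VOT {vot:2d} ms -> P([pa]) = {p_voiceless(vot):.2f}, label {identify(vot)}")
    # Both pairs differ by the same 20 ms; only the second crosses the boundary.
    print(predicted_discrimination(0, 20))   # False: both labeled [ba]
    print(predicted_discrimination(20, 40))  # True: [ba] versus [pa]
```

Along an evenly spaced VOT continuum the labels switch abruptly near the assumed boundary, and equal physical differences are predicted to be discriminable only when they straddle it, which is the pattern the presentation describes for consonants.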
EVIDENCE SUPPORTING THE MOTOR THEORY
- Anecdotal evidence: observations from language learning suggesting that practicing the articulation of sounds improves the ability to hear them.
- The McGurk effect: a phenomenon in which conflicting visual and auditory speech cues produce a new perceived sound.

THE MCGURK EFFECT (McGurk & MacDonald, 1976)
In the classic demonstration, an auditory [ba] dubbed onto a video of a speaker saying [ga] is often heard as [da].
- Visual cues: information gained from watching the speaker’s mouth movements.
- Auditory cues: information from hearing the actual spoken sound.
- Perceptual integration: the process by which the brain combines visual and auditory cues into a single percept.

CRITICISM OF THE MOTOR THEORY
- Infant perception studies: research showing that infants can distinguish phonetic contrasts before they are able to produce them.
- Contextual variation in articulation: articulatory movements are not consistent across contexts either, so appealing to articulation does not by itself remove the lack of invariance.

REVISED MOTOR THEORY
- Phonetic gestures: the specific physical movements involved in producing speech sounds, such as lip rounding or jaw movement.
- Phonetic module: a specialized brain system that quickly converts acoustic signals into phonetic gestures for perception.

ONGOING CRITICISMS AND ABSTRACTNESS
- Abstractness of phonetic gestures: the intended phonetic gestures are so abstract that they are difficult to test directly.
- Mixed results in testing: early tests of the theory did not consistently support its predictions.

IMPLICATIONS FOR BRAIN MECHANISMS
- Neurological link between perception and production: the idea that the brain areas responsible for speech production and perception are closely related.
- Support from Ojemann (1983): brain research suggesting that regions involved in speech production are also activated during speech perception.

IMPLICATIONS FOR LANGUAGE ACQUISITION
- Early sensitivity to phonetic gestures: infants may be sensitive to phonetic gestures even before they learn to speak.
- Phonetic module in language acquisition: a brain mechanism that links perception and production could aid in learning languages.
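The perceptual integration step in the McGurk and MacDonald (1976) demonstration above can be illustrated with a simple multiplicative cue-combination rule. This is a hypothetical Python sketch, not a claim about how the brain actually combines cues and not a model proposed in this presentation: the support values in `AUDITORY` and `VISUAL` are invented, and multiplying the two sources and renormalizing is just one simple way to let conflicting evidence jointly favor a third syllable.

```python
# Hypothetical support (0 to 1) that each modality lends to each syllable.
# These numbers are invented for illustration; they are not experimental data.
AUDITORY = {"ba": 0.8, "da": 0.5, "ga": 0.1}  # the sound track says [ba]
VISUAL = {"ba": 0.1, "da": 0.5, "ga": 0.8}    # the lips show [ga]


def integrate(auditory: dict[str, float], visual: dict[str, float]) -> dict[str, float]:
    """Combine two cue sources multiplicatively and renormalize to probabilities."""
    combined = {syllable: auditory[syllable] * visual[syllable] for syllable in auditory}
    total = sum(combined.values())
    return {syllable: value / total for syllable, value in combined.items()}


if __name__ == "__main__":
    percept = integrate(AUDITORY, VISUAL)
    for syllable, prob in sorted(percept.items(), key=lambda kv: -kv[1]):
        print(f"{syllable}: {prob:.2f}")
    print("most likely percept:", max(percept, key=percept.get))
```

With these made-up values, neither modality's favorite wins: [da], which is moderately supported by both the sound and the lips, ends up as the most likely percept, mirroring the fused [da] reported in the McGurk demonstration.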
PERCEPTION OF CONTINUOUS SPEECH
Until now we have dealt with the convenient fiction of the speech sound in isolation. Under normal listening conditions, however, speech sounds are embedded in a context of fluent speech. Because the acoustic structure of a speech sound varies with its immediate phonetic context, it seems likely that broader aspects of context, such as adjacent syllables and clauses, also play a significant role in how we identify speech.

PROSODIC FACTORS IN SPEECH RECOGNITION
There is little doubt that prosodic factors such as stress, intonation, and rate influence the perception of speech. They provide a source of stability in perception because we can often hear these superimposed qualities at a distance that would tax our ability to identify individual speech segments.

Stress
- We seem to perceive stress through a combination of acoustic cues and our knowledge of the language’s stress rules.
- Acoustic cues such as intensity, pitch, and duration help distinguish stressed from unstressed syllables. For example, the relative loudness of the syllables helps us tell “blackbird” (a kind of bird) from “black bird” (any bird that is black).
- The timing of syllable production also affects perceived stress, as shown by brief pauses that separate “lighthouse keeper” from “light housekeeper.”
- Research suggests that listeners use stress patterns in speech to anticipate upcoming information.
- Experiments have shown that listeners detect speech segments faster in stressed than in unstressed syllables, indicating that we interpret continuous speech partly on the basis of stress patterns.

Rate
- Speakers adjust their production rate by changing the number and length of pauses and the time spent articulating words.
- Changes in speaking rate affect vowel duration and the duration of cues for consonant distinctions, such as voice onset time (VOT), which is crucial for distinguishing voiced and voiceless sounds.
- VOT values vary with speech rate: faster speech produces shorter VOTs.
- Speech rate also affects the perception of consonants: at faster rates, the boundary between voiced and voiceless sounds shifts toward smaller VOT values. This adjustment is known as rate normalization.
- Listeners also take other properties of the speaker into account, such as vocal tract size and voice pitch; this is known as speaker normalization.
- These normalizations suggest that implicit articulatory knowledge plays a role in speech perception.

SEMANTIC AND SYNTACTIC FACTORS IN SPEECH RECOGNITION
Context and speech recognition
As we have seen, a word isolated from its context becomes less intelligible (Pollack & Pickett, 1964). It follows that if we vary the semantic and syntactic aspects of this context, the perceptibility of the speech passage should change. George Miller and his associates have convincingly demonstrated the role of higher-order contextual factors in speech recognition. Miller, Heise, and Lichten (1951) presented words either in isolation or in five-word sentences against a background of white noise (a hissing sound). Performance was better in the sentence condition at all noise levels. Apparently, listeners were able to use the syntactic and semantic constraints of continuous speech to limit the number of possibilities they had to consider.

Phonemic restoration
One of the most dramatic demonstrations of top-down processing of speech signals is phonemic restoration (Warren, 1970; Warren & Warren, 1970). The first /s/ in the word legislatures in sentence (4) was removed and replaced with a cough:
(4) The state governors met with their respective legislatures convening in the capital city.
This procedure led to a striking auditory illusion: listeners reported hearing the excised /s/!
In addition, when told that a sound was missing and asked to guess which one, nearly all listeners were unable to do so. Restoration has also been found in a variation of the procedure in which noise is added to, rather than substituted for, the speech sound (Samuel, 1981).

Subsequent studies have shown that context helps determine how phonemic restorations take place. When Warren and Warren (1970) presented the following four sentences to listeners, they found that the restorations were related to the subsequent context: *eel was heard as wheel, heel, peel, or meal, depending on the sentence.
(5) It was found that the *eel was on the axle.
(6) It was found that the *eel was on the shoe.
(7) It was found that the *eel was on the orange.
(8) It was found that the *eel was on the table.

Mispronunciation detection
What happens when a perfectly ordinary sentence contains a minor phonetic error, as in “It has been zuggested that students be required to preregister”? Cole (1973) found that the likelihood of detection depends on where the error occurs in a word or sentence. Detection was better for mispronunciations at the beginning of a word than for those later in the word, and better early in a sentence than later on.
Marslen-Wilson and Welsh (1978) extended these results by combining the mispronunciation detection task with a shadowing task. A shadowing task is one in which subjects must immediately repeat what they hear.
Marslen-Wilson and Welsh examined the conditions under which listeners would repeat a mispronounced sound exactly, as opposed to restoring the “intended” pronunciation.
- Restorations were produced more fluently than exact repetitions, with less pausing.
- Restorations were more common in highly predictable contexts, whereas exact reproductions occurred in contexts of low predictability.
- The fluency of restorations suggests that semantic and syntactic constraints are integrated with the incoming speech.
- This indicates that our immediate understanding of speech involves analyzing sounds while simultaneously applying semantic and syntactic constraints.
Thank you
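The context effects in sentences (5) to (8), together with the finding that restorations are more common in predictable contexts, can be sketched as a toy top-down lookup. This Python example is purely illustrative: the candidate words and the context words associated with each are hypothetical, and real listeners draw on far richer semantic and syntactic knowledge. Because the disambiguating word in Warren and Warren’s sentences arrives after the gap, the sketch scores only the words that follow it.

```python
from typing import Optional

# Hypothetical lexicon: words consistent with the surviving sounds "-eel",
# each paired with context words that would make it a sensible completion.
CANDIDATES = {
    "wheel": {"axle", "car", "tire"},
    "heel": {"shoe", "foot", "boot"},
    "peel": {"orange", "banana", "fruit"},
    "meal": {"table", "dinner", "plate"},
}


def restore(sentence: str, gap: str = "*eel") -> Optional[str]:
    """Choose the candidate best supported by the words that follow the gap.

    Assumes the gap marker appears in the sentence. Ties go to the first
    candidate listed. Returns None when no candidate gets any support, i.e.
    the context gives no basis for restoring the missing sound.
    """
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    later_context = set(words[words.index(gap) + 1:])
    scores = {word: len(cues & later_context) for word, cues in CANDIDATES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    for ending in ["axle", "shoe", "orange", "table", "floor"]:
        print(ending, "->", restore(f"It was found that the *eel was on the {ending}."))
```

Applied to sentence (5), `restore` returns "wheel", and sentences (6) to (8) yield "heel", "peel", and "meal". For a sentence whose later context matches none of the cue words it returns None, loosely mirroring the finding that listeners tend to reproduce what they actually hear when the context offers little to go on.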
