History of Speech Recognition

Questions and Answers

What is the primary challenge in automatic speech recognition, as indicated in the provided content?

  • The inability to record human speech accurately.
  • The lack of funding for speech recognition research.
  • The limited processing power of modern computers.
  • The difficulty in converting continuous acoustic signals into discrete language units. (correct)

Why was the 'Radio Rex' toy able to respond?

  • It responded to a burst of acoustic energy at a specific frequency. (correct)
  • It used sophisticated speech recognition software for command recognition.
  • It employed complex algorithms to identify different sounds.
  • It used a series of levers and pulleys.

Which of the following is a key reason why automated speech recognition is difficult?

  • The limited acoustic properties of speech.
  • The consistent and direct mapping between abstract sound units and their physical expression.
  • The complex and variable relationship between abstract sound units and their physical manifestations. (correct)
  • The straightforward association of sounds with physical objects.

In the context of speech perception, what is perceptual invariance?

  • The ability to consistently recognize variable acoustic inputs as the same sound category. (correct)

What are allophones, in the context of phonemes?

  • Variants of a phoneme that are part of the same abstract category. (correct)

How does the brain handle the accurate sorting of variable sounds into correct categories?

  • It amplifies certain acoustic differences while minimizing others. (correct)

What is voice onset time (VOT)?

  • The period between articulatory release and vocal fold vibration. (correct)

By imposing sharp boundaries on speech sound perception, what do mental categories lead one to assume?

  • All sounds within a phoneme category sound the same. (correct)

What is the implication of the McGurk effect on speech perception?

  • Visual and auditory information are integrated when interpreting speech. (correct)

According to Uri Hasson and colleagues, what is the relationship between acoustic and abstract representation of sounds?

  • Abstract sound representations can be separate from sensory inputs. (correct)

Which brain area is involved in processing detailed acoustic information of speech sounds?

  • Left superior temporal gyrus (STG). (correct)

When comparing Standard British English to American English, what is an example of a regional dialectal difference?

  • Vowel production. (correct)

In the context of speech perception, what does cue weighting refer to?

  • Prioritizing certain acoustic cues over others. (correct)

What is one reason VOT isn't a reliable cue?

  • It’s too variable. (correct)

Being multilingual does not affect which of the following?

  • VOT. (correct)

How does knowledge of surrounding sounds impact what you’re hearing?

  • You work backward, applying your knowledge of how sounds 'shape-shift' to make out the sounds. (correct)

When listeners are asked whether they heard /d/ or /t/ sounds in various word frames, how does the word frame affect the sounds straddling a category boundary?

  • The word frame affected only the sounds that straddled the category boundary between /t/ and /d/. (correct)

In the studies conducted by Kuhl and Miller on categorical perception, which test subjects were used?

  • Chinchillas. (correct)

What is the phoneme restoration effect?

  • The illusion of hearing a missing speech sound when a non-speech sound has replaced it. (correct)

In speech perception, what causes the tendency to split the difference between conflicting cues and perceive another sound altogether?

  • The McGurk effect. (correct)

What did the study run by Emily Myers and her colleagues uncover about how the brain processes abstract sound categories?

  • Some neural regions responded to sounds according to their abstract phonemic category rather than their acoustic details. (correct)

Why did Christina Zhao and Pat Kuhl choose 9-month-olds for their study of the effects of musical training?

  • Babies of this age show a remarkable capacity for perceptual tuning. (correct)

How do listeners adapt and generalize across accented talkers?

  • Listeners adjust their categories based on the speech sample they have heard and carry these adjustments over accordingly. (correct)

In the infant speech perception study by Bruderer et al. that used teethers, what was the purpose of the flat teether?

  • It impeded tongue movement. (correct)

What is a possible effect on listeners when a talker makes greater use of burst duration at the release of the articulators?

  • Listeners assigned that cue greater weight accordingly. (correct)

According to the study by Stasenko and colleagues, what does damage to the inferior frontal gyrus, premotor cortex, and primary motor cortex lead to?

  • Inconsistent tongue movements when producing simple sounds. (correct)

How did Stasenko and colleagues determine that the stroke patient's issues stemmed from articulation challenges?

  • He could not say whether to label the sound aba. (correct)

What do the most successful ventriloquists rely on us to do when we process speech sounds?

  • Hear what we expect to hear. (correct)

As speech perception develops, which of the following relies the least on articulatory information?

  • Voicing distinctions. (correct)

In general, what are younger people more sensitive to and skillful with than older people?

  • Quick dialogue. (correct)

Which technique allows researchers to deliver electric current to targeted areas of the brain through the skull and observe the effects of this stimulation on behavioral performance?

  • Transcranial magnetic stimulation (TMS). (correct)

What is one benefit of the TMS technique?

  • Disrupting articulators at a neural level. (correct)

What best helps parents determine whether their child may have an ear dialect?

  • Allowing and having a mix of genders and pitches of the range. (correct)

In assessing patients with dyslexia, what does one test that uses speech sounds measure?

  • Phonemic awareness. (correct)

How does the performance of children with dyslexia differ from that of their neurotypical age-peers on categorical perception of speech sounds?

  • Children with dyslexia have greater sensitivity. (correct)

What is true in regard to those with and without dyslexia?

  • Those with dyslexia have lower success and are more likely to have weaker reading ability than strong readers. (correct)

What sets human speech recognition apart from simply detecting acoustic properties?

  • The ability to detect and combine multiple acoustic dimensions. (correct)

Why is chunking a stream of acoustic information into units of language complex?

  • Because languages combine and recombine a small set of sound units in many ways. (correct)

How does the act of typing bypass the complexity of recognizing language's basic units?

  • Each keyboard symbol corresponds directly to an abstract unit. (correct)

According to the provided text, what is the 'core problem' explored in the chapter?

  • To translate ever-changing speech properties into stable sound sequences. (correct)

What does speech perception require listeners to do beyond simply hearing sounds?

  • Structure information while adapting to different accents and conditions. (correct)

Why does this content compare spoken language to Easter eggs on a conveyor belt?

  • To illustrate how sounds influence one another in speech. (correct)

How does coarticulation affect speech perception?

  • Coarticulation increases variability in sounds due to neighboring sounds. (correct)

What is the challenge that perceptual invariance poses to identifying components of speech?

  • Consistently mapping the variable acoustic input onto a stable representation. (correct)

Why is the cup and bowl example used?

  • The cup and bowl example signifies how mental categories can shift perception. (correct)

What is the role of the vocal folds in the production of voiced and voiceless sounds?

  • Vocal folds influence voice onset time. (correct)

How does categorical perception affect the way sounds are perceived?

  • Categorical perception creates discrete differences in the sound. (correct)

What does research suggest about the perception of signed language?

  • Some handshape contrasts have categorical perception, while some contrasts don't. (correct)

In the experiments on variability in 'U' and 'V' handshapes, why were participants more sensitive to variation in the 'U' handshape than in the 'V' handshape?

  • The gap between fingers can more easily be perceived when the gap is small. (correct)

What does the 'forced-choice identification task' reveal about speech perception?

  • Identification tasks reveal how listeners categorize sounds without certainty. (correct)

What does the study led by Emily Myers suggest about speech perception?

  • Brain regions show sensitivity to different kinds of speech sound information. (correct)

When the results of scientific studies conflict, what does the conflict provide?

  • Critical clues about the specific conditions under which a phenomenon can be observed. (correct)

What does the ABX discrimination task accomplish that the forced-choice identification task does not?

  • The ABX task places increased reliance on short-term memory. (correct)

What does cue weighting explain about the process of speech recognition?

  • Listeners prioritize certain acoustic cues over others. (correct)

Why might it not be worth paying attention to a highly variable cue?

  • The values among members of the same category diverge too widely. (correct)

Which aspects of music help someone become sensitive to certain cues, zooming in on some while disregarding others?

  • Rhythm and pitch. (correct)

In the study of the music program and the infants, what element made the waltz-based musical rhythm more difficult?

  • Involving 3 beats. (correct)

What does the ability to recognize bananas under different lighting conditions suggest about speech perception?

  • The ability to use contextual cues to interpret sound categories. (correct)

What did the Ganong effect demonstrate about speech perception?

  • Listeners often use the word context to identify a specific sound. (correct)

What is the McGurk effect an example of?

  • An example where visual information alters auditory perception. (correct)

In the study by Hasson and colleagues, what did they find about the brain regions that respond to a "ta" percept?

  • Some regions are so convinced by the McGurk illusion that they ignore the auditory and visual sensory information. (correct)

What did Alena Stasenko uncover about the patient who had had a stroke?

  • His tongue movements were inconsistent while uttering simple sounds. (correct)

What occurs for the listener in ventriloquism?

  • Believing your eyes over your ears. (correct)

According to research, what can influence language learning when learning how sounds are produced by various speakers?

  • The gender of the speaker affects the acoustic cues. (correct)

What did the study conducted by Xin Xie reveal about English-speaking listeners adapting to Mandarin-accented speech?

  • They came to rely heavily on burst duration as a cue when deciding whether a word ended in /d/ or /t/. (correct)

Why might highly variable pronunciation be confusing for language learners?

  • It is hard to develop a stable representation of a sound if its borders keep shifting. (correct)

What can exposure to multiple accents do to younger children?

  • The multiple sounds can delay children's ability to recognize words. (correct)

At what age have babies shown trouble recognizing words that include a shift in the talker's style?

  • 7.5 months. (correct)

What did David Saltzman and Emily Myers show in their 2017 study?

  • Listeners adapted only to their most recent experience with a speaker. (correct)

What do tests for categorical perception measure?

  • That one test leads to others being a long time. (correct)

According to Riikka Mottonen, what should happen if motor representations play a role in speech perception?

  • It should affect the ability to make this sound. (correct)

What does one need to accomplish at all times for speech to be at its full capacity?

  • One must have an ability to help articulate. (correct)

What, specifically, did Alison Bruderer uncover about babies at the age of 6 months?

  • The babies can distinguish between certain non-English consonants. (correct)

Where was Patti Adank's 2010 study performed?

  • The Netherlands. (correct)

What do studies of voice over the web and vision lead to about people who have more movement in general?

  • They use pitch. (correct)

According to research, which condition is known to have a strong hereditary basis?

  • Dyslexia. (correct)

Why might reading not come naturally at times, and be easy to derail?

  • The problem may be part of the mental representation of sounds. (correct)

Flashcards

Automated Speech Recognition

The process of converting spoken words into text using computers.

Truly understanding speech

Requires detecting and combining different acoustic dimensions.

Sound units

Abstract representations related to certain relevant acoustic properties of speech.

Phoneme

The smallest unit of sound that changes the meaning of a word.

Allophones

Two or more similar sounds that are variants of the same phoneme.

Categorical Perception

Mental categories impose sharp boundaries on sounds we perceive.

Coarticulation

Variation in the pronunciation of a phoneme caused by neighboring sounds.

Perceptual Invariance

The ability to perceive sounds with variable acoustic manifestations as the same.

Cue Weighting

Prioritizing the acoustic cues that signal a sound distinction.

Context Effects

The process of using contextual cues to infer sound categories.

Phoneme Restoration Effect

The illusion of hearing a missing speech sound when it has been replaced by a non-speech sound that shares acoustic properties with it.

McGurk Effect

Mismatch between auditory and visual information alters sound perception.

Motor Theory of Speech Perception

The perception of speech relies on accessing the articulatory gestures needed to produce it.

Developmental Dyslexia

Difficulties in learning to read despite otherwise normal language abilities.

Phonemic Awareness

The conscious ability to break words down into phonemes.

Signed languages

Languages in which each 'sound unit' is a gesture performed at a body location, with transitions between gestures that lack clear boundaries.

Forced-Choice Identification task

An experimental task where people categorize stimuli into two categories.

ABX Discrimination task

A test in which participants hear two different stimuli followed by a third that is identical to one of the first two.

Cross-Sectional studies

Studies that test and compare different groups at a single point in time.

Longitudinal studies

Studies in which the same group or multiple groups are studied over time, with comparisons made between different time points.

Study Notes

  • Creating machines to understand speech has been a lengthy process, demanding extensive research funding, time, ingenuity and raw computation.

Radio Rex

  • Radio Rex, released in 1922, was a toy bulldog that jumped out of its house on "hearing" its name.
  • The toy was released by a spring that was triggered by a burst of acoustic energy at 500 Hz, which corresponds to the vowel in "Rex".
  • An adult male voice could trigger the toy, whereas an 8-year-old girl's voice would not, unless she sounded similar to an adult male.
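
A minimal sketch of the Radio Rex idea in Python, assuming the toy behaves like a detector for energy at a single frequency. Only the 500 Hz target comes from the notes above; the function names, the 8 kHz sample rate, and the trigger threshold are illustrative assumptions, not details of the actual toy.

```python
import math

def band_energy(samples, sample_rate, target_hz):
    """Normalized power of `samples` at `target_hz` (one DFT-style correlation)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * target_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * target_hz * i / sample_rate)
             for i, s in enumerate(samples))
    return (re * re + im * im) / (n * n)

def rex_jumps(samples, sample_rate=8000, target_hz=500.0, threshold=0.1):
    """Return True if there is enough energy near the target frequency.

    The threshold is an assumed value chosen for this illustration.
    """
    return band_energy(samples, sample_rate, target_hz) > threshold

def tone(freq_hz, sample_rate=8000, duration_s=0.1):
    """Generate a pure sine tone as a stand-in for a spoken vowel."""
    n = round(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

if __name__ == "__main__":
    print("500 Hz tone triggers Rex:", rex_jumps(tone(500.0)))
    print("1000 Hz tone triggers Rex:", rex_jumps(tone(1000.0)))
```

Because the detector checks a single frequency band, any sound with enough energy near 500 Hz would trigger it, whether or not it is the word "Rex". This is why such a device does not count as speech recognition.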

Early Speech Recognition

  • Unlike Radio Rex, true speech recognition needs to detect and combine multiple acoustic dimensions.
  • In 1952, "Audrey", a room-sized computer, could identify spoken numbers from zero to nine, provided they came from one speaker who paused between them.
  • After this, the development of speech recognition advanced slowly until the creation of smartphone applications.

Acoustic information

  • Speech recognition involves "chunking" acoustic streams into language units.
  • Languages form words by combining and recombining a finite set of fundamental sound units, giving rise to large vocabularies.
  • These sound units relate to the relevant acoustic features of speech in a messy, complex way.
  • The relationship between abstract sound units and their physical manifestations is complex and variable, which makes automated speech recognition difficult.
  • Typing bypasses this problem, because each keyboard symbol corresponds directly to an abstract unit. Speech offers no such direct mapping.

Speech Perception

  • Translating elusive, ever-changing speech sounds into stable representations is a challenge for human listeners.
  • Speech perception requires both stability and flexibility.
  • Listeners structure information by overlooking irrelevant acoustic variation and attending to details when they are relevant.
  • Listeners adjust to differences in speakers' accents and to the surrounding environmental conditions.

Variability of Sounds

  • Spoken language is constrained by the gestures, shapes, and movements of the tongue and mouth.
  • Sounds smear their properties onto one another; they are not separated by orderly spaces as letters are.
  • Signed language is analogous: each 'sound unit' is performed at a body location, and the transitions between these gestures lack clear boundaries.

Coarticulation Effects

  • Variability arises from coarticulation effects and from differences between vocal tracts, so it is hard to identify definite acoustic properties that map onto sounds.
  • Coarticulation: variation in the pronunciation of a phoneme caused by the properties of neighboring sounds.
  • The problem of perceptual invariance: how can highly variable acoustic input be consistently mapped onto a stable representation?
  • Perceptual invariance: perceiving sounds with highly variable acoustics as instances of the same sound category.

The Mind

  • The mind imposes structure on speech sounds as a result of learning.
  • Similar sounds that serve the same function are grouped into 'phonemes'.
  • The variants of a phoneme are its 'allophones', which are understood as part of the same abstract category.
  • Phoneme: the smallest unit of sound that changes a word's meaning; an abstract unit that covers a range of possible pronunciations.
  • Allophones: Similar sounds that are variants of a phoneme.

Sorting Categories

  • Perceptual warping may be beneficial: mental structure may amplify some acoustic differences while minimizing others.
  • Perceptual warping allows listeners to ignore insignificant sound differences.

Linguistic Properties

  • Speech sounds are grouped by how they are articulated; consonants are characterized along three dimensions.
  • Place of articulation refers to where the airflow is blocked.
  • Manner of articulation refers to how completely the airflow is cut off.
  • "State of the glottis" refers to whether the vocal folds are vibrating.

Variability of signed language

  • Sources of variability in ASL include coarticulation effects, production speed, individual signer differences, age, and geographical location.
  • Challenges specific to the visual channel include visual orientation and the masking of visual information.

Voicing

  • Voicing distinguishes voiced stop consonants like /b/ and /d/ from their voiceless counterparts /p/ and /t/.
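
The three articulatory dimensions can be illustrated with a small Python sketch. The `Consonant` class and the `differs_only_in_voicing` helper are hypothetical names introduced for illustration; the feature values for /b/, /p/, /d/, and /t/ follow the standard phonetic descriptions of these English stops.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Consonant:
    place: str    # where the airflow is blocked
    manner: str   # how completely the airflow is cut off
    voiced: bool  # state of the glottis: are the vocal folds vibrating?

# Standard descriptions of four English stop consonants.
CONSONANTS = {
    "b": Consonant("bilabial", "stop", True),
    "p": Consonant("bilabial", "stop", False),
    "d": Consonant("alveolar", "stop", True),
    "t": Consonant("alveolar", "stop", False),
}

def differs_only_in_voicing(a: str, b: str) -> bool:
    """True if two consonants share place and manner but differ in voicing."""
    first, second = CONSONANTS[a], CONSONANTS[b]
    return (first.place == second.place
            and first.manner == second.manner
            and first.voiced != second.voiced)

if __name__ == "__main__":
    print("b/p differ only in voicing:", differs_only_in_voicing("b", "p"))
    print("b/d differ only in voicing:", differs_only_in_voicing("b", "d"))
```

In this representation /b/ and /p/ are identical except for the `voiced` flag, while /b/ and /d/ share voicing but differ in place of articulation.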

Categorical Perception

  • The idea is that mental categories impose sharp boundaries on the sounds we perceive.
  • Sounds within a phoneme category are perceived as the same even when they differ acoustically, while sounds on opposite sides of a boundary are perceived as different.
  • Differences across a category boundary can be perceived even when they are subtle.

Forced choice identification task

  • Participants must assign each stimulus to one of two categories, even when they are uncertain.

Adaptation-Signed language

  • Differences in the gap between two fingers are easier to perceive when the gap is very small.
  • Those with more experience in sign language were less sensitive to variation in "U" handshapes than those with less experience.

Speech Perception

  • Humans can sort variable sounds into categories accurately.
  • One possibility is that listeners amplify some acoustic differences between sounds and minimize others: mental categories actually warp perception.

Speech Recognition

  • In 1922 the Radio Rex toy was released; its spring was triggered by acoustic energy at 500 Hz.
  • Rex responded to adult male voices producing the targeted vowel.
  • In 1952 "Audrey" was able to recognize spoken numbers from zero to nine.
  • The relationship between abstract sound units and their physical manifestations is complex.

Multiplicity-Sounds

  • Mental structure amplifies some sound differences while minimizing others; mental categories shape how speech is perceived.

Continuous Perception

  • Judgments about a sound's identity change gradually.

Categorical Perception

  • Judgments about a sound's identity shift abruptly at a category boundary.

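Voice Onset Time (VOT)

  • VOT is the period between articulatory release and the start of vocal fold vibration.
  • In English, a voiced sound such as "ba" can occur at a VOT of 0 ms, whereas a typical voiceless sound may occur at a VOT of about 60 ms.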
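
A minimal sketch of how a continuous VOT value might be mapped onto two categories, assuming a logistic identification function. The 30 ms boundary, the slope, and the "pa" label for the voiceless counterpart are illustrative assumptions chosen to sit between the 0 ms and 60 ms examples above; they are not measured values.

```python
import math

# Assumed values for illustration only; real boundaries vary by listener and context.
BOUNDARY_MS = 30.0
SLOPE_PER_MS = 0.3

def prob_voiceless(vot_ms: float) -> float:
    """Probability of labeling a sound as voiceless ("pa") given its VOT in ms."""
    return 1.0 / (1.0 + math.exp(-SLOPE_PER_MS * (vot_ms - BOUNDARY_MS)))

def identify(vot_ms: float) -> str:
    """Forced-choice label: "pa" if the voiceless response is more likely, else "ba"."""
    return "pa" if prob_voiceless(vot_ms) >= 0.5 else "ba"

if __name__ == "__main__":
    for vot in range(0, 61, 10):
        print(f"VOT {vot:2d} ms -> {identify(vot)} "
              f"(p_voiceless = {prob_voiceless(vot):.3f})")
```

Under these assumptions, an equal 20 ms step has very different effects depending on where it falls: moving from 0 ms to 20 ms barely changes the response, while moving from 20 ms to 40 ms crosses the boundary and flips the label. That asymmetry is the pattern categorical perception describes.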

Categorical Perception

  • Assigning discrete categories to a continuous range of sounds.

Speech studies

  • Many studies test whether listeners perceive speech sounds categorically, including a study led by McMurray.

Adaptive learning

  • Adaptation comes from hearing accents: after five native speakers from each background listened to the accented speech samples, their perception shifted slightly.

Web Activities

  • Web Activity 7.1: Testing the limits of automatic speech recognition.
  • Web Activity 7.2: Variability in signed language.
  • Web Activity 7.4: Phoneme restoration.

Motor theory

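  • Proposes that perceiving speech relies on accessing the articulatory gestures needed to produce it, so an inability to make those movements should affect perception.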

Gestural representation

  • The role of gestural representations in the everyday task of speech perception is complex, and researchers disagree about it.

Phoneme Restoration Effect

  • When a non-speech sound that shares acoustic properties with a speech sound replaces it, listeners hear the missing sound they expect.

McGurk Effect

  • Visual and auditory information are integrated in speech perception, so a mismatch between them alters the sound we believe we hear.

Inability-To See

  • Auditory and visual information both shape what humans perceive; what is heard depends on context and may also be affected by a lack of motor capabilities.

Speech adaptation

  • People can adapt to language and voices over time.
  • Listeners tune their perception to speech sounds and to individual talkers.
