Lecture 4: Speech Recognition
Document Details
Uploaded by SufficientParrot
Summary
This lecture discusses speech recognition: the speech signal in the time and frequency domains, the automatic speech recognition (ASR) process and its challenges, speech production and perception, and speaker identification and verification. The challenges covered include word boundary detection, speaking rate variability, vocabulary size, and noise.
Full Transcript
Utterance
The Speech Signal in the Frequency Domain

What is speech recognition (‘SR’ or ‘ASR’)?

Main Diagram of Speech Recognition
Signal, Constraints, Search

Why is Speech Recognition Difficult?
1. Word boundary hypothesis is still an unsolved problem due to continuity, variability, and disfluencies in speakers
2. Speaking rate variability
3. Large vocabularies in all languages
4. Variability in ambient acoustics, channel characteristics, microphone characteristics, and background noise

Speech Production\Perception (for reading)
Multilayer Structure of speech production\recognition:
- Pragmatic Layer: [book_airplane_flight]
- Semantic Layer: [from_locality] [to_locality] [departure_time]
- Syntactic Layer: [I] [would] [like] [to] [book] [a] [flight] [from] [Rome] [to] [London] [tomorrow] [morning]
- Prosodic/Phonetic Layer: [book] ➔ [b/uh/k]
- Acoustic Layer

Extracting Information from Speech
Goal: Automatically extract information transmitted in the speech signal
- Speech signal ➔ Speech recognition ➔ Words: “How are you?”
- Speech signal ➔ Speaker recognition ➔ Speaker identity: Dr. Ahmad

Speaker Identification (for reading)
- Determine the speaker identity
- Selection between a set of known voices
- The user does not claim an identity
- Closed set identification: assume that all speakers are known to the system
- Open set identification: possibility that the speaker is not among the speakers known to the system
[Figure: Whose voice is this?]

Speaker Verification (for reading)
- Synonyms: authentication, detection
- User claims an identity
- System task: accept or reject the identity claim
- All speakers known: closed set
- Impostor: all voices but the true identity
[Figure: Is this Ahmad’s voice?]
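The difference between closed-set identification, open-set identification, and verification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lecture's method: each enrolled speaker is assumed to be represented by a fixed-length voiceprint vector, cosine similarity is assumed as the score, and the toy vectors, the 0.8 threshold, and all names (`VOICEPRINTS`, `identify_closed_set`, `identify_open_set`, `verify`) are hypothetical.

```python
import math

# Hypothetical voiceprints: toy 3-dimensional vectors standing in for
# real speaker models produced during enrolment.
VOICEPRINTS = {
    "Ahmad": [0.9, 0.1, 0.3],
    "Salma": [0.2, 0.8, 0.5],
}

THRESHOLD = 0.8  # assumed acceptance threshold, chosen only for illustration


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify_closed_set(features):
    """Closed set: every speaker is assumed known, so always return the best match."""
    return max(VOICEPRINTS, key=lambda name: cosine(features, VOICEPRINTS[name]))


def identify_open_set(features):
    """Open set: the speaker may be unknown, so reject a weak best match."""
    best = identify_closed_set(features)
    if cosine(features, VOICEPRINTS[best]) >= THRESHOLD:
        return best
    return None  # speaker is not among the known voices


def verify(features, claimed_identity):
    """Verification: accept or reject the claimed identity only."""
    return cosine(features, VOICEPRINTS[claimed_identity]) >= THRESHOLD


if __name__ == "__main__":
    utterance = [0.85, 0.15, 0.35]       # toy features close to Ahmad's voiceprint
    print(identify_closed_set(utterance))  # no identity claim is made
    print(verify(utterance, "Salma"))      # identity claim: "I am Salma"
```

Note the design difference: identification compares the utterance against every known voiceprint, while verification compares it against the claimed speaker's voiceprint alone, so every other voice counts as an impostor.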
Phases of Speaker Verification System (for reading)
Two distinct phases to any speaker verification system:
- Enrolment Phase: enrolment speech for each speaker (Ahmad, Salma) ➔ Feature extraction ➔ Model training ➔ Voiceprints (models) for each speaker (Ahmad, Salma)
- Verification Phase: speech with claimed identity (Salma) ➔ Feature extraction ➔ Verification decision ➔ Accepted!

Verification Performance

Evaluating Speaker Verification Systems
There are many factors to consider in evaluating speaker verification systems:
- Speech quality
  - Channel and microphone characteristics
  - Noise level and type
  - Variability between enrolment and verification speech
- Speech modality
  - Fixed or user-selected phrases (free text)
- Speech duration
  - Duration and number of sessions of enrolment and verification speech
- Speaker population
  - Size