Speech Recognition Lecture Notes PDF

What is speech recognition ('SR' or 'ASR')?

Main Diagram of Speech Recognition
[Figure: block diagram with the labels Signal, Constraints, Search]

Why is Speech Recognition Difficult?
1. Hypothesizing word boundaries is still an unsolved problem because of continuity, variability, and disfluencies in speakers
2. Speaking rate variability
3. Large vocabularies in all languages
4. Variability in ambient acoustics, channel characteristics, microphone characteristics, and background noise

Speech Production/Perception (for reading)
Multilayer structure of speech production/recognition:
- Pragmatic Layer: [book_airplane_flight]
- Semantic Layer: [from_locality] [to_locality] [departure_time]
- Syntactic Layer: [I] [would] [like] [to] [book] [a] [flight] [from] [Rome] [to] [London] [tomorrow] [morning]
- Prosodic/Phonetic Layer: [book] [b/uh/k]
- Acoustic Layer

Extracting Information from Speech
Goal: Automatically extract information transmitted in the speech signal
- Speech signal -> Speech recognition -> Words ("How are you?")
- Speech signal -> Speaker recognition -> Speaker identity (Dr. Ahmad)

Speaker Identification (for reading)
- Determine the speaker identity ("Whose voice is this?")
- Selection between a set of known voices
- The user does not claim an identity
- Closed set identification: assume that all speakers are known to the system
- Open set identification: possibility that the speaker is not among the speakers known to the system

Speaker Verification (for reading)
- Synonyms: authentication, detection
- User claims an identity ("Is this Ahmad's voice?")
- System task: accept or reject the identity claim
- All speakers known: closed set
- Impostor: all voices but the true identity

Phases of Speaker Verification System (for reading)
Two distinct phases to any speaker verification system:
- Enrolment Phase: enrolment speech for each speaker (Ahmad, Salma) -> feature extraction -> model training -> voiceprints (models) for each speaker (Ahmad, Salma)
- Verification Phase: speech with claimed identity (Salma) -> feature extraction -> verification decision -> "Accepted!"

Verification Performance
Evaluating Speaker Verification Systems
There are many factors to consider in evaluating speaker verification systems:
- Speech quality
  - Channel and microphone characteristics
  - Noise level and type
  - Variability between enrolment and verification speech
- Speech modality
  - Fixed or user-selected phrases (free text)
- Speech duration
  - Duration and number of sessions of enrolment and verification speech
- Speaker population
  - Size

Speech Modalities (for reading)
Application dictates different speech modalities:
- Text-dependent recognition
  - Recognition system knows the text spoken by the person
  - Examples: fixed phrase, prompted phrase
  - Used for applications with strong control over user input
  - Knowledge of the spoken text can improve system performance
  - Prompting may reduce the risk of impostors using voice recordings
- Text-independent recognition
  - Recognition system does not know the text spoken by the person
  - Examples: user-selected phrase, conversational speech
  - Used for applications with less control over user input
  - More flexible system but also a more difficult problem
  - Speech recognition can provide knowledge of the spoken text
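
The enrolment and verification phases, and the difference between closed set and open set identification, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the notes: the function names (enrol, verify, identify) are hypothetical, a voiceprint is modelled as the plain average of toy 3-dimensional feature vectors, the score is cosine similarity, and the 0.9 threshold is arbitrary. A real system would extract features from audio and train a proper statistical model.

```python
import math


def mean_vector(frames):
    """Average a list of equal-length feature frames into one vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def enrol(enrolment_frames):
    """Enrolment phase: build one voiceprint (model) per speaker."""
    return {name: mean_vector(frames) for name, frames in enrolment_frames.items()}


def verify(models, claimed, test_frames, threshold=0.9):
    """Verification phase: accept (True) or reject (False) an identity claim."""
    score = cosine(models[claimed], mean_vector(test_frames))
    return score >= threshold


def identify(models, test_frames, threshold=None):
    """Identification: no identity is claimed.

    threshold=None -> closed set: always return the best-matching known speaker.
    threshold set  -> open set: return None if even the best match scores too low.
    """
    test = mean_vector(test_frames)
    best = max(models, key=lambda name: cosine(models[name], test))
    if threshold is not None and cosine(models[best], test) < threshold:
        return None  # the speaker may not be known to the system
    return best


# Toy feature frames (assumed values, not real speech features).
models = enrol({
    "Ahmad": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "Salma": [[0.1, 1.0, 0.2], [0.0, 0.9, 0.3]],
})
test = [[0.05, 0.95, 0.25]]

salma_claim = verify(models, "Salma", test)   # claim matches the voice
ahmad_claim = verify(models, "Ahmad", test)   # impostor-style claim
who = identify(models, test)                  # closed set identification
unknown = identify(models, [[0.0, 0.0, 1.0]], threshold=0.9)  # open set

print(salma_claim, ahmad_claim, who, unknown)
```

In this sketch the claim "Salma" is accepted and the claim "Ahmad" is rejected for the same test speech. Closed set identification always names one of the enrolled speakers, while the open set variant can return None when no voiceprint scores above the threshold.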
