Questions and Answers
What characteristic of vowels can be observed in a waveform?
How can fricatives be identified in a waveform?
Which of the following accurately describes spectral features?
What does the repeated wave in the diagram represent in terms of frequency?
What can be inferred about the smaller repeated wave in relation to the larger wave?
Which type of software was mentioned for creating spectrograms?
In a waveform, what does high amplitude indicate?
Which phonetic feature would not be evident in a waveform without additional spectral analysis?
What two characteristics are most important in analyzing a wave?
If there are 28 repetitions of a wave captured in 0.11 seconds, what is the frequency in Hertz?
What does a high amplitude in a waveform indicate?
How is the perceptual correlate of frequency described?
What does the zero value on the vertical axis of a waveform represent?
What are spectral features used to represent in acoustic processing of speech?
What relationship between amplitude and loudness is described?
What does LPC stand for in the context of speech signal processing?
Why is analyzing waveforms important in understanding speech?
In acoustic processing, what is the primary role of feature extraction?
How are sound waves represented in signal analysis?
If a sound has a lower frequency, how is its pitch perceived?
What is the significance of analyzing a waveform in acoustic processing?
What role do dialogue agents play in speech processing?
What do changes in air pressure represent in the context of speech recognition?
Which of the following tools is commonly used for parsing in speech recognition?
What is the frequency of the first formant (F1) for the vowel [iy]?
What do dark bars on a spectrogram typically represent?
Which frequency range is associated with the second formant (F2) for the vowel [iy]?
What primarily causes the differences in formant frequencies across vowels?
Which of the following phones can be identified using formants?
How do the formants differ between vowels such as [iy] and [ɒ]?
What effect does moving the tongue have on vowel frequency production?
What role do formants play in vowel identification?
What term describes the maximum frequency that can be measured based on the sampling rate?
Which of the following is NOT a step in the analogue-to-digital conversion process?
How many amplitude measurements are required per second for a sampling rate of 8,000 Hz?
What is the consequence of having less than two samples per cycle during the sampling process?
Which of the following sampling rates would be sufficient to capture the majority of human speech frequencies below 10,000 Hz?
What is typically the integer representation size used for quantisation in digital audio?
What is the purpose of quantisation in the context of digitising a waveform?
To digitise a sound wave effectively, how many samples should be taken for each cycle of the wave?
What is the approximate frequency of the tiny wave observed on the 1000Hz waves?
What does the y-axis of a spectrum represent?
Why is an LPC spectrum utilized in speech applications?
Which of the following statements accurately describes a spectrogram?
What characteristic does a spectrum help identify in sound waves?
What is the main function of the cochlea in human audition?
In a spectrogram, what does the darkness of a point indicate?
How do scientists use spectral information to analyze chemical elements?
Study Notes
Acoustic Processing of Speech Signals
- Introduction to the acoustic processing of speech signals (basis of speech recognition by computers)
- Signal Analysis
- Feature Extraction
- Fourier Analysis and Linear Predictive Coding (LPC)
- Spectral Analysis and Spectra: Human Voiceprints
- Sound Waves
- The input to a speech recognizer is a complex series of changes in air pressure originating from the speaker.
- These changes are caused by the specific way air passes through the glottis and out the oral or nasal cavities.
- Interpreting a waveform
- Sound waves are represented by plotting changes in air pressure over time.
- One way to picture the graph: imagine a vertical plate blocking the air pressure waves (a microphone in front of a speaker's mouth, or the eardrum of a hearer).
- The graph measures the amount of compression of air molecules at the plate.
- Waveform example: A diagram shows a speech waveform taken from a corpus of telephone speech of someone saying "she just had a baby".
- Characteristics of a wave:
- Frequency: the number of times a wave repeats itself per second (measured in Hertz, Hz).
- Amplitude: the amount of air pressure variation, measured on the vertical axis. A high value = higher-than-normal air pressure at a given time, a zero value = normal atmospheric pressure, and a negative value = lower-than-normal air pressure.
- Perceptual properties related to frequency and amplitude:
- Pitch = perceptual correlate of frequency; a higher frequency = higher perceived pitch (but the relationship is not linear).
- Loudness = perceptual correlate of power, which is related to the square of the amplitude (higher amplitude = louder sound, but the relationship is not linear).
- Interpreting waveforms:
- Humans (and computers) can understand speech given the sound wave, so the waveform must contain relevant information.
- Visual inspection of waveforms can reveal characteristics, like the difference between vowels and consonants.
- Vowels tend to be long and relatively loud (high amplitude).
- Fricatives (like [sh]) create an intense irregular/noisy pattern.
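The two wave characteristics above reduce to simple arithmetic. As a minimal sketch (the function names are illustrative, not from the course material), the frequency of a segment is the number of repetitions divided by the time captured, and power grows with the square of the amplitude. The 28 repetitions in 0.11 seconds come from the quiz question above.

```python
def frequency_hz(repetitions: int, duration_s: float) -> float:
    """Frequency = number of complete cycles divided by the elapsed time."""
    return repetitions / duration_s


def relative_power(amplitude: float) -> float:
    """Power is proportional to amplitude squared (constant factor omitted)."""
    return amplitude ** 2


# 28 repetitions of the wave captured in 0.11 seconds.
freq = frequency_hz(28, 0.11)
print(round(freq, 1))  # 254.5

# Doubling the amplitude quadruples the power.
power_ratio = relative_power(2.0) / relative_power(1.0)
print(power_ratio)  # 4.0
```

Note that this gives physical power only; perceived loudness does not scale linearly with it, as the notes point out.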
Major Topics of this Course
- Language Structure
- Phonology
- Syntax
- Language Meaning
- Dialogue Agents
- Discourse structures
- Conversational Agents in software (XML/XMA)
- Use of XML and associated application interfaces.
- Tools and Technologies
- Grammars (pre-existing)
- Parsers
- Data Structures
- Machine Translation
- Translation Engines
- Speech / Audio Processing
- Speech recognition
- Speech synthesis systems
Spectrograms
- Spectra are representations of frequency components within a wave at a specific point in time.
- They can be calculated using a Fourier transform.
- LPC (Linear Predictive Coding) spectra are used often for speech because they are good at showing the peaks in the spectrum.
- A spectrogram displays how the frequency components of a wave change over time.
- In human audition, the cochlea computes a spectrum of the incoming waveform.
- Spectral information is important for both human and machine speech recognition.
- Spectral peaks are characteristic of sounds.
- Phones (speech sounds) have characteristic spectral signatures (similar to how chemical elements have characteristic light wavelengths).
- Speakers have varying features.
- Formant: The dark horizontal bars on a spectrogram, representing spectral peaks, usually of vowels.
- The x-axis shows time in a spectrogram, and the y-axis shows frequency (in Hz).
- Darkness of a point = The magnitude/amplitude of the frequency component.
- Examples of spectrograms for the vowel [i] in the word "she"
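A spectrogram can be sketched as one spectrum per short frame of the signal, stacked along the time axis. The example below is a simplified illustration using only the Python standard library: it uses a naive DFT, non-overlapping rectangular frames, and a synthetic two-tone signal, all of which are assumptions made for brevity rather than how any particular speech tool works.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (illustrative choice)
FRAME_SIZE = 200     # 25 ms frames, so DFT bins are 40 Hz apart


def dft_magnitudes(frame):
    """Naive O(n^2) DFT; returns magnitudes for bins 0 .. n/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_size=FRAME_SIZE):
    """One spectrum per non-overlapping frame: rows = time, columns = frequency."""
    return [
        dft_magnitudes(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def tone(freq_hz, n):
    """n samples of a unit-amplitude sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]


# Synthetic signal: 50 ms at 200 Hz followed by 50 ms at 600 Hz.
signal = tone(200, 400) + tone(600, 400)
spec = spectrogram(signal)

# The "darkest" point in each frame is the bin with the largest magnitude.
bin_hz = SAMPLE_RATE / FRAME_SIZE
dominant = [max(range(len(row)), key=row.__getitem__) * bin_hz for row in spec]
print(dominant)  # [200.0, 200.0, 600.0, 600.0]
```

Real spectrograms typically use overlapping, windowed frames and a fast Fourier transform; the structure (time on one axis, frequency on the other, magnitude as darkness) is the same.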
Spectral Features
- Based on Fourier's insight (complex waves can be broken down into sums of simple waves of different frequencies)
- Analogies to musical concepts (a chord is composed of multiple notes).
- These features are used for more advanced classification than just broad phonetic features.
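Fourier's insight can be checked numerically. The sketch below builds a complex wave as the sum of two sine waves and then recovers the two component frequencies from the magnitudes of a discrete Fourier transform. The frequencies (100 Hz and 1000 Hz), amplitudes, and sample rate are hypothetical values chosen so that each component falls exactly on a DFT bin.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (hypothetical choice)
N = 400             # 0.05 s of signal, so DFT bins are 20 Hz apart


def make_wave(components, n=N, rate=SAMPLE_RATE):
    """Sum of sine waves; components is a list of (frequency_hz, amplitude)."""
    return [
        sum(a * math.sin(2 * math.pi * f * t / rate) for f, a in components)
        for t in range(n)
    ]


def dft_magnitudes(samples):
    """Naive O(n^2) DFT; returns magnitudes for bins 0 .. n/2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


# A large 100 Hz wave with a smaller 1000 Hz wave riding on it.
wave = make_wave([(100, 1.0), (1000, 0.3)])
mags = dft_magnitudes(wave)

# The two strongest bins correspond to the two component frequencies.
peaks = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
peak_freqs = sorted(k * SAMPLE_RATE / N for k in peaks)
print(peak_freqs)  # [100.0, 1000.0]
```

This mirrors the musical analogy: the summed wave is the chord, and the spectrum lists its notes together with how strongly each one sounds.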
Feature Extraction
- The process involves first digitizing the input sound wave using sampling and quantization.
- Sampling: Measuring the input sound wave's amplitude at regular intervals (the sampling rate is important; commonly 8,000 Hz or 16,000 Hz).
- Quantization: Assigning each sample to an integer value within a range; this involves using a certain precision (usually 8 or 16 bits; important for resolution/accuracy).
- Maximum frequency (Nyquist Frequency): The maximum frequency that can be measured accurately by a given sampling rate is half of that sampling rate.
- These digitized samples are then converted to a set of spectral features.
- The final result is a feature vector which may be used in further speech processing.
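The digitization steps above can be sketched in a few lines. This is an illustrative example only: the function names, the 440 Hz test tone, and the scaling of samples to the range [-1.0, 1.0] are assumptions, not part of the course material. It shows sampling at 8,000 Hz, quantization to 8 and 16 bits, the Nyquist frequency, and what goes wrong when a wave is sampled fewer than twice per cycle.

```python
import math


def sample_wave(freq_hz, sample_rate, duration_s, amplitude=1.0):
    """Measure a sine wave's amplitude at regular intervals."""
    n = int(round(sample_rate * duration_s))
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]


def quantize(samples, bits=16):
    """Map real values in [-1.0, 1.0] to signed integers of the given bit depth."""
    max_int = 2 ** (bits - 1) - 1  # 32767 for 16 bits, 127 for 8 bits
    return [int(round(max(-1.0, min(1.0, x)) * max_int)) for x in samples]


def nyquist_frequency(sample_rate):
    """Highest frequency a given sampling rate can represent accurately."""
    return sample_rate / 2


# 10 ms of a 440 Hz tone at 8,000 Hz gives 80 amplitude measurements.
samples = sample_wave(440, 8000, 0.01)
pcm16 = quantize(samples, bits=16)
pcm8 = quantize(samples, bits=8)
print(len(samples), nyquist_frequency(8000))  # 80 4000.0

# Aliasing: a 5,000 Hz wave sampled at 8,000 Hz (above the 4,000 Hz Nyquist
# frequency) yields the same samples as a sign-flipped 3,000 Hz wave.
too_high = sample_wave(5000, 8000, 0.01)
alias = sample_wave(3000, 8000, 0.01)
indistinguishable = all(abs(a + b) < 1e-9 for a, b in zip(too_high, alias))
print(indistinguishable)  # True
```

The 16-bit version keeps finer amplitude resolution than the 8-bit version, which is the accuracy trade-off the quantization bullet refers to.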
To Do This Week
- Read chapter 7 of Jurafsky and Martin textbook
- Record and Analyze utterances (using software):
- 'Say hod twice', 'Say hood twice', etc.
- Save as RAW format.
- Make waveform recordings & spectrograms.
- Note when the vowel starts and ends, and identify formants (frequencies).
- Compare results with classmates.
Description
This quiz explores the fundamental aspects of acoustic processing for speech recognition by computers. It covers signal analysis, feature extraction techniques like Fourier Analysis and Linear Predictive Coding, and the interpretation of sound waves. Learn how these concepts form the basis for understanding human voiceprints and the mechanics of waveform representation.