Acoustic Processing of Speech Signals

Questions and Answers

What characteristic of vowels can be observed in a waveform?

  • Short duration and low amplitude
  • Long duration and relatively loud (correct)
  • Irregular pattern and low amplitude
  • Consistent pattern and high frequency

How can fricatives be identified in a waveform?

  • They produce an intense irregular pattern (correct)
  • They have a constant amplitude over time
  • They appear as smooth, consistent waves
  • They are characterized by high-frequency tones

Which of the following accurately describes spectral features?

  • They cannot represent phonetic features
  • They provide a less detailed classification than waveforms
  • They require only visual inspection for analysis
  • They interpret a complex wave as a sum of simpler waves (correct)

What does the repeated wave in the diagram represent in terms of frequency?

  • It has a frequency of about 250 Hz (correct)

What can be inferred about the smaller repeated wave in relation to the larger wave?

  • It has a frequency approximately four times that of the larger wave (correct)

Which type of software was mentioned for creating spectrograms?

  • Gram software (correct)

In a waveform, what does high amplitude indicate?

  • A high volume sound (correct)

Which phonetic feature would not be evident in a waveform without additional spectral analysis?

  • Tone quality (correct)

What two characteristics are most important in analyzing a wave?

  • Frequency and amplitude (correct)

If there are 28 repetitions of a wave captured in 0.11 seconds, what is the frequency in Hertz?

  • 255 Hz (correct)

What does a high amplitude in a waveform indicate?

  • Higher than normal air pressure (correct)

How is the perceptual correlate of frequency described?

  • Sound pitch (correct)

What does the zero value on the vertical axis of a waveform represent?

  • Normal atmospheric pressure (correct)

What are spectral features used to represent in acoustic processing of speech?

  • The distribution of frequencies in a waveform (correct)

What relationship between amplitude and loudness is described?

  • Non-linear relationship (correct)

What does LPC stand for in the context of speech signal processing?

  • Linear Predictive Coding (correct)

Why is analyzing waveforms important in understanding speech?

  • They contain information to transcribe speech (correct)

In acoustic processing, what is the primary role of feature extraction?

  • To summarize and represent time slices of a speech signal (correct)

How are sound waves represented in signal analysis?

  • By a graph depicting air pressure changes over time (correct)

If a sound has a lower frequency, how is its pitch perceived?

  • As lower (correct)

What is the significance of analyzing a waveform in acoustic processing?

  • It enables the interpretation of sound frequencies (correct)

What role do dialogue agents play in speech processing?

  • They manage conversation flow and context (correct)

What do changes in air pressure represent in the context of speech recognition?

  • The sound waves generated by a speaker (correct)

Which of the following tools is commonly used for parsing in speech recognition?

  • Prolog (correct)

What is the frequency of the first formant (F1) for the vowel [iy]?

  • 540 Hz (correct)

What do dark bars on a spectrogram typically represent?

  • Spectral peaks of vowels (correct)

Which frequency is associated with the second formant (F2) for the vowel [iy]?

  • 2581 Hz (correct)

What primarily causes the differences in formant frequencies across vowels?

  • Size of the oral cavity and tongue position (correct)

Which of the following phones can be identified using formants?

  • Nasal phones, lateral phone, and rhotic sound (correct)

How do the formants differ between vowels such as [iy] and [ɒ]?

  • Both first and second formant frequencies are different (correct)

What effect does moving the tongue have on vowel frequency production?

  • Creates resonant cavities that filter specific frequencies (correct)

What role do formants play in vowel identification?

  • They are crucial for recognizing vowel identity (correct)

What term describes the maximum frequency that can be measured based on the sampling rate?

  • Nyquist frequency (correct)

Which of the following is NOT a step in the analogue-to-digital conversion process?

  • Transmission (correct)

How many amplitude measurements are required per second for a sampling rate of 8,000 Hz?

  • 8,000 (correct)

What is the consequence of having less than two samples per cycle during the sampling process?

  • Complete loss of frequency information (correct)

Which of the following sampling rates would be sufficient to capture the majority of human speech frequencies below 10,000 Hz?

  • 16,000 Hz (correct)

What is typically the integer representation size used for quantisation in digital audio?

  • 8-bit integers (correct)

What is the purpose of quantisation in the context of digitising a waveform?

  • To represent real-valued numbers as integers (correct)

To digitise a sound wave effectively, how many samples should be taken for each cycle of the wave?

  • Two samples (correct)

What is the approximate frequency of the tiny wave observed on the 1000 Hz waves?

  • 2000 Hz (correct)

What does the y-axis of a spectrum represent?

  • Magnitude of each frequency component (correct)

Why is an LPC spectrum utilized in speech applications?

  • To simplify the analysis of frequency peaks (correct)

Which of the following statements accurately describes a spectrogram?

  • It visually represents how frequency components change over time (correct)

What characteristic does a spectrum help identify in sound waves?

  • Spectral signatures of different sounds (correct)

What is the main function of the cochlea in human audition?

  • To compute a spectrum of incoming waveforms (correct)

In a spectrogram, what does the darkness of a point indicate?

  • The loudness of the sound (correct)

How do scientists use spectral information to analyze chemical elements?

  • By detecting wavelengths of light emitted when elements burn (correct)

Flashcards

Speech Signal Analysis

A process of analyzing speech signals to extract meaningful information for computer processing, like speech recognition.

Feature Extraction

The process of selecting key characteristics (features) from a speech signal that help distinguish between sounds or words.

Fourier Analysis

A mathematical method for decomposing a complex signal into simpler sine and cosine waves, showing its frequency components.

Linear Predictive Coding (LPC)

A method of feature extraction that models a speech signal's short-term spectrum by predicting each sample as a linear combination of the samples before it.

Spectral Analysis

Examining the frequency components of a speech signal to understand how different frequencies contribute to the sound.

Sound Wave

A wave of fluctuating air pressure caused by a vibrating object, like the vocal cords.

Waveform

A graphical representation of a sound wave showing how air pressure changes over time.

Acoustic Processing

The general term for processes that analyze the sound vibrations of the speech signal to obtain features used for automated speech recognition and analysis.

Frequency

The number of wave repetitions per second, measured in Hertz (Hz).

Amplitude

The maximum extent of a wave's oscillation, relating to sound pressure.

Pitch

The perceptual correlate of frequency. Higher frequency = higher pitch.

Loudness

The perceptual correlate of sound wave amplitude, or power.

Hertz (Hz)

Unit of frequency, representing cycles per second.

Air Pressure Variation

Changes in air pressure that create sound waves.

Waveform Information

Listeners (and computers) can understand speech from the sound wave alone, so the waveform must contain the information needed to recognize it.

Vowel vs. Consonant

Vowels are typically long and loud in a waveform, while consonants are shorter and have a more irregular pattern.

Fricative

A consonant sound produced by forcing air through a narrow opening in the mouth, creating a hissing or friction sound.

Spectral Features

Properties of sound based on its frequency components, often displayed in a spectrogram.

Waveform Periodicity

The repeating pattern in a waveform, often indicating a specific sound.

Spectrogram

A visual representation of a sound's frequency components over time, depicting the sound's spectral features.

What does a spectrum show?

A spectrum shows the different frequency components of a wave at a single point in time.

What's a spectrogram?

A spectrogram visualizes how different frequencies in a waveform change over time.

What's the x-axis on a spectrogram?

The x-axis of a spectrogram represents time.

What's the y-axis on a spectrogram?

The y-axis of a spectrogram shows frequency in Hertz (Hz).

What does the darkness in a spectrogram show?

The darkness of a point on a spectrogram indicates the amplitude of the corresponding frequency component.

What's a formant?

A formant is a peak in the spectrum of a sound, indicating a significant frequency component.

How do formants help in speech recognition?

Different sounds have characteristic formant patterns, which helps both humans and machines recognize them.

What's an LPC spectrum?

An LPC spectrum is a smoothed spectrum derived from linear predictive coding; it makes formants easier to identify by highlighting the spectral peaks.

Sampling Rate

The number of samples taken per second of a sound wave in the digitization process.

Nyquist Frequency

The maximum frequency that can be faithfully captured during digitization, equal to half the sampling rate.

Quantisation

The process of representing analog values (continuous signals) as discrete integers with a specific level of granularity.

Digitization

The process of converting an analog sound wave into a digital representation.

Feature Vector

A representation of a sound signal's essential characteristics in a numerical format.

Analogue-to-Digital Conversion

The process of transforming a continuous sound wave into a digital representation using sampling and quantization.

Amplitude Accuracy

How precisely the amplitude of a sound wave is captured during digitization.

What are formants?

Formants are dark horizontal bars on a spectrogram representing spectral peaks, typically of vowels. They indicate the frequencies that are amplified during sound production.

What do formants tell us about vowels?

Different vowels have their formants at characteristic locations, helping us distinguish between them. For example, the vowel [iy] in 'she' has formants at different frequencies compared to the vowel [æ] in 'that'.

How do formants help identify other sounds?

Formants can also be used to identify the nasal sounds [m], [n], and [ŋ], the lateral [l], and the rhotic [r]. These sounds have different formant patterns from vowels.

What causes formants?

Formants are caused by the resonant cavities of the mouth. The shape of the mouth, created by the position of the tongue, acts like a filter, amplifying certain frequencies and attenuating others.

How do tongue positions affect formants?

Moving the tongue around in the mouth creates different sized spaces which amplify different frequencies. This is why different tongue positions produce different vowel sounds with different formant patterns.

What is the relationship between formants and vowel identity?

Formants are a primary factor in our perception of vowel sounds. Vowels with similar formant patterns will sound similar.

What is the first formant (F1)?

The first formant (F1) is the lowest frequency formant in a vowel sound. It is related to the size of the pharyngeal cavity and the space between the tongue and the roof of the mouth.

What is the second formant (F2)?

The second formant (F2) is the second lowest frequency formant in a vowel sound. It is influenced by the size of the oral cavity and the position of the front of the tongue.

Study Notes

Acoustic Processing of Speech Signals

  • Introduction to the acoustic processing of speech signals (basis of speech recognition by computers)
  • Signal Analysis
  • Feature Extraction
    • Fourier Analysis and Linear Predictive Coding (LPC)
    • Spectral Analysis and Spectra: Human Voiceprints
  • Sound Waves
    • Describing the input to a speech recognizer as a complex series of air pressure changes originating from the speaker.
    • These changes are caused by the specific way air passes through the glottis and out the oral or nasal cavities.
  • Interpreting a waveform
    • Representation of sound waves by plotting changes in air pressure over time.
    • One way to picture the graph: imagine a vertical plate blocking the air pressure waves (a microphone in front of a speaker, or the eardrum of a hearer).
    • The graph measures the amount of compression of air molecules at that plate.
  • Waveform example: A diagram shows a speech waveform taken from a corpus of telephone speech of someone saying "she just had a baby".
  • Characteristics of a wave:
    • Frequency: the number of times a wave repeats itself per second (measured in Hertz, Hz).
    • Amplitude: the amount of air pressure variation, plotted on the vertical axis. A high value means higher than normal air pressure at that time, zero means normal atmospheric pressure, and a negative value means lower than normal air pressure.
  • Perceptual properties related to frequency and amplitude:
    • Pitch = perceptual correlate of frequency, a higher frequency = higher perceived pitch (but relationship is not linear)
    • Loudness = perceptual correlate of power, which is related to the square of the amplitude (higher amplitude = louder sound, but relationship is not linear).
  • Interpreting waveforms:
    • Humans (and computers) can understand speech given the sound wave, so the waveform must contain relevant information.
    • Visual inspection of waveforms can reveal characteristics, like the difference between vowels and consonants.
    • Vowels tend to be long and relatively loud (high amplitude).
    • Fricatives (like [sh]) create an intense irregular/noisy pattern.
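
The definitions of frequency and power above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original notes: the function names `frequency_hz` and `mean_power` are hypothetical, and the test tone (250 Hz sampled at 8,000 Hz) is an assumption chosen for illustration. The 28 repetitions in 0.11 seconds come from the quiz question earlier in this lesson.

```python
import math


def frequency_hz(cycles, duration_s):
    """Frequency is the number of wave repetitions divided by the elapsed seconds."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return cycles / duration_s


def mean_power(samples):
    """Average power of a signal: the mean of the squared amplitudes."""
    return sum(x * x for x in samples) / len(samples)


# 28 repetitions captured in 0.11 seconds (figures from the quiz question).
f = frequency_hz(28, 0.11)
print(round(f, 1))  # 254.5, i.e. roughly 255 Hz

# Power grows with the square of amplitude: doubling amplitude quadruples power.
sample_rate = 8000  # assumed sampling rate for this illustration
quiet = [math.sin(2 * math.pi * 250 * n / sample_rate) for n in range(sample_rate)]
loud = [2 * x for x in quiet]
power_ratio = mean_power(loud) / mean_power(quiet)
print(round(power_ratio, 6))  # 4.0
```

Note that this only shows the amplitude-to-power relationship. Perceived loudness is a further, non-linear function of power and is not modelled here.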

Major Topics of this Course

  • Language Structure
    • Phonology
    • Syntax
    • Language Meaning
    • Dialogue Agents
      • Discourse structures
      • Conversational Agents in software (XML/XMA)
    • Use of XML and associated application interfaces.
  • Tools and Technologies
    • Grammars (pre-existing)
    • Parsers
    • Data Structures
  • Machine Translation
    • Translation Engines
  • Speech / Audio Processing
    • Speech recognition
    • Speech synthesis systems

Spectrograms

  • Spectra are representations of frequency components within a wave at a specific point in time.
  • They can be calculated using a Fourier transform.
  • LPC (Linear Predictive Coding) spectra are often used for speech because they show the peaks in the spectrum clearly.
  • A spectrogram displays how the frequency components of a wave change over time.
  • In human audition, the cochlea computes a spectrum of the incoming waveform.
  • Spectral information is important for both human and machine speech recognition:
    • Spectral peaks are characteristic of sounds.
    • Phones (speech sounds) have characteristic spectral signatures (similar to how chemical elements have characteristic light wavelengths).
    • Spectral features also vary from speaker to speaker.
    • Formant: one of the dark horizontal bars on a spectrogram, representing a spectral peak, usually of a vowel.
  • The x-axis shows time in a spectrogram, and the y-axis shows frequency (in Hz).
  • Darkness of a point = The magnitude/amplitude of the frequency component.
  • Examples of spectrograms for the vowel [i] in the word "she"
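
A spectrogram can be sketched as a sequence of spectra, one per short time slice. The example below is an illustrative assumption rather than the course's method: it uses a naive discrete Fourier transform with a rectangular window and non-overlapping 20 ms frames, and the names `frame_magnitudes`, `spectrogram`, and `tone` are hypothetical. Real tools typically use an FFT, overlapping frames, and a tapered window.

```python
import cmath
import math


def frame_magnitudes(frame):
    """Magnitude of each frequency bin (0 up to Nyquist) for one frame, via a naive DFT."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_len, hop):
    """Rows are time frames (x-axis), columns are frequency bins (y-axis)."""
    return [
        frame_magnitudes(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def tone(freq_hz, n_samples, sample_rate):
    """A pure sine tone, used here as a stand-in for recorded speech."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n_samples)]


sample_rate = 8000
frame_len = 160  # 20 ms at 8,000 Hz, so each bin is 50 Hz wide
signal = tone(250, 800, sample_rate) + tone(1000, 800, sample_rate)

spec = spectrogram(signal, frame_len, hop=frame_len)
# The "darkest" bin in each frame, converted from bin index to Hz.
dominant = [
    max(range(len(row)), key=row.__getitem__) * sample_rate / frame_len
    for row in spec
]
print(dominant)  # five frames at 250.0 Hz followed by five at 1000.0 Hz
```

Each row of `spec` corresponds to one vertical slice of the picture, and the magnitude values play the role of darkness.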

Spectral Features

  • Based on Fourier's insight (complex waves can be broken down into sums of simple waves of different frequencies)
  • Analogies to musical concepts (a chord is composed of multiple notes).
  • These features are used for more advanced classification than just broad phonetic features.

Feature Extraction

  • The process first digitizes the input sound wave using sampling and quantization.
  • Sampling: Measuring the input sound wave's amplitude at regular intervals (the sampling rate is important; 8,000 Hz and 16,000 Hz are common).
  • Quantization: Assigning each sample to an integer value within a range; this involves using a certain precision (usually 8 or 16 bits; important for resolution/accuracy).
  • Maximum frequency (Nyquist Frequency): The maximum frequency that can be measured accurately by a given sampling rate is half of that sampling rate.
  • These digitized samples are then converted to a set of spectral features.
  • The final result is a feature vector which may be used in further speech processing.
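
The sampling and quantization steps above can be sketched in a few lines. This is an illustration under assumptions, not the course's implementation: the helper names `nyquist_frequency`, `sample_sine`, and `quantize` are hypothetical, the input is a synthetic 250 Hz sine rather than recorded speech, and amplitudes are assumed to lie in the range -1.0 to 1.0 before quantization.

```python
import math


def nyquist_frequency(sample_rate_hz):
    """Highest frequency a given sampling rate can capture: half the sampling rate."""
    return sample_rate_hz / 2


def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Measure a sine wave's amplitude at regular intervals (sampling)."""
    n = int(round(sample_rate_hz * duration_s))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate_hz)
        for t in range(n)
    ]


def quantize(samples, bits):
    """Map real values in [-1.0, 1.0] to signed integers of the given bit width."""
    max_int = 2 ** (bits - 1) - 1
    return [max(-max_int - 1, min(max_int, round(x * max_int))) for x in samples]


samples = sample_sine(250, 8000, 0.01)
print(len(samples))  # 80 measurements for 10 ms at 8,000 Hz

q8 = quantize(samples, 8)
q16 = quantize(samples, 16)
print(max(q8), max(q16))  # 127 32767

print(nyquist_frequency(8000))   # 4000.0
print(nyquist_frequency(16000))  # 8000.0
```

The 16-bit version keeps finer amplitude detail than the 8-bit version, which is the resolution trade-off mentioned in the quantization bullet.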

To Do This Week

  • Read chapter 7 of Jurafsky and Martin textbook
  • Record and Analyze utterances (using software):
    • 'Say hod twice', 'Say hood twice', etc.
    • Save as RAW format.
    • Make waveform recordings & spectrograms.
    • Note when the vowel starts and ends, and identify formants (frequencies).
    • Compare results with classmates.

Description

This quiz explores the fundamental aspects of acoustic processing for speech recognition by computers. It covers signal analysis, feature extraction techniques like Fourier Analysis and Linear Predictive Coding, and the interpretation of sound waves. Learn how these concepts form the basis for understanding human voiceprints and the mechanics of waveform representation.