Acoustic Processing of Speech Signals
48 Questions

Questions and Answers

What characteristic of vowels can be observed in a waveform?

  • Short duration and low amplitude
  • Long duration and relatively loud (correct)
  • Irregular pattern and low amplitude
  • Consistent pattern and high frequency

How can fricatives be identified in a waveform?

  • They produce an intense irregular pattern (correct)
  • They have a constant amplitude over time
  • They appear as smooth, consistent waves
  • They are characterized by high-frequency tones

Which of the following accurately describes spectral features?

  • They cannot represent phonetic features
  • They provide a less detailed classification than waveforms
  • They require only visual inspection for analysis
  • They interpret a complex wave as a sum of simpler waves (correct)

    What does the repeated wave in the diagram represent in terms of frequency?

    Answer: It has a frequency of about 250 Hz

    What can be inferred about the smaller repeated wave in relation to the larger wave?

    Answer: It has a frequency approximately four times that of the larger wave

    Which type of software was mentioned for creating spectrograms?

    Answer: Gram software

    In a waveform, what does high amplitude indicate?

    Answer: A high-volume (loud) sound

    Which phonetic feature would not be evident in a waveform without additional spectral analysis?

    Answer: Tone quality

    What two characteristics are most important in analyzing a wave?

    Answer: Frequency and amplitude

    If there are 28 repetitions of a wave captured in 0.11 seconds, what is the frequency in Hertz?

    Answer: About 255 Hz (28 / 0.11 s ≈ 254.5 Hz)
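The arithmetic behind the 255 Hz answer is cycles divided by the time they span. The sketch below also scales that value by the approximate ratios given in the other answers (a smaller wave about four times faster, and a tiny wave about twice as fast again), assuming all three waves come from the same diagram; the function name is illustrative only.

```python
def frequency_hz(repetitions: int, duration_s: float) -> float:
    """Frequency in Hz = number of complete cycles / time they span (seconds)."""
    return repetitions / duration_s


f0 = frequency_hz(28, 0.11)       # larger repeated wave: ~254.5 Hz
smaller_wave = 4 * f0             # repeats ~4 times per larger cycle: ~1000 Hz
tiny_wave = 2 * smaller_wave      # ~2 tiny waves per smaller wave: ~2000 Hz

print(round(f0), round(smaller_wave), round(tiny_wave))  # 255 1018 2036
```

The "about 1000 Hz" and "about 2000 Hz" figures in the answers are these products rounded to convenient values.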

    What does a high amplitude in a waveform indicate?

    Answer: Higher-than-normal air pressure

    How is the perceptual correlate of frequency described?

    Answer: Pitch

    What does the zero value on the vertical axis of a waveform represent?

    Answer: Normal atmospheric pressure

    What are spectral features used to represent in acoustic processing of speech?

    Answer: The distribution of frequencies in a waveform

    What relationship between amplitude and loudness is described?

    Answer: A non-linear relationship

    What does LPC stand for in the context of speech signal processing?

    Answer: Linear Predictive Coding

    Why is analyzing waveforms important in understanding speech?

    Answer: They contain the information needed to transcribe speech

    In acoustic processing, what is the primary role of feature extraction?

    Answer: To summarize and represent time slices of a speech signal

    How are sound waves represented in signal analysis?

    Answer: By a graph depicting air pressure changes over time

    If a sound has a lower frequency, how is its pitch perceived?

    Answer: As a lower pitch

    What is the significance of analyzing a waveform in acoustic processing?

    Answer: It enables the interpretation of sound frequencies

    What role do dialogue agents play in speech processing?

    Answer: They manage conversation flow and context

    What do changes in air pressure represent in the context of speech recognition?

    Answer: The sound waves generated by a speaker

    Which of the following tools is commonly used for parsing in speech recognition?

    Answer: Prolog

    What is the frequency of the first formant (F1) for the vowel [iy]?

    Answer: 540 Hz

    What do dark bars on a spectrogram typically represent?

    Answer: Spectral peaks of vowels (formants)

    What is the frequency of the second formant (F2) for the vowel [iy]?

    Answer: 2581 Hz

    What primarily causes the differences in formant frequencies across vowels?

    Answer: Size of the oral cavity and tongue position

    Which of the following phones can be identified using formants?

    Answer: Nasal phones, lateral phones, and rhotic sounds

    How do the formants differ between vowels such as [iy] and [ɒ]?

    Answer: Both the first and second formant frequencies are different

    What effect does moving the tongue have on the frequencies produced in a vowel?

    Answer: It creates resonant cavities that filter specific frequencies

    What role do formants play in vowel identification?

    Answer: They are crucial for recognizing vowel identity

    What term describes the maximum frequency that can be measured based on the sampling rate?

    Answer: Nyquist frequency

    Which of the following is NOT a step in the analogue-to-digital conversion process?

    Answer: Transmission

    How many amplitude measurements are required per second for a sampling rate of 8,000 Hz?

    Answer: 8,000

    What is the consequence of having less than two samples per cycle during the sampling process?

    Answer: Complete loss of frequency information
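What "fewer than two samples per cycle" does in practice can be demonstrated by sampling pure tones at 8,000 Hz and checking where their spectral peak lands. This is a minimal sketch: the naive DFT, the 400-sample length, and the 3,000 Hz and 5,000 Hz test tones are illustrative assumptions, not from the course. The 5,000 Hz tone is not simply lost; it shows up as a 3,000 Hz tone (aliasing), so its true frequency cannot be recovered from the samples.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of DFT bins 0..N/2 (naive O(N^2) DFT; fine for short signals)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(n // 2 + 1)]


def apparent_frequency(tone_hz, sample_rate_hz, n_samples=400):
    """Sample a pure sine tone and return the frequency (Hz) of its spectral peak."""
    samples = [math.sin(2 * math.pi * tone_hz * t / sample_rate_hz)
               for t in range(n_samples)]
    mags = dft_magnitudes(samples)
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return peak_bin * sample_rate_hz / n_samples  # bin index -> Hz


# At 8,000 Hz a 3,000 Hz tone gets ~2.7 samples per cycle and is measured correctly;
# a 5,000 Hz tone gets only 1.6 samples per cycle and appears at 8,000 - 5,000 Hz.
print(apparent_frequency(3000, 8000))  # 3000.0
print(apparent_frequency(5000, 8000))  # 3000.0
```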

    Which of the following sampling rates would be sufficient to capture the majority of human speech frequencies below 10,000 Hz?

    Answer: 20,000 Hz (at least twice the 10,000 Hz maximum); a 16,000 Hz rate only captures frequencies up to 8,000 Hz

    What is typically the integer representation size used for quantisation in digital audio?

    Answer: 8-bit or 16-bit integers

    What is the purpose of quantisation in the context of digitising a waveform?

    Answer: To represent real-valued numbers as integers

    To digitise a sound wave effectively, how many samples should be taken for each cycle of the wave?

    Answer: At least two samples

    What is the approximate frequency of the tiny wave observed on the 1000 Hz waves?

    Answer: About 2000 Hz

    What does the y-axis of a spectrum represent?

    Answer: Magnitude of each frequency component

    Why is an LPC spectrum utilized in speech applications?

    Answer: To simplify the analysis of frequency peaks
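One standard way to obtain a smoothed LPC spectrum is to estimate predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion, evaluate the resulting all-pole envelope, and pick its local maxima. This is a minimal sketch, not the course's method: the order (4), the synthetic "vowel" (white noise through hypothetical resonances at 700 Hz and 1800 Hz), and all function names are illustrative assumptions. Real analysis would window short frames of recorded speech and usually use a higher order.

```python
import cmath
import math
import random


def autocorrelation(x, max_lag):
    """r[lag] = sum_t x[t] * x[t + lag] for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """LPC coefficients a[1..p] such that x[t] ~ sum_k a[k] * x[t - k],
    plus the final prediction-error power (used as the filter gain)."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1 - k * k
    return a[1:], error


def lpc_envelope_db(coeffs, gain, sample_rate, n_points=800):
    """Smoothed LPC spectrum 20*log10(sqrt(gain) / |A(e^jw)|) from 0 to Nyquist."""
    freqs, env = [], []
    for i in range(1, n_points):
        f = i * (sample_rate / 2) / n_points
        w = 2 * math.pi * f / sample_rate
        a_of_w = 1 - sum(c * cmath.exp(-1j * w * (lag + 1)) for lag, c in enumerate(coeffs))
        freqs.append(f)
        env.append(20 * math.log10(math.sqrt(gain) / abs(a_of_w)))
    return freqs, env


def spectral_peaks(freqs, env):
    """Frequencies where the envelope is a local maximum."""
    return [freqs[i] for i in range(1, len(env) - 1)
            if env[i - 1] < env[i] > env[i + 1]]


def synthetic_vowel(resonances_hz, radius, sample_rate, n, seed=0):
    """Hypothetical test signal: white noise passed through two-pole resonators."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for f in resonances_hz:
        theta = 2 * math.pi * f / sample_rate
        b1, b2 = 2 * radius * math.cos(theta), -radius * radius
        y = []
        for t, v in enumerate(x):
            y1 = y[t - 1] if t >= 1 else 0.0
            y2 = y[t - 2] if t >= 2 else 0.0
            y.append(v + b1 * y1 + b2 * y2)
        x = y
    return x


fs = 8000
signal = synthetic_vowel([700, 1800], 0.97, fs, 2000)
coeffs, err = levinson_durbin(autocorrelation(signal, 4), order=4)
freqs, env = lpc_envelope_db(coeffs, err, fs)
print([round(p) for p in spectral_peaks(freqs, env)])  # two peaks, near 700 and 1800 Hz
```

Because the envelope comes from only a few coefficients, it cannot follow individual harmonics, which is why its peaks are easy to read off.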

    Which of the following statements accurately describes a spectrogram?

    Answer: It visually represents how frequency components change over time.

    What characteristic does a spectrum help identify in sound waves?

    Answer: Spectral signatures of different sounds

    What is the main function of the cochlea in human audition?

    Answer: To compute a spectrum of incoming waveforms

    In a spectrogram, what does the darkness of a point indicate?

    Answer: The amplitude of that frequency component at that time (darker = louder)

    How do scientists use spectral information to analyze chemical elements?

    Answer: By detecting wavelengths of light emitted when elements burn

    Study Notes

    Acoustic Processing of Speech Signals

    • Introduction to the acoustic processing of speech signals (basis of speech recognition by computers)
    • Signal Analysis
    • Feature Extraction
      • Fourier Analysis and Linear Predictive Coding (LPC)
      • Spectral Analysis and Spectra: Human Voiceprints
    • Sound Waves
      • The input to a speech recognizer is a complex series of air pressure changes originating from the speaker.
      • These changes are caused by the specific way air passes through the glottis and out the oral or nasal cavities.
    • Interpreting a waveform
      • Representation of sound waves by plotting changes in air pressure over time.
      • The graph plots the pressure at a vertical plate that blocks the air pressure waves (e.g., a microphone in front of a speaker, or the eardrum of a hearer).
      • The graph measures the amount of compression of air molecules at the plate.
    • Waveform example: A diagram shows a speech waveform taken from a corpus of telephone speech of someone saying "she just had a baby".
    • Characteristics of a wave:
      • Frequency: the number of times a wave repeats itself per second (measured in Hertz, Hz).
      • Amplitude: the amount of air pressure variation, plotted on the vertical axis. A high (positive) value means higher-than-normal air pressure at that moment, zero means normal atmospheric pressure, and a negative value means lower-than-normal air pressure.
    • Perceptual properties related to frequency and amplitude:
      • Pitch: the perceptual correlate of frequency; a higher frequency is perceived as a higher pitch (but the relationship is not linear).
      • Loudness: the perceptual correlate of power, which is related to the square of the amplitude; a higher amplitude sounds louder (but again the relationship is not linear).
    • Interpreting waveforms:
      • Humans (and computers) can understand speech given the sound wave, so the waveform must contain relevant information.
      • Visual inspection of waveforms can reveal characteristics, like the difference between vowels and consonants.
      • Vowels tend to be long and relatively loud (high amplitude).
      • Fricatives (like [sh]) create an intense irregular/noisy pattern.
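The non-linear relationships above can be made concrete. This sketch, under stated assumptions, computes power as the mean squared amplitude, expresses it in decibels (a logarithmic scale often used as a rough proxy for loudness), and converts frequency with the mel formula 1127·ln(1 + f/700), one common approximation of perceived pitch that is not part of these notes. The 255 Hz test tone and the helper names are illustrative.

```python
import math


def mean_power(samples):
    """Average power: the mean of the squared amplitudes."""
    return sum(s * s for s in samples) / len(samples)


def power_db(samples, reference=1.0):
    """Power on a logarithmic (decibel) scale relative to a reference power."""
    return 10 * math.log10(mean_power(samples) / reference)


def hz_to_mel(f_hz):
    """One common mel-scale approximation of perceived pitch."""
    return 1127 * math.log(1 + f_hz / 700)


fs = 8000
quiet = [0.1 * math.sin(2 * math.pi * 255 * t / fs) for t in range(fs)]
loud = [2 * s for s in quiet]  # double the amplitude

# Doubling amplitude quadruples power but adds only ~6 dB.
print(round(mean_power(loud) / mean_power(quiet), 6))   # 4.0
print(round(power_db(loud) - power_db(quiet), 2))       # 6.02
# Doubling frequency from 1000 to 2000 Hz raises the mel value only ~1.5x.
print(round(hz_to_mel(1000)), round(hz_to_mel(2000)))   # 1000 1521
```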

    Major Topics of this Course

    • Language Structure
      • Phonology
      • Syntax
      • Language Meaning
      • Dialogue Agents
        • Discourse structures
        • Conversational Agents in software (XML/XMA)
      • Use of XML and associated application interfaces.
    • Tools and Technologies
      • Grammars (pre-existing)
      • Parsers
      • Data Structures
    • Machine Translation
      • Translation Engines
    • Speech / Audio Processing
      • Speech recognition
      • Speech synthesis systems

    Spectrograms

    • Spectra are representations of frequency components within a wave at a specific point in time.
    • They can be calculated using a Fourier transform.
    • LPC (Linear Predictive Coding) spectra are often used for speech because they show the peaks in the spectrum clearly.
    • A spectrogram displays how the frequency components of a wave change over time.
    • In human audition, the cochlea computes a spectrum of the incoming waveform.
    • Spectral information is important for both human and machine speech recognition
      • Spectral peaks are characteristic of sounds.
      • Phones (speech sounds) have characteristic spectral signatures, similar to how chemical elements have characteristic wavelengths of emitted light.
      • Spectral features also vary between speakers.
      • Formants: the dark horizontal bars on a spectrogram, representing spectral peaks, usually of vowels.
    • In a spectrogram, the x-axis shows time and the y-axis shows frequency (in Hz).
    • The darkness of a point shows the magnitude (amplitude) of that frequency component at that time.
    • Examples of spectrograms for the vowel [i] in the word "she"
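A spectrogram of the kind described above can be sketched by slicing the signal into short overlapping frames and taking the magnitude spectrum of each frame. The Hamming window, the 20 ms frame and 10 ms hop, and the two-tone test signal are illustrative assumptions; practical tools compute the same thing with an FFT rather than this naive DFT.

```python
import cmath
import math


def frame_magnitudes(frame):
    """Hamming-windowed magnitude spectrum of one frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def spectrogram(samples, frame_len, hop):
    """Rows are time frames, columns are frequency bins; values are magnitudes
    (the 'darkness' of each point when plotted)."""
    return [frame_magnitudes(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]


def peak_hz(row, bin_hz):
    """Frequency of the strongest bin in one spectrogram row."""
    return max(range(len(row)), key=row.__getitem__) * bin_hz


fs = 8000
# Hypothetical test signal: 500 Hz for 125 ms, then 1500 Hz for 125 ms.
signal = [math.sin(2 * math.pi * (500 if t < 1000 else 1500) * t / fs)
          for t in range(2000)]
frames = spectrogram(signal, frame_len=160, hop=80)  # 20 ms frames, 10 ms hop
bin_hz = fs / 160                                    # 50 Hz per bin

print(len(frames), peak_hz(frames[0], bin_hz), peak_hz(frames[-1], bin_hz))  # 24 500.0 1500.0
```

Plotting `frames` with time on the x-axis and bin frequency on the y-axis, shading by magnitude, gives the familiar picture.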

    Spectral Features

    • Based on Fourier's insight (complex waves can be broken down into sums of simple waves of different frequencies)
    • Analogies to musical concepts (a chord is composed of multiple notes).
    • These features support a more detailed classification than the broad phonetic features (e.g., vowel vs. fricative) visible in a waveform.
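Fourier's insight can be checked numerically: build a "chord" from three sine waves, then use a discrete Fourier transform to recover which frequencies are present and how strong each one is. The frequencies, amplitudes, and 25 Hz bin spacing are hypothetical choices made so that every component falls exactly on a DFT bin.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform, bins 0..N/2."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
            for k in range(n // 2 + 1)]


fs = 8000
n = 320                                              # 25 Hz per DFT bin
chord = {250: 1.0, 1000: 0.5, 2000: 0.25}            # hypothetical "notes": Hz -> amplitude
wave = [sum(a * math.sin(2 * math.pi * f * t / fs) for f, a in chord.items())
        for t in range(n)]

# A sine of amplitude a that falls exactly on bin k gives |X[k]| = a * N / 2.
spectrum = dft(wave)
components = {k * fs / n: round(2 * abs(x) / n, 3)
              for k, x in enumerate(spectrum) if 2 * abs(x) / n > 0.01}
print(components)  # {250.0: 1.0, 1000.0: 0.5, 2000.0: 0.25}
```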

    Feature Extraction

    • The process begins by digitizing the input sound wave using sampling and quantization.
    • Sampling: measuring the amplitude of the input sound wave at regular intervals (the sampling rate matters; 8,000 Hz and 16,000 Hz are common).
    • Quantization: assigning each sample an integer value within a fixed range at a given precision (usually 8-bit or 16-bit), which determines resolution and accuracy.
    • Maximum frequency (Nyquist frequency): the highest frequency that a given sampling rate can measure accurately is half that sampling rate.
    • These digitized samples are then converted to a set of spectral features.
    • The final result is a feature vector which may be used in further speech processing.
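The digitization steps above can be sketched end to end. Normalizing amplitudes to [-1.0, 1.0] before quantization, the round-and-clip rule, the `tone` function, and the 255 Hz test tone are assumptions for illustration; the 8,000 Hz rate and the 8-bit and 16-bit sizes come from the notes.

```python
import math


def nyquist_frequency(sample_rate_hz):
    """Highest frequency a given sampling rate can represent: half the rate."""
    return sample_rate_hz / 2


def sample(signal, sample_rate_hz, duration_s):
    """Measure the signal's amplitude at regular intervals of 1 / sample_rate seconds."""
    n = round(sample_rate_hz * duration_s)
    return [signal(i / sample_rate_hz) for i in range(n)]


def quantize(samples, bits=16):
    """Map real amplitudes, assumed to lie in [-1.0, 1.0], to signed integers
    (-128..127 for 8 bits, -32768..32767 for 16 bits), clipping any overflow."""
    max_int = 2 ** (bits - 1) - 1
    min_int = -(2 ** (bits - 1))
    return [max(min_int, min(max_int, round(s * max_int))) for s in samples]


def tone(t):
    """Hypothetical 'continuous' input: a 255 Hz sine at 80% of full scale."""
    return 0.8 * math.sin(2 * math.pi * 255 * t)


analog = sample(tone, 8000, 0.01)   # 10 ms at 8,000 Hz -> 80 amplitude measurements
pcm16 = quantize(analog, bits=16)
pcm8 = quantize(analog, bits=8)
print(len(analog), nyquist_frequency(8000), nyquist_frequency(16000))  # 80 4000.0 8000.0
```

Comparing `pcm8` with `pcm16` shows the resolution trade-off: the same waveform is stored in 256 possible levels instead of 65,536.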

    To Do This Week

    • Read chapter 7 of the Jurafsky and Martin textbook.
    • Record and analyze utterances (using software):
      • 'Say hod twice', 'Say hood twice', etc.
      • Save in RAW format.
      • Make waveform recordings & spectrograms.
      • Note when the vowel starts and ends, and identify formants (frequencies).
      • Compare results with classmates.
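A RAW file has no header, so the reader must know the sample format in advance. The sketch below assumes signed 16-bit little-endian mono PCM at 8,000 Hz (check your recording software's export settings), reads it with Python's standard `array` module, and uses frame energy to estimate roughly where a loud vowel starts and ends, since vowels tend to be long and loud. The threshold ratio, the helper names, and the synthetic stand-in file are illustrative assumptions; formant frequencies still need to be read from the spectrogram.

```python
import array
import math
import os
import sys
import tempfile


def read_raw_pcm16(path):
    """Read a headerless RAW file assumed to hold signed 16-bit little-endian mono PCM."""
    data = array.array("h")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    if sys.byteorder == "big":
        data.byteswap()  # file is little-endian; convert to native order
    return list(data)


def frame_energies(samples, sample_rate, frame_ms=10):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    n = int(sample_rate * frame_ms / 1000)
    return [sum(s * s for s in samples[i:i + n]) / n
            for i in range(0, len(samples) - n + 1, n)]


def loud_span_seconds(samples, sample_rate, frame_ms=10, ratio=0.1):
    """Rough (start, end) time of the region whose energy exceeds ratio * peak energy."""
    energies = frame_energies(samples, sample_rate, frame_ms)
    threshold = ratio * max(energies)
    loud = [i for i, e in enumerate(energies) if e > threshold]
    return loud[0] * frame_ms / 1000, (loud[-1] + 1) * frame_ms / 1000


# Demo with a synthetic stand-in for a recording: 0.1 s silence, 0.2 s tone, 0.1 s silence.
fs = 8000
samples = ([0] * 800
           + [round(10000 * math.sin(2 * math.pi * 255 * t / fs)) for t in range(1600)]
           + [0] * 800)
pcm = array.array("h", samples)
if sys.byteorder == "big":
    pcm.byteswap()  # write little-endian to match the reader's assumption
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "say_hod.raw")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(pcm.tobytes())
    recorded = read_raw_pcm16(path)

print(loud_span_seconds(recorded, fs))  # (0.1, 0.3)
```

On a real recording, background noise raises the silent frames' energy, so the ratio may need tuning before the reported span matches the vowel you see in the waveform.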


    Description

    This quiz explores the fundamental aspects of acoustic processing for speech recognition by computers. It covers signal analysis, feature extraction techniques like Fourier Analysis and Linear Predictive Coding, and the interpretation of sound waves. Learn how these concepts form the basis for understanding human voiceprints and the mechanics of waveform representation.
