Questions and Answers
What is the frequency range of the human auditory system?
- 4 kHz to 7 kHz
- 10 Hz to 20 kHz
- 50 Hz to 4 kHz
- 20 Hz to 20 kHz (correct)
What is the purpose of the filter bank in MPEG-1 Audio Encoding?
- To divide the input into multiple sub-bands (correct)
- To improve the quality of the encoded audio
- To reduce the bitrate of the encoded audio
- To remove noise from the encoded audio
What is the target bitrate for Layer 3 of MPEG-1 Audio Encoding?
- 448 kbps
- 64 kbps (correct)
- 128 kbps
- 192 kbps
What is the dynamic range of the human auditory system?
What is the primary purpose of psycho-acoustic characteristics in audio compression?
What is the primary advantage of Joint-stereo coding in MPEG-1 Audio Encoding?
What is the purpose of Huffman coding in MPEG-1 Audio Encoding?
What is the sampling frequency of Layer 1 of MPEG-1 Audio Encoding?
What is the primary application of Layer 2 of MPEG-1 Audio Encoding?
When was MPEG-1 Audio standardized?
Study Notes
Fourier Series and Transform
- Any periodic function can be expressed as the sum of a series of sines and cosines of varying amplitudes.
- The Fourier Transform maps a time series (e.g., audio samples) into the series of frequencies (their amplitudes and phases) that compose the time series.
- The Inverse Fourier Transform maps the series of frequencies (their amplitudes and phases) back into the corresponding time series.
- The two functions are inverses of each other.
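As a rough illustration of the sum-of-sines idea, the sketch below approximates a unit square wave by adding odd sine harmonics. The square wave, the function name `square_wave_partial_sum`, and the number of terms are all choices made for this example, not part of the notes.

```python
import math

def square_wave_partial_sum(t, terms):
    """Sum the first `terms` odd sine harmonics of a unit square wave."""
    return (4 / math.pi) * sum(
        math.sin((2 * k + 1) * t) / (2 * k + 1) for k in range(terms)
    )

# t = pi/2 is the middle of the positive half-cycle, where the wave equals 1.
t = math.pi / 2
one_term = square_wave_partial_sum(t, 1)      # a single sine overshoots
many_terms = square_wave_partial_sum(t, 200)  # many harmonics approach 1
```

Adding more harmonics brings the partial sum closer to the true value of the wave at that point.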
Discrete Fourier Transform (DFT)
- The DFT takes a discrete signal in the time domain and transforms it into its discrete frequency domain representation.
- The DFT is extremely important in the area of frequency (spectrum) analysis.
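A minimal sketch of the direct DFT and its inverse, written in plain Python so the definition is visible. The test signal (one cosine cycle over 8 samples) is a made-up example; a real application would normally use a library routine instead.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT: X[k] = sum over n of x[n] * exp(-2j*pi*k*n/N)."""
    size = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / size) for n in range(size))
        for k in range(size)
    ]

def idft(X):
    """Inverse DFT: x[n] = (1/N) * sum over k of X[k] * exp(+2j*pi*k*n/N)."""
    size = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * n / size) for k in range(size)) / size
        for n in range(size)
    ]

# Hypothetical input: exactly one cosine cycle sampled at 8 points.
signal = [math.cos(2 * math.pi * n / 8) for n in range(8)]
spectrum = dft(signal)
magnitudes = [abs(c) for c in spectrum]       # energy sits in bins 1 and 7
restored = [c.real for c in idft(spectrum)]   # matches `signal` up to rounding
```

Because the input is a single real cosine, only bin 1 and its mirror, bin 7, carry significant magnitude.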
Fast Fourier Transform (FFT)
- The FFT is an algorithm for computing the DFT efficiently.
- It produces the same result as the direct DFT, but needs on the order of N log N operations instead of N^2 for N samples.
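One common FFT variant is the recursive radix-2 Cooley-Tukey algorithm, sketched below. It is only one of several FFT algorithms, and this version assumes the input length is a power of two.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

# An impulse has a flat spectrum: every frequency bin equals 1.
impulse_spectrum = fft([1, 0, 0, 0])
```

Each call splits the problem into two half-size transforms, which is where the N log N cost comes from.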
Discrete Cosine Transform (DCT)
- The DCT is closely related to the DFT.
- The DCT can often reconstruct a sequence very accurately from only a few DCT coefficients, a useful property for applications requiring data reduction.
- The inverse DCT reconstructs a sequence from its DCT coefficients.
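The data-reduction property can be sketched with an orthonormal DCT-II and its inverse. The scaling convention, the ramp-shaped input, and the choice to keep only two coefficients are assumptions made for this example.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)
        ))
    return out

def idct(coeffs):
    """Inverse of dct() above (an orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * coeffs[k]
            * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]

# Hypothetical smooth input: a slow ramp.
samples = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
coeffs = dct(samples)
truncated = coeffs[:2] + [0.0] * (len(coeffs) - 2)  # keep 2 of 8 coefficients
approx = idct(truncated)
max_error = max(abs(a - b) for a, b in zip(samples, approx))
```

For smooth input like this, most of the signal energy lands in the first few coefficients, so `max_error` stays small compared with the sample values.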
Audio Coding
- Pulse Code Modulation (PCM): sends every sample.
- Differential PCM (DPCM): sends differences between samples.
- Adaptive Differential PCM (ADPCM): sends differences, but adapts how they are coded.
- Sub-band ADPCM: uses ADPCM twice, once for lower frequencies, and again at a lower bitrate for upper frequencies.
- MP3 (MPEG-1 Audio Layer 3): a compressed audio format.
Why Compression is Needed
- Data rate = sampling rate * quantization bits * channels (+ control information).
- Compression is necessary to reduce the large amount of data generated by audio samples.
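The data-rate formula can be worked through for uncompressed stereo audio at 44.1 kHz and 16 bits per sample. Two channels is an assumption for this example, and control information is ignored.

```python
def pcm_data_rate(sampling_rate_hz, bits_per_sample, channels):
    """Raw PCM data rate in bits per second, ignoring control information."""
    return sampling_rate_hz * bits_per_sample * channels

stereo_rate = pcm_data_rate(44_100, 16, 2)    # 1_411_200 bit/s
bytes_per_minute = stereo_rate // 8 * 60      # 10_584_000 bytes
```

That is roughly 10 MB for every minute of audio, which is why compression matters for storage and transmission.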
Compression Ratio
- Compression Ratio = (Original Data) / (Compressed Data).
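A small worked example of the ratio, assuming a 1,411,200 bit/s uncompressed stereo stream and a hypothetical compressed stream at 128 kbit/s. Both figures are illustrative choices.

```python
def compression_ratio(original_size, compressed_size):
    """Ratio of original to compressed size; both in the same unit."""
    if compressed_size <= 0:
        raise ValueError("compressed size must be positive")
    return original_size / compressed_size

ratio = compression_ratio(1_411_200, 128_000)  # about 11:1
```

Because both streams cover the same duration, comparing bit rates gives the same ratio as comparing file sizes.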
Lossless and Lossy Compression
- Lossless compression: the decoded audio is bit-for-bit identical to the original.
- Lossy compression: the decoded audio differs from the original because some information is discarded, ideally in ways the listener cannot hear.
Pulse Code Modulation (PCM)
- Each sample's amplitude is represented by an integer code-word.
- Quantization error ("noise") occurs due to the limited number of code-words.
Linear PCM
- Uses evenly spaced quantization levels.
- Typically uses 16 bits per sample.
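The sketch below shows uniform quantization and how the error shrinks as bits are added. Samples are assumed to be normalized to the range -1.0 to 1.0, and the function names are invented for this example.

```python
import math

def quantize_linear(sample, bits):
    """Map a sample in [-1.0, 1.0] to a signed integer code-word."""
    max_code = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, sample))
    return round(clipped * max_code)

def dequantize_linear(code, bits):
    """Map a code-word back to an amplitude in [-1.0, 1.0]."""
    return code / (2 ** (bits - 1) - 1)

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in decibels."""
    return 20 * math.log10(2 ** bits)

sample = 0.3
error_8 = abs(sample - dequantize_linear(quantize_linear(sample, 8), 8))
error_16 = abs(sample - dequantize_linear(quantize_linear(sample, 16), 16))
range_16 = dynamic_range_db(16)  # about 96 dB
range_8 = dynamic_range_db(8)    # about 48 dB
```

Each extra bit adds about 6 dB of dynamic range, so 16 bits gives roughly 96 dB.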
Telephony
- 8-bit linear encoding gives poor quality.
- Solution: use 8 bits with a "logarithmic" encoding (non-linear sampling).
Non-linear Sampling
- With only 8 bits per sample and evenly spaced levels, the dynamic range is reduced significantly and quantization noise becomes audible.
- Solution: space the quantization levels more densely at low amplitudes and more sparsely at high amplitudes.
μ-law and A-law
- This non-linear sampling is called "companding" (compressing, then expanding).
- 8-bit companded encoding provides a dynamic range equivalent to 12 bits of linear encoding.
- μ-law and A-law are companding standards.
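A minimal sketch of the continuous μ-law curve with μ = 255, applied before and after 8-bit quantization. It is not the bit-exact segmented encoding used in telephony standards, and the quiet test sample of 0.01 is an arbitrary choice.

```python
import math

MU = 255  # value commonly used for 8-bit mu-law

def mu_law_compress(x, mu=MU):
    """Compress x in [-1, 1]: small amplitudes get a larger share of the range."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

def round_trip_8bit(value):
    """Quantize a value in [-1, 1] to a signed 8-bit code and back."""
    return round(value * 127) / 127

quiet = 0.01  # hypothetical low-amplitude sample
linear_error = abs(quiet - round_trip_8bit(quiet))
companded = mu_law_expand(round_trip_8bit(mu_law_compress(quiet)))
companded_error = abs(quiet - companded)
```

For quiet samples like this one, the companded path reconstructs the value far more accurately than plain 8-bit linear quantization.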
Differential PCM
- Based on the fact that neighboring samples in a discrete audio sequence change slowly in many cases.
- Normally, the difference between samples is relatively small and can be coded with fewer than 8 bits.
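The idea can be sketched with a lossless difference coder. The sample values are invented, and this version does not quantize the differences, which a practical DPCM coder usually would.

```python
def dpcm_encode(samples):
    """Replace each sample with its difference from the previous one."""
    previous = 0
    diffs = []
    for sample in samples:
        diffs.append(sample - previous)
        previous = sample
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for diff in diffs:
        previous += diff
        samples.append(previous)
    return samples

samples = [100, 102, 105, 104, 101, 99]
diffs = dpcm_encode(samples)   # [100, 2, 3, -1, -3, -2]
restored = dpcm_decode(diffs)  # identical to `samples`
```

After the first value, every difference is small, so it fits in far fewer bits than the raw sample.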
ADPCM (Adaptive Differential PCM)
- Makes a simple prediction of the next sample, based on a weighted sum of the previous n samples.
- Applies lossy coding to the difference between the actual sample and the prediction.
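The toy coder below illustrates the two ideas: predict, then code the difference with a step size that adapts. It is not a standard ADPCM codec. The predictor (previous reconstructed sample only), the 4-bit code range, and the step adaptation rule are all assumptions chosen to keep the sketch short.

```python
def _adapt(step, code):
    """Grow the step after large codes, shrink it after small ones."""
    if abs(code) >= 6:
        return step * 2
    if abs(code) <= 1:
        return max(1, step // 2)
    return step

def adpcm_encode(samples, step=4):
    predicted = 0  # prediction = previous reconstructed sample
    codes = []
    for sample in samples:
        code = max(-8, min(7, round((sample - predicted) / step)))  # 4-bit code
        codes.append(code)
        predicted += code * step  # mirror what the decoder will reconstruct
        step = _adapt(step, code)
    return codes

def adpcm_decode(codes, step=4):
    predicted = 0
    samples = []
    for code in codes:
        predicted += code * step
        samples.append(predicted)
        step = _adapt(step, code)
    return samples

# Hypothetical slowly varying input.
samples = [0, 10, 25, 40, 48, 50, 47, 40, 30, 22]
codes = adpcm_encode(samples)
decoded = adpcm_decode(codes)  # close to `samples`, but not exact (lossy)
```

The encoder tracks the decoder's reconstruction rather than the true previous sample, so quantization errors do not accumulate.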
Sub-band ADPCM
- Splits the signal into two frequency bands and codes each one separately.
- The bands are 50 Hz to 4 kHz (encoded at 48 kbit/s) and 4 kHz to 7 kHz (encoded at 16 kbit/s).
Human Auditory System
- The human auditory system has limitations.
- Frequency range: 20 Hz to 20 kHz, most sensitive between 2 and 4 kHz.
- Dynamic range (quietest to loudest) is about 96 dB.
MPEG-1 Audio
- Lossy compression of audio.
- In the late 1980s, ISO's MPEG group started to standardize audio compression for TV broadcasting and CD-ROM (later DVD).
MPEG-1 Audio Encoding
- Characteristics: 16-bit precision; sampling frequencies of 32 kHz, 44.1 kHz, and 48 kHz.
- 3 compression layers: Layer 1, Layer 2, Layer 3.
- Supports one or two audio channels in one of the four modes: monophonic, dual-monophonic, stereo, and joint-stereo.
Huffman Coding
- Variable-length coding: the most frequent symbols get the shortest codes and less frequent symbols get longer ones.
- Encoding done by building an encoding tree.
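A minimal sketch of building the encoding tree with a priority queue. The input string is a made-up example. MPEG-1 Layer 3 uses predefined Huffman tables rather than building a tree per input, so this only illustrates the general technique.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each character of `text` to its bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    # Heap entries are (weight, tie_breaker, tree); a tree is either a
    # single character or a (left, right) tuple of subtrees.
    heap = [(weight, i, sym) for i, (sym, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

message = "aaaabbc"
codes = huffman_codes(message)
encoded = "".join(codes[ch] for ch in message)  # 10 bits for 7 characters
```

Here "a" occurs most often and receives a 1-bit code, while "b" and "c" each receive 2 bits.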