Digital Audio Slides - PDF
Document Details

Uploaded by itsFreak1795
Summary
These slides cover digital audio, including sound waves, ADC, DAC, sampling rate, bit depth and digital audio formats. The presentation explains the process of converting analog sound waves into digital signals and back.
Full Transcript
Digital Audio
2024, Jordan Miller. This work may not be distributed without permission.
With contributions from Mingwu Chen, MacAvon Media, and Pearson Education (Yue-Ling Wong).

Sound
Sound is a wave that is generated by vibrating objects in a medium such as air. Examples of vibrating objects:
- the vocal cords of a person
- guitar strings
- a tuning fork
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Sound
Sound is the propagating wave formed by changes in air pressure. The sound wave reaching the recorder is captured and converted to changes in electrical signals over time.
The little grey circles in this diagram represent air molecules. The closer together they are, the higher the air pressure.

Sound
The changes in air pressure that propagate with the wave can be detected by a microphone and digitized. The result is a graph called a waveform that shows the pressure as a function of time.

Sound
This video does a good job of showing how sound waves correspond to changes in air pressure caused by a vibrating object, and how a waveform is created (through a digitization process) to represent the sound wave: https://www.youtube.com/watch?v=Z5Dd8lmbcoE

Sound
Be careful when interpreting a waveform graph. Do not interpret the waveform as a representation of the vertical position of air molecules. The air molecules are not moving up and down!
Higher points in the waveform correspond to higher pressure, whereas lower points correspond to lower pressure.

Sound: Frequency
Frequency refers to the number of complete back-and-forth cycles of vibrational motion per unit of time. When we hear a sound, the pitch we perceive depends on the frequency. High-pitched sounds (such as birds chirping or a kettle whistling) correspond to high-frequency sound waves, and low-pitched sounds (such as a jet flying by) correspond to low-frequency sound waves.

Sound: Frequency
The unit for frequency is the hertz, abbreviated Hz. 1 Hz = 1 cycle/second = 1/s.
In this context, a cycle refers to one complete up-and-down motion of the waveform, that is, one complete high-to-low-and-back oscillation of the air pressure.

Sound: Frequency
[Figure: a waveform with two cycles in 1 second.] Frequency = 2 cycles/second = 2 Hz.
[Figure: a waveform with four cycles in 1 second.] Frequency = 4 cycles/second = 4 Hz. This is a higher frequency than the previous waveform.

Sound: Frequency and Pitch
Pitch and frequency are related: higher frequency => higher pitch.
The human ear can hear sounds ranging from 20 Hz to 20,000 Hz, but is most sensitive to frequencies between 2000 Hz and 5000 Hz. (Can you guess why?)

Sound: Frequency and Pitch
This video shows what the wave looks like for every frequency in the range of human hearing.
Warning: it can get a bit painful to listen to at some frequencies, so please turn down the volume on your headphones! https://www.youtube.com/watch?v=qNf9nzvnd1k

Sound Intensity and Loudness
Loudness is a subjective perception of how loud a sound is, as judged by a human listener. The perceived loudness of a sound generally increases with the intensity of the sound wave. Intensity reflects the objective difference between the air pressure at the peak of the wave and at the valley of the wave.
However, the relationship between intensity and loudness is not so simple. For example, you probably noticed that in the video from the previous slide, some frequencies sounded louder than others. This shows us that our perception of loudness also depends on frequency.

Sound Intensity and Loudness
Sound intensity, on the other hand, is an objective measurement that does not depend on our ears or the way our brains interpret the signals. Intensity can be measured by devices, and the unit of measurement is the decibel, or dB.

Sound Intensity and Loudness
Many audio-editing programs represent the intensity of sound waves using decibels, so you'll come across this unit frequently.
Decibels measure the relative difference between two absolute intensities: I and Iref.
The intensities are measured in W/m^2, or power per square meter.
Iref is the reference intensity, against which all other intensities are compared. We often use Iref = 10^-12 W/m^2, which is the quietest sound a human can hear.
(This is called the threshold of hearing.)

Decibels
The following formula is used to calculate the number of decibels of an intensity I1 relative to a reference intensity Iref:
$$\text{Number of decibels} = 10 \log_{10}\left(\frac{I_1}{I_{\text{ref}}}\right) = 20 \log_{10}\left(\frac{V_1}{V_{\text{ref}}}\right)$$

Decibels
Example: when I1 = 2 x Iref,
$$\text{Number of decibels} = 10 \log_{10}\left(\frac{I_1}{I_{\text{ref}}}\right) = 10 \log_{10}(2) \approx 10 \times 0.3 = 3 \text{ dB}$$
Note that the decibel is not an absolute measurement, but a comparison to a reference (Iref).

Loudness
Examine the diagram presented in the following link. How should we interpret this curve? Think about the frequency/pitch video you watched earlier and how some frequencies sounded louder than others.
http://hyperphysics.phy-astr.gsu.edu/hbase/Sound/eqloud.html#c1

Complex Waveforms
So far, we've only been looking at waveforms like the one above, which are very simple and correspond to a very clear, pure sound. But in real life, the sounds we hear are mixtures of many different frequencies.

Complex Waveforms
[Figure: a single frequency; a second single frequency, higher than the first; added together, they make a more complex waveform and a more complex sound.]

Waveform Example
A waveform of the spoken word "one". Let's zoom in to take a closer look.

Waveform Example
This complicated, bumpy waveform is the result of many different frequencies added together.
The large bumps correspond to low frequencies and the smaller, rougher bumps correspond to higher frequencies.
Note that this concept of combining waves of different frequencies is similar to what happens in JPEG compression... but in two dimensions instead of one.

Complex Waveforms
Sounds change over time, e.g. a musical note has attack and decay, and speech changes constantly.
The frequency spectrum alters as the sound changes.
[Figure: intensity as a function of frequency and time for the spoken words "nineteenth century" (Wikipedia).]
This presentation (c) 2004, MacAvon Media Productions. Modified by Mingwu Chen.

Complex Waveforms
A waveform allows us to see how intensity changes with time. It provides a graphical view of the characteristics of a changing sound, and represents the rhythm of music, the quietness or loudness of a passage, etc.

Digitizing Sound
Suppose we want to digitize this sound wave:

Step 1. Sampling
The sound wave is sampled at a specific rate into discrete samples of amplitude values.
Suppose we sample the waveform 10 times a second, i.e., sampling rate = 10 Hz.
We get 10 samples per second. The waveform is then reconstructed using the discrete sample points.

Step 1. Sampling
What if we sample 20 times a second, i.e., sampling rate = 20 Hz? We get 20 samples per second. Again, the waveform is reconstructed using the discrete sample points.
[Figure: the original waveform compared with the reconstructions at sampling rate = 10 Hz and sampling rate = 20 Hz.]

Step 1. Sampling
With a higher sampling rate, the reconstructed wave sounds more like the original wave. But since there are more sample points, the file size will be larger.

Sampling Rate Examples
- 11,025 Hz: AM radio quality/speech
- 22,050 Hz: near FM radio quality (high-end multimedia)
- 44,100 Hz: CD quality
- 48,000 Hz: DAT (digital audio tape) quality
- 96,000 Hz: DVD-Audio quality
- 192,000 Hz: DVD-Audio quality
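The sampling step described in the Step 1 slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2 Hz test tone and the helper name `sample_sine` are hypothetical and do not come from the slides. Only the two sampling rates (10 Hz and 20 Hz) do.

```python
import math

def sample_sine(freq_hz, sampling_rate_hz, duration_s=1.0):
    """Return (time, amplitude) pairs taken at evenly spaced instants.

    Hypothetical helper: stands in for the analog wave being measured
    sampling_rate_hz times per second.
    """
    n_samples = int(round(sampling_rate_hz * duration_s))
    samples = []
    for n in range(n_samples):
        t = n / sampling_rate_hz  # time of the n-th sample, in seconds
        samples.append((t, math.sin(2 * math.pi * freq_hz * t)))
    return samples

# An assumed 2 Hz tone, sampled at the two rates used on the slides.
low_rate = sample_sine(freq_hz=2, sampling_rate_hz=10)
high_rate = sample_sine(freq_hz=2, sampling_rate_hz=20)

print(len(low_rate))   # samples collected in one second at 10 Hz
print(len(high_rate))  # samples collected in one second at 20 Hz
```

Doubling the sampling rate doubles the number of stored points for the same second of sound, which is why the higher rate tracks the original wave more closely but produces a larger file.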
Sampling Rate vs. Sound Frequency
Both use the unit Hz, BUT: sampling rate ≠ sound frequency.
- Sample rate: a characteristic of the digitization process
- Sound frequency: the pitch characteristic of sound
While these concepts are different, the maximum sound frequency that can be captured depends on the sample rate. Can you guess how?

Nyquist Theorem
We must sample at least 2 points in each wave cycle to be able to reconstruct the sound wave satisfactorily.
Nyquist rate: a sample rate of twice the maximum audio frequency.

Choosing Sampling Rate
Given the human hearing range (20 Hz to 20,000 Hz) and the Nyquist Theorem, what do you think would be an adequate sampling rate for audio?

Choosing Sampling Rate
Note that CD audio uses a sample rate of 44,100 Hz. Why do you think this sample rate was chosen?
Why are higher sample rates sometimes used (such as 48,000 Hz for digital audio tape) if 44,100 Hz is adequate to capture all frequencies that humans can hear?
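Returning to the Nyquist theorem, the sketch below illustrates what goes wrong when a tone is sampled at less than twice its frequency. It is an illustration only: the helper names (`nyquist_rate`, `apparent_frequency`, `sampled`) and the 3 Hz and 7 Hz test tones are assumptions made for this example, not part of the slides.

```python
import math

def nyquist_rate(max_freq_hz):
    """Minimum sampling rate for a given maximum frequency (twice that frequency)."""
    return 2 * max_freq_hz

def apparent_frequency(freq_hz, sampling_rate_hz):
    """Frequency that a sampled pure tone cannot be told apart from.

    Folds freq_hz around the nearest multiple of the sampling rate.
    """
    nearest_multiple = round(freq_hz / sampling_rate_hz) * sampling_rate_hz
    return abs(freq_hz - nearest_multiple)

def sampled(freq_hz, sampling_rate_hz, n_samples):
    """Sample values of a sine tone at the given sampling rate."""
    return [math.sin(2 * math.pi * freq_hz * n / sampling_rate_hz)
            for n in range(n_samples)]

print(nyquist_rate(20_000))       # 40000
print(apparent_frequency(3, 10))  # 3: below half the sampling rate, preserved
print(apparent_frequency(7, 10))  # 3: above half the sampling rate, misread

# The 7 Hz samples are exactly the 3 Hz samples with the sign flipped,
# so after sampling at 10 Hz the two tones carry the same pitch information.
samples_7hz = sampled(7, 10, 10)
samples_3hz = sampled(3, 10, 10)
```

With the top of human hearing at 20,000 Hz, the same reasoning gives a minimum rate of 40,000 Hz, which is the context for the CD-audio question on the slide.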
Quantization
Quantization is step 2 of digitization. After all the samples have been obtained, each sample is mapped and rounded to the nearest value on a scale of discrete levels.
CD-quality audio has 65,536 possible levels (16-bit audio, since 2^16 = 65,536).
The bit depth of digital audio is also referred to as its resolution. How does this differ from the terminology used for images?

Audio Quantization
Suppose we are quantizing the samples using 3 bits (i.e. 2^3 = 8 levels).
Now, round each sample to the nearest level.
Now, reconstruct the waveform using the quantized samples.

Audio Quantization
Samples with different original amplitudes may be quantized onto the same level, which loses the subtle differences between samples.
With lower bit depth, samples with larger differences may also be quantized onto the same level.
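The rounding step of quantization can be sketched as follows. This is a simplified model under stated assumptions: samples are taken to lie in the range -1.0 to 1.0, the levels are evenly spaced and include both endpoints, and the function name `quantize` and the sample values 0.30 and 0.40 are hypothetical. Real audio formats lay out their levels somewhat differently.

```python
def quantize(sample, bit_depth):
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bit_depth levels.

    Assumption: levels are evenly spaced from -1.0 to 1.0 inclusive.
    """
    levels = 2 ** bit_depth
    step = 2.0 / (levels - 1)               # spacing between adjacent levels
    index = round((sample + 1.0) / step)    # index of the nearest level
    index = max(0, min(levels - 1, index))  # keep the index on the scale
    return -1.0 + index * step

print(2 ** 3)   # 8 levels at 3 bits
print(2 ** 16)  # 65536 levels at 16 bits

# Two different amplitudes collapse onto the same level at 3 bits...
print(quantize(0.30, 3) == quantize(0.40, 3))    # True
# ...but stay distinct at 16 bits.
print(quantize(0.30, 16) == quantize(0.40, 16))  # False
```

The lower the bit depth, the wider each step is, so samples that are further apart end up sharing a level.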
How does this affect the sound of the audio?

Audio Quantization and Sampling
This video does a great job of demonstrating the audible effects of a low sampling rate and heavy quantization (low bit depth): https://www.youtube.com/watch?v=UaKho805vCE

Audio Quantization
Some common resolutions used for audio:
- 8-bit: usually sufficient for speech in general, too low for music
- 16-bit: minimal bit depth for music

Dynamic Range
The range of the scale, from the lowest to the highest possible quantization values.
In the previous example:

Clipping
If recording sensitivity is set too high, the signal amplitude will exceed the maximum that can be recorded, leading to unpleasant distortion. The waveform essentially "doesn't fit" within its allowed range, so the tops and bottoms of the wave get cut off.
Here we see that the recording was so loud that the peaks and valleys of the wave were cut off. This will sound unpleasant.

Clipping
Example of clipping: https://www.youtube.com/watch?v=iClhueWESFU (just watch from 1:40 to 3:20)
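The clipping effect described in the Clipping slides can be imitated numerically. The sketch below is illustrative only: the gain of 1.5, the 16-sample sine cycle, and the helper name `clip` are assumptions for this example, with the recordable range taken to be -1.0 to 1.0.

```python
import math

def clip(sample, max_amplitude=1.0):
    """Cut a sample off at the largest value the recorder can store."""
    return max(-max_amplitude, min(max_amplitude, sample))

gain = 1.5  # assumed: recording sensitivity set too high

# One cycle of a full-scale sine wave, 16 samples long.
original = [math.sin(2 * math.pi * n / 16) for n in range(16)]

# Amplified past the allowed range, then forced back into it.
recorded = [clip(gain * s) for s in original]

# Samples whose amplified value did not fit and was cut off.
clipped_count = sum(1 for s in original if abs(gain * s) > 1.0)

print(max(recorded), min(recorded))  # 1.0 -1.0
print(clipped_count)
```

Every sample that exceeded the range is stored as exactly the maximum or minimum value, which is the flat-topped shape seen in a clipped waveform.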
Stereo
Our two ears help us localize sounds in the space around us. Two channels of audio, played through two different speakers, are used to create the illusion of depth and space.
Note that stereo audio, while sounding more realistic due to its 3D nature, has consequences for the file size, since two waveforms must be saved instead of one.

Audio Digitization
Let's figure out the file size for a typical digital audio recording. Suppose we have a digitized recording of 60 seconds with the following digitization parameters:
- Sampling rate = 44100 Hz (or 44,100 samples/second)
- Bit depth = 16 bit (or 16 bits/sample)
- Stereo (2 channels, left and right)

Audio Digitization
Total number of samples:
= 60 seconds x 44,100 samples/second
= 2,646,000 samples
Total number of bits required for this many samples:
= 2,646,000 samples x 16 bits/sample
= 42,336,000 bits
This is for one channel.
Total bits for two channels:
= 42,336,000 bits/channel x 2 channels
= 84,672,000 bits
Converted to MiB:
= 84,672,000 bits / 8 bits/byte / 1024 bytes/KiB / 1024 KiB/MiB
≈ 10 MiB
That's about 10 MiB for only 1 minute of audio! This is generally not considered to be acceptable for transmission of audio over the web.

Audio Digitization
Some basic ways to reduce audio file size:
- Reduce the sampling rate
- Reduce the bit depth
- Apply compression
Some other options:
- Reduce the number of channels
- Shorten the length of the audio
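The file-size arithmetic above can be wrapped in a small function to see how each parameter affects the result. This is a sketch for uncompressed audio only. The function names `audio_size_bytes` and `to_mib` are hypothetical, and the reduced settings in the second call are an assumed example rather than a recommendation from the slides.

```python
def audio_size_bytes(duration_s, sampling_rate_hz, bit_depth, channels):
    """Uncompressed size: samples x bits per sample x channels, in bytes."""
    total_bits = duration_s * sampling_rate_hz * bit_depth * channels
    return total_bits / 8  # 8 bits per byte

def to_mib(n_bytes):
    """Convert bytes to MiB (1 MiB = 1024 x 1024 bytes)."""
    return n_bytes / 1024 / 1024

# The slide's example: 60 s, 44,100 Hz, 16-bit, stereo.
cd_stereo = audio_size_bytes(60, 44_100, 16, 2)
print(cd_stereo)                    # 10584000.0
print(round(to_mib(cd_stereo), 2))  # 10.09

# Assumed reduced settings: half the rate, half the bit depth, one channel.
reduced = audio_size_bytes(60, 22_050, 8, 1)
print(reduced)                      # 1323000.0
```

Halving the sampling rate, the bit depth, and the channel count each halves the size, so together they shrink the file to one eighth, at an audible cost in quality.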
Audio File Compression
As with images, compression is necessary to quickly transmit and stream audio.
Lossless: many lossless algorithms could work. Can you think of some examples based on what we learned about image compression?
Lossy: gets rid of some data permanently and destructively, but human perception is taken into consideration and only the least noticeable data is removed. MP3 provides a good compression rate while preserving the perception of high sound quality.

Perceptually-Based Compression
Identify and discard data that doesn't affect the perception of the signal.
This needs a psycho-acoustical model, since the ear and brain do not respond to sound waves in a simple way. We need a mathematical model of this response.
Two main techniques:
- Threshold of hearing: compress sounds that are too quiet to hear more aggressively
- Masking: compress sounds that are obscured by other sounds

The Threshold of Hearing
This graph shows the intensity of the quietest sound that can be heard at each frequency. As you can see, the high and low frequencies must be much more intense to be heard, as our ears are not as sensitive to those frequencies.
Within these frequency and intensity ranges, sounds can be compressed more aggressively without any noticeable impact on the sound quality.
Masking
Tones in the neighborhood of a loud tone need higher levels to be heard. Sounds in this frequency range may be more difficult to hear.
Therefore, we can compress these sounds more aggressively without a noticeable loss of quality!

Choosing an Audio File Type
- Are there limitations on file size?
- Who is the intended audience?
- Will this be a source/master file, from which other works will be derived?
- Is your audio used on the Web? Use file types that offer high compression (such as MP3). Audio sharing services (such as Soundcloud, Bandcamp, etc.) will automatically recode your files for streaming.

Source/Master Files
If you are keeping the file for future editing, choose a file type that is:
- uncompressed, or
- uses lossless compression
Why do you think this is important?

Audio File Types
.wav (Windows)
- Compressed (lossless) or uncompressed
- One of the HTML5 audio formats
- Plays in web browsers that support the .wav format of HTML5 audio (Firefox, Safari, Chrome, and Opera)

Audio File Types
.mp3 (Cross-platform)
- MPEG audio layer 3
- Variable compression rate (lossy) with high perceived quality of sound
- One of the HTML5 audio formats
- Plays in Web browsers that don't support the .wav format of HTML5 audio
Audio File Types
.aiff (Mac, Windows)
- Audio Interchange File Format
- Created by Apple
- Compressed or uncompressed

MIDI
For some types of audio such as music, there are ways of representing the audio other than digitizing a recording. One way is MIDI: Musical Instrument Digital Interface.
MIDI is a standard protocol for communicating with electronic instruments (synthesizers, samplers, drum machines) and recording performances. (Not recording the sound waves, but recording the key presses, drum hits, etc.)

MIDI
- Like digital sheet music
- Contains instructions for recreating the music
- Created by editing music notations and instrument assignments
- Can also be created by recording your performance on a MIDI keyboard connected to a computer
- Can also be created by simply using a mouse and a music editing program

MIDI
Computers can convert MIDI instructions into music by reading the notes and other instructions encoded in the MIDI data and playing the music with virtual instruments. The virtual instruments can use synthesized sounds or play audio files of musical notes recorded from actual instruments.
The computer becomes a musical instrument!
How are the qualities of music played by a computer different from the qualities of music played by a human?

Loop Music
- Music that is created from short music clips that are repeated
- Libraries of clips for loop music are commercially available
- You can also record your own clips and compose them into loops, and even create multiple layers of loops. Sometimes this is done during live performances using special hardware that records the musician's performance and then loops it, allowing the musician to accompany themselves.
Creating Digital Audio
Digital audio artists and musicians create music and other creative audio compositions by combining all these methods. Digital recordings, MIDI, and loops can be combined using special software to create amazing things!
Some programs for creating and editing digital audio that I use and recommend:
- Logic Pro X (Mac): loops, MIDI, recording
- GarageBand (Mac): free software that works like a stripped-down version of Logic Pro X
- Adobe Audition (Mac and Windows): loops, recording