Multimedia Networking: Basics of Digital Audio
Carleton University
Summary
This document is a chapter on multimedia networking, focusing on the basics of digital audio. It discusses the concept of sound as a wave phenomenon and explores how digital audio is created through sampling and quantization. The Nyquist theorem is also introduced.
Full Transcript
Multimedia Networking
Basics of Digital Audio
BIT - NET 4007

What is Sound
Sound is a wave phenomenon like light, but it is macroscopic and involves molecules of air being compressed and expanded under the action of some physical device.
For example, a speaker in an audio system vibrates back and forth and produces a longitudinal pressure wave that we perceive as sound.
Since sound is a pressure wave, it takes on continuous values, as opposed to digitized ones.
Even though such pressure waves are longitudinal, they still have ordinary wave properties and behaviors, such as reflection (bouncing), refraction (change of angle when entering a medium with a different density) and diffraction (bending around an obstacle).
If we wish to use a digital version of sound waves, we must form digitized representations of audio information.

Digitization
Digitization means conversion to a stream of numbers, and preferably these numbers should be integers for efficiency.
[Figure: an analog signal, a continuous measurement of a pressure wave]
The signal in the figure has to be made digital in both time and amplitude. To digitize, the signal must be sampled in each dimension: in time, and in amplitude.
Sampling means measuring the quantity we are interested in, usually at evenly spaced intervals.
The first kind of sampling, using measurements only at evenly spaced time intervals, is simply called sampling. The rate at which it is performed is called the sampling frequency.
For audio, typical sampling rates are from 8 kHz (8,000 samples per second) to 48 kHz. This range is determined by the Nyquist theorem, discussed later.
Sampling in the amplitude or voltage dimension is called quantization.
Thus, to decide how to digitize audio data we need to answer the following questions:
What is the sampling rate?
How finely is the data to be quantized, and is quantization uniform?
How is audio data formatted? (file format)

Nyquist Theorem
The Nyquist theorem states how frequently we must sample in time to be able to recover the original sound.
Consider a single sinusoid: it is a single, pure frequency (only electronic instruments can create such sounds).
If the sampling rate just equals the actual frequency, a false signal is detected: it is simply a constant, with zero frequency.
If we sample at 1.5 times the actual frequency, we obtain an incorrect (alias) frequency that is lower than the correct one: it is half the correct one (the wavelength, from peak to peak, is double that of the actual signal).
Thus, for correct sampling we must use a sampling rate equal to at least twice the maximum frequency content in the signal. This rate is called the Nyquist rate.
Nyquist Theorem: If a signal is band-limited, i.e., there is a lower limit $f_1$ and an upper limit $f_2$ of frequency components in the signal, then the sampling rate should be at least $2(f_2 - f_1)$.
Nyquist frequency: half of the Nyquist rate.

Quantization
◼ Quantization is the representation of amplitudes by a certain value (step).
◼ Samples are rounded up or down to the closer step.
◼ Rounding introduces inexactness (quantization noise).

Signal to Quantization Noise Ratio (SQNR)
Aside from any noise that may have been present in the original analog signal, there is also an additional error that results from quantization.
If voltages are actually in the range 0 to 1 but we have only 8 bits in which to store values, then effectively we force all continuous values of voltage into only 256 different values.
This introduces a roundoff error. It is not really “noise”.
Nevertheless it is called quantization noise (or quantization error).
The quality of the quantization is characterized by the Signal to Quantization Noise Ratio (SQNR).
Quantization noise: the difference between the actual value of the analog signal at a particular sampling time and the nearest quantization interval value. At most, this error can be as much as half of the interval.

Signal to Noise Ratio (SNR)
The ratio of the power of the correct signal to the power of the noise is called the signal to noise ratio (SNR), a measure of the quality of the signal.
The SNR is usually measured in decibels (dB), where 1 dB is a tenth of a bel. The SNR value, in units of dB, is defined in terms of base-10 logarithms of squared voltages, as follows:
$SNR = 10 \log_{10} \frac{V_{signal}^2}{V_{noise}^2} = 20 \log_{10} \frac{V_{signal}}{V_{noise}}$
The power in a signal is proportional to the square of the voltage. For example, if the signal voltage $V_{signal}$ is 10 times the noise voltage, then the SNR is $20 \log_{10}(10) = 20$ dB.
In terms of power, if the power from ten violins is ten times that from one violin playing, then the ratio of power is 10 dB.
dBm is an abbreviation for the power ratio (in dB) of the measured power referenced to one milliwatt (mW). 3 dBm is about 2 mW.

Signal to Quantization Noise Ratio (SQNR)
For a quantization accuracy of N bits per sample, the peak signal to peak quantization noise ratio (PSQNR) is:
$PSQNR = 20 \log_{10} \frac{2^{N-1}}{1/2} = 20 N \log_{10} 2 \approx 6.02 N$ (dB)

Linear and Non-linear Quantization
Linear format: samples are typically stored as uniformly quantized values.
Non-uniform quantization: set up more finely spaced levels where humans hear with the most acuity.
Weber's Law, stated formally, says that equally perceived differences have values proportional to absolute levels: $\Delta r \propto \Delta s / s$, where $s$ is the stimulus and $r$ is the response.
Nonlinear quantization works by first transforming an analog signal from the raw $s$ space into the theoretical $r$ space, and then uniformly quantizing the resulting values.
Such a law for audio is called μ-law encoding (or u-law).
A very similar rule, called A-law, is used in telephony in Europe.

Non-linear Quantization
◼ Logarithmic quantization provides uniform SNR for all signals:
Provides higher granularity for lower signals.
Corresponds to the logarithmic behavior of the human ear.

Audio Quality vs. Data Rate
The uncompressed data rate increases as more bits are used for quantization.
Stereo doubles the bandwidth needed to transmit a digital audio signal.

Quantization and Transmission of Audio
Coding of audio: quantization and transformation of data are collectively known as coding of the data.
For audio, the μ-law technique is usually combined with an algorithm that exploits the temporal redundancy present in audio signals.
Differences in signals between the present and a past time can reduce the size of signal values and also concentrate the histogram of sample values (differences, now) into a much smaller range.
The result of reducing the variance of values is that lossless compression methods produce a bitstream with shorter bit lengths for more likely values (expanded discussion later).
In general, producing quantized sampled output for audio is called PCM (Pulse Code Modulation). The differences version is called DPCM (and a crude but efficient variant is called DM). The adaptive version is called ADPCM.

Pulse Code Modulation (PCM)
Assuming a bandwidth for speech from about 50 Hz to about 10 kHz, the Nyquist rate would dictate a sampling rate of 20 kHz.
If each sample is represented by 8 bits, the bit-rate for mono speech is 20 kHz × 8 bits = 160 kbps.
However, the standard approach to telephony in fact assumes that the highest-frequency audio signal we want to reproduce is only about 4 kHz. The sampling rate is therefore 8 kHz, and the bit-rate is 8 kHz × 8 bits = 64 kbps.
Since only sounds up to 4 kHz are to be considered, we should remove high and low frequencies from the analog input signal. This is done using a band-limiting filter.
A discontinuous signal contains not just frequency components due to the original signal, but also higher-frequency components. Therefore, the output of the digital-to-analog converter goes to a low-pass filter to remove the high-frequency components.

Differential Coding of Audio
Audio is often stored not in simple PCM but instead in a form that exploits differences, which are generally smaller numbers and so offer the possibility of using fewer bits to store.
If we then go on to assign bit-string codewords to differences, we can assign short codewords to prevalent values and long codewords to rarely occurring ones.

Lossless Predictive Coding
Predictive coding simply means transmitting differences: predict the next sample as being equal to the current sample; send not the sample itself but the difference.
Predictive coding consists of finding differences, and transmitting these using a PCM system.
Note that differences of integers will be integers. Denote the integer input signal as the set of values $f_n$. Then we predict values $\hat{f}_n$ as simply the previous value, and define the error $e_n$ as the difference between the actual and the predicted signal:
$\hat{f}_n = f_{n-1}$
$e_n = f_n - \hat{f}_n$
But it is often the case that some function of a few of the previous values provides a better prediction. Typically, a linear predictor function is used.
The idea of forming differences is to make the histogram of sample values more peaked.
For example, Fig. 6.15(a) plots 1 second of sampled speech at 8 kHz, with magnitude resolution of 8 bits per sample.
A histogram of these values is actually centered around zero, as in Fig. 6.15(b).
Fig. 6.15(c) shows the histogram for corresponding speech signal differences: difference values are much more clustered around zero than are sample values themselves.
As a result, a method that assigns short codewords to frequently occurring symbols will assign a short code to zero and do rather well: such a coding scheme will code sample differences much more efficiently than the samples themselves.
One problem: suppose our integer sample values are in the range 0..255. Then differences could be anywhere in −255..255: we have increased our dynamic range (ratio of maximum to minimum) by a factor of two, and we need more bits to transmit some differences.
A clever solution for this: define two new codes, denoted SU (Shift-Up) and SD (Shift-Down). Then we can use codewords for only a limited set of signal differences, say only the range −15..16.
Differences which lie in the limited range can be coded as is, but with the extra two values for SU and SD, a value outside the range −15..16 can be transmitted as a series of shifts, followed by a value that is indeed inside the range −15..16.
For example, 100 is transmitted as: SU, SU, SU, 4, where (the codes for) SU and for 4 are what are transmitted (or stored).
Lossless predictive coding: the decoder produces the same signal as the original. As a simple example, suppose we devise a predictor for $\hat{f}_n$ as follows:
[equation not preserved in the transcript]
Let's consider an explicit example. Suppose we wish to code the sequence $f_1, f_2, f_3, f_4, f_5$ = 21, 22, 27, 25, 22. For the purposes of the predictor, we'll invent an extra signal value $f_0$, equal to $f_1 = 21$, and first transmit this initial value, uncoded:
[worked values not preserved in the transcript]
The error does center around zero, we see, and coding (assigning bit-string codewords) will be efficient.
[Figure: a typical schematic diagram used to encapsulate this type of system]
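The lossless predictive coder and the SU/SD shift trick described above can be sketched in a few lines of Python. The predictor formula did not survive in the transcript, so this sketch assumes the predictor is the floor of the mean of the two previous samples; that choice, the function names, and the shift size of 32 (inferred from the example "100 is transmitted as SU, SU, SU, 4" with the range −15..16) are assumptions for illustration only.

```python
def predict(prev1, prev2):
    # Assumed predictor: floor of the mean of the two previous samples.
    return (prev1 + prev2) // 2

def encode(samples):
    """Return the first sample (sent uncoded) and the list of prediction errors."""
    history = [samples[0], samples[0]]   # invented f0 = f1, then f1 itself
    errors = []
    for f in samples[1:]:
        errors.append(f - predict(history[-1], history[-2]))
        history.append(f)
    return samples[0], errors

def decode(first, errors):
    """Rebuild the exact input from the first sample and the errors."""
    history = [first, first]
    for e in errors:
        history.append(predict(history[-1], history[-2]) + e)
    return history[1:]

def shift_code(diff, low=-15, high=16):
    """Express a difference as SU/SD shifts plus an in-range value."""
    span = high - low + 1                # 32 values, so each shift moves by 32
    symbols = []
    while diff > high:
        symbols.append("SU")
        diff -= span
    while diff < low:
        symbols.append("SD")
        diff += span
    symbols.append(diff)
    return symbols

first, errors = encode([21, 22, 27, 25, 22])
print(first, errors)
print(decode(first, errors))
print(shift_code(100))
```

Under these assumptions the decoder recovers the input exactly, because it applies the same predictor to the same past values as the encoder.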
Differential PCM
Differential PCM is exactly the same as predictive coding, except that it incorporates a quantizer step.
$f_n$: the original signal; $\hat{f}_n$: the predicted signal; $\tilde{f}_n$: the reconstructed signal.

Distortion
The distortion is the average squared error:
$D = \frac{1}{N} \sum_{n=1}^{N} (\tilde{f}_n - f_n)^2$

ADPCM
ADPCM (Adaptive DPCM) takes the idea of adapting the coder to suit the input much further.
Two pieces make up a DPCM coder: the quantizer and the predictor.
In ADPCM, we can change the step size as well as the decision boundaries, using a non-uniform quantizer. We can carry this out in two ways.
Forward adaptive quantization: use the properties of the input signal.
Backward adaptive quantization: use the properties of the quantized output. If quantized errors become too large, we should change the non-uniform quantizer.
We can also adapt the predictor, again using forward or backward adaptation. Making the predictor coefficients adaptive is called adaptive predictive coding (APC).
Recall that the predictor is usually taken to be a linear function of previous reconstructed quantized values, $\tilde{f}_n$.
The number of previous values used is called the “order” of the predictor. For example, if we use M previous values, we need M coefficients $a_i$, $i = 1, 2, \ldots, M$, in a predictor.
However, we can get into a difficult situation if we try to change the prediction coefficients that multiply previous quantized values, because that makes a complicated set of equations to solve for these coefficients.
Suppose we decide to use a least-squares approach to solving a minimization, trying to find the best values of the $a_i$:
$\min \sum_{n=1}^{N} (f_n - \hat{f}_n)^2$
But because $\hat{f}_n$ depends on the quantization, we have a difficult problem to solve. Instead, one usually resorts to solving the simpler problem that results from using the signal $f_n$ itself.
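To make the difference between predictive coding and DPCM concrete, here is a minimal sketch of a DPCM loop with a uniform quantizer, followed by the distortion measure defined above. The predictor (the previous reconstructed value), the step size, and the decision to send the first sample uncoded are assumptions chosen for illustration, not details taken from the slides.

```python
import math

def quantize(error, step):
    # Uniform quantizer: round the prediction error to the nearest multiple of step.
    return step * math.floor(error / step + 0.5)

def dpcm(samples, step):
    """Return (quantized errors, reconstructed signal)."""
    reconstructed = [samples[0]]           # first sample sent uncoded (assumption)
    quantized_errors = []
    for f in samples[1:]:
        prediction = reconstructed[-1]     # predict from the *reconstructed* past
        eq = quantize(f - prediction, step)
        quantized_errors.append(eq)
        reconstructed.append(prediction + eq)
    return quantized_errors, reconstructed

def distortion(original, reconstructed):
    # Average squared error between reconstructed and original samples.
    n = len(original)
    return sum((r - f) ** 2 for f, r in zip(original, reconstructed)) / n

signal = [21, 22, 27, 25, 22]
errors, recon = dpcm(signal, step=4)
print(errors, recon, distortion(signal, recon))
```

The prediction is formed from reconstructed values rather than original ones so that a decoder, which only ever sees the quantized errors, can form exactly the same predictions. With a step of 1 and integer input the quantizer changes nothing and the scheme reduces to lossless predictive coding.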
NET 4007 Multimedia Networking
Lossless and Lossy Compression

Compression
Compression is the process of coding that will effectively reduce the total number of bits needed to represent certain information.
If the compression and decompression processes induce no information loss, then the compression scheme is lossless; otherwise, it is lossy.
Compression ratio $= B_0 / B_1$, where $B_0$ is the number of bits before compression and $B_1$ is the number of bits after compression.

Why Compression
◼ Advanced compression algorithms enable multimedia to become killer applications over networks.
◼ Voice: 64 kbps ➔ 3 kbps: 20 times
◼ Audio: 1.4 Mbps ➔ 14 kbps: 100 times
◼ Video: 100 Mbps ➔ 20 kbps: 5000 times

Quiz
◼ Can we transmit one movie using just 1 bit in the future?
◼ We need a little knowledge of Information Theory to answer this question.
◼ Information Theory is the fundamental theory for today’s information technologies, including multimedia, the Internet, wireless communications, and so on.

What is Information
◼ Which of the following has more information?
a. Today is Thu.
b. There will be an earthquake tonight.
◼ Which one has the smaller probability?
a. Today is Thu.
b. There will be an earthquake tonight.

Basics of Information Theory
◼ Intuitively, information is related to probability.
◼ The smaller the probability, the more information: $I = \frac{1}{p}$, where $p$ is the probability.
◼ In engineering, we usually use a logarithmic measure: $I = \log_2 \frac{1}{p}$ (base 2 corresponds to binary, 0 or 1).

Basics of Information Theory (cont’d)
Information theory was developed by Dr. C.E. Shannon around 1948.
The entropy of an information source with alphabet $S = \{s_1, s_2, \ldots, s_n\}$ is:
$\eta = H(S) = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i}$
$p_i$: the probability that symbol $s_i$ will occur in S.
$\log_2 \frac{1}{p_i}$: indicates the amount of information (self-information as defined by Shannon) contained in $s_i$, which corresponds to the number of bits needed to encode $s_i$.
Shannon told us that you cannot compress an information source, such as a video, below its entropy, which is usually greater than one bit.

Logarithmic Measure
C.E. Shannon, “A mathematical theory of communication,” the Bell System Technical Journal, vol. 27, July, 1948.
1. It is practically more useful. Parameters of engineering importance such as time, bandwidth, number of relays, etc., tend to vary linearly with the logarithm of the number of possibilities.
2. It is nearer to our intuitive feeling as to the proper measure. This is closely related to the above reason since we intuitively measure entities by linear comparison with common standards. For example, adding one relay to a group doubles the number of possible states of the relays; two identical channels have twice the capacity of one for transmitting information.
3. It is mathematically more suitable.

Entropy and Code Length
In the entropy equation, the entropy $\eta$ is a weighted sum of the terms $\log_2 \frac{1}{p_i}$; hence it represents the average amount of information contained per symbol in the source S.
The entropy specifies the lower bound for the average number of bits to code each symbol in S, i.e., $\eta \le \bar{l}$, where $\bar{l}$ is the average length (measured in bits) of the codewords produced by the encoder.

Shannon-Fano Algorithm
Shannon-Fano algorithm: a top-down approach.
Sort the symbols according to the frequency count of their occurrences.
Recursively divide the symbols into two parts, each with approximately the same number of counts, until all parts contain only one symbol.
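The entropy formula and the two Shannon-Fano steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the tie-breaking rule (take the first cut that minimizes the difference in counts) and the function names are assumptions, and other valid splits can produce different, equally long codes.

```python
import math
from collections import Counter

def entropy(message):
    """Entropy in bits per symbol, using relative frequencies as probabilities."""
    counts = Counter(message)
    total = len(message)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def shannon_fano(message):
    """Top-down code construction; returns a dict symbol -> bit string."""
    counts = Counter(message)
    # Sort by descending count (stable, so ties keep first-appearance order).
    symbols = sorted(counts, key=lambda s: -counts[s])
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(counts[s] for s in group)
        best_cut, best_diff, running = 1, None, 0
        for i in range(1, len(group)):
            running += counts[group[i - 1]]
            diff = abs(total - 2 * running)   # |left count - right count|
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = i, diff
        left, right = group[:best_cut], group[best_cut:]
        for s in left:
            codes[s] += "0"
        for s in right:
            codes[s] += "1"
        split(left)
        split(right)

    split(symbols)
    return codes

codes = shannon_fano("HELLO")
print(codes)
print(sum(len(codes[c]) for c in "HELLO"), "bits; entropy =", entropy("HELLO"))
```

For a five-symbol message the entropy bound and the actual code length can be compared directly: the code spends 10 bits on five symbols, an average of 2 bits per symbol, which is slightly above the entropy, as the lower bound requires.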
An example: coding of “HELLO”.
[Figure: a coding tree for HELLO by the Shannon-Fano algorithm]
[Table: result of performing Shannon-Fano on HELLO]

Entropy of HELLO
With L occurring twice and H, E, and O once each in five symbols: $\eta = 0.4 \log_2 \frac{1}{0.4} + 3 \times 0.2 \log_2 \frac{1}{0.2} \approx 1.92$ bits per symbol.

Huffman Coding
Huffman coding algorithm: a bottom-up approach, used in JPEG and MPEG.
Initialization: put all symbols on a list sorted according to their frequency counts.
Repeat until the list has only one symbol left:
From the list, pick the two symbols with the lowest frequency counts. Make a Huffman subtree that has these two symbols as child nodes and create a parent node.
Assign the sum of the children’s frequency counts to the parent and insert it into the list such that the order is maintained.
Delete the children from the list.
Assign a codeword for each leaf based on the path from the root.

Properties of Huffman Coding
Unique prefix property: no Huffman code is a prefix of any other Huffman code, which precludes any ambiguity in decoding.
Optimality: minimum redundancy code, proved optimal for a given data model (i.e., a given, accurate, probability distribution):
The two least frequent symbols will have the same length for their Huffman codes, differing only at the last bit.
Symbols that occur more frequently will have shorter Huffman codes than symbols that occur less frequently.
The average code length for an information source S is strictly less than $\eta + 1$.

Dictionary-Based Coding
The Lempel-Ziv-Welch (LZW) algorithm employs an adaptive, dictionary-based compression technique, used in GIF for images, V.42 bis for modems, and UNIX compress.
Unlike variable-length coding, in which the lengths of the codewords are different, LZW uses fixed-length codewords to represent variable-length strings of symbols/characters that commonly occur together, e.g., words in English text.
LZW places longer and longer repeated entries into a dictionary, and then emits the code for an element, rather than the string itself, if the element has already been placed in the dictionary.
[Table: LZW compression for the string “ABABBABCABABBA”]

LZW Coding
Implementation requires some practical limit for the dictionary size, for example, a maximum of 4,096 entries for GIF.
In real applications, the code length $l$ is kept in the range $[l_0, l_{max}]$. The dictionary initially has a size of $2^{l_0}$. When it is filled up, the code length is increased by 1; this is allowed to repeat until $l = l_{max}$.
When $l_{max}$ is reached and the dictionary is filled up, it needs to be flushed (as in UNIX compress) or to have the LRU (least recently used) entries removed.

Arithmetic Coding
Arithmetic coding is a more modern coding method that usually outperforms Huffman coding.
Huffman coding assigns each symbol a codeword which has an integral bit length. Arithmetic coding can treat the whole message as one unit.

Run-Length Coding
Memoryless source: an information source that is independently distributed. Namely, the value of the current symbol does not depend on the values of the previously appeared symbols.
Instead of assuming a memoryless source, Run-Length Coding (RLC) exploits memory present in the information source.
Rationale for RLC: if the information source has the property that symbols tend to form continuous groups, then such a symbol and the length of the group can be coded.
RLC is a very simple form of data compression. Runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run.
Used in fax machines, JPEG, etc.
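Run-length coding as described above is short enough to sketch in full. The textual encoding used here (a decimal count followed by the symbol, with the count omitted for runs of length one) is an assumption for illustration, and it only works when the symbols themselves are not digits.

```python
from itertools import groupby

def rle_encode(line):
    """Replace each run with '<count><symbol>'; a run of one is just the symbol."""
    parts = []
    for symbol, run in groupby(line):
        length = len(list(run))
        parts.append(symbol if length == 1 else f"{length}{symbol}")
    return "".join(parts)

def rle_decode(encoded):
    """Inverse of rle_encode; assumes symbols are never digits."""
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch                      # accumulate a multi-digit count
        else:
            out.append(ch * (int(count) if count else 1))
            count = ""
    return "".join(out)

# A scan line of white (w) and black (B) pixels.
line = "w" * 10 + "B" + "w" * 6 + "B" * 3 + "w" * 10
encoded = rle_encode(line)
print(encoded, len(line), "->", len(encoded))
```

The gain depends entirely on the source having long runs; on data without repeated neighbours this encoding saves nothing.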
Run-Length Coding
A single scan line of the figure (B represents a black pixel and w represents a white pixel):
wwwwwwwwwwBwwwwwwBBBwwwwwwwwww
If we apply the RLC data compression algorithm, we get the following:
10wB6w3B10w
Compression ratio = 30/11 ≈ 2.7.

Lossy Compression
Lossless compression algorithms do not deliver compression ratios that are high enough. Hence, most multimedia compression algorithms are lossy.
What is lossy compression?
The compressed data is not the same as the original data, but a close approximation of it.
It yields a much higher compression ratio than that of lossless compression.

Quantization
Reduce the number of distinct output values to a much smaller set.
Main source of the “loss” in lossy compression.
Three different forms of quantization:
Uniform: midrise and midtread quantizers
Non-uniform: companded quantizer
Vector quantization

Granular Distortion
The quantization error caused by the quantizer is referred to as granular distortion.
Signal-to-Quantization-Noise Ratio (SQNR) = 6.02 n (dB), where n is the number of bits of the quantizer.
For a quantization accuracy of N bits per sample, the peak signal to peak noise ratio (PSQNR) is:
$PSQNR = 20 \log_{10} 2^N \approx 6.02 N$ (dB)
Increasing the quantizer by one bit increases the signal-to-quantization-noise ratio by 6.02 dB, and therefore increases the quality of the multimedia application.
However, the data rate will also increase, so more bandwidth will be needed for the multimedia application.

Vector Quantization (VQ)
According to Shannon's original work on information theory, any compression system performs better if it operates on vectors or groups of samples rather than individual symbols or samples.
Form vectors of input samples by simply concatenating a number of consecutive samples into a single vector.
Instead of single reconstruction values as in scalar quantization, in VQ code vectors with n components are used. A collection of these code vectors forms the codebook.
Finding the appropriate codebook and searching for the closest code vector at the encoder may require considerable computational resources. However, the decoder can execute quickly. This is desirable in most multimedia applications.

Transform Coding
Rationale: if Y is the result of a linear transform T of the input vector X in such a way that the components of Y are much less correlated, then Y can be coded more efficiently than X.
If most information is accurately described by the first few components of a transformed vector, then the remaining components can be coarsely quantized, or even set to zero, with little signal distortion.
The Discrete Cosine Transform (DCT) will be studied first. In addition, we will examine the Karhunen-Loeve Transform (KLT), which optimally decorrelates the components of the input X.

NET 4007 Multimedia Networking
Audio Compression Techniques

ADPCM in Speech Coding
◼ ADPCM forms the heart of the ITU's speech compression standards G.721, G.723, G.726, and G.727.
◼ The difference between these standards involves the bit-rate (from 3 to 5 bits per sample) and some algorithm details.
◼ The default input is µ-law coded PCM 16-bit samples.
[Figure: ADPCM]

Backward Adaptive Quantizer
◼ Backward adaptation works in principle by noticing either of the following cases:
◼ too many values are quantized to values far from zero, which would happen if the quantizer step size were too small.
◼ too many values fall close to zero too much of the time, which would happen if the quantizer step size were too large.
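The two cases above suggest a simple rule: grow the step when outputs land on the outermost level, shrink it when outputs land on zero. The sketch below implements that idea with a plain multiplier rule, similar in spirit to a Jayant-style adaptive quantizer; it is not the adaptation logic of any ITU standard, and the multipliers, level count, and function names are hypothetical.

```python
import math

def adapt(step, index, max_index, grow=1.5, shrink=0.8, min_step=0.01):
    """Update the step size from the last *output* index only."""
    if abs(index) == max_index:      # output far from zero: step was too small
        return step * grow
    if index == 0:                   # output at zero: step was too large
        return max(min_step, step * shrink)
    return step

def encode(errors, step=1.0, max_index=3):
    """Quantize each error to an index in -max_index..max_index."""
    indices = []
    for e in errors:
        index = max(-max_index, min(max_index, math.floor(e / step + 0.5)))
        indices.append(index)
        step = adapt(step, index, max_index)
    return indices

def decode(indices, step=1.0, max_index=3):
    """Rebuild values; the step sequence is replayed from the indices alone."""
    values = []
    for index in indices:
        values.append(index * step)
        step = adapt(step, index, max_index)
    return values

indices = encode([10.0] * 6)
print(indices)
print(decode(indices))
```

Because the step is updated from quantized outputs rather than from the input, the decoder can reproduce the step sequence without any side information, which is the point of backward adaptation.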
Vocoders
◼ The coders (encoding/decoding algorithms) we have studied so far could have been applied to any signals, not just speech.
◼ Vocoders are voice coders, which cannot be usefully applied when other analog signals, such as modem signals, are in use. They:
◼ are concerned with modeling speech so that the salient features are captured in as few bits as possible.
◼ use either a model of the speech waveform in time (LPC (Linear Predictive Coding) vocoding), or
◼ break down the signal into frequency components and model these (channel vocoders and formant vocoders).
◼ Vocoder simulation of the voice is not very good yet. Automated voice is strangely lacking in zest.

Phase Insensitivity in Speech
◼ A complete reconstituting of the speech waveform is really unnecessary, perceptually: all that is needed is for the amount of energy at any time to be about right, and the signal will sound about right.
◼ Phase is a shift in the time argument inside a function of time.
◼ Suppose we strike a piano key and generate a roughly sinusoidal sound, $\cos(\omega t)$ with $\omega = 2\pi f$.
◼ Now if we wait sufficient time to generate a phase shift $\pi/2$ and then strike another key, with sound $\cos(2\omega t + \pi/2)$, we generate a waveform like the solid line in Fig. 13.3.
◼ This waveform is the sum $\cos(\omega t) + \cos(2\omega t + \pi/2)$.
◼ If we did not wait before striking the second note, then our waveform would be $\cos(\omega t) + \cos(2\omega t)$. But perceptually, the two notes would sound the same, even though in actuality they would be shifted in phase.

Channel Vocoder
◼ Vocoders can operate at low bit-rates, 1 to 2 kbps. To do so, a channel vocoder first applies a filter bank to separate out the different frequency components. Since only the energy is important, the filter bank derives relative power levels for each frequency range.
◼ A channel vocoder also analyzes the signal to determine the general pitch of the speech, low (bass) or high (tenor), and also the excitation of the speech. Speech excitation is mainly concerned with whether a sound is voiced (e.g., a) or unvoiced (e.g., s, f).
◼ Because voiced sounds can be approximated by sinusoids, a periodic pulse generator recreates voiced sounds. Since unvoiced sounds are noise-like, a pseudo-noise generator is applied, and all values are scaled by the band-pass filter set.
◼ A channel vocoder can achieve an intelligible but synthetic voice using 2.4 kbps.
◼ Used in G.722.

Formant Vocoder
◼ Not all frequencies present in speech are equally represented. Instead, only certain frequencies show up strongly, and others are weak.
◼ The important frequency peaks are called formants.
◼ A formant vocoder works by encoding only the most important frequencies.
◼ Formant vocoders can produce reasonably intelligible speech at only 1 kbps.

Linear Predictive Coding (LPC)
◼ LPC vocoders extract salient features of speech directly from the waveform, rather than transforming the signal to the frequency domain.
◼ LPC features:
◼ uses a time-varying model of vocal-tract sound generated from a given excitation.
◼ transmits only a set of parameters modeling the shape and excitation of the vocal tract, not actual signals or differences ➔ small bit-rate.
◼ About “Linear”: the speech signal generated by the output vocal tract model is calculated as a function of the current speech output plus a second term linear in previous model coefficients.

LPC Coding Process
◼ LPC starts by deciding whether the current segment is voiced or unvoiced:
◼ For unvoiced: a wide-band noise generator is used to create sample values f(n) that act as input to the vocal tract simulator.
◼ For voiced: a pulse train generator creates values f(n).
◼ Model parameters $a_i$: calculated by using a least-squares set of equations that minimize the difference between the actual speech and the speech generated by the vocal tract model, excited by the noise or pulse train generators that capture speech parameters.

Code Excited Linear Prediction (CELP)
◼ CELP is a more complex family of coders that attempts to mitigate the lack of quality of the simple LPC model.
◼ CELP uses a more complex description of the excitation:
◼ An entire set (a codebook) of excitation vectors is matched to the actual speech, and the index of the best match is sent to the receiver.
◼ The complexity increases the bit-rate to 4,800-9,600 bps.
◼ The resulting speech is perceived as being more similar and continuous.
◼ Quality achieved this way is sufficient for audio conferencing.

Algebraic Code Excited Linear Prediction (ACELP)
◼ Algebraic Code Excited Linear Prediction, or ACELP, is a speech encoding algorithm where a limited set of pulses is distributed as excitation to a linear prediction filter.
◼ ACELP is a registered trademark of VoiceAge Corporation, incorporated in 1999. VoiceAge is solidly rooted in its founding members, Sipro Lab Telecom and the Université de Sherbrooke.
◼ The main advantage of ACELP is that the algebraic codebook it uses can be made very large (> 50 bits) without running into storage (RAM/ROM) or complexity (CPU time) problems.
◼ The ACELP method is widely employed in current speech coding standards such as AMR, EFR, AMR-WB and the ITU-T G-series standards G.729 and G.723.1.

Adaptive Multi-Rate (AMR)
◼ AMR is a data compression scheme optimized for speech coding. AMR was adopted as the standard speech codec by 3GPP in October 1998 and is now widely used in GSM. It uses link adaptation to select one of eight different bit rates based on link conditions.
◼ Sampling frequency 8 kHz/16-bit (160 samples for 20 ms frames).
◼ The bit rates 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kb/s are based on frames which contain 160 samples and are 20 milliseconds long.
◼ AMR utilizes Discontinuous Transmission (DTX), with Voice Activity Detection (VAD) and Comfort Noise Generation (CNG), to reduce bandwidth usage during silence periods.

Voice Activity Detection (VAD)
◼ Voice activity detection (or a voice activity detector) is an algorithm used in speech processing in which the presence or absence of human speech is detected from the audio samples.
◼ The main uses of VAD are in speech coding and speech recognition. A VAD may indicate not just the presence or absence of speech, but also whether the speech is voiced or unvoiced.

Discontinuous Transmission (DTX)
◼ Discontinuous transmission (DTX) is a method of momentarily powering down, or muting, a mobile or portable wireless telephone set when there is no voice input to the set.
◼ In a typical two-way conversation, each individual speaks slightly less than half of the time.
◼ This conserves battery power, eases the workload of the components in the transmitter amplifiers, and reduces interference.
◼ A common misconception is that DTX improves capacity by freeing up TDMA time slots for use by other conversations. In practice, the unpredictable availability of time slots makes this difficult to implement.
◼ However, reducing interference is a significant component in how GSM and other TDMA-based mobile phone systems make better use of the available spectrum.

Comfort Noise
◼ Comfort noise (or comfort tone) is artificial background noise used in radio and wireless communications to fill the silence in a transmission resulting from voice activity detection or from the clarity of modern digital lines.
◼ In a communication channel, if transmission is stopped because there is no speech, then the other side may assume that the link has been cut.
◼ To counteract these effects, comfort noise is added, usually on the receiving end in wireless or VoIP systems, to fill in the silent portions of transmissions with artificial noise. The noise generated is at a low but audible volume level, and can vary based on the average volume level of received signals.

Adaptive Multi-Rate – WideBand (AMR-WB)
◼ Adaptive Multi-Rate – WideBand (AMR-WB) is a speech coding standard developed after AMR using the same technology, such as ACELP.
◼ AMR-WB is codified as G.722.2, an ITU-T standard speech codec.
◼ AMR-WB operates like AMR with various bit rates. The bit rates are the following: 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s.
◼ AMR-WB is already standardized for future usage in networks such as UMTS.
◼ In October 2006, the first AMR-WB tests took place in a deployed network by T-Mobile in Germany together with Ericsson.
◼ WIND Mobile in Canada launched HD Voice (AMR-WB) on its 3G+ network in February 2011.

NET 4007 Multimedia Networking
Cisco VoIP Implementations

Benefits of a VoIP Network
◼ More efficient use of bandwidth and equipment
◼ Lower transmission costs
◼ Consolidated network expenses
◼ Improved employee productivity through features provided by IP telephony:
IP phones are complete business communication devices.
Directory lookups and database applications (XML)
Integration of telephony into any business application
Software-based and wireless phones offer mobility.
◼ Access to new communications devices (such as PDAs and cable set-top boxes)

Components of a VoIP Network
[Figure: components of a VoIP network]

Legacy Analog and VoIP Applications Can Coexist
[Figure: legacy analog and VoIP applications coexisting]