Summary

This document provides an overview of data compression techniques, including different schemes like pattern substitution, run-length encoding, and Lempel-Ziv-Welch. The document also covers lossy and lossless compression methods and information theory concepts.

Full Transcript

**Data Compression** **Why do we need data compression?** - **Storage Requirements** - Stored data may need manipulation and, hence, will have: - **Memory Requirements** - Intense processor activities - Communication speed. - **Exploring data redundancy → Compression.*...

**Data Compression** **Why do we need data compression?** - **Storage Requirements** - Stored data may need manipulation and, hence, will have: - **Memory Requirements** - Intense processor activities - Communication speed. - **Exploring data redundancy → Compression.** **What is data compression?** - **Definition:** - Reduction in physical size of information. - Simple, efficient digital representation of source data in as few bits as possible without loss of fidelity. - Source: Text, images, speech, audio, video. - **Other Names:** - Signal compression or signal coding. - **Components:** - **Compressor** → Reduces the size of data. - **Decompressor** → Reconstructs the original data. - **Compression/Decompression Tradeoff.** **Data Compression Schemes** - Commonly used schemes: - **Pattern substitution** - **Run-length encoding (RLE)** - **Lempel-Ziv-Welch (LZW)** - **Huffman encoding** - **Discrete Cosine Transform (DCT)** - **Motion Picture Experts Group (MPEG)** (H.261, H.264) - **Differential Pulse Code Modulation** (used in TIFF, PCX, PBM) **What portions of images are compressed?** - **Bitmap Files:** - Only the bitmap data is compressed. - Headers, color palette, footer, etc., remain unchanged. - **Vector Files:** - Image data is already compact. - Fast to read, but slow to reconstruct. - If compressed, the entire file is compressed. **Data Compression Terminology** - **Unencoded Data (Raw Data):** - Data before compression. - **Encoded Data (Compressed Data):** - Data after it has been compressed. - **Compression Ratio:** - Ratio of uncompressed data (B₀) to compressed data (B₁). - **Compression Ratio = B₀ / B₁.** **Types of Compression** 1. **Symmetric and Asymmetric Compression:** - **Symmetric Methods:** - Similar algorithms in both directions & same amount of work. - **Asymmetric Methods:** - More work in one direction than the other. - Example: Image Database. - Compression is usually more expensive. - Asymmetric algorithms are less common. **Encoder Schemes (Types)** 1. **Dictionary-based Encoders:** - Non-adaptive encoders - Adaptive encoders 2. **Non-dictionary-based Encoders:** - **Non-adaptive Encoders:** - Static dictionary - Prior statistics need to be known in advance - Work only on selected types of input data. - **Adaptive Encoder:** - No preconceived heuristics - Data independence - Adjusts to any type of data input/output. - **Semi-adaptive Encoder:** - Initial pass: Build dictionary - Second pass: Perform actual encoding. **Lossy and Lossless Compression** - **Lossless Compression:** - Information remains unchanged. - If the data is compressed and then decompressed, it will be identical to the original data. - **Lossy Compression:** - If the data is compressed and then decompressed, it will not be the same as the original data. **Classification: Entropy, Source, and Hybrid Coding\ ** 1. **Entropy Coding:** - Lossless process. - Used regardless of media characteristics. - The data stream is a simple digital sequence. 2. **Source Coding:** - Lossy coding. - Considers semantics of data. - Variable compression ratio. - Extensive use of specific medium characteristics. 3. **Hybrid Coding:** - Combination of entropy encoding and source encoding. - Involves both lossy and lossless encoding schemes. **Classification of Compression Techniques in Multimedia Systems** - **Entropy Coding:** - Run-Length Coding - Huffman Coding - LZW - Arithmetic Coding - **Source Coding:** - Prediction - DPCM (Differential Pulse Code Modulation) - Transformation (FFT, DCT) - **Layered Coding:** - Bit Position - Subsampling - Sub-band Coding - Vector Quantization - **Hybrid Coding:** - JPEG - MPEG - H.261 - DVI RTV, DVI PLV **Basics of Information Theory** - **Information Theory:** - Provides a lower limit on the size to which information can be reduced. - Based on the distribution of actual values present. - **Entropy (H):** - The number of bits required to code each symbol. - **Entropy formula:**H=−∑i=1npi⋅log⁡2(1pi)H=−i=1∑n​pi​⋅log2​(pi​1​) Where pipi​ is the probability of a symbol sisi​ occurring. - **Shannon\'s Law:** - The theoretical limit of compressibility is governed by Shannon\'s entropy formula. - **Example:** - If the probability of having symbol 'n' is 1/32, the amount of information associated with this character is 5 bits. - Thus, the character string 'nnn' will require 15 bits. - **Establishes the theoretical limit** to the number of bits per pixel required to represent an image. ![A screenshot of a computer program Description automatically generated](media/image2.png) **Pixel Packing** - **Lossless Data Compression:** - Efficient storage of data in contiguous bytes of memory. - Many bitmap formats use pixel packing. - **Example:** - Image data with 4096 4-bit pixels requires 4096 bytes of memory (half wasted). - By pixel packing, store two 4-bit pixels per byte → memory requirement drops from 4096 Bytes to 2048 Bytes. - **Memory-based display hardware** usually organizes image data as an array of bytes, each storing one pixel or less. - **Tradeoff:** - Faster to store and read only one 4-bit pixel per byte. **Lossless Data Compression: Pattern Substitution** - **Pattern Substitution:** - A simple form of statistical encoding. - Substitute a frequently repeating pattern with predefined codes. - Codes are smaller than the repeating symbols → compression. - **Process:** - Count the occurrence of similar patterns. - Sort them in descending order. - Assign symbols to the highest-count patterns. - A predefined symbol table may be used (assign code ii to token ii). **Lossless Data Compression: Simple Repetition Suppression** - **Repetition Suppression:** - Replace a series of tokens with: - The token itself - The count of its occurrences. - A special flag indicates the repeated token appears. - **Example:** - Replace:\ 89400000000000000000000000000000000\ with\ 894f32, where 'f' is the flag for zero. - **Compression Savings:** - Depends on the contents of data. **Applications of Repetition Suppression** - Suppression of zeros in a file (zero-length suppression). - Silence in audio data. - Pauses in conversation, etc. - Bitmaps (blanks/tabs in text or program source files). - Backgrounds in images. **Data Compression: Run-Length Encoding (RLE)** **Run-Length Encoding (RLE)** - **Supported by many bitmap file formats:** - BMP, PCX, TIFF, PDF. - **Suitability:** - Suitable for compressing any type of data, but compression ratios vary based on data type (e.g., palette-based vs. continuous tone images). - **Tradeoff:** - Compression ratio vs. implementation/execution. - **How it works:** - Works by reducing the physical size of repeating strings of characters (referred to as \"runs\"). - A repeating string, called a **run**, is typically encoded into two bytes: - **1st byte:** Run count (number of characters in the run). - **2nd byte:** Run value (value of the character in the run, ranging from 0--255). - The packet formed is called a **run packet** or **RLE packet**. **Run-Length Encoding (RLE) -- Example** - **Original Bit Stream:**\ AAAAAAAAAAAAAAA - **Without compression:** - Requires 15 \* 8 = 120 bits. - **With RLE encoding:** - The encoding would be: (count, value) packet → (15, A). - Requires only 1 packet. - Total bits: 1 \* 16 = 16 bits. - **Limitations:** - Cannot be applied blindly---may increase data size. - Requires markers to know when a \"run\" terminates. **Run-Length Encoding (cont\'d.)** - **When to create a new packet?** - Encoding requires a minimum of two characters. - **Considerations:** - Run of single characters? - 2-character runs? - **Compression Efficiency & Image Data Types:** - Compression efficiency varies based on the type of image data. - **Example:** - Encoding a mostly white black-and-white image versus encoding an image with many colors. - **Modifications:** - Leave 1 or 2-byte runs as they are to avoid overhead. **Variants of Run-Length Encoding** - **Number of Variants:** - In normal bitmap data encoding, the bitmap is encoded starting from the upper left corner and proceeds from left to right across each scan line, treating the data as a **1D array** rather than a **2D array** of pixels. - **Alternative RLE schemes could encode data:** - Down the length of the image. - Into 2D tiles. - Diagonally in a zigzag fashion. - **Odd variants** are used in highly specialized applications and are quite rare. **Variants on Run-Length Encoding (cont\'d.)**A diagram of a graph Description automatically generated - **Diatomic Encoding:** - A variation of RLE. - Combines two bytes. - Determines the most frequently occurring pairs of bytes. - **Example in English (most frequent pairs):** - \"E \", \"T \", \"A \", \"S \", \"TH\", \"RE\", \"IN\", \"HE\". - **Reduction:** - Results in a reduction of \> 10% by replacing frequent pairs. **Vertical Replication Packets** - Also known as **"Repeat scan line packet"**. - **Purpose:** - Doesn't store actual scan-line data. - Indicates a repeat of the previous scan line. - **Example:** - Assume a 640x480 image where all pixels in a scanline are the same color. - Using RLE with up to 256 bytes per packet, each packet is 2 bytes, requiring 6 bytes to run-length encode a scan line. **Vertical Replication Packet (cont\'d.)\ **![A screenshot of a computer Description automatically generated](media/image4.png) - **Scenario:** - Assume the first 100 scanlines of the image are the same color. - Without vertical replication packets: - 6 bytes/scanline \* 100 scanlines = 600 bytes of run-length encoded data. - **Using vertical replication packet:** - Run-length encode the first scanline (3 2-byte packets = 6 bytes). - Follow it with 99 vertical replication packets (99 bytes). - The resulting data is reduced to only 105 bytes. - **Using a count byte for the scanlines to repeat:** - A vertical replication packet with a count value of 99 (run count = 0, run value = 99). - Results in 6 bytes of scan-line data and 2 bytes of vertical replication packets for the first 100 scan lines. - **Total size:** 8 bytes. **RLE Scheme with 1- and 2-byte Vertical Replication Packets** - **Example of a scheme using vertical replication packets.** **The Shannon-Fano Algorithm** - **Overview:** - A top-down approach to encoding symbols based on their occurrence frequencies. - **Steps:** - **Sort symbols** based on their decreasing frequencies/probabilities. - Example: ABCDE. - **Divide** the symbols into two parts with approximately equal frequencies. - **Assign codes:** - Assign a **0** to one group and a **1** to the other. - **Repeat steps 2 and 3** recursively until no further split is possible. - **Traverse the tree top-down** to get the code for each symbol.\ A screenshot of a test Description automatically generated **The Shannon-Fano Algorithm (cont\'d.)** - **Example:** - Symbol frequencies: - Symbol A: 15, B: 7, C: 6, D: 6, E: 5. - **Probability Calculation:** H(S)=1539log⁡2(3915)+739log⁡2(397)+639log⁡2(396)+639log⁡2(396)+539log⁡2(395)≈2.2 bits/symbol.H(S)=3915​log2​(1539​)+397​log2​(739​)+396​log2​(639​)+396​log2​(639​)+395​log2​(539​)≈2.2 bits/symbol. - **Number of bits required:** - With 8 bits per symbol, total bits = 39 \* 8 = 312 bits. - **Average bits per symbol:** 2.82. - **Note:** - Codes produced by Shannon-Fano **are not unique**. ![A screenshot of a computer code Description automatically generated](media/image6.png) **Huffman Coding** **Overview:** - **Huffman Coding** is a simple and widely used compression algorithm. - Savings: **20% - 90%**. - **Observation:** - Not all symbols are used equally; some symbols occur more frequently. - **Objective:** - Determines the optimal code using the minimum number of bits. - **Example:** - In text, the most frequently occurring alphabet gets the shortest code. - In graphics/images, the most frequently occurring color gets the shortest code. **Huffman Coding (cont\'d.)** - **Binary Tree Construction:** - **Leaves:** Represent the symbols to be encoded. - **Nodes:** Contain the occurrence probabilities of the symbols. - **Branches:** Assigned 0 or 1, depending on the left or right child. - **Encoding Process:** - A **bottom-up approach**. **Encoding for Huffman Algorithm:** 1. **Initialization:** - Place the set of leaves (symbols) in a sorted list (OPEN) and keep it sorted at all times. 2. **Repeat until only one node is left in the list (OPEN):**\ a. Pick two nodes with the lowest frequencies and create a parent node.\ b. Assign the sum of the children\'s frequencies to the parent node and reinsert it into the list, keeping it sorted.\ c. Assign **0** to the left subtree and **1** to the right subtree, then delete the two children from the list. 3. **Traverse the tree top-down** to get the codes for each symbol. **Encoding for Huffman Algorithm (cont\'d.)** - **Example:** - Symbols and probabilities: - Symbol A: 0.16, B: 0.51, C: 0.09, D: 0.13, E: 0.11. - **Step-by-step calculation:** - **Tree structure** and **code assignment** are shown in the diagram. A screenshot of a math test Description automatically generated **Data Compression** **Encoding for Huffman Algorithm (contd.):** - **Example 1:** Consider following symbols and their probability of occurrence:\ Symbol \| A \| B \| C \| D \| E\ Probability \| 0.16 \| 0.51 \| 0.09 \| 0.13 \| 0.11\ p(ADCEB) = 1.00\ p(ADCE) = 0.49\ p(B) = 0.51\ p(A) = 0.16\ p(CE) = 0.20\ p(AD) = 0.29\ p(C) = 0.09\ p(E) = 0.11 ![A diagram of a graph Description automatically generated with medium confidence](media/image8.png) **37**\ **Data Compression**\ **Encoding for Huffman Algorithm (contd.):** - **Example 1 (contd.):** If the total number of symbols = 100, then\ Number of symbols: A = 16, B = 51, C = 9, D = 13, and E = 11 - If a symbol is represented by 8 bits, total number of bits required = 100 \* 8 = 800 bits - With Huffman encoding: - A = 16 \* 3 = 48 - B = 51 \* 1 = 51 - C = 9 \* 3 = 27 - D = 13 \* 3 = 39 - E = 11 \* 3 = 33 - Total \# of bits = 198 - Compression of about 1:4 A screenshot of a computer Description automatically generated **38**\ 11/18/2024\ **38** **Data Compression** **Encoding for Huffman Algorithm (contd.):** - **Example 2:** Consider following symbols and their occurrence probability:\ Symbol \| A \| B \| C \| D \| E\ Probability \| 0.39 \| 0.26 \| 0.21 \| 0.07 \| 0.07\ p(ABCDE) = 1.00\ p(A) = 0.39\ p(BCDE) = 0.61\ p(CDE) = 0.35\ p(DE) = 0.14\ p(C) = 0.21\ p(D) = 0.07\ p(E) = 0.07 ![A screenshot of a graph Description automatically generated](media/image10.png) **39**\ **Data Compression**\ **Encoding for Huffman Algorithm (contd.):** - **Example 3:** Consider the input: abracadabra\ Symbols and their occurrence probability:\ Symbol \| a \| b \| r \| c \| d\ Count \| 5 \| 2 \| 2 \| 1 \| 1\ p(abcdr) = 11\ p(a) = 5\ p(bcdr) = 6\ p(r) = 2\ p(c) = 1\ p(bcd) = 4\ p(cd) = 2\ p(b) = 2\ p(d) = 1\ A screenshot of a graph Description automatically generated **40**\ 11/18/2024\ **40** **Huffman Decoding:** - **Data Compression** - Read Bit-by-Bit - Output character when code is complete. - To decode the encoded data, need the Huffman tree. - Iterate through the binary encoded data. To find character corresponding to current bits, the following simple steps are needed: 1. Start from the root and do the following until a leaf is found. 2. If the current bit is 0, move to the left node of the tree. 3. If the bit is 1, move to the right node of the tree. 4. If during the traversal, encounter a leaf node, print the character of that leaf node and then again continue the iteration of the encoded data starting from step 1. **Example:**\ A: 0, B: 10, C: 111, D: 1100, E: 1101\ String to be decoded: 11010101100\ \ ![A diagram of a tree Description automatically generated](media/image12.png) **41**\ 11/18/2024\ **41** **Encoding for Huffman Algorithm (contd.):** - **Example 4:**\ String: XXYXZY\ Length = 6 \* 8 = 48 bits\ Probabilities: - P(X) = 1/2 or 0.5 - P(Y) = 1/3 or 0.333 - P(Z) = 1/6 or 0.167\ p(XYZ) = 1.0\ p(X) = 0.5\ p(YZ) = 0.5 **Encoding:**\ Output String: - X = 0 - Y = 11 - Z = 10\ Length = 9 bits A diagram of a mathematical equation Description automatically generated **42**\ 11/18/2024\ **42** **Data Compression** **Huffman Decoding:** - Read Bit-by-Bit - Output character when code is complete. - Essentially, a finite state machine. **Labeling legend:**\ 0/X → input = 0, output = X\ 1/Y → input = 1, output = Y\ 0/Z → input = 0, output = Z ![A diagram of a code Description automatically generated](media/image14.png) **43**\ **Notes:**\ **Data Compression** - Height of the tree indicates number of bits in the longest code - Number of symbols = Number of leaves - Decoding is trivial provided coding table (i.e., statistics) is sent before the data. - **Unique Prefix Property:** no code is a prefix to any other code → Great for decoder, unambiguous. - Huffman code is a minimum-redundancy code, i.e., optimal - Huffman maps fixed length symbols to variable length codes. - Optimal only when symbol frequencies are powers of 2. - Average length of code in an information source is less than η + 1. A diagram of a complex structure Description automatically generated with medium confidence **44**\ 11/18/2024\ **44** **Data Compression** **Huffman Coding of Images:** - To encode images: - Number of symbols = Number of leaves - Divide image up into 8x8 blocks - Each block is a symbol to be coded - Compute Huffman codes for set of blocks - Encode blocks accordingly **45**\ **Data Compression**\ **Adaptive Huffman Coding:** - Motivations -- Leave for now, if time permits (2020):\ a. Previous algorithm requires statistical knowledge which is often not available (e.g., live audio, video).\ b. Availability could be a heavy overhead especially when many tables had to be sent when a non-order model is used (i.e., taking into account the impact of the previous symbol to the probability of the current symbol, e.g., \"qu\" often come together). - Adaptive compression allows dynamic collection and update as the data stream arrives. - Probabilities are not based on prior knowledge but of the actual data received so far. - Especially useful for MM applications (e.g., audio) when the statistics can change rapidly. **46**\ 11/18/2024\ **46** **Data Compression** **Adaptive Huffman Coding:** - Adaptive compression allows dynamic collection and update as the data stream arrives. **ENCODER** Initialize\_model();\ while ((c = getc (input)) != eof) {\ encode (c, output);\ update\_model (c);\ } **DECODER** Initialize\_model();\ while ((c = decode (input)) != eof)\ putc (c, output);\ update\_model (c); ![A white paper with black text Description automatically generated](media/image16.png) **47**\ **Data Compression** - The key is to have both encoder and decoder use exactly the same initialization and update\_model routines. - update\_model does two things: - \(a) increment the count, - \(b) update the Huffman tree. - During the update, the Huffman tree will maintain its sibling property, i.e., the nodes (internal and leaf) are arranged in order of increasing counts (see figure in the next slide). - When swapping is necessary, the farthest node with count N (or W) is swapped with the node whose count has just been increased to N+1 (or W+1). Note: If the node with count N (or W) has a subtree beneath it, then the subtree will go with it. - The Huffman tree could look very different after node swapping, e.g., in the third tree, node A is again swapped and becomes the \#5 node. It is now encoded using only 2 bits. A diagram of a tree Description automatically generated **49**\ **Data Compression**\ **Lempel-Ziv-Welch (LZW) Compression** - A most common adaptive, dictionary-based compression technique. - Lossless data compression. - **Input:** fixed-length symbols - **Output:** variable-length string of code words/symbols that commonly occur together. - LZW encoder and decoder build up the same dictionary dynamically while receiving the data. - A single code for more than one symbol → compression. **50**\ 11/18/2024\ **50** **Data Compression** **Lempel-Ziv-Welch (LZW) Compression (contd.):** - Originally developed by Jacob Ziv and Abraham Lempel in 1977 and 1978, called LZ77 and LZ78. - The LZ77 compression algorithms: text compression. - The LZ78 compression algorithms: binary data. - Modified by Terry Welch in 1984, now commonly known as LZW Compression. - **Applications include:** - Unix compress - GIF for images - V.42bis for modems ![A diagram of a computer Description automatically generated with medium confidence](media/image18.png) **51**\ **Data Compression**\ **Lempel-Ziv-Welch (LZW) Compression (contd.):** - A general compression algorithm. - Fast in both directions. - Identical on both big-endian and little-endian systems. - Also referred to as substitution or dictionary-based encoding. - Patterns of data in input stream are identified and matched with dictionary entries. - If the pattern does not exist in the dictionary, a phrase is created based on the contents of the substring and stored in the dictionary, and code is written to the compressed output stream. **52**\ 11/18/2024\ **52** **Data Compression** **Lempel-Ziv-Welch (LZW) Compression (contd.):** - If there is a reoccurrence of a substring: - The code of the substring stored in the dictionary is written to the output. - The code is smaller than the substring → compression! - **Decoding** is the reverse of encoding. - Decoder reads code from the encoded data stream and if not already in the data dictionary, writes it to the dictionary. - If the code is there, it is translated into the string it represents, and the string is written to the uncompressed output stream. **53**\ **Data Compression**\ **Lempel-Ziv-Welch (LZW) Compression (contd.):** - In LZW compression, the dictionary is not part of the data. - For text files, the first 256 entries of the dictionary are initialized with 8-bit ASCII character set as phrases. - All the substrings are built from these codes. - LZW encoders and decoders begin with dictionaries initialized to these values. **54** **Data Compression** **LZW Compression Algorithm:** lua Copy code s = next input character; while (not EOF) { read a character c; if s + c exists in the dictionary s = s + c; else { output the code for s; add string s + c to the dictionary; s = c; } } output the code for s; A screenshot of a computer code Description automatically generated **55**\ **Data Compression**\ **Example:**\ ![A graph with numbers and symbols Description automatically generated](media/image20.png) **\ ** **Data Compression** **Example string:** \^MED\E\\\B\T **s** **k** **Output** **Index** **Symbols (in dictionary)** ------- --------- ------------ ----------- ----------------------------- NIL \^ \^ \^M \^M M M 256 \^M M E E 257 ME ME D D 258 ED ED D \ \^M 259 D\^ E E 260 \^ME \^ME \^M E \ \^ME \^ME \^M \ E\^ \^ME \^ME E \^M 262 \^MEE \^MEE E\^ ME 263 E\^M E\^M B B 264 MEB MEB \^ME B \ \^ME \^ME T T 265 \^MET \^MET A table of numbers and symbols Description automatically generated **58** **Implementation Notes:** **Data Compression** - Can be made fast: grab fixed number of bits from the input stream. - Makes bit parsing very easy. - Table lookup is automatic. - To improve efficiency and speed, necessary to decide: 1. When to discard phrases from dictionary. 2. How to search data dictionary during encoding. Dr. I. Ahmad\ MM: Compression **59**\ 11/19/2024\ **59** **Data Compression** **Implementation Notes (contd.):** - **TIFF method:** - Pixel data is packed into bytes before submitting to LZW. - LZW source byte might be: - A pixel value - Part of a pixel value - Several pixel values - Always stores compressed codes with most significant bit first. - **GIF** requires each LZW input symbol to be a pixel value. Dr. I. Ahmad\ MM: Compression **60** **Data Compression** **Implementation Notes (contd.):** - For 1- to 8-bit images in GIF, LZW dictionary is initialized accordingly. - It is irrelevant how pixels might have been packed into the storage originally, LZW deals with them as a sequence of symbols. - GIF stores compressed codes with least significant bit first, regardless of the native bit order of the machine architecture. Dr. I. Ahmad\ MM: Compression **61**\ 11/19/2024\ **61** **Data Compression** **Implementation Notes (contd.):** - **TIFF approach does not work well for odd-size pixels, why?** - Pixel packing into bytes creates byte sequences that do not match original pixel sequences, obscuring patterns in pixels. - Works well when pixel boundaries and byte boundaries agree. - **GIF approach works better for odd-size bit depths, but difficult to extend it to more than eight bits per pixel. Why?** - Because the LZW dictionary must become very large to achieve useful compression on large input alphabets. Dr. I. Ahmad\ MM: Compression **62** **Data Compression** **Variations on the LZW Algorithm** - **Variable length index pointers:** usually start at 9 bits and grow upward to 12/13 bits. - Constantly monitor compression process for drop in efficiency. - **Least recently used (LRU)** phrases in dictionary are discarded. - Discard entire dictionary and rebuild it. - The **LZMW** variant builds phrases by concatenating two phrases together. Dr. I. Ahmad\ MM: Compression **63**\ 11/19/2024\ **63** **JPEG Compression** **Data Compression** - **JPEG → Joint Photographic Experts Group** - JPEG is a standards committee with origins in the International Standard Organization (ISO). - In 1982, ISO formed the Photographic Experts Group (PEG) to research methods of transmitting video, still images, and text over ISDN (Integrated Services Digital Network) lines. - **Goal:** - Produce a set of industry standards for transmission of graphics and image data over digital communications networks. Dr. I. Ahmad\ MM: Compression **64** **Why JPEG Compression?** **Data Compression** - **Problems with GIF:** - Limited pixel depth. - LZW compression doesn't work well on typical scanned images. **Why?** Think about noise. - TIFF and BMP formats are capable of handling 24-bit data, but their pre-JPEG versions were based on LZW and/or RLE encoding and would not compress: - Images with lots of noise - Continuous tone images Dr. I. Ahmad\ MM: Compression **65**\ 11/19/2024\ **65** **JPEG Compression (contd.)** **Data Compression** - Works with all types of images, e.g., satellite, medical, etc. - Capable of compressing continuous-tone image data: - Photographs - Video and complex graphics resembling natural subjects - **Lossy compression.** - A suite of compression methods that may be altered to fit needs. Dr. I. Ahmad\ MM: Compression **66** **JPEG Compression (contd.)** **Data Compression** - **Compression choices:** - Highly compressed -- relatively poor-quality images. - Lightly compressed -- good quality images. - Amount of compression depends on data: - For a photographic-quality image, compression ratio from **20:1 - 25:1**. - Higher compression → noticeable difference in quality. Dr. I. Ahmad\ MM: Compression **67** **JPEG Compression (contd.)** **Data Compression** - User can "tune" quality of JPEG encoder through **quality setting** or **Q-factor:** - A factor of 1 → smallest but worst quality images. - A factor of 100 → largest but best quality images. - The optimal Q factor depends on image contents. - **How to find the lowest Q-factor?** - An art, depends on image contents. **Data Compression** **JPEG Compression (contd.)** - JPEG is not an ideal compression solution and doesn\'t fit all compression needs. - Creates \"artifacts\" in images with large areas of a single color. - Effect in images with large blocks of the same color. - Effect in images with \"busier\" compositions. - Software implementation can be slow (fast with hardware support). - Difficult to implement. Dr. I. Ahmad\ MM: Compression **69** **Data Compression** **JPEG Compression (contd.)** - **Uses transform coding**, based on observations: - **Observation 1:** - Large majority of useful image contents change relatively slowly across the image. - i.e., it is unusual for intensity values to alter up and down several times within a small area (for example, within an 8x8 image block). - In terms of spatial frequency domain, generally, **lower spatial frequency components** contain more information than high-frequency components, which often correspond to less useful details and noise. - **Higher spatial frequency** → changes are abrupt. Dr. I. Ahmad\ MM: Compression **70**\ 12/2/2024\ **70** **Data Compression** **JPEG Compression (contd.)** - **Observation 2:** - Psychophysical experiments suggest that humans are more receptive (tolerant) to loss of higher spatial frequency components than loss of lower frequency components. Dr. I. Ahmad\ MM: Compression **71** **Baseline JPEG** **Data Compression** - JPEG specification defines a minimal subset of standards called **baseline JPEG**, which all JPEG-aware applications are required to support. - Uses encoding scheme based on **Discrete Cosine Transform (DCT)**. - **DCT:** A generic name for a class of operations. - DCT-based encoding algorithms are always **lossy** by nature. - DCT algorithms are capable of achieving a high degree of compression with only minimal loss of data. ![A diagram of a process flow Description automatically generated](media/image22.png) **72**\ 12/2/2024\ **72** **Baseline JPEG (contd.)** **Data Compression** - Effective only for compressing continuous-tone images. - Baseline standard requires a minimum of **8-bits per input sample**. - Data of lesser bit depth can be handled by scaling it up to eight bits per sample, but results are quite bad for low-bit-depth source data. - Large jumps between adjacent pixel values. Dr. I. Ahmad\ MM: Compression **73** **Baseline JPEG (contd.)** **Data Compression** - **Typical JPEG compression and decompression process:** **74**\ 12/2/2024\ **74** **Color Transformation:** **Data Compression** - JPEG is capable of encoding images with any type of **color space**, independent of any color-space model such as RGB, CMY, HSV, etc. - Encodes each component of a color model separately. - Best compression ratios when **luminance/chrominance color space** (such as YUV, YIQ, or YCbCr) is used. Dr. I. Ahmad\ MM: Compression **75** **Luminance / Chrominance Color Model** **Data Compression** - Different from other colorimetric models. - A method to capture all the luminance information in one value and put the color (i.e., chrominance) information in the other two values → a linear transformation of RGB image data. - Widely used to encode color for use in television transmission. - Color models based on this approach are: **YIQ**, **YUV**, **YCbCr**, and **YPbPr**. - The **YIQ** model is a simple translation of the RGB model, separating out the information in a way that is more efficient for television broadcasting. Dr. I. Ahmad\ MM: Compression **76**\ 12/2/2024\ **76** **Luminance / Chrominance Color Model (contd.)** **Data Compression** - **Y** is the luminance component, and **I** and **Q** are chrominance. - Most visual information sensitive to the human eye is in the **high-frequency grayscale luminance** component of the color space. - For a pixel represented in RGB, the equivalent **Y**, **I**, and **Q** color components in the **YIQ** color model are given by: \[YIQ\]=\[0.2990.5870.1140.596−0.275−0.3210.212−0.5230.311\]\[RGB\]​YIQ​​=​0.2990.5960.212​0.587−0.275−0.523​0.114−0.3210.311​​​RGB​​ - **Inverse of this matrix** is used to convert from YIQ to RGB. A number with numbers on it Description automatically generated with medium confidence **77** **Luminance / Chrominance Color Model (contd.)** **Data Compression** - The coefficients in the matrix are based on primary colors that are appropriate for the standard **National Television System Committee (NTSC)** RGB phosphor. - **YIQ** is the model used in U.S. commercial television broadcasting. - **YUV** color model is equivalent to the European **PAL** analog video standard, also based on luminance and chrominance. - The **YCbCr** model is closely related to **YUV**, with its chrominance values scaled and shifted. - **YCbCr** is used in **JPEG** and **MPEG** compression. Dr. I. Ahmad\ MM: Compression **78**\ 12/2/2024\ **78** **Data Compression** - **Sample YUV Decomposition:** Y,U,VY,U,V ![A collage of different animals Description automatically generated](media/image24.png) **79** **Data Compression** **Transform Image (contd.)** - Chrominance components (**Cb** & **Cr**) contain high-frequency color information. - The human eye is less sensitive, meaning most of the information can be discarded. - Information is spread across color components in RGB, HSI, and CMY color models → selective discarding of information is difficult. - All color components need to be encoded at the highest quality → poorer compression ratio. - **Transformation of Gray-scale images?** Dr. I. Ahmad\ MM: Compression **80**\ 12/2/2024\ **80** **Data Compression** **Downsample Chrominance Components** - Converting the image file from RGB to a model like **YCbCr** makes it possible to achieve an even greater compression rate by means of **chrominance subsampling**. - JPEG compressor must reduce resolution of the chrominance channels by **downsampling** or averaging together groups of pixels. - Several different choices for sampling ratios of downsampled channels. - **Luminance channel** is always left at full resolution. - Typically, both chrominance channels are downsampled **2:1 horizontally and 2:1 vertically**. **81** **Data Compression** **Downsample Chrominance Components (contd.)** - Three commonly used subsampling rates are: - **4:1:1**, **4:2:0**, and **4:2:2**. - Derived from television customs. - Hardly any gain in perceived quality. - Conventional notation for luminance/chrominance subsampling is in the form **a : b : c**. - **a:** Luma samples - **b & c:** Chroma samples (odd line & even line respectively) - Chrominance subsampling is **not** a JPEG-required step. A screenshot of a test Description automatically generated **82** **Subsampling:** **Data Compression**![A screenshot of a computer screen Description automatically generated](media/image26.png) **83** **Data Compression** **Apply a Discrete Cosine Transform (DCT)** - From now on, each color component is processed independently. - Temporary padding may be used to make an image an exact multiple of 8 pixels in width or height. - Image data is divided up into **8x8 blocks of pixels**. **Apply a Discrete Cosine Transform (contd.)**A diagram of a function Description automatically generated with medium confidence **85** **Data Compression** **Apply a Discrete Cosine Transform (contd.)** - DCT converts spatial image representation into a frequency domain representation c=(Fu,Fv)c=(Fu​,Fv​),\ where **c** is the coefficient and **F\_u** and **F\_v** are the respective spatial frequencies for each direction using the transformation: F(u,v)=∑x=07∑y=07CuCvf(x,y)cos⁡((2x+1)uπ16)cos⁡((2y+1)vπ16)F(u,v)=x=0∑7​y=0∑7​Cu​Cv​f(x,y)cos(16(2x+1)uπ​)cos(16(2y+1)vπ​) where: Cu,Cv={12for u,v=01otherwiseCu​,Cv​={2​1​1​for u,v=0otherwise​ - Output is a set of 64 values known as **DCT coefficients**. - These represent the frequency and not the amplitude of the signal at the sampled position (x,y)(x,y). - The coefficient corresponding to the vector **(0, 0)** is called the **DC coefficient** and represents the average brightness value in the block.![A screenshot of a math equation Description automatically generated](media/image29.png) **86**\ A table of numbers and a number of images Description automatically generated with medium confidence\ **86** **Data Compression** **Discrete Cosine Transform (contd.)** - Strong correlation between the **DC coefficients** of adjacent **8x8 blocks**. - Remaining coefficients are called **AC coefficients**. - **AC coefficients** represent the strength of rapid changes across the width or height of the block. - The highest AC term represents the strength of a cosine wave alternating from maximum to minimum at adjacent pixels. - **Top-left corner** has low-frequency components (likely large in value). - **Bottom-right corner** has high-frequency components (likely small in value). Dr. I. Ahmad\ MM: Compression **88**\ 12/2/2024\ **88** **Data Compression** **Discrete Cosine Transform (contd.)** - **DCT calculation** is fairly complex → **costliest step**. - Why do we do it? - To separate high- and low-frequency information in the image → discard high-frequency data without losing low-frequency information. - **DCT** is a **lossless step**, except for the round-off error. ![A screenshot of a computer error Description automatically generated](media/image31.png) **Data Compression - Quantization** - DCT output (64 DCT coefficients) is uniformly quantized in conjunction with a 64-element quantization table. - **Goal:**\ To throw out bits. - Example: - 101101 = 45 (6 bits) - Truncate to 4 bits: 1011 = 11 - Reconstruct with 6 bits: 44 (101100 in binary) - Truncate to 3 bits: 101 = 5 - Reconstruct with 6 bits: 40 (101000 in binary) Dr. I. Ahmad\ MM: Compression **91**\ 12/2/2024\ **Quantization (contd.)** - **Table** is specified by the application (or user) as encoder input. - Each element is an integer (1 -- 255) and specifies step size of the quantifier for its corresponding DCT coefficient. - The coefficients for high-frequency components are typically small, so they often round down to 0, effectively discarding them. - The larger the quantization coefficient, the more data is lost since the actual DCT value is represented less accurately. **A sample Quantization Table** **92**\ 12/2/2024\ **Quantization (contd.)** - **Why quantization?** - To discard the appropriate amount of information with the precision necessary to achieve the desired image quality. - An inherently **lossy** step. - **Quantization formula:**FQ(u,v)=Round(F(u,v))→integer value Q(u,v)FQ(u,v)=Round(F(u,v))→integer value Q(u,v) - A black symbols on a white background Description automatically generated - Each element of the DCT output block has a quantization coefficient. - Higher-order terms have larger quantization coefficients than lower-order terms. - Separate quantization tables for luminance and chrominance data. - More quantization of chrominance data than luminance data. **93**\ 12/2/2024\ **Quantization (contd.)** - **Step controlled by \"Q-factor\" setting** in most JPEG compressors: - 1 = low quality/highest compression - 100 = high quality/lowest compression - Values smaller than the threshold are set to zero. - Compressor starts with a built-in table, which is appropriate for a medium-quality setting. - Change the value of each table entry in inverse proportion to requested quality. Dr. I. Ahmad\ MM: Compression **94**\ 12/2/2024\ **Quantization (contd.)** - Complete quantization tables used are recorded in the compressed file. **Why?** - To let the decompressor know how to (approximately) reconstruct DCT coefficients. - The selection of an appropriate quantization table is often considered a \"black art.\" - Generally, compressors start with a sample table developed by the **ISO JPEG committee**. - **At what Q factor does JPEG become lossless?** - **Never**. DCT-based encoders are always lossy. **Quantization (contd.)**\ **How does it work?** ![A screenshot of a graph Description automatically generated](media/image33.png) **96**\ 12/2/2024\ **Encoding of Coefficients** - Coefficients contain a significant amount of redundant data. - **Arrange values in a zigzag order** and do run-length (or arithmetic) encoding. **Why Zigzag order?** - Frequencies generally increase from left to right and from top to bottom. - Thus, if we want to order the coefficients in order of increasing frequency, the zigzag order is an effective method. Dr. I. Ahmad\ MM: Compression **97**\ 12/2/2024\ **Encoding of Coefficients (contd.)** A diagram of a graph Description automatically generated - **Huffman compression** can losslessly remove redundancies, reducing JPEG data size. - **Optional extension** allows arithmetic encoding rather than Huffman to provide a greater compression ratio. - The data stream is then ready for transmission or encapsulation in an image file format. Dr. I. Ahmad\ MM: Compression **98**\ 12/2/2024\ **JPEG Extensions** - Number of extensions, including: - Progressive image buildup - Improved compression ratios using arithmetic encoding - Lossless compression scheme - These features are: - Beyond the needs of most JPEG implementations - Not required to be supported in JPEG standards. Dr. I. Ahmad\ MM: Compression **99**\ 12/2/2024\ ![A screenshot of a computer Description automatically generated](media/image35.png) **100**\ 12/2/2024\ **JPEG Extensions (contd.)** - **Conventional compression method:** - Displaying scan lines as they are decoded, or - Displaying the image after the complete reception of data.A butterfly on a flower Description automatically generated **101**\ 12/2/2024\ **JPEG Extensions (contd.)**\ **Progressive Image Buildup:** - An extension for use in applications receiving JPEG data streams for **on-the-fly** display. - **Baseline JPEG**: Decoding only after all data is received. - Some applications require the image to be displayed after only some of the data is received. - **Progressive buildup** allows an image to be sent in layers rather than scan lines. - Instead of transmitting each bit-plane or color channel in sequence, a succession of images built from approximations of the original image is sent. - First, a low-accuracy representation of the entire image is sent. - Then, the image is gradually refined by increasing the effective quality factor. Dr. I. Ahmad\ MM: Compression **102**\ 12/2/2024\ **JPEG Extensions (contd.)** - **Limitation:** - Each scan takes essentially a full JPEG decompression cycle to display. - With typical data transmission rates, a very fast JPEG decoder is needed. - A related JPEG extension provides for **hierarchical storage** of the same image at multiple resolutions. - Image stored at resolutions like 250x250, 500x500, 1000x1000, and 2000x2000 pixels to support display on low-resolution screens, medium-resolution laser printers, and high-resolution image setters. - Higher-resolution images are stored as differences. **Digital Video**\ Dr. I. Ahmad\ MM: Video ![A table with numbers and symbols Description automatically generated](media/image37.png) **3**\ 12/2/2024\ **Data Compression**\ **Why do we need data compression?** - **Storage Requirements** - Stored data may need manipulation and, hence, will have: - **Memory Requirements** - Intense processor activities - Communication speed - Possible to explore data redundancy → Compression Dr. I. Ahmad\ MM: Compression **4**\ **Video and Animation**\ **How Big Is a Video?** - **Fact:** Digital images are generally bigger than the current television size. - **Fact:** TV image resolutions are smaller than current workstations. - **Fact:** For digital video, a new frame is required every 1/25th second for PAL and every 1/30th second for NTSC. - To calculate the amount of disc space required for full-motion video, assume: - 24 bits per pixel in the digital video - 30 frames per second Dr. I. Ahmad\ MM: Video **5**\ 12/2/2024\ **Video and Animation**\ **How Big Is a Video? (contd.)** **Time** **3840x2160** **1920x1080** **1280x720** **640x480** **320x240** **160x120** ---------------- --------------- --------------- -------------- ------------- ------------- ------------- **1 sec** 711 GB 0.17 GB 0.08 GB 27 MB 6.75 MB 1.68 MB **1 min** 42.714 TB 10.43 GB 4.9 GB 1.6 GB 400 MB 100 MB **1 hour** 256.28 PB 626 GB 299 GB 97 GB 24 GB 6 GB **1000 hours** \-\-- 611 TB 292 TB 97 TB 24 TB 6 TB **Amount of data for full-motion digital video (no compression):** - A 1-hour Ultra HD (4K) video would require **256.28 PB** (Peta) of disc space. - Binary: - K: 1024 (2¹⁰), - M: 1024² (2²⁰), - G: 1024³ (2³⁰), - T: 1024⁴ (2⁴⁰), - P (Peta): 1024⁵ (2⁵⁰), - E (Exa): 1024⁶ (2⁶⁰) A table with numbers and symbols Description automatically generated with medium confidence **6**\ **TV Resolution Evolution** - **LDTV (Low Definition TV):** - 240i60, 288i50 - **SDTV (Standard Definition TV):** - 480i60, 480p30, 576i50, 576p25 - **EDTV (Enhanced Definition TV):** - 480p60, 576p50, 720i50/60, 720p24/25/30 - **HDTV (High Definition TV):** - 720p50/60, 1080p24/25/30, 1080i50, 1080i60 - **4K Ultra HDTV (High Definition):** - 3840 x 2160p **7**\ **Comparison of Frame Size**\ ![A blue and black background with red and black squares Description automatically generated](media/image39.png) **8**\ **Video and Animation**\ **Video Bit Rate Calculations** - A math equations and numbers Description automatically generated with medium confidence **9**\ 12/2/2024\ **Video and Animation**\ **Video Compression (contd.)**\ **How to reduce video data?** - It is possible to dramatically reduce the amount of data by: - Reducing the perceived frame rate of video from 30 fps by limiting the number of frames through a bandwidth limitation mechanism. - Example: In multicast conferences, bandwidth is 15 - 64 Kbps. - **Frame rate vs. quality of full-motion video:** - Adequate for many situations, e.g., multimedia conferencing. - For a scene with a human face, only 64 pixels square and 10 fps might suffice for a meaningful image. - For 64 x 64 pixels, 3 bytes/pixel, 10 fps: - Needs 120 KB/s or just under 1 Mbps. Dr. I. Ahmad\ MM: Video **10**\ **Video and Animation**\ **Video Compression (contd.)** - If a frame contains a lot of image that is the same: - Encode with fewer bits without losing any information. - Use features of natural scenes to reduce the number of bits. - Nature is very self-similar. - If some details are left out, the human eye is often fooled. - Many encoding methods to compress video data. Dr. I. Ahmad\ MM: Video **11**\ 12/2/2024\ **Video and Animation**\ **Video Compression (contd.)** - The majority of video compression methods involve **transform-coding** schemes. - Transforms physically reduce the size of video data. - Usually, 10 - 25% or more of the original video data is discarded. - This depends on the content and the acceptable quality. - Usually, a transform is performed on individual video frames. - The transform discards data not used by the human eye. - The transformed data is then compressed further to reduce its size. - Each frame may be compressed using Huffman or arithmetic encoding algorithms or even more complex compression schemes. **12**\ **Video and Animation - Codecs**\ **Compression and Decompression** - Compression and decompression can be done in **hardware** or **software**. - **Hardware Codecs:** - Possible to capture video signals, store them on a computer, and play them back in full-motion video to an external monitor (e.g., TV). - Can also play back on computer monitors, but most hardware codecs cannot provide full-screen width. - Cannot assume the availability of hardware codecs. Dr. I. Ahmad\ MM: Video **13**\ **Video and Animation - Codecs** - **Software Codecs:** - Implemented in software alone. - Slower than hardware codecs. - Can be both **symmetric** and **asymmetric key frames**: - Complete frames that start different frame series. Dr. I. Ahmad\ MM: Video **14**\ **Video and Animation**\ **Selection of CODEC**\ Three primary criteria to consider when selecting a codec: 1. Compression level 2. Quality of compressed video 3. Compression/decompression speed **Trade-offs** within criteria. Dr. I. Ahmad\ MM: Video **15**\ **Video and Animation**\ **Compression/Decompression Standards** - **Basic Principles & Terminology of Standards for Moving Pictures:** - The terms \"frame\" and \"picture\" are used interchangeably. - The **JPEG algorithm** can be applied to individual frames, known as **Moving JPEG (MJPEG)**. - Typical compression ratios range from **10:1 to 20:1**---not good enough to produce effective compression ratios. **Video and Animation** **In video, two types of redundancies are present:** 1. **Spatial redundancy**: Present in each frame. 2. **Temporal redundancy**: Between a set of consecutive frames. A small portion of a frame is involved with any motion.\ **Examples:** - Movement of lips or eyes - Movement of a person or vehicle across the screen. It is possible to make a prediction based on the contents of many frames. Dr. I. Ahmad\ MM: Video **17**\ **Video and Animation**\ **Principles and Terminology of Standards** For **prediction**: - A set of selected frames is sent as individually compressed frames. - For the remaining frames, the difference is sent. The difference is between the actual frame contents and the predicted frame contents. - The accuracy of the prediction operation is determined by how well any movement between successive frames is estimated. This process is known as **motion estimation**. - Additional information is needed to compensate for small differences between predicted and actual positions of the moving segments. This is known as **motion compensation**. Dr. I. Ahmad\ MM: Video **18**\ 12/3/2024\ **Video and Animation**\ **Frame Types** Two basic types of compressed frames: - **I-frames (Intracoded frames):** - Encoded independently from other frames (to reduce spatial redundancy). - **Intercoded frames (Predicted frames):** - Predicted and interpolated to reduce temporal redundancy. - **Types of predicted frames:** 1. **P-frames** (Predictive frames) 2. **B-frames** (Bidirectional frames) **19**\ **Video and Animation**\ **I-frames** - **Independent of other frames.** - Encoded independently for **Y**, **Cb**, and **Cr** matrices. - **Not much compression** in each frame. - **Why must an I-frame be present at regular intervals?** - If an I-frame is corrupted during transmission, a complete subscene or scene may be lost. Dr. I. Ahmad\ MM: Video **20**\ **Video and Animation**\ **I-frames (contd.)** - The number of frames between two I-frames is called the **Group of Pictures (GOP)**. - **GOP is denoted by N.** - A typical value for N: 3--12. - Decoding of I-frames can start as soon as it is received to recreate the original frame. Dr. I. Ahmad\ MM: Video **21**\ **Video and Animation**\ **P-frames** - Created by predicting the difference between the current and the closest preceding I- or P-frame. - Involves encoding using a combination of **motion estimation** and **motion compensation**. - Provide **significantly higher levels of compression**. - **Why limit the number of P-frames between successive I-frames?** - To avoid amplification of errors in motion estimation. - The number of frames between a P-frame and the immediately preceding I- or P-frame is called the **prediction span**. Dr. I. Ahmad\ MM: Video **22**\ 12/3/2024\ **Video and Animation**\ **P-frames (contd.)** - **Prediction span** is denoted by **M**. - Typical values range from **1 to 3**. - A typical sequence of I- and P-frames only. ![A diagram of a graph Description automatically generated with medium confidence](media/image41.png) **23**\ **Video and Animation**\ **P-frame (contd.)** - Contents for P-frames are encoded by considering the contents of the current (un-encoded) frame relative to the contents of the immediately preceding (un-encoded) frame. - To derive the decoded frame contents: - Received information is first decoded. - Resulting information is then used together with the decoded contents of the preceding I- or P-frame. Dr. I. Ahmad\ MM: Video **24**\ 12/3/2024\ **Video and Animation**\ **B-frames** - **MPEG encoder** has the option of using forward/backward interpolated prediction. - These frames are commonly referred to as **bi-directional interpolated prediction frames**, or **B-frames**. - B-frames are coded based on: - **Forward prediction** from a previous I or P frame. - **Backward prediction** from a succeeding I or P frame. - Since I and P frames are used to predict other P and B frames, they are called **Reference Frames**. - This technique is called **Motion Compensation**. - **Advantage:** Increased compression efficiency. Dr. I. Ahmad\ MM: Video **25**\ **Video and Animation**\ **B-frames (contd.)** **Disadvantages:** - Frame reconstruction memory buffers within the encoder and decoder must be doubled in size to accommodate the two anchor frames. - A delay throughout the system as the frames are delivered out of order. - **Computational complexity:** - Motion estimation involves comparing small segments of two consecutive frames for differences. - If there's a difference, a search is made to determine the segment of movement. - The search region is limited to just a few neighboring segments. **B-frames** help accommodate the amount of motion between search segments and fast-moving objects. - Contents are predicted using search regions in both past and future frames. Dr. I. Ahmad\ MM: Video **26**\ 12/3/2024\ **Video and Animation**\ **Motion Estimation / Extraction** - **B-frames**: Motion estimation in MPEG operates on **Macroblocks**. - A **macroblock** is a 16x16 pixel range in a frame. - Two primary types of motion estimation: - **Forward prediction**: Predicts how a macroblock from the previous reference frame moves forward into the current frame. - **Backward prediction**: Predicts how a macroblock from the next reference frame moves back into the current frame. **Forward prediction** in red (left-to-right) and **Backward prediction** in green (right-to-left).A diagram of a diagram Description automatically generated **27**\ **Video and Animation**\ **B-frames**\ **Motion Estimation / Extraction (contd.)** - **Motion estimation** operates as follows: - Compare a macroblock of the current frame against all 16x16 regions of the frame you are predicting from. - Select the 16x16 region with the least mean-squared error from the current macroblock and encode a **motion vector**, which specifies the 16x16 region you are predicting from and the error values for each pixel in the macroblock. - This is done only for the combined **Y**, **U**, and **V** values. - **Subsampling** and separation of the Y, U, and V bands come later. Dr. I. Ahmad\ MM: Video **28 B frames continued**\ ![A diagram of a graph Description automatically generated](media/image43.png) **29**\ **Video and Animation**\ **B-frames (contd.)** - To minimize the time required to decode each B-frame, the order of encoding (and transmission) of encoded frames is changed. **Example:**\ Let the un-encoded frame sequence for 10 frames be:\ **IBBPBBPBBI\... (0123456789)** Then, the reordered sequence would be:\ **IPBBPBBIBB\... (0312645978)** Finally,\ **IBBPBBPBBI\... (0123456789)** Dr. I. Ahmad\ MM: Video **30**\ 12/3/2024\ **Video and Animation**\ **D-frames** - Called **DC-coded frames**. - Used only in specific applications (e.g., **video on demand**). - Allow the user to **rewind** or **fast-forward** through the movie → Requires decompression at much higher speeds. - Encoded video stream contains D-frames, inserted at regular intervals throughout the stream. - **Highly compressed frames**, ignored in decoding. Dr. I. Ahmad\ MM: Video **31**\ **Video and Animation**\ **D-frames (contd.)** - **DC coefficients** associated with each 8x8 block of pixels in the **DCT transformation** are the mean of all the values in the related block. - The encoded **DC coefficients** of each block of pixels in the periodically inserted D-frames provide a low-resolution sequence of frames. - These frames are decoded at higher speeds, expected in **rewind** and **fast-forward**.

Use Quizgecko on...
Browser
Browser