Digital Image Processing and Image Compression PDF - University of Technology

Digital Image Processing \ Image Compression MSc Study / General Branch University of Technology Computer Science Department MSc Study General Branch Digital Image Processing – Image Compr...

Digital Image Processing \ Image Compression MSc Study / General Branch University of Technology Computer Science Department MSc Study General Branch Digital Image Processing – Image Compression ‫ ﻋﺒﺪ اﻷﻣﯿﺮ ﻋﺒﺪ ﷲ ﻛﺮﯾﻢ‬.‫ د‬.‫ أ‬:‫أﺳﺘﺎذ اﻟﻤﺎدة‬ Digital Image Processing \ Image Compression MSc Study / General Branch Digital Image Processing – Image Compression Preview Lossy / lossless compression: Certain compression methods are lossy. They achieve better compression by losing some information. When the compressed stream is decompressed, the result is not identical to the original data stream. Such a method makes sense especially in compressing images, movies, or sounds. If the loss of data is small, we may not be able to tell the difference. In contrast, text files, especially files containing computer programs may become worthless if even one bit gets modified. Such files should be compressed only by a lossless compression method, also special purpose images like medical images, forensic images, NASA images are compressed using lossless compression methods. 1. RLE Image Compression Images redundancy is illustrated by the fact that in a nonrandom image, adjacent pixels tend to have similar colors. Compressing an image using RLE is based on the observation that if we select a pixel in the image at random, there is a good chance that its neighbors will have the same color. The compressor thus scans the bitmap row by row, looking for runs of pixels of the same color. Run Length Encoding (RLE) is image compression methods that work by counting the number of adjacent pixels with the same gray level value. This count called the run length, is then coded and stored. Here we will explore several methods of run-length coding including basic method that are used primarily for binary images and extended version for gray scaled images. 1.1 Basic RLE Basic RLE is used primarily for binary images but can be work with complex images that have been preprocessed by thresholding to reduce the number of gray level to two. To implement basic RLE, we often use horizontal RLE, counting along the rows. The compressor thus scans the bitmap row by row, looking for runs of pixels of the same color. In basic horizontal RLE the number of bits required for the coding depends on the number of pixels in a row. If the row has 2n pixels, then the Digital Image Processing \ Image Compression MSc Study / General Branch required number of bits is n (at minimum) , so that a run that is the length of the entire row can be coded. Example For a 256 × 256 image. Requires 8 bit to encode the count since 28 = 256. ‫ ﻛﺤﺪ ادﻧﻰ ﻟﺘﺮﻣﯿﺰ اﻟﻌﺪاد وذﻟﻚ ﻓﻲ ﺣﺎل ﻛﺎﻧﺖ ﻛﻞ ال‬8 bit ‫اي ان اﻟﺼﻒ اﻟﻮاﺣﺪ ﯾﺤﺘﺎج اﻟﻰ‬.( ‫ ﻣﺜﻼ‬1 ) ‫ ﻓﻲ ذﻟﻚ اﻟﺼﻒ ﻣﻦ ﻧﻮع واﺣﺪ‬Pixels For a 512 × 512 image. Requires 9 bit to encode the count since 29 = 512. The compressor assumes that the bitmap starts with white pixels. If this is not true, then the bitmap starts with zero white pixels, and the output stream should start with 0. The resolution of the bitmap should also be saved at the start of the output stream. The size of the compressed stream depends on the complexity of the image. The more detail, the worse the compression. Example What would be the compressed file in the case of the following 6 × 8 bitmap image? Sol 0,1,3,1,3 1,1,3,1,2 2,1,3,1,1 3,1,3,1 2,2,2,2 6,1,1 Digital Image Processing \ Image Compression MSc Study / General Branch Note that in the first row, the first RLE number is 0, which means that the bitmap starts with zero white pixels, since we are using the conversion that the bitmap always starts with white pixels. 1.2 Gray-Level RLE RLE can also be used to compress grayscale images. Each run of pixels of the same intensity (gray level) is encoded as a pair (run length, pixel value). The run length usually occupies one byte, allowing for runs of up to 255 pixels. The pixel value occupies several bits, depending on the number of gray levels (typically between 4 and 8 bits). Example: An 8-bit deep grayscale bitmap that starts with 12,12,12,12,12,12,12,12,12,35,76,112,67,87,87,87,5,5,5,5,5,5,1,... Is compressed into: 9,12,35,76,112,67,3,87,6,5,1,... , where the Red numbers indicate counts. The problem is to distinguish between a byte containing a grayscale value (such as 12) and one containing a count (such as 9). Here are some solutions (although not the only possible ones): 1. If the image is limited to just 128 grayscales, we can devote one bit in each byte (the most significant bit) to indicate whether the byte contains a grayscale value or a count. 2. If the number of grayscales is 256, it can be reduced to 255 with one value reserved as a flag to precede every byte with a count. If the flag is, say, 255, then the sequence above becomes: 255,9,12,35,76,112,67,255,3,87,255,6,5,1,.... 3. Again, an extra bit (a flag) is devoted to each byte to indicate whether the byte contains a grayscale value or a count. This time, however, these extra bits are accumulated in groups of 8, and each group is written on the output stream preceding (or following) the 8 bytes it "belongs to". Example: the sequence 9,12,35,76,112,67,3,87,6,5,1,... becomes 10000010,9,12,35,76,112,67,3,87,100.....,6,5,1,.... The total size of the extra bytes is, of course, 1/8 the size of the output stream (they contain one bit for each byte of the output stream), so they increase the size of that stream by 12.5%. Digital Image Processing \ Image Compression MSc Study / General Branch Two more points should be mentioned: 1. In color images it is common to have each pixel stored as three bytes, representing the intensities of the Red, Green, and Blue components of the pixel. In such a case, runs of each color should be encoded separately. Thus the pixels (171,85,34), (172,85,35), (172,84,34), and (173,83,33) should be separated into the three sequences (171,172,172,173,... ), (85,85,84,83,... ), and (34,35,34,33,... ). Each sequence should be run-length encoded separately. This means that any method for compressing grayscale images can be applied to color images as well. 2. It is preferable to encode each row of the bitmap individually. Thus if a row ends with four pixels of intensity 87 and the following row starts with 9 such pixels, it is better to write... ,4,87,9,87,... on the output stream rather than... , 13,87,.... It is even better to write the sequence... ,4,87, eol, 9, 87,... , where "eol" is a special end-of-line code. RLE disadvantage The following two points can be considers as the Disadvantage of image RLE: 1. When the image is modified, the run lengths normally have to be completely redone. 2. The RLE output can sometimes be bigger than pixel-by-pixel storage ( i.e., an uncompressed image )..‫ أﻋﻼه‬۲ ‫ ﻣﺘﻰ ﺗﺤﺪث اﻟﺤﺎﻟﺔ ﻓﻲ اﻟﻨﻘﻄﺔ‬.‫س‬ Digital Image Processing \ Image Compression MSc Study / General Branch 2. JPEG Image Compression The name JPEG is an acronym that stands for Joint Photographic Experts Group. JPEG is a sophisticated lossy/lossless compression method for color or grayscale still images (not videos). It does not handle bi-level (black and white) images very well. It also works best on continuous-tone images, where adjacent pixels have similar colors. An important feature of JPEG is its use of many parameters, allowing the user to adjust the amount of the data lost (and thus also the compression ratio) over a very wide range. Often, the eye cannot see any image degradation even at compression factors of 10 or 20. There are two operating modes, lossy (also called baseline) and lossless (which typically produces compression ratios of around 0.5). Most implementations support just the lossy mode. JPEG has been designed as a compression method for continuous-tone images. The main goals of JPEG compression are the following: 1. High compression ratios, especially in cases where image quality is judged as very good to excellent. 2. The use of many parameters, allowing knowledgeable users to experiment and achieve the desired compression/quality trade-off. 3. Obtaining good results with any kind of continuous-tone image, regardless of image dimensions, color spaces, pixel aspect ratios, or other image features. 4. A sophisticated, but not too complex compression method, allowing software and hardware implementations on many platforms. 5. Several modes of operation: (a) A sequential mode (raster scanned) where each image component (color) is compressed in a single left-to- right, top-to-bottom scan; (b) A progressive mode where the image is compressed in multiple blocks (known as “scans”) to be viewed from coarse to fine detail; (c) A lossless mode that is important in cases where the user decides that no pixels should be lost (the trade-off is low compression ratio compared to the lossy modes); and (d) A hierarchical mode (pyramid) where the image is compressed at multiple resolutions allowing lower-resolution blocks to be viewed without first having to decompress the following higher-resolution blocks. Digital Image Processing \ Image Compression MSc Study / General Branch The main JPEG compression steps are outlined here, and each step is then described in detail later. 1. Color images are transformed from RGB into a luminance/chrominance color space. The eye is sensitive to small changes in luminance but not in chrominance, so the chrominance part can later lose much data, and thus be highly compressed, without visually impairing the overall image quality much. This step is optional but important because the remainder of the algorithm works on each color component separately. Without transforming the color space, none of the three color components will tolerate much loss, leading to worse compression. 2. Color images are down sampled by creating low-resolution pixels from the original ones (this step is used only when hierarchical compression is selected; it is always skipped for grayscale images). The down sampling is not done for the luminance component. Since the luminance component is not touched, there is no noticeable loss of image quality. grayscale images don't go through this step. 3. The pixels of each color component are organized in groups of 8×8 pixels called data units, and each data unit is compressed separately. If the number of image rows or columns is not a multiple of 8, the bottom row and the rightmost column are duplicated as many times as necessary. 4. The discrete cosine transform (DCT) is then applied to each data unit to create an 8×8 map of frequency components. They represent the average pixel value and successive higher-frequency changes within the group. This prepares the image data for the crucial step of losing information. Since DCT involves the transcendental function cosine, it must involve some loss of information due to the limited precision of computer arithmetic. This means that even without the main lossy step (step 5 below), there will be some loss of image quality, but it is normally small. 5. Each of the 64 frequency components in a data unit is divided by a separate number called its quantization coefficient (QC), and then rounded to an integer. This is where information is irretrievably lost. Large QCs cause more loss, so the high frequency components typically have larger QCs. Each of the 64 QCs is a JPEG parameter and can, in principle, be specified by the user. In practice, most JPEG implementations use the QC tables recommended by the JPEG standard for the luminance and chrominance image components. 6. The 64 quantized frequency coefficients (which are now integers) of each data unit are encoded using a combination of RLE and Huffman coding. Digital Image Processing \ Image Compression MSc Study / General Branch 7. The last step adds headers and all the required JPEG parameters, and outputs the result. The JPEG decoder performs the reverse steps. (Thus, JPEG is a symmetric compression method.) 2.1. Luminance Luminance is proportional to the power of the light source. It is similar to intensity, but the spectral composition of luminance is related to the brightness sensitivity of human vision. The eye is very sensitive to small changes in luminance, which is why it is useful to have color spaces that use Y as one of their three parameters. A simple way to do this is to subtract Y from the Blue and Red components of RGB, and use the three components Y, B − Y, and R − Y as a new color space. The last two components are called chroma. They represent color in terms of the presence or absence of blue (Cb) and red (Cr) for a given luminance intensity. The YCbCr color space was developed as part of Recommendation ITU- R BT.601 during the development of a worldwide digital component video standard. Y is defined to have a range of 16 to 235; Cb and Cr are defined to have a range of 16 to 240, with 128 equal to zero. Conversions between RGB with a 16–235 range and YCbCr are linear and thus simple. Transforming RGB to YCbCr is done by (note the small weight of blue): Y = (77/256)R + (150/256)G + (29/256)B, Cb = −(44/256)R − (87/256)G + (131/256)B + 128, Cr = (131/256)R − (110/256)G − (21/256)B + 128; while the opposite transformation is R = Y+1.371(Cr − 128), G = Y− 0.698(Cr − 128) − 0.336(Cb − 128), B = Y+1.732(Cb − 128). When performing YCbCr to RGB conversion, the resulting RGB values have a nominal range of 16–235, with possible occasional values in 0–15 and 236–255. Digital Image Processing \ Image Compression MSc Study / General Branch 2.2. DCT The JPEG standard calls for applying the DCT not to the entire image but to data units (blocks) of 8×8 pixels. The reasons for this are: (1) Applying DCT to large blocks involves many arithmetic operations and is therefore slow. Applying DCT to small data units is faster. (2) Experience shows that, in a continuous-tone image, correlations between pixels are short range. A pixel in such an image has a value (color component or shade of gray) that's close to those of its near neighbors, but has nothing to do with the values of far neighbors. The JPEG DCT is therefore executed by Equation below, duplicated here for n=8: The DCT is JPEG's key to lossy compression. The unimportant image information is reduced or removed by quantizing the 64 DCT coefficients, especially the ones located toward the lower-right. If the pixels of the image are correlated, quantization does not degrade the image quality much. For best results, each of the 64 coefficients is quantized by dividing it by a different quantization coefficient (QC). All 64 QCs are parameters that can be controlled, in principle, by the user. 2.3. Quantization After each 8×8 data unit of DCT coefficients Gij is computed, it is quantized. This is the step where information is lost (except for some unavoidable loss because of finite precision of DCT calculations). Each number in the DCT coefficients matrix is divided by the corresponding number from the particular “quantization table” used, and the result is rounded to the nearest integer. As has already been mentioned, three such tables are needed, for the three color components. The JPEG standard allows for up to four tables, and the user can select any of the four for quantizing each color component. The 64 numbers that constitute each quantization table are all JPEG parameters. In principle, they can all be specified and fine-tuned by the user for maximum compression. In practice, few users have the patience or expertise to experiment with so many parameters. Digital Image Processing \ Image Compression MSc Study / General Branch Table 1. Recommended Quantization Tables. Table 2. Each element of F(u,v) is divided by the corresponding element of Q(u,v) and then rounded off to the nearest integer to get the F (u,v) matrix. Digital Image Processing \ Image Compression MSc Study / General Branch If the quantization is done correctly, very few nonzero numbers will be left in the DCT coefficients matrix, and they will typically be concentrated in the upper-left region. These numbers are the output of JPEG, but they are further compressed before being written on the output stream. In the JPEG literature this compression is called “entropy coding”. Three techniques are used by entropy coding to compress the 8 × 8 matrix of integers: 1. The 64 numbers are collected by scanning the matrix in zigzags (Figure 1). This produces a string of 64 numbers that starts with some nonzeros and typically ends with many consecutive zeros. Only the nonzero numbers are output (after further compressing them) and are followed by a special end-of block (EOB) code. This way there is no need to output the trailing zeros (we can say that the EOB is the run-length encoding of all the trailing zeros). 2. The nonzero numbers are compressed using Huffman coding. 3. The first of those numbers (the DC coefficient) is treated differently from the others (the AC coefficients). Figure 1. Scanning the 8 × 8 matrix in zigzags then map the matrix to a 1 × 64 vector. Digital Image Processing \ Image Compression MSc Study / General Branch 2.4. Coding We first discuss point 3 above. Each 8×8 matrix of quantized DCT coefficients contains one DC coefficient [at position (0, 0), the top left corner] and 63 AC coefficients. The DC coefficient is a measure of the average value of the 64 original pixels, constituting the data unit (i.e. 8×8 Block). Experience shows that in a continuous-tone image, adjacent data units of pixels are normally correlated in the sense that the average values of the pixels in adjacent data units are close. We already know that the DC coefficient of a data unit is a multiple of the average of the 64 pixels constituting the unit. This implies that the DC coefficients of adjacent data units don’t differ much. JPEG outputs the first one (encoded), followed by differences (also encoded) of the DC coefficients of consecutive data units. Example: If the first three 8×8 data units of an image have quantized DC coefficients of 1118, 1114, and 1119, then the JPEG output for the first data unit is 1118 (Huffman encoded, see below) followed by the 63 (encoded) AC coefficients of that data unit. The output for the second data unit will be 1114 − 1118 = −4 (also Huffman encoded), followed by the 63 (encoded) AC coefficients of that data unit, and the output for the third data unit will be 1119 − 1114 = 5 (also Huffman encoded), again followed by the 63 (encoded) AC coefficients of that data unit. This way of handling the DC coefficients is worth the extra trouble, because the differences are small. Coding the DC differences is done with Table 4.63, so first here are a few words about this table. Each row has a row number (on the left), the unary code for the row (on the right), and several columns in between. Each row contains greater numbers (and also more numbers) than its predecessor but not the numbers contained in previous rows. The first DC coefficient to be encoded in our example is 1118. It resides in row 11 column 930 of the table (column numbering starts at zero), so it is encoded as 111111111110|01110100010 (the unary code for row 11, followed by the R-bit length (11-bit) binary value of 930). The second DC difference is −4. It resides in row 3 column 3 of Table 4.63, so it is encoded as 1110|011 (the unary code for row 3, followed by the R- bit length (3-bit) binary value of 3). The third DC difference is 5. It resides in row 3 column 5 of Table 4.63, so it is encoded as 1110|101 (the unary code for row 3, followed by the R- bit length (3-bit) binary value of 5). Digital Image Processing \ Image Compression MSc Study / General Branch Figure 2. mapping all of the 8×8 data units to 1 × 64 vectors Point 2 above has to do with the precise way the 63 AC coefficients of a data unit are compressed. It uses a combination of RLE and either Huffman or arithmetic coding. The idea is that the sequence of AC coefficients normally contains just a few nonzero numbers, with runs of zeros between them, and with a long run of trailing zeros. For each nonzero number x, the encoder: (1) Finds the number Z of consecutive zeros preceding x; (2) Finds x in Table 4.63 and prepares its row and column numbers (R and C); (3) The pair (R, Z) [that’s (R, Z), not (R, C)] is used as row and column numbers for Table 4.66; and (4) The Huffman code found in that position in the table is concatenated to C (where C is written as an R-bit number) and the result is (finally) the code emitted by the JPEG encoder for the AC coefficient x and all the consecutive zeros preceding it. As an example consider the sequence 1118, 2, 0,−2, 0,... , 0,-1,0,0,0,0,…..,0,0. 13 The first AC coefficient 2 has no zeros preceding it, so Z = 0. It is found in Table 4.63 in row 2, column 2, so R = 2 and C = 2. The Huffman code in position (R, Z) = (2, 0) of Table 4.66 is 01, so the final code emitted for 2 is: 01|10 , where 10 is column 2 written in R-bit (2 bits). The next nonzero AC coefficient,−2, has one zero preceding it, so Z = 1. It is found in Table 4.63 in row 2, column 1, so R = 2 and C = 1. The Digital Image Processing \ Image Compression MSc Study / General Branch Huffman code in position (R, Z) = (2, 1) of Table 4.66 is 11011, so the final code emitted for 2 is: 11011|01 , where 01 is column 1 written in R-bit (2 bits). The last nonzero AC coefficient −1, has 13 zero preceding, so Z = 13, It is found in Table 4.63 in row 1, column 0, so R = 1 and C = 0. The Huffman code in position (R, Z) = (1,13) of Table 4.66 is 1110101, so the final code emitted for -1 is: 1110101|0 where 0 is column 0 written in R-bit (1 bits). Finally, the sequence of trailing zeros is encoded as 1010 (EOB), so the output for the above sequence of AC coefficients is: 01101101101111010101010 We saw earlier that the DC coefficient is encoded as: 111111111110|01110100010 So the final output for the entire 64-pixel data unit is the 46-bit number: 1111111111100111010001001101101101111010101010 These 46 bits encode one color component of the 64 pixels of a data unit. Let’s assume that the other two color components are also encoded into 46- bit numbers. If each pixel originally consists of 24 bits, then this corresponds to a compression factor of: CF = 64 × 24/(46 × 3) ≈ 11.13; very impressive! Notice that the DC coefficient of 1118 has contributed 23 of the 46 bits. Subsequent data units code differences of their DC coefficient, which may take fewer than 10 bits instead of 23 (because the difference between successive DC's is much less than the value of the DC itself). They may feature much higher compression factors as a result. Exercise: a. Consider that the first three 8 × 8 data unit of an image have quantized DC coefficient of 64, 59, 65 give the JPEG encoding for those DC coefficients. b. Consider the sequence ( after scanning the 8 × 8 matrix in zigzag ) 15 64 3 0 -6 0 …… 0 -5 0 0 0 ……… 0 Where 64 represent the DC coefficient of the quantized 8 × 8 data unit and the rest are the AC coefficient. Give the JPEG encoding for the entire block coefficient, then calculate the compression factor. Digital Image Processing \ Image Compression MSc Study / General Branch Z: 0 1... 15 R 0: 1010 11111111001(ZRL) 1: 00 1100... 1111111111110101 2: 01 11011... 1111111111110110 3: 100 1111001... 1111111111110111 4: 1011 111110110... 1111111111111000 5: 11010 11111110110... 1111111111111001 : : : : Table 4.66: Coding AC Coefficients.

Digital Image Processing and Image Compression PDF - University of Technology

Document Details

Tags

Related

Summary

Full Transcript

Upgrade to continue