RLE on Text Data PDF
Document Details
Uploaded by NimbleFrenchHorn
Tags
Summary
This document explains Run-Length Encoding (RLE) and its application on text and image data. The document examines how RLE reduces file sizes, using examples of black and white, and coloured images. It explains the basic concepts of RLE and demonstrates its application to image data, showing how to convert data into the RLE format.
Full Transcript
# RLE on text data Using the following text string: 'aaaaabbbbccddddd'. Assuming each character requires 1 byte then this string needs 16 bytes. If we assume ASCII code is used, then the string can be coded as follows: | a | a | a | a | A | b | b | b | b | c | c | d | d...
# RLE on text data Using the following text string: 'aaaaabbbbccddddd'. Assuming each character requires 1 byte then this string needs 16 bytes. If we assume ASCII code is used, then the string can be coded as follows: | a | a | a | a | A | b | b | b | b | c | c | d | d | d | d | d | |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| | 05 97 | 04 98 | 02 99 | 05 100 | This means we have five characters with ASCII code 97, four characters with ASCII code 98, two characters with ASCII code 99 and five characters with code 100. Assuming each number in the second row requires 1 byte of memory, the RLE code will need 8 bytes. This is half the original file size. One issue occurs with a string such as 'cdcdcdcdcd' where RLE compression is not very effective. To cope with this, we use a flag. A flag preceding data indicates that what follows are the number of repeating units (for example, 255 05 97, where 255 is the flag and the other two numbers indicate that there are five items with ASCII code 97). When a flag is not used, the next byte(s) are taken with their face value and a run of 1 (for example, 01 99 means one character with ASCII code 99 follows). Consider this example: | String | aaaaaaaa | bbbbbbbbbb | |---|---|---| | Code | 08 97 | 10 98 | 01 99 | 01 100 | 01 99 | 01 100 | 01 99 | 01 100 | The original string contains 32 characters and would occupy 32 bytes of storage. The coded version contains 18 values and would require 18 bytes of storage. Introducing a flag (255 in this case) produces: 255 08 97 || 255 10 98 || 99 100 99 100 99 100 || 255 08 This has 15 values and would, therefore, require 15 bytes of storage. This reduction in file size of about 53% when compared to the original string. # Example 1: Black and white image Figure 1.12 shows the letter 'F' in a grid where each square requires 1 byte of storage. A white square has a value 1 and a black square a value of 0: * 11111111 * 10000001 * 10111111 * 10111111 * 10000011 * 10111111 * 10111111 * 10111111 In compressed RLE format this becomes: * 9W 6B 2W 1B 7W 18 7W 5B 3W 18 7W 1B 7W 1B 6W Using W = 1 and B = 0 we get: * 91 60 21 10 71 10 71 50 31 10 71 10 71 10 61 The 8 x 8 grid would need 64 bytes; the compressed RLE format has 30 values, and therefore needs only 30 bytes to store the image. # Example 2: Coloured images Figure 1.13 shows an object in four colours. Each colour is made up of red, green and blue (RGB) according to the code on the right: |Square Colour | Red | Green | Blue | |---|---|---|---| | | 0 | 255 | 0 | | | 0 | 0 | 255 | | | 255 | 255 | 0 | | | 0 | 0 | 0 | This produces the following data: 2 0 0 0 4 0 255 0 3 0 0 0 6 255 255 255 1 0 0 0 2 0 255 0 4 255 0 0 4 0 255 0 1 255 255 255 2 255 0 0 1 255 255 255 4 0 255 0 4 255 0 0 4 0 255 0 4 255 255 255 2 0 255 0 1 0 0 0 2 255 255 255 3 0 0 0 4 0 255 0 2 0 0 0 The original image (8 x 8 square) would need 3 bytes per square (to include all three RGB values). Therefore, the uncompressed file for this image is 8 x 8 x 3 = 192 bytes. The RLE code has 92 values, which means the compressed file will be 92 bytes in size. This gives a file reduction of about 52%. It should be noted that the file reductions in reality will not be as large as this due to other data which needs to be stored with the compressed file (e.g. a file header).