Full Transcript

Algorithms and Data Structures School year 2024-2025 Lecture 3 – String Processing : Compression Sumário String processing  Compression ◼ Encoding according to series lengths ◼ Case of text files ◼ Case of bonary files ◼ Enco...

Algorithms and Data Structures School year 2024-2025 Lecture 3 – String Processing : Compression Sumário String processing  Compression ◼ Encoding according to series lengths ◼ Case of text files ◼ Case of bonary files ◼ Encoding according to variable lengths (Huffman method) 2 String processing - Compression Objetive (algorithms of compression): Reduce space occupied by data without much concern about the time spent on the process Opposite perspective to that which normally presides over the development of algorithms: “Minimize execution time while conserving if possible, space” 3 String processing - Compression Approach: Take advantage of data redundancy: In text files, where there are characters that are repeated very frequently In image files (“master files”), where there are large homogeneous areas In files for the digital representation of sound and other analog signals, with highly repeated patterns 4 String processing - Compression Results (typical): Savings of 20% to 50% on text files Savings of 50% to 90% on binary files that do not have a random distribution of bits (in which case the gains are few!) 5 String processing - Compression Absurd: “Having such a good method available which, once applied to any file, reduces the space occupied” If so, applying the method to the original file would cause a first reduction, creating a new compressed file. A 2nd application of the method to the latter would cause a new reduction, and so on In other words, it would be possible to produce a file as small as you wanted! 6 String Processing - Compression Conclusion: Any compression method, no matter how good it is, will, in certain cases, result in files larger than the original file! 7 String Processing - Compression Methods: ❖SERIES LENGTHS ❖VARIABLE LENGTHS (Huffman Method) 8 Series lengths Encoding according to series lengths This compression process is of interest when there are sequences of repeated characters in the file to be compressed The idea of ​the method is to replace each sequence by the character that is repeated and a counter For example, we can say that the text DDDDBBCCC2222AA.... is made up of 4 D’s, followed by 2 B’s, 3 C’s, 4 2’s, etc. 9 Series lengths This idea can be realized in different ways, depending, for example, on the type of characters that make up the text a) Text files containing only letters........................ N repeated letters........................ n........................ Counter Letter that was repeated in the original text Note: There is no interest to encode the sequences where n

Use Quizgecko on...
Browser
Browser