Introduction to Compression Techniques Lecture 8 - Information Theory (PDF)

Summary

This document is a lecture on data compression and information theory. It discusses different types of compression methods and the concept of entropy, and explains why compression matters for efficient storage and transmission of multimedia data. Examples of compression applications, such as video, image, audio, and text compression, are provided.

Full Transcript


Lecture 8: Data Compression and Information Theory

Why Compression?

Multimedia applications generate a lot of data, so data needs to be compressed for efficient storage and for efficient transmission. Examples of applications that use compression: Video: MPEG; Image: JPEG; Audio: MP3; Text: WinZip.

Why Is Data Compression Possible?

Redundancy exists in many places: in text, in video and images, and in audio. Compression eliminates redundancy while keeping the essential information. Assume 8 bits per character. Uncompressed, the string aaaaaaaaab takes 10 * 8 = 80 bits; compressed as 9ab, it takes 3 * 8 = 24 bits.

Is It Always Possible to Compress?

Consider a three-bit sequence. Can you compress it to two bits, or even to one bit? Information theory is needed to understand the limits of compression and to give clues on how to compress efficiently.

The Origin of Information Theory

The term "entropy" was first used in thermodynamics and in statistical mechanics. Shannon's work, however, evolved from the field of electrical communication.

Entropy in Thermodynamics

In thermodynamics, entropy is a measure of the thermal energy of a body of gas, i.e. the degree of disorder or randomness in the system. Statistical mechanics says that an increase in entropy means a decrease in predictability.

Back to Information Theory

The complexity of a system depends on our knowledge of the system: the more we know about the system, the fewer words we need to describe it. In information theory, "the amount of information conveyed by a message increases as the amount of uncertainty as to what message actually will be produced becomes greater."

What Is Information?

Can we measure information? Consider the following two sentences:
1. There is a traffic jam on I5
2. There is a traffic jam on I5 near Exit 234
Sentence 2 seems to carry more information than sentence 1; from the semantic viewpoint, it provides more useful information.

It is hard to measure "semantic" information, though. Consider these two sentences:
1. There is a traffic jam on I5 near Exit 160
2. There is a traffic jam on I5 near Exit 234
It is not clear whether sentence 1 or sentence 2 carries more information.

Let's attempt a different definition of information: count the number of letters in the two sentences.
1. There is a traffic jam on I5 (22 letters)
2. There is a traffic jam on I5 near Exit 234 (33 letters)
That is definitely something we can measure and compare.

Frequency-based Coding

Morse code was invented in 1838 by Morse for the electrical telegraph and expanded by Vail in 1844. To shorten the transmission of messages, English text was coded based on the relative frequencies of occurrence of the letters.

Context-based Coding

The Braille code, developed by Louis Braille in 1825.

Measure of Information

Consider symbols si and the probability of occurrence of each symbol, P(si). In the case of fixed-length coding, the smallest number of bits per symbol needed is log2(N) bits (rounded up to an integer), where N is the number of symbols. For example, a message drawn from 8 symbols needs 3 bits per symbol (log2 8 = 3).

Entropy

The number of bits needed to encode a media source is lower-bounded by its "entropy". The self information of an event A (the information we get when receiving the corresponding symbol) is defined as -logb P(A), where P(A) is the probability of event A. If b equals 2, the unit is "bits"; if b equals e, the unit is "nats"; if b equals 10, the unit is "hartleys".
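
To make the self-information definition concrete, here is a minimal Python sketch; the helper name self_information is my own, and the probability 0.25 is taken from the two-symbol example that follows.

    import math

    def self_information(p, base=2):
        # Self information of an event with probability p: -log_b(p)
        return -math.log(p, base)

    print(self_information(0.25))               # 2.0 bits
    print(self_information(0.25, base=math.e))  # about 1.386 nats
    print(self_information(0.25, base=10))      # about 0.602 hartleys

Only the unit changes with the logarithm base; the underlying quantity of information is the same.
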
Variable-Length Coding and Entropy

What is the minimum number of bits per symbol? Answer: Shannon's result is that the theoretical minimum average number of bits per code word is the entropy H = -Σ P(si) log2 P(si), where the sum runs over the n symbols of the alphabet.

Example: a source outputs two symbols (the alphabet has 2 symbols), 0 or 1, with P(0) = 0.25 and P(1) = 0.75. The information we get when receiving a 0 is log2(1/0.25) = 2 bits; when receiving a 1, it is log2(1/0.75) = 0.4150 bits.

Entropy (cont.)

Example: a source outputs two symbols (the alphabet has 2 letters), 0 or 1, with P(0) = 0.25 and P(1) = 0.75. What is the entropy of that system? H = 0.25 * log2(1/0.25) + 0.75 * log2(1/0.75) = 0.8113 bits, so we need at least 0.8113 bits per symbol on average when encoding this source.

The Entropy of Text

Consider a text file with 256 possible values (characters), A = {0, 1, 2, …, 255}. Assuming the characters are independent and equiprobable, H = 256 * (1/256) * log2(256) = 8 bits. What about a file with only 2 characters {a, b}? Assuming P(a) = 0.5 and P(b) = 0.5, H = 1 bit.

Estimate the Entropy

For the string aaabbbbccccdd, assuming the symbols are independent: P(a) = 3/13, P(b) = 4/13, P(c) = 4/13, P(d) = 2/13, and H = [-P(a) log2 P(a)] + [-P(b) log2 P(b)] + [-P(c) log2 P(c)] + [-P(d) log2 P(d)] = 1.95 bits.

Entropy: Three Properties

1. It can be shown that 0 ≤ H ≤ log2 N.
2. Maximum entropy (H = log2 N) is reached when all symbols are equiprobable, i.e., Pi = 1/N.
3. The difference log2 N - H is called the redundancy of the source.

Data Compression Concepts

Let x be the original data, xc the compressed representation, and y the reconstruction. Compression is lossless when y is equal to x, and lossy when y is different from x.

Lossless and Lossy Compression

Text compression techniques are often lossless (any counterexamples?), while image, audio, and video compression techniques are often lossy (any counterexamples?). Distortion is the difference between the original and the reconstruction. If the distortion is small, we say the "quality" or "fidelity" is high, or that the reconstruction is a "high-definition" copy of the original.

Data Compression Concepts (cont.)

The compression ratio is |xc| : |x|, or (|xc| / |x|) * 100%. For example, if |x| = 65536 bytes and |xc| = 16384 bytes, the compression ratio is 1:4, or 25%.
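
As a quick check of the numbers in the examples above, here is a small Python sketch; the function name entropy and the variable names are illustrative, not from the lecture.

    import math
    from collections import Counter

    def entropy(message, base=2):
        # First-order entropy H = -sum P(s) * log_b P(s), assuming independent symbols
        n = len(message)
        return -sum((c / n) * math.log(c / n, base) for c in Counter(message).values())

    print(round(entropy("aaabbbbccccdd"), 2))  # 1.95 bits per symbol
    print(round(entropy("0111" * 25), 4))      # 0.8113 bits for P(0) = 0.25, P(1) = 0.75

    # Compression ratio |xc| : |x| as defined above
    x_len, xc_len = 65536, 16384
    print(xc_len / x_len)                      # 0.25, i.e. 1:4 or 25%

Running this reproduces the 1.95-bit and 0.8113-bit entropies and the 25% compression ratio from the slides.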
