Introduction to Information Theory PDF

Document Details

TenderOak

Uploaded by TenderOak

UP Diliman

Tags

information theory communication theory digital communication computer science

Summary

This document provides an introduction to information theory, focusing on the fundamental concepts and principles. It explores the roots of digital communication, Shannon's work, and the applications of information theory in various fields. Concepts like source coding theorem and channel coding theorem are discussed.

Full Transcript

Introduction to Information Theory Father of Digital Communication The roots of modern digital communication stem from the ground-breaking paper “A Mathematical Theory of Communication” by Claude Elwood Shannon in 1948. Model of a Digital Communication System...

Introduction to Information Theory Father of Digital Communication The roots of modern digital communication stem from the ground-breaking paper “A Mathematical Theory of Communication” by Claude Elwood Shannon in 1948. Model of a Digital Communication System Message Encoder e.g. English symbols e.g. English to 0,1 sequence Information Coding Source Communication Channel Destination Decoding Can have noise or distortion Decoder e.g. 0,1 sequence to English Communication Channel Includes Shannon’s Definition of Communication “The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.” “Frequently the messages have meaning” “... [which is] irrelevant to the engineering problem.” Shannon Wants to… Shannon wants to find a way for “reliably” transmitting data throughout the channel at “maximal” possible rate. Information Coding Source Communication Channel Destination Decoding For example, maximizing the speed of ADSL @ your home And he thought about this problem for a while… He later on found a solution and published in this 1948 paper. In his 1948 paper he build a rich theory to the problem of reliable communication, now called “Information Theory” or “The Shannon Theory” in honor of him. Shannon’s Vision Source Channel Data Encoding Encoding Channel Source Channel User Decoding Decoding Example: Disk Storage Data Zip Add CRC Channel Verify User Unzip CRC In terms of Information Theory Terminology Source Zip = Encoding Data Compression = Source Unzip Data Decompression Decoding Channel Error Protection Add CRC = Encoding Verify Channel CRC = Decoding Error Correction Example: VCD and DVD MPEG RS Movie Encoder Encoding CD/DVD MPEG RS TV Decoder Decoding RS stands for Reed-Solomon Code. Example: Cellular Phone Speech CC Encoding Encoding Channel Speech CC Decoding Decoding GSM/CDMA CC stands for Convolutional Code. Example: WLAN IEEE 802.11b CC Data Zip Encoding Channel CC User Unzip Decoding IEEE 802.11b CC stands for Convolutional Code. Shannon Theory The original 1948 Shannon Theory contains: 1. Measurement of Information 2. Source Coding Theory 3. Channel Coding Theory Measurement of Information Shannon’s first question is “How to measure information in terms of bits?” = ? bits = ? bits Or Lottery!? = ? bits All events are probabilistic! Using Probability Theory, Shannon showed that there is only one way to measure information in terms of number of bits: H ( X ) = −  p( x ) log 2 p( x ) x called the entropy function For example Tossing a dice: – Outcomes are 1,2,3,4,5,6 – Each occurs at probability 1/6 – Information provided by tossing a dice is 6 6 H = −  p(i ) log 2 p(i ) = −  p(i ) log 2 p(i ) i =1 i =1 6 1 1 = −  log 2 = log 2 6 = 2.585 bits i =1 6 6 Wait! It is nonsense! The number 2.585-bits is not an integer!! What does you mean? Shannon’s First Source Coding Theorem Shannon showed: “To reliably store the information generated by some random source X, you need no more/less than, on the average, H(X) bits for each outcome.” Meaning: If I toss a dice 1,000,000 times and record values from each trial 1,3,4,6,2,5,2,4,5,2,4,5,6,1,…. In principle, I need 3 bits for storing each outcome as 3 bits covers 1-8. So I need 3,000,000 bits for storing the information. Using ASCII representation, computer needs 8 bits=1 byte for storing each outcome The resulting file has size 8,000,000 bits But Shannon said: You only need 2.585 bits for storing each outcome. So, the file can be compressed to yield size 2.585x1,000,000=2,585,000 bits Optimal Compression Ratio is: 2,585,000 = 0.3231 = 32.31% 8,000,000 Let’s Do Some Test! File Size Compression Ratio No 8,000,000 100% Compression bits Shannon 2,585,000 32.31% bits Winzip 2,930,736 36.63% bits WinRAR 2,859,336 35.74% bits The Winner is I had mathematically claimed my victory 50 years ago! Follow-up Story Later in 1952, David Huffman, while was a graduate student in MIT, presented a systematic method to achieve the optimal (1925-1999) compression ratio guaranteed by Shannon. The coding technique is therefore called “Huffman code” in honor of his achievement. Huffman codes are used in nearly every application that involves the compression and transmission of digital data, such as fax machines, modems, computer networks, and high-definition television (HDTV), to name a few. So far… but how about? Source Channel Data Encoding Encoding Channel Source Channel User Decoding Decoding The Simplest Case: Computer Network Communications over computer network, ex. Internet The major channel impairment herein is Packet Loss Binary Erasure Channel Impairment like “packet loss” can be viewed as Erasures. Data that are erased mean they are lost during transmission… 1-p 0 0 p Erasure p 1 1 1-p p is the packet loss rate in this network Once a binary symbol is erased, it can not be recovered… Ex: Say, Alice sends 0,1,0,1,0,0 to Bob But the network was so poor that Bob only received 0,?,0,?,0,0 So, Bob asked Alice to send again Only this time he received 0,?,?,1,0,0 and Bob goes CRAZY! What can Alice do? What if Alice sends 0000,1111,0000,1111,0000,0000 Repeating each transmission four times! What Good Can This Serve? Now Alice sends 0000,1111,0000,1111,0000,0000 The only cases Bob can not read Alice are for example ????,1111,0000,1111,0000,0000 all the four symbols are erased. But this happens at probability p4 Thus if the original network has packet loss rate p=0.25, by repeating each symbol 4 times, the resulting system has packet loss rate p4=0.00390625 But if the data rate in the original network is 8M bits per second 8Mbps Alice p=0.25 Bob With repetition, Alice can only transmit at 2 M bps 8Mbps 2 Mbps X4 Alice Bob p=0.00390625 Shannon challenged: Is repetition the best Alice can do? And he thinks again… Shannon’s Channel Coding Theorem Shannon answered: “Give me a channel and I can compute a quantity called capacity, C for that channel. Then reliable communication is possible only if your data rate stays below C.” ? ? ? ? What does Shannon mean? Shannon means In this example: 8Mbps p=0.25 Alice Bob He calculated the channel capacity C=1-p=0.75 And there exists coding scheme such that: 8Mbps ? Alice 8 x (1-p) =6 Mbps p=0 Bob Channel Capacity in terms of Signal to Noise Ratio C = B log2 (S/N) bits/s Unfortunately… I do not know exactly HOW? Neither do we…  But With 50 Years of Hard Work We have discovered a lot of good codes: – Hamming codes – Convolutional codes, – Concatenated codes, – Low density parity check (LDPC) codes – Reed-Muller codes – Reed-Solomon codes, – BCH codes, – Finite Geometry codes, – Cyclic codes, – Golay codes, – Goppa codes – Algebraic Geometry codes, – Turbo codes – Zig-Zag codes, – Accumulate codes and Product-accumulate codes, – … We now come very close to the dream Shannon had 50 years ago! ☺ Nowadays… Source Coding Theorem has applied to JPEG 2000 Image Audio/Video MPEG Compression Compression Data Audio Compression MP3 Compression Channel Coding Theorem has applied to VCD/DVD – Reed-Solomon Codes Wireless Communication – Convolutional Codes Optical Communication – Reed-Solomon Codes Computer Network – LT codes, Raptor Codes Space Communication Shannon Theory also Enables Space Communication In 1965, Mariner 4: Frequency =2.3GHz (S Band) Data Rate= 8.33 bps No Source Coding Repetition code (2 x) In 2004, Mars Exploration Rovers: Frequency =8.4 GHz (X Band) Data Rate= 168K bps 12:1 lossy ICER compression Concatenated Code In 2006, Mars Reconnaissance Orbiter Communicates Faster than Frequency =8.4 GHz (X Band) Data Rate= 12 M bps 2:1 lossless FELICS compression (8920,1/6) Turbo Code At Distance 2.15 x 108 Km And Information Theory has Applied to All kinds of Communications, Stock Market, Economics Game Theory and Gambling, Quantum Physics, Cryptography, Biology and Genetics, and many more…

Use Quizgecko on...
Browser
Browser