Chapter 7 - Introduction to Information Theory.pdf

Full Transcript


CMPF134 Chapter 7 – Introduction to Information Theory

Chapter Objectives
◦ To be able to describe the difference between data and information
◦ To be able to differentiate between analog and digital information
◦ To be exposed to the model of information theory
◦ To be able to understand all processes in the model of information theory
◦ To be able to understand two different types of data compression techniques
◦ To be exposed to the concept of entropy

Introduction
◦ Imagine this: there is an irresistible promotion at Shopping Mall A (information) and you just can't wait to share this good news with your best friends.
◦ How will you (the sender) choose to spread the information to your friends (the receivers)? Will you call them? Or would you prefer to email them instead?
◦ If your friends are standing right next to you, you would probably share the information with them in a face-to-face conversation
◦ What if your friends are far away from you (in their hometowns)? You might want to consider emailing them instead

Introduction (Cont.)
◦ Do you have any ideas about these information theory questions?
 Is the information analog or digital?
 Who or what is the sender or receiver?
 Where should the information travel in order to reach the receiver?
 What should be done if we want information to reach the receiver more quickly?
 Why is information sometimes received differently from what was sent out?
◦ Perhaps you will obtain answers to all of these after understanding this chapter

Data vs Information
◦ Data is raw, unorganized facts (text, words, numbers, pictures, sound or video) that need to be processed by the computer
◦ When data is processed, organized, structured or presented in a given context so as to make it useful, it is called information
◦ Information carries knowledge, and it can be passed from one person to another

Data vs Information (Cont.)
◦ Example of data: Student A's CMPF134 examination result
◦ Example of information: the average CMPF134 examination result for the whole class

Data vs Information (Cont.)
◦ Many of our earlier topics deal with different information representations:
 A number can be represented with different number systems
 A character can be represented by using different computer encoding systems
◦ The appropriate information representation to choose depends on:
 The purpose of the information
 Who will use the information (computer or human?)
 The environment in which the information exists

Analog vs Digital Information
◦ There are two ways of representing information: analog and digital
◦ Analog information:
 Continuous signal
 Can take on an infinite set of values
 Examples: time shown on an analog clock, your voice in a lecture class
◦ Digital information:
 Discrete signal
 Restricted to a finite set of values, represented in binary form
 Examples: time shown on a digital clock, a digitized image
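To make the analog/digital distinction concrete, here is a minimal Python sketch (not part of the slides; the function name and parameters are illustrative only) that samples a continuous sine wave and quantizes each sample to one of a finite set of 3-bit values, roughly the way an analog-to-digital converter would:

```python
import math

def quantize(sample: float, bits: int) -> int:
    """Map an analog sample in [-1.0, 1.0] to one of 2**bits discrete levels."""
    levels = 2 ** bits
    # Scale [-1, 1] onto the integer range [0, levels - 1] and round.
    level = round((sample + 1.0) / 2.0 * (levels - 1))
    return max(0, min(levels - 1, level))

# The "analog" signal: a continuous sine wave, sampled at discrete instants.
analog = [math.sin(2 * math.pi * t / 8) for t in range(8)]
digital = [quantize(s, bits=3) for s in analog]
print(digital)  # every value is one of the 8 levels 0..7 -> a finite set
```

The analog samples can take infinitely many values; after quantization, only 2³ = 8 values remain, which is exactly the "finite set of values, represented in binary form" described above.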
How Does a Digital Device Accept Analog Information?
◦ A system which uses both digital and analog circuits is the compact disc (CD) player
 Music is stored on the compact disc in digital form
 A laser diode optical system picks up the digital data from the rotating disc and transfers it to a digital-to-analog converter (DAC)
 The DAC changes the digital data into an analog signal that is an electrical reproduction of the original music
 The signal is amplified and sent to a speaker to be heard by the audience
 When the original music is recorded on the CD, the reverse process happens, using an analog-to-digital converter (ADC)

Digital Information
◦ Information in a computer is represented in digital form
◦ The basic unit of digital information is the bit
◦ If we represent digital information in 2-bit form, we get four different symbols:

 Symbol  Bit 1  Bit 0
 S0      0      0
 S1      0      1
 S2      1      0
 S3      1      1

Digital Information (Cont.)
◦ We can get eight different symbols from a 3-bit data representation:

 Symbol  Bit 2  Bit 1  Bit 0
 S0      0      0      0
 S1      0      0      1
 S2      0      1      0
 S3      0      1      1
 S4      1      0      0
 S5      1      0      1
 S6      1      1      0
 S7      1      1      1

Model of Information Theory
◦ Modern digital communication depends on information theory
◦ Information theory describes the representation and transmission of information from one point to another
◦ It is a mathematical treatment of the conditions and parameters affecting the transmission and processing of information
◦ The model of information theory was introduced by Claude E. Shannon (1948)

Model of Information Theory (Cont.)
Source → Channel → Drain
◦ The most basic model of information theory consists of three main elements:
 Source: the location where information is generated
 Drain: the destination to which information is sent
 Channel: a medium to transmit information from source to drain

Expanded Model of Information Theory
◦ However, information has to be encoded before transmission through the channel
◦ The encoded information is then decoded before reaching the drain, so that it can be used by the drain
◦ Therefore, the basic model of information theory has been expanded as follows:
Source → Source Encoder → Channel Encoder → Channel → Channel Decoder → Source Decoder → Drain

Expanded Model of Information Theory (Cont.)
◦ Source encoding: a process to transform information into an appropriate format for presentation
 Another objective is to reduce the size of the source data (data compression)
◦ Channel encoding: a process of transforming information into an appropriate format for transmission
 Another objective is to generate codes that can detect or correct errors
◦ Source and channel decoding: reverse the encoding processes so that the information can be utilized by the drain

Expanded Model of Information Theory (Cont.)
Example 1: Let's say Cinderella wants to invite Snow White to the "Star Wars" movie
◦ Source: Cinderella (the information is in her brain)
◦ Source encoding: forming sentences from the idea in her brain
◦ Channel encoding: converting sentences into voice (spoken out of the mouth as a sound wave)
◦ Channel: air
◦ Channel decoding: eardrums convert sound waves back into sentences
◦ Source decoding: the brain converts the sentences into information that Snow White can interpret
◦ Drain: Snow White (the information is sent to her)

Expanded Model of Information Theory (Cont.)
Example 2: Luke Skywalker is using his computer to send his CMPF134 assignment to Madam Low via email
◦ Source: Luke Skywalker's computer (the information is in the PC, not his brain)
◦ Source encoding: converting characters into ASCII codes (in bits)
◦ Channel encoding: bits are changed into electrical signals so that they can travel through the channel
◦ Channel: computer networks
◦ Channel decoding: electrical signals are converted back into ASCII codes
◦ Source decoding: ASCII codes are translated back into characters to be shown on the drain's monitor
◦ Drain: Madam Low's PC
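A toy Python sketch of Example 2's pipeline (my own illustration, not from the slides): characters are source-encoded into ASCII bits and decoded back at the drain. The channel encoding/decoding step (bits ↔ electrical signals) is abstracted away here as an identity function standing in for an ideal, noise-free channel:

```python
def source_encode(text: str) -> str:
    """Source encoding: represent each character as 8 ASCII bits."""
    return "".join(format(ord(ch), "08b") for ch in text)

def source_decode(bits: str) -> str:
    """Source decoding: turn each 8-bit group back into a character."""
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

def channel(signal: str) -> str:
    """An ideal, noise-free channel delivers the signal unchanged."""
    return signal

# Source -> source encoding -> channel -> source decoding -> drain
message = "CMPF134 assignment"
received = source_decode(channel(source_encode(message)))
assert received == message
print(received)
```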
Data Compression
◦ Data compression is the representation of an information source (e.g. a text file, an image, or an audio/video signal) using the fewest possible bits
◦ The file size is reduced after data compression, which allows information to be sent through the channel at a faster pace, besides reducing the possibility of transmission errors (since fewer bits of data are transmitted)
◦ The main goal of data compression is to remove redundancy in information
◦ Data compression can be categorized into two main types:
 Lossless compression
 Lossy compression

Lossless Compression
◦ We can get the original data back once it is decoded
◦ However, the compression ratio is lower
◦ Mainly used for text data compression, but also applicable to other types of information (such as image compression)
◦ Examples: Run Length Coding (RLC), Huffman Coding, Lempel–Ziv–Welch (LZW) Coding, entropy encoding
 PNG and GIF are image file formats that use lossless compression

Lossless Compression (Cont.)
Example: Assuming that we have the data set 111111222222222333333444444, what would be the compression rate using the RLC method?
Solution:
 Step 1: Identify the number of digits in the original data → 27 digits
 Step 2: If each digit is represented as an 8-bit number, how many bits do we need? → 27 × 8 = 216 bits
 Step 3: New data representation: {(1,6),(2,9),(3,6),(4,6)}
 Step 4: How many digits do we need this time? → 8 digits

Lossless Compression (Cont.)
Solution (cont.):
 Step 5: If each digit is represented as an 8-bit number, how many bits do we need?
  8 × 8 = 64 bits
  Initially, we needed 216 bits, so we have saved 216 − 64 = 152 bits!
 Step 6: Calculating the compression rate
  Number of bits before compression = 216
  Number of bits after compression = 64
  Compression rate = (216 − 64)/216 × 100 = 70.37%
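The RLC steps above can be reproduced in a few lines of Python. This is a minimal sketch (the function name is my own, and the 8-bits-per-digit assumption mirrors the worked example, not any standard library):

```python
from itertools import groupby

def run_length_encode(data: str) -> list[tuple[str, int]]:
    """Collapse each run of repeated symbols into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

data = "111111222222222333333444444"
encoded = run_length_encode(data)
print(encoded)  # [('1', 6), ('2', 9), ('3', 6), ('4', 6)]

# Compression rate, assuming 8 bits per digit as in the worked example.
bits_before = len(data) * 8        # 27 digits * 8 = 216 bits
bits_after = len(encoded) * 2 * 8  # 8 digits  * 8 = 64 bits
print((bits_before - bits_after) / bits_before * 100)  # 70.37...
```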
Lossy Compression
◦ Reduces file size by permanently eliminating certain redundant information
◦ Mainly applied to audio, image, and video files
◦ Has a higher compression rate but slightly lower data accuracy than lossless compression (the decompressed image is not exactly identical to the original image)
◦ Examples:
 Transform Coding, Discrete Wavelet Transform (DWT), Discrete Cosine Transform (DCT), Fractal Compression
 JPEG (lossy image file format), MP3 (lossy audio file format), MPEG (lossy video file format)

Channel Capacity
◦ As mentioned previously, information is generated at the source, sent through the channel, and consumed at the drain
◦ Channel capacity is analogous to a pipe channelling water to fill up a basin

Channel Capacity (Cont.)
◦ The time taken to fill up the basin depends very much on the diameter of the pipe
◦ Similarly, a channel with more capacity can transmit more information from the source to the drain within a specified period of time
 The capacity of a channel is commonly called bandwidth
 Specified in bits per second
◦ If a channel has the capacity to transmit N bits/s, and M bits have to be sent, it takes M/N seconds for the message to get through

Channel Capacity (Cont.)
Example 1: How long does it take to send an email of 12.4 KB across a channel of 56 Kbps?
Solution:
 Transmission time = (12.4 × 8 Kbits) / (56 Kbps) ≈ 1.8 seconds

Channel Capacity (Cont.)
Example 2: If a channel has a capacity of 10 MBps (MByte/s), how many audio signals of 128 Kbps can it carry?
Solution:
 10 MBps = 10 × 10³ × 8 = 80,000 Kbps
 Number of signals = 80,000 / 128 = 625

Noise in Channel
◦ Noise, also called interference, is one of the main factors limiting the performance of a communication system
◦ In electronic communication, noise is any unwanted signal that enters the communication system via the channel and interferes with the transmitted message, causing transmission errors
Source → Channel → Drain (noise enters at the channel)

Noise in Channel (Cont.)
◦ Noise makes it difficult for the receiver (drain) to understand the transferred information
 Example: when your lecturer (source) conducts a CMPF134 lecture in a very noisy classroom (channel), the students (drain) cannot clearly understand the information sent out by the lecturer
◦ Errors happen when the drain receives information that differs from what was sent out by the source
 Using a human-speech analogy: the speaker says one word, but the audience hears another
 In a digital communication system: original data bits are changed into another combination (1 to 0 or vice versa)

Error Control
◦ A good communication system should allow information to be transferred from one device to another accurately
◦ Two general types of error can occur:
 Single-bit error (only 1 bit of data is changed; usually happens in parallel transmission)
 Burst error (more than one bit of data is changed from 1 to 0 or 0 to 1; mostly happens in serial transmission)

Error Control (Cont.)
◦ For reliable communication, errors should be controlled well, which includes:
 Error detection. Examples: Parity Check, Cyclic Redundancy Check (CRC)
 Error correction. Example: Hamming Code

Parity Check
◦ The simplest error detection mechanism is the parity check
◦ A parity bit is attached to a group of bits to make the total number of 1s in the group always even or odd
◦ There are two main schemes:
 Odd parity: makes the total number of 1s odd
 Even parity: makes the total number of 1s even

Parity Check (Cont.)
◦ The parity bit can be attached to the code at either the beginning or the end, depending on system design
◦ The table below shows how a parity bit P is attached to a BCD code:

 Even Parity    Odd Parity
 P  BCD         P  BCD
 0  0000        1  0000
 1  0001        0  0001
 1  0010        0  0010
 0  0011        1  0011
 1  0100        0  0100
 0  0101        1  0101
 0  0110        1  0110
 1  0111        0  0111
 1  1000        0  1000
 0  1001        1  1001

Parity Check (Cont.)
◦ Limitations of parity-check error detection:
 Detects any odd number of bit errors
 Cannot detect an even number of bit errors

Exercise 1
An odd parity system receives the following code groups: 10110, 11010, 110011, 110101110100 and 1100010101010. Determine which groups (if any) are in error. Explain your answer.
Solution:
◦ 110011 and 1100010101010
◦ Since odd parity is required, any group with an even number of 1s is in error
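A compact Python sketch of the scheme (my own illustration; add_parity and check_parity are hypothetical helper names, not a standard API). Run against Exercise 1, it flags exactly the two groups identified in the solution:

```python
def add_parity(bits: str, odd: bool = True) -> str:
    """Prepend a parity bit so the total number of 1s is odd (or even)."""
    ones_is_even = bits.count("1") % 2 == 0
    parity = "1" if ones_is_even == odd else "0"
    return parity + bits

def check_parity(code: str, odd: bool = True) -> bool:
    """Return True if the received code group passes the parity check."""
    return (code.count("1") % 2 == 1) == odd

# Exercise 1 revisited: in an odd parity system, a group with an even
# number of 1s must be in error.
for group in ["10110", "11010", "110011", "110101110100", "1100010101010"]:
    print(group, "OK" if check_parity(group, odd=True) else "ERROR")
```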
Cyclic Redundancy Check
◦ A widely used code for detecting one- and two-bit transmission errors when digital data are transferred over a channel
 The channel can be between:
  two computers connected to a network
  a digital storage device (e.g. CD, DVD, USB thumb drive) and a PC
◦ A certain number of CRC check bits (sometimes called a checksum) are appended to the end of the data bits being transmitted
◦ The transmitted data are tested for errors by the receiver using the CRC technique
◦ Not every possible error can be identified using this technique, but it is more efficient than a simple parity check

Cyclic Redundancy Check (Cont.)
◦ CRC is based on the division of two binary numbers (using modulo-2 operations), and is performed as follows:
 1. Select a fixed generator code (agreed by both source and drain), say with a word size of n bits
 2. Append (n−1) 0s to the end of the data bits
 3. Divide the data bits (including the appended bits) by the generator code bits
  If the remainder is 0, the data and appended bits are sent as they are
  If the remainder is not 0, it is treated as the CRC code (size = n−1 bits) and appended to the end of the original data bits before sending
  The remainder will now be 0 when the newly appended data bits are divided by the generator code

Cyclic Redundancy Check (Cont.)
 4. At the receiving end, the receiver divides the incoming appended data bits [original data + (n−1)-bit CRC code] by the same generator code used by the sender
  If the remainder is 0, no error is detected
  If the remainder is not 0, an error has been detected and a re-transmission is requested by the receiver

Exercise 2
Determine the transmitted CRC for the following byte of data (D) and generator code (G). Verify that the remainder is 0.
D = 1101 0011, G = 1010
Solution:
 Step 1: Identify the fixed generator code, G = 1010
 Step 2: Since the generator code has 4 bits, append three 0s to the data bits D
  D' = 1101 0011 000

Exercise 2 (Cont.)
Solution (cont.):
 Step 3: Divide the appended data D' = 11010011000 by the generator code G = 1010, XOR-ing G into the working value whenever its leading bit is 1, until all bits are used:
  1101 ⊕ 1010 = 0111 → bring down the next bit → 1110
  1110 ⊕ 1010 = 0100 → bring down the next bit → 1000
  1000 ⊕ 1010 = 0010 → bring down the next two bits → 1011
  1011 ⊕ 1010 = 0001 → bring down the last three bits → 1000
  1000 ⊕ 1010 = 0010 → no bits left
 Remainder = CRC code = 010

Exercise 2 (Cont.)
Solution (cont.): Verify by dividing the transmitted bits 11010011010 (data + CRC) by G = 1010:
  1101 ⊕ 1010 = 0111 → bring down the next bit → 1110
  1110 ⊕ 1010 = 0100 → bring down the next bit → 1000
  1000 ⊕ 1010 = 0010 → bring down the next two bits → 1011
  1011 ⊕ 1010 = 0001 → bring down the last three bits → 1010
  1010 ⊕ 1010 = 0000 → no bits left
 Remainder = 0 (verified)
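The same modulo-2 division can be sketched in Python (an illustration of the steps above; mod2_div and crc_bits are my own names, not a library API):

```python
def mod2_div(dividend: str, generator: str) -> str:
    """Modulo-2 (XOR) long division; returns the remainder as a bit string."""
    n = len(generator)
    bits = list(dividend)
    for i in range(len(bits) - n + 1):
        if bits[i] == "1":  # XOR the generator in whenever the leading bit is 1
            for j in range(n):
                bits[i + j] = "0" if bits[i + j] == generator[j] else "1"
    return "".join(bits[-(n - 1):])

def crc_bits(data: str, generator: str) -> str:
    """Sender side: append n-1 zeros, divide, keep the remainder as the CRC."""
    return mod2_div(data + "0" * (len(generator) - 1), generator)

data, gen = "11010011", "1010"
crc = crc_bits(data, gen)
print(crc)                        # 010
print(mod2_div(data + crc, gen))  # 000 -> remainder 0, no error detected
```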
Hamming Code
◦ Used to detect and correct a single-bit error in a transmitted code
◦ Hamming(7,4) is a single-error-correcting code that uses a 7-bit code to transmit 4 bits of original data
◦ The sender computes 3 parity check bits for each 4-bit data word to form a 7-bit code, which is then sent to the drain
◦ The receiver then computes another 3 parity check bits from the received 7-bit word

Hamming Code (Cont.)
◦ If no error occurred in transmission, the comparison of the two sets of 3 parity check bits returns zero
◦ If a single bit was changed in transmission, the value returned by comparing the two sets of parity check bits indicates the position of the error, which can then be corrected

Entropy
◦ In information theory, Shannon entropy is defined as the measurement of the average information (in bits) contained in a received message
◦ Alternatively, we can define entropy as a measure of the content of a message evaluated with respect to its probability of occurrence, or uncertainty

Entropy (Cont.)
◦ Imagine that machine A and machine B each generate output, forming a string from the alphabet 'A', 'B', 'C' and 'D'
◦ Machine A generates output uniformly at random, i.e. each letter is equally likely:
 P(A) = P(B) = P(C) = P(D) = 0.25
◦ Machine B generates output with the probabilities given below:
 P(A) = 0.5
 P(B) = P(C) = 0.125
 P(D) = 0.25

Entropy (Cont.)
◦ Question: which machine is producing MORE information?
◦ Shannon rephrased the question this way: "What is the MINIMUM number of yes/no questions we would need to ask in order to guess the next symbol generated by each machine?"

Entropy (Cont.)
◦ Let's look at machine A:
 Q1: Is it A or B?
  Yes → Q2: Is it A? (Yes → A; No → B)
  No → Q2: Is it C? (Yes → C; No → D)
◦ We can start with any split of the symbols (AB vs CD), since all symbols have the same probability
◦ Therefore, 2 questions are needed to identify each symbol

Entropy (Cont.)
◦ How about machine B?
 Q1: Is it A? (Yes → A)
  No → Q2: Is it D? (Yes → D)
   No → Q3: Is it B? (Yes → B; No → C)
◦ We start by asking about symbol A, since its probability is the highest, followed by symbol D (the second highest)

Entropy (Cont.)
◦ The average number of questions needed to identify the next symbol from machine B is:
 Number of questions = P(A)×1 + P(B)×3 + P(C)×3 + P(D)×2
 = 0.5×1 + 0.125×3 + 0.125×3 + 0.25×2
 = 0.5 + 0.375 + 0.375 + 0.5
 = 1.75

Entropy (Cont.)
◦ In summary, machine A requires 2 questions and machine B requires 1.75 questions, on average, to identify the next symbol
◦ In other words, to guess the next 100 symbols from machine A and machine B, we would need to ask about 200 and 175 questions respectively
◦ This means machine B is producing LESS information, because there is less uncertainty or surprise about its output
◦ Shannon called this measure of "average uncertainty" entropy, and its unit of measurement is the bit; this weighted question count is exactly H = −Σᵢ pᵢ log₂ pᵢ

Entropy (Cont.)
◦ Entropy is maximum (most uncertainty) when all outcomes are equally likely
◦ Entropy decreases as we move away from equally likely outcomes
◦ This means that the lower the entropy, the fewer questions we need to ask to guess the outcome

Why Entropy?
◦ Shannon entropy is a measure of the reduction of uncertainty in the receiver's knowledge
◦ It is very useful in digital communication, as it helps in finding ways to optimize the physical encoding of information for transmission
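A minimal Python sketch of the entropy formula, applied to the two machines above (the variable names are my own); it reproduces the 2 and 1.75 questions per symbol derived from the question-asking argument:

```python
from math import log2

def entropy(probabilities: list[float]) -> float:
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

machine_a = [0.25, 0.25, 0.25, 0.25]   # equally likely -> maximum entropy
machine_b = [0.5, 0.125, 0.125, 0.25]  # skewed -> lower entropy
print(entropy(machine_a))  # 2.0  bits per symbol
print(entropy(machine_b))  # 1.75 bits per symbol
```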
