1 Unit_Data Representation.pdf
Irushadhiyya School
Cambridge IGCSE
UNIT 1 DATA REPRESENTATION

Topics covered:
- Binary Number System and Number Conversions
- Hexadecimal Number System
- Binary Addition
- File size calculation
- Data compression

Irushadhiyya School, Computer Science IGCSE 0478

Number System

Bits and Bytes

Theory

A binary digit is either a 0 or a 1 and is known as a 'bit'. However, the CPU cannot deal with just one bit at a time; it is just too small. It usually deals with 8 bits at a time, which is known as a byte. 11100101 is a byte, and 10000111 is also a byte, as is any other combination of 8 zeros and ones. A group of four bits is a nibble; two nibbles (eight bits) make a byte.

Typical file sizes:

File    File size
Photo   3 MB
Song    5 MB
Film    700 MB

ASCII

ASCII (American Standard Code for Information Interchange) is a type of code for data transmission. ASCII translates letters, digits, and symbols into numeric codes, and it was widely used in most computer systems for many years. There are two types of ASCII code: the standard code uses a seven-bit encoding system, while the extended code uses an eight-bit system. The standard code defines 128 possible characters.

The ASCII table is divided into 3 different sections:
- Non-printable system codes, between 0 and 31.
- Lower ASCII, between 32 and 127. This table originates from the older American systems, which worked on 7-bit character tables.
- Higher ASCII, between 128 and 255. This portion is programmable; the characters depend on the language of the operating system or program you are using. Foreign letters are also placed in this section.

Each ASCII character occupies just one byte. The eight-bit binary codes (bytes) which represent the letters 'A' and 'Z' are shown below.

Letter A
Value           128  64  32  16   8   4   2   1
On/Off signal     0   1   0   0   0   0   0   1

The first three bits (010) are always the same for capital letters. As 'A' is the first letter in the alphabet, the remaining five bits must equate to 1. The full ASCII code for 'A' is therefore 64 + 1 = 65.

Letter Z
Value           128  64  32  16   8   4   2   1
On/Off signal     0   1   0   1   1   0   1   0

The first three bits (010) are again the same, so the full ASCII code for 'Z' is 90. As 'Z' is the last letter in the alphabet, the remaining five bits must equate to 26,
i.e. 16 + 8 + 2.

IRUSHADHIYYA SCHOOL/COMPUTER SCIENCE/0478

It will take 5 bytes of memory (RAM) to store the word 'HELLO'. All information that needs to be accessed by the CPU is held in the RAM. The CPU finds it through a method called 'addressing': every storage location in RAM has a unique address. Each address contains a byte that represents data in the form of:
- a number
- a character or string of characters
- a computer instruction
- part of a picture, etc.

But remember: all this information is stored as strings of 1s and 0s, i.e. binary code.

Standard or Lower ASCII characters and codes (7-bit character table):

Dec Char   Dec Char   Dec Char   Dec Char   Dec Char   Dec Char
 33  !      49  1      65  A      81  Q      97  a     113  q
 34  "      50  2      66  B      82  R      98  b     114  r
 35  #      51  3      67  C      83  S      99  c     115  s
 36  $      52  4      68  D      84  T     100  d     116  t
 37  %      53  5      69  E      85  U     101  e     117  u
 38  &      54  6      70  F      86  V     102  f     118  v
 39  '      55  7      71  G      87  W     103  g     119  w
 40  (      56  8      72  H      88  X     104  h     120  x
 41  )      57  9      73  I      89  Y     105  i     121  y
 42  *      58  :      74  J      90  Z     106  j     122  z
 43  +      59  ;      75  K      91  [     107  k     123  {
 44  ,      60  <      76  L      92  \     108  l     124  |
 45  -      61  =      77  M      93  ]     109  m     125  }
 46  .      62  >      78  N      94  ^     110  n     126  ~
 47  /      63  ?      79  O      95  _     111  o     127  DEL
 48  0      64  @      80  P      96  `     112  p

Extended or Higher ASCII characters and codes: Extended ASCII uses eight instead of seven bits, which adds 128 additional characters. This gives extended ASCII room for extra characters, such as special symbols, foreign language letters, and drawing characters.

Unicode: A system of encoding text in computing, widely used on the internet.

Difference between ASCII and Unicode
- In ASCII, each character takes up only 8 bits, meaning that storing data in ASCII may take up less memory than in Unicode.
- ASCII stores a much smaller character set than Unicode, meaning that you are limited to the Latin character set and cannot represent characters from other languages.
- Unicode allows non-Latin character sets, such as Hindi and Cyrillic, to be displayed.

Unicode is a superset of ASCII, and the numbers 0 to 127 have the same meaning in ASCII as they have in Unicode. For example, the number 65 means "Latin capital 'A'". Because Unicode characters don't generally fit into one 8-bit byte, there are numerous ways of storing Unicode characters in byte sequences, such as UTF-32 and UTF-8.

ASCII                                        UNICODE
Uses only 8 bits per character.              Uses 16 bits or more per character, depending on the encoding.
ASCII is a standard only for English.        Unicode has a much larger character set.
For foreign languages, it allotted codes     As the Unicode character set is larger, any
128 to 255, which is not enough.             language can be represented.
                                             Unicode is the superset of ASCII.

Number Bases: Decimal and Binary

Decimal: It is thought that humans use base ten (decimal) because we have ten fingers on our hands. In base ten there are ten different digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. When you want to represent the next number, i.e. ten, you use a combination of these digits, i.e. 10.

Binary: In contrast to humans, computers use base two (binary), in which there are just two different digits: 0 and 1. This is because 0 and 1 are easy to represent electrically, i.e. off is 0 and on is 1. Binary starts off in the same way as decimal: 0 (zero), 1 (one). After that there are no more digits, so we have to use a combination of 0s and 1s: 00000010 represents two. In the same way that columns of digits in decimal represent successive powers of ten, the columns in binary represent successive powers of two.
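The idea that each column is worth a successive power of the base can be checked with a very small Python sketch. The function name `place_values` is only an illustration and does not come from these notes.

```python
def place_values(base: int, columns: int) -> list[int]:
    """Return the column weights, from the leftmost (largest) to the rightmost (1)."""
    return [base ** power for power in range(columns - 1, -1, -1)]


if __name__ == "__main__":
    print(place_values(10, 4))  # weights of a 4-digit decimal number
    print(place_values(2, 8))   # weights of the 8 bits in a byte
```

For a byte, the result is the same 128, 64, 32, 16, 8, 4, 2, 1 row used as the "Value" heading in the ASCII examples.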
Easy reference:

Column             Decimal                      Binary
first column       10^0  = 1                    2^0  = 1
second column      10^1  = 10                   2^1  = 2
third column       10^2  = 100                  2^2  = 4
fourth column      10^3  = 1,000                2^3  = 8
fifth column       10^4  = 10,000               2^4  = 16
sixth column       10^5  = 100,000              2^5  = 32
seventh column     10^6  = 1,000,000            2^6  = 64
eighth column      10^7  = 10,000,000           2^7  = 128
ninth column       10^8  = 100,000,000          2^8  = 256
tenth column       10^9  = 1,000,000,000        2^9  = 512
eleventh column    10^10 = 10,000,000,000       2^10 = 1024

A decimal number with only a few digits needs a large number of digits when expressed in binary form. For example, the number 65 is expressed in binary form as 1000001.

Octal Number

The binary form can be expressed more compactly by grouping 3 binary digits together to form an octal number. An octal number, with base 8, makes use of the eight digits 0, 1, 2, 3, 4, 5, 6, and 7.

Hexadecimal Number

An even more compact representation is hexadecimal, which groups 4 binary digits. It needs 16 digits, but since we have only 10 numeric digits, the remaining 6 are made up of the first 6 letters of the alphabet. Thus the hexadecimal base uses 0, 1, 2, ..., 8, 9, A, B, C, D, E, F as digits.

To summarize:
- Decimal: base 10
- Binary: base 2
- Octal: base 8
- Hexadecimal: base 16

Conversion of binary to decimal (base 2 to base 10)

Each position of a binary digit corresponds to a power of 2 (1, 2, 4, 8, ... from right to left). To convert any binary number, multiply each binary digit (bit) by its power of 2 and add up the results.

Example: convert (1011)2 to its decimal equivalent.

Represent the weight of each digit in the given number using the above table.
Now add up all the powers after multiplying by the digit values, 0 or 1:

(1011)2 = 2^3 × 1 + 2^2 × 0 + 2^1 × 1 + 2^0 × 1
        = 8 + 0 + 2 + 1
        = (11)10

Example 2: convert (1000100)2 to its decimal equivalent.

(1000100)2 = 2^6 × 1 + 2^5 × 0 + 2^4 × 0 + 2^3 × 0 + 2^2 × 1 + 2^1 × 0 + 2^0 × 0
           = 64 + 0 + 0 + 0 + 4 + 0 + 0
           = (68)10

Another example: convert (11011)2 to decimal.

1 × 2^4 = 16
1 × 2^3 =  8
0 × 2^2 =  0
1 × 2^1 =  2
1 × 2^0 =  1
Total   = 27

Conversion of decimal to binary (base 10 to base 2)

Here we keep on dividing the number by 2 until it reduces to zero, noting the remainder at each step. Then we write the remainders in reverse order.

Example: convert (68)10 to binary.

2 | 68
2 | 34    remainder 0
2 | 17    remainder 0
2 |  8    remainder 1
2 |  4    remainder 0
2 |  2    remainder 0
2 |  1    remainder 0
     0    remainder 1

We stop here as the number has been reduced to zero and collect the remainders in reverse order.

Answer = 1 0 0 0 1 0 0

Note: the answer is read from the bottom (MSB, most significant bit) to the top (LSB, least significant bit) as (1000100)2.

You should be able to write a recursive function to convert a binary integer into its decimal equivalent.

Hexadecimal Number System

- Base or radix 16 number system.
- 1 hex digit is equivalent to 4 bits.
- The digits are 0, 1, 2, ..., 8, 9, A, B, C, D, E, F. For example, B is 11 and E is 14.
- Numbers are expressed as powers of 16: 16^0 = 1, 16^1 = 16, 16^2 = 256, 16^3 = 4096, 16^4 = 65536, ...

Conversion of hex to decimal (base 16 to base 10)

Example: convert (F4C)16 to decimal.

(F4C)16 = (F × 16^2) + (4 × 16^1) + (C × 16^0)
        = (15 × 256) + (4 × 16) + (12 × 1)
        = 3840 + 64 + 12
        = (3916)10

Conversion of decimal to hex (base 10 to base 16)

Example: convert (4768)10 to hex.

4768 / 16 = 298 remainder 0
 298 / 16 =  18 remainder 10 (A)
  18 / 16 =   1 remainder 2
   1 / 16 =   0 remainder 1

Answer: 1 2 A 0, i.e. (12A0)16

Note: the answer is read from bottom to top, the same as in the binary case.

Conversion of binary to hex

Conversion of binary numbers to hex simply requires grouping bits in the binary numbers into groups of four bits.
Groups are formed beginning with the LSB and progressing to the MSB. If the leftmost group has fewer than four bits, it is padded with leading zeros.

(1110 0111)2 = (E7)16
(1 1000 1010 1000 0111)2 = (0001 1000 1010 1000 0111)2 = (1 8 A 8 7)16

Hexadecimal Numbers in Computers

Why are hexadecimal numbers used?

It is easier to express binary numbers in hexadecimal than in any other number base. It is more convenient for users to write and remember hexadecimal numbers than binary numbers.

Uses of Hexadecimal Numbers

Hexadecimal numbers in computer registers and main memory: Most software programmers convert binary to hex before moving values around. The main advantage is that a 16-bit word can be represented with only four hex digits, which saves paper and screen space. Programs such as DEBUG use only hexadecimal to display the binary bytes of a memory dump, instead of ones and zeros.

Hexadecimal in HTML: In HTML, hexadecimal numbers are mainly used for color-coding. The intensities of the three primary colors (Red, Green, and Blue) are set by their hexadecimal values, so that the user can create any color.

#FF0000  Red
#FFFF00  Yellow
#00FF00  Green
#0000FF  Blue
#FF00FF  Magenta
#00FFFF  Cyan

Hexadecimal in MAC addresses: A MAC address is usually made up of 48 bits, shown as six groups of two hexadecimal digits, for example 00-1c-b3-4f-25-fe.

Hexadecimal in web addresses: Each character on the keyboard has its own ASCII code. These codes can be represented using hexadecimal or decimal values. The characters in a URL or web address can be replaced by their hexadecimal ASCII codes, each preceded by a % sign.

Example: www.hodder.co.uk becomes %77%77%77%2E%68%6F%64%64%65%72%2E%63%6F%2E%75%6B

Hexadecimal in assembly code and machine code: Writing assembly code and machine code using hexadecimal numbers is easier, faster, and less error-prone than writing it in binary.
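The conversion methods in this section (adding up place values, repeated division, and grouping bits in fours), together with the hexadecimal web address example, can be sketched in Python. This is only an illustrative sketch: the function names are made up for this example, and `to_denary` assumes that every digit it is given is valid for the chosen base.

```python
DIGITS = "0123456789ABCDEF"


def to_denary(text: str, base: int) -> int:
    """Multiply each digit by its place value and add up, starting from the right."""
    total = 0
    for position, symbol in enumerate(reversed(text.upper())):
        total += DIGITS.index(symbol) * base ** position
    return total


def from_denary(number: int, base: int) -> str:
    """Divide repeatedly by the base; the remainders are read in reverse order."""
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        number, remainder = divmod(number, base)
        remainders.append(DIGITS[remainder])
    return "".join(reversed(remainders))


def binary_to_hex(bits: str) -> str:
    """Group the bits in fours from the LSB and write one hex digit per group."""
    bits = bits.replace(" ", "")
    padded = bits.zfill((len(bits) + 3) // 4 * 4)  # pad the leftmost group with zeros
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(DIGITS[to_denary(group, 2)] for group in groups)


def percent_encode_all(text: str) -> str:
    """Replace every character by % followed by its two-digit hex ASCII code."""
    return "".join(f"%{ord(char):02X}" for char in text)


if __name__ == "__main__":
    print(to_denary("1000100", 2))       # binary to denary
    print(from_denary(4768, 16))         # denary to hex
    print(binary_to_hex("1 1000 1010 1000 0111"))
    print(percent_encode_all("www.hodder.co.uk"))
```

Running the sketch on the worked examples gives the same answers as the hand calculations: 68, 12A0, 18A87, and the encoded web address shown above.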
Binary Addition

Representation of Binary Integers

Whole numbers such as 5, 7, 12, and 3988 are called integers. Unsigned integers are zero or positive by definition, while signed integers can be positive or negative.

Unsigned integers in binary: With n bits, 2^n different unsigned integers can be represented. The minimum value that can be represented with n bits is 0, for which all the n bits are set to 0. The maximum value occurs when all the n bits are set to 1, and it is calculated as 1 + 2 + ... + 2^(n-2) + 2^(n-1), which is equal to 2^n - 1.

Note:
1. What is the highest value unsigned integer that can be represented with 16 bits?
   2^16 - 1 = 65,536 - 1 = 65,535
2. What is the lowest value unsigned integer that can be represented with 16 bits?
   2^0 - 1 = 1 - 1 = 0 (all 16 bits are set to 0)

Adding unsigned integers

The basic rules for binary addition are:
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 0 and carry 1
1 + 1 + 1 = 1 and carry 1

Example 1:

    0 0 1 0 0 1 1 1
  + 0 1 0 0 1 0 1 0
  -----------------
    0 1 1 1 0 0 0 1

Double-check the above value:
(00100111)2 = (39)10
(01001010)2 = (74)10
74 + 39 = 113 = 64 + 32 + 16 + 1 = (01110001)2

Overflow: The largest number that can be stored in an 8-bit binary register is 255. What happens when two 8-bit binary values are added and the result is larger than 255? When you reach the final (leftmost) column, the calculation is 1 + 1 = 10. You write the 0 in the answer section, but there is nowhere to carry the 1. In the example, this happens because the two 8-bit binary values add up to 306, which is greater than 255. The arithmetic result exceeds the available number of bits; this is called an overflow error.

Logical Binary Shifts: Computers can carry out a logical shift on a sequence of binary digits. A logical shift means moving the binary number to the left or to the right. Each shift left is equivalent to multiplying the binary number by 2. Each shift right is equivalent to dividing the binary number by 2.
As the bits shift, any empty positions are filled with zeros.

Example 1 (Left Shift):

The original number is 00010101, which is 21 in denary.

Suppose we shift the original number two places left: the binary number 01010100 is 84 in denary; this is 21 × 2^2.

Suppose we shift the original number three places left: the binary number 10101000 is 168 in denary; this is 21 × 2^3.

Suppose we shift the original number 00010101 four places left: the result is 01010000 and the leftmost 1-bit has been lost. In our 8-bit register, the result of 21 × 2^4 is 80, which is incorrect (the true answer is 336). This error occurs because we have exceeded the maximum number of left shifts possible using this register.

Example 2 (Right Shift):

The original number is 11001000, which is 200 in denary.

Shift the original number one place to the right: the value is now 200 ÷ 2^1, which is 100. Check by converting the new binary number 01100100 to denary (64 + 32 + 4 = 100).

Suppose we shift the original number two places to the right: the binary number 00110010 is 50 in denary; this is 200 ÷ 2^2.

Suppose we shift the original number three places to the right: the binary number 00011001 is 25 in denary; this is 200 ÷ 2^3.

Suppose we shift the original number four places to the right: the result is 00001100 and the rightmost 1-bit has been lost. In our 8-bit register, the result of 200 ÷ 2^4 is 12, which is incorrect (the true answer is 12.5). This error occurs because we have exceeded the maximum number of right shifts possible using this register.

Signed integers in binary

In denary, negative integers are represented using a minus symbol before the value of the number, e.g. -19. In computer systems, there are two ways to represent signed integers in binary: two's complement, and sign and magnitude.
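Before turning to signed values in detail, the unsigned 8-bit addition, overflow, and logical shift rules covered so far can be checked with a short Python sketch. The 8-bit register width and all function names are assumptions made for this illustration.

```python
WIDTH = 8  # register size in bits (assumed, to match the 8-bit examples)


def add_unsigned(a: str, b: str) -> tuple[str, bool]:
    """Add two binary strings; return the 8-bit result and an overflow flag."""
    total = int(a, 2) + int(b, 2)
    overflow = total >= 2 ** WIDTH          # the carry out of the last column is lost
    return format(total % 2 ** WIDTH, f"0{WIDTH}b"), overflow


def shift_left(bits: str, places: int) -> str:
    """Logical shift left: zeros enter on the right, bits fall off the left."""
    return (bits + "0" * places)[-WIDTH:]


def shift_right(bits: str, places: int) -> str:
    """Logical shift right: zeros enter on the left, bits fall off the right."""
    return ("0" * places + bits)[:WIDTH]


if __name__ == "__main__":
    print(add_unsigned("00100111", "01001010"))   # 39 + 74
    print(shift_left("00010101", 4))              # 21 shifted four places left
    print(shift_right("11001000", 4))             # 200 shifted four places right
```

Shifting 00010101 four places left in this sketch gives 01010000 (80 rather than 336), and shifting 11001000 four places right gives 00001100 (12 rather than 12.5), which matches the errors described in the shift examples.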
Two's Complement: the place values for a number with 8 bits are:

-128  64  32  16  8  4  2  1

As a further example, using two's complement representation, the place values for a number with 6 bits are:

-32  16  8  4  2  1

To represent a positive number using two's complement, the first (most significant) binary digit has to be 0. For example, (78)10 is represented as:

-128  64  32  16  8  4  2  1
   0   1   0   0  1  1  1  0

Double-check using the place values: 64 + 8 + 4 + 2 = 78

Method 1: To represent a negative number using two's complement, the first (most significant) binary digit has to be 1. For example, (-95)10 is represented as:

-128  64  32  16  8  4  2  1
   1   0   1   0  0  0  0  1

Double-check using the place values: -128 + 32 + 1 = -95

Method 2: The two's complement method:
- First, write the positive value in binary and take its one's complement (invert all the bits, changing 0s to 1s and 1s to 0s).
- Then, add 1 to the result.

Example 1: Representation of -35 in binary

Convert 35 to a binary number:         00100011
Take the one's complement of this:     11011100
Then add 1:                            11011101

11011101 is the binary representation of -35 using two's complement.

Data Storage

Text Representation: All types of data, such as numbers, text, images, and sound, must be held in binary form inside the computer.
- Whole numbers are called integers. One byte of storage can store all the numbers from 0 to 255.
- Fractions or numbers with decimal points are called floating-point numbers. The computer uses an extra byte of data to store the position of the point.
- Every letter of the alphabet is given a code number. The basic character code is ASCII. Each letter occupies 1 byte.

Formats: The different ways of storing digital data are called formats. Every computer file has a name and a file extension. The file extension tells you what type of data is stored in that file.
Pathname: The pathname is a sequence of symbols and names that identifies a file. If only a filename is given, the operating system looks for that file in your current working directory. If the file resides in a different directory, you must tell the operating system how to find that directory.

Examples of file extensions:
.txt   means the file holds text stored as ASCII code
.doc   means the file holds text and document formatting
.exe   means the file holds instructions that the computer can execute

Common file extensions by category:

Text files
.doc    Microsoft Word Document
.docx   Microsoft Word Open XML Document
.rtf    Rich Text Format File
.txt    Plain Text File

Audio files
.mid    MIDI File
.mp3    MP3 Audio File
.mpa    MPEG-2 Audio File
.wav    WAVE Audio File

Video files
.mp4    MPEG-4 Video File
.mpg    MPEG Video File
.swf    Shockwave Flash Movie
.vob    DVD Video Object File
.wmv    Windows Media Video File

Data files
.pps    PowerPoint Slide Show
.ppt    PowerPoint Presentation
.xml    XML File
.xls    Excel Spreadsheet

Image files
.pct    Picture File
.gif
.jpeg
.jpg

Compressed files
.rar    WinRAR Compressed Archive
.zip    Zipped File

Developer files
.cpp    C++ Source Code File
.c      C/C++ Source Code File

Page layout files
.pdf    Portable Document Format File
.indd   Adobe InDesign Document

Images in computer: A computer image is made up of many tiny dots, often millions, called pixels (picture elements). A pixel represents one point of light on the screen. The two common methods used to store images in digital form are bitmap and vector graphics.

Bitmap graphics: A bitmap file stores the position and color of every pixel that makes up an image. The color of each pixel is stored using a number code. A bitmap is a good way to store photographs.

Vector graphics: A vector image is made up of shapes constructed from lines. The computer stores mathematical formulas that tell it how to draw the shapes and lines. A vector file is usually smaller than a bitmap file. Vector graphics are good for images made of simple lines and shapes, for example, cartoons, diagrams, and graphs.
The file extension .bmp is used for bitmap graphics files. The file extension .svg is an example of a vector graphic file.

Pixelation: If a bitmap image is made larger, all the dots get bigger and the image looks distorted. This is called pixelation. Pixelation does not affect vector images, because they are redrawn at the correct size using the stored mathematical formulas.

Vector vs Bitmap image

Bitmap Images:
- Bitmap images are made up of a two-dimensional matrix of pixels.
- Each pixel is represented by a binary number.
- The bitmap image is stored as a series of binary numbers.
- A black and white image needs only one bit per pixel (0 or 1).
- If each pixel is represented by 2 bits, then the pixel can be one of four colors (2^2 = 4): 00, 01, 10, 11.

Colour depth: The number of bits used to represent the color of each pixel is called the colour depth (bit depth). An 8-bit color depth means that each pixel can be one of 256 colors (2^8 = 256).

Image resolution: The number of pixels that make up an image. An increase in the image resolution will increase the file size.

Representation of Sound:

Each sound wave has a wavelength, frequency, and amplitude. Sound is analog, so sound waves need to be sampled to be stored in the computer. Sampling means measuring the amplitude of the sound wave. This is done by an analog-to-digital converter (ADC). To convert analog data to digital, the sound waves are sampled at regular time intervals.

Sample resolution: The number of bits per sample is known as the sample resolution (bit depth).

Sampling rate (frequency): The number of sound samples taken per second. This is measured in hertz (Hz). 1 Hz = 1 sample per second.

Benefits and drawbacks of using large sampling resolution when recording the sound

CDs have a 16-bit sampling resolution and a 44.1 kHz sample rate, that is, 44,100 samples every second. This gives high-quality sound.
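The colour depth and sampling ideas above can be tried out in a short Python sketch. It is only an illustration under stated assumptions: the "sound" is an ideal sine wave rather than a real recording, and the function names and parameters are invented for this example.

```python
import math


def colours_for_depth(bits_per_pixel: int) -> int:
    """Number of different colours available at a given colour depth (2^bits)."""
    return 2 ** bits_per_pixel


def sample_wave(frequency_hz: float, sample_rate_hz: int,
                resolution_bits: int, duration_s: float) -> list[int]:
    """Measure a sine wave at regular intervals and store each amplitude
    as a whole number between 0 and 2^resolution_bits - 1."""
    levels = 2 ** resolution_bits
    sample_count = int(sample_rate_hz * duration_s)
    samples = []
    for n in range(sample_count):
        time_s = n / sample_rate_hz                      # regular time intervals
        amplitude = math.sin(2 * math.pi * frequency_hz * time_s)  # -1.0 to 1.0
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples


if __name__ == "__main__":
    print(colours_for_depth(8))
    # A 1 Hz wave sampled 8 times per second with a 3-bit resolution (8 levels).
    print(sample_wave(1, 8, 3, 1))
```

Raising the sample rate gives more samples per second and raising the sample resolution gives more possible levels per sample, so both make the stored values follow the original wave more closely, at the cost of more data to store.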
Memory Size Calculations

File Size Calculations

Image File Size Calculation:

Formula:

File size (KB) = (horizontal pixels × vertical pixels × bit depth) / (8 × 1024)

Dividing by 8 converts bits to bytes, and dividing by 1024 converts bytes to kilobytes.

Bit depth (bits per pixel)    Number of colours or tones    Where does the number of colours come from?
1                             2                             2^1 = 2
2                             4                             2^2 = 4
3                             8                             2^3 = 8
4                             16                            2^4 = 16
6                             64                            2^6 = 64
8                             256                           2^8 = 256

Audio File Size:

The size of an audio file depends on:
- Sample rate in Hz
- Bit depth / bit resolution
- Length (time in seconds)
- Number of channels (mono / stereo)

Sample rate    Bit resolution
11.025 kHz     8 bit
22.05 kHz      8 bit
44.1 kHz       16 bit

Audio file size (KB) = (sample rate × bit resolution × time in seconds × channels) / (8 × 1024)

Bit Rate: How many bits of audio are processed every second. Bit rates are usually measured in kilobits per second (kbps).

Bit rate (kbps) = (sample rate × bit resolution × channels) / 1000

Examples:

1. What if the audio is 1 minute long, with a 16-bit resolution and a 44.1 kHz sample rate, in stereo? What would be the file size?

Size = (44.1 × 1000 × 16 × 60 × 2) / (8 × 1024 × 1024) = 10.09 MB
Bit rate = (44100 × 16 × 2) / 1000 = 1411.2 kbps

2. Five minutes of music is sampled at 40,000 samples per second, and each sample is encoded into 16 bits (2 bytes). How big will the resulting music file be?

5 minutes = 300 seconds, so there are 300 × 40,000 = 12,000,000 samples. Each sample occupies 2 bytes, making a file size of 300 × 40,000 × 2 bytes = 24,000,000 bytes = 24 MB.

3. Five minutes of music is sampled at 8000 samples per second, and each sample is encoded into 16 bits (2 bytes). How big will the resulting music file be?

5 minutes = 300 seconds, so there are 300 × 8,000 = 2,400,000 samples.
Each sample occupies 2 bytes, making a file size of 300 × 8000 × 2 bytes = 4,800,000 bytes = 4.8 MB.

Video file Calculation:

The size of a video file depends on:
- Audio data
- Image data
- Frame rate (frames per second)
- Length of the video

Number of frames = frame rate × time in seconds

File size of one frame (size of one image, in KB) = (horizontal pixels × vertical pixels × bit depth) / (8 × 1024)

Size = audio data + (image data × frame rate × length)

TEXT / Database file Size Calculations

Standard datatype sizes:

Storage size: 1 byte
  Document / Spreadsheet: a single character in the text; whole numbers 0 to 255
  Database (MS Access): Text data type; whole number (-128 to +127), Number / Byte data type; Yes/No data type

Storage size: 2 bytes
  Document / Spreadsheet: a single character from a large Asian character set
  Database (MS Access): whole number (-32,768 to +32,767), Number / Integer / Short data type

Storage size: 4 bytes
  Document / Spreadsheet: a whole number between -2 billion and +2 billion; single-precision floating point (6 decimal digits)
  Database (MS Access): Number / Long Integer type; Number / Single data type

Storage size: 8 bytes
  Document / Spreadsheet: double-precision floating point (15 digits); massive whole numbers
  Database (MS Access): Number / Double data type; Date/Time data type; Currency data type

Example: Estimate the file size of a word-processed document which contains 2000 characters and a small image which is 300 × 300 pixels in size with 8 colors. Assume an extra 20 KB for other document features.

Characters = 2000 bytes
Image size = 300 × 300 × 3 = 270,000 bits = 33,750 bytes (8 colors need 3 bits per pixel, since 2^3 = 8)
Extra = 20 KB = 20,480 bytes
File size = 2000 + 33,750 + 20,480 = 56,230 bytes = 54.9 KB

Or, taking 1 KB as 1000 bytes: 2000 + 33,750 + 20,000 = 55,750 bytes; 55,750 / 1000 = 55.7 KB (approximately)

Download / Upload Speed calculations:

A company offers the following Internet broadband transfer rates:
- 56 megabits per second DOWNLOAD
- 16 megabits per second UPLOAD

If each music track is 3.5 megabytes in size, how long would it take Juan to download his 40 tracks?
40 tracks = 40 × 3.5 = 140 MB
56 megabits/sec = 7 megabytes/sec
Time taken to download the tracks = 140 / 7 = 20 seconds

Exercise: Estimate the size of a file for each of the following:

a) A text file containing 256 characters.
b) A word processor document containing 1000 characters and a small image which is 256 × 256 pixels in size (assume an extra 14 KB for other document features).
c) A database is to contain a person's name and address.
   i) Decide the maximum number of characters you would need to input a name and an address.
   ii) Estimate the maximum size of the files needed to store the name and address.
d) A security system contains a password (16 characters long), a username (20 characters long), a small photo (256 × 640 pixels in size), and security questions (up to 72 characters long). Estimate the file size needed to store all this data.

Activity 1: Find an uncompressed image. Right-click on the thumbnail of the image and click on Properties. Now click the Summary tab. You can now find all the details you need to work out the file size of the image.

Answer the following:

1. Dima has agreed to send Michaela a 20-megabyte file. They both have a broadband connection. Dima has to upload his file to a server and then Michaela needs to download it from the same server. The broadband data transfer rates (speeds) are:
   - 1 megabit per second to upload a file
   - 8 megabits per second to download a file
   (Note: 8 bits = 1 byte)
   (i) How long does it take to upload Dima's file?
   (ii) How long does it take to download Dima's file?

2. A company produces animation effects using computers rather than producing them manually. Each image takes about 400 kilobytes of storage. 25 images per second are produced. How much memory would be needed to store a 30-minute animation?

3.
Each image size is 400 kilobytes (0.4 megabytes).
   (i) How many images can be stored before the hard disk is full?
   (ii) Once the hard disk is full, how can the system ensure that the stored images are not lost?

4. A company advertises its Internet broadband speeds as follows:
   - download speed of 128 megabits per second
   - upload speed of 16 megabits per second
   (8 bits = 1 byte)
   (a) Explain what is meant by the two terms download speed and upload speed.
   (b) How many 4-megabyte files could be downloaded per second using this company's broadband?

File Format and Compression

Compression

Lossless Compression: Breaking up the files into "smaller" forms for transmission or storage and then putting them back together exactly, so that no data is lost.

Example: lossless compression is required for text and data files, such as bank records and text articles. It is used in the ZIP file format and in the UNIX tool gzip. Some image formats, such as PNG or GIF, use only lossless compression.

Lossy compression works very differently. These programs simply eliminate "unnecessary" bits of information, tailoring the file so that it is smaller. This type of compression is used a lot for reducing the file size of bitmap pictures, which tend to be fairly bulky.

Lossy compression is most commonly used to compress multimedia data (audio, video, and still images), especially in applications such as streaming media and internet telephony. Formats such as TIFF and MNG may use either lossless or lossy methods.

BMP (Bit Map image files): This is an uncompressed image file. The raw bitmap is often stored as .BMP or .TIFF files.

JPG or JPEG (Joint Photographic Experts Group): This is a compressed bitmap image file format commonly used for photographs (lossy).

GIF (Graphics Interchange Format): This is a lossless bitmap image compression standard, but it is only suitable for images with a limited number of colors, such as logos.
PDF (Portable Document Format): This is an open standard for exchanging documents. Text and graphics are displayed exactly as in the original, with no need to have the software that created the document.

MP3 (Moving Picture Experts Group Audio Layer 3): This has become the standard for distributing digital music files on the internet. It uses lossy compression to reduce file sizes to about a tenth of the original.

MPEG (Moving Picture Experts Group): This is a set of standards designed to encode audio/visual information. It uses lossy compression for both the sound and the visual components.

Below are comparisons of the same image saved in several popular file types.

File type                                            Size
TIFF, uncompressed                                   901K
TIFF, LZW lossless compression (yes, it is bigger)   928K
JPG, high quality                                    319K
JPG, medium quality                                  188K
JPG, moderate web quality                            105K
JPG, low quality / high compression                  50K
JPG, absurdly high compression                       18K
PNG, lossless compression                            741K
GIF, lossless compression, but only 256 colors       131K

Comparison of MP3 with CD audio files

- MP3 uses audio compression technology to convert the music to MP3 format, which reduces the file size by about 90 percent.
- MP3 files can be used in MP3 players, computers, or mobiles, whereas audio in CD format can be played only on devices that can read CDs, such as computers with a CD drive.
- The music quality on a CD is better when compared with MP3 files, as the CD holds the full version of the audio.
- By using a file compression algorithm based on perceptual music shaping, most of the perceived quality of MP3 files is retained: the algorithm removes the sounds that the human ear can't hear properly, and removes the softer sound when two sounds are played at the same time.

Text and number file formats:

Text files can also be compressed, using an algorithm that works on redundancy, i.e. repeated sections of words. Usually, text files use the ASCII format, in which each character is coded as a number that can be written in denary or hexadecimal.
Example: The phrase "WHEN IT IS SNOWING HEAVILY LOOK OUTSIDE LOOK OUTSIDE IT IS HEAVILY SNOWING"

Excluding the spaces between the words and the full stop, the message has a total of 62 characters. 1 character requires 1 byte, so 62 bytes of memory would be needed to store the above message.

Instead of storing all 62 characters, each different word can be stored once, together with the positions at which it occurs in the message:

Word       Positions
WHEN       1
IT         2, 10
IS         3, 11
SNOWING    4, 13
HEAVILY    5, 12
LOOK       6, 8
OUTSIDE    7, 9

The above table needs 1 byte for each character in each word and 1 byte for each position at which a word occurs in the message. So, 33 bytes are needed to store the words and 13 bytes to store the positions, giving a total of 46 bytes. This is much less than the 62 bytes required with the original method. No data has been lost and we have reduced our storage requirements by 26%. To recreate the message, the computer simply retrieves the words and places them in the positions allocated.

Run-length encoding (RLE) can be used for lossless compression of many different file formats:
- it is a form of lossless/reversible file compression
- it reduces the size of a string of adjacent, identical data (e.g. repeated colors in an image)
- a repeating string is encoded into two values:
  - the first value represents the number of identical data items (e.g. characters) in the run
  - the second value represents the code of the data item (such as the ASCII code if it is a keyboard character)
- RLE is only effective where there is a long run of repeated units/bits.

Using RLE on text data

Consider the following text string: 'aaaaabbbbccddddd'. Assuming each character requires 1 byte, this string needs 16 bytes. If we assume ASCII code is being used, then the string can be coded as follows:

aaaaa     bbbb      cc        ddddd
05 97     04 98     02 99     05 100

This means we have five characters with ASCII code 97, four characters with ASCII code 98, two characters with ASCII code 99, and five characters with ASCII code 100.
Assuming each number in the second row requires 1 byte of memory, the RLE code will need 8 bytes. This is half the original file size.
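The two lossless methods described above, storing word positions and run-length encoding, can be sketched in Python. This is a minimal illustration, not a real file format: the function names are invented for the example, and it assumes, as in the text, that every count, code, and position fits in 1 byte.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[int, int]]:
    """Encode each run as (number of identical characters, ASCII code)."""
    return [(len(list(run)), ord(char)) for char, run in groupby(text)]


def rle_decode(pairs: list[tuple[int, int]]) -> str:
    """Rebuild the original text, which shows that no data was lost."""
    return "".join(chr(code) * count for count, code in pairs)


def word_position_size(message: str) -> int:
    """Bytes needed to store each different word once plus one position per word."""
    words = message.split()
    letters_in_unique_words = sum(len(word) for word in set(words))
    return letters_in_unique_words + len(words)


if __name__ == "__main__":
    pairs = rle_encode("aaaaabbbbccddddd")
    print(pairs)                 # four (count, code) pairs = 8 bytes
    print(rle_decode(pairs))     # the original 16-character string
    phrase = ("WHEN IT IS SNOWING HEAVILY LOOK OUTSIDE "
              "LOOK OUTSIDE IT IS HEAVILY SNOWING")
    print(word_position_size(phrase))
```

Decoding the pairs gives back exactly the original string, which is what makes the method lossless. For a string with no repeated runs, such as 'abcd', the encoded form would be twice as long as the original, which is why RLE is only effective on long runs.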