Summary

This presentation covers fundamental concepts of data storage in computer systems. It introduces topics such as bits, main memory, mass storage, and Boolean operations, providing a foundational understanding of how computers handle information.

Full Transcript


Chapter 1: Data Storage
1.1 Bits and Their Storage
1.2 Main Memory
1.3 Mass Storage
1.4 Representing Information as Bit Patterns
1.5 The Binary System
1.6 Storing Integers
1.7 Storing Fractions
1.8 Data Compression
1.9 Communications Errors

1-1 Bits and their meaning
Bit (or binary digit): a symbol whose meaning depends on the application at hand. Some possible meanings for a single bit:
– Numeric value (1 or 0)
– Boolean value (true or false)
– Voltage (high or low)

1-2 Bit patterns
All data stored in a computer are represented by patterns of bits:
– Numbers
– Text characters
– Images
– Sound
– Anything else…

1-3 Boolean operations
A Boolean operation is any operation that manipulates one or more true/false values. It can be used to operate on bits and is often referred to as a logical operation. Specific operations (a short C sketch follows the figure below):
– AND
– OR
– XOR ('exclusive OR': X or Y but not both)
– NOT

1-4 Figure 1.1 The Boolean operations AND, OR, and XOR (exclusive or)
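To make the Boolean operations concrete, here is a minimal C sketch (our own illustration, not part of the slides) applying them to single bits with C's bitwise operators.

    #include <stdio.h>

    int main(void) {
        int x = 1, y = 0;                    /* two single-bit values */

        printf("x AND y = %d\n", x & y);     /* 1 only if both are 1  -> 0 */
        printf("x OR  y = %d\n", x | y);     /* 1 if either is 1      -> 1 */
        printf("x XOR y = %d\n", x ^ y);     /* 1 if exactly one is 1 -> 1 */
        printf("NOT x   = %d\n", (~x) & 1);  /* complement, masked to one bit -> 0 */
        return 0;
    }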
1-5 Gates
Gates are devices that produce the outputs of Boolean operations when given the operations' input values.
– Often implemented as electronic circuits
– Provide the building blocks from which computers are constructed

1-6 Figure: A pictorial representation of AND, OR, XOR, and NOT gates, as well as their input and output values

1-7 Flip-flops
A flip-flop is a circuit built from gates that can store one bit of data.
– Has an input line which sets its stored value to 1
– Has an input line which sets its stored value to 0
– While both input lines are 0, the most recently stored value is preserved

1-8 Figure 1.3 A simple flip-flop circuit

1-9 Figure 1.4 Setting the output of a flip-flop to 1 (cont'd)

1-10 Hexadecimal notation
The hexadecimal system is a counting system using base 16 (cf. decimal, which uses base 10 – see later notes). Hexadecimal notation is a shorthand notation for long bit patterns:
– Divides a pattern into groups of four bits each
– Represents each group by a single symbol
Example: 10100011 becomes A3

1-11 Figure 1.6 The hexadecimal coding system
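The four-bits-per-symbol grouping is easy to see in code. Here is a small C sketch (our own illustration; the variable names are ours) that prints a byte's two hexadecimal digits by splitting it into its high and low four-bit groups.

    #include <stdio.h>

    int main(void) {
        unsigned char pattern = 0xA3;          /* the bit pattern 10100011 */
        unsigned high = (pattern >> 4) & 0xF;  /* upper four bits: 1010 -> A */
        unsigned low  = pattern & 0xF;         /* lower four bits: 0011 -> 3 */

        printf("%X%X\n", high, low);           /* prints A3 */
        return 0;
    }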
1-12 Main Memory
The main memory acts as the central storage unit in a computer system. It is a relatively large, fast memory used to store programs and data during run-time operations. The primary technology used for main memory is based on semiconductor integrated circuits.

1-13 Main memory: cells
Cell: a unit of main memory (typically 8 bits, which is one byte).
– Most significant bit (MSB): the bit at the left (high-order) end of the conceptual row of bits in a memory cell
– Least significant bit (LSB): the bit at the right (low-order) end of the conceptual row of bits in a memory cell

1-14 Figure 1.7 The organization of a byte-size memory cell (MSB at the left, LSB at the right)

1-15 Word: a general term describing several consecutive bytes in memory (usually 2 or 4). In a 2-byte word the bits are numbered from 15 (MSB) down to 0 (LSB); in a 4-byte word, from 31 down to 0.

1-16 Main memory addresses
Address: a "name" that uniquely identifies one cell in the computer's main memory.
– The names are actually numbers.
– These numbers are assigned consecutively, starting at zero.
– Numbering the cells in this manner associates an order with the memory cells: cells have an order, so "previous cell" and "next cell" have reasonable meanings.

1-17 Figure: Memory cells arranged by address

1-18 Main Memory
Every computer contains both Random Access Memory (RAM) and Read-Only Memory (ROM). RAM is a type of volatile memory that holds the files people are working on for a short period, whereas ROM is a type of non-volatile memory that holds instructions for the computer indefinitely. Data stored in RAM can be retrieved and modified, because RAM can be both read and written; ROM can only be read.

1-19 Main Memory: Physical view
Memory is composed of a number of banks (Bank 0 … Bank K). Each bank has a number of bytes (numbered 0 … N) arranged as an array.

1-20 Main Memory: Physical view
Some bytes are kept in RAM, some in ROM, and some in I/O ports (special-purpose registers, Port 1 … Port n). Bytes in RAM can be read/written; bytes in ROM can be read only.

1-21 Main Memory: Logical view
("Logical" means "as a programmer would imagine it.")
Memory is composed of a number of bytes. Bytes are arranged as an array, and each byte has an address (byte #: 0, 1, 2, 3, …).

1-22 Measuring memory capacity
Kilobyte: 2^10 bytes = 1024 bytes
– Example: 3 KB = 3 × 1024 bytes
Megabyte: 2^20 bytes = 1,048,576 bytes
– Example: 3 MB = 3 × 1,048,576 bytes
Gigabyte: 2^30 bytes = 1,073,741,824 bytes
– Example: 3 GB = 3 × 1,073,741,824 bytes

1-23 Mass Storage Systems
Non-volatile: data remains when the computer is off. Usually much bigger, but much slower, than main memory.
– Magnetic systems: disk, tape
– Optical systems: CD, DVD
– Flash technology: flash drives, Secure Digital (SD) memory cards

1-24 Figure 1.9 A disk storage system
Each track contains the same number of sectors, each with the same number of bits – unless zoned-bit recording is used to add more sectors to the longer, outer tracks.

1-25 CD & DVD data storage
One continuous spiral data track: in one revolution, more data is read from the outer part of the track than from the inner. A uniform rate of data transfer can only be achieved by varying the rotation speed. Most systems use a constant rotation speed, so the system must handle varying data transfer rates. Typical capacity: 600-700 MB (CD); several GB (DVD).

1-26 Files
A file is the unit of data stored on a mass storage system.
– Logical record: a natural group of data within a file, i.e. natural from the programmer's point of view
– Field: a sub-unit of a logical record; a key field holds an identifying key
Physical record: a block of data conforming to the physical characteristics of the storage device.
Buffer: a memory area used for the temporary storage of data.

1-27 Figure 1.12 Logical records versus physical records on a disk

1-28 Representing Data
External representation (for people): to use data easily, data must be represented (visually, graphically, …) on paper or on some other directly usable medium.
Internal representation (for computing machines): to support storage and computation, data must be represented as a sequence of 0's and 1's.

1-29 Representing Data
How do we represent data on paper? In memory? Coding is representing data according to a system, internally or externally. Information is coded:
– For internal computations
– For presentation purposes (text, graphics, sound)
– For long-term archival storage (databases, archives)

1-30 Representing Data
In all cases, data must be coded as a data type in order to be recognized later. In a computer system, text data must be represented according to standard coding table(s), and numerical data must be represented by some notation (a number system).

1-31 Representing Numeric Data
Internal number representation: always a sequence of 0's and 1's: …01100111001010…
External number representation: many notations (number representation systems):
– Context-dependent notations
– Positional (fixed-point) notations
– Floating-point notation

1-32 Representing Numeric Data (Externally)
Non-positional notation – Roman numerals: positions do not carry weights; digits are interpreted according to context. Examples: XIV = 14, MCMLXXIV = 1974. Each numeral adds or subtracts its value depending on context:
I = 1 or -1 (add 1 or subtract 1)
V = 5 or -5
X = 10 or -10
L = 50 or -50
C = 100 or -100
D = 500 or -500
M = 1000
(Negative numbers had not been invented yet!)

1-33 Representing Numeric Data (Externally)
Positional notation. For representing data externally, the most commonly used number systems are:

Number system   Radix (base)   Digits used
Decimal         10             0 1 2 3 4 5 6 7 8 9
Binary          2              0 1
Octal           8              0 1 2 3 4 5 6 7
Hexadecimal     16             0 1 2 3 4 5 6 7 8 9 A B C D E F

1-34 Representing Non-numeric Data
When the computing device has to handle letters and special symbols, each symbol is coded separately. Two dominant coding schemes are:
– ASCII (American Standard Code for Information Interchange)
– EBCDIC (Extended Binary Coded Decimal Interchange Code)

1-35 Representing Strings in ASCII
String: a sequence of bytes (characters). It generally contains text characters, can have any number of characters, and its end point must be specified in some way. Example: A N K A R A, with each letter stored as its code from the ASCII table.

1-37 Figure 1.13 The message "Hello." in ASCII

1-40 Octal encoding: for encoding a bit stream in groups of three.

Bit pattern   Octal representation
000           0
001           1
010           2
011           3
100           4
101           5
110           6
111           7

1-41 Representing Images
Two approaches:
(1) Bit-map techniques
– Represent the image as a collection of dots (pixels)
– Black and white binary images: pixel values are either 0 (black) or 1 (white)
– Black and white (monochrome) images: pixel values range from 0 to 255 (commonly)
– Colour images: each pixel is represented as 3 colour components – Red, Green & Blue (RGB)
– Problems with resizing and zooming

1-42 Representing Still Images
(2) Vector techniques
– The image is represented as a set of lines and curves (circles or polygons), specified as a description
– Details of the drawing are left to the device
– Used in computer-aided design systems

1-43 Figure: Colour depth examples – 1-bit, 4-bit, 8-bit and 24-bit colour

1-44 The colour depth of an image is determined by the available palette size:
– 1-bit images support 2 "colours" (black/white binary images)
– 8-bit images support 256 grey levels
– 24-bit images support 16 million colours
Many image formats:
– .png – Portable Network Graphics format
– .bmp – Windows bitmap
– .pic – PC Paint graphics format
– .mac – Macintosh MacPaint format
– .gif – Graphics Interchange Format; common for graphics on the world wide web
– .jpg – JPEG image (Joint Photographic Experts Group); platform independent, used for photos
– .tif – Tagged Image File Format (TIFF)

1-45 Representing numeric values
Binary notation uses bits to represent a number in base two. Limitations of computer representations of numeric values:
– Overflow happens when a number is too big to be represented.
– If a bit sequence is cut off to fit the space available, this is truncation.

1-46 Figure: base ten (decimal, or denary) and base two (binary) systems

1-48 Figure 1.16 Decoding the binary representation 100101

1-49 Figure 1.17 An algorithm for finding the binary representation of a positive integer

1-50 Figure 1.18 Applying the algorithm to obtain the binary representation of thirteen

1-51 Informal restatement of the algorithm
Finding the binary representation of a positive integer P: repeatedly divide P by 2, recording the remainders from right to left. Stop when you get 0 in the result (not as the remainder).
Example: P = 13
13 = 2 × 6 + 1    → 1
 6 = 2 × 3 + 0    → 01
 3 = 2 × 1 + 1    → 101
 1 = 2 × 0 + 1    → 1101  STOP
(A C sketch of this repeated-division algorithm follows.)
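Here is a minimal C sketch of the repeated-division algorithm restated above; the function name and buffer handling are our own illustrative choices.

    #include <stdio.h>

    /* Print the binary representation of a positive integer p by
       repeatedly dividing by 2 and recording remainders right to left. */
    void print_binary(unsigned p) {
        char buf[33];                      /* room for 32 bits plus terminator */
        int i = 32;
        buf[i] = '\0';
        do {
            buf[--i] = '0' + (p % 2);      /* remainder becomes the next digit */
            p /= 2;                        /* stop when the quotient reaches 0 */
        } while (p > 0);
        printf("%s\n", &buf[i]);
    }

    int main(void) {
        print_binary(13);                  /* prints 1101, as in the example */
        return 0;
    }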
1-52 Figure 1.19 The binary addition facts

1-53 Decimal addition
Addition in base 10:

    carry    1 0 1
             1 8 0 9
           + 0 3 2 3
           ---------
             2 1 3 2

We start with a carry value of 0. Whenever a column sum gets larger than 9, we add 1 to the carry; the carry is added in at the next position.

1-54 Binary addition
Addition in base 2:

    carry  1 1 0 1
              1 1 0 1
            + 0 1 0 1
            ---------
            1 0 0 1 0

We start with a carry value of 0. Whenever a column sum gets larger than 1, we add 1 to the carry; the carry is added in at the next position.

1-55 Fractions in binary: decoding the binary representation 101.101

1-56 Finding the binary representation of a fraction
Informal algorithm for encoding: repeatedly subtract successive negative powers of 2, recording the success (1) or failure (0) of each subtraction, from left to right after the radix point. Stop when the subtraction gives the result 0.
Example: N = 5/8   (recall 2^-1 = 1/2, 2^-2 = 1/4, 2^-3 = 1/8)
5/8 = 1/2 + 1/8        → .1
1/8 = 0 × 1/4 + 1/8    → .10   (subtraction failed)
1/8 = 1/8 + 0          → .101  STOP

1-57 Binary representation of a fraction – same method
Example: M = 0.33
0.33   = 0 × 0.5      + 0.33    → .0
0.33   = 1 × 0.25     + 0.08    → .01
0.08   = 0 × 0.125    + 0.08    → .010
0.08   = 1 × 0.0625   + 0.0175  → .0101
0.0175 = 0 × 0.03125  + 0.0175  → .01010
0.0175 = 1 × 0.015625 + …       → .010101
The process may not terminate. Stop when the available space is filled, i.e. truncate the representation.
(A C sketch of this encoding loop follows.)
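The subtraction loop of slides 1-56 and 1-57 can be sketched in a few lines of C; this is our own illustration, with a fixed number of output bits to show the truncation the slide mentions.

    #include <stdio.h>

    /* Encode the fraction x (0 <= x < 1) in at most 'bits' binary digits
       by repeatedly subtracting successive negative powers of 2. */
    void encode_fraction(double x, int bits) {
        double power = 0.5;                 /* 2^-1, then 2^-2, ... */
        printf(".");
        for (int i = 0; i < bits && x > 0.0; i++) {
            if (x >= power) {
                x -= power;                 /* subtraction succeeds: record 1 */
                printf("1");
            } else {
                printf("0");                /* subtraction fails: record 0 */
            }
            power /= 2;
        }
        printf("\n");
    }

    int main(void) {
        encode_fraction(0.625, 8);          /* 5/8 -> .101 (terminates) */
        encode_fraction(0.33, 6);           /* -> .010101, truncated */
        return 0;
    }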
1-58 Representing Integers
Unsigned integers can be represented in base two; in 4 bytes the range is 0 to 4,294,967,295. Integers occupy 2^n bytes in memory (n is usually 1, 2 or 3). It is usual that the sign information is explicitly encoded.

1-59 Signed integers are whole numbers that can be positive or negative; in 4 bytes the range is -2,147,483,648 to 2,147,483,647. An integer can be coded in:
– Sign-magnitude notation
– One's complement notation
– Two's complement notation – the most popular representation
– Excess notation – another, less popular representation, often used for exponent values in floating-point representations

1-60 Representing Integers (Internally): Sign-magnitude
The MSB carries the sign (0: +, 1: -); the remaining bits carry the magnitude (= absolute value).
 6 : 0000 0000 0000 0110
-6 : 1000 0000 0000 0110

1-61 Representing Integers (Internally): Sign-magnitude
Two representations for 0:
+0 : 0000 0000 0000 0000
-0 : 1000 0000 0000 0000
What are the maximum and minimum values that can fit in sign-magnitude in 2 bytes?

1-62 Representing Integers (Internally): Sign-magnitude
16 bits: 1 sign bit + 15 bits for the magnitude.
Maximum magnitude: 2^0 + 2^1 + … + 2^14 = 2^15 - 1 = 1 + 2 + … + 16384 = 32767
Maximum value = 32767; minimum value = -32767.

1-63 Representing Integers (Internally): Complement notations
Complement notation provides another way of representing negative numbers. Two complement types are used frequently, differing in how negative numbers are coded:
– 1's complement: subtract each digit of the number from (base - 1), i.e. 0 becomes 1 and 1 becomes 0.
– 2's complement: subtract each digit of the number from 1, as above, then add 1 to the result.

1-64 Ranges of values that can be represented in n bits:
– 1's complement: -(2^(n-1) - 1) to 2^(n-1) - 1; e.g. for one byte (n = 8), from -127 to +127
– 2's complement: -2^(n-1) to 2^(n-1) - 1; e.g. for one byte (n = 8), from -128 to +127

1-65 The 1's complement representation of a positive integer is the same as the original binary, with the MSB set to 0 – i.e. the same as the sign-magnitude representation. The 1's complement of a negative integer is obtained by subtracting its magnitude from 2^n - 1, where n is the number of bits used to store the integer in binary. E.g. for one byte (n = 8), subtract from 1111 1111; -1 becomes: 1111 1111 - 0000 0001 = 1111 1110.

1-66 One's complement example using 8 bits (1 byte)
Has two representations of zero: 00000000 (+0) and 11111111 (-0). To add two numbers represented in this system, one does a conventional binary addition, but it is then necessary to add any resulting carry back into the resulting sum: the "end-around carry". E.g. add -1 and 2: 11111110 + 00000010 = 00000000 with carry 1; add this to the LSB, giving the result 00000001.

1-67 One's complement was popular in older computers (e.g. UNIVAC, PDP-1). The problems of multiple representations of 0 and the need for the end-around carry are eliminated by using two's complement. The 2's complement of a negative integer:
– Subtract each digit of its magnitude from 1
– Add 1 to the result

1-68 Representing Integers (Internally): Two's complement notation
Representing binary numbers (here, in one byte only) – the 2's complement of -19:
Binary code for +19 = 0001 0011
1's complement → 1111 1111 - 0001 0011 (i.e. flip each bit) = 1110 1100
2's complement = 1110 1100 + 1 = 1110 1101

1-69 Two's Complement
The 2's complement of a positive integer is the same as the original binary, with the MSB set to 0 – i.e. the same as the sign-magnitude representation. For a negative integer, begin as with 1's complement: subtract each digit of the number's magnitude from 1, then add 1 to the result. E.g. for one byte (n = 8), -1 becomes: 0000 0001 with its bits complemented = 1111 1110, and finally 1111 1111 after adding 1.

1-71 Figure 1.22 Coding the value -6 in two's complement notation using four bits (using an alternative algorithm). Start with the binary representation of the positive value.

1-72 Two's Complement
There is only one zero (00000000). Negating a number (whether negative or positive) is done by inverting all the bits and then adding 1 to the result. Addition of a pair of two's-complement integers is the same as addition of a pair of unsigned numbers (except for detection of overflow, if that is done).
(A short C sketch of the flip-and-add-one rule follows.)
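As a sketch of the flip-and-add-one rule (our own illustration, not from the slides), the following C program computes the two's complement of 19 in a single byte, reproducing the worked example above.

    #include <stdio.h>

    int main(void) {
        unsigned char pos  = 19;        /* 0001 0011 */
        unsigned char ones = ~pos;      /* flip every bit: 1110 1100 */
        unsigned char twos = ones + 1;  /* add 1:          1110 1101 */

        printf("two's complement of 19 = 0x%02X\n", (unsigned)twos);  /* 0xED */
        /* on a two's-complement machine, reading the byte as signed gives -19 */
        printf("as a signed value: %d\n", (signed char)twos);
        return 0;
    }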
1-73 Figure 1.21 Two's complement notation systems

1-74 Figure 1.23 Addition problems converted to two's complement notation

1-75 Two's-complement addition
The largest positive integer that can be stored in N bits in 2's complement form is 2^(N-1) - 1, whose binary bit pattern is one 0 followed by N-1 ones.
– E.g. for one byte (N = 8), the largest positive value = 2^7 - 1 = 127; 127 (decimal) = 0111 1111 (binary)
Therefore, if two positive 2's complement integers are added and their sum is greater than +127, an overflow will occur. E.g. 100 + 30: 01100100 + 00011110 = 10000010, where the MSB (sign bit) indicates a negative value.

1-76 Two's-complement addition
Rule for overflow detection: 2's complement overflow occurs when the carry into the MSB is not equal to the carry out of the MSB. (An overflow detector circuit is used.) E.g. in the previous example, 100 + 30 = 01100100 + 00011110 = 10000010: there is a carry into the MSB but no carry out, so overflow.

1-77 Two's-complement addition, second example: -68 + -60
-68 : 10111100
-60 : 11000100
Sum : 10000000
There is one carry into the MSB and one carry out – no overflow; the answer is -128.
(A C sketch of this overflow check follows.)
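Here is a minimal C sketch (our own illustration; the function name is ours) of the carry-in versus carry-out rule for 8-bit two's complement addition: it recovers the carry into the MSB by adding the lower seven bits, and the carry out from the full byte sum.

    #include <stdio.h>

    /* Detect two's-complement overflow when adding two bytes, using the
       rule from slide 1-76: overflow occurs when the carry into the MSB
       differs from the carry out of the MSB. */
    int overflows(unsigned char a, unsigned char b) {
        unsigned low = (a & 0x7F) + (b & 0x7F);  /* add the lower 7 bits   */
        int carry_in  = (low >> 7) & 1;          /* carry into the MSB     */
        int carry_out = ((a + b) >> 8) & 1;      /* carry out of the MSB   */
        return carry_in != carry_out;
    }

    int main(void) {
        /* 100 + 30 = 130 > 127: carry in, no carry out -> overflow */
        printf("100 + 30  overflows? %d\n", overflows(100, 30));      /* 1 */
        /* -68 + -60 = -128: carry in and carry out -> no overflow */
        printf("-68 + -60 overflows? %d\n", overflows(0xBC, 0xC4));   /* 0 */
        return 0;
    }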
1-78 Excess-N representation
A value is represented by the unsigned number which is N greater than the intended value. Thus 0 is represented by N, and -N is represented by the all-zeros bit pattern. E.g. with 4 bits, use excess-8 (the 4th bit has weight 8 in binary). NB: the MSB is 0 for negative values and 1 for non-negative values.

1-79 Figure 1.24 An excess eight conversion table

1-80 Figure 1.25 An excess notation system using bit patterns of length three (excess-4)

1-81 Representing Integers (Internally)
External data is converted into internal form with the aid of the programming language being used. Using C, a programmer would:
– Select a concrete type for the data (according to rules): int, long, char, double, …
– Select a name for the data (to be able to use it): int A; double F;
– Specify the value (in external notation) to be used: A = 0xFF; or A = 255;

1-82 Representing Integers (Internally)
During compilation, the compiler will:
– Check for correctness, overflows, underflows
– Select the size of the data (2-byte, 4-byte)
– Convert the value into internal form (sign-magnitude, 2's complement, floating point, …)
– Allocate memory and place the data in memory
– Generate code to correctly manipulate the data (integers in 2's complement arithmetic, floats in FP arithmetic, etc.)

1-83 Representing Integers (Internally)
An algorithm in coded form is a computer program, or the program's code. Translation from concepts to the computer's internal form is done in several phases, each involving coding in a different style:
– From concepts to a high-level language (*)
– From a high-level language to a low-level language
– From a low-level language to machine language
– From machine language to internal (voltage) form
(*) Only this is the programmer's task.

1-84 Terminology
Data type – Veri türü
Base – Taban
Representation – Gösterim
Notation – Yazım
Sign-magnitude (representation) – İşaret/büyüklük (gösterimi)
One's complement – 1'in tümleri
Two's complement – 2'nin tümleri

1-85 Data Compression
Reduce the amount of data needed:
– For data storage
– For data transfer
Compression ratio: the ratio of the original data size to the compressed size, e.g. 14:1. (compress → store/transfer → decompress)

1-86 Lossy vs. Lossless Compression
Lossy: the decompressed data differ from the original; data is permanently lost (JPEG, MP3, MPEG).
Lossless: the decompressed data are always exactly the same as the original (GIF, PNG, WAV, ZIP). Also called bit-preserving or reversible compression.

1-87 Generic Data Compression Techniques I
Run-length encoding: works by finding runs of repeated binary patterns and replacing each with a single instance of the pattern and a number specifying how many times the pattern is repeated. E.g. 888888833333 is encoded as 8 with count 7, then 3 with count 5. (A C sketch follows after the next two slides.)
Relative encoding: records the differences between consecutive data units rather than the entire units; each unit is encoded in terms of its relationship to the previous unit.

1-88 Generic Data Compression Techniques II
Frequency-dependent encoding: the length of the bit pattern used to represent a data item is inversely related to the frequency of the item's use (variable-length coding).
– Huffman codes: most commonly used to compress text documents; they work particularly well when characters appear many times within the file being compressed.

1-89 Figure: variable-length coding versus fixed-length coding
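Here is a minimal C sketch of the run-length encoding idea from slide 1-87, applied to a character string; the function name and output format are our own.

    #include <stdio.h>

    /* Print each run in the input as the repeated character followed by
       the length of its run, e.g. "888888833333" -> "8x7 3x5". */
    void run_length_encode(const char *s) {
        while (*s) {
            char c = *s;
            int count = 0;
            while (*s == c) { count++; s++; }   /* measure the run */
            printf("%cx%d ", c, count);
        }
        printf("\n");
    }

    int main(void) {
        run_length_encode("888888833333");      /* prints: 8x7 3x5 */
        return 0;
    }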
1-90 Adaptive Dictionary Encoding
Also called "dynamic dictionary encoding", e.g. the Lempel-Ziv (LZW) family of encoding systems. The "dictionary" is the set of blocks/units from which the message/file to be compressed is constructed, and the message is encoded as a sequence of references to the dictionary. Instead of storing the whole text as a set of character codes (letter by letter), the compression algorithm allocates a code for each different word and stores the code instead of the word. The dictionary is allowed to change during encoding.

1-91 LZW encoding: simplified example
Encoding:
– The message is xyx xyx xyx xyx
– The dictionary initially has 3 entries: "x", "y", " "
– First encode xyx as 121, i.e. as the 1st, 2nd and 1st dictionary entries
– Then the space, giving 1213
– The space means that the preceding string xyx is recognised as a word; add it to the dictionary as the 4th entry
– Encoding proceeds using the new entry; the entire message is encoded as 121343434

1-92 LZW encoding: simplified example
Decoding:
– The encoded message is 121343434
– The coding dictionary initially has 3 entries: "x", "y", " "
– First decode 1213 as xyx followed by a space
– The space means that the preceding string xyx is recognised as a word; add it to the dictionary as the 4th entry
– Decoding proceeds using the new dictionary entry: 12134 produces xyx xyx
– The entire message is decoded as xyx xyx xyx xyx, which is the original message

1-93 Compressing Images
GIF (Graphic Interchange Format):
– Lossy when applied to images with many colours: limits the colours of a pixel to only 256
– Uses dictionary encoding: the 256 encodings are stored in a dictionary (the 'palette'); a pixel can be represented by 1 byte (referring to 1 dictionary entry), and the colour represented is defined by 3 bytes (for the Red, Green & Blue components, RGB)
– Can be made adaptive, using LZW techniques
– Unsuitable for high-precision photography

1-94 JPEG (Joint Photographic Experts Group)
– Encompasses several different modes
– Has a lossless mode, but it does not produce high levels of compression
– The lossy modes are widely used, often as the default compression method in digital cameras
– JPEG's baseline standard (lossy sequential mode) has become the standard of choice in many applications; it uses a series of lossy and lossless techniques to achieve significant compression without noticeable loss of image quality

1-95 JPEG baseline standard ("lossy sequential mode")
Uses a series of compression techniques:
– Exploits perceptual limits and simplifies colour data, preserving luminance: blurs the boundaries between different colours while maintaining all brightness information
– Divides the image into 8×8 pixel blocks
– Applies DCT (discrete cosine transform) based compression to the blocks
– Additional compression is achieved using run-length encoding, relative encoding and variable-length encoding

1-96 Communication Errors
Data bits may get corrupted in transfer, or in retrieval from storage:
– Dirt/damage on a disc surface
– Circuit malfunctions
– Static electricity on the transmission path
These may cause data to be incorrectly recorded or read. Encoding techniques have been developed to detect errors, and even to correct them.

1-97 Parity Bits
Parity: oddness or evenness. If each bit pattern is adjusted to have an odd (or even) number of 1s, then a pattern with an even (odd) number of 1s must contain an error.
Example: odd parity
– Add an extra bit – the parity bit – to each pattern
– If the pattern has an odd number of 1s, the parity bit = 0
– If the pattern has an even number of 1s, the parity bit = 1

1-98 Figure 1.28 The ASCII codes for the letters A and F adjusted for odd parity

1-99 Error-Correcting Codes
Error-correcting codes are codes that allow the receiver of a message to detect errors in the message and to correct them. The Hamming distance between two patterns of bits is the number of bit positions in which the patterns differ:
– E.g. the distance between 010101 and 011100 is 2
– Or, the distance between 11100111 and 11011011 is 4
(XOR the bits, then sum the bits of the result.)

1-100 Error-correcting codes
How is Hamming distance used to produce an error-correcting code? By designing a code in which each pattern has a Hamming distance of n from any other pattern, patterns with fewer than n/2 errors can be corrected by replacing them with the code pattern that is closest.

1-101 Figure 1.29 An error-correcting code
If one bit is changed in a code word, the result will not be in the list (i.e. it will not be a legal code), and so the error is recognised.

1-102 Hamming distances for the given code

             A  B  C  D  E  F  G  H
    000000 A    4  3  3  3  3  4  4
    001111 B       3  3  3  3  4  4
    010011 C          4  4  4  3  3
    011100 D             4  4  3  3
    100110 E                4  3  3
    101001 F                   3  3
    110101 G                      4
    111010 H

The Hamming distance between any two legal patterns is at least 3.

1-103 When the Hamming distance between any two patterns in the code is 3:
– Changing 1 bit in a pattern means it is still closer to its original pattern than to any other (distance = 1)
– Its distance from any other pattern in the code will be at least 2
– Hence we can reasonably deduce which was the original pattern

1-104 Figure: Decoding the received pattern 010100

1-105 Error-correcting codes
Used in (high-capacity) magnetic disk drives to reduce the impact of possible flaws corrupting data. Also used in CDs:
– The CD-DA format (for audio) reduced data errors to one error per two discs
– CDs used for data storage reduce errors to one per 20,000 discs
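To close, here is a small C sketch (our own illustration; names and layout are ours) of the Hamming-distance decoding idea from slides 1-99 to 1-104: it computes distances by XOR-ing patterns and counting 1 bits, then corrects the received pattern 010100 by picking the closest legal code word.

    #include <stdio.h>

    /* Hamming distance: XOR the patterns, then count the 1 bits. */
    int hamming(unsigned a, unsigned b) {
        unsigned x = a ^ b;
        int d = 0;
        while (x) { d += x & 1; x >>= 1; }
        return d;
    }

    int main(void) {
        /* the eight legal 6-bit patterns A..H from the table in slide 1-102 */
        unsigned code[] = {0, 15, 19, 28, 38, 41, 53, 58};
        const char *name = "ABCDEFGH";
        unsigned received = 20;            /* 010100: D with one bit flipped */

        int best = 0;                      /* find the closest legal pattern */
        for (int i = 1; i < 8; i++)
            if (hamming(received, code[i]) < hamming(received, code[best]))
                best = i;

        printf("received 010100 decodes as %c (distance %d)\n",
               name[best], hamming(received, code[best]));  /* D, distance 1 */
        return 0;
    }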
