Chapter 2 Computer Organization and Design PDF

Summary

This document summarizes computer organization and design, covering processors (CPU organization, instruction execution, and parallelism), primary and secondary memory, and input/output. It also walks through the instruction execution steps and is a transcript of a chapter on the subject.

Full Transcript


Chapter 2 Computer Organization and Design

Outlines
2.1 Processors – CPU Organization – Instruction Execution – Instruction-Level Parallelism – Processor-Level Parallelism
2.2 Primary Memory
2.3 Secondary Memory
2.4 Input/Output

Basic Computer Organization
– Control unit: contains circuits that direct and coordinate the proper sequence of execution activities; it interprets each instruction and applies the proper signals to the ALU and registers.
– ALU: performs the arithmetic and logical execution; it has no internal storage.
– Registers: high-speed temporary data storage areas that support execution. Advantages: shorter references within the processor, faster access, ease of programming.

Inside the Computer
– power supply, I/O devices, fan, motherboard
– The motherboard has three parts: I/O connections, memory, and processors.

Typical Computer
A typical computer is an interconnected system of processors, memories, buses, and I/O devices. These are connected via buses, which may be internal or external.

Central Processing Unit (CPU)
– The most important processor and the "brain" of the computer: it executes programs/instructions stored in memory by fetching, examining, and executing them.
– It consists of the Arithmetic & Logic Unit (ALU), the Control Unit (CU), and small fast memories (registers).
– ALU: performs logic and arithmetic operations, e.g., add, subtract, shift, rotate.
– Control Unit: fetches instructions from memory, determines their type (decode), and directs data to the right place at the right time, choosing the circuits and data.
– Registers: small, high-speed memory very close to the CPU.
  – General purpose: store data, control information, and addresses.
  – Special purpose: hold only data or only addresses, e.g., the Program Counter (PC), Instruction Register (IR), Memory Address Register (MAR), and Memory Data or Buffer Register (MDR, MBR).
  – Accumulator: the main register that holds the data the CPU needs to process and the results of operations.
  – PC: holds the memory address of the next instruction to be executed.
  – IR: holds the instruction currently being executed.
  – MAR: holds the address of the memory location being referenced.
  – MDR: holds the data read from or to be written to memory.

CPU Organization
The CPU organization uses the von Neumann machine model and performs arithmetic and logic operations using what is called the data path. This consists of:
– An Arithmetic Logic Unit (ALU): performs additions, subtractions, multiplications, divisions, comparisons, bit shifting, etc.
– Registers (typically 1 to 32).
– Several buses connecting the pieces.
– Basically, it consists of lots of gates.

Data Path of the von Neumann Machine (addition example)
The CPU (data path) performs additions, subtractions, multiplications, divisions, comparisons, and bit shifting on its inputs. The output can then be stored back into a register and later into memory.

Instructions may be categorized into one of the following:
1. Register-memory instructions – load data from memory into registers to be used as inputs for the ALU and allow the output data to be stored back into memory.
2. Register-register instructions – perform some operation on data in registers and then store the result back in one of the registers.

Data path – defines the flow of data from memory to registers to the ALU for computation and back to registers and/or memory.
Data path cycle – the process of running the above data path (flow); it directly affects the speed of the machine. The faster the cycle, the faster the machine runs.

Instruction Execution Steps
An instruction is a word defining a basic machine operation and its operands.
As an example, for a register machine: add r0, r1, r2.

Instruction execution follows the fetch-decode-execute (FDE) cycle: the CPU executes instructions in a series of small steps, which are central to the operation of all computers. These steps are (a small interpreter sketch of this cycle appears at the end of this part):
1. Fetch the next instruction from memory into the instruction register.
2. Change the program counter to point to the following instruction.
3. Determine the type of instruction just fetched.
4. If the instruction uses a word in memory, determine its location.
5. Execute the instruction.
6. Go back to step 1 to begin executing the following instruction.

Design Principles for Modern Computers
This is a set of principles called the "RISC design principles":
1. All instructions are directly executed by hardware – not interpreted by microinstructions.
2. Maximize the rate at which instructions are issued – minimize execution time and exploit parallelism for better performance.
3. Instructions should be easy to decode.
4. Only loads and stores should reference memory.
5. Provide plenty of registers.

Parallelism
Parallelism is doing two or more things at once to get better performance.
– Instruction-level parallelism:
  Pipelining: a technique in which the execution of several instructions is overlapped.
  Superscalar architectures: using parallel pipelines.
– Processor-level parallelism: array computers, multiprocessors, multicomputers.

Instruction-Level Parallelism
Fetching instructions from memory is a bottleneck for execution speed. To address this, instructions are fetched ahead of time into a set of registers called prefetch buffers; when an instruction is needed, it can be taken from the prefetch buffer rather than waiting for a memory read to complete. This divides instruction execution into two steps: fetching and execution.

Pipelining
Pipelining is a form of instruction-level parallelism. With pipelining, instruction execution is divided into many parts or stages, each handled by a dedicated piece of hardware, and all of these stages can run in parallel.

Cont. Pipelining
This is a five-stage pipeline. Using pipelining, each instruction is broken into several stages. Stages can operate concurrently PROVIDED WE HAVE SEPARATE RESOURCES FOR EACH STAGE! Note: the execution time for a single instruction is NOT improved; the throughput of several instructions is improved.

Pipelining Benefits
Performance: pipelining allows a trade-off between:
– speed or bandwidth (millions of instructions per second), and
– latency/execution time.
It is a completely hardware mechanism. All modern machines are pipelined – this was the key technique for advancing performance in the 80s; in the 90s the move was to multiple pipelines. Beware, no benefit is totally free – problem: watch for hazards!

Superscalar Architectures
Dual five-stage pipelines with a common instruction fetch unit:
– Each pipeline has its own hardware for each stage, providing duplicate decoding and duplicate ALUs.
– The two instructions must not conflict over resource usage (registers).
– The main pipeline is called the u pipeline and the second one the v pipeline.

Superscalar Architectures (cont.)
The term also refers to a single pipeline with multiple functional units at one or more stages of the pipeline (e.g., a superscalar processor with five functional units). What are the benefits? Today's definition of a superscalar architecture: a processor that can issue multiple instructions per clock cycle (often four or six).
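The fetch-decode-execute cycle listed at the start of this part can be sketched as a tiny interpreter. This is a minimal illustration only: the instruction format (op, dst, src1, src2), the HALT convention, and the memory layout are assumptions made for the example and do not correspond to any real ISA.

```python
# Minimal sketch of the fetch-decode-execute cycle for a toy register machine.
# The instruction tuples and the HALT opcode are invented for illustration.

def run(memory):
    regs = [0] * 4          # r0..r3, general-purpose registers
    pc = 0                  # program counter: address of the next instruction
    while True:
        ir = memory[pc]     # 1. fetch the next instruction into the instruction register
        pc += 1             # 2. advance the PC to the following instruction
        op = ir[0]          # 3. decode: determine the instruction type
        if op == "HALT":
            return regs
        elif op == "LOAD":          # 4./5. register-memory instruction: r[dst] = memory[addr]
            _, dst, addr = ir
            regs[dst] = memory[addr]
        elif op == "ADD":           # 5. register-register instruction: r[dst] = r[a] + r[b]
            _, dst, a, b = ir
            regs[dst] = regs[a] + regs[b]
        # 6. loop back to fetch the following instruction

program = [
    ("LOAD", 1, 4),       # r1 = memory[4]
    ("LOAD", 2, 5),       # r2 = memory[5]
    ("ADD", 0, 1, 2),     # add r0, r1, r2
    ("HALT",),
    7, 35,                # data words at addresses 4 and 5
]
print(run(program))       # r0 ends up holding 42
```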
Processor-Level Parallelism
Many applications require fast computers, so high-speed computers have become a must for solving modern problems. Even though CPUs keep getting faster, computers will eventually run into limits such as the speed of light. Instruction-level parallelism (pipelining and superscalar execution) provides some help, but much higher speed can be achieved by designing computers with multiple CPUs. Processor-level parallelism refers to a computer system with multiple processors:
1. Array computers
2. Multiprocessors
3. Multicomputers

Array Computers
An array processor consists of a large number of identical processors that perform the same sequence of instructions on different sets of data. Two approaches have been used to execute large scientific problems:
1. Array processors: have one control unit that directs an array of identical processors to execute the same instruction at the same time, but on different data. This is the SIMD (Single Instruction, Multiple Data) model; the ILLIAC IV is an array processor of this type.
2. Vector processors: like array processors, they execute a sequence of operations on data elements efficiently, but unlike them, all addition operations are performed by a single pipelined adder.
Both array processors and vector processors work on arrays of data; the result is another array (array processor) or another vector (vector processor).

Multiprocessors
A multiprocessor is a system consisting of multiple CPUs sharing a common memory, so the CPUs must be coordinated to avoid conflicts. Different implementations are possible:
a) A single-bus multiprocessor: multiple independent CPUs sharing a common memory and other resources. This type faces conflicts whenever CPUs try to access the memory at the same time, because they use the same bus and the same resources.
b) To reduce the contention between the processors and improve performance, a multiprocessor with local memories may be used; accesses to a CPU's local memory do not need to use the main bus, which reduces the number of conflicts.
(Figures: a) single-bus multiprocessor; b) multiprocessor with local memories.)

Outline
2.1 Processors
2.2 Primary Memory – Bits – Memory Addresses – Byte Ordering – Error-Correcting Codes – Cache Memory – Packaging
2.3 Secondary Memory
2.4 Input/Output

Memory
Memory is the part where computer programs and data are stored, according to the von Neumann architecture; without it, no stored programs or data are available.
The main memory is built from multiple DRAM chips – Dynamic Random Access Memory: using DRAM, individual storage locations in main memory may be accessed in any order and at very high speed.
Caches are smaller but faster memories used for performance. Caches are built out of SRAM (Static Random Access Memory) technology, which is more expensive than DRAM (and less dense).

Memory Subsystem Organization
Types of memory:
– Read-Only Memory (ROM): ROM chips are designed for applications in which data is only read; they retain their data even when power to the chip is turned off. Variants: masked ROM, Programmable ROM (PROM), Erasable PROM (EPROM).
– Random Access Memory (RAM), or read/write memory: used to store data that changes.
  – Dynamic RAM
  – Static RAM: once written, contents stay valid (as long as power is applied)
– Cache memory

Processor-Memory Interconnections

Memory Addresses
Memory is viewed as an array consisting of a number of cells (or memory locations) that store information. Each cell has an address, an index into the array, which programs can refer to (e.g., a memory of n cells has addresses 0 to n-1). By default, adjacent cells have successive addresses. All cells have the same number of bits: if a cell consists of k bits, it can hold any of the 2^k possible bit combinations. As an example, a 96-bit memory can be organized in three ways (see the figure): 12 cells of 8 bits, 8 cells of 12 bits, or 6 cells of 16 bits.

Memory addresses are expressed as binary numbers: if an address has n bits, the maximum number of addressable cells is 2^n. How many bits do we need to express each address in the memory organizations above? (4, 3, and 3 bits, respectively.)

The number of bits in the address determines the maximum number of addressable cells in the memory and is independent of the number of bits per cell. The cell is the smallest addressable unit and has been standardized at 8 bits (a byte). Bytes are grouped into words, e.g.:
– a 32-bit word has 4 bytes/word,
– a 64-bit word has 8 bytes/word.

Byte Ordering
Bytes in a word can be numbered left-to-right or right-to-left, giving two rules for byte ordering:
– Big endian: the byte whose address is x...xx00 is placed at the most significant position, the "big end". IBM mainframes are big-endian machines.
– Little endian: the byte whose address is x...xx00 is placed at the least significant position, the "little end". Intel PCs are little-endian machines.
(Figures: big-endian memory and little-endian memory.)

Byte Ordering Example
(a) A personal record on a big-endian machine. (b) The same record on a little-endian machine. (c) The result of transferring from big endian to little endian. (d) The result of byte-swapping (c).
Little endian does not match the "normal" ordering of characters in strings, so many people think little endian is backwards. Differences in endianness cause problems when exchanging data among computers. (A short example appears at the end of this part.)

Error-Correcting Codes
Computer memories can make errors. Errors in the memory system may occur due to:
– garbled bits when fetching or storing data,
– hard (permanent) errors, such as manufacturing defects,
– soft (transient) errors: random and non-destructive, e.g., power supply problems.
Therefore, some memories use error-correcting and/or error-detecting codes. When these methods are used, additional bits are added to each memory byte or word to ensure the integrity of the data. Each word then contains:
– m data bits,
– r redundant or check bits,
– so each word has n = m + r bits (referred to as a codeword).

To check whether errors have occurred, error detection/correction can be done using the Hamming distance. The Hamming distance is the number of 1s in the EXCLUSIVE OR (XOR) of two codewords: if the Hamming distance is d, it takes d single-bit errors to change one codeword into the other. As an example, the codewords 11110001 and 00110000 have a Hamming distance of 3, so 3 single-bit errors are needed to convert one word into the other.

To check words for errors (error detection), a single bit called the parity bit is added to the data. This bit is chosen so that the number of 1 bits is even (or odd). If any single bit then changes, the number of 1 bits becomes odd (or even), and the error is detected.
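Both ideas above are easy to see by computing them directly. The sketch below is a minimal illustration: the two codewords come from the text, while the 32-bit sample value is an arbitrary choice made for the example.

```python
import struct

# Byte ordering: the same 32-bit value laid out in memory both ways.
value = 0x01020304
print(struct.pack(">I", value).hex())   # big endian:    01020304
print(struct.pack("<I", value).hex())   # little endian: 04030201

# Hamming distance: number of 1 bits in the XOR of two codewords.
def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

print(hamming_distance(0b11110001, 0b00110000))  # 3, as in the example above
```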
The Hamming distance idea can also be used to construct error-correcting codes for any word size. The number of parity bits needed depends on the word size, as shown in the accompanying table: as the word size increases, the percentage overhead decreases.

Using the Hamming algorithm for constructing error-correcting codes, r parity bits are added to an m-bit word, forming a new word of (m + r) bits. All bit positions that are powers of 2 are parity bits; the others are data bits. Each parity bit checks specific bit positions and is set so that the number of 1s in the positions it checks is even. Bit position b is checked by those parity bits b1, b2, ..., bj such that b1 + b2 + ... + bj = b (e.g., bit 11 is checked by bits 1, 2, and 8 because 1 + 2 + 8 = 11).

As an example, take the word 1111000010101110.
– The word size is m = 16 bits, which is 2^4.
– So the number of parity check bits required is r = 4 + 1 = 5, and the total size of the codeword is m + r = 16 + 5 = 21 bits.
– The parity bits are in positions 1, 2, 4, 8, and 16 (powers of 2). The first step is to write the data bits in positions 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, and 21, leaving positions 1, 2, 4, 8, and 16 blank.
– Bit 1 checks bits 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 (take 1, skip 1, and so on).
– Bit 2 checks bits 2, 3, 6, 7, 10, 11, 14, 15, 18, 19 (take 2, skip 2, and so on).
– Bit 4 checks bits 4, 5, 6, 7, 12, 13, 14, 15, 20, 21 (take 4, skip 4, and so on).
– Bit 8 checks bits 8, 9, 10, 11, 12, 13, 14, 15 (take 8, skip 8, and so on).
– Bit 16 checks bits 16, 17, 18, 19, 20, 21 (take 16, skip 16, and so on).
– For bit 1, the total of the data bits it checks is 1 + 1 + 1 + 0 + 0 + 1 + 1 + 0 + 1 + 0 = 6. This is even, so bit 1 is 0.
– For bit 2, the total is 1 + 1 + 1 + 0 + 0 + 0 + 1 + 1 + 1 = 6. This is even, so bit 2 is 0.
– For bit 4, the total is 1 + 1 + 1 + 0 + 1 + 0 + 1 + 1 + 0 = 6. This is even, so bit 4 is 0.
– For bit 8, the total is 0 + 0 + 0 + 0 + 1 + 0 + 1 = 2. This is even, so bit 8 is 0.
– For bit 16, the total is 0 + 1 + 1 + 1 + 0 = 3. This is odd, so bit 16 is 1.
So the word is stored in memory as 001011100000101101110 (positions 1 to 21). Now the grand total for bits 1, 2, 4, 8, and 16 is even:
Bit 1 = 0 + 1 + 1 + 1 + 0 + 0 + 1 + 1 + 0 + 1 + 0 = 6
Bit 2 = 0 + 1 + 1 + 1 + 0 + 0 + 0 + 1 + 1 + 1 = 6
Bit 4 = 0 + 1 + 1 + 1 + 0 + 1 + 0 + 1 + 1 + 0 = 6
Bit 8 = 0 + 0 + 0 + 0 + 0 + 1 + 0 + 1 = 2
Bit 16 = 1 + 0 + 1 + 1 + 1 + 0 = 4

Suppose the word is corrupted by altering bit 11 from 0 to 1. To detect and correct the error, recompute the checks:
– Bit 1 becomes 0 + 1 + 1 + 1 + 0 + 1 + 1 + 1 + 0 + 1 + 0 = 7, which is odd.
– Bit 2 becomes 0 + 1 + 1 + 1 + 0 + 1 + 0 + 1 + 1 + 1 = 7, which is odd.
– Bit 4 becomes 0 + 1 + 1 + 1 + 0 + 1 + 0 + 1 + 1 + 0 = 6, which is even.
– Bit 8 becomes 0 + 0 + 0 + 1 + 0 + 1 + 0 + 1 = 3, which is odd.
– Bit 16 becomes 1 + 0 + 1 + 1 + 1 + 0 = 4, which is even.
All checks must be even, so checks 1, 2, and 8 fail. Since 1 + 2 + 8 = 11, we know that bit 11 has been changed. Hamming's code thus tells us not only that a bit has changed but also which bit has changed. (A short program that reproduces this example is given below.)
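The worked example above can be reproduced with a few lines of code. The following sketch is a minimal illustration of the even-parity Hamming scheme just described, not production ECC code; the function names are invented for the example.

```python
# Minimal even-parity Hamming encoder/checker following the scheme above:
# parity bits sit at power-of-2 positions, and parity bit p checks every
# position whose binary expansion contains p.

def hamming_encode(data_bits):
    n = 21                                   # 16 data bits + 5 parity bits
    word = {}                                # position -> bit, positions 1..21
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):                  # not a power of two -> data position
            word[pos] = next(it)
    for p in (1, 2, 4, 8, 16):               # set each parity bit for even parity
        word[p] = sum(word.get(pos, 0)
                      for pos in range(p + 1, n + 1) if pos & p) % 2
    return [word[pos] for pos in range(1, n + 1)]

def find_error(codeword):
    """Return 0 if all checks pass, else the position of the flipped bit."""
    syndrome = 0
    for p in (1, 2, 4, 8, 16):
        total = sum(bit for pos, bit in enumerate(codeword, start=1) if pos & p)
        if total % 2:                        # this check is odd -> it failed
            syndrome += p
    return syndrome

data = [int(b) for b in "1111000010101110"]
code = hamming_encode(data)
print("".join(map(str, code)))   # 001011100000101101110, as derived above

code[11 - 1] ^= 1                # flip bit 11 (positions are 1-based)
print(find_error(code))          # 11: checks 1, 2, and 8 fail, and 1 + 2 + 8 = 11
```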
Cache Memory
CPUs have always been faster than memories. Chip manufacturers use pipelining and superscalar design to improve CPU speed, while memory designers have concentrated on increasing capacity, so memory speed has remained relatively the same.

→ Problem: if the CPU issues a memory-word access request, it will be idle for multiple CPU cycles before getting the word (a lot of delay); the slower the memory, the more cycles the CPU must wait. There are two ways to deal with this problem:
1. Simply start memory READs when they are encountered, but continue executing and stall the CPU only if an instruction tries to use the memory word before it has arrived.
2. Require compilers not to generate code that uses words before they have arrived. Often there is nothing else to do after a LOAD, so the compiler is forced to insert NOP (no operation) instructions, which do nothing but occupy a slot and waste time.

Engineers can build fast memories, but they have to be located on the CPU chip (going over the bus to memory is slow), which makes the chip bigger and more expensive. So the choice is between a small amount of fast memory and a large amount of slow memory; what we would like is a large, fast, cheap memory. A way to approach that is to mix a small amount of fast memory with a large amount of slow memory, getting close to the speed of the fast memory and the capacity of the large memory at moderate cost. The small fast memory is called a cache.

The basic idea of a cache is simple: it keeps the most heavily used memory words. When the CPU needs a word, it looks first in the cache; if the word is not there, it goes to main memory. If a substantial fraction of the words are found in the cache, the average access time is greatly reduced, so success or failure depends on what fraction of the references can be satisfied from the cache.

Programs do not access memory completely at random. The next memory reference tends to be in the general vicinity of the previous one: except for branches and procedure calls, instructions are fetched from consecutive locations in memory, and in loops a limited number of instructions are executed repeatedly.

The Principle of Locality in Cache Memory
Locality means that the memory references made in any short time interval use only a small fraction of the total memory. This principle forms the basis of all caching systems: when a word is referenced, it and some of its neighbors are brought from the slow (main) memory into the cache, so that the next time it is needed it can be accessed quickly.

If a word is read or written k times in a short interval, the computer needs 1 reference to the slow (main) memory and (k - 1) references to the fast memory (cache). The larger k is, the better the overall performance. Let c be the cache access time, m the main memory access time, and h the hit ratio, the fraction of all references that can be satisfied from the cache; here h = (k - 1)/k. The miss ratio is (1 - h).

The mean access time = c + (1 - h) * m.
1. If h → 1, the mean access time approaches c: all references are satisfied from the cache.
2. If h → 0, the mean access time approaches c + m: every reference first spends an (unsuccessful) time c checking the cache and then a time m doing the memory reference.
In practice, the main memory reference can be started in parallel with the cache search, so if a cache miss occurs, the memory cycle is already under way. (A worked example with concrete numbers follows below.)
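To make the formula concrete, here is a small sketch of the mean access time calculation. The access times and hit ratios below are illustrative values chosen for the example, not figures from the text.

```python
# Mean access time = c + (1 - h) * m, where c is the cache access time,
# m the main memory access time, and h the hit ratio.
def mean_access_time(c_ns, m_ns, h):
    return c_ns + (1.0 - h) * m_ns

c, m = 2, 70          # illustrative values: 2 ns cache, 70 ns main memory
for h in (0.0, 0.90, 0.99, 1.0):
    print(f"h = {h:.2f}: {mean_access_time(c, m, h):5.1f} ns")
# h = 0.00:  72.0 ns   (every access pays c + m)
# h = 0.90:   9.0 ns
# h = 0.99:   2.7 ns
# h = 1.00:   2.0 ns   (every access is satisfied from the cache)
```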
Cache design should match CPU performance. For a high-performance CPU, the main cache design considerations are:
1. Cache size: the bigger the cache, the better the performance, but at higher cost.
2. The size of the cache line: e.g., a 16-KByte cache can be divided into 1024 lines of 16 bytes, 2048 lines of 8 bytes, and so on.
3. The cache organization: how to keep track of which words are currently in the cache.
4. Whether instructions and data are kept in the same cache:
  1. Unified cache: both instructions and data are in the same cache.
  2. Split cache (Harvard architecture): instructions and data are in different caches.
5. The last issue is the number of cache memories (levels).

Memory Packaging and Types
From the early days up to the 1990s, memory was manufactured and sold as separate chips with typical sizes from 1 Kbit to 1 Mbit. Nowadays, chips are installed in groups of 8, 16, or 32 on small printed-circuit boards, so current memories come on a single plug-in board in either a SIMM (1-sided) or DIMM (2-sided) package.

Single Inline Memory Module (SIMM): a row of connectors on one side of the board.
– A typical SIMM board has 72 connectors on one side and delivers 32 bits at once.
– (Figure: a single inline memory module (SIMM) holding 256 MB; two of the chips control the SIMM.)
– Two types of SIMMs have been in general use: 30-pin SIMMs with 8-bit data buses and 72-pin SIMMs with 32-bit data buses.

Dual Inline Memory Module (DIMM): rows of connectors on both sides of the board.
– A typical DIMM board has 84 connectors on each side, for a total of 168 connectors, and delivers 64 bits at once.
– Commonly used today.
– (Figure: a 512-MB, 168-pin DIMM, SDRAM, PC133 memory module.)
SDRAM (Synchronous Dynamic RAM) → a variant of DRAM in which the memory speed is synchronized with the clock pulse from the CPU → this enables the SDRAM to pipeline read and write requests.

Outline
2.1 Processors
2.2 Primary Memory
2.3 Secondary Memory – Magnetic Disks – Floppy Disks – SCSI Disks – RAID – CD-ROMs – DVDs
2.4 Input/Output

Secondary Memory
The main memory is always too small and is usually not enough for people's needs, so additional storage is required. The traditional solution is a memory hierarchy. (Figure: a five-level memory hierarchy; upper levels are faster, lower levels are larger; data moves between levels as instructions/operands, blocks, pages, and files.)

The five levels are:
1. At the top are the CPU registers, which can be accessed at full CPU speed.
2. Next comes the cache memory, which ranges from 32 KB to a few megabytes.
3. Next is main memory, from 16 MB to tens or hundreds of gigabytes.
4. Then come magnetic disks for permanent storage.
5. Finally, there are magnetic tape and optical disks for archival storage.

As we move down the hierarchy, the access time gets longer:
1. CPU registers can be accessed in a few nanoseconds.
2. Cache memories take a small multiple of the register access time.
3. Main memory needs a few tens of nanoseconds.
4. Disk access times are at least 10 msec.
5. Tape and optical-disk access times are measured in seconds.

But the storage capacity increases as we go down:
1. CPU registers hold about 128 bytes.
2. Caches hold a few megabytes.
3. Main memories hold tens to thousands of megabytes.
4. Magnetic disks hold a few gigabytes to tens of gigabytes.
5. Tapes and optical disks are kept off-line, so their capacity is effectively unlimited.

The number of bits per dollar also increases as we go down the hierarchy:
– main memory is measured in dollars/megabyte,
– magnetic disk storage in pennies/megabyte, and
– magnetic tape in dollars/gigabyte or less.

Magnetic Disks
Long-term, large-capacity, nonvolatile storage; they come in floppy and hard configurations. A magnetic disk consists of one or more aluminum platters with a thin magnetizable coating/film. Platters are 3 to 12 cm in diameter (less than 3 cm for notebook computers).
A disk head containing an induction coil floats just over the surface, resting on a cushion of air (it touches the surface on floppy disks). When a current passes through the head, it magnetizes the surface just beneath it, aligning the magnetic particles to face left or right depending on the polarity of the drive current. When the head passes over a magnetized area, a positive or negative current is induced in the head, making it possible to read back the previously stored bits. Thus, as the platter rotates under the head, a stream of bits can be written and later read back.

The circular sequence of bits written as the disk makes a complete rotation is called a track. Each track is divided into fixed-length sectors, typically containing 512 data bytes, preceded by a preamble that allows the head to identify the start of the track and sector and to synchronize before reading or writing. Following the data is an ECC, either a Hamming code or a Reed-Solomon code. Between consecutive sectors is an intersector gap.

Disks have movable arms capable of moving in and out to different radial distances; at each radial distance, a different track can be written, so the tracks are a series of concentric circles about the spindle. Disks have between 5000 and 10,000 tracks per cm, so track widths are 1 to 2 microns. Such disks have a density of 50,000 to 100,000 bits/cm.

Disks have multiple platters stacked vertically. Each surface has its own arm and head, and all the arms are ganged together so they move to the same radial position at once. The set of tracks at a given radial position is called a cylinder. The platters are often 2-sided, and most hard drives have multiple platters, so modern PC disks have 6 to 12 platters per drive, providing 12 to 24 recording surfaces.

To read or write a sector, the arm is first moved to the right radial position; this is called a seek, and the seek time is 5 to 10 msec, or about 1 msec between consecutive tracks. Rotational latency is the delay until the desired sector rotates under the head: disks rotate at 3600, 5400, 7200, or 10,800 RPM, so the average delay (half a rotation) is 3 to 6 msec. Transfer time depends on the linear density and the rotation speed; with transfer rates of 20 to 40 MB/sec, a 512-byte sector takes 13 to 26 microseconds. (A small calculation combining these three components is given at the end of this part.)

Modern disks divide the surface into zones. Within a zone, each track has the same number of sectors; this number increases moving outward, which increases the drive capacity. (Figure: a disk with five zones; each zone contains many tracks.)

A magnetic-disk system therefore consists of one or more disks mounted on a common spindle. Each drive has three key parts:
– disk: the assembly of disk platters,
– disk drive: the electromechanical mechanism that spins the disk and moves the read/write heads,
– disk controller: the electronic circuitry (chip) that controls the operation of the drive. Some of the controller's tasks are:
1. Accepting commands from the software, such as READ, WRITE, and FORMAT,
2. Controlling the arm motion,
3. Detecting and correcting errors,
4. Converting 8-bit bytes read from memory into a serial bit stream and vice versa, and
5. Remapping bad sectors.
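As a rough worked example of the figures above, the time to read one 512-byte sector is approximately the seek time plus the average rotational latency plus the transfer time. The specific numbers below (7200 RPM, 9 ms seek, 25 MB/s transfer) are illustrative values taken from within the ranges quoted above, not measurements.

```python
# Approximate time to read a single 512-byte sector:
# access time ~= seek time + average rotational latency + transfer time.
rpm = 7200
seek_ms = 9.0                                  # average seek, within the 5-10 ms range
transfer_rate = 25e6                           # bytes/sec, within the 20-40 MB/s range
sector_bytes = 512

rotation_ms = 60_000 / rpm                     # one full rotation: ~8.33 ms
latency_ms = rotation_ms / 2                   # average delay: half a rotation (~4.17 ms)
transfer_ms = sector_bytes / transfer_rate * 1000   # ~0.02 ms

total_ms = seek_ms + latency_ms + transfer_ms
print(f"latency {latency_ms:.2f} ms, transfer {transfer_ms:.3f} ms, total {total_ms:.2f} ms")
# Mechanical delays (seek + rotation) dominate; the actual data transfer is tiny.
```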
Floppy Disks
A diskette or floppy disk is a small removable medium, introduced by IBM to record maintenance information. Floppies have the same general characteristics as the magnetic (hard) disks discussed earlier, BUT floppy-disk heads touch the surface, so the media and heads wear out. To solve this wear problem, personal computers retract the heads and stop the rotation when the drive is not reading or writing; when the next read or write command arrives, there is a delay of about half a second while the motor gets up to speed. Floppy disks are mostly not used in modern computers. (A comparison table of the common floppy-disk types appeared here in the original slides.)

IDE Disks
This technology evolved rapidly to have the controller closely integrated with the drive. IDE drives evolved into EIDE, which supports the LBA addressing scheme for addressing the drive's sectors. An EIDE controller has two channels, each with a primary and a secondary drive. EIDE was later improved to ATA-3, which transfers data at higher speed, and the next edition, ATAPI-4, increased the speed further.

RAID
CPU speeds are growing fast, so the gap between CPU and disk performance has become larger over time. To improve disk performance and reliability, parallelism may be used: achieving fast, reliable storage from multiple disks is the idea behind RAID (Redundant Array of Inexpensive/Independent Disks), proposed in 1988.

A RAID is a box containing a set of disks hidden behind a controller that appears to the host computer as a single large logical disk driven by the OS. Because SCSI disks have good performance and low price, a RAID often consists of a RAID SCSI controller plus a box of SCSI disks (which appear as a single large disk). The data are distributed over the drives to allow parallel operation. Several schemes have been proposed for arranging the data on the disks; the most common are the six levels/configurations (not a hierarchy), RAID level 0 through RAID level 5.

Terminology:
– Data striping: a single large file is stored on several disk units by breaking the file up into a number of smaller pieces and storing those pieces on different disks.

The six levels are described below, after the short striping sketch that follows.
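The round-robin striping just mentioned can be illustrated with a few lines of code. This is a conceptual sketch only (the strip size and disk count are arbitrary example values); real RAID controllers implement this in hardware or firmware.

```python
# Conceptual sketch of RAID level 0 (striping): logical strips are assigned to
# disks in round-robin fashion, so consecutive strips can be accessed in parallel.
STRIP_SECTORS = 4          # k sectors per strip (example value)
NUM_DISKS = 4              # number of disks in the array (example value)

def locate(logical_sector):
    """Map a logical sector number to (disk, sector offset on that disk)."""
    strip = logical_sector // STRIP_SECTORS
    disk = strip % NUM_DISKS                       # round-robin across the disks
    sector_on_disk = (strip // NUM_DISKS) * STRIP_SECTORS + logical_sector % STRIP_SECTORS
    return disk, sector_on_disk

for ls in range(0, 32, 4):                         # first sector of each strip
    print(f"logical sector {ls:2d} -> disk {locate(ls)[0]}, offset {locate(ls)[1]}")
# Strips 0, 1, 2, 3 land on disks 0, 1, 2, 3; strip 4 wraps around to disk 0, and so on.
```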
RAID level 0 – Figure (a)
– Disks are divided into strips of k sectors each.
– Data is striped across the disks sequentially in round-robin fashion: the first k sectors go to disk 1, the next k to disk 2, etc.
– It works best with large requests.

RAID level 1 – Figure (b)
– Same organization as level 0, but with mirrored disks (duplication): 2 copies of each strip on separate disks.
– Data is always written to both drives, but when reading, either drive can be used.
– In case of a fault or drive crash, recovery is simple: swap the faulty disk and re-mirror, with no downtime.
– Expensive.

RAID level 2 – Figure (c)
– This level works on a word or byte basis (single byte/word).
– Each byte is split into a pair of 4-bit nibbles, and 3 parity bits (in positions 1, 2, and 4) are added to each nibble to form a 7-bit word.
– The 7-bit word is written across 7 disks, which must be synchronized with each other.
– Error correction is calculated across corresponding bits on the disks.

RAID level 3 – simplifies level 2, Figure (d)
– A single parity bit is computed for each data word and written to a parity drive.
– The drives must also be synchronized.
– If any drive crashes, 1-bit error correction is possible because the position of the erroneous bit is known.
– Data on a failed drive can be reconstructed from the surviving data and the parity information.
– Very high transfer rates.

RAID level 4 – Figure (e)
– It works with strips.
– A strip-by-strip parity is written onto an extra drive.
– It extends level 0 by adding a parity strip on an extra drive, which allows full recovery from the parity drive if a drive fails. This organization places a heavy load on the parity drive.

RAID level 5 – Figure (f)
It uses the same approach as level 4, but:
– parity is striped across all disks,
– with round-robin allocation for the parity strip,
– which avoids the RAID 4 bottleneck at the parity disk.
– It is commonly used in network servers.

SCSI Disks
Most disk units are designed to connect to standard buses. SCSI (Small Computer System Interface) disks use a different interface and typically achieve higher data transfer rates. Various versions use either 8- or 16-bit parallel connections and reach hundreds of megabytes per second. Because of these high rates, they are used in UNIX workstations, Macintoshes, and network servers. A SCSI controller acts as a bus that can manage a wide range of devices, up to 7, not just disk drives. Each SCSI device has an ID, and several devices can be chained so that they are handled by a single controller. SCSI controllers are common on servers and high-end workstations. (Table: some of the possible SCSI parameters.)

Optical Storage: CD-ROMs
Due to their large capacity and low price, optical disks are widely used for storing and distributing books, movies, software, etc. The dimensions of a CD (Compact Disc) are 120 mm diameter and 1.2 mm thickness, with a 15 mm hole in the middle. CDs are prepared by using a high-power laser to burn 0.8-micron holes in a coated glass master disk, producing a pattern of bumps; molten polycarbonate is then molded to reproduce this pattern, a thin aluminum layer is deposited on the polycarbonate, and the whole is topped by a protective lacquer.

The burned areas are called pits, and the unburned areas between the pits are called lands. Pits may be used to record 0s and lands to record 1s, or vice versa. The pits and lands are written in a single continuous spiral starting near the hole and going out 32 mm toward the edge, making 22,188 revolutions; the total spiral length is 5.6 km. The rotation speed of the CD is reduced as the head moves farther from the center, to keep the linear density of the data the same along the entire length of the track; the rotational speed varies from 530 down to 200 RPM. (Figure: recording structure of a Compact Disc or CD-ROM.)

In 1984, CD-ROMs appeared. CD-ROMs have the same physical size as audio CDs and are mechanically and optically compatible with them. The basic CD-ROM format encodes every byte as a 14-bit symbol. A group of 42 consecutive symbols forms a 588-bit frame; every frame holds 192 data bits (24 bytes), and the other 396 bits are used for control and error correction. At a higher level, every group of 98 frames is called a sector. (Figure: logical data layout on a CD-ROM.)

Every CD-ROM sector starts with a 16-byte preamble that marks the start of the sector and contains the sector number and the mode. Two modes allow a trade-off between reliable storage of bytes and efficient storage of video or audio data that bypasses the extra error correction:
– Mode 1 stores 2048 bytes of data plus 288 bytes of error correction.
– Mode 2 combines the data and the error-correction bytes into a 2336-byte data field; this mode is used for applications where the extra ECC is not important.

Single-speed CD-ROM drives operate at 75 sectors/sec, giving a data rate of 75 × 2048 = 153,600 bytes/sec in mode 1 and 75 × 2336 = 175,200 bytes/sec in mode 2. A standard CD has a capacity of 74 minutes, or about 650 MB of data. (These figures are recomputed in the short example below.) Since 1986, graphics, audio, video, and other multimedia content can be merged and saved on the same CD-ROM. Information written to a CD-ROM cannot be erased.
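The mode 1 and mode 2 data rates and the total capacity quoted above follow directly from the sector format. The short calculation below simply reproduces that arithmetic for a 74-minute disc at 75 sectors/sec.

```python
# CD-ROM data rates and capacity from the sector format described above.
sectors_per_sec = 75
mode1_bytes, mode2_bytes = 2048, 2336

print(sectors_per_sec * mode1_bytes)   # 153600 bytes/sec in mode 1
print(sectors_per_sec * mode2_bytes)   # 175200 bytes/sec in mode 2

seconds = 74 * 60                      # a standard 74-minute disc
capacity = seconds * sectors_per_sec * mode1_bytes
print(capacity, capacity / 1_000_000)  # 681984000 bytes, i.e. about 682 MB (~650 MiB)
```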
CD-Recordable
In the 1990s, CD recorders became available as a backup medium; on CD-Recordable (CD-R) discs, written data are stored permanently. The first CD-Rs were similar to regular CD-ROMs: 120-mm polycarbonate blanks, except that they were gold colored on top instead of silver for the reflective layer. The reflections from pits and lands are simulated by adding a dye layer between the polycarbonate and the reflective layer; the dye layer is transparent, which allows the laser light to pass through and reflect off the reflective layer.

To write to a CD-R:
– The laser is first turned up to high power (8-16 mW).
– When the beam hits a spot on the dye layer, it heats the dye and creates a dark spot.
When the CD-R is read back, the photodetector sees the difference between the dark and transparent areas, and this difference is interpreted as the difference between pits and lands.

A newer format of CD-R and CD-ROM, called CD-ROM XA, allows tracks to be written incrementally (rather than all at once). This organization requires each track to have its own table of contents (VTOC); the reader searches for the most recent TOC and uses it as the index to the CD's contents. Because a track on a CD must be written in a single, continuous operation, the computer writing to the CD must be able to supply data at the required rate for a sustained period of time.

CD-Rewritable
For some applications, people still need rewritable CDs. To achieve that, a technology called CD-RW uses the same size media as CD-R but a different recording layer: an alloy of silver, indium, antimony, and tellurium. CD-RW drives use three different laser powers:
1. High power: melts the alloy, converting it from its high-reflectivity state to a low-reflectivity state, to represent a pit.
2. Medium power: melts the alloy so that it returns to its high-reflectivity state, to represent a land.
3. Low power: used to sense the state of the alloy (for reading).
Why CD-RW has not replaced CD-R:
1. CD-RW blanks are more expensive.
2. For security applications, a CD-R cannot accidentally be erased, while a CD-RW can be.

DVD
A DVD (Digital Versatile (or Video) Disc) uses the same general design and storage techniques as a normal CD, but uses:
– smaller pits (0.4 microns vs. 0.8 microns for CDs),
– a tighter spiral, i.e., closer tracks (0.74 microns vs. 1.6 microns for CDs), and
– a higher-frequency (shorter-wavelength) red laser that can focus the light onto a smaller spot (0.65 microns vs. 0.78 microns for CDs).
Due to these differences, a single DVD can store about 7 times more than a normal CD: up to 4.7 GB, with a data rate of 1.4 MB/sec compared to 150 KB/sec for CDs. Using compression techniques, 4.7 GB can hold 133 minutes of full-motion, full-screen, high-resolution video.
To increase DVD storage capacity beyond the 4.7-GB single-sided, single-layer format, other formats have been defined:
1. 8.5 GB: single-sided, dual-layer.
2. 9.4 GB: double-sided, single-layer.
3. 17 GB: double-sided, dual-layer.

Blu-ray
The successor to DVD is Blu-ray. It differs from DVD in the following ways:
1. It uses a blue laser instead of the red laser used in DVD.
2. The blue laser can be focused more accurately than a red one.
3. Therefore, smaller pits and lands can be used.
4. Single-sided Blu-ray discs hold about 25 GB, and double-sided discs hold 50 GB, with a data rate of 4.5 MB/sec.
5. These data rates are still insignificant compared to those of magnetic disks.
Summary
Memory is an important component of computers; its capacity and speed characteristics determine the performance of the computer. Primary storage is a large main memory implemented with dynamic memory chips. Secondary storage, in the form of magnetic and optical disks, provides the largest capacity; the data rates of optical disks are insignificant compared to those of magnetic disks.

Outline
2.1 Processors
2.2 Primary Memory
2.3 Secondary Memory
2.4 Input/Output – Buses – Terminals – Mice – Printers – Telecommunications equipment – Character codes

Input/Output
The computer system's I/O architecture is its interface to the outside world. A set of I/O modules interfaces to the processor and memory via the system bus. I/O is a subsystem of components that moves coded data between external devices and a host system:
– interfaces to external components (keyboard, scanner, disk, printers, ...),
– cabling or communication links between the host system and its peripherals.

Buses
The logical arrangement of a simple PC is as follows: a single bus connects the CPU, memory, and I/O devices. Some systems have two or more buses. (Figure: logical structure of a simple personal computer.)
Buses carry data between the major components of a computer system. A bus is a shared resource, so it requires a controller that handles bus communications. Controllers are either part of the main system board or located on plug-in circuit boards; a controller is connected to its device by a cable. Controllers are used to control the I/O device and to switch bus access to it.

Some systems allow controllers to access the main memory of the system directly. This allows data transfer without intervention by the CPU and is called Direct Memory Access (DMA); a small sketch of this interaction is given at the end of this part:
– When the transfer is complete, the controller signals the CPU by sending it an interrupt, which forces the CPU to suspend its current program and execute an interrupt handler, which takes any action needed and tells the OS that the I/O is finished.
– When the interrupt handler finishes, the CPU resumes the original program that was suspended by the interrupt.

Now, if the bus is needed by the CPU and an I/O device at the same time, what happens? Arbitration procedures are used to mediate bus requests when the bus is busy. It is common for I/O devices to be given preference for bus access, because they may lose data if made to wait.
– Cycle stealing: the DMA controller uses memory cycles that would otherwise be used by the CPU.

Bus technology has changed over the years, and each new idea raises the issue of compatibility:
– The early PC had an 8-bit ISA (Industry Standard Architecture) bus, later widened to 16 bits.
– The EISA bus was a backward-compatible bus with a 32-bit data path.
– The Peripheral Component Interconnect (PCI) bus became more popular than EISA and should be considered its successor.
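The DMA-plus-interrupt interaction described above can be sketched as a tiny event loop: the CPU keeps running its program while the controller transfers data, and a completion "interrupt" diverts the CPU into a handler before it resumes. This is purely a conceptual illustration; the class name and the cycle-based timing are invented for the example and do not model real hardware.

```python
# Conceptual sketch of DMA with a completion interrupt (names and timing are
# invented for illustration; real hardware does this with signals, not Python).
class DMAController:
    def __init__(self, cycles_needed):
        self.remaining = cycles_needed      # cycles until the transfer finishes

    def tick(self):
        """Advance the transfer by one cycle; return True when it completes."""
        self.remaining -= 1
        return self.remaining == 0

def interrupt_handler():
    print("  interrupt: DMA transfer complete, notifying the OS")

dma = DMAController(cycles_needed=3)
for cycle in range(5):
    print(f"cycle {cycle}: CPU executes its current program")   # CPU keeps working
    if dma.remaining > 0 and dma.tick():
        interrupt_handler()                 # CPU suspends its program, runs the handler,
                                            # then resumes on the next cycle
```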
Terminals
A computer terminal has two parts:
1. Keyboard: attaches to a PC and communicates using a serial protocol. It is connected to a dedicated processor called the keyboard controller, which detects key-press events and generates an interrupt on the main processor, which then handles the key event.
2. Monitor: Cathode Ray Tube (CRT) monitors, flat panel displays, video RAM.

Cathode Ray Tube Monitors
CRT:
– An electron gun shoots an electron beam against a phosphorescent screen.
– The beam scans across the face of the tube, one scan line at a time.
A typical monitor scans at least 500 lines to display a full screen and does this at least 60 times per second. Because CRT images are displayed one line at a time, a CRT is called a raster scan device. Color monitors have three electron guns (red, green, and blue). (Figure: (a) cross section of a CRT; (b) CRT scanning pattern.)
Horizontal and vertical sweeping is controlled by applying voltages to the corresponding deflection plates. The voltage applied to the grid causes the corresponding bit pattern to appear on the screen; this allows binary signals to be converted into a visual display of bright and dark spots.

Flat Panel Displays
CRTs are too heavy to be used in notebook computers, so another technology based on the LCD (Liquid Crystal Display) is used.
Liquid Crystal Display technology:
– Uses an electric field to change the optical properties of a liquid crystal by changing the molecular alignment.
– This allows control of the intensity of light passing through the crystal.
– This can be used to construct flat panel displays, which use a lighted panel behind the liquid crystal layer.
Many types of displays are based on this technology, such as the Twisted Nematic (TN) display and the active matrix display. (Figures: (a) the construction of an LCD screen; (b) a Twisted Nematic display, where the grooves on the rear and front plates are perpendicular to one another; a CRT monitor and an LCD monitor.)

Video RAM
Video RAM is a special memory set aside to store the digital data required to drive and refresh the display unit; it is located on the display controller. Each pixel is represented by a data value in the video RAM:
– A common representation of a pixel uses a 24-bit (3-byte) RGB value (one byte for the intensity of each of the three colors), allowing 2^24 (about 16 million) colors.
– Indexed color: the most common reduced-color scheme uses an 8-bit number to represent a color (i.e., a palette of 256 RGB values). This reduces the required video memory by 2/3. (A small sizing calculation follows below.)
Video requires high data rates to be transferred to the display, so special buses are used, such as the Accelerated Graphics Port (AGP), which has a base data rate of 252 MB/sec, with faster versions available.
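To put numbers on the video-RAM sizes above, here is the memory needed for one full frame at a common resolution. The 1024 x 768 resolution is an assumed example value; the 3-byte and 1-byte pixel sizes come from the text.

```python
# Video RAM needed for one frame: pixels * bytes per pixel.
width, height = 1024, 768                 # example resolution (assumed)
pixels = width * height

true_color = pixels * 3                   # 24-bit RGB: 3 bytes per pixel
indexed = pixels * 1                      # indexed color: 1-byte palette index
print(true_color, indexed)                # 2359296 vs 786432 bytes
print(1 - indexed / true_color)           # 0.666... -> memory reduced by 2/3
```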
Mice
1. Mechanical mice: the motion of the wheels is turned into electrical signals measuring movement in two directions.
2. Optical mice: have no wheels and use an LED to reflect light from the mouse pad; the reflections are picked up by a photodetector, which determines how far the mouse has moved in each direction.
3. Opto-mechanical mice.
The mouse transmits movement information to the computer through its tail (cable). Usually three bytes are sent each time the mouse detects movement above some minimum (the minimum is called a mickey): the bytes represent the x and y changes (signed) and the state of the mouse buttons. Software running under operating system control translates the mouse data into cursor movement or other application-specific actions.

Printers
Monochrome printers:
– Matrix printer: a print head with 7-24 electromagnetically activatable needles. (Figure: (a) the letter "A" on a 5 x 7 matrix; (b) the letter "A" printed with 24 overlapping needles.)
– Inkjet printers: a movable head holds an ink cartridge; variants are thermal inkjet (bubble jet) and piezoelectric inkjet printers.
– Laser printers: higher quality, great speed, flexibility, and moderate cost. (Figure: operation of a laser printer.)

Color printers
– Produce colors by combining cyan, yellow, and magenta pigments; these colors are what results when red, blue, and green light are absorbed instead of reflected. They are essentially the inverse of the RGB scheme used by color screens, which produce color using light. These printers are called CMYK printers (K stands for black), because mixing the three colored inks does not produce a true black.
– The total set of colors a device can produce is called its gamut.
– Converting a colored image on the screen into an identical printed one is problematic and requires special calibration, sophisticated software, and expertise.

Five color-printing technologies are commonly used:
1. Color inkjet printers: use four cartridges (C, M, Y, and K). Good quality and medium cost.
2. Solid ink printers: use four solid blocks of a special waxy ink, which are melted into hot ink and then sprayed onto the paper.
3. Color laser printers: work the same way as monochrome laser printers but with four different toners.
4. Wax printers: use a wide ribbon of four-color wax and thousands of heating elements that melt the wax, which is then fused to the paper as CMYK pixels.
5. Dye sublimation printers: CMYK dyes pass over a thermal head containing thousands of programmable heating elements; the dyes are vaporized and absorbed by a special paper. Each element can produce 256 intensities depending on its temperature.

Telecommunications Equipment
Modem (modulator/demodulator):
– Used to send digital data across an analog telephone network.
– It modulates the digital data by turning the binary information into an analog signal and, on the receiving end, demodulates the analog signal to recover the binary data.
– Modems use amplitude, frequency, and phase changes to encode the binary data. The simplest scheme transmits an audio signal that switches between two volumes (amplitude modulation) or two tones (frequency modulation) to represent the 0s and 1s. Faster modems use phase-switching techniques to increase the number of bits transmitted per second, the bit rate. The bit rate is different (in some cases) from the baud rate, which is the number of signal changes possible per second (see the short example after this part).
Full-duplex (two-way) communication is possible if different frequencies are used for transmitting and receiving.
Modems are limited to 57,600 bits per second due to the characteristics of common telephone systems, which restrict the frequency of transmission between the substation and your phone. (Figure: transmission of the binary number 01001010000100 over a telephone line bit by bit; (a) two-level signal, (b) amplitude modulation, (c) frequency modulation, (d) phase modulation.)

Digital Subscriber Lines
Because a modem has a maximum data rate of about 56 kbps, services with higher data rates, called broadband, are required for Internet access. The reason the modem is so slow compared with other technologies is that it uses the telephone network, which was designed for human voice: all incoming data are restricted to 3-4 kHz by a filter. By removing this frequency filter, telephone companies can offer high-speed data lines on top of the original telephone system; this is called DSL (Digital Subscriber Line). The higher frequency spectrum allows the equivalent of about 250 modems to use a single telephone line. (Figures: operation of ADSL; a typical ADSL equipment configuration.)
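Returning to the modem discussion above, the bit rate vs. baud rate distinction is just arithmetic: bit rate = baud rate x bits carried per signal change. The baud rate and constellation sizes below are illustrative values, not figures from the text.

```python
# Bit rate = baud rate * bits per symbol (bits encoded in each signal change).
import math

baud = 2400                                # signal changes per second (example value)
for points in (2, 4, 16, 64):              # distinguishable symbol values per change
    bits_per_symbol = int(math.log2(points))
    print(f"{points:2d} symbol values -> {baud * bits_per_symbol} bits/sec")
# 2 values -> 2400 bps, 4 -> 4800 bps, 16 -> 9600 bps, 64 -> 14400 bps:
# the same baud rate yields a higher bit rate as more bits are packed per change.
```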
Internet over Cable
Cable companies also offer broadband access by using extra bandwidth already present in the cable connections to people's homes. Everyone in a neighborhood shares a single wire connected to a headend (similar to a telephone substation) that has a very high-bandwidth link to the cable company's main office. The bandwidth of the local cable is about 750 MHz, of which about 200 MHz is available for data. An ADSL connection provides only about 1 MHz, but it is not shared; the quality of Internet service through cable therefore depends on your neighbors' usage. (Figure: frequency allocation in a typical cable TV system used for Internet access.)

Character Codes
Every computer uses a set of characters, including digits, letters, punctuation, etc. For the computer to handle these characters, each one is assigned a number; this mapping is called a character code. For computers to communicate with each other, they should use the same code. There are two common character codes:
1. American Standard Code for Information Interchange (ASCII)
2. Unicode

ASCII Character Codes
American Standard Code for Information Interchange (ASCII):
– A standard character code used by most PCs since the 1980s.
– It assigns a 7-bit number to each encodable character, which allows 128 characters.
– ASCII characters are of two types:
1. ASCII control characters, used for data transmission (non-printing).
2. ASCII printing characters (digits, letters, punctuation, etc.).
(Table: the ASCII character set.)

Unicode Character Codes
PC systems soon added another 128 characters to provide graphics on text-based displays. Officially, the ASCII set was extended to Latin-1 (8 bits), which includes symbols and letters with diacritical marks. To support international character sets, ASCII was further extended to the Unicode character set, which assigns every character or symbol a unique number, originally a 16-bit value, called a code point. Various encoding schemes are used: UTF-8 is a variable-length encoding that uses the byte values 00 through 7F to represent the ASCII characters, while UCS-2 is a fixed-size 16-bit encoding that can represent only the most common characters.
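A few lines of code show the relationship between ASCII code points and UTF-8 or UCS-2 style encodings. This is just an illustrative sketch using Python's built-in codecs; the sample characters are chosen arbitrarily.

```python
# ASCII characters keep their 7-bit code points in UTF-8 (one byte each),
# while non-ASCII characters need more bytes.
for ch in ("A", "a", "0", "é", "€"):
    print(ch, hex(ord(ch)), ch.encode("utf-8").hex())
# 'A' -> 0x41  -> 41      (single byte, same as ASCII)
# 'é' -> 0xe9  -> c3a9    (two bytes in UTF-8)
# '€' -> 0x20ac -> e282ac (three bytes in UTF-8)

print("A€".encode("utf-16-be").hex())  # 004120ac: fixed 16-bit units, like UCS-2
```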
