Computer Organization PDF
Summary
This document provides lecture notes on computer organization, focusing on chapter 5. It covers topics such as computer subsystems, including the CPU, memory, and I/O, as well as subsystems interconnections and program execution. The document is suitable for undergraduate-level computer science students.
Full Transcript
5. Computer Organization
Foundations of Computer Science © Cengage Learning

5.1 Objectives
After studying this chapter, the student should be able to:
- List the three subsystems of a computer.
- Describe the role of the central processing unit (CPU).
- Describe the fetch-decode-execute phases of a cycle.
- Describe the main memory and its addressing space.
- Define the input/output subsystem.
- Understand the interconnection of subsystems.
- Describe different methods of input/output addressing.
- Distinguish the two major trends in the design of computers.
- Understand how computer throughput can be improved using pipelining and parallel processing.

Chapter Outline
- Computer Subsystems: CPU, memory, I/O subsystems
- Subsystems Interconnections: connecting CPU and memory, connecting I/O devices
- Program Execution
- Different Architectures: CISC, RISC, pipelining, parallel processing

Computer Subsystems
Three broad categories make up a computer:
- CPU (central processing unit)
- Main memory
- I/O (input/output) subsystem, including storage devices

CPU (Central Processing Unit)
The CPU performs operations on data. It consists of three parts:
- ALU (arithmetic logic unit): performs logic, shift, and arithmetic operations on data.
- CU (control unit): controls the operation of each subsystem. Control is exercised through signals sent from the control unit to the other subsystems.
- Set of registers: fast stand-alone storage locations that hold data temporarily.

Set of Registers
Multiple registers are needed to facilitate the operation of the CPU. They fall into three groups:
1. Data registers: dozens of registers that hold the intermediate results of operations.
2. Instruction register (IR): a program is made of instructions to be run by the computer; each instruction is loaded into the instruction register so that the CU can interpret it.
3. Program counter (PC): keeps track of the instruction currently being executed. After an instruction is executed, the counter is incremented to point to the address of the next instruction in memory.

Main Memory
Main memory consists of a collection of storage locations, each with a unique identifier called an address. Data is transferred to and from memory in groups of bits called words. A word can be a group of 8 bits, 16 bits, 32 bits, or 64 bits (and growing). If the word is 8 bits, it is referred to as a byte. The term "byte" is so common in computer science that a 16-bit word is sometimes called a 2-byte word, and a 32-bit word a 4-byte word.

Address Space
To access a word in memory, an identifier is required: each word is identified by an address. The total number of uniquely identifiable locations in memory is called the address space. For example, a memory of 64 kilobytes with a word size of 1 byte has an address space that ranges from 0 to 65,535. Memory addresses are defined using unsigned binary integers.

Memory Types
Memory can be classified into three types:
- RAM (random access memory): SRAM (static RAM) and DRAM (dynamic RAM)
- ROM (read-only memory): PROM, EPROM, and EEPROM
- Cache memory

RAM (Random Access Memory)
RAM makes up most of main memory. It holds data only as long as the power is on.
It has two types based on its design:

DRAM (dynamic RAM):
- Uses capacitors.
- Its cells need to be refreshed periodically, because capacitors lose some of their charge over time.
- Slow but inexpensive.
- Used for a computer's main memory.

SRAM (static RAM):
- Uses flip-flop gates.
- Data is stored as long as the power is on; there is no need to refresh memory locations.
- Fast but expensive.
- Typically used for CPU cache.

ROM (Read-Only Memory)
ROM is written by the manufacturer; the CPU can read from it but not write to it. Unlike RAM, data in ROM is not lost when the power is off. One example of data kept in ROM is the boot program that runs when the computer is switched on. ROM can be found in three types:
1. PROM (programmable read-only memory)
2. EPROM (erasable programmable read-only memory)
3. EEPROM (electrically erasable programmable read-only memory)

Cache Memory
Cache memory is faster than main memory, but slower than the CPU and its registers. Cache memory, which is normally small in size, is placed between the CPU and main memory. The CPU always checks the cache first.

Memory Hierarchy
Memory that is both very fast and inexpensive is not always attainable, so a compromise needs to be made. The solution is hierarchical levels of memory.

Input/Output Subsystem
The collection of devices referred to as the input/output (I/O) subsystem allows a computer to communicate with the outside world and to store programs and data even when the power is off. Input/output devices can be divided into two broad categories: non-storage devices and storage devices.

Non-storage Devices
They allow the CPU/memory to communicate with the outside world, but they cannot store information. Examples include the keyboard, monitor, and printer.

Storage Devices
They store large amounts of information to be retrieved at a later time.
They are cheaper than main memory, and their contents are nonvolatile, that is, not erased when the power is turned off. They are sometimes referred to as auxiliary storage devices. They can be categorized as:
- Magnetic devices (e.g. hard disk)
- Optical devices (e.g. CD)

Magnetic Disk
A magnetic disk consists of one or more disks (platters) stacked on top of each other. Information is stored on and retrieved from the surface of the disk using a read/write head. (Figure 5.6 A magnetic disk)
Surface organization: each surface is divided into tracks, and each track into sectors. The tracks are separated by intertrack gaps, and the sectors are separated by intersector gaps.
Data access: data can be accessed randomly, without the need to read all the other data located before it.
Performance depends on several factors:
- Rotational speed: how fast the disk is spinning.
- Seek time: the time to move the read/write head to the desired track where the data is stored.
- Transfer time: the time to move data from the disk to the CPU/memory.

Magnetic Tape
The tape is mounted on two reels and passes a read/write head that reads or writes information as the tape moves through it. (Figure 5.7 A magnetic tape)
Surface organization: the width of the tape is divided into nine tracks. Each location on a track can store 1 bit of information, so nine vertical locations together store 8 bits (1 byte) of information plus a bit for error detection.
Data access: tape is a sequential access device. The surface may be divided into blocks, but there is no addressing mechanism to access each block; to retrieve a specific block on the tape, we need to pass through all the previous blocks.
Performance: tape is slower and cheaper than magnetic disk. Today magnetic tape is mostly used to back up large amounts of data.

Subsystems Interconnections
Information needs to be exchanged among the three subsystems (CPU, memory, and I/O), so the three subsystems have to communicate. How are these three subsystems interconnected?
- CPU and memory connections
- CPU and I/O connections

Connecting CPU and Memory
The CPU and memory are normally connected by three groups of connections, each called a bus:
- Data bus
- Address bus
- Control bus

Data bus: made up of several connections, each carrying 1 bit at a time. The number of connections depends on the size of the word used by the computer; for example, a computer with a 32-bit word needs a data bus with 32 connections.
Address bus: allows access to a particular word in memory. The number of connections depends on the address space of the memory. If the memory has 2^n words, the address bus needs to carry n bits at a time, so n connections are needed.
Control bus: carries communication between the CPU and memory; for example, there must be a code sent from the CPU to memory to specify a read or write operation. The number of connections depends on the total number of control commands the computer needs. If a computer has 2^m control actions, m connections are needed, because m bits can define 2^m different operations.

Connecting I/O Devices
I/O devices cannot be connected directly to the buses that connect the CPU and memory, because the nature of I/O devices is different from that of the CPU and memory: I/O devices are electromechanical, magnetic, or optical devices, whereas the CPU and memory are electronic devices. I/O devices also operate at much slower speeds than the CPU/memory. Some sort of intermediary is needed to handle these differences, so input/output devices are attached to the buses through input/output controllers or interfaces.

Two types of controllers can be found:
- Serial: one data connection; bits move one after another.
- Parallel: several data connections; bits move in blocks.
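The bus-sizing rules above (a memory of 2^n words needs an n-line address bus; 2^m control actions need an m-line control bus) can be checked with a short sketch; the function name is ours, not from the text.

```python
import math

def lines_needed(count: int) -> int:
    """Smallest number of bus lines whose bit patterns can distinguish `count` items."""
    return math.ceil(math.log2(count))

# 64 KB memory with 1-byte words: 2**16 locations -> 16 address lines
print(lines_needed(64 * 1024))   # 16
# 8 distinct control actions -> 3 control lines
print(lines_needed(8))           # 3
```

The same arithmetic covers the earlier address-space example: 65,536 locations are exactly the patterns of 16 address bits, giving addresses 0 to 65,535.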
Common kinds of I/O controllers:

SCSI (Small Computer System Interface):
- Parallel interface.
- Needs a terminator.
- Connects a few devices.

FireWire:
- Serial interface.
- For high-speed devices.
- Connects up to 63 devices.

USB (universal serial bus):
- Serial interface.
- For both high- and low-speed devices.
- Connects up to 127 devices.

(Figure 5.14 SCSI controller)

FireWire Controller
IEEE standard 1394 defines a serial interface called FireWire. It is a high-speed serial interface that transfers data in packets, achieving a transfer rate of up to 50 MB/sec, or double that in the most recent version. It can be used to connect up to 63 devices in a daisy-chain or tree connection using a single connection. (Figure 5.15 FireWire controller)

USB Controller
The universal serial bus (USB) is a serial controller that connects both low- and high-speed devices to the computer bus. The USB controller is referred to as a root hub. USB-2 (USB version 2) allows up to 127 devices to be connected to the USB controller in a tree-like topology, with the controller as the root of the tree, hubs as the intermediate nodes, and the devices as the end nodes. (Figure 5.16 USB controller)

Addressing Input/Output Devices
The CPU usually uses the same bus to read data from or write data to both main memory and I/O devices. The only difference is the instruction: if the instruction refers to a word in main memory, data transfer is between main memory and the CPU; if the instruction identifies an I/O device, data transfer is between the I/O device and the CPU. There are two methods for handling the addressing of I/O devices: isolated I/O and memory-mapped I/O.
Isolated I/O Addressing
The instructions used to read/write memory are totally different from the instructions used to read/write I/O devices. Each I/O device has its own address. The I/O addresses can overlap with memory addresses without any ambiguity, because the instructions themselves are different. (Figure 5.17 Isolated I/O addressing)

Memory-Mapped I/O Addressing
The CPU treats each register in an I/O controller as a word in memory, so the CPU does not have separate instructions for transferring data from memory and from I/O devices. The advantage of the memory-mapped configuration is a smaller number of instructions: all memory instructions can be used by I/O devices. The disadvantage is that part of the memory address space is allocated to registers in I/O controllers. (Figure 5.18 Memory-mapped I/O addressing)

5-5 PROGRAM EXECUTION
Today, general-purpose computers use a set of instructions called a program to process data. A computer executes the program to create output data from input data. Both the program and the data are stored in memory.

Machine Cycle
The CPU uses repeating machine cycles to execute instructions in the program, one by one, from beginning to end. A simplified cycle can consist of three phases: fetch, decode, and execute (Figure 5.19 The steps of a cycle).
A fuller machine cycle has four steps:
1. Fetch: retrieve an instruction from memory.
2. Decode: translate the retrieved instruction into a series of computer commands.
3. Execute: carry out the computer commands.
4. Store: write the results back to memory.

Input/Output Operation
Commands are required to transfer data from I/O devices to the CPU and memory. Because I/O devices operate at much slower speeds than the CPU, the operation of the CPU must somehow be synchronized with the I/O devices.
Three methods have been devised for this synchronization:
- Programmed I/O
- Interrupt-driven I/O
- Direct memory access (DMA)

Programmed I/O
Synchronization is very primitive: the CPU waits for the I/O device. The transfer of data between the I/O device and the CPU is done by instructions in the program. When the CPU encounters an I/O instruction, it does nothing else until the data transfer is complete; it constantly checks the status of the I/O device. If the device is ready to transfer, data is transferred to the CPU; if not, the CPU keeps checking the device status until the I/O device is ready. The big issue is that CPU time is wasted checking the status of the I/O device for each unit of data to be transferred. (Figure 5.20 Programmed I/O)

Interrupt-Driven I/O
The CPU informs the I/O device that a transfer is going to happen, but it does not test the status of the I/O device continuously. Instead, the I/O device interrupts the CPU when it is ready. In the meantime, the CPU can do other jobs, such as running other programs or transferring data from/to other I/O devices, so CPU time is not wasted. (Figure 5.21 Interrupt-driven I/O)

DMA
The third method used for transferring data is direct memory access (DMA).
This method transfers a large block of data between a high-speed I/O device, such as a disk, and memory directly, without passing it through the CPU. The DMA controller has registers to hold a block of data before and after the memory transfer. (Figure 5.22 DMA connection to the general bus)
To use this method for an I/O operation, the CPU sends a message to the DMA controller. The message contains the type of transfer (input or output), the start address of the memory location, and the number of bytes to be transferred. The CPU is then available for other jobs. When it is ready to transfer the data, the DMA controller informs the CPU that it needs to take control of the buses; the CPU gives up the buses and lets the controller use them. After the data transfer directly between the DMA controller and memory, the CPU continues its normal operation. Note that the CPU is idle for a time, but the duration of this idle period is very short compared with the other methods: the CPU is idle only during the data transfer between the DMA controller and memory, not while the device prepares the data. (Figure 5.23 DMA input/output)

5-6 DIFFERENT ARCHITECTURES
The architecture and organization of computers have gone through many changes in recent decades. In this section we discuss some common architectures and organizations that differ from the simple computer architecture discussed earlier.

CISC
CISC (pronounced sisk) stands for complex instruction set computer. The strategy behind CISC architectures is to have a large set of instructions, including complex ones. Programming CISC-based computers is easier than in other designs, because there is a single instruction for both simple and complex tasks; programmers therefore do not have to write a set of instructions to do a complex task.

RISC
RISC (pronounced risk) stands for reduced instruction set computer.
The strategy behind RISC architecture is to have a small set of instructions that do a minimum number of simple operations. Complex instructions are simulated using a subset of simple instructions. Programming in RISC is more difficult and time-consuming than in the other design, because most complex instructions are simulated using simple instructions.

CISC vs. RISC

CISC:
- More complex hardware.
- More compact software code.
- Takes multiple cycles per instruction.
- Can use less RAM, as there is no need to store intermediate results.

RISC:
- Simpler hardware.
- More complicated software code.
- Takes one cycle per instruction.
- Can use more RAM to handle intermediate results.

Pipelining
We have learned that a computer uses three phases, fetch, decode, and execute, for each instruction. In early computers, these three phases were done in series for each instruction; in other words, instruction n needed to finish all of these phases before instruction n + 1 could start its own. Modern computers use a technique called pipelining to improve throughput (the total number of instructions performed in each period of time). The idea is that if the control unit can do two or three of these phases simultaneously, the next instruction can start before the previous one is finished. (Figure 5.24 Pipelining)

Parallel Processing
Traditionally a computer had a single control unit, a single arithmetic logic unit, and a single memory unit. With the evolution of technology and the drop in the cost of computer hardware, today we can have a single computer with multiple control units, multiple arithmetic logic units, and multiple memory units. This idea is referred to as parallel processing. Like pipelining, parallel processing can improve throughput.
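The throughput gain from pipelining can be sketched with back-of-the-envelope arithmetic, assuming (an idealization not stated numerically in the text) that each of the three phases takes one time unit and the pipeline never stalls:

```python
def serial_time(n: int, phases: int = 3) -> int:
    """Time for n instructions when each must finish all phases before the next starts."""
    return phases * n

def pipelined_time(n: int, phases: int = 3) -> int:
    """Time when a new instruction can enter the pipeline every time unit."""
    return phases + (n - 1)

n = 100
print(serial_time(n))     # 300 time units
print(pipelined_time(n))  # 102 time units
```

For large n the pipelined machine approaches one instruction completed per time unit, a roughly threefold improvement with three overlapped phases.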
Figure 5.25 A taxonomy of computer organization
Figure 5.26 SISD organization
Figure 5.27 SIMD organization
Figure 5.28 MISD organization
Figure 5.29 MIMD organization

5-7 A SIMPLE COMPUTER
To explain the architecture of computers as well as their instruction processing, we introduce a simple (unrealistic) computer, as shown in Figure 5.30. Our simple computer has three components: CPU, memory, and an input/output subsystem. (Figure 5.30 The components of a simple computer)

Instruction Set
Our simple computer can have a set of sixteen instructions, although we use only fourteen of them. Each computer instruction consists of two parts: the operation code (opcode) and the operand(s). The opcode specifies the type of operation to be performed on the operand(s). Each instruction consists of sixteen bits divided into four 4-bit fields. The leftmost field contains the opcode, and the other three fields contain the operand or the address of the operand(s), as shown in Figure 5.31.
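A minimal sketch of this 16-bit format, packing four 4-bit fields with the opcode leftmost (the field values below are illustrative, not taken from a real instruction set):

```python
def encode(opcode: int, f1: int, f2: int, f3: int) -> int:
    """Pack four 4-bit fields into one 16-bit instruction word, opcode leftmost."""
    for field in (opcode, f1, f2, f3):
        assert 0 <= field <= 0xF, "each field must fit in 4 bits"
    return (opcode << 12) | (f1 << 8) | (f2 << 4) | f3

word = encode(0x5, 0x2, 0x4, 0x0)   # hypothetical opcode 5, fields 2, 4, 0
print(f"{word:04X}")                 # 5240
```

Decoding simply reverses the shifts: the opcode is `word >> 12`, and each operand field is `(word >> shift) & 0xF`.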
Figure 5.31 Format and different instruction types

Op-code  Mnemonic  Function                                                      Example
001      LOAD      Load the value of the operand into the Accumulator            LOAD 10
010      STORE     Store the value of the Accumulator at the address             STORE 8
                   specified by the operand
011      ADD       Add the value of the operand to the Accumulator               ADD #5
100      SUB       Subtract the value of the operand from the Accumulator        SUB #1
101      EQUAL     If the value of the operand equals the value of the           EQUAL #20
                   Accumulator, skip the next instruction
110      JUMP      Jump to a specified instruction by setting the Program        JUMP 6
                   Counter to the value of the operand
111      HALT      Stop execution                                                HALT

#  Machine code  Assembly code  Description
0  001 1 000010  LOAD #2        Load the value 2 into the Accumulator
1  010 0 001101  STORE 13       Store the value of the Accumulator in memory location 13
2  001 1 000101  LOAD #5        Load the value 5 into the Accumulator
3  010 0 001110  STORE 14       Store the value of the Accumulator in memory location 14
4  001 0 001101  LOAD 13        Load the value of memory location 13 into the Accumulator
5  011 0 001110  ADD 14         Add the value of memory location 14 to the Accumulator
6  010 0 001111  STORE 15       Store the value of the Accumulator in memory location 15
7  111 0 000000  HALT           Stop execution

Processing the Instructions
Our simple computer, like most computers, uses machine cycles. A cycle is made of three phases: fetch, decode, and execute. During the fetch phase, the instruction whose address is determined by the PC is obtained from memory and loaded into the IR; the PC is then incremented to point to the next instruction. During the decode phase, the instruction in the IR is decoded and the required operands are fetched from a register or from memory. During the execute phase, the instruction is executed and the results are placed in the appropriate memory location or register. Once the third phase is complete, the control unit starts the cycle again, but now the PC is pointing to the next instruction.
The process continues until the CPU reaches a HALT instruction.

An Example
Let us show how our simple computer can add two integers A and B and create the result as C. We assume that the integers are in two's complement format. Mathematically, we show this operation as C ← A + B. We assume that the first two integers are stored in memory locations (40)₁₆ and (41)₁₆, and the result should be stored in memory location (42)₁₆. To do this simple addition, five instructions are needed, which are then encoded in the machine language of our simple computer.

Storing Program and Data
We can store the five-line program in memory from location (00)₁₆ to (04)₁₆. We already know that the data needs to be stored in memory locations (40)₁₆, (41)₁₆, and (42)₁₆.

Cycles
Our computer uses one cycle per instruction. If we have a small program with five instructions, we need five cycles. We also know that each cycle is normally made up of three steps: fetch, decode, and execute. Assume for the moment that we need to add 161 + 254 = 415. The numbers are shown in memory in hexadecimal as (00A1)₁₆, (00FE)₁₆, and (019F)₁₆.
(Figure 5.32 Status of cycle 1)
(Figure 5.33 Status of cycle 2)
(Figure 5.34 Status of cycle 3)
(Figure 5.35 Status of cycle 4)
(Figure 5.36 Status of cycle 5)

Another Example
In the previous example we assumed that the two integers to be added were already in memory, and that the result of the addition would be held in memory. You may ask how we can store the two integers we want to add in memory, or how we use the result once it is stored in memory. In a real situation, we enter the first two integers into memory using an input device such as a keyboard, and we display the third integer through an output device such as a monitor. Getting data via an input device is normally called a read operation, while sending data to an output device is normally called a write operation.
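Before modifying the program, the arithmetic of the worked example above is easy to double-check directly:

```python
# Verifying 161 + 254 = 415, i.e. (00A1)16 + (00FE)16 = (019F)16.
a, b = 0x00A1, 0x00FE
total = a + b
print(total, f"{total:04X}")   # 415 019F
```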
To make our previous program more practical, we need to modify it as follows. In our computer we can simulate read and write operations using the LOAD and STORE instructions: LOAD reads data input to the CPU, and STORE writes data from the CPU. We need two instructions to read data into memory and two to write data out of memory. The input operation must always read data from an input device into memory; the output operation must always write data from memory to an output device. In the modified program, operations 1 to 4 are for input and operations 9 and 10 are for output. When we run this program, it waits for the user to input two integers on the keyboard and press the enter key. The program then calculates the sum and displays the result on the monitor.
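To tie the machine cycle and the instruction set together, here is a tiny interpreter sketch that runs the earlier sample program. Assembly is kept as Python tuples rather than real machine code, and `#` marks an immediate operand, as in the opcode table; the helper name `value` is ours.

```python
# Sample program: compute 2 + 5 and store the sum in memory location 15.
program = [
    ("LOAD", "#2"), ("STORE", "13"), ("LOAD", "#5"), ("STORE", "14"),
    ("LOAD", "13"), ("ADD", "14"), ("STORE", "15"), ("HALT", ""),
]
memory = {}          # data memory, addressed by integer location
acc, pc = 0, 0       # accumulator and program counter

def value(operand: str) -> int:
    """Immediate operand ("#5") vs. direct memory address ("13")."""
    return int(operand[1:]) if operand.startswith("#") else memory[int(operand)]

while True:
    op, operand = program[pc]   # fetch the instruction addressed by the PC
    pc += 1                     # PC now points at the next instruction
    if op == "LOAD":            # decode and execute
        acc = value(operand)
    elif op == "STORE":
        memory[int(operand)] = acc
    elif op == "ADD":
        acc += value(operand)
    elif op == "HALT":
        break

print(memory[15])   # 7
```

Each loop iteration is one machine cycle: fetch (index by `pc`), decode (unpack the tuple and resolve the operand), and execute (update the accumulator or memory).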