Computer Organization and Architecture PDF

Summary

This document provides an overview of computer organization and architecture, covering topics such as components, structure, function and interconnection. It details the key concepts of von Neumann architecture and explores different computer types. It also covers the instruction cycle with and without interrupts.

Full Transcript

Computer Organization and Architecture
Dr. S. Jeba Priya

Text Books:
1. William Stallings, "Computer Organization and Architecture: Designing for Performance", Pearson Education, 11th edition, 2019, ISBN: 978-0-13-499719-3.
2. John P. Hayes, "Computer Organization and Architecture", McGraw Hill, 3rd edition, 2002, ISBN: 0070273553.

Course Outcomes:
CO1 Explain the function of the central processing unit.
CO2 Develop algorithms for error correction for memory modules (main and cache memory).
CO3 Design and understand various input and output modules for the central processing unit.
CO4 Select and use standard addressing modes for logical and physical memory addressing.
CO5 List and define the various stages of instruction pipelining in a processor.
CO6 Explore various ways of implementing micro-instruction sequencing and execution.

Module 1: Introduction to Computer Architecture
Introduction to computer organization and architecture; a top-level view of computer function and interconnection.

Computer
1. A computer is a programmable electronic device that accepts raw data as input and processes it with a set of instructions (a program) to produce the result as output.
2. It renders output after performing mathematical and logical operations and can save the output for future use. It can process numerical as well as non-numerical calculations.
3. The term "computer" is derived from the Latin word "computare", which means to calculate.

The basic parts without which a computer cannot work are as follows:
Processor: executes instructions from software and hardware.
Memory: the primary memory, used for data transfer between the CPU and storage.
Motherboard: the part that connects all other parts or components of a computer.
Storage device: permanently stores the data, e.g., a hard drive.
Input device: allows you to communicate with the computer or to input data, e.g., a keyboard.
Output device: enables you to see the output, e.g., a monitor.

Computers can be divided into five types:
1. Micro Computer
2. Mini Computer
3. Mainframe Computer
4. Super Computer
5. Workstation

Organization and Architecture
Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical execution of a program. A term that is often used interchangeably with computer architecture is instruction set architecture (ISA). The ISA defines instruction formats, instruction opcodes, registers, instruction and data memory; the effect of executed instructions on the registers and memory; and an algorithm for controlling instruction execution. Examples of architectural attributes include the instruction set, the number of bits used to represent various data types (e.g., numbers, characters), I/O mechanisms, and techniques for addressing memory.

Computer organization refers to the operational units and their interconnections that realize the architectural specifications. Organizational attributes include those hardware details transparent to the programmer, such as control signals, interfaces between the computer and peripherals, and the memory technology used.
Computer Organization control signals interfaces between computer and peripherals the memory technology being used Computer Organization Organization and Architecture Structure and Function Structure: The way in which the components are interrelated. Function: The operation of each individual component as part of the structure Structure and Function All computer functions are: Data processing: Computer must be able to process data which may take a wide variety of forms and the range of processing. Data storage: Computer stores data either temporarily or permanently. Data movement: Computer must be able to move data between itself and the outside world. Control: There must be a control of the above three functions Four main structural components: Central processing unit (CPU) Main memory I / O System interconnections CPU structural components: Control unit Arithmetic and logic unit (ALU) Registers CPU interconnections CPU structural components: Control unit Arithmetic and logic unit (ALU) Registers CPU interconnections Simple Single-processor Computer Top level view of computer function and interconnection: Computer component DR. S. JEBA PRIYA von Neumann architecture and is based on three key concepts: Data and instructions are stored in a single read–write memory. The contents of this memory are addressable by location, without regard to the type of data contained there. Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next. Hardware and Software Approaches Hardware and Software Approaches  Two major components of the system: an instruction interpreter and a module of general-purpose arithmetic and logic functions. These two constitute the CPU. Several other components are needed to yield a functioning computer. Data and instructions must be put into the system. For this we need some sort of input module. This module contains basic components for accepting data and instructions in some form and converting them into an internal form of signals usable by the system. A means of reporting results is needed, and this is in the form of an output module. Taken together, these are referred to as I/O components. Computer Components: Top-Level View Computer Components: Top-Level View MAR (Memory Address Register) MBR (Memory Buffer Register) PC (Program Counter) IR (Instruction Register) AC (Accumulator-Temporary Register) I/O AR (Input-Output Address Register) I/O BR (Input-Output Buffer Register). Computer Components: Top-Level View a memory address register (MAR) - which specifies the address in memory for the next read or write  memory buffer register (MBR) - which contains the data to be written into memory or receives the data read from memory  I/O address register (I/OAR) - specifies a particular I/O device. I/O buffer register (I/OBR) - used for the exchange of data between an I/O module and the CPU. Computer Function The basic function performed by a computer is execution of a program, which consists of a set of instructions stored in memory. The processor does the actual work by executing instructions specified in the program. This section provides an overview of the key elements of program execution. In its simplest form, instruction processing consists of two steps. The processor reads (fetches) instructions from memory one at a time, then executes each instruction. Program execution consists of repeating the process of instruction fetch and instruction execution. 
Basic Instruction Cycle
The two steps are referred to as the fetch cycle and the execute cycle. Program execution halts only if the machine is turned off, some sort of unrecoverable error occurs, or a program instruction that halts the computer is encountered.

Example:
MOV AL,03
MOV BL,04
ADD AL,BL
MOV ,AL
HLT

Instruction Fetch and Execute
At the beginning of each instruction cycle, the processor fetches an instruction from memory. In a typical processor, a register called the program counter (PC) holds the address of the instruction to be fetched next. Unless told otherwise, the processor always increments the PC after each instruction fetch so that it will fetch the next instruction in sequence (i.e., the instruction located at the next higher memory address). The fetched instruction is loaded into a register in the processor known as the instruction register (IR). The instruction contains bits that specify the action the processor is to take. The processor interprets the instruction and performs the required action.

Execute Cycle
Processor-memory: data may be transferred from processor to memory or from memory to processor.
Processor-I/O: data may be transferred to or from a peripheral device by transferring between the processor and an I/O module.
Data processing: the processor may perform some arithmetic or logic operation on data.
Control: an instruction may specify that the sequence of execution be altered. For example, the processor may fetch an instruction from location 149 which specifies that the next instruction be taken from location 182 (a jump).

Characteristics of a Hypothetical Machine: Example of Program Execution
1. The PC contains 300, the address of the first instruction. This instruction (the value 1940 in hexadecimal) is loaded into the instruction register (IR), and the PC is incremented. Note that this process involves the use of a memory address register and a memory buffer register; for simplicity, these intermediate registers are ignored here.
2. The first 4 bits (first hexadecimal digit) in the IR indicate that the AC is to be loaded. The remaining 12 bits (three hexadecimal digits) specify the address (940) from which data are to be loaded.
3. The next instruction (5941) is fetched from location 301, and the PC is incremented.
4. The old contents of the AC and the contents of location 941 are added, and the result is stored in the AC.
5. The next instruction (2941) is fetched from location 302, and the PC is incremented.
6. The contents of the AC are stored in location 941.
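The six steps above can be reproduced with a short simulation. This is a sketch only: the opcode meanings (1 = load AC, 5 = add to AC, 2 = store AC) follow the worked example, while the operand values placed in locations 940 and 941 are assumptions chosen so the run is easy to follow.

```python
# Sketch of the hypothetical machine executing the three-instruction fragment.
# Opcode assignments come from the worked example; operand values are assumed.

memory = {
    0x300: 0x1940,   # load AC from location 940
    0x301: 0x5941,   # add the contents of location 941 to AC
    0x302: 0x2941,   # store AC into location 941
    0x940: 0x0003,   # operand (assumed value)
    0x941: 0x0002,   # operand (assumed value)
}

PC, AC, IR = 0x300, 0, 0

for _ in range(3):                    # the fragment contains three instructions
    IR = memory[PC]                   # fetch cycle: read instruction into IR
    PC += 1                           # and increment the PC
    opcode  = (IR >> 12) & 0xF        # first hexadecimal digit: operation
    address = IR & 0xFFF              # remaining three digits: operand address
    if opcode == 0x1:                 # execute cycle: load AC from memory
        AC = memory[address]
    elif opcode == 0x5:               # add memory operand to AC
        AC = (AC + memory[address]) & 0xFFFF
    elif opcode == 0x2:               # store AC into memory
        memory[address] = AC

print(f"location 941 now holds {memory[0x941]:04X}")   # 0003 + 0002 = 0005
```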
Instruction Cycle State Diagram
States in the upper part of the diagram involve an exchange between the processor and either memory or an I/O module; states in the lower part involve only internal processor operations. The states are:
1. Instruction address calculation (iac)
2. Instruction fetch (if)
3. Instruction operation decoding (iod)
4. Operand address calculation (oac)
5. Operand fetch (of)
6. Data operation (do)
7. Operand store (os)

Instruction address calculation (iac): determine the address of the next instruction to be executed. Usually, this involves adding a fixed number to the address of the previous instruction.
Instruction fetch (if): read the instruction from its memory location into the processor.
Instruction operation decoding (iod): analyze the instruction to determine the type of operation to be performed and the operand(s) to be used.
Operand address calculation (oac): if the operation involves a reference to an operand in memory or available via I/O, determine the address of the operand.
Operand fetch (of): fetch the operand from memory or read it in from I/O.
Data operation (do): perform the operation indicated in the instruction.
Operand store (os): write the result into memory or out to I/O.

Interrupts
Virtually all computers provide a mechanism by which other modules (I/O, memory) may interrupt the normal processing of the processor.
Figures: program flow with no interrupts; interrupts with a short I/O wait; interrupts with a long I/O wait; transfer of control via interrupts; program timing with short and long I/O waits; instruction cycle with interrupts; instruction cycle state diagram with interrupts.

Multiple Interrupts
The discussion so far has focused only on the occurrence of a single interrupt. Suppose, however, that multiple interrupts can occur. For example, a program may be receiving data from a communications line and printing results. The printer will generate an interrupt every time it completes a print operation, and the communications line controller will generate an interrupt every time a unit of data arrives. The unit could be either a single character or a block, depending on the nature of the communications discipline. In any case, it is possible for a communications interrupt to occur while a printer interrupt is being processed.

Two approaches can be taken to dealing with multiple interrupts. The first is to disable interrupts while an interrupt is being processed. A disabled interrupt simply means that the processor ignores that interrupt request signal; if an interrupt occurs during this time, it generally remains pending and will be checked by the processor after the processor has re-enabled interrupts. Thus, when a user program is executing and an interrupt occurs, interrupts are disabled immediately. After the interrupt handler routine completes, interrupts are enabled before resuming the user program, and the processor checks to see if additional interrupts have occurred. The second approach is to define priorities for interrupts and to allow an interrupt of higher priority to cause a lower-priority interrupt handler to itself be interrupted (nested interrupt processing).
Figures: sequential interrupt processing; nested interrupt processing; example time sequence of multiple interrupts.
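As a rough sketch of the first approach (sequential processing with interrupts disabled during a handler), the Python fragment below keeps requests pending in a queue while a handler runs. The queue and function names are illustrative assumptions, not part of the text.

```python
# Sketch of sequential interrupt processing: while a handler runs, interrupts
# are disabled, so new requests simply remain pending until it finishes.
from collections import deque

pending = deque()            # interrupt requests awaiting service
interrupts_enabled = True

def raise_interrupt(source):
    """Called by an I/O module (printer, comms line, ...) to request service."""
    pending.append(source)

def service(source):
    print(f"handling interrupt from {source}")

def check_for_interrupts():
    """Run by the processor at the end of each instruction cycle."""
    global interrupts_enabled
    while interrupts_enabled and pending:
        interrupts_enabled = False     # disable further interrupt recognition
        service(pending.popleft())     # run the interrupt handler routine
        interrupts_enabled = True      # re-enable, then re-check pending requests

# A printer interrupt and a communications interrupt are both outstanding;
# they are serviced strictly one after the other, never nested.
raise_interrupt("printer")
raise_interrupt("comms line")
check_for_interrupts()
```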
Interconnection Structures (Dr. A. Kethsy Prabavathy)
A computer consists of a set of components or modules of three basic types (processor, memory, I/O) that communicate with each other. In effect, a computer is a network of basic modules, so there must be paths for connecting the modules. The collection of paths connecting the various modules is called the interconnection structure. The design of this structure will depend on the exchanges that must be made among modules.

Memory: typically, a memory module consists of N words of equal length. Each word is assigned a unique numerical address. A word of data can be read from or written into the memory. The nature of the operation is indicated by read and write control signals, and the location for the operation is specified by an address.

I/O module: from a point of view internal to the computer system, I/O is functionally similar to memory. There are two operations, read and write. Further, an I/O module may control more than one external device. We can refer to each of the interfaces to an external device as a port and give each a unique address. In addition, there are external data paths for the input and output of data with an external device. Finally, an I/O module may be able to send interrupt signals to the processor.

Processor: the processor reads in instructions and data, writes out data after processing, and uses control signals to control the overall operation of the system. It also receives interrupt signals.

The preceding list defines the data to be exchanged. The interconnection structure must support the following types of transfers:
Memory to processor: the processor reads an instruction or a unit of data from memory.
Processor to memory: the processor writes a unit of data to memory.
I/O to processor: the processor reads data from an I/O device via an I/O module.
Processor to I/O: the processor sends data to the I/O device.
I/O to or from memory: for these two cases, an I/O module is allowed to exchange data directly with memory, without going through the processor, using direct memory access.

Over the years, a number of interconnection structures have been tried. By far the most common are (1) the bus and various multiple-bus structures, and (2) point-to-point interconnection structures with packetized data transfer.

Bus Interconnection
The bus was the dominant means of computer system component interconnection for decades. For general-purpose computers, it has gradually given way to various point-to-point interconnection structures, which now dominate computer system design. However, bus structures are still commonly used for embedded systems, particularly microcontrollers.

A bus is a communication pathway connecting two or more devices. A key characteristic of a bus is that it is a shared transmission medium. Multiple devices connect to the bus, and a signal transmitted by any one device is available for reception by all other devices attached to the bus. If two devices transmit during the same time period, their signals will overlap and become garbled; thus, only one device at a time can successfully transmit.

Typically, a bus consists of multiple communication pathways, or lines. Each line is capable of transmitting signals representing binary 1 and binary 0. Over time, a sequence of binary digits can be transmitted across a single line. Taken together, several lines of a bus can be used to transmit binary digits simultaneously (in parallel). For example, an 8-bit unit of data can be transmitted over eight bus lines.
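As a small, purely illustrative sketch of that last point, the fragment below places an 8-bit value on eight notional bus lines, one bit per line, so the whole byte moves in a single bus cycle. Representing a line as a list entry is an assumption made only for the example.

```python
# Illustrative only: an 8-bit data unit driven onto eight parallel bus lines.
def drive_data_bus(value, width=8):
    # line i carries bit i of the value (0 or 1)
    return [(value >> i) & 1 for i in range(width)]

def read_data_bus(lines):
    return sum(bit << i for i, bit in enumerate(lines))

lines = drive_data_bus(0xA5)            # 1010 0101 placed on the bus
assert read_data_bus(lines) == 0xA5     # all eight bits arrive together
print(lines)
```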
Computer systems contain a number of different buses that provide pathways between components at various levels of the computer system hierarchy. A bus that connects major computer components (processor, memory, I/O) is called a system bus. The most common computer interconnection structures are based on the use of one or more system buses. A system bus typically consists of from about fifty to hundreds of separate lines, and each line is assigned a particular meaning or function. Although there are many different bus designs, on any bus the lines can be classified into three functional groups: data, address, and control lines.

Bus Interconnection – Data lines
The data lines provide a path for moving data among system modules. These lines, collectively, are called the data bus. The data bus may consist of 32, 64, 128, or even more separate lines, the number of lines being referred to as the width of the data bus. Because each line can carry only one bit at a time, the number of lines determines how many bits can be transferred at a time. The width of the data bus is a key factor in determining overall system performance. For example, if the data bus is 32 bits wide and each instruction is 64 bits long, then the processor must access the memory module twice during each instruction cycle.

Bus Interconnection – Address lines
The address lines are used to designate the source or destination of the data on the data bus. For example, if the processor wishes to read a word (8, 16, or 32 bits) of data from memory, it puts the address of the desired word on the address lines. Clearly, the width of the address bus determines the maximum possible memory capacity of the system. Furthermore, the address lines are generally also used to address I/O ports. Typically, the higher-order bits are used to select a particular module on the bus, and the lower-order bits select a memory location or I/O port within the module. For example, on an 8-bit address bus, addresses 01111111 and below might reference locations in a memory module (module 0) with 128 words of memory, while addresses 10000000 and above refer to devices attached to an I/O module (module 1); a short sketch of this decoding follows.
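The sketch below works through the 8-bit address-decoding example and the data-bus-width arithmetic just described; the function and module names are illustrative assumptions.

```python
# Sketch of the 8-bit address-bus example: the high-order bit selects the
# module, the low-order seven bits select a word or port within it.
def decode(address):
    assert 0 <= address <= 0xFF                   # 8-bit address bus
    module = (address >> 7) & 1                   # higher-order bit: module select
    offset = address & 0x7F                       # lower-order bits: location/port
    if module == 0:
        return f"memory module 0, word {offset}"  # addresses 0x00-0x7F
    return f"I/O module 1, port {offset}"         # addresses 0x80-0xFF

print(decode(0b0111_1111))    # memory module 0, word 127
print(decode(0b1000_0000))    # I/O module 1, port 0

# Data-bus-width example from the text: fetching a 64-bit instruction over a
# 32-bit data bus requires two memory accesses per instruction cycle.
instruction_bits, bus_width = 64, 32
accesses = -(-instruction_bits // bus_width)      # ceiling division -> 2
print(f"{accesses} memory accesses per instruction fetch")
```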
Bus Interconnection – Control lines
Memory write: causes data on the bus to be written into the addressed location.
Memory read: causes data from the addressed location to be placed on the bus.
I/O write: causes data on the bus to be output to the addressed I/O port.
I/O read: causes data from the addressed I/O port to be placed on the bus.
Transfer ACK: indicates that data have been accepted from or placed on the bus.
Bus request: indicates that a module needs to gain control of the bus.
Bus grant: indicates that a requesting module has been granted control of the bus.
Interrupt request: indicates that an interrupt is pending.
Interrupt ACK: acknowledges that the pending interrupt has been recognized.
Clock: used to synchronize operations.
Reset: initializes all modules.

The operation of the bus is as follows. If one module wishes to send data to another, it must do two things: (1) obtain the use of the bus, and (2) transfer data via the bus. If one module wishes to request data from another module, it must (1) obtain the use of the bus, and (2) transfer a request to the other module over the appropriate control and address lines. It must then wait for that second module to send the data.

Point-to-Point Interconnect
The shared bus architecture was the standard approach to interconnection between the processor and other components (memory, I/O, and so on) for decades, but contemporary systems increasingly rely on point-to-point interconnection rather than shared buses. The principal reason driving the change from bus to point-to-point interconnect was the electrical constraints encountered when increasing the frequency of wide synchronous buses: at higher and higher data rates, it becomes increasingly difficult to perform the synchronization and arbitration functions in a timely fashion. An important example is Intel's QuickPath Interconnect (QPI), which was introduced in 2008.

The following are significant characteristics of QPI and other point-to-point interconnect schemes:
Multiple direct connections: multiple components within the system enjoy direct pairwise connections to other components. This eliminates the need for the arbitration found in shared transmission systems.
Layered protocol architecture: as found in network environments such as TCP/IP-based data networks, these processor-level interconnects use a layered protocol architecture, rather than the simple use of control signals found in shared bus arrangements.
Packetized data transfer: data are not sent as a raw bit stream. Rather, data are sent as a sequence of packets, each of which includes control headers and error control codes.

Intel's QuickPath Interconnect (QPI)
In addition, QPI is used to connect to an I/O module, called an I/O hub (IOH). The IOH acts as a switch, directing traffic to and from I/O devices. Typically in newer systems, the link from the IOH to the I/O device controller uses an interconnect technology called PCI Express (PCIe); the IOH translates between the QPI protocols and formats and the PCIe protocols and formats. A core also links to a main memory module (typically the memory uses dynamic random access memory (DRAM) technology) using a dedicated memory bus.

QPI is defined as a four-layer protocol architecture:
Physical: consists of the actual wires carrying the signals, as well as the circuitry and logic to support ancillary features required in the transmission and receipt of the 1s and 0s. The unit of transfer at the Physical layer is 20 bits, which is called a phit (physical unit).
Link: responsible for reliable transmission and flow control. The Link layer's unit of transfer is an 80-bit flit (flow control unit).
Routing: provides the framework for directing packets through the fabric.
Protocol: the high-level set of rules for exchanging packets of data between devices. A packet is comprised of an integral number of flits.

QPI Physical Layer
The QPI port consists of 84 individual links grouped as follows. Each data path consists of a pair of wires that transmits data one bit at a time; the pair is referred to as a lane. There are 20 data lanes in each direction (transmit and receive), plus a clock lane in each direction. Thus, QPI is capable of transmitting 20 bits in parallel in each direction; the 20-bit unit is referred to as a phit. Typical signaling speeds of the link in current products call for operation at 6.4 GT/s (gigatransfers per second). At 20 bits per transfer, that adds up to 16 GB/s, and since QPI links involve dedicated bidirectional pairs, the total capacity is 32 GB/s. The lanes in each direction are grouped into four quadrants of 5 lanes each. In some applications, the link can also operate at half or quarter widths in order to reduce power consumption or work around failures.

The form of transmission on each lane is known as differential signaling, or balanced transmission. With balanced transmission, signals are transmitted as a current that travels down one conductor and returns on the other. The binary value depends on the voltage difference: typically, one line has a positive voltage value and the other line has zero voltage, and one line is associated with binary 1 while the other is associated with binary 0. Specifically, the technique used by QPI is known as low-voltage differential signaling (LVDS).
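The bandwidth figures quoted in the physical-layer description above follow directly from the lane count and signaling rate; a quick arithmetic check:

```python
# Checking the QPI physical-layer numbers quoted above.
transfers_per_second = 6.4e9      # 6.4 GT/s signaling rate
bits_per_transfer    = 20         # one phit: 20 data lanes per direction

bits_per_second  = transfers_per_second * bits_per_transfer   # 128 Gb/s
bytes_per_second = bits_per_second / 8                        # 16 GB/s per direction
total            = 2 * bytes_per_second                       # 32 GB/s both directions

print(f"{bytes_per_second / 1e9:.0f} GB/s per direction, {total / 1e9:.0f} GB/s total")
```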
QPI Link Layer
The QPI link layer performs two key functions: flow control and error control. These functions are performed as part of the QPI link layer protocol and operate at the level of the flit (flow control unit). Each flit consists of a 72-bit message payload and an 8-bit error control code called a cyclic redundancy check (CRC). A flit payload may consist of data or message information; the data flits transfer the actual bits of data between cores or between a core and an IOH.

QPI Link Layer – flow control function
The flow control function is needed to ensure that a sending QPI entity does not overwhelm a receiving QPI entity by sending data faster than the receiver can process the data and clear buffers for more incoming data. To control the flow of data, QPI makes use of a credit scheme. During initialization, a sender is given a set number of credits to send flits to a receiver. Whenever a flit is sent to the receiver, the sender decrements its credit counter by one. Whenever a buffer is freed at the receiver, a credit is returned to the sender for that buffer. Thus, the receiver controls the pace at which data is transmitted over a QPI link.

QPI Link Layer – error control function
Occasionally, a bit transmitted at the physical layer is changed during transmission, due to noise or some other phenomenon. The error control function at the link layer detects and recovers from such bit errors, and so isolates higher layers from experiencing bit errors.
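A minimal sketch of that credit scheme is shown below. The buffer counts and class names are illustrative assumptions; only the mechanism (credits consumed per flit sent, credits returned as receive buffers are freed) comes from the text.

```python
# Sketch of QPI-style credit-based flow control between a sender and a receiver.
class Receiver:
    def __init__(self, buffers=2):
        self.capacity = buffers         # illustrative buffer count
        self.buffers = []

    def accept(self, flit):
        assert len(self.buffers) < self.capacity, "sender exceeded its credits"
        self.buffers.append(flit)

    def process_one(self):
        """Free one buffer; one credit is returned to the sender."""
        self.buffers.pop(0)
        return 1

class Sender:
    def __init__(self, receiver, initial_credits):
        self.receiver = receiver
        self.credits = initial_credits  # granted during link initialization

    def send(self, flit):
        if self.credits == 0:
            return False                # must wait until credits are returned
        self.credits -= 1               # one credit consumed per flit sent
        self.receiver.accept(flit)
        return True

rx = Receiver(buffers=2)
tx = Sender(rx, initial_credits=2)
print(tx.send("flit-0"), tx.send("flit-1"), tx.send("flit-2"))  # True True False
tx.credits += rx.process_one()          # receiver frees a buffer, credit returned
print(tx.send("flit-2"))                # True: transmission can resume
```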
QPI Routing Layer
The routing layer is used to determine the course that a packet will traverse across the available system interconnects. Routing tables are defined by firmware and describe the possible paths that a packet can follow. In small configurations, such as a two-socket platform, the routing options are limited and the routing tables quite simple. For larger systems, the routing table options are more complex, giving the flexibility of routing and rerouting traffic depending on how (1) devices are populated in the platform, (2) system resources are partitioned, and (3) reliability events result in mapping around a failing resource.

QPI Protocol Layer
In this layer, the packet is defined as the unit of transfer. The packet contents definition is standardized, with some flexibility allowed to meet differing market segment requirements. One key function performed at this level is a cache coherency protocol, which deals with making sure that main memory values held in multiple caches are consistent. A typical data packet payload is a block of data being sent to or from a cache.

PCI Express (Dr. A. Kethsy Prabavathy)
The peripheral component interconnect (PCI) is a popular high-bandwidth, processor-independent bus that can function as a mezzanine or peripheral bus. Compared with other common bus specifications, PCI delivers better system performance for high-speed I/O subsystems (e.g., graphic display adapters, network interface controllers, and disk controllers). Accordingly, a newer version, known as PCI Express (PCIe), has been developed. PCIe, as with QPI, is a point-to-point interconnect scheme intended to replace bus-based schemes such as PCI. A key requirement for PCIe is high capacity to support the needs of higher-data-rate I/O devices, such as Gigabit Ethernet. Another requirement deals with the need to support time-dependent data streams.

PCI Physical and Logical Architecture
A root complex device, also referred to as a chipset or a host bridge, connects the processor and memory subsystem to the PCI Express switch fabric comprising one or more PCIe devices and PCIe switch devices. The root complex acts as a buffering device, to deal with differences in data rates between I/O controllers and memory and processor components. The root complex also translates between PCIe transaction formats and the processor and memory signal and control requirements. The chipset will typically support multiple PCIe ports, some of which attach directly to a PCIe device, and one or more that attach to a switch that manages multiple PCIe streams.

PCIe links from the chipset may attach to the following kinds of devices that implement PCIe:
1. Switch: the switch manages multiple PCIe streams.
2. PCIe endpoint: an I/O device or controller that implements PCIe, such as a Gigabit Ethernet switch, a graphics or video controller, a disk interface, or a communications controller.
3. Legacy endpoint: the legacy endpoint category is intended for existing designs that have been migrated to PCI Express, and it allows legacy behaviors such as the use of I/O space and locked transactions. PCI Express endpoints are not permitted to require the use of I/O space at runtime and must not use locked transactions.
4. PCIe/PCI bridge: allows older PCI devices to be connected to PCIe-based systems.

As with QPI, PCIe interactions are defined using a protocol architecture.

PCIe Protocol Layers
1. Physical: consists of the actual wires carrying the signals, as well as the circuitry and logic to support ancillary features required in the transmission and receipt of the 1s and 0s.
2. Data link: responsible for reliable transmission and flow control. Data packets generated and consumed by the DLL are called data link layer packets (DLLPs).
3. Transaction: generates and consumes data packets used to implement load/store data transfer mechanisms and also manages the flow control of those packets between the two components on a link. Data packets generated and consumed by the TL are called transaction layer packets (TLPs).

PCIe Physical Layer
Similar to QPI, PCIe is a point-to-point architecture. Each PCIe port consists of a number of bidirectional lanes (note that in QPI, a lane refers to transfer in one direction only). Transfer in each direction in a lane is by means of differential signaling over a pair of wires. A PCIe port can provide 1, 4, 8, 16, or 32 lanes. In what follows, we refer to the PCIe 3.0 specification, introduced in late 2010.

PCIe Transaction Layer
The transaction layer (TL) receives read and write requests from the software above the TL and creates request packets for transmission to a destination via the link layer. Most transactions use a split transaction technique, which works in the following fashion. A request packet is sent out by a source PCIe device, which then waits for a response, called a completion packet. The completion following a request is initiated by the completer only when it has the data and/or status ready for delivery. Each packet has a unique identifier that enables completion packets to be directed to the correct originator. With the split transaction technique, the completion is separated in time from the request, in contrast to a typical bus operation in which both sides of a transaction must be available to seize and use the bus. Between the request and the completion, other PCIe traffic may use the link.
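The split-transaction idea (tagged request now, matching completion later, with unrelated traffic allowed in between) can be sketched as follows; the tag counter, dictionary, and function names are assumptions made for the illustration.

```python
# Sketch of a PCIe-style split transaction: each request packet carries a
# unique identifier, and the completion packet returns that identifier so it
# can be matched to its originator, however much later it arrives.
import itertools

_tags = itertools.count()
outstanding = {}                          # tag -> description of pending request

def send_read_request(address):
    tag = next(_tags)                     # unique identifier for this request
    outstanding[tag] = f"read @ {address:#x}"
    return {"type": "request", "tag": tag, "address": address}

def build_completion(request, data):
    # The completer builds the completion only once data/status is ready.
    return {"type": "completion", "tag": request["tag"], "data": data}

def receive_completion(completion):
    origin = outstanding.pop(completion["tag"])   # match completion to request
    print(f"completion for '{origin}': data = {completion['data']:#x}")

req = send_read_request(0x1000)
# ... other PCIe traffic may use the link here ...
receive_completion(build_completion(req, data=0xCAFE))
```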
The TL supports four address spaces:
1. Memory: the memory space includes system main memory. It also includes PCIe I/O devices; certain ranges of memory addresses map into I/O devices.
2. I/O: this address space is used for legacy PCI devices, with reserved memory address ranges used to address legacy I/O devices.
3. Configuration: this address space enables the TL to read/write configuration registers associated with I/O devices.
4. Message: this address space is for control signals related to interrupts, error handling, and power management.

PCIe Data Link Layer
The purpose of the PCIe data link layer is to ensure reliable delivery of packets across the PCIe link. The DLL participates in the formation of TLPs and also transmits DLLPs. Data link layer packets originate at the data link layer of a transmitting device and terminate at the DLL of the device on the other end of the link. A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to digital data.
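As a concrete illustration of such an error-detecting code, the sketch below computes an 8-bit CRC over a payload and shows that a single flipped bit is detected. The generator polynomial used here (0x07, i.e., x^8 + x^2 + x + 1) is chosen only for the example and is not claimed to be the CRC defined by the QPI or PCIe specifications.

```python
# Illustrative CRC-8 (generator polynomial 0x07). Shows how a cyclic
# redundancy check detects accidental bit changes in a payload.
def crc8(data: bytes, poly: int = 0x07) -> int:
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):      # shift out one bit at a time
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

payload = b"example payload"
check = crc8(payload)                       # appended by the sender

corrupted = bytearray(payload)
corrupted[0] ^= 0x01                        # a single bit flipped in transit
print(crc8(payload) == check)               # True: payload unchanged
print(crc8(bytes(corrupted)) == check)      # False: corruption detected
```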
