UNIT 4 Processors PDF
Document Details
Uploaded by Deleted User
Tags
Summary
This document provides an overview of MIPS instruction execution, including memory-reference, arithmetic-logical, and branch instructions. It also explains the role of the ALU (Arithmetic Logic Unit) and how these different types of instructions operate.
Full Transcript
UNIT - IV PROCESSORS Instruction Execution – Building a Data Path – Designing a Control Unit – Hardwired Control, Micro programmed Control – Pipelining – Data Hazard – Control Hazards. 1. BASIC MIPS IMPLEMENTATION A Basic MIPS...
UNIT - IV PROCESSORS Instruction Execution – Building a Data Path – Designing a Control Unit – Hardwired Control, Micro programmed Control – Pipelining – Data Hazard – Control Hazards. 1. BASIC MIPS IMPLEMENTATION A Basic MIPS implementation includes a subset of the core MIPS instruction set. The MIPS instruction set are divided in to three classes : Memory-reference instructions - load word (lw) , store word (sw) Arithmetic-logical instructions - add, sub, and, or ,slt Branch instructions – beq , jump (j). For every instruction, the first two steps are identical: 1. Fetch Instruction 2. Fetch Operands The remaining steps depend on the instruction class. Use of ALU in MIPS: The memory-reference instructions uses the ALU for an memory address calculation The arithmetic-logical instructions uses the ALU for operation execution The branch instruction uses the ALU for comparison. After using the ALU, A memory-reference instruction will need to access the memory either to read data for a load or write data for a store. An arithmetic-logical or load instruction must write the data from the ALU or memory back into a register. A branch instruction, change the next instruction address based on the comparison; otherwise, the PC should be incremented by 4 to get the address of the next instruction. WORKING OF A BASIC MIPS IMPLEMENTATION All instructions start by using the program counter to supply the instruction address to the instruction memory. After the instruction is fetched, the register operands used by an instruction are specified by fields of that instruction. Page 1 Once the register operands have been fetched, they can be operated on to compute a memory address (for a load or store), to compute an arithmetic result (for an integer arithmetic-logical instruction), or a compare (for a branch). If the instruction is an arithmetic-logical instruction, the result from the ALU must be written to a register. If the operation is a load or store, the ALU result is used as an address to either store a value from the registers or load a value from memory into the registers. The result from the ALU or memory is written back into the register file. Branches require the use of the ALU output to determine the next instruction address, which comes either from the ALU (where the PC and branch offset are summed) or from an adder that increments the current PC by 4. 2. BUILDING A MIPS DATAPATH Data path design begins in examining the major components required to execute each class of MIPS instructions. The major components required to execute each class of MIPS instruction are called as data path elements. A data path element is a unit used to operate on or hold data within a processor. In the MIPS implementation, the data path elements include Instruction Memory Data Memory Register File ALU Adders Page 2 Building a MIPS data path consists of 1. DataPath for Fetching the instruction and incrementing the PC 2. DataPath for Executing arithmetic and logic instructions 3. Datapath for Executing a memory-reference instruction 4. DataPath for Executing a branch instruction 1. DATAPATH FOR FETCHING THE INSTRUCTION AND INCREMENTING THE PC A memory unit to store the instructions of a program and supply instructions given an address. The program counter is used to hold the address of the current instruction. An adder to increment the PC to the address of the next instruction. To execute any instruction, fetch the instruction from memory. To fetch the next instruction, increment the program counter so that it points at the next instruction, 4 bytes later. Combined all three elements into single stage Page 3 2. DATAPATH FOR EXECUTING ARITHMETIC AND LOGIC INSTRUCTIONS (R-Type) The processor’s 32 general-purpose registers are stored in a structure called a register file. A register file is a collection of registers in which any register can be read or written by specifying the number of the register in the file. An ALU is used to operate on the values read from the registers. It reads two registers, performs an ALU operation on the contents of the registers, and write the result to a register. These instructions are either called R-type instructions or arithmetic logical instructions. This instruction class includes add, sub, AND, OR, and slt. R-format Instruction Operations : 1. Read the two register operands 2. Perform the arithmetic/logical operation 3. Write the register result Combined two elements into single stage Page 4 3. DATAPATH FOR EXECUTING A MEMORY-REFERENCE INSTRUCTION The MIPS load word and store word instructions have the general form (i) lw $t1,offset($t2) (ii) sw $t1,offset ($t2). These instructions compute a memory address by adding the base register, which is $t2, to the 16-bit signed offset field contained in the instruction. If the instruction is a load, the value read from memory must be written into the register file in the specified register, which is $t1.Thus, we need both the register file and the ALU. If the instruction is a store, the value to be stored must also be read from the register file where it resides in $t1. In addition, a unit to sign-extend the 16-bit offset field in the instruction to a 32-bit signed value, and a data memory unit to read from or write to. The data memory must be written on store instructions; hence, it has both read and write control signals, an address input, as well as an input for the data to be written into memory. Load/Store Instructions Operations : 1. Read register operands 2. Calculate the memory address using 16-bit offset - Use ALU with sign-extend offset shifted left 2 times 3. Load: Read memory and update register ($t1) 4. Store: Write register value to memory ($t2 + offset) 4. DATAPATH FOR EXECUTING A BRANCH INSTRUCTION The general form of a Branch Instruction is beq $t1,$t2,offset. The branch datapath does two operations: Compute the branch target address and Compare the register contents. Page 5 Branch Target Address = PC + 4 + Offset (Sign Extended and Shifted left 2 times) When the condition is true (operands are equal), the branch target address becomes the new PC, and we say that the branch is taken. When the condition is false(operands are not equal), the incremented PC should replace the current PC; we say that the branch is not taken. Branch Instruction Operations: 1. Read register operands 2. Compare operands Use ALU - Subtract the two operands and Check for Zero output 3. Calculate target address Sign-extend the offset value Shift left 2 times Add to PC + 4 CREATING A SINGLE (or) COMBINED DATAPATH Show how to build a datapath for the operational portion of the memory reference and arithmetic-logical instructions that uses a single register file and a single ALU to handle both types of instructions, adding any necessary multiplexors. We can combine the datapath components needed for the individual instruction classes, into a single datapath and add the control to complete the implementation. This simplest datapath will execute all instructions in one clock cycle. To share a datapath element between two different instruction classes, we may need to allow multiple Page 6 connections to the input of an element, using a multiplexor and control signal to select among the multiple inputs. Step 1 To create a datapath with only a single register file and a single ALU, we must have two different sources for the second ALU input, as well as two different sources for the data stored into the register file. Thus, one multiplexor is placed at the ALU input and another at the data input to the register file. Step 2 Combine all the pieces to make a simple datapath for the MIPS architecture by adding the Datapath for Instruction fetch Datapath for Arithmetic-Logical instructions Datapath for Memory instructions Datapath for Branch instruction Step 3 In the datapath obtained by composing separate pieces, The branch instruction uses the main ALU for comparison of the register operands, so we must keep the adder for computing the branch target address. An additional multiplexor is required to select either the sequentially following instruction address (PC + 4) or the branch target address to be written into the PC. Step 4 The control unit must be able to take inputs and generate a write signal for each state element, the selector control for each multiplexor, and the ALU control. Page 7 Page 8 Design of Control Unit The Control Unit is classified into two major categories: 1. Hardwired Control 2. Micro programmed Control Hardwired Control The Hardwired Control organization involves the control logic to be implemented with gates, flip-flops, decoders, and other digital circuits. The following image shows the block diagram of a Hardwired Control organization. o A Hard-wired Control consists of two decoders, a sequence counter, and a number of logic gates. o An instruction fetched from the memory unit is placed in the instruction register (IR). o The component of an instruction register includes; I bit, the operation code, and bits 0 through 11. o The operation code in bits 12 through 14 are coded with a 3 x 8 decoder. o The outputs of the decoder are designated by the symbols D0 through D7. Page 9 o The operation code at bit 15 is transferred to a flip-flop designated by the symbol I. o The operation codes from Bits 0 through 11 are applied to the control logic gates. o The Sequence counter (SC) can count in binary from 0 through 15. Micro-programmed Control The Micro programmed Control organization is implemented by using the programming approach. In Micro programmed Control, the micro-operations are performed by executing a program consisting of micro-instructions. The following image shows the block diagram of a Micro programmed Control organization. o The Control memory address register specifies the address of the micro-instruction. o The Control memory is assumed to be a ROM, within which all control information is permanently stored. o The control register holds the microinstruction fetched from the memory. o The micro-instruction contains a control word that specifies one or more micro-operations for the data processor. o While the micro-operations are being executed, the next address is computed in the next address generator circuit and then transferred into the control address register to read the next microinstruction. o The next address generator is often referred to as a micro-program sequencer, as it determines the address sequence that is read from control memory. Page 10 5. PIPELINING Pipelining (or) Instruction Pipelining is an implementation technique in which multiple instructions are overlapped in execution. Pipelining is a process of arrangement of hardware elements of the CPU such that its overall performance is increased. The computer pipeline is divided in stages. The stages are connected to one another. Each stage completes a part of an instruction in parallel. Pipelining is widely used in modern processors. Pipelining is a particularly effective way of organizing concurrent activity in a computer system. It uses faster circuit technology to build the processor and the main memory. Advantages : Pipelining is a key to make processing fast. Pipelining improves system performance in terms of throughput. Pipelining makes the system reliable. Disadvantages: 1. The design of pipelined processor is complex and costly to manufacture. 2. The instruction latency is more. DIFFERENCE BETWEEN SEQUENTIAL EXECUTION AND PIPELINED EXECUTION SEQUENTIAL EXECUTION PIPELINED EXECUTION In the Sequential Execution, the processor In Pipelined Execution, the processor executes a program by fetching and executes a program by overlapping the executing instructions, one after another. instructions. Page 11 PIPELINED EXECUTION / ORGANIZATION 2 - STAGE PIPELINED EXECUTION Execution of a program consists of a sequence of fetch and executes steps. Let Fi and Ei refer to the fetch and execute steps for instruction Ii. A computer has two separate hardware units. They are: Instruction fetch unit Instruction execution unit The instruction fetched by the fetch unit is stored in an intermediate storage buffer. This buffer is needed to enable the execution unit to execute the instruction while the fetch unit is fetching the next instruction. The execution results are stored in the destination location specified by the instruction. The fetch and execute steps of any instruction can each be completed in one cycle. 3 - STAGE PIPELINED EXECUTION The stages are: F - Fetch : Read the instruction from the memory D - Decode : Decode the instruction and fetch the source operand(s) E - Execute : Perform the operation specified by the instruction Page 12 4 - STAGE PIPELINED EXECUTION The stages are: F - Fetch : Read the instruction from the memory D - Decode : Decode the instruction and fetch the source operand(s) E - Execute : Perform the operation specified by the instruction W - Write : Store the result in the destination location 5 - STAGE PIPELINED EXECUTION Instruction Fetch - The CPU reads instructions from the address in the memory whose value is present in the program counter. Instruction Decode - Instruction is decoded and the register file is accessed to get the values from the registers used in the instruction. Execute - ALU operations are performed. Memory Access - Memory operands are read and written from/to the memory that is present in the instruction. Write Back – Computed value is written back to the register. Page 15 6- STAGE PIPELINED EXECUTION STRUCTURAL HAZARD A structural hazard occurs when two or more instructions that are already in pipeline need the same resource. These hazards are because of conflicts due to insufficient resources. The result is that the instructions must be executed in series rather than parallel for a portion of pipeline. Structural hazards are sometime referred to as resource hazards. Example: A situation in which multiple instructions are ready to enter the execute instruction phase and there is a single ALU (Arithmetic Logic Unit). One solution to such resource hazard is to increase available resources, such as having multiple ALU. DATA HAZARD A data hazard occurs when there is a conflict in the access of an operand location. There are three types of data hazards. They are Read After Write (RAW) or True Dependency: An instruction modifies a register or memory location and a succeeding instruction reads the data in that memory or register location. A RAW hazard occurs if the read takes place before the write operation is complete. Example I1 : R2 ← R5 + R3 I2 : R4 ← R2 + R3 Page 16 Write After Read (WAR) or Anti Dependency: An instruction reads a register or memory location and a succeeding instruction writes to the location. A WAR hazard occurs if the write operation completes before the read operation takes place. Example I1 : R4 ← R1 + R5 I2 : R5 ← R1 + R2 Write After Write (WAW) or Output Dependency: Two instructions both write to the same location. A WAW hazard occurs if the write operations take place in the reverse order of the intended sequence. Example: I1 : R2 ← R4 + R7 I2 : R2 ← R1 + R3 Page 17 INSTRUCTION / CONTROL / BRANCH HAZARD An instruction (or) control (or) branch hazard, occurs when the pipeline makes the wrong decision on a branch prediction and therefore brings instructions into the pipeline that must subsequently be discarded. Whenever the stream of instructions supplied by the instruction fetch unit is interrupted, the pipeline stalls. 8. HANDLING DATA HAZARDS (or) DATA DEPENDENCY Consider the two instructions: Add R2, R3, #100 Subtract R9, R2, #30 The destination register R2 for the Add instruction is a source register for the Subtract instruction. There is a data dependency between these two instructions, because register R2 carries data from the first instruction to the second. 3 There are two techniques using which we can handle data hazards. They are (1) Using Operand Forwarding (2) Using Software Handling Data Dependencies Using Operand Forwarding Pipeline stalls due to data dependencies can be improved through the use of operand forwarding. Rather than stalling the instruction, the hardware can forward the value from result register to the ALU input through the Multiplexers. The second instruction can get data directly from the output of ALU after the previous instruction is completed. Page 18 A special arrangement needs to be made to “forward” the output of ALU to the input of ALU. Example : I1 : ADD R1,R2,R3 I2: SUB R4,R1,R5 Handling Data Dependencies Using Software An alternative approach is for detecting data dependencies and dealing with them. When the compiler identifies a data dependency between two successive instructions Ij and Ij+1, it can insert three explicit NOP (No-operation) instructions between them. The NOP’s introduce the necessary delay to enable instruction Ij+1 to read the new value from the register file after it is written. 9. HANDLING INSTRUCTION HAZARDS (or) CONTROL HAZARDS A variety of approaches have been taken for dealing with Instruction/Control/Branch Hazards.(Conditional branches) 1) Multiple Streams 2) Prefetch Branch Target Page 19 3) Loop Buffer 4) Branch Prediction 5) Delayed Branch 1) MULTIPLE STREAMS o The approach is to replicate the initial portions of the pipeline and allow the pipeline to fetch both instructions, making use of multiple streams. o There are two problems with this approach: 1. Contention delays for access to the registers and to memory. 2. Additional branch instructions may enter the pipeline before the original branch decision is resolved. 2) PREFETCH BRANCH TARGET o When a conditional branch is recognized, the target of the branch is prefetched, in addition to the instruction following the branch. o This target is then saved until the branch instruction is executed. o If the branch is taken, the target has already been prefetched. 3) LOOP BUFFER o A loop buffer is a small, very-high-speed memory maintained by the instruction fetch stage of the pipeline and containing the ‘n’ most recently fetched instructions, in sequence. o If a branch is to be taken, the hardware first checks whether the branch target is within the buffer. If so, the next instruction is fetched from the buffer. 4) BRANCH PREDICTION o To reduce the branch penalty, the processor needs to anticipate that an instruction being fetched is a branch instruction and predict its outcome to determine which instruction should be fetched. o It is generally of two types: Static Branch Prediction Dynamic Branch Prediction o Static Branch Prediction - Assume that the branch will not be taken and to fetch the next instruction in sequential address order. o Dynamic Branch Prediction - Uses the recent branch history, to see if a branch was taken the last time this instruction was executed. o Techniques for Branch Prediction Various techniques can be used to predict whether a branch will be taken. The most common are the following: Predict never taken Predict always taken Predict by opcode Page 20 Taken/not taken switch Branch history table o Branch Prediction Buffer (or) Branch History Table One implementation of that approach is a branch prediction buffer or branch history table. A branch prediction buffer is a small memory indexed by the lower portion of the address of the branch instruction. The memory contains a bit that says whether the branch was recently taken or not. A branch predictor tells us whether or not a branch is taken, Calculates the branch target address. Using a cache to hold the branch target buffer. o Branch Prediction Flowchart If the instruction is predicted as taken, fetching begins from the target as soon as the PC is known; it can be as early as the ID stage. If the instruction is predicted as not taken, sequential fetching and executing continue. If the prediction turns out to be wrong, the prediction bits are changed. Page 21 o Types Of Branch Predictor 1. Correlating Predictor - Combines local behavior and global behavior of a particular branch. 2. Tournament Predictor - Makes multiple predictions for each branch and a selection mechanism that chooses which predictor to enable for a given branch. 5) DELAYED BRANCH o In MIPS, branches are delayed. o This means that the instruction immediately following the branch is always executed, independent of whether the branch condition is true or false. This is known as Branch Folding Technique. o When the condition is false, the execution looks like a normal branch. o When the condition is true, a delayed branch first executes the instruction immediately following the branch in sequential instruction order before jumping to the specified branch target address. Page 22