Single-Cycle Datapath Performance PDF
Summary
This document covers the design, control, and performance analysis of a single-cycle processor architecture. It focuses on how different instruction classes use the functional units and how that determines the time each instruction needs. The material is aimed at a computer architecture course.
Full Transcript
What Will We Cover Today?
- Single-cycle processor
- ALU control logic
- How to support a new instruction
- Performance evaluation

Design of the ALU Control Unit
- ALU used for R-type: F depends on funct field

ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract

ALU Control for R-Type
- 2-bit ALUOp derived from opcode
- Combinational logic derives ALU control

opcode   ALUOp   Operation   funct7    funct3   ALU function   ALU control
R-type   10      add         0000000   000      add            0010
                 subtract    0100000   000      subtract       0110
                 AND         0000000   111      AND            0000
                 OR          0000000   110      OR             0001

ALU Control for lw/sw
- Assume 2-bit ALUOp derived from opcode
- Combinational logic derives ALU control
- ALU used for Load/Store: F = add

opcode   ALUOp   Operation    funct7    funct3   ALU function   ALU control
lw       00      load word    XXXXXXX   XXX      add            0010
sw       00      store word   XXXXXXX   XXX      add            0010
R-type   10      add          0000000   000      add            0010
                 subtract     0100000   000      subtract       0110
                 AND          0000000   111      AND            0000
                 OR           0000000   110      OR             0001

ALU Control for beq
- beq: F = sub
- lw/sw: F = add
- R-type: F depends on funct3/funct7

opcode   ALUOp   Operation         funct7    funct3   ALU function   ALU control
beq      01      branch on equal   XXXXXXX   XXX      sub            0110
lw       00      load word         XXXXXXX   XXX      add            0010
sw       00      store word        XXXXXXX   XXX      add            0010
R-type   10      add               0000000   000      add            0010
                 subtract          0100000   000      subtract       0110
                 AND               0000000   111      AND            0000
                 OR                0000000   110      OR             0001

The Datapath in Operation for R-type Instructions (e.g. add t1, t2, t3)

The Datapath in Operation for lw

The Datapath in Operation for beq

Problem: lw t1, 32(t2)
Inactive blocks:
Instruction [19-15] =
Instruction [24-20] =
Instruction [11-7] =
Branch =
MemRead =
MemtoReg =
ALUOp =
MemWrite =
ALUSrc =
RegWrite =

Review
- What is routed to 'Write register'?
- What value comes out of ImmGen?
- Why does it not matter what value goes to Read register 2?
- Why does it not matter whether the Zero flag coming out of the ALU is set?
- What other inputs are ignored due to control signals?

How to Add a New Instruction to a Given Datapath
1) What is the new instruction supposed to accomplish?
   Describe it using Register Transfer Language.
2) For any new task the instruction performs, do we have all the components needed in the current processor? If not, what shall we add?
3) If any new component is added, how do we assemble it into the existing datapath?
4) Do we need to add any new control signal? If yes, how do we set its value for the new instruction?
5) Do we need to add a new multiplexor to support the new instruction? If yes, where do we add it?

Implementing jal Instruction
- jal saves PC + 4 in Reg[rd] (the return address): R[rd] = PC + 4
- Set PC = PC + offset (PC-relative jump): PC = PC + immed (untangled and shifted left 1 bit)
- Target somewhere within ±2^19 locations, 2 bytes apart: ±2^18 32-bit instructions
- Immediate encoding optimized similarly to the branch instruction to reduce hardware cost

Performance of a Single-Cycle Datapath
Time needed by functional units:
- Memory unit (r/w): 200 ps
- ALU or adder: 100 ps
- Register file (r/w): 50 ps
- No delay for other units
Instruction mix: 25% loads, 10% stores, 45% R-type instructions, 20% branches
Compare the performance of the following 2 single-cycle implementations:
- Fixed clock cycle time (the same for all instructions)
- Variable clock cycle time per instruction

Measuring Performance of a Single-Cycle Processor
- Determine pathway of each instruction
- Determine time of each pathway
- Max cost for fixed cycle length
- Actual cost (weighted average) for variable cycle length

Functional Units Used by Each Instruction Class

Instruction class   Instruction fetch   Register access (read)   ALU   Memory access   Register access (write)
R-type              X                   X                        X     -               X
Load                X                   X                        X     X               X
Store               X                   X                        X     X               -
Branch              X                   X                        X     -               -

Required Length for Each Instruction Class

Instruction class   Instruction memory   Register read   ALU operation   Data memory   Register write   Total (ps)
R-type              200                  50              100             -             50               400
Load                200                  50              100             200           50               600
Store               200                  50              100             200           -                550
Branch              200                  50              100             -             -                350

The clock cycle for a machine with a single clock for all instructions will be determined
by the longest instruction, which is 600 ps here.

Variable Length Cycles

Instruction class   Instruction mix   Instruction memory   Register read   ALU operation   Data memory   Register write   Total (ps)
R-type              45%               200                  50              100             -             50               400
Load                25%               200                  50              100             200           50               600
Store               10%               200                  50              100             200           -                550
Branch              20%               200                  50              100             -             -                350

The average time per instruction with a variable clock is:
0.45 x 400 + 0.25 x 600 + 0.10 x 550 + 0.20 x 350 = 455 ps

Comparing Fixed and Variable Cycle

CPU performance (variable clock) / CPU performance (single clock) = CPU time (single clock) / CPU time (variable clock)

CPU time = IC x CPI x clock cycle time
- The ICs are the same
- CPI = 1

CPU time (single clock) / CPU time (variable clock) = 600 / 455 ≈ 1.32

The variable clock cycle implementation is 1.32 times faster than the fixed clock cycle implementation.

Single-Cycle Use of Components

Instruction   add   addi   beq   lw    sw
Mix           20%   20%    25%   25%   10%

- In what fraction of all cycles is the data memory used?
- In what fraction of all cycles is the input of the immediate-gen circuit needed (addi uses immediate-gen; what other ops use it)?
- What is the immediate-gen circuit doing in the cycles in which its input is not needed?

Summary of Single-Cycle Datapath
5 steps to design a processor:
1. Analyze instruction set => datapath requirements
2. Select set of datapath components and establish clock methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine the setting of control points that affect the register transfer
5. Assemble the control logic
RISC-V makes it easier:
- Instructions same size
- Source registers always in same place
- Immediates minimize placement differences
- Operations always on registers/immediates
Single-cycle datapath => CPI = 1, clock cycle time (CCT) is long
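
The fixed-versus-variable clock comparison above can be reproduced with a short Python sketch. It is only an illustration of the arithmetic on the slides: the unit latencies, instruction mix, and per-class paths are taken from the tables, while the names (UNIT_PS, PATHS, MIX_PERCENT, path_latency) are hypothetical and chosen here for readability.

```python
# Functional-unit latencies in picoseconds, as given on the slides.
UNIT_PS = {
    "imem": 200,       # instruction memory (fetch)
    "reg_read": 50,    # register file read
    "alu": 100,        # ALU or adder
    "dmem": 200,       # data memory
    "reg_write": 50,   # register file write
}

# Units on the path of each instruction class (from the functional-unit table).
PATHS = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "Load":   ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "Store":  ["imem", "reg_read", "alu", "dmem"],
    "Branch": ["imem", "reg_read", "alu"],
}

# Instruction mix in whole percent, so the weighted sum stays exact.
MIX_PERCENT = {"R-type": 45, "Load": 25, "Store": 10, "Branch": 20}


def path_latency(instr_class):
    """Sum the latencies of the units one instruction class passes through."""
    return sum(UNIT_PS[unit] for unit in PATHS[instr_class])


latency = {cls: path_latency(cls) for cls in PATHS}

# Fixed clock: every instruction takes as long as the slowest class.
fixed_cycle = max(latency.values())

# Variable clock: weighted average of the per-class latencies.
variable_avg = sum(MIX_PERCENT[cls] * latency[cls] for cls in MIX_PERCENT) / 100

# IC and CPI (= 1) are the same for both designs, so the ratio of CPU
# times reduces to the ratio of cycle times.
speedup = fixed_cycle / variable_avg

print(latency)
print(f"fixed: {fixed_cycle} ps, variable: {variable_avg} ps, "
      f"speedup: {speedup:.2f}")
```

Run as written, this prints per-class totals of 400, 600, 550, and 350 ps, a fixed cycle of 600 ps, a variable average of 455.0 ps, and a speedup of 1.32, matching the slides. Changing MIX_PERCENT or UNIT_PS shows how sensitive the result is to the load fraction and to memory latency.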