MIPS Microarchitecture Lecture Notes PDF

MIPS Microarchitecture
471029: Introduction to Computer Architecture
14th Lecture

Disclaimer: These slides are mainly based on the COD (Computer Organization and Design, Patterson & Hennessy) 5th edition textbook, and were also developed in part from material by Prof. Dohyung Kim (KNU) and the computer architecture courses at KAIST and SKKU.

Questions
• Let's assume we have an ISA.
• How do we implement it?
  - i.e., how do we design a system that obeys the hardware/software interface?

Introduction
• Performance factors
  - Instruction count: determined by the ISA and the compiler
  - CPI (cycles per instruction) and cycle time: determined by the CPU hardware
• We will study two MIPS implementations
  - A simplified version (a.k.a. the single-cycle microarchitecture)
  - A more realistic pipelined version
• We will illustrate the key principles used in creating a datapath and designing the control with a simple subset of the core MIPS instruction set
  - Memory reference: lw, sw
  - Arithmetic/logical: add, sub, and, or, slt
  - Control transfer: beq, j

How Does a Machine Process Instructions?
• What does processing an instruction mean?
• Remember the von Neumann model:
    AS  = architectural (programmer-visible) state before an instruction is processed
    → Process instruction →
    AS' = architectural (programmer-visible) state after an instruction is processed
• Processing an instruction: transforming AS to AS' according to the ISA specification of the instruction
  - You can find the ISA specification of MIPS on the green sheet.

The "Process instruction" Step
• The ISA specifies abstractly what AS' should be, given an instruction and AS.
  - It defines an abstract finite state machine where
    - State = programmer-visible state
    - Next-state logic = instruction execution specification
  - From the ISA point of view, there are no "intermediate states" between AS and AS' during instruction execution: one state transition per instruction.
• The microarchitecture implements how AS is transformed to AS'.
  - There are many choices in implementation.
  - We can have programmer-invisible state (MS) to optimize the speed of instruction execution: multiple state transitions per instruction.
    - Choice 1: AS → AS' (transform AS to AS' in a single clock cycle)
    - Choice 2: AS → AS + MS1 → AS + MS2 → AS + MS3 → AS' (take multiple clock cycles to transform AS to AS')

A Very Basic Instruction Processing Engine
• Each instruction takes a single clock cycle to execute.
• Only combinational logic is used to implement instruction execution.
  - No intermediate, programmer-invisible state updates
    AS  = architectural (programmer-visible) state at the beginning of a clock cycle
    → Process instruction in one clock cycle →
    AS' = architectural (programmer-visible) state at the end of a clock cycle

A Very Basic Instruction Processing Engine (cont'd)
• Single-cycle machine
  [Figure: sequential logic (state) holds AS; combinational logic computes AS' from AS, and AS' is written back into the state elements.]
• What is the clock cycle time determined by?
• What is the critical path of the combinational logic determined by?
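One way to answer the two questions above is to compute them. The sketch below is a minimal Python model, assuming hypothetical per-step latencies (the picosecond values are invented for illustration, not measured) and using the five generic steps (IF, ID, EX, MEM, WB) introduced later in these notes. It treats each instruction class's path through the datapath as one combinational chain, so the single-cycle clock must fit the slowest chain; it also anticipates the multi-cycle alternative, where the slowest step sets the period. The names `STEP_PS`, `PATH`, and `MIX` are hypothetical.

```python
# Hypothetical per-step latencies in picoseconds (assumed for illustration only).
STEP_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Steps on each instruction class's path through the datapath.
PATH = {
    "lw":     ("IF", "ID", "EX", "MEM", "WB"),
    "sw":     ("IF", "ID", "EX", "MEM"),
    "R-type": ("IF", "ID", "EX", "WB"),
    "beq":    ("IF", "ID", "EX"),
}

# Hypothetical instruction mix: counts per 100 instructions.
MIX = {"lw": 25, "sw": 10, "R-type": 45, "beq": 20}


def path_delay(cls: str) -> int:
    """Combinational delay of one instruction class: sum of the steps it uses."""
    return sum(STEP_PS[s] for s in PATH[cls])


def single_cycle_period() -> int:
    """Single-cycle: the clock must fit the slowest instruction (the critical path)."""
    return max(path_delay(c) for c in PATH)


def multi_cycle_period() -> int:
    """Multi-cycle (one step per cycle): the slowest step sets the period."""
    return max(STEP_PS.values())


def program_time_ps(mix: dict, multi: bool) -> int:
    """# instructions x average CPI x cycle time, computed as total cycles x period."""
    if multi:
        cycles = sum(n * len(PATH[c]) for c, n in mix.items())  # CPI = steps used
        return cycles * multi_cycle_period()
    return sum(mix.values()) * single_cycle_period()  # CPI = 1 for every instruction


print(single_cycle_period(), program_time_ps(MIX, False), program_time_ps(MIX, True))
```

Under these assumed numbers the critical path is the load (800 ps), and the multi-cycle version averages 4.05 cycles × 200 ps = 810 ps per instruction versus 800 ps for the single-cycle version: the 100 ps ID and WB steps still occupy a full 200 ps cycle. The multi-cycle design pays off only when its steps are well balanced or the mix favors short instructions.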
Remember: Programmer-Visible (Architectural) State
• Memory: an array of storage locations indexed by an address (M[0] ... M[N-1])
• Registers
  - Given special names in the ISA (as opposed to addresses)
  - General vs. special purpose
• Program Counter: the memory address of the current instruction
• Instructions (and programs) specify how to transform the values of programmer-visible state.

Single-cycle vs. Multi-cycle Machines
• Single-cycle machines
  - Each instruction takes a single clock cycle.
  - All state updates are made at the end of an instruction's execution.
  - Big disadvantage: the slowest instruction determines the cycle time ⇒ long clock cycle time
• Multi-cycle machines
  - Instruction processing is broken into multiple cycles/stages.
  - State updates can be made during an instruction's execution.
  - But architectural state updates are made only at the end of an instruction's execution.
  - Advantage over single-cycle: the slowest "stage" determines the cycle time.
• Both single-cycle and multi-cycle machines literally follow the von Neumann model at the microarchitectural level.

Instruction Processing "Cycle"
• Instructions are processed step by step under the direction of a "control unit".
• Instruction cycle: the sequence of steps to process an instruction
• Fundamentally, there are six phases:
  - Fetch
  - Decode
  - Evaluate Address
  - Fetch Operands
  - Execute
  - Store Result
• Not all instructions require all six phases.

Instruction Processing "Cycle" vs. Machine Clock Cycle
• Single-cycle machine:
  - All six phases of the instruction processing cycle take a single machine clock cycle to complete.
• Multi-cycle machine:
  - All six phases of the instruction processing cycle can take multiple machine clock cycles to complete.
  - In fact, each phase can take multiple clock cycles to complete.

Instruction Processing Viewed Another Way
• Instructions transform Data (AS) to Data' (AS').
• This transformation is done by functional units.
  - Units that "operate" on data
• These units need to be told what to do to the data.
• An instruction processing engine consists of two components:
  - Datapath: consists of hardware elements that deal with and transform data signals
    - functional units that operate on data
    - hardware structures (e.g., wires and muxes) that enable the flow of data into the functional units and registers
    - storage units that store data (e.g., registers)
  - Control logic: consists of hardware elements that determine control signals, i.e., signals that specify what the datapath elements should do to the data

Single-cycle vs. Multi-cycle: Control & Data
• Single-cycle machine:
  - Control signals are generated in the same clock cycle as the one during which data signals are operated on.
  - Everything related to an instruction happens in one clock cycle (serialized processing).
• Multi-cycle machine:
  - Control signals needed in the next cycle can be generated in the current cycle.
  - The latency of control processing can be overlapped with the latency of datapath operation (more parallelism).

Many Ways of Datapath and Control Design
• There are many ways of designing the datapath and control logic:
  - Single-cycle, multi-cycle, or pipelined datapath and control
  - Single-bus vs. multi-bus datapaths
  - Hardwired/combinational vs. microcoded/microprogrammed control
    - Control signals generated by combinational logic, versus
    - Control signals stored in a memory structure
• Control signals and structure depend on the datapath design.

Flash-Forward: Performance Analysis
• Execution time of an instruction = {CPI} × {clock cycle time}
• Execution time of a program
    = Sum over all instructions [{CPI} × {clock cycle time}]
    = {# of instructions} × {Average CPI} × {clock cycle time}
• Single-cycle microarchitecture performance
  - CPI = 1
  - Clock cycle time = long
• Multi-cycle microarchitecture performance
  - CPI = different for each instruction
    - Average CPI ⇒ hopefully small
  - Clock cycle time = short
  - Now we have two degrees of freedom to optimize independently.

A Single-Cycle Microarchitecture
• Let's take a closer look at a single-cycle microarchitecture.
  - It is easier to understand how we design the datapath and the control in a single-cycle microarchitecture.
• Remember the single-cycle machine:
  [Figure: sequential logic (state) holds AS; combinational logic computes AS' from AS.]

Let's Start with the State Elements
• Data and control inputs
  [Figure: the state elements: the PC; instruction memory (instruction address in, instruction out); the register file (5-bit Read register 1, Read register 2, and Write register inputs, a Write data input, Read data 1 and Read data 2 outputs, and the RegWrite control); and data memory (Address and Write data inputs, a Read data output, and the MemWrite and MemRead controls).]

For Now, We Will Assume
• "Magic" memory and register file
• Combinational read
  - The output of the read data port is a combinational function of the register file contents and the corresponding read select port.
• Synchronous write
  - The selected register is updated on the positive clock edge when write enable is asserted.
    - A write cannot affect the read output in between clock edges.
• Single-cycle, synchronous memory
  - Contrast this with memory that tells when the data is ready,
    - i.e., a Ready bit indicating that the read or write is done

Instruction Processing
• 5 generic steps (based on our textbook)
  - Instruction fetch (IF)
  - Instruction decode and register operand fetch (ID/RF)
  - Execute/Evaluate memory address (EX/AG)
  - Memory operand fetch (MEM)
  - Store/writeback result (WB)
  [Figure: the five steps mapped onto the datapath: PC and instruction memory (IF), register file (ID/RF), ALU (EX/AG), data memory (MEM), and the write back into the register file (WB).]

Instruction Processing (cont'd)
• Fetch instruction: PC → instruction memory
• Decode instruction
• Read registers: register numbers → register file
• Depending on the instruction class
  - Use the ALU to calculate
    - an arithmetic result
    - a memory address for load/store
    - a branch target address
  - Access data memory for load/store
  - PC ← target address or PC + 4

Aside: A Big Picture
  [Figure: the loader places the code and data of a.out (e.g., 1100101000111110000001...) into memory, and the MIPS processor fetches instructions from it.]

What Is To Come: The Full MIPS Datapath
  [Figure: the complete single-cycle MIPS datapath and its control unit (RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite), with PCSrc1 = Jump selecting the jump address, PCSrc2 = Br Taken selecting the branch target, and ALU control driven by Instruction [5–0]. JAL, JR, JALR omitted.]

Logic Design Basics
• Information encoded in binary
  - Low voltage = 0, High voltage = 1
• One wire per bit
  - Multi-bit data encoded on multi-wire buses
• Datapath elements
  - Combinational element
  - State (sequential) elements

Combinational Elements
• Elements that operate on data values
  - Output is a function of input (that is, same inputs ⇒ same output)
  - E.g., AND gate or ALU
  [Figure: the full datapath, highlighting its combinational elements.]
State (Sequential) Elements
• Elements that contain state
  - E.g., register or memory
• In general, the inputs are the data value and the clock, which determines when the data is written.
• The output is the data value that was written in an earlier clock cycle, so it depends on both the inputs and the internal state.
  [Figure: the full datapath, highlighting its state elements.]

Clocking Methodology
• Edge-triggered clocking: the sequential elements are written on a clock edge.
• This allows us, within the same clock cycle, to 1) read the contents of a register, 2) send the value through some combinational logic, and 3) write that register.

Fetching Instructions
• From now on, we will build a MIPS datapath incrementally.
• Fetch instruction

Fetching Instructions (cont'd)
• Fetching the instruction and incrementing the PC
  - Increment by 4 for the next 32-bit instruction
  [Figure: the PC register addresses instruction memory, and an adder computes PC + 4, which is written back into the PC.]

Single-Cycle Datapath for Arithmetic and Logical Instructions

R-Type ALU Instructions
• Assembly (e.g., register-register signed addition)
    ADD rd_reg rs_reg rt_reg
• Machine encoding (R-type)
    opcode   rs      rt      rd      shamt   funct
    0        rs      rt      rd      0       ADD
    6-bit    5-bit   5-bit   5-bit   5-bit   6-bit
• Semantics
    if MEM[PC] == ADD rd rs rt
      GPR[rd] ← GPR[rs] + GPR[rt]
      PC ← PC + 4

ALU Datapath
  [Figure: R-type ALU datapath: instruction bits 25:21 and 20:16 select Read register 1 and Read register 2, bits 15:11 select the Write register; the ALU (3-bit ALU operation) combines Read data 1 and Read data 2, and the result is written back with RegWrite = 1; an adder computes PC + 4 for the next PC.]
(Figure annotated with the IF, ID, EX, MEM, and WB steps and the combinational state-update logic for ADD. Based on original figure [P&H CO&D, © 2004 Elsevier. All rights reserved.])

I-Type ALU Instructions
• Assembly (e.g., register-immediate signed addition)
    ADDI rt_reg rs_reg immediate16
• Machine encoding (I-type)
    opcode   rs      rt      immediate
    ADDI     rs      rt      immediate
    6-bit    5-bit   5-bit   16-bit
• Semantics
    if MEM[PC] == ADDI rt rs immediate
      GPR[rt] ← GPR[rs] + sign-extend(immediate)
      PC ← PC + 4

Datapath for R- and I-Type ALU Instructions
  [Figure: the ALU datapath extended for I-type instructions: a RegDest mux (controlled by isItype) selects instruction bits 20:16 instead of 15:11 as the write register; the 16-bit immediate (bits 15:0) is sign-extended to 32 bits; and an ALUSrc mux (controlled by isItype) feeds it to the ALU instead of Read data 2. RegWrite = 1.]

Single-Cycle Datapath for Data Movement Instructions

Load Instructions
• Assembly (e.g., load 4-byte word)
    LW rt_reg offset16 (base_reg)
• Machine encoding (I-type)
    opcode   base    rt      offset
    LW       base    rt      offset
    6-bit    5-bit   5-bit   16-bit
• Semantics
    if MEM[PC] == LW rt offset16 (base)
      EA = sign-extend(offset) + GPR[base]
      GPR[rt] ← MEM[ translate(EA) ]
      PC ← PC + 4

LW Datapath
  [Figure: the I-type datapath extended with data memory: the ALU (ALU operation = add) computes the address; MemRead = 1, MemWrite = 0; the data memory's Read data is written to register rt (RegDest = isItype, ALUSrc = isItype, RegWrite = 1).]

Store Instructions
• Assembly (e.g., store 4-byte word)
    SW rt_reg offset16 (base_reg)
• Machine encoding (I-type)
    opcode   base    rt      offset
    SW       base    rt      offset
    6-bit    5-bit   5-bit   16-bit
• Semantics
    if MEM[PC] == SW rt offset16 (base)
      EA = sign-extend(offset) + GPR[base]
      MEM[ translate(EA) ] ← GPR[rt]
      PC ← PC + 4

SW Datapath
  [Figure: the same datapath configured for stores: the ALU (ALU operation = add) adds the base register and the sign-extended offset to form the address; Read data 2 (GPR[rt]) drives the data memory's Write data input; MemWrite = 1, MemRead = 0, RegWrite = 0.]

Load-Store Datapath
  [Figure: a combined datapath for loads and stores: MemWrite = isStore, RegWrite = !isStore, MemRead = isLoad, RegDest = ALUSrc = isItype, ALU operation = add.]

Load-Store Datapath (cont'd)
  [Figure: the same datapath with a MemtoReg mux (controlled by isLoad) that selects between the ALU result and the data memory's Read data as the register write data.]

Single-Cycle Datapath for Control Flow Instructions

Unconditional Jump Instructions
• Assembly
    J immediate26
• Machine encoding (J-type)
    opcode   immediate
    J        immediate
    6-bit    26-bit
• Semantics
    if MEM[PC] == J immediate26
      target = { (PC + 4)[31:28], immediate26, 2'b00 }
      PC ← target

Unconditional Jump Datapath
  [Figure: for J, no state other than the PC is written (RegWrite = 0, MemWrite = 0, MemRead = 0); the ALU operation and ALUSrc are don't-cares (X).]
    if MEM[PC] == J immediate26
      PC = { (PC + 4)[31:28], immediate26, 2'b00 }

Unconditional Jump Datapath (cont'd)
  [Figure: the jump datapath adds a concat unit that forms the jump target and a PCSrc mux, controlled by isJ, that selects it as the next PC.]
    if MEM[PC] == J immediate26
      PC = { (PC + 4)[31:28], immediate26, 2'b00 }
• What about JR and JAL?

Conditional Branch Instructions
• Assembly (e.g., branch if equal)
    BEQ rs_reg rt_reg immediate16
• Machine encoding (I-type)
    opcode   rs      rt      immediate
    BEQ      rs      rt      immediate
    6-bit    5-bit   5-bit   16-bit
• Semantics (assuming no branch delay slot)
    if MEM[PC] == BEQ rs rt immediate16
      target = PC + 4 + sign-extend(immediate) × 4
      if GPR[rs] == GPR[rt] then PC ← target
      else PC ← PC + 4

Conditional Branch Datapath
  [Figure: PC + 4 (watch out: it comes from the instruction fetch datapath) is added to the sign-extended immediate shifted left by 2 to form the branch target; the ALU (ALU operation = sub) compares Read data 1 and Read data 2, and its Zero output (bcond) goes to the branch control logic, which drives PCSrc.]

Putting It All Together
  [Figure: the complete single-cycle MIPS datapath: PCSrc1 = Jump selects the jump address ({PC+4 [31–28], Instruction [25–0] shifted left 2}); PCSrc2 = Br Taken selects the branch target when Branch is asserted and bcond is true; the control unit decodes Instruction [31–26] into RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; and ALU control combines ALUOp with Instruction [5–0]. JAL, JR omitted.]
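The complete datapath can be summarized as one function that processes one instruction per call. The Python sketch below is a hedged, simplified model, not the textbook's design: opcode and funct values are the standard MIPS encodings; the control table follows the signal names in the figure but adds ADDI (covered earlier) and simplifies the 2-bit ALUOp to a string; instruction and data memory are Python dicts keyed by byte address (aligned words only); there is no branch delay slot; and the jump uses the upper four bits of PC + 4, as labeled in the full datapath. The encoders `enc_r`, `enc_i`, `enc_j` and `PROGRAM` are hypothetical helpers for the demo.

```python
MASK = 0xFFFFFFFF


def sext16(x: int) -> int:
    """Sign-extend the low 16 bits of x."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def to_signed(x: int) -> int:
    return x - (1 << 32) if x & 0x80000000 else x


# Main control: opcode -> (RegDst, ALUSrc, MemtoReg, RegWrite, MemRead,
#                          MemWrite, Branch, Jump, ALU op). Don't-cares set to 0.
CONTROL = {
    0x00: (1, 0, 0, 1, 0, 0, 0, 0, "funct"),  # R-type: ALU op comes from funct
    0x08: (0, 1, 0, 1, 0, 0, 0, 0, "add"),    # addi
    0x23: (0, 1, 1, 1, 1, 0, 0, 0, "add"),    # lw: address = base + offset
    0x2B: (0, 1, 0, 0, 0, 1, 0, 0, "add"),    # sw
    0x04: (0, 0, 0, 0, 0, 0, 1, 0, "sub"),    # beq: bcond = (rs - rt == 0)
    0x02: (0, 0, 0, 0, 0, 0, 0, 1, "add"),    # j: ALU result unused
}
FUNCT_OP = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or", 0x2A: "slt"}


def alu(op: str, a: int, b: int) -> int:
    if op == "add":
        return (a + b) & MASK
    if op == "sub":
        return (a - b) & MASK
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "slt":
        return int(to_signed(a) < to_signed(b))
    raise ValueError(op)


def step(pc, regs, imem, dmem):
    """Process one instruction (one 'clock cycle'); returns the next PC."""
    word = imem[pc]                                                  # IF
    op = word >> 26
    rs, rt, rd = (word >> 21) & 31, (word >> 16) & 31, (word >> 11) & 31
    (reg_dst, alu_src, mem_to_reg, reg_write,
     mem_read, mem_write, branch, jump, alu_op) = CONTROL[op]
    a, b = regs[rs], regs[rt]                                        # ID/RF
    if alu_op == "funct":                                            # ALU control
        alu_op = FUNCT_OP[word & 0x3F]
    result = alu(alu_op, a, sext16(word) & MASK if alu_src else b)   # EX/AG
    read_data = dmem.get(result, 0) if mem_read else 0               # MEM
    if mem_write:
        dmem[result] = b
    if reg_write:                                                    # WB
        dest = rd if reg_dst else rt
        if dest != 0:                                                # $zero stays 0
            regs[dest] = read_data if mem_to_reg else result
    pc4 = (pc + 4) & MASK
    br_target = (pc4 + (sext16(word) << 2)) & MASK
    j_target = (pc4 & 0xF0000000) | ((word & 0x3FFFFFF) << 2)
    nxt = br_target if (branch and result == 0) else pc4             # PCSrc2 = Br Taken
    return j_target if jump else nxt                                 # PCSrc1 = Jump


def run(imem, dmem, pc=0):
    regs = [0] * 32
    while pc in imem:  # stop once the PC leaves the loaded program
        pc = step(pc, regs, imem, dmem)
    return pc, regs


def enc_r(rs, rt, rd, funct): return (rs << 21) | (rt << 16) | (rd << 11) | funct
def enc_i(op, rs, rt, imm): return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)
def enc_j(addr): return (0x02 << 26) | ((addr >> 2) & 0x3FFFFFF)


PROGRAM = {
    0x00: enc_i(0x08, 0, 8, 5),     # addi $t0, $zero, 5
    0x04: enc_i(0x08, 0, 9, 5),     # addi $t1, $zero, 5
    0x08: enc_i(0x04, 8, 9, 1),     # beq  $t0, $t1, +1   (taken -> 0x10)
    0x0C: enc_i(0x08, 0, 10, 99),   # addi $t2, $zero, 99 (skipped)
    0x10: enc_i(0x2B, 0, 8, 8),     # sw   $t0, 8($zero)
    0x14: enc_i(0x23, 0, 11, 8),    # lw   $t3, 8($zero)
    0x18: enc_r(8, 11, 12, 0x20),   # add  $t4, $t0, $t3
    0x1C: enc_r(0, 12, 13, 0x2A),   # slt  $t5, $zero, $t4
    0x20: enc_j(0x28),              # j    0x28
    0x24: enc_i(0x08, 0, 10, 77),   # addi $t2, $zero, 77 (skipped)
    0x28: enc_r(12, 8, 14, 0x22),   # sub  $t6, $t4, $t0
}
```

Running `run(PROGRAM, {})` exercises every control path once: the taken `beq` and the `j` each skip an `addi`, `sw`/`lw` round-trip a value through data memory, and the loop stops at PC 0x2C. Each register is read before any write in `step`, which mirrors the single-cycle rule that state updates happen only at the end of the cycle.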
