MIPS Microarchitecture Lecture Notes PDF
KNU
Dohyung Kim
Summary
These lecture notes provide an introduction to MIPS microarchitecture, covering performance factors and implementation details for MIPS instructions. The document discusses the design of datapaths, control units, and different instruction types, like Arithmetic/Logic (R-Type, I-Type) and control flow (jumps, branches). It includes diagrams and illustrative examples.
Full Transcript
MIPS Microarchitecture
471029: Introduction to Computer Architecture, 14th Lecture
Disclaimer: Slides are mainly based on the COD 5th edition textbook and were also developed in part by Profs. Dohyung Kim @ KNU and the computer architecture courses @ KAIST and SKKU

Questions
  Let’s assume we have an ISA
  How do we implement it?
    i.e., how do we design a system that obeys the hardware/software interface?

Introduction
  Performance factors
    Instruction count: determined by the ISA and the compiler
    CPI (cycles per instruction) and cycle time: determined by the CPU hardware
  We will study two MIPS implementations
    A simplified version (a.k.a. single-cycle uArch)
    A more realistic pipelined version
  We will illustrate the key principles used in creating a datapath and designing the control with a simple subset of the core MIPS instruction set
    Memory reference: lw, sw
    Arithmetic/logical: add, sub, and, or, slt
    Control transfer: beq, j

How Does a Machine Process Instructions?
  What does processing an instruction mean?
  Remember the von Neumann model
    AS = architectural (programmer-visible) state before an instruction is processed
    Process instruction
    AS’ = architectural (programmer-visible) state after an instruction is processed
  Processing an instruction: transforming AS to AS’ according to the ISA specification of the instruction
    You can see the ISA specification of MIPS on the green sheet

The “Process Instruction” Step
  The ISA specifies abstractly what AS’ should be, given an instruction and AS
    It defines an abstract finite state machine where
      State = programmer-visible state
      Next-state logic = instruction execution specification
    From the ISA point of view, there are no “intermediate states” between AS and AS’ during instruction execution
      One state transition per instruction
  The microarchitecture implements how AS is transformed to AS’
    There are many choices in implementation
    We can have programmer-invisible state to optimize the speed of instruction execution: multiple state transitions per instruction
      Choice 1: AS → AS’
        (transform AS to AS’ in a single clock cycle)
      Choice 2: AS → AS + MS1 → AS + MS2 → AS + MS3 → AS’
        (take multiple clock cycles to transform AS to AS’)

A Very Basic Instruction Processing Engine
  Each instruction takes a single clock cycle to execute
  Only combinational logic is used to implement instruction execution
    No intermediate, programmer-invisible state updates
  AS = architectural (programmer-visible) state at the beginning of a clock cycle
  Process instruction in one clock cycle
  AS’ = architectural (programmer-visible) state at the end of a clock cycle

A Very Basic Instruction Processing Engine (cont’d)
  Single-cycle machine
  [Figure: sequential logic (state) supplies AS to the combinational logic, which produces AS’ for the state.]
  What is the clock cycle time determined by?
  What is the critical path of the combinational logic determined by?

Remember: Programmer Visible (Architectural) State
  Memory (M ... M[N-1]): array of storage locations indexed by an address
  Registers
    Given special names in the ISA (as opposed to addresses)
    General vs. special purpose
  Program Counter: memory address of the current instruction
  Instructions (and programs) specify how to transform the values of programmer-visible state

Single-cycle vs. Multi-cycle Machines
  Single-cycle machines
    Each instruction takes a single clock cycle
    All state updates are made at the end of an instruction’s execution
    Big disadvantage: the slowest instruction determines the cycle time → long clock cycle time
  Multi-cycle machines
    Instruction processing is broken into multiple cycles/stages
    State updates can be made during an instruction’s execution
    But architectural state updates are made only at the end of an instruction’s execution
    Advantage over single-cycle: the slowest “stage” determines the cycle time
  Both single-cycle and multi-cycle machines literally follow the von Neumann model at the microarchitectural level

Instruction Processing “Cycle”
  Instructions are processed under the direction of a “control unit”, step by step
  Instruction cycle: sequence of steps to process an instruction
  Fundamentally, there are six phases
    Fetch
    Decode
    Evaluate Address
    Fetch Operands
    Execute
    Store Result
  Not all instructions require all six phases

Instruction Processing “Cycle” vs. Machine Clock Cycle
  Single-cycle machine: all six phases of the instruction processing cycle take a single machine clock cycle to complete
  Multi-cycle machine: all six phases of the instruction processing cycle can take multiple machine clock cycles to complete
    In fact, each phase can take multiple clock cycles to complete

Instruction Processing Viewed Another Way
  Instructions transform Data (AS) to Data’ (AS’)
  This transformation is done by functional units
    Units that “operate” on data
  These units need to be told what to do to the data
  An instruction processing engine consists of two components
    Datapath: consists of hardware elements that deal with and transform data signals
      Functional units that operate on data
      Hardware structures (e.g., wires and muxes) that enable the flow of data into the functional units and registers
      Storage units that store data (e.g., registers)
    Control logic: consists of hardware elements that determine control signals, i.e., signals that specify what the datapath elements should do to the data

Single-cycle vs. Multi-cycle: Control & Data
  Single-cycle machine
    Control signals are generated in the same clock cycle as the one during which data signals are operated on
    Everything related to an instruction happens in one clock cycle (serialized processing)
  Multi-cycle machine
    Control signals needed in the next cycle can be generated in the current cycle
    Latency of control processing can be overlapped with latency of datapath operation (more parallelism)

Many Ways of Datapath and Control Design
  There are many ways of designing the datapath and control logic
  Single-cycle, multi-cycle, pipelined datapath and control
  Single-bus vs. multi-bus datapaths
  Hardwired/combinational vs. microcoded/microprogrammed control
    Control signals generated by combinational logic versus control signals stored in a memory structure
  Control signals and structure depend on the datapath design

Flash-Forward: Performance Analysis
  Execution time of an instruction
    {CPI} x {clock cycle time}
  Execution time of a program
    Sum over all instructions [ {CPI} x {clock cycle time} ]
    = {# of instructions} x {Average CPI} x {clock cycle time}
  Single-cycle microarchitecture performance
    CPI = 1
    Clock cycle time = long
  Multi-cycle microarchitecture performance
    CPI = different for each instruction
      Average CPI → hopefully small
    Clock cycle time = short
    Now we have two degrees of freedom to optimize independently

A Single-Cycle Microarchitecture
  Let’s take a closer look at a single-cycle microarchitecture
    It is easier to understand how we design the datapath and the control in a single-cycle microarchitecture
  Remember a single-cycle machine
  [Figure: sequential logic (state) supplies AS to the combinational logic, which produces AS’ for the state.]

Let’s Start with the State Elements
  Data and control inputs
  [Figure: the state elements. PC; register file with 5-bit Read register 1, Read register 2 and Write register inputs, a Write data input, Read data 1 and Read data 2 outputs, and the RegWrite control; instruction memory with an Instruction address input and an Instruction output; data memory with Address and Write data inputs, a Read data output, and the MemWrite and MemRead controls.]

For Now, We Will Assume
  “Magic” memory and register file
  Combinational read
    The output of the read data port is a combinational function of the register file contents and the corresponding read select port
  Synchronous write
    The selected register is updated on the positive-edge clock transition when write enable is asserted
      Cannot affect read output in between clock edges
  Single-cycle, synchronous memory
    Contrast this with memory that tells when the data is ready
      i.e., Ready bit: indicating the read or write is done

Instruction Processing
  5 generic steps (based on our textbook)
    Instruction fetch (IF)
    Instruction decode and register operand fetch (ID/RF)
    Execute/Evaluate memory address (EX/AG)
    Memory operand fetch (MEM)
    Store/writeback result (WB)
  [Figure: the five steps marked on the datapath: PC and instruction memory (IF), registers (ID/RF), ALU (EX/AG), data memory (MEM), and the data written back to the registers (WB).]

Instruction Processing (cont’d)
  Fetch instruction: PC → instruction memory
  Decode instruction, read registers: register numbers → register file
  Depending on instruction class
    Use ALU to calculate
      Arithmetic result
      Memory address for load/store
      Branch target address
    Access data memory for load/store
    PC ← target address or PC + 4

Aside: A Big Picture
  [Figure: a.out (binary 1100101000111110000001), the loader, code and data, and the MIPS processor fetching from the code.]

What Is To Come: The Full MIPS Datapath
  [Figure: the full single-cycle MIPS datapath with its control unit. Control signals: RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite. PCSrc1 = Jump, PCSrc2 = Br Taken (bcond). JAL, JR, JALR omitted.]

Logic Design Basics
  Information encoded in binary
    Low voltage = 0, high voltage = 1
    One wire per bit
    Multi-bit data encoded on multi-wire buses
  Datapath elements
    Combinational elements
    State (sequential) elements

Combinational Elements
  Elements that operate on data values
  Output is a function of input (that is, same inputs → same output)
  E.g., AND gate or ALU
  [Figure: the full MIPS datapath diagram.]

State (sequential) Elements
  Elements that contain state
  E.g., register or memory
  In general, the data value and the clock are inputs
    The clock determines when the data is written
  The output is the data value that was written in an earlier clock cycle, and it depends on both the inputs and the internal state
  [Figure: the full MIPS datapath diagram.]

Clocking Methodology
  Edge-triggered clocking
    Sequential elements are written only on a clock edge
    Allows us to 1) read the contents of a register; 2) send the value through some combinational logic; 3) write that register in the same clock cycle

Fetching Instructions
  From now on, we will build a MIPS datapath incrementally
  Fetch instruction

Fetching Instructions (cont’d)
  Fetching instruction and incrementing PC
    PC is a 32-bit register
    Increment by 4 for next instruction

Single-Cycle Datapath for Arithmetic and Logical Instructions

R-Type ALU Instructions
  Assembly (e.g., register-register signed addition)
    ADD rd_reg rs_reg rt_reg
  Machine encoding (R-type)
    0     | rs    | rt    | rd    | 0     | ADD
    6-bit | 5-bit | 5-bit | 5-bit | 5-bit | 6-bit
  Semantics
    if MEM[PC] == ADD rd rs rt
      GPR[rd] ← GPR[rs] + GPR[rt]
      PC ← PC + 4

ALU Datapath
  [Figure: ALU datapath for R-type instructions. Instruction bits 25:21 and 20:16 select Read register 1 and Read register 2, bits 15:11 select the Write register, the ALU result goes to Write data, RegWrite = 1, the ALU operation input is 3 bits wide, and an adder computes PC + 4. Stages IF, ID, EX, MEM, WB are marked.]
  if MEM[PC] == ADD rd rs rt
    GPR[rd] ← GPR[rs] + GPR[rt]
    PC ← PC + 4
  (combinational state update logic)
  **Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

I-Type ALU Instructions
  Assembly (e.g., register-immediate signed addition)
    ADDI rt_reg rs_reg immediate_16
  Machine encoding (I-type)
    ADDI  | rs    | rt    | immediate
    6-bit | 5-bit | 5-bit | 16-bit
  Semantics
    if MEM[PC] == ADDI rt rs immediate
      GPR[rt] ← GPR[rs] + sign-extend(immediate)
      PC ← PC + 4

Datapath for R and I-Type ALU Insts.
  [Figure: the ALU datapath extended for I-type instructions. The Write register is chosen between bits 20:16 and 15:11 by RegDest = isItype; bits 15:0 are sign-extended from 16 to 32 bits; the second ALU operand is chosen by ALUSrc = isItype; RegWrite = 1. Stages IF, ID, EX, MEM, WB are marked.]
  if MEM[PC] == ADDI rt rs immediate
    GPR[rt] ← GPR[rs] + sign-extend(immediate)
    PC ← PC + 4
  (combinational state update logic)

Single-Cycle Datapath for Data Movement Instructions

Load Instructions
  Assembly (e.g., load 4-byte word)
    LW rt_reg offset_16 (base_reg)
  Machine encoding (I-type)
    LW    | base  | rt    | offset
    6-bit | 5-bit | 5-bit | 16-bit
  Semantics
    if MEM[PC] == LW rt offset16 (base)
      EA = sign-extend(offset) + GPR[base]
      GPR[rt] ← MEM[ translate(EA) ]
      PC ← PC + 4

LW Datapath
  [Figure: LW datapath. ALU operation = add; MemWrite = 0; MemRead = 1; RegWrite = 1; RegDest = isItype; ALUSrc = isItype. Stages IF, ID, EX, MEM, WB are marked.]
  if MEM[PC] == LW rt offset16 (base)
    EA = sign-extend(offset) + GPR[base]
    GPR[rt] ← MEM[ translate(EA) ]
    PC ← PC + 4
  (combinational state update logic)

Store Instructions
  Assembly (e.g., store 4-byte word)
    SW rt_reg offset_16 (base_reg)
  Machine encoding (I-type)
    SW    | base  | rt    | offset
    6-bit | 5-bit | 5-bit | 16-bit
  Semantics
    if MEM[PC] == SW rt offset16 (base)
      EA = sign-extend(offset) + GPR[base]
      MEM[ translate(EA) ] ← GPR[rt]
      PC ← PC + 4

SW Datapath
  [Figure: SW datapath. ALU operation = add; MemWrite = 1.]
  (SW datapath control values, continued: RegWrite = 0; MemRead = 0; RegDest and ALUSrc are labeled isItype. Stages IF, ID, EX, MEM, WB are marked.)
  if MEM[PC] == SW rt offset16 (base)
    EA = sign-extend(offset) + GPR[base]
    MEM[ translate(EA) ] ← GPR[rt]
    PC ← PC + 4
  (combinational state update logic)

Load-Store Datapath
  [Figure: combined load/store datapath. ALU operation = add; MemWrite = isStore; MemRead = isLoad; RegWrite = !isStore; RegDest = isItype; ALUSrc = isItype.]

Load-Store Datapath (cont’d)
  [Figure: the same datapath with the MemtoReg control, driven by isLoad, added.]

Single-Cycle Datapath for Control Flow Instructions

Unconditional Jump Instructions
  Assembly
    J immediate_26
  Machine encoding (J-type)
    J     | immediate
    6-bit | 26-bit
  Semantics
    if MEM[PC] == J immediate26
      target = { PC[31:28], immediate26, 2’b00 }
      PC ← target

Unconditional Jump Datapath
  [Figure: the datapath during a jump. ALU operation = X (don’t care); ALUSrc = X; MemWrite = 0; MemRead = 0; RegWrite = 0.]
  if MEM[PC] == J immediate26
    PC = { PC[31:28], immediate26, 2’b00 }

Unconditional Jump Datapath (cont’d)
  [Figure: the jump datapath with a PCSrc selection, driven by isJ, in front of the PC.]
  (A concat block forms the jump target.)
  if MEM[PC] == J immediate26
    PC = { PC[31:28], immediate26, 2’b00 }

Unconditional Jump Datapath (cont’d)
  [Figure: the jump datapath with PCSrc = isJ and the concat block, with one part marked “?”.]
  if MEM[PC] == J immediate26
    PC = { PC[31:28], immediate26, 2’b00 }
  What about JR, JAL?

Conditional Branch Instructions
  Assembly (e.g., branch if equal)
    BEQ rs_reg rt_reg immediate_16
  Machine encoding (I-type)
    BEQ   | rs    | rt    | immediate
    6-bit | 5-bit | 5-bit | 16-bit
  Semantics (assuming no branch delay slot)
    if MEM[PC] == BEQ rs rt immediate16
      target = PC + 4 + sign-extend(immediate) x 4
      if GPR[rs] == GPR[rt] then PC ← target
      else PC ← PC + 4

Conditional Branch Datapath
  Watch out
  [Figure: branch datapath. PC + 4 from the instruction datapath is added to the sign-extended immediate shifted left by 2 to form the branch target; the ALU performs sub, and its Zero output (bcond) goes to the branch control logic; PCSrc selects the next PC; RegWrite = 0.]

Putting It All Together
  [Figure: the complete single-cycle MIPS datapath with its control unit. Control signals: RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite. PCSrc1 = Jump, PCSrc2 = Br Taken (bcond). JAL, JR omitted.]
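
The single-cycle view of instruction processing (one AS → AS’ transition per instruction) can be sketched in software. The following is a minimal Python sketch, not the slides' datapath: the names `State`, `step`, `r_type`, `i_type` and `j_type` are hypothetical helpers introduced for illustration. It assumes one memory dictionary shared by instructions and data, treats translate(EA) as the identity, and wraps arithmetic modulo 2^32 instead of modelling overflow exceptions. Each call to `step` performs fetch, decode, execute, memory access, and writeback for one instruction of the lecture's subset (add, sub, and, or, slt, addi, lw, sw, beq, j), following the semantics listed on the slides.

```python
MASK = 0xFFFFFFFF  # keep every value within 32 bits

def sign_extend16(imm):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return imm - 0x10000 if imm & 0x8000 else imm

def to_signed32(x):
    """Interpret a 32-bit value as signed (used by slt)."""
    return x - (1 << 32) if x & 0x80000000 else x

class State:
    """Architectural state (AS): PC, 32 registers, and memory (assumed: a dict of words)."""
    def __init__(self, program):
        self.pc = 0
        self.gpr = [0] * 32
        self.mem = {4 * i: word for i, word in enumerate(program)}

def step(s):
    """Transform AS into AS' for exactly one instruction."""
    inst = s.mem[s.pc]                                   # IF: fetch MEM[PC]
    op, funct = inst >> 26, inst & 0x3F                  # ID: decode the fields
    rs, rt, rd = (inst >> 21) & 31, (inst >> 16) & 31, (inst >> 11) & 31
    imm = sign_extend16(inst & 0xFFFF)
    a, b = s.gpr[rs], s.gpr[rt]                          # register operand fetch
    next_pc = (s.pc + 4) & MASK
    dest, result = None, 0
    if op == 0x00:                                       # R-type: funct picks the ALU operation
        alu = {0x20: a + b, 0x22: a - b, 0x24: a & b, 0x25: a | b,
               0x2A: int(to_signed32(a) < to_signed32(b))}
        dest, result = rd, alu[funct]
    elif op == 0x08:                                     # addi
        dest, result = rt, a + imm
    elif op == 0x23:                                     # lw: EA = sign-extend(offset) + GPR[base]
        dest, result = rt, s.mem.get((a + imm) & MASK, 0)
    elif op == 0x2B:                                     # sw
        s.mem[(a + imm) & MASK] = b
    elif op == 0x04:                                     # beq: target = PC + 4 + imm x 4
        if a == b:
            next_pc = (s.pc + 4 + (imm << 2)) & MASK
    elif op == 0x02:                                     # j: { PC[31:28], immediate26, 2'b00 }
        next_pc = (s.pc & 0xF0000000) | ((inst & 0x3FFFFFF) << 2)
    else:
        raise ValueError(f"unsupported opcode {op:#x}")
    if dest:                                             # WB: register 0 is never written
        s.gpr[dest] = result & MASK
    s.pc = next_pc

# Hypothetical encoders, used only to build the demo program below.
def r_type(rs, rt, rd, funct):
    return (rs << 21) | (rt << 16) | (rd << 11) | funct

def i_type(op, rs, rt, imm):
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def j_type(imm26):
    return (0x02 << 26) | imm26

program = [
    i_type(0x08, 0, 8, 5),    # 0:  addi $8, $0, 5      counter = 5
    i_type(0x08, 0, 9, 0),    # 4:  addi $9, $0, 0      sum = 0
    i_type(0x04, 8, 0, 3),    # 8:  beq  $8, $0, +3     exit the loop when counter == 0
    r_type(9, 8, 9, 0x20),    # 12: add  $9, $9, $8     sum += counter
    i_type(0x08, 8, 8, -1),   # 16: addi $8, $8, -1     counter -= 1
    j_type(2),                # 20: j    8              back to the beq
    i_type(0x2B, 0, 9, 64),   # 24: sw   $9, 64($0)     store the sum
]

s = State(program)
while s.pc != 28:             # address 28 is just past the last instruction
    step(s)
print(s.mem[64])              # 15, the sum 5 + 4 + 3 + 2 + 1
```

Because `step` has no state besides `State`, it mirrors the picture of combinational logic that computes AS’ from AS; a multi-cycle model would instead split `step` into several calls that carry programmer-invisible intermediate state between them.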