MIPS Pipelining Lecture Notes PDF
These lecture notes cover MIPS pipelining, discussing microprogramming, microcode, and instruction processing. The notes are mainly based on a textbook and include examples of pipeline operation. They also include a laundry analogy to illustrate the concept.
The Power of Abstraction

- The concept of a control store of microinstructions gives the hardware designer a new abstraction: microprogramming
- The designer can translate any desired operation into a sequence of microinstructions
- All the designer needs to provide is
  - The sequence of microinstructions needed to implement the desired operation
  - The ability for the control logic to correctly sequence through the microinstructions
  - Any additional datapath elements and control signals needed (none, if the operation can be "translated" into existing control signals)

Advantages of Microprogrammed Control

- Allows a very simple design to do powerful computation by controlling the datapath (using a sequencer)
  - A high-level ISA is translated into microcode (a sequence of u-instructions)
  - Microcode (u-code) enables a minimal datapath to emulate an ISA
  - Microinstructions can be thought of as a user-invisible ISA (u-ISA)
- Enables easy extensibility of the ISA
  - Can support a new instruction by changing the microcode
  - Can support complex instructions as a sequence of simple microinstructions
- Enables updates of machine behavior
  - A buggy implementation of an instruction can be fixed by changing the microcode in the field
  - See next slide

Update of Machine Behavior

- The ability to update/patch microcode in the field (after a processor is shipped) enables
  - the ability to add new instructions without changing the processor!
- Examples
  - IBM 370 Model 145: microcode stored in main memory, can be updated after a reboot
  - IBM System z: similar to the 370/145
  - B1700: microcode can be updated while the processor is running
  - A user-microprogrammable machine!
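The control-store idea can be sketched in a few lines. This is a toy illustration, not any real u-ISA: the opcode names, micro-operation names, and the `execute` helper below are all hypothetical. The point is only that an ISA-level instruction becomes a table entry, and patching machine behavior means rewriting that entry.

```python
# Hypothetical sketch of a microprogrammed control unit: the control store
# maps each ISA-level opcode to its sequence of micro-operations, and a
# sequencer steps through them to drive the datapath.

# Control store: opcode -> microprogram (list of micro-operations).
CONTROL_STORE = {
    "ADD": ["FETCH", "DECODE", "ALU_ADD", "WRITEBACK"],
    "LW":  ["FETCH", "DECODE", "COMPUTE_ADDR", "MEM_READ", "WRITEBACK"],
}

def execute(opcode):
    """Sequence through the microinstructions for one ISA instruction."""
    trace = []
    for uop in CONTROL_STORE[opcode]:
        trace.append(uop)  # in real hardware: assert this uop's control signals
    return trace

# "Updating machine behavior" = rewriting a control-store entry; a new
# instruction is just a new entry built from existing micro-operations.
CONTROL_STORE["MUL"] = ["FETCH", "DECODE", "ALU_MUL", "WRITEBACK"]

print(execute("LW"))   # ['FETCH', 'DECODE', 'COMPUTE_ADDR', 'MEM_READ', 'WRITEBACK']
print(execute("MUL"))  # ['FETCH', 'DECODE', 'ALU_MUL', 'WRITEBACK']
```

Note how no datapath change is needed as long as the new instruction can be expressed with existing control signals, which is exactly the extensibility argument above.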
(Of course, user microprogramming is available only to the supervisor.)

Update of Machine Behavior (cont'd)

- Example on modern CPUs

Update of Machine Behavior (cont'd)

- Example in terms of system security

New Attack Surfaces on ISA/uArch/uCode

- Breaking the x86 Instruction Set, Black Hat 2017
- Reverse-Engineering x86 Processor Microcode, Philipp Koppe et al., USENIX Security 2017
- An Exploratory Analysis of Microcode as a Building Block for System Defenses, Benjamin Kollenda et al., CCS 2018
- CHEx86: Context-Sensitive Enforcement of Memory Safety via Microcode-Enabled Capabilities, Rasool Sharifi et al., ISCA 2020
- On the Design and Misuse of Microcoded (Embedded) Processors - A Cautionary Note, Nils Albartus et al., USENIX Security 2021
- ...

Pipelining

471029: Introduction to Computer Architecture, 18th Lecture
Disclaimer: Slides are mainly based on the COD 5th edition textbook and also developed in part by Prof. Dohyung Kim @ KNU and the computer architecture courses @ KAIST and SKKU

Can We Do Better?

- We have discussed the pros and cons of single-cycle vs. multi-cycle designs (and of microprogrammed vs. hardwired control)
- What limitations do you see with the multi-cycle design?
- Limited concurrency
  - Some hardware resources are idle during different phases of the instruction processing cycle
  - E.g., the "fetch" logic is idle when an instruction is being "decoded" or "executed"
  - E.g., most of the datapath is idle when a memory access is happening

[Figure: datapath diagram (from textbook section C.4, "The Control Structure") showing the PC, register file, ALU, shifter, and control logic on a shared bus; only a small part of it is active in any one phase]

Can We Use the Idle Hardware to Improve Concurrency?
- Goal: more concurrency → higher instruction throughput (i.e., more "work" completed in one cycle)
- Idea: while an instruction is using some resources in its processing phase, process other instructions on the idle resources it does not need
  - E.g., when an instruction is being decoded, fetch the next instruction
  - E.g., when an instruction is being executed, decode another instruction
  - E.g., when an instruction is accessing data memory (ld/st), execute the next instruction
  - E.g., when an instruction is writing its result into the register file, access data memory for the next instruction

Pipelining: Basic Idea

- More systematically:
  - Pipeline the execution of multiple instructions
  - Analogy: "assembly line processing" of instructions
- Idea:
  - Divide the instruction processing cycle into distinct "stages" of processing
  - Ensure there are enough hardware resources to process one instruction in each stage, of course
  - Process a different instruction in each stage
  - Instructions consecutive in program order are processed in consecutive stages
- Benefit: increases instruction processing throughput (1/CPI)
- Downside: start thinking about this...

Example: Execution of Four Independent ADDs

- Multi-cycle: 4 cycles per instruction

      F D E W
              F D E W
                      F D E W
                              F D E W
      ------------------------------> Time

- Pipelined: 4 cycles per 4 instructions (steady state)

      F D E W
        F D E W
          F D E W
            F D E W
      ------------> Time

- Is life always this beautiful?

The Laundry Analogy

[Figure: timeline from 6 PM to 2 AM; loads A-D are processed one after another, each going through wash, dry, fold, put away]

- "Place one dirty load of clothes in the washer"
- "When the washer is finished, place the wet load in the dryer"
- "When the dryer is finished, take out the dry load and fold"
- "When folding is finished, ask your roommate (??) to put the clothes away"
- Steps to do a load are sequentially dependent
- No dependence between different loads
- Different steps do not share resources

Pipelining Multiple Loads of Laundry

[Figure: same timeline with loads A-D overlapped: load B starts washing while load A dries, and so on]

- 4 loads of laundry in parallel
- No additional resources
- Throughput increased by 4 (after steady state)
- Latency per load is the same

Pipelining Multiple Loads of Laundry: In Practice

[Figure: overlapped timeline in which one step takes longer than the others, so loads back up behind it]

- The slowest step decides throughput

Pipelining Multiple Loads of Laundry: In Practice

[Figure: same scenario with a second dryer added, removing the bottleneck]

- Throughput restored (2 loads per hour) using 2 dryers

An Ideal Pipeline

- Goal: increase throughput with little increase in cost (hardware cost, in the case of instruction processing)
- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through exactly the same steps)
- Repetition of independent operations
  - No dependencies between repeated operations
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
- Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing "cycle"?
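The numbers claimed above (4 cycles per 4 instructions in steady state, and "the slowest step decides throughput") can be checked with a short sketch. The model is idealized: one cycle per stage, no stalls, and the stage latencies in the last example are made-up laundry-style numbers.

```python
def multicycle_cycles(n, k):
    """n instructions, k cycles each, executed strictly one at a time."""
    return n * k

def pipelined_cycles(n, k):
    """k cycles to fill the k-stage pipeline, then one completion per cycle."""
    return k + (n - 1)

# Four independent ADDs on a 4-stage (F, D, E, W) pipeline:
print(multicycle_cycles(4, 4))  # 16 cycles
print(pipelined_cycles(4, 4))   # 7 cycles

# With non-uniform stages, the clock must fit the slowest stage:
def pipelined_time(n, stage_latencies):
    """Total time when the cycle time equals the slowest stage's latency."""
    cycle = max(stage_latencies)  # the slowest step decides throughput
    return (len(stage_latencies) + n - 1) * cycle

# Hypothetical laundry-style latencies (minutes): wash 30, dry 60, fold 30, put away 30.
print(pipelined_time(4, [30, 60, 30, 30]))  # 420 minutes: every load waits on the dryer
```

For large n, pipelined throughput approaches one instruction per cycle, i.e., a speedup approaching k over the multi-cycle machine, which is the "after steady state" claim on the laundry slide.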
Ideal Pipelining

- The whole datapath as one combinational block (F,D,E,M,W) with delay T ps: BW ≈ 1/T
- Split into two stages of T/2 ps each, (F,D,E) and (M,W): BW ≈ 2/T
- Split into three stages of T/3 ps each, (F,D), (E,M), and (M,W): BW ≈ 3/T

More Realistic Pipeline: Cost

- Nonpipelined version with combinational cost G:
  Cost = G + L, where L = latch cost
- k-stage pipelined version (logic split into k blocks of ~G/k each, plus a latch per stage):
  Cost_k-stage = G + L×k
- Latches increase hardware cost
- Pipeline depths in real designs: Intel Penryn (12-14), Nehalem (20-24), Kaby Lake (14); ARM Cortex-R5 (8), -R7 (11), -A8 (13), -A15 (15-24)
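The bandwidth and cost formulas above can be combined into a small trade-off sketch. The gate counts and delay below (G, L, T) are hypothetical numbers chosen only to show the trend: ideal bandwidth grows as k/T, while cost grows as G + L×k.

```python
def pipeline_cost(G, L, k):
    """Cost of a k-stage pipeline: combinational logic G plus k latches of cost L."""
    return G + L * k

def ideal_bandwidth(T, k):
    """Ideal throughput: splitting a T-ps critical path into k uniform stages gives k/T."""
    return k / T

# Hypothetical design point: 1200-ps logic path, 10,000-gate logic, 300-gate latch.
G, L, T = 10000, 300, 1200.0
for k in (1, 2, 3, 6):
    print(f"k={k}: cost={pipeline_cost(G, L, k)} gates, "
          f"BW={ideal_bandwidth(T, k):.5f} results/ps")
# Bandwidth scales linearly with k, but so does the latch overhead in cost --
# one reason real pipeline depths (Penryn 12-14, Nehalem 20-24, ...) are bounded.
```

Note that k = 1 recovers the nonpipelined formula Cost = G + L, consistent with the slide.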