Summary

These lecture notes cover microprogramming and MIPS pipelining: microcoded control, field updates of microcode, and pipelined instruction processing. The notes are mainly based on the COD (5th edition) textbook and include examples of pipeline operation, along with a laundry analogy to illustrate the concept.

Full Transcript


The Power of Abstraction

- The concept of a control store of microinstructions provides the hardware designer with a new abstraction: microprogramming
- The designer can translate any desired operation into a sequence of microinstructions
- All the designer needs to provide is
  - The sequence of microinstructions needed to implement the desired operation
  - The ability for the control logic to correctly sequence through the microinstructions
  - Any additional datapath elements and control signals needed (none if the operation can be "translated" into existing control signals)

Advantages of Microprogrammed Control

- Allows a very simple design to do powerful computation by controlling the datapath (using a sequencer)
  - High-level ISA translated into microcode (a sequence of u-instructions)
  - Microcode (u-code) enables a minimal datapath to emulate an ISA
  - Microinstructions can be thought of as a user-invisible ISA (u-ISA)
- Enables easy extensibility of the ISA
  - Can support a new instruction by changing the microcode
  - Can support complex instructions as a sequence of simple microinstructions
- Enables update of machine behavior
  - A buggy implementation of an instruction can be fixed by changing the microcode in the field
  - See next slide

Update of Machine Behavior

- The ability to update/patch microcode in the field (after a processor is shipped) enables
  - the ability to add new instructions without changing the processor!
- Examples
  - IBM 370 Model 145: microcode stored in main memory, can be updated after a reboot
  - IBM System z: similar to the 370 Model 145
  - B1700: microcode can be updated while the processor is running
  - A user-microprogrammable machine!
(of course, only to the supervisor)

Update of Machine Behavior (cont'd)

- Example on modern CPUs

Update of Machine Behavior (cont'd)

- Example in terms of system security

New Attack Surfaces on ISA/uArch/uCode

- Breaking the x86 Instruction Set, Black Hat 2017
- Reverse-Engineering x86 Processor Microcode, Philipp Koppe et al., USENIX Security 2017
- An Exploratory Analysis of Microcode as a Building Block for System Defenses, Benjamin Kollenda et al., CCS 2018
- CHEx86: Context-Sensitive Enforcement of Memory Safety via Microcode-Enabled Capabilities, Rasool Sharifi et al., ISCA 2020
- On the Design and Misuse of Microcoded (Embedded) Processors — A Cautionary Note, Nils Albartus et al., USENIX Security 2021
- ...

Pipelining

471029: Introduction to Computer Architecture, 18th Lecture

Disclaimer: Slides are mainly based on the COD 5th edition textbook and also developed in part by Profs. Dohyung Kim @ KNU and computer architecture courses @ KAIST and SKKU

Can We Do Better?

- We have discussed the pros and cons of single-cycle vs. multi-cycle designs (and microprogrammed vs. hardwired control)
- What limitations do you see with the multi-cycle design?
- Limited concurrency
  - Some hardware resources are idle during different phases of the instruction processing cycle
  - E.g., the "Fetch" logic is idle while an instruction is being "decoded" or "executed"
  - E.g., most of the datapath is idle while a memory access is happening

[Figure: LC-3b datapath diagram from the textbook appendix section "C.4 The Control Structure", showing the PC, register file, ALU, control store, and associated control signals]

Can We Use the Idle Hardware to Improve Concurrency?
- Goal: More concurrency → higher instruction throughput (i.e., more "work" completed in one cycle)
- Idea: When an instruction is using some resources in its processing phase, process other instructions on idle resources not needed by that instruction
  - E.g., when an instruction is being decoded, fetch the next instruction
  - E.g., when an instruction is being executed, decode another instruction
  - E.g., when an instruction is accessing data memory (ld/st), execute the next instruction
  - E.g., when an instruction is writing its result into the register file, access data memory for the next instruction

Pipelining: Basic Idea

- More systematically:
  - Pipeline the execution of multiple instructions
  - Analogy: "assembly line" processing of instructions
- Idea:
  - Divide the instruction processing cycle into distinct "stages" of processing
  - Ensure there are enough hardware resources to process one instruction in each stage, of course
  - Process a different instruction in each stage
  - Instructions consecutive in program order are processed in consecutive stages
- Benefit: Increases instruction processing throughput (1/CPI)
- Downside: Start thinking about this...

Example: Execution of Four Independent ADDs

- Multi-cycle: 4 cycles per instruction (each instruction runs F, D, E, W to completion before the next begins)
- Pipelined: 4 cycles per 4 instructions (steady state); each instruction starts one stage behind the previous one:

  F D E W
    F D E W
      F D E W
        F D E W

- Is life always this beautiful?

The Laundry Analogy

- "Place one dirty load of clothes in the washer"
- "When the washer is finished, place the wet load in the dryer"
- "When the dryer is finished, take out the dry load and fold"
- "When folding is finished, ask your roommate (??) to put the clothes away"
- Steps to do a load are sequentially dependent
- No dependence between different loads
- Different steps do not share resources

Pipelining Multiple Loads of Laundry

[Figure: loads A-D overlapped on a 6 PM-2 AM timeline]

- 4 loads of laundry in parallel
- No additional resources
- Throughput increased by 4 (after steady state)
- Latency per load is the same

Pipelining Multiple Loads of Laundry: In Practice

[Figure: same timeline, but the dryer step takes longer than the others]

- The slowest step decides throughput

Pipelining Multiple Loads of Laundry: In Practice (cont'd)

[Figure: same timeline with a second dryer added, steps fully overlapped again]

- Throughput restored (2 loads per hour) using 2 dryers

An Ideal Pipeline

- Goal: Increase throughput with little increase in cost (hardware cost, in the case of instruction processing)
- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through exactly the same steps)
- Repetition of independent operations
  - No dependencies between repeated operations
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
- Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing "cycle"?
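The cycle counts behind the four-independent-ADDs example, and the laundry lesson that the slowest step decides throughput, can be checked with a small script. This is a minimal sketch, not from the lecture; the function names and the concrete stage latencies are illustrative.

```python
# Sketch: multi-cycle vs. ideal 4-stage pipeline (F, D, E, W) cycle counts.
# Assumes no dependences and no stalls, as in the lecture's example.

def multicycle_time(n_instructions, n_stages=4):
    """Multi-cycle design: each instruction finishes all stages
    before the next one starts."""
    return n_instructions * n_stages

def pipelined_time(n_instructions, n_stages=4):
    """Ideal pipeline: (n_stages - 1) fill cycles, then one
    instruction completes every cycle in steady state."""
    return (n_stages - 1) + n_instructions

def pipelined_wall_time(n_instructions, stage_latencies):
    """With unequal stages, the clock period is set by the slowest
    stage -- the laundry lesson that the slowest step decides throughput."""
    period = max(stage_latencies)
    return ((len(stage_latencies) - 1) + n_instructions) * period

if __name__ == "__main__":
    for n in (4, 1000):
        mc, pl = multicycle_time(n), pipelined_time(n)
        print(f"{n} instructions: multi-cycle {mc} cycles, "
              f"pipelined {pl} cycles, speedup {mc / pl:.2f}x")
```

For 4 instructions the pipeline takes 7 cycles instead of 16; as the instruction count grows, the speedup approaches the number of stages, but only under the "ideal pipeline" conditions listed above.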
Ideal Pipelining

- Unpipelined: one combinational block (F, D, E, M, W) with delay T psec → BW ≈ 1/T
- 2 stages: (F, D, E) and (M, W), each T/2 ps → BW ≈ 2/T
- 3 stages: (F, D), (E, M), (M, W), each T/3 ps → BW ≈ 3/T

More Realistic Pipeline: Cost

- Nonpipelined version with combinational cost G:
  Cost = G + L, where L = latch cost
- k-stage pipelined version (stages of cost ~G/k, with latches at each stage boundary):
  Cost_k-stage = G + L * k
- Latches increase hardware cost
- Pipeline depths in practice: Intel Penryn (12-14 stages), Nehalem (20-24), Kaby Lake (14); ARM Cortex-R5 (8), Cortex-R7 (11), Cortex-A8 (13), Cortex-A15 (15-24)
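The cost and bandwidth relations on this slide can be tabulated with a short script. The symbols follow the slide (G = combinational logic cost, L = latch cost per stage boundary, T = total combinational delay); the concrete values of G, L, and T below are assumed for illustration, not from the notes.

```python
# Sketch: how hardware cost and ideal bandwidth scale with pipeline depth k.
# Cost_k-stage = G + L * k ; BW ~ k / T (uniform partitioning, no overheads).

def pipeline_cost(G, L, k):
    # One set of latches is added per stage, so cost grows linearly in k.
    return G + L * k

def pipeline_bandwidth(T, k):
    # Each stage takes T/k, so one result completes every T/k time units.
    return k / T

if __name__ == "__main__":
    G, L, T = 1000, 50, 12.0  # assumed: gate cost, latch cost, delay in ns
    for k in (1, 2, 3, 6, 12):
        print(f"k={k:2d}: cost = {pipeline_cost(G, L, k):5d} gates, "
              f"BW ~ {pipeline_bandwidth(T, k):.3f} instructions/ns")
```

Note that bandwidth grows with k only in this idealized model; real designs pay latch setup/propagation delay per stage, which is one reason commercial pipeline depths (like those listed above) stop where they do.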
