Questions and Answers
Which of the following best describes a pipelined processor?
- Multiple instructions are broken up into a series of steps and executed concurrently. (correct)
- Each instruction executes in a single clock cycle.
- Instructions are executed sequentially, one after another.
- Each instruction is broken up into a series of shorter steps, but only one instruction executes at a time.
In the context of pipelined processors, what is a 'hazard'?
- A technique used to optimize the instruction fetch stage.
- A method for predicting branch outcomes.
- A situation that prevents the next instruction in the instruction stream from executing during its designated clock cycle. (correct)
- A condition that allows instructions to execute out of order.
What is the primary goal of pipelining in processor design?
- To simplify the instruction set architecture.
- To increase the clock frequency.
- To reduce the latency of individual instructions.
- To improve processor throughput by allowing multiple instructions to be processed concurrently. (correct)
Which of the following is a type of data hazard?
What is 'data forwarding' (also known as bypassing) used for in a pipelined processor?
What is 'stalling' in the context of pipeline hazards?
What is the role of 'pipeline registers' in a pipelined processor?
Which stage typically calculates the target address for a branch instruction?
What is a 'control hazard' in a pipelined processor?
What is branch prediction used for?
What is the purpose of inserting 'nops' (no-operation instructions) into the code?
Why is the Decode stage often a critical bottleneck in a pipelined processor?
What is the role of the 'hazard detection unit' in a pipelined processor?
In a pipelined processor, what does 'flushing' the pipeline refer to?
Which of the following is most directly improved by using a pipelined architecture, compared to a single-cycle architecture?
Consider a pipelined processor with 5 stages. Ideally, what is the CPI (Cycles Per Instruction)?
For a program with 100 billion instructions where CPI = 1.15 and Tc = 550 ps, what is the execution time?
Why is the ideal speedup of a pipelined processor not achieved in the real world?
Which type of hazard is resolved by branch prediction?
In the context of Instruction Level Parallelism (ILP), which of the following is a technique used by "Out of Order" processors to deal with dependencies?
In a laundry analogy to the pipeline, what is the data hazard?
In a laundry analogy to the pipeline, what is the structural hazard?
Which of the following is NOT a stage for MIPS?
If we want throughput to improve, what do we need to do?
How many ARM processors does ARM9E have? Look at the evolution table.
If EX/MEM.RegisterRd ≠ 0 and EX/MEM.RegisterRd = ID/EX.RegisterRs, what type of forwarding will it be?
What should happen if a branch occurs when determining hazards?
Which of the following is not a type of hazard?
Which of the following is not something we can do to handle data hazards?
When does pipelining occur?
Select all write types when an out-of-order processor runs:
Why learn pipelining?
Choose the single-cycle parameter.
What does the hazard unit detect?
What affects the overall datapath?
Into which stage is the equality comparator moved back in beq control hazards?
When the EX stage destination register matches the next instruction's source register, what is it called?
What does single-cycle execution time mean?
What does it mean when the cycles per instruction (CPI) is 1?
Why is it crucial to learn about pipelining in processor design?
In the context of pipelined processors, what is the purpose of data forwarding?
What is the primary limitation of a single-cycle processor that pipelining aims to overcome?
Which of the following is NOT a common method for handling data hazards in pipelined processors?
What is the primary goal of Tomasulo's algorithm in out-of-order execution?
Consider a pipelined processor. If an instruction in the Execute stage needs data that is only available in the Write Back stage of a previous instruction, what is the typical solution to resolve this data dependency?
In a pipelined processor, what is the purpose of the Hazard Detection Unit?
How does increasing the number of pipeline stages typically affect the clock frequency of a processor?
What is a structural hazard in a pipelined processor?
In the laundry analogy for pipelining, which activity corresponds to the 'Write Back' stage?
Given that a program running on a pipelined processor experiences both data and control hazards, which of the following is generally true regarding the processor's CPI (Cycles Per Instruction)?
What is the role of 'nops' (no-operation instructions) in the code?
What happens when a branch occurs when determining hazards?
What are the three types of dependencies?
Flashcards
Single-Cycle Processor
Each instruction executes in a single clock cycle.
Multicycle Processor
Each instruction is divided into a series of shorter steps.
Pipelined Processor
Each instruction is broken into a series of steps; multiple instructions execute at once.
Program Execution Time
Single-cycle Performance Limit
Pipelining
Pipelined Processor Stages
Temporal parallelism
Major Pipeline Components
IF (Fetch) in Laundry Analogy
ID (Decode) in Laundry Analogy
EX (Execute) in Laundry Analogy
MEM (Memory Access) in Laundry Analogy
WB (Write Back) in Laundry Analogy
Pipeline Hazard
Structural Hazards
Data Hazards
Control Hazards
Data Hazard
Read After Write (RAW) hazard
Handling Data Hazards
Data Forwarding
Stalling
Compile-Time Hazard Elimination
beq in Control Hazards
Corrected Pipelined Datapath
Pipelined Control
EX/MEM.RegisterRd field
beq instruction (Appendix)
Data Forwarding (Appendix)
Stalling (Appendix)
Stalling Hardware (Appendix)
Out of Order Processor
Instruction-Level Parallelism (ILP)
Study Notes
Pipeline Processor
- Instructions execute in a single cycle in a single-cycle processor.
- Instructions are broken into shorter steps in a multicycle processor.
- In a pipelined processor, instructions are broken into a series of steps and multiple instructions execute at once.
Topics Overview
- A review covered slides 3-5 focusing on single-cycle and multi-cycle processors.
- Pipelining is taught using a laundry analogy.
- The five stages of a pipelined processor are described.
- Pipeline hazards are covered including types, handling, data forwarding, stalling, and control hazards.
- Topics cover pipeline control, hazard detection, full hazard handling, and performance.
Processor Performance
- Program Execution Time = (# instructions) * (cycles/instruction) * (seconds/cycle).
- The formula can also be expressed as: # instructions * CPI * Tc.
Single-Cycle Processor
- Tc is limited by the critical path.
Multi-Cycle MIPS Processor
- Multicycle processor diagrams are shown, detailing the components.
Reasons for Pipelining
- Improves processor throughput.
- Increases processor performance.
- Increases clock frequency (equivalently, reduces clock cycle time).
Laundry Analogy for Pipelining
- The analogy has Ann, Brian, Cathy, and Dave each doing one load of laundry (wash, dry, fold, stash).
- Each task (washer, dryer, folder, stasher) takes 30 minutes.
- Sequential laundry takes 8 hours for 4 loads.
- Pipelining overlaps the tasks, reducing the total to 3.5 hours for 4 loads; the latency of each load stays at 2 hours.
- Throughput improves by a factor of about 2.3 for 4 loads, and the speedup grows toward 4 as more loads are added.
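The laundry figures can be checked with a few lines of arithmetic. This is a minimal sketch that assumes every task takes the same 30 minutes and that a new load can start as soon as the washer is free; the function names are made up for illustration. It reproduces the 8-hour and 3.5-hour totals and prints the speedup for larger load counts so the trend can be checked directly.

```python
def sequential_minutes(loads, tasks=4, task_minutes=30):
    # Each load runs all of its tasks before the next load starts.
    return loads * tasks * task_minutes


def pipelined_minutes(loads, tasks=4, task_minutes=30):
    # The first load needs `tasks` time slots; every later load
    # finishes exactly one slot after the load before it.
    return (tasks + loads - 1) * task_minutes


for loads in (4, 40, 400):
    seq = sequential_minutes(loads)
    pipe = pipelined_minutes(loads)
    print(f"{loads} loads: {seq / 60:.1f} h sequential, "
          f"{pipe / 60:.1f} h pipelined, speedup {seq / pipe:.2f}")
```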
Laundry Analogy Stages and Key Parallelism
- IF (Fetch): Dirty clothes are loaded (instruction fetched from memory).
- ID (Decode): Sorting clothes by color/fabric (decoding opcode/registers).
- EX (Execute): Washing (ALU performs arithmetic/logic operations).
- MEM (Memory Access): Drying (loading/storing data from/to memory).
- WB (Write Back): Folding clean clothes (writing results to registers).
- Key Parallelism is achieved when Load #1 is drying, Load #2 is washing, and Load #3 is being sorted.
- Throughput: with one hour per stage, 5 loads finish in 9 hours (vs. 25 hours sequentially).
Hazard Examples
- Data Hazard: Load #2 needs a shirt still in Load #1's washer, causing a Stall (bubble).
- Structural Hazard: only one dryer, so Load #3 waits for Load #2 to finish.
Pipelined MIPS Processor
- Utilizes temporal parallelism.
- Divides a single-cycle processor into 5 stages.
- 5 Stages: Fetch, Decode, Execute, Memory, Write back.
- Pipeline registers added between stages.
- With pipelining, five instructions can run simultaneously, one in each stage.
- Because each stage has only one-fifth of the entire logic, the clock frequency is almost five times faster.
- The latency of each instruction remains ideally unchanged but the throughput is ideally five times better.
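To make "one instruction in each stage" concrete, the sketch below builds the stage occupancy of an ideal five-stage pipeline cycle by cycle. It assumes no hazards and no stalls, and `pipeline_schedule` is a name invented for this illustration rather than anything from the course material.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions):
    """Map each cycle to {stage: instruction index} for an ideal pipeline."""
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = {}
    for cycle in range(1, total_cycles + 1):
        row = {}
        for offset, stage in enumerate(STAGES):
            # Instruction i enters IF in cycle i + 1 and moves one stage per cycle.
            instr = cycle - 1 - offset
            if 0 <= instr < num_instructions:
                row[stage] = instr
        schedule[cycle] = row
    return schedule


for cycle, row in pipeline_schedule(5).items():
    cells = " ".join(f"{stage}:I{idx}" for stage, idx in row.items())
    print(f"cycle {cycle}: {cells}")
```

In cycle 5 all five stages are busy with five different instructions, which is where the ideal fivefold throughput gain comes from.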
Pipelining Abstraction Points
- Assume the register file is written in the first part of a cycle and read in the second part.
- Major components: IM (instruction memory), RF (register file), DM (data memory).
Pipelined Datapath Stages
- The single cycle is chopped into five stages separated by pipeline registers.
- Each stage works on a different instruction.
- The stages and their boundaries are indicated in blue.
- Signals are given a suffix (F, D, E, M, or W) to indicate the stage in which they reside.
Corrected Pipelined Datapath
- All signals for an instruction must advance through the pipeline in unison.
- WriteReg must arrive at the same time as Result.
- The WriteReg signal is pipelined along through the Memory and Writeback stages, and remains synchronized with the instruction.
Pipelined Control Unit
- Combinational logic determines the control signals, and both data and control are staged along the pipeline.
- Uses a control unit similar to that of the single-cycle processor.
- However, the control signals must be delayed to the proper pipeline stage.
Pipeline Hazards
- A hazard occurs when an instruction depends on the result of a previous instruction that has not finished.
- Data hazard: register value not written back to register file yet
- Control hazard: next instruction not decided yet (caused by branches)
Pipeline Hazards can be:
- Structural hazards: two instructions attempt to use the same resource at the same time.
- Data hazards: an instruction attempts to use data before it is ready.
- Control hazards: a decision about program control flow is attempted before the condition has been evaluated and the new PC target address calculated.
- Hazards can always be resolved by waiting: the pipeline control detects the hazard and takes action to resolve it.
Data Hazards
- In the example, the second instruction reads $s0 on cycle 3, before the new value of $s0 has been written, so it reads the wrong value.
- Read After Write (RAW) hazard: one instruction writes a register and subsequent instructions read this register.
- A data hazard can be detected by checking whether the current instruction's destination register is the same as the next instruction's source register.
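The detection rule above (destination of one instruction equals a source of the next) can be sketched as a small check. The `Instr` tuple and the two sample instructions are assumptions made for illustration; they are not taken from the slides, though they follow the $s0 scenario described.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]     # register written, or None if nothing is written
    srcs: Tuple[str, ...]   # registers read


def raw_hazard(current: Instr, following: Instr) -> bool:
    """True if `following` reads a register that `current` writes."""
    if current.dest is None or current.dest == "$0":
        return False        # $0 is hard-wired to zero in MIPS, so no real dependency
    return current.dest in following.srcs


# Hypothetical pair: add writes $s0, and the next instruction reads it.
add_instr = Instr("add", "$s0", ("$s2", "$s3"))
and_instr = Instr("and", "$t0", ("$s0", "$s1"))
print(raw_hazard(add_instr, and_instr))
```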
Handling Data Hazards
- Forward data at run time
- Insert nops in code at compile time
- Rearrange code at compile time
- Stall the processor at run time
Data Forwarding Technique
- Solves data hazards by forwarding a result to a dependent instruction in the Execute stage.
- Requires adding multiplexers in front of the ALU.
- Forwarding is needed when an instruction in the Execute stage has a source register matching the destination register of an instruction in the Memory or Writeback stage.
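The forwarding condition can be written as the select logic for one of the ALU input multiplexers. This is a sketch under assumptions: the parameter names mirror the pipeline-register fields used elsewhere in these notes (EX/MEM.RegisterRd, ID/EX.RegisterRs), the RegWrite flags and the return labels are illustrative, and the priority given to the Memory stage follows the usual textbook convention that the newest result wins.

```python
def forward_source(id_ex_rs, ex_mem_regwrite, ex_mem_rd,
                   mem_wb_regwrite, mem_wb_rd):
    """Choose where the Execute-stage operand for register `id_ex_rs` comes from."""
    # Check the instruction now in the Memory stage first: it holds the newest value.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "EX/MEM"
    # Otherwise check the instruction now in the Writeback stage.
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "MEM/WB"
    # No match: use the value read from the register file.
    return "REGFILE"


print(forward_source(16, True, 16, False, 0))
```

The same function would be evaluated a second time with the Rt field to drive the other ALU input.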
Stalling
- Stalling holds the pipeline until the needed data is available.
- Because lw does not finish reading data until the end of the Memory stage, its result cannot be forwarded to the Execute stage of the next instruction, so the pipeline must stall.
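The lw case is often called a load-use hazard, and the stall condition can be sketched as a single comparison. The signal names below (`id_ex_memread`, `id_ex_rt`, `if_id_rs`, `if_id_rt`) are assumptions in the style of the pipeline-register fields used in these notes, not names confirmed by the slides.

```python
def must_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True if the instruction in Decode needs the result of a load now in Execute."""
    # A load writes its Rt register; the value only exists after the Memory stage,
    # so a dependent instruction directly behind it has to wait one cycle.
    return bool(id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt))


# lw $s0, 0($t1) in Execute, followed by an instruction that reads $s0 (register 16).
print(must_stall(True, 16, 16, 17))
```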
Compile-Time Hazard Elimination
- Insert enough NOP instructions to allow for the result to be ready.
- Or move independent instructions forward.
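NOP insertion can be sketched as a small compiler-style pass. The sketch assumes no forwarding hardware and a register file written in the first half of a cycle and read in the second half; under those assumptions a consumer must sit at least three instruction slots after its producer in a five-stage pipeline. The instruction tuples and the `min_distance` parameter are hypothetical.

```python
def insert_nops(program, min_distance=3):
    """Pad (op, dest, srcs) tuples with nops so each read is far enough from its write."""
    out = []
    last_write = {}                       # register -> index in `out` of its latest writer
    for op, dest, srcs in program:
        needed = 0
        for reg in srcs:
            if reg in last_write:
                distance = len(out) - last_write[reg]
                needed = max(needed, min_distance - distance)
        out.extend([("nop", None, ())] * needed)
        if dest is not None:
            last_write[dest] = len(out)   # index this instruction is about to occupy
        out.append((op, dest, srcs))
    return out


program = [
    ("add", "$s0", ("$s2", "$s3")),
    ("and", "$t0", ("$s0", "$s1")),
    ("or", "$t1", ("$s4", "$s0")),
]
print([op for op, _, _ in insert_nops(program)])
```

Moving independent instructions into those slots instead of nops achieves the same spacing without wasting cycles.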
Control Hazards occur with beq
- The branch is not determined until the fourth stage of the pipeline.
- Instructions after the branch are fetched before the branch is resolved; these instructions must be flushed if the branch is taken.
- The branch misprediction penalty is the number of instructions flushed when the branch is taken; it may be reduced by determining the branch earlier.
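The link between where the branch is resolved and how many instructions are flushed can be sketched numerically. This assumes the pipeline always fetches the next sequential instruction, so every stage the branch passes before being resolved costs one wrongly fetched instruction; the 25% taken fraction in the example is a made-up value.

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]


def flushed_instructions(resolve_stage):
    # One wrong-path instruction is fetched for each stage before resolution.
    return STAGES.index(resolve_stage)


def branch_cpi(resolve_stage, taken_fraction):
    # Each taken branch costs one cycle plus one cycle per flushed instruction.
    return 1 + taken_fraction * flushed_instructions(resolve_stage)


for stage in ("Memory", "Decode"):
    print(stage, flushed_instructions(stage), branch_cpi(stage, 0.25))
```

Resolving the branch in the fourth stage flushes three instructions, while moving the decision to Decode flushes only one.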
Detecting Hazards
- A hazard must be resolved when an instruction's destination register is the same as the next instruction's source register.
- Note that the EX/MEM.RegisterRd field is the register destination for either an ALU instruction (which comes from the Rd field of the instruction) or a load (which comes from the Rt field).
Pipelined Processor with Full Hazard Handling
- The processor contains components for full hazard handling, such as forwarding units.
- A processor with full hazard handling may also need to stall the pipeline.
Pipeline Performance Metrics
- Critical path considerations impact performance analysis
- Cycles per instruction (CPI)
- Clock cycle time (Tc)
Pipeline Performance Example and Metrics
- A pipelined processor would ideally have a CPI of 1, but stalls and flushes raise this value.
- Calculating execution time and speedup makes it possible to compare processors.
- On the SPECINT2000 benchmark, the pipelined processor has a CPI of 1.15.
- The average CPI is the sum over each instruction of the CPI for that instruction multiplied by the fraction of time that instruction is used.
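The weighted-sum definition of average CPI can be sketched as follows. The instruction mix and the per-class CPI values are assumptions chosen for illustration (they are not given in these notes); they happen to yield a value close to the 1.15 quoted above.

```python
# (fraction of executed instructions, CPI for that class): assumed values.
instruction_mix = {
    "load":   (0.25, 1.40),   # some loads are followed by a dependent instruction and stall
    "store":  (0.10, 1.00),
    "branch": (0.11, 1.25),   # some branches are taken and flush an instruction
    "jump":   (0.02, 2.00),
    "r-type": (0.52, 1.00),
}


def average_cpi(mix):
    # Sum of (fraction * CPI) over every instruction class.
    return sum(fraction * cpi for fraction, cpi in mix.values())


print(f"average CPI = {average_cpi(instruction_mix):.4f}")
```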
Pipeline Performance Example
- For a program with 100 billion instructions executing on a pipelined MIPS processor, CPI = 1.15 and Tc = 550 ps.
- Execution time = (100 × 10^9)(1.15)(550 × 10^-12)
- Execution time = 63 seconds
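The calculation above is a direct application of the execution-time formula (# instructions × CPI × Tc). A minimal sketch, with a function name chosen only for illustration:

```python
def execution_time_seconds(instructions, cpi, cycle_time_seconds):
    # Program execution time = instructions * cycles/instruction * seconds/cycle.
    return instructions * cpi * cycle_time_seconds


t = execution_time_seconds(100e9, 1.15, 550e-12)
print(f"{t:.2f} s")
```

The exact product is 63.25 seconds, which the notes round to 63.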
Performance of Processors
- Single-cycle takes 95 seconds
- Multicycle takes 133 seconds
- Pipelined takes 63 seconds
- The pipelined processor is the fastest, but its advantage over the single-cycle processor is nowhere near the fivefold speedup one might hope to get from a five-stage pipeline.
- The pipeline hazards introduce a small CPI penalty. More significantly, the sequencing overhead (clk-to-Q and setup times) of the registers applies to every pipeline stage, not just once to the overall datapath.
- Sequencing overhead limits the benefits one can hope to achieve from pipelining.
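The comparison can be made quantitative with the three execution times listed above. The second function is a simplified model of sequencing overhead, assuming the logic divides evenly across stages; the 800 ps and 50 ps figures are hypothetical and are only there to show why the speedup falls short of the stage count.

```python
times_seconds = {"single-cycle": 95.0, "multicycle": 133.0, "pipelined": 63.0}


def speedup(baseline, improved):
    return times_seconds[baseline] / times_seconds[improved]


def cycle_time_ps(logic_ps, stages, overhead_ps):
    # Each stage pays the register overhead (clk-to-Q plus setup) once per cycle.
    return logic_ps / stages + overhead_ps


print(f"pipelined vs single-cycle: {speedup('single-cycle', 'pipelined'):.2f}x")
print(f"pipelined vs multicycle:   {speedup('multicycle', 'pipelined'):.2f}x")

# Hypothetical datapath: 800 ps of logic, 50 ps of sequencing overhead per register.
ratio = cycle_time_ps(800, 1, 50) / cycle_time_ps(800, 5, 50)
print(f"cycle-time improvement with 5 stages: {ratio:.2f}x")
```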
Appendices Include
- Control hazards for beq, for example branch prediction.
- Data forwarding: forward from a stage if that stage will write a destination register and that register matches the source register.
- Stalling, which requires an analysis of read-after-write hazards.
- Stalling hardware, which implements the stalling logic.
Evolution of ARM processors
- Various processors have been released over the years such as the ARM1, ARM6, ARM7, ARM9E, ARM11, Cortex-A9, Cortex-A7, Cortex-A15, Cortex-M0, Cortex-A53, Cortex-A57
- Release years range from 1985 to 2012.
(Optional) Out of Order Processor
- Looks ahead across multiple instructions
- Issues as many instructions as possible at once
- Issues instructions out of order (as long as no dependencies)
- Dependencies: RAW (read after write), WAR (write after read), WAW (write after write)
- Instruction-level parallelism (ILP) is the number of instructions that can be issued simultaneously (on average fewer than 3).
- Scoreboard: a table that keeps track of instructions waiting to issue, available functional units, and dependencies.
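The three dependency types can be told apart by comparing which registers two instructions read and write. The sketch below is not a scoreboard implementation; it only classifies the dependency between an earlier and a later instruction, each given as a hypothetical (destination, sources) pair.

```python
def dependencies(first, second):
    """Classify dependencies of `second` on `first` (first comes earlier in program order)."""
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = set()
    if dest1 is not None and dest1 in srcs2:
        kinds.add("RAW")    # second reads what first writes
    if dest2 is not None and dest2 in srcs1:
        kinds.add("WAR")    # second overwrites what first still has to read
    if dest1 is not None and dest1 == dest2:
        kinds.add("WAW")    # both write the same register
    return kinds


# add $t0, $s1, $s2  followed by  sub $s1, $t0, $s3
print(sorted(dependencies(("$t0", ("$s1", "$s2")), ("$s1", ("$t0", "$s3")))))
```

An out-of-order processor may issue the second instruction ahead of the first only when this set is empty.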