quiz image

CAP 5 - Superscalar Processors and Pipelining

SelfDeterminationOmaha avatar
SelfDeterminationOmaha
·
·
Download

Start Quiz

Study Flashcards

60 Questions

What type of hazards does Tomasulo's algorithm eliminate?

RAW and WAW

What is the purpose of the reservation stations in Tomasulo's algorithm?

To rename registers

What happens to instructions due to RAW hazards in Tomasulo's algorithm?

They are blocked

What is the purpose of the load buffers in Figure 5.9?

To compute the effective address

What is the significance of the 'write result' in Figure 5.9?

It is dependent on the branch outcome verification

What is allowed by Tomasulo's algorithm, even without speculative execution?

Loop unrolling

What is depicted in Figure 5.9?

A snapshot of the Tomasulo's algorithm in action

What is the benefit of Tomasulo's algorithm?

It reduces delays caused by differences in execution times

What is true about Tomasulo's algorithm?

It allows out-of-order completion, but with in-order issue

What is shown in Figure 5.10?

The detailed operations in each step of Tomasulo's algorithm

What is the status of the load buffer Load1 in the reservation stations?

Not busy

Which register does not have a value yet in the reservation station Add1?

f2

What is the operation being handled by reservation station Add2?

Addition

What is the current status of the instruction at line 1 in the code?

Completed and written to CDB

What information is held by the field A in the reservation station Load2?

The effective address

What is the operation being handled by reservation station Mult1?

Multiplication

What is the status of all instructions in the scenario?

All issued

What is the current status of the register f6?

Available

What is the dependency of the result in reservation station Add2?

Both Qj = Add1 and Qk = Load2

What is the current status of the instruction at line 2 in the code?

Issued but not completed

What is the significance of the effective address computation in the load buffers?

It enables the computation of the values in the Vj field.

Why are the multiplications waiting on Qj = Load1 and Qj = Load2 respectively?

The multiplications depend on the values computed by the loads.

What is the reason for the stores being pending on stations Qk = Mult1 and Qk = Mult2?

The stores are dependent on the results of the multiplications.

What is the purpose of register renaming in Tomasulo's algorithm?

To eliminate WAR and WAW hazards.

What is the benefit of Tomasulo's algorithm in terms of instruction execution?

It enables out-of-order completion but in-order issue.

What is the significance of the Vk = Regs[f2] in Figure 5.9?

It shows that the register f2 is available.

What is the purpose of the field A in the load buffers?

It holds the effective address computed for the loads.

What is the status of the instructions in the scenario?

All instructions are pending.

What is the purpose of the field Vj in the reservation stations?

It holds the values available for renaming.

What is the benefit of Tomasulo's algorithm in terms of hazards?

It eliminates WAR and WAW hazards.

What is the purpose of the Qi field in the register file part?

It indicates which reservation station is computing which register value.

In Figure 5.8, what is the status of the multiplication operation?

It is ready to write its result.

What is the role of the reservation stations in the loop-based example?

They act as additional registers.

What is the purpose of the Vk field in the reservation stations?

It holds the available value.

What is the current status of the division operation in Figure 5.8?

It is still pending on Qj = Mult1 for the f0 value.

What is the advantage of Tomasulo's algorithm in terms of loop execution?

It allows multiple loop execution/iterations at once.

What is the role of the register renaming in Tomasulo's algorithm?

It eliminates WAW and WAR hazards.

What is the current status of the instruction at line 3 in the code?

It is finished.

What is the purpose of the field Vj in the reservation stations?

It holds the value of the operand.

What is the benefit of Tomasulo's algorithm in terms of instruction execution?

It enables multiple instructions to be executed simultaneously.

What is the purpose of the reservation station's Op field in Tomasulo's algorithm?

The Op field holds the operation to be performed on the source operands S1 and S2.

How does the Qi field in the register file indicate the status of a register?

The Qi field indicates the number of the reservation station that contains the operation whose result should be stored into that register.

What is the significance of the Vj and Vk fields in the reservation station?

The Vj and Vk fields hold the values of the source operands when they are already available.

What happens to the instructions when the store address and store value are available?

The result is written as quickly as the memory unit is available.

What is the purpose of renaming registers in Tomasulo's algorithm?

Register renaming helps to avoid unnecessary hazards, specifically false dependencies.

What is the role of the A field in the reservation station?

The A field holds the information for the memory address calculation for a load or store.

How do the Qj and Qk fields in the reservation station indicate the status of the source operands?

The Qj and Qk fields hold the reservation stations that will produce the corresponding source operands.

What is the benefit of dynamically determining when an instruction is ready to execute?

It helps to minimize RAW hazards.

What is the purpose of the Busy field in the reservation station?

The Busy field indicates that a reservation station and its accompanying functional unit are occupied.

What is the significance of the effective address computation in the load buffers?

It enables the computation of the effective address for a load or store instruction.

What is the primary function of the reservation station in Tomasulo's algorithm?

To hold the operation to perform on the source operands

How do the Qj and Qk fields in the reservation station indicate the status of the source operands?

A value of zero (or empty value) indicates that the source operand is already available

What is the purpose of the Vj and Vk fields in the reservation station?

To hold the values of the source operands when they are already available

What is the role of the A field in the reservation station?

To hold the information for the memory address calculation for a load or store

What is the significance of dynamically determining when an instruction is ready to execute?

It helps to minimize RAW hazards

What is the purpose of the Qi field in the register file?

To indicate the number of the reservation station that contains the operation whose result should be stored into this register

What happens to instructions when the store address and store value are available?

The result is written as quickly as the memory unit is available

What is the benefit of register renaming in Tomasulo's algorithm?

It helps to minimize WAR and WAW hazards

What is the significance of the Busy field in the reservation station?

It indicates that a reservation station and its accompanying functional unit are occupied

How do the reservation stations handle loads?

The field Vk is used to hold the offset part

Study Notes

Superscalar Processors

  • There are two approaches to exploit Instruction-Level Parallelism (ILP): hardware-based and software-based.
  • Hardware-based approaches aim to discover and exploit parallelism dynamically during runtime.
  • Software-based approaches aim to find parallelism statically at compile time.

Pipelining Limits

  • The pipelining approach creates a theoretical barrier, with a CPI (clock cycles per instruction) value of 1.
  • The actual pipeline CPI is greater than 1 due to stalls.
  • The ideal pipeline CPI is approximately 1.

Advancing ILP Exploration

  • Techniques to advance ILP exploration include pipeline, superpipeline, and multiple issue.
  • Multiple issue is about superscalar processors and Very Long Instruction Word (VLIW) approach.

Pipeline

  • Pipeline technique divides instructions into phases or stages and overlaps them to improve performance.
  • This is transparent at the software level, as the hardware takes care of it.

Superpipeline

  • Superpipeline explores the fact that various pipeline stages require less than half clock cycle to complete.
  • A doubled internal clock frequency enables two same stages to run within one external clock cycle.
  • Each pipeline stage function can be divided into two parts with no overlap, and each part is executed in half clock cycle.

Superscalar Processor

  • In a superscalar processor, there are multiple issues, i.e., the processor issues multiple scalar instructions per cycle.
  • The processor can execute independent instructions in parallel into different pipelines.
  • Out-of-order instructions execution is allowed, and the processor has to handle structural hazards, data hazards, and register renaming.

Tomasulo's Algorithm

  • Tomasulo's algorithm is used in superscalar processors to handle hazards and allow out-of-order execution.
  • The algorithm uses reservation stations, load buffers, and store buffers to manage instructions and data.
  • RAW hazards are avoided by executing instructions only when their operands are available.
  • WAR and WAW hazards are handled by register renaming in the reservation stations.
  • The algorithm has three basic steps: issue/dispatch, execute/issue, and write result.

Superscalar Processors

  • A superscalar processor can execute multiple scalar instructions per cycle, issuing multiple instructions to different pipelines.
  • The processor has multiple functional units, and out-of-order execution is allowed to handle structural hazards and data hazards.
  • The instruction issue frequency should be much greater than the execution frequency for a superscalar processor to be effective.

Superpipeline

  • The superpipeline approach doubles the internal clock frequency, allowing two same stages to run within one external clock cycle.
  • A doubled internal clock allows each pipeline stage function to be divided into two parts, with each part executed in half a clock cycle.
  • This is called superpipeline level 2, where two same pipeline stages can run within one clock cycle.

Very Long Instruction Word (VLIW)

  • VLIW encodes multiple instructions in one "long" word, including opcodes and operands.
  • Packing is handled at design time by the compiler, and the processor fetches the package and issues it entirely at once.
  • A VLIW processor also has multiple functional units.

Tomasulo's Algorithm

  • Tomasulo's algorithm is used in superscalar processors to handle out-of-order execution and avoid hazards.
  • The algorithm has three basic steps: issue/dispatch, execute/issue, and write result.
  • Reservation stations are used to hold instructions, operands, and data, and to detect and fix possible hazards.
  • Load buffers and store buffers are also used to handle loads and stores.
  • RAW hazards are avoided by executing instructions only when operands are available, and WAR and WAW hazards are handled by register renaming.

Reservation Station Details

  • A reservation station has seven fields: Op, Qj, Qk, Vj, Vk, A, and Busy.
  • The Op field holds the operation to perform on the source operands.
  • The Qj and Qk fields hold the reservation stations that will produce the corresponding source operands.
  • The Vj and Vk fields hold the values of the source operands when they are available.
  • The A field holds the information for memory address calculation for a load or store.
  • The Busy field indicates that a reservation station and its accompanying functional unit are occupied.

Register File Details

  • The Qi field in the register file indicates the number of the reservation station that contains the operation whose result should be stored in the register.
  • If the value of Qi is blank or 0, no current active instruction is computing a result destined for this register.

Example Scenario

  • Consider a RISC-V code snippet with six instructions, where all instructions are issued, but only the first instruction load has been completed and written its result in the common data bus (CDB).
  • The scenario is illustrated in Fig. 5.7, showing the instruction status, reservation station fields, and register file status.

Superscalar Processors

  • Superscalar processors issue multiple scalar instructions per cycle, executing independent instructions in parallel into different pipelines.
  • This approach allows for out-of-order execution, handling "new" problems such as structural hazards (e.g., registers) and data hazards (WAW and WAR).

Superpipeline

  • Superpipeline approach explores the fact that various pipeline stages require less than half clock cycle to complete.
  • A doubled internal clock frequency enables two same stages to run within one external clock cycle, allowing for two same pipeline stages per clock cycle.
  • Each pipeline stage function can be divided into two parts with no overlap, and each part is executed in half clock cycle.

Very Long Instruction Word (VLIW)

  • VLIW approach encodes various different instructions in one "long" word, including opcodes and their operands.
  • In the VLIW context, there is the packing, handled at design time by the compiler.
  • The processor fetches the package and issues it entirely at once, having multiple functional units.

The Quest for CPI < 1

  • The quest for a CPI (Cycles Per Instruction) less than 1 includes multiple issue processors such as superscalar and VLIW.
  • Superscalar processors can be ordered, or scheduled, by either software or hardware, considering modern compiler optimizations and register optimizations.

Out-of-Order Execution

  • Out-of-order execution allows for a safe execution of instructions, even in the presence of false dependencies.
  • Techniques to avoid false dependencies include register renaming, which is used to avoid false WAR and WAW dependencies.

Register Renaming

  • Register renaming is a technique used to avoid false dependencies, replacing registers with new ones not used or read until their next writing point.
  • This technique is applied to WAW and WAR dependencies, allowing for out-of-order execution.

Tomasulo's Algorithm

  • Tomasulo's algorithm is a hardware algorithm that allows out-of-order completion, but with in-order issue.
  • It eliminates WAR and WAW hazards by renaming registers, thanks to its reservation stations.
  • It blocks instructions due to RAW hazards, and allows loop unrolling, even without speculative execution.
  • The algorithm has three main steps: Execute/Issue, Write Result, and Reservation Station.

Instruction-Level Parallelism (ILP) Exploration

  • There are two approaches to exploiting ILP: hardware-based and software-based
  • Hardware-based approach discovers and exploits parallelism dynamically during runtime
  • Software-based approach finds parallelism statically at compile-time

Pipelining Limits

  • Pipelining creates a theoretical barrier, with a CPI (Cycles Per Instruction) value of 1
  • Actual pipeline CPI is greater than 1 due to stalls
  • Ideal pipeline CPI is approximately 1

Advancing ILP Exploration

  • Techniques to advance ILP exploration include:
    • Pipeline
    • Superpipeline
    • Multiple issue (superscalar processors)
    • Very long instruction word (VLIW) approach

Pipeline

  • Divides instructions into phases or stages and overlaps them to improve performance
  • Transparent at the software level, as hardware takes care of it

Superpipeline

  • Exploits the fact that pipeline stages require less than half a clock cycle to complete
  • Doubles internal clock frequency, enabling two same stages to run within one external clock cycle
  • Each pipeline stage function can be divided into two parts with no overlap

Superscalar Processors

  • Issues multiple scalar instructions per cycle
  • Executes independent instructions in parallel into different pipelines
  • Has multiple functional units (e.g., ALU, LOAD, STORE, FP units)
  • Requires out-of-order instructions execution to be really effective

Dynamic Scheduling Problems

  • WAR and WAW dependencies can occur due to structural hazards and data hazards
  • Examples of dependencies:
    • RAW (Read-After-Write)
    • WAR (Write-After-Read)
    • WAW (Write-After-Write)

False Dependencies Solution

  • Registers renaming technique:
    • Avoids false WAR and WAW dependencies
    • Allows for out-of-order execution while preserving the same result as in-order execution

Tomasulo's Algorithm

  • Uses reservation stations to execute instructions out-of-order
  • Stations have Qi (indicator of which reservation station is computing a register value)
  • Allows for multiple instructions to be executed simultaneously while avoiding dependencies

Superscalar Processors

  • There are two approaches to exploiting Instruction-Level Parallelism (ILP): hardware-based and software-based.
  • Hardware-based approach discovers and exploits parallelism dynamically during runtime, while software-based approach finds parallelism statically at compile time.
  • Aggressive compiler-based proposals have been tried since the 1980s, but have been successful only in domain-specific environments.

Pipelining Limits

  • The pipelining approach creates a theoretical barrier, with the CPI (clock cycles per instruction) value for a pipelined processor being the sum of the base CPI and contributions from stalls.
  • The ideal, or theoretical, pipeline CPI is approximately 1.
  • The actual pipeline CPI is greater than 1, prompting the question of how to change the processor to achieve a CPI less than 1.

Techniques for Advancing ILP Exploration

  • Techniques for advancing ILP exploration include pipeline, superpipeline, multiple issue, and Very Long Instruction Word (VLIW) approaches.

Pipelining

  • Pipelining divides instructions into phases or stages and overlaps these stages to improve performance.
  • This is transparent at the software level, as the hardware takes care of it.

Superpipelining

  • Superpipelining explores the fact that various pipeline stages require less than half a clock cycle to complete their operations.
  • A doubled internal clock frequency enables two same stages to run within one external clock cycle.
  • Each pipeline stage function can be divided into two parts with no overlap, and each part is executed in half a clock cycle.

Superscalar Processors

  • Superscalar processors issue multiple scalar instructions per cycle.
  • The processor executes independent instructions in parallel into different pipelines.
  • To be effective, out-of-order instructions execution has to be allowed and followed, and the processor has to handle structural hazards (e.g., with respect to registers) and data hazards (e.g., WAW and WAR).

Registers Renaming

  • Registers renaming is used to avoid false WAR and WAW dependencies.
  • This technique is used in out-of-order execution, allowing for safe parallel execution.

WAW Analysis

  • WAW (Write-After-Write) analysis involves false dependencies, which can be fixed by renaming registers.

WAR Analysis

  • WAR (Write-After-Read) analysis involves false dependencies, which can be fixed by renaming registers.

Execute/Issue Step

  • In the execute/issue step, the processor monitors the common data bus (CDB) to get operands.
  • When an operand is ready, it is put in the reservation station waiting for execution.
  • Execution occurs when all operands are available, and RAW hazards are prevented by delaying instruction execution until operands are available.

Write Result

  • In the write result step, a result is written to the CDB, then to the register file, and also to any reservation station and buffers waiting for that result.
  • Stores remain buffered until both store address and store value are available.

Superscalar Processors

  • Exploiting instruction-level parallelism (ILP) can be done using two approaches: hardware-based and software-based.
  • Hardware-based approach aims to discover and exploit parallelism dynamically during runtime.
  • Software-based approach aims to find parallelism statically at compile time.

Pipelining Limits

  • Pipelining creates a theoretical barrier, i.e., clock cycles per instruction (CPI) equal to 1.
  • The actual pipeline CPI is greater than 1 due to stalls.
  • CPI Pipeline = CPI IdealPipeline + Stalls Structural + Stalls DataHazard + Stalls Control.

Advancing in ILP Exploration

  • Advances in ILP consider techniques such as pipeline, superpipeline, and multiple issue.
  • Multiple issue is related to superscalar processors and very long instruction word (VLIW) approach.

False Dependencies Solution

  • The basic premise is to allow for out-of-order execution, preserving the same result as in-order execution.
  • Registers renaming is used to avoid false WAR and WAW dependencies.
  • Registers renaming replaces a register with another not used or read until its next writing point.

WAR and WAW Analysis

  • WAR analysis involves identifying false WAR dependencies and fixing them using register renaming.
  • WAW analysis involves identifying false WAW dependencies and fixing them using register renaming.

Tomasulo's Algorithm

  • Tomasulo's algorithm is a hardware solution for dynamic scheduling, allowing out-of-order execution.
  • It uses different functional units and renames registers to fix WAR and WAW hazards.
  • The algorithm consists of three steps: execute/issue, write result, and load/store.

Tomasulo's Algorithm Steps

  • Execute/issue: instructions are issued in FIFO order from the queue, and registers are renamed.
  • Write result: when a result is available, it is written to the CDB and then to the register file and reservation stations.
  • Load/store: loads and stores are executed in a two-part step, with loads executed right away when the memory unit is free.

Key Principles

  • Dynamically determining when an instruction is ready to execute helps minimize RAW hazards.
  • Renaming registers helps minimize WAR and WAW hazards.

Reservation Station Details

  • A reservation station has seven fields: Op, S1, S2, Qj, Qk, Vj, Vk, and A.
  • The fields hold information about the operation, source operands, and memory address calculation.

Register File Details

  • The field Qi indicates the number of the reservation station that contains the operation whose result should be stored into a register.
  • If the value of Qi is blank, no current active instruction is computing a result destined for this register.

This quiz covers two approaches to exploit Instruction-Level Parallelism in superscalar processors and the limits of pipelining in computer architecture.

Make Your Own Quizzes and Flashcards

Convert your notes into interactive study material.

Get started for free

More Quizzes Like This

Parallel CPU Execution Quiz
6 questions
Common Classes of Parallel Computers
6 questions
Pipelining and Superscalar Operations
5 questions
Use Quizgecko on...
Browser
Browser