Designing for Performance: Microprocessor Systems

Questions and Answers

Which technique involves moving data or instructions into a conceptual pipe where all stages process simultaneously?

  • Superscalar execution
  • Speculative execution
  • Data flow analysis
  • Pipelining (correct)

Which processor technique predicts which branches or instruction groups are likely to be processed next?

  • Superscalar Execution
  • Data Flow Analysis
  • Branch Prediction (correct)
  • Speculative Execution

What is the primary characteristic of superscalar execution in processor design?

  • Analyzing data dependencies to optimize instruction scheduling.
  • Executing instructions ahead of their appearance in the program.
  • Issuing more than one instruction per clock cycle. (correct)
  • Using branch prediction to optimize instruction flow.

What does data flow analysis achieve in contemporary processors?

  • Optimizing the instruction schedule based on data dependencies. (correct)

Which execution method involves processors executing instructions ahead of their actual appearance in the program?

  • Speculative execution (correct)

What strategy is used to compensate for the mismatch in capabilities among various computer components to improve performance?

  • Adjusting the organization and architecture (correct)

How do wider DRAMs improve performance balance in computer architecture?

  • By increasing the number of bits retrieved at one time. (correct)

What role do cache structures play in achieving performance balance?

  • Reducing the frequency of memory access (correct)

How does increasing the number of gates on a processor chip affect its performance?

  • Increases the clock rate (correct)

What is the primary effect of reduced propagation time for signals on a processor?

  • Faster processing speeds (correct)

What is a significant problem that arises from increasing clock speed and logic density in processors?

  • Increased RC delay (correct)

How does the reduction in component size on a chip affect resistance in wire interconnects?

  • Increases resistance (correct)

How does increased proximity of wires affect capacitance on a chip?

  • Increases capacitance (correct)

What is a primary advantage of using multicore processors?

  • Increasing performance without raising the clock rate (correct)

According to the presentation, what is the main idea behind Amdahl's Law?

  • Deals with the potential speedup of a program using multiple processors compared to using a single processor. (correct)

Which of the following is a key consideration highlighted by Amdahl's Law regarding the transition to multi-core architectures?

  • Software must be adapted for parallel execution to effectively use multiple cores. (correct)

What does Little's Law fundamentally describe?

  • The relationship in a queuing system. (correct)

In the context of Little's Law, what happens to an item when a server in a queuing system is idle?

  • It is served immediately. (correct)

According to Little's Law, the average number of items in a queuing system is equal to what?

  • The average rate at which items arrive multiplied by the average time an item spends in the system. (correct)

What characteristic makes the Arithmetic Mean (AM) an appropriate measure for comparing system performance?

  • If the sum of measurements provides a meaningful value (correct)

Why is the Arithmetic Mean (AM) considered a good candidate for evaluating computer systems?

  • It is directly proportional to the total time (correct)

What does the use of multiple runs with different inputs in simulation studies ensure when evaluating alternative products?

  • The results are not biased by unusual features of a specific data set (correct)

Which of the following best describes a desirable characteristic of a benchmark program?

  • Written in a high-level language, allowing portability. (correct)

What makes a benchmark program 'representative'?

  • It reflects common programming domains and paradigms. (correct)

What is a key aim of the System Performance Evaluation Corporation (SPEC)?

  • To provide a representative test of a computer in a particular application or system programming area (correct)

What type of applications is the SPEC CPU2017 benchmark suite most appropriate for measuring?

  • Processor-intensive applications (correct)

What programming languages are benchmarks in the SPEC CPU2017 suite written in?

  • C, C++, and Fortran. (correct)

What is the purpose of the 'base metric' in SPEC benchmarks?

  • To establish a reference point on the system to be evaluated (correct)

How does the 'peak metric' in SPEC testing differ from the 'base metric'?

  • Allows optimization of the compiler output (correct)

In SPEC documentation, what does the 'speed metric' measure?

  • The time it takes to execute a compiled benchmark. (correct)

What does the 'rate metric' measure?

  • How many tasks are completed in an amount of time (correct)

What is the primary purpose of calculating the geometric mean of all ratios in SPEC evaluation?

  • To summarize performance across various benchmarks into a comparative metric (correct)

What is the role of the 'reference machine' in the SPEC benchmarking process?

  • To establish a baseline performance for benchmarks, for comparison with the results of the system under test (correct)

According to the presentation, which application would benefit MOST from the use of today's microprocessor-based systems?

  • Image processing (correct)

According to the presentation, which is NOT one of the techniques built into contemporary processors?

  • Memory Optimization (correct)

According to the presentation, which of the following is NOT an improvement in Chip Organization and Architecture?

  • Reduce hardware speed of processor (correct)

Flashcards

Pipelining

Moving data/instructions into a conceptual pipe where stages process simultaneously.

Branch Prediction

Predicting which branches of code are likely to be processed next.

Superscalar Execution

Issuing more than one instruction in a processor clock cycle using parallel pipelines.

Data Flow Analysis

Analyzing instruction dependencies to optimize the execution schedule.

Speculative Execution

Executing instructions ahead of their actual appearance in the program and holding the results in temporary locations.

Wider DRAM

Increasing bits retrieved per access and utilizing wide bus data paths.

Efficient Cache Structures

Complex cache structures that reduce the frequency of memory access.

Efficient DRAM interface

Improving the DRAM interface by including a cache or other buffering scheme on the DRAM chip.

Higher Speed Buses

Faster buses and bus hierarchies for improved data flow.

Hardware Speed Increase

Shrinking logic gate size, which increases the clock rate and reduces signal propagation time.

Size/Speed of Caches

Dedicating part of the processor chip to cache so that cache access times drop.

Processor Reorganization

Changing the processor organization and architecture to increase the effective speed of instruction execution, typically through parallelism.

Power Density

Power per unit of chip area; it rises with logic density and clock speed, which makes heat dissipation a problem.

RC Delay

The resistance and capacitance of the metal wires connecting transistors limit the speed of electron flow; delay grows with the RC product.

Multicore

Multiple processors on a single chip, increasing performance without raising the clock rate.

Many Integrated Core

A large number of general-purpose cores on one chip: a leap in performance, but developing software that exploits so many cores is a challenge.

Graphics Processing Unit

A core designed to perform parallel operations on graphics data.

Amdahl's Law

Potential program speedup with multiple processors.

Little's Law

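The average number of items in a queuing system equals the average arrival rate multiplied by the average time an item spends in the system. It applies to systems that are statistically in steady state and have no leakage.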

Arithmetic Mean

An appropriate measure when the sum of all the measurements is a meaningful and interesting value.

Benchmark Program

Portable, representative, measurable, and widely distributed programs.

Benchmark Suite

A collection of programs, defined in a high-level language, that together provide a representative test of a computer in a particular application or system programming area.

SPEC CPU2017

A SPEC benchmark suite; the industry standard for processor-intensive applications.

Peak Metric

Shows how well the system performs when the compiler output is allowed to be optimized.

Speed Metric

A measurement of the time it takes to execute a compiled benchmark.

Rate Metric

Tasks a computer finishes in a certain amount of time.

Study Notes

Designing for Performance

  • Computer system costs continue to decrease, while performance and capacity dramatically increase.
  • Today's laptops possess the computing power of an IBM mainframe from 10-15 years prior.
  • Microprocessors are now inexpensive enough to be disposable.
  • Desktop applications that require the power of today's microprocessor-based systems include:
    • Image processing
    • Three-dimensional rendering
    • Speech recognition
    • Videoconferencing
    • Multimedia authoring
    • Voice and video annotation of files
    • Simulation modeling
  • Businesses depend on increasingly powerful servers for transaction and database processing.
  • Cloud service providers use banks of servers to run high-volume applications.

Microprocessor Speed

  • Contemporary processors use pipelining, which moves data or instructions through a conceptual pipe in which all stages process simultaneously.
  • Branch prediction looks ahead in the instructions fetched from memory and predicts which branches or instruction groups are likely to be processed next.
  • Superscalar execution issues more than one instruction per processor clock cycle, using parallel pipelines.
  • Data flow analysis optimizes instruction scheduling based on dependencies between instructions' results or data.
  • Speculative execution uses branch prediction and data flow analysis to execute instructions ahead of their actual appearance in the program.
  • Results from speculative execution are held in temporary locations, which keeps the execution engines as busy as possible.
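
The payoff from pipelining can be sketched with a small timing model. This is an idealized illustration and not part of the lesson: it assumes one clock cycle per stage and no stalls or mispredicted branches, and the function names are hypothetical.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles when each instruction finishes every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles for an ideal pipeline: fill it once, then complete one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def pipeline_speedup(n_instructions: int, n_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts (needs at least one instruction)."""
    if n_instructions < 1:
        raise ValueError("n_instructions must be at least 1")
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(n_instructions, n_stages)


# Illustrative numbers only: 100 instructions through a 5-stage pipeline.
print(unpipelined_cycles(100, 5), pipelined_cycles(100, 5))
print(round(pipeline_speedup(100, 5), 2))
```

Under these assumptions the speedup approaches the number of stages as the instruction count grows; real pipelines fall short of that because of hazards and branches.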

Performance Balance

  • Performance balance means adjusting the organization and architecture to compensate for the mismatch in capabilities among the various components.
  • Architectural examples:
    • Increase the number of bits retrieved at one time by making DRAMs "wider" and using wide bus data paths.
    • Reduce the frequency of memory access by incorporating cache structures between the processor and main memory.
    • Make the DRAM interface more efficient by including a cache or other buffering scheme on the DRAM chip.
    • Increase the interconnect bandwidth between processors and memory by using higher-speed buses and a bus hierarchy to structure data flow.
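
One way to see why a cache helps balance a fast processor against slower main memory is the common average-access-time model. The formula, the function name, and every number below are illustrative assumptions; none of them come from the lesson.

```python
def average_access_time(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average time per memory access: every access pays the hit time,
    and the fraction that misses also pays the penalty of going to main memory."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical timings: 1 ns cache hit, 60 ns main-memory penalty.
for miss_rate in (1.0, 0.20, 0.05):
    print(miss_rate, average_access_time(1.0, miss_rate, 60.0))
```

In this sketch, lowering the miss rate is what reduces how often the processor has to wait on main memory.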

Chip Organization and Architecture Improvements

  • Increase the hardware speed of the processor.
    • This comes fundamentally from shrinking logic gate size.
    • Packing more gates more tightly together increases the clock rate.
    • Propagation time for signals is reduced.
  • Increase the size and speed of caches.
    • Part of the processor chip is dedicated to cache.
    • Cache access times drop significantly.
  • Change the processor organization and architecture.
    • Increase the effective speed of instruction execution.
    • Increase parallelism.

Problems with Clock Speed and Logic Density

  • Power: power density increases with the density of logic and the clock speed, so dissipating heat becomes an issue.
  • RC delay: the speed at which electrons flow is limited by the resistance and capacitance of the metal wires connecting the components.
    • Delay increases as the RC product (resistance times capacitance) increases.
    • As components decrease in size, wire interconnects become thinner, which increases resistance.
    • Wires that are closer together have higher capacitance.
  • Memory latency and throughput: memory access speed (latency) and transfer speed (throughput) lag processor speeds.
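
How shrinking interconnects push up the RC product can be sketched with two textbook approximations that the lesson does not give: wire resistance R = ρL/(w·h), and a parallel-plate estimate of the capacitance between neighbouring wires, C = ε(L·h)/d. The function names and the unit-less numbers are hypothetical.

```python
def wire_resistance(resistivity: float, length: float, width: float, height: float) -> float:
    """R = rho * L / (w * h): a thinner cross-section gives higher resistance."""
    return resistivity * length / (width * height)


def coupling_capacitance(permittivity: float, length: float, height: float, spacing: float) -> float:
    """Parallel-plate estimate C = eps * (L * h) / d: closer wires give higher capacitance."""
    return permittivity * length * height / spacing


# Relative units only: halve the wire width and the spacing between wires.
base_rc = wire_resistance(1.0, 1.0, 1.0, 1.0) * coupling_capacitance(1.0, 1.0, 1.0, 1.0)
shrunk_rc = wire_resistance(1.0, 1.0, 0.5, 1.0) * coupling_capacitance(1.0, 1.0, 1.0, 0.5)
print(shrunk_rc / base_rc)  # how much the RC product grows in this simplified model
```

Real interconnect models are considerably more involved; the sketch only shows the direction of the effect described in the notes.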

Multicore Processing

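  • Using multiple processors on the same chip offers the potential to increase performance without increasing the clock rate.
  • The strategy is to use two simpler processors on the chip rather than one more complex processor.
  • With two processors, larger caches are justified.
  • As caches became larger, it made performance sense to organize them into multiple levels on the chip.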

Many Integrated Core (MIC) and Graphics Processing Unit (GPU)

  • MIC: A large number of cores brings a leap in performance.
  • MIC: It also brings the challenge of developing software that can exploit so many cores.
  • MIC: Both the multicore and MIC strategies involve a collection of general-purpose processors on a single chip.
  • GPU: A core designed to perform parallel operations on graphics data.
  • GPU: Traditionally found on a plug-in graphics card, where it encodes and renders 2D and 3D graphics and processes video.
  • GPU: It can also be used as a vector processor for applications that need repetitive computations.

Amdahl’s Law

  • The law is named after Gene Amdahl.
  • It deals with the potential speedup of a program using multiple processors compared to a single processor.
  • It illustrates the problems facing the development of multicore machines.
  • Software must be adapted to a parallel execution environment to exploit the power of parallel processing.
  • The law can be generalized to evaluate any design or technical improvement in a computer system.
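
The usual statement of Amdahl's Law is speedup = 1 / ((1 - f) + f / N), where f is the fraction of execution time that can be parallelized and N is the number of processors. The lesson does not write the formula out, so the sketch below is an illustration with a hypothetical function name and made-up values of f.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Speedup over a single processor when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# Illustrative case: 90% of the program parallelizes.
for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With f = 0.9 the speedup can never exceed 1 / (1 - f) = 10 however many cores are added, which is why software has to be adapted for parallel execution before extra cores pay off.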

Little’s Law

  • Little's Law is a fundamental and simple relation with broad applications.
  • It applies to almost any system that is statistically in steady state and has no leakage.
  • In a queuing system, an item is served immediately if a server is idle.
  • Otherwise, the arriving item joins a queue.
  • There can be a single queue for a single server or for multiple servers.

Average Number of Items in a Queuing System

  • The average number of items in the system equals the average rate at which items arrive multiplied by the average time an item spends in the system.
  • This relationship requires very few assumptions.
  • Its simplicity and generality make it extremely useful.
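
Written as a formula, Little's Law is L = λW, with L the average number of items in the system, λ the average arrival rate, and W the average time in the system. The sketch below checks the relation on a small hand-made trace; the symbols, function names, and trace are assumptions for illustration, and the trace starts and ends with an empty system so nothing leaks.

```python
def littles_law_items(arrival_rate: float, time_in_system: float) -> float:
    """L = lambda * W."""
    return arrival_rate * time_in_system


def time_average_items(intervals, horizon: float) -> float:
    """Time-averaged number of items present, given (arrival, departure) pairs inside [0, horizon]."""
    events = []
    for arrival, departure in intervals:
        events.append((arrival, +1))
        events.append((departure, -1))
    events.sort()
    area = 0.0       # integral of "items in system" over time
    in_system = 0
    last_time = 0.0
    for time, change in events:
        area += in_system * (time - last_time)
        in_system += change
        last_time = time
    return area / horizon


# Hypothetical trace: four items observed over 10 time units.
trace = [(0.0, 3.0), (1.0, 2.0), (4.0, 8.0), (5.0, 6.0)]
horizon = 10.0
arrival_rate = len(trace) / horizon
mean_time = sum(d - a for a, d in trace) / len(trace)
print(time_average_items(trace, horizon), littles_law_items(arrival_rate, mean_time))
```

Both printed values agree for this trace, because the total time items spend in the system equals the area under the items-in-system curve.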

Performance Factors and System Attributes

  • Instruction set architecture affects Ic (instruction count) and p (processor cycles needed to decode and execute an instruction).
  • Compiler technology affects Ic, p, and m (memory references per instruction).
  • Processor implementation affects p and τ (processor cycle time).
  • Cache and memory hierarchy affect m and k (the ratio of memory cycle time to processor cycle time).
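
These symbols are commonly combined into a single execution-time expression, T = Ic × (p + m × k) × τ. The notes do not state the formula, so the sketch below is an assumption: it reads p as processor cycles per instruction excluding memory references, m as memory references per instruction, and k as the ratio of memory cycle time to processor cycle time. The function name and the numbers are made up.

```python
def execution_time(instruction_count: int, p: float, m: float, k: float, tau: float) -> float:
    """T = Ic * (p + m * k) * tau, in the same time unit as tau."""
    cycles_per_instruction = p + m * k
    return instruction_count * cycles_per_instruction * tau


# Hypothetical program: 2 million instructions on a 1 GHz clock (tau = 1 ns).
baseline = execution_time(2_000_000, p=2.0, m=0.5, k=4.0, tau=1e-9)
faster_memory = execution_time(2_000_000, p=2.0, m=0.5, k=2.0, tau=1e-9)
print(baseline, faster_memory)
```

Changing one argument at a time shows which system attribute a given design decision acts on, for example a better memory hierarchy lowering k.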

Calculating the Mean

  • Using benchmarks to compare systems involves calculating the mean value of a set of data points.
    • The data points are related to execution time.
  • Three common formulas are used to calculate a mean:
    • Arithmetic
    • Geometric
    • Harmonic
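
The three means can be written out directly from their standard definitions. The sketch below is illustrative: the function names are arbitrary, the sample times are invented, and all values are assumed to be positive.

```python
import math


def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed through logarithms (values must be positive)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def harmonic_mean(values):
    """Count divided by the sum of reciprocals (values must be positive)."""
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical execution times in seconds.
times = [2.0, 4.0, 8.0]
print(arithmetic_mean(times), geometric_mean(times), harmonic_mean(times))

# Doubling every time doubles the arithmetic mean, so it stays proportional to total time.
print(arithmetic_mean([2 * t for t in times]))
```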

Arithmetic Mean (AM)

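  • The AM is appropriate if the sum of all the measurements is a meaningful value.
  • The AM suits comparing the execution time of several systems.
  • For example, a simulation can be run several times with different inputs and the execution times averaged; using different inputs keeps the result from being biased by unusual features of one data set.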

Arithmetic Mean property

  • The AM is used for time-based variables such as program execution time.
  • It is directly proportional to the total time.
  • If the total time doubles, the mean value doubles.

Using geometric normalized Tables

  • Arithmetic mean: add all results and divide by the number of tests.
  • It is a good measure of execution time.

Benchmark Principles

  • Desirable benchmark program characteristics:
    • Written in a high-level language for portability.
    • Representative of a programming domain or paradigm.
    • Easy to measure.
    • Widely distributed.

System Performance Evaluation Corporation (SPEC)

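  • Benchmark suite:
    • A collection of programs defined in a high-level language.
    • Together they provide a representative test of a computer in a particular application or system programming area.
  • SPEC is an industry consortium.
    • It defines and maintains benchmark suites for evaluating computer systems.
    • Its performance measurements are widely used for comparison.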

SPEC CPU2017

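  • SPEC CPU2017 is the best-known SPEC benchmark suite.
  • It is the industry-standard suite for processor-intensive applications.
  • It is appropriate for measuring the performance of applications that spend most of their time on computation.
  • The benchmarks are written in C, C++, and Fortran.
  • Most benchmarks have a rate version as well as a speed version.
  • The suite contains over 11 million lines of code.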

Terminology

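  • Benchmark: a program written in a high-level language that can be compiled and run on any computer that implements the compiler.
  • System under test: the system to be evaluated.
  • Reference machine: a system used by SPEC to establish a baseline performance for all benchmarks.
  • Base metric: a reference point on the system under test, produced under strict compilation guidelines.
  • Peak metric: lets users try to optimize system performance by optimizing the compiler output.
  • Speed metric: a measurement of the time a computer takes to execute a compiled benchmark.
  • Rate metric: a measurement of how many tasks a computer can complete in a certain amount of time.
    • This is a throughput or capacity measure: the system under test runs several copies of the same task at once to take advantage of multiple processors.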
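
How these terms fit together can be sketched in a few lines. The ratio definitions below follow the usual description of SPEC CPU results (reference time divided by measured time, scaled by the number of copies for rate runs, then summarized with a geometric mean); the function names and all timings are invented for illustration and are not SPEC data or SPEC tooling.

```python
import math


def speed_ratio(t_ref: float, t_sut: float) -> float:
    """Reference-machine time divided by system-under-test time; larger means faster."""
    return t_ref / t_sut


def rate_ratio(copies: int, t_ref: float, t_sut: float) -> float:
    """Throughput ratio when `copies` instances of the benchmark run simultaneously."""
    return copies * t_ref / t_sut


def overall_metric(ratios) -> float:
    """Geometric mean of the per-benchmark ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical (reference, system under test) times in seconds for three benchmarks.
timings = [(1000.0, 500.0), (2000.0, 500.0), (4000.0, 500.0)]
ratios = [speed_ratio(t_ref, t_sut) for t_ref, t_sut in timings]
print(ratios, overall_metric(ratios))
```

The geometric mean is used for the summary because the inputs are ratios normalized to the reference machine, not raw times.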
