Questions and Answers
Which technique involves moving data or instructions into a conceptual pipe where all stages process simultaneously?
- Superscalar execution
- Speculative execution
- Data flow analysis
- Pipelining (correct)
Which processor technique predicts which branches or instruction groups are likely to be processed next?
- Superscalar Execution
- Data Flow Analysis
- Branch Prediction (correct)
- Speculative Execution
What is the primary characteristic of superscalar execution in processor design?
- Analyzing data dependencies to optimize instruction scheduling.
- Executing instructions ahead of their appearance in the program.
- Issuing more than one instruction per clock cycle. (correct)
- Using branch prediction to optimize instruction flow.
What does data flow analysis achieve in contemporary processors?
Which execution method involves processors executing instructions ahead of their actual appearance in the program?
What strategy is used to compensate for the mismatch in capabilities among various computer components to improve performance?
How do wider DRAMs improve performance balance in computer architecture?
What role do cache structures play in achieving performance balance?
How does increasing the number of gates on a processor chip affect its performance?
What is the primary effect of reduced propagation time for signals on a processor?
What is a significant problem that arises from increasing clock speed and logic density in processors?
How does the reduction in component size on a chip affect resistance in wire interconnects?
How does increased proximity of wires affect capacitance on a chip?
What is a primary advantage of using multicore processors?
According to the presentation, what is the main idea behind Amdahl's Law?
Which of the following is a key consideration highlighted by Amdahl's Law regarding the transition to multi-core architectures?
What does Little's Law fundamentally describe?
In the context of Little's Law, what happens to an item when a server in a queuing system is idle?
According to Little's Law, the average number of items in a queuing system is equal to what?
What characteristic makes the Arithmetic Mean (AM) an appropriate measure for comparing system performance?
Why is the Arithmetic Mean (AM) considered a good candidate for evaluating computer systems?
What does the use of multiple runs with different inputs in simulation studies ensure when evaluating alternative products?
Which of the following best describes a desirable characteristic of a benchmark program?
What makes a benchmark program 'representative'?
What is a key aim of the Standard Performance Evaluation Corporation (SPEC)?
What type of applications is the SPEC CPU2017 benchmark suite most appropriate for measuring?
What programming languages are benchmarks in the SPEC CPU2017 suite written in?
What is the purpose of the 'base metric' in SPEC benchmarks?
How does the 'peak metric' in SPEC testing differ from the 'base metric'?
In SPEC documentation, what does the 'speed metric' measure?
What does the 'rate metric' measure?
What is the primary purpose of calculating the geometric mean of all ratios in SPEC evaluation?
What is the role of the 'reference machine' in the SPEC benchmarking process?
According to the presentation, which application would benefit MOST from the use of today's microprocessor-based systems?
According to the presentation, which is NOT one of the techniques built into contemporary processors?
According to the presentation, which of the following is NOT an improvement in Chip Organization and Architecture?
Flashcards
Pipelining
Moving data/instructions into a conceptual pipe where stages process simultaneously.
Branch Prediction
Predicting which branches of code are likely to be processed next.
Superscalar Execution
Issuing more than one instruction in a processor clock cycle using parallel pipelines.
Data Flow Analysis
Speculative Execution
Wider DRAM
Efficient Cache Structures
Efficient DRAM interface
Higher Speed Buses
Hardware Speed Increase
Size/Speed of Caches
Processor Reorganization
Power Density
RC Delay
Multicore
Many Integrated Core
Graphics Processing Unit
Amdahl's Law
Little's Law
Arithmetic Mean
Benchmark Program
Benchmark Suite
SPEC CPU2017
Peak Metric
Speed Metric
Rate Metric
Study Notes
Designing for Performance
- Computer system costs continue to decrease, while performance and capacity dramatically increase.
- Today's laptops possess the computing power of an IBM mainframe from 10-15 years prior.
- Microprocessors are now inexpensive enough to be disposable.
- Desktop applications that require the power of today's microprocessor-based systems include image processing, three-dimensional rendering, speech recognition, videoconferencing, multimedia authoring, voice and video annotation of files, and simulation modeling.
- Businesses rely on increasingly powerful servers to handle transaction processing.
- Database processing also relies on powerful servers.
- Cloud service providers use large banks of high-performance servers to support high-volume, high-transaction-rate applications.
Microprocessor Speed
- Pipelining: the processor moves data or instructions into a conceptual pipe in which all stages process simultaneously.
- Branch prediction: the processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next.
- Superscalar execution: the processor issues more than one instruction in each clock cycle, in effect using multiple parallel pipelines.
- Data flow analysis: the processor analyzes which instructions depend on each other's results, or data, to create an optimized schedule of instructions.
- Speculative execution: using branch prediction and data flow analysis, the processor executes instructions ahead of their actual appearance in the program and holds the results in temporary locations, keeping the execution engines as busy as possible.
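The pipelining idea can be made concrete by counting clock cycles. The sketch below is a minimal, idealized model, assuming a k-stage pipeline in which every stage takes exactly one cycle and there are no stalls, hazards, or branch mispredictions; the function names and numbers are hypothetical.

```python
# Idealized k-stage pipeline timing: one cycle per stage, no stalls or hazards.

def cycles_unpipelined(n_instructions: int, k_stages: int) -> int:
    """Each instruction passes through all k stages before the next one starts."""
    return n_instructions * k_stages


def cycles_pipelined(n_instructions: int, k_stages: int) -> int:
    """The first instruction needs k cycles to fill the pipe; after that,
    one instruction completes every cycle."""
    if n_instructions == 0:
        return 0
    return k_stages + (n_instructions - 1)


if __name__ == "__main__":
    n, k = 100, 5  # hypothetical instruction count and number of stages
    seq = cycles_unpipelined(n, k)
    pipe = cycles_pipelined(n, k)
    print(f"unpipelined: {seq} cycles")      # 500
    print(f"pipelined:   {pipe} cycles")     # 104
    print(f"speedup:     {seq / pipe:.2f}")  # 4.81
```

As n grows, the speedup approaches k, which is why real processors add stages; stalls and mispredicted branches are what keep them from reaching that bound.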
Performance Balance
- Performance balance is the adjustment of the organization and architecture to compensate for the mismatch among the capabilities of the various components.
- Wider DRAM: increase the number of bits retrieved at one time by making DRAMs "wider" rather than "deeper" and by using wide bus data paths.
- Efficient cache structures: reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory.
- Efficient DRAM interface: change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip.
- Higher speed buses: increase the interconnect bandwidth between processors and memory by using higher-speed buses and a hierarchy of buses to buffer and structure data flow.
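Widening the data path helps because, for a fixed transfer rate, peak bandwidth scales with the number of bits moved per transfer. A minimal sketch, assuming peak bandwidth = (bus width in bytes) x (transfers per second); the transfer rate and widths below are hypothetical, and real systems deliver less than the peak because of protocol overhead and latency.

```python
# Peak bandwidth of a memory data path: bytes per transfer x transfers per second.

def peak_bandwidth_bytes_per_s(bus_width_bits: int, transfers_per_s: float) -> float:
    return (bus_width_bits / 8) * transfers_per_s


if __name__ == "__main__":
    rate = 1.6e9  # hypothetical: 1.6 billion transfers per second
    for width in (32, 64, 128):  # hypothetical path widths in bits
        gb_s = peak_bandwidth_bytes_per_s(width, rate) / 1e9
        print(f"{width:3d}-bit path: {gb_s:.1f} GB/s")
```

Doubling the width doubles the peak rate without requiring faster DRAM cells, which is the point of making DRAMs "wider".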
Chip Organization and Architecture Improvements
- Hardware speed increase: shrinking the logic gates on the processor chip lets more gates be packed more tightly together and allows a higher clock rate.
- With gates closer together, the propagation time for signals is significantly reduced, speeding up the processor; a higher clock rate means individual operations execute more rapidly.
- Size/speed of caches: increase the size and speed of the caches interposed between the processor and main memory. Dedicating part of the processor chip itself to cache reduces cache access times significantly.
- Processor reorganization: change the processor organization and architecture to increase the effective speed of instruction execution, typically by exploiting parallelism in one form or another.
Problems with Clock Speed and Logic Density
- Power density: as the density of logic and the clock speed on a chip increase, so does the power density (watts/cm²).
- Dissipating the heat generated by dense, high-speed chips becomes a serious design issue.
- RC delay: the speed at which electrons can flow on a chip between transistors is limited by the resistance and capacitance of the metal wires connecting them.
- Delay increases as the RC product (resistance times capacitance) increases.
- As components on the chip decrease in size, the wire interconnects become thinner, which increases resistance.
- The wires are also closer together, which increases capacitance.
- Memory latency and throughput: memory access speed (latency) and transfer speed (throughput) lag processor speeds.
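The RC-delay argument can be checked with first-order wire formulas. This is a minimal sketch using textbook approximations that are assumptions, not a circuit model: resistance R = rho * L / (w * h) for a wire of width w and height h, and coupling capacitance C roughly eps * (L * h) / s to a neighbouring wire at spacing s. The constants and geometry are hypothetical illustrative values.

```python
# First-order RC estimate for an on-chip wire (simplified assumptions):
#   R = rho * L / (w * h)   thinner wire -> higher resistance
#   C ~ eps * (L * h) / s   closer neighbour -> higher capacitance
# Delay is taken to be proportional to R * C.

RHO = 1.7e-8   # assumed resistivity of copper, ohm*m
EPS = 3.5e-11  # assumed permittivity of the insulating dielectric, F/m


def wire_rc(length: float, width: float, height: float, spacing: float) -> float:
    resistance = RHO * length / (width * height)
    capacitance = EPS * (length * height) / spacing
    return resistance * capacitance


if __name__ == "__main__":
    L, w, h, s = 1e-3, 100e-9, 200e-9, 100e-9  # hypothetical geometry, metres
    base = wire_rc(L, w, h, s)
    shrunk = wire_rc(L, w / 2, h, s / 2)  # half the width, half the spacing
    print(f"RC grows by a factor of {shrunk / base:.1f}")
```

Under these assumptions, halving both width and spacing doubles R and doubles C, so the RC product, and with it the delay, grows about fourfold.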
Multicore Processing
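- Placing multiple processors (cores) on the same chip provides the potential to increase performance without increasing the clock rate.
- The strategy is to use two simpler processors on the chip rather than one more complex processor.
- With two processors, larger caches are justified.
- As caches became larger, it made performance sense to add further levels of cache on the chip.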
Many Integrated Core (MIC) and Graphics Processing Unit (GPU)
- MIC: Placing a large number of cores on one chip offers a leap in performance.
- MIC: Developing software that can exploit such a large number of cores is a challenge.
- MIC: The multicore and MIC strategy involves a collection of general-purpose processors on a single chip.
- GPU: A core designed to perform parallel operations on graphics data is called a graphics processing unit (GPU).
- GPU: Traditionally found on a plug-in graphics card, a GPU is used to encode and render 2D and 3D graphics.
- GPU: GPU chips also process video.
- GPU: GPUs can be used as vector processors for a variety of applications that require repetitive computations.
Amdahl’s Law
- Named after Gene Amdahl, who proposed it.
- Amdahl's Law deals with the potential speedup of a program using multiple processors compared to a single processor.
- It illustrates the problems industry faces in developing multicore machines.
- Software must be adapted to a highly parallel execution environment to take advantage of the power of parallel processing.
- Amdahl's Law can be generalized to evaluate and design technical improvements in a computer system.
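The usual statement of Amdahl's Law gives the speedup as 1 / ((1 - f) + f / N), where f is the fraction of single-processor execution time that can be parallelized and N is the number of processors. The sketch below assumes the parallel part divides perfectly across processors with no scheduling or communication overhead; the function name is hypothetical.

```python
# Amdahl's Law: speedup = 1 / ((1 - f) + f / n)
# f = fraction of execution time that can run in parallel (0..1)
# n = number of processors; the serial part (1 - f) is not sped up at all.

def amdahl_speedup(f: float, n: int) -> float:
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


if __name__ == "__main__":
    for f in (0.5, 0.9, 0.99):
        row = ", ".join(f"n={n}: {amdahl_speedup(f, n):.2f}" for n in (2, 8, 64))
        print(f"f={f}: {row}")
```

Even with unlimited processors the speedup can never exceed 1 / (1 - f), so a program that is 90% parallel tops out at 10x. This is the software problem the notes describe for multicore machines.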
Little’s Law
- A fundamental and simple relation with broad applications.
- Applies to queuing systems that are, statistically, in steady state and in which there is no leakage.
- In a queuing system, if a server is idle, an arriving item is served immediately; otherwise, the arriving item joins a queue.
- There can be a single queue for a single server or for multiple servers.
Average Number of Items in Queuing Terms
- The average number of items in a queuing system equals the average rate at which items arrive multiplied by the average time an item spends in the system.
- This relationship requires very few assumptions.
- Its simplicity and generality make it extremely useful.
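Written as a formula, Little's Law is L = λW: the average number of items in the system (L) equals the average arrival rate (λ) times the average time an item spends in the system (W). A minimal sketch, assuming a system in steady state with no leakage; the disk-request numbers are hypothetical.

```python
# Little's Law: L = lam * W  (steady state, no leakage)

def avg_items_in_system(arrival_rate: float, avg_time_in_system: float) -> float:
    """Mean number of items present in the system."""
    return arrival_rate * avg_time_in_system


def avg_time_in_system(avg_items: float, arrival_rate: float) -> float:
    """The same law solved for time: W = L / lam."""
    return avg_items / arrival_rate


if __name__ == "__main__":
    lam, w = 200.0, 0.015  # hypothetical: 200 requests/s, 15 ms in the system each
    print(f"average requests in system: {avg_items_in_system(lam, w):.1f}")
```

Because the relation needs so few assumptions, any two of the three quantities determine the third. For example, measuring queue length and throughput is enough to infer response time.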
Performance Factors and System Attributes
- Instruction set architecture affects Ic (instruction count) and p (processor cycles needed to decode and execute an instruction).
- Compiler technology affects Ic, p, and m (memory references needed per instruction).
- Processor implementation affects p and τ (processor cycle time).
- Cache and memory hierarchy affect m and k (ratio of memory cycle time to processor cycle time).
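These factors are commonly combined into one processor-time equation, T = Ic × [p + (m × k)] × τ. Here Ic is instruction count, p is processor cycles per instruction for decode and execute, m is memory references per instruction, k is the ratio of memory cycle time to processor cycle time, and τ is the processor cycle time. The sketch below assumes that form; all input values are hypothetical.

```python
# Processor time: T = Ic * (p + m * k) * tau

def processor_time(ic: float, p: float, m: float, k: float, tau: float) -> float:
    """Execution time in seconds for a program of ic instructions."""
    cycles_per_instruction = p + m * k  # decode/execute cycles plus memory cycles
    return ic * cycles_per_instruction * tau


if __name__ == "__main__":
    ic = 2_000_000   # hypothetical instructions executed
    p = 1.5          # hypothetical decode/execute cycles per instruction
    m = 0.4          # hypothetical memory references per instruction
    k = 4            # hypothetical memory-to-processor cycle time ratio
    tau = 1 / 2e9    # cycle time of a 2 GHz clock, in seconds
    print(f"T = {processor_time(ic, p, m, k, tau) * 1e3:.2f} ms")
```

Each system attribute in the list moves one or more of these terms. For example, a better cache hierarchy lowers the effective k, and a faster implementation lowers τ.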
Calculating the Mean
- Benchmarks used to compare systems involve calculating the mean value of a set of data points.
- The data points are typically related to execution time.
- The common formulas for the mean are the arithmetic, geometric, and harmonic means.
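A minimal sketch of the three means applied to one set of data points, such as execution times in seconds. The data values are hypothetical, and the geometric and harmonic means assume all values are positive.

```python
import math

# Three common means for a list of positive data points (e.g. execution times).

def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # n-th root of the product, computed through logarithms to avoid overflow
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)


if __name__ == "__main__":
    times = [2.0, 4.0, 8.0]  # hypothetical execution times, seconds
    print(f"AM = {arithmetic_mean(times):.3f}")
    print(f"GM = {geometric_mean(times):.3f}")
    print(f"HM = {harmonic_mean(times):.3f}")
```

For any set of positive values the results satisfy AM ≥ GM ≥ HM. Which mean is appropriate depends on what the data points represent, as the following notes on AM discuss. Python's standard `statistics` module also provides `fmean`, `geometric_mean`, and `harmonic_mean`.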
Arithmetic Mean (AM)
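- AM is appropriate if the sum of all the measurements is a meaningful and interesting value.
- AM is a good candidate for comparing the execution time performance of several systems.
- In simulation studies, each alternative system can be run multiple times with different inputs and the AM of the execution times compared; using multiple inputs helps ensure the results are not heavily biased by an unusual input set.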
Arithmetic Mean property
- AM applied to a time-based variable, such as program execution time, is directly proportional to the total time.
- If all values, and therefore the total time, double, the mean value doubles.
Using geometric normalized Tables
- Arithmetic mean: add all results and divide by the number of tests.
- It is a good measure for comparing execution times.
Benchmark Principles
- Desirable benchmark program characteristics:
- Written in a high-level language for portability.
- Representative of a programming domain or paradigm.
- Easy to measure.
- Widely distributed.
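Standard Performance Evaluation Corporation (SPEC)
- Benchmark suite: a collection of programs, defined in a high-level language, that together attempt to provide a representative test of a computer in a particular application or system programming area.
- SPEC is an industry consortium that defines and maintains benchmark suites aimed at evaluating computer systems.
- SPEC performance measurements are widely used for comparison and research purposes.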
SPEC CPU2017
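- SPEC CPU2017 is the best-known SPEC benchmark suite.
- It is the industry-standard suite for processor-intensive applications.
- It is appropriate for measuring the performance of applications that spend most of their time doing computation rather than I/O.
- The benchmarks are written in C, C++, and Fortran.
- The benchmarks are grouped into speed suites and rate suites.
- The suite contains over 11 million lines of code.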
Terminology
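- Benchmark: a program written in a high-level language that can be compiled and executed on any computer that implements the compiler.
- System under test: the system being evaluated.
- Reference machine: a system used by SPEC to establish a baseline performance for all benchmarks.
- Base metric: required for all reported results; compilation must follow strict guidelines.
- Peak metric: lets users attempt to optimize system performance by optimizing the compiler output.
- Speed metric: a measurement of the time a computer takes to execute a compiled benchmark.
- Rate metric: a measurement of how many tasks a computer can accomplish in a given amount of time; it is a throughput, capacity, or rate measure. The system under test may run the same task simultaneously to take advantage of multiple processors.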
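A SPEC-style speed score can be sketched from the terms above. Each benchmark gets a ratio of reference-machine time to system-under-test time, and the overall result is the geometric mean of those ratios. This is a minimal illustration, assuming that scheme; the benchmark names and times are hypothetical, not SPEC data.

```python
import math

# SPEC-style scoring sketch:
#   ratio_i = reference-machine time / system-under-test time
#   overall = geometric mean of all ratios

def spec_ratios(ref_times: dict, sut_times: dict) -> dict:
    return {name: ref_times[name] / sut_times[name] for name in ref_times}


def overall_score(ratios: dict) -> float:
    values = list(ratios.values())
    return math.exp(sum(math.log(r) for r in values) / len(values))


if __name__ == "__main__":
    ref = {"bench_a": 1000.0, "bench_b": 2000.0, "bench_c": 1500.0}  # hypothetical
    sut = {"bench_a": 250.0, "bench_b": 1000.0, "bench_c": 375.0}    # hypothetical
    ratios = spec_ratios(ref, sut)
    print(ratios)  # {'bench_a': 4.0, 'bench_b': 2.0, 'bench_c': 4.0}
    print(f"overall: {overall_score(ratios):.3f}")
```

The geometric mean suits normalized ratios because the relative ranking of two systems comes out the same whichever reference machine's times are used for normalization.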