Chapter 2 - Performance Issues (2024-2025)
Manal Abdulelah Areqi
Summary
This document discusses issues related to performance in computer systems. It covers microprocessor speed and the different approaches to improving performance, and goes into detail about techniques such as pipelining, superscalar execution, branch prediction, speculative execution, and data flow analysis.
Full Transcript
Designing for Performance

The cost of computer systems continues to drop dramatically, while the performance and capacity of those systems continue to rise equally dramatically. Desktop applications that require the great power of today's microprocessor-based systems include:
- Image processing
- Three-dimensional rendering
- Speech recognition
- Videoconferencing
- Multimedia authoring
- Voice and video annotation of files
- Simulation modeling

Businesses are relying on increasingly powerful servers to handle transaction and database processing and to support massive client/server networks that have replaced the huge mainframe computer centers of yesteryear. Cloud service providers use massive high-performance banks of servers to satisfy high-volume, high-transaction-rate applications for a broad spectrum of clients.

Microprocessor Speed

The development of computers continues. Following Moore's Law, chip makers can release a new generation of chips every three years, with four times the number of transistors, and this leads to an increase in speed. Techniques built into contemporary processors to increase performance include:
- Pipelining
- Superscalar execution
- Branch prediction
- Speculative execution
- Data flow analysis

Pipelining

Pipelining overlaps the execution of instructions: while one instruction is executing, the processor can already be fetching and decoding the instructions that follow, much like an assembly line. This technique facilitates parallelism in execution at the hardware level. "Common" instructions (arithmetic, load/store, conditional branch) can be executed independently. Pipelining does not reduce the execution time of individual instructions, but it reduces the overall execution time required for a program.
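The cycle savings from overlapping instruction stages can be illustrated with a small counting model (a sketch; the 5-stage pipeline and instruction count are illustrative assumptions, not figures from the source):

```python
def cycles_nonpipelined(n_instructions, n_stages):
    # Without pipelining, each instruction passes through every stage
    # before the next one starts.
    return n_instructions * n_stages

def cycles_pipelined(n_instructions, n_stages):
    # With an ideal pipeline, a new instruction enters every cycle once
    # the pipeline is full: fill time plus one cycle per extra instruction.
    return n_stages + (n_instructions - 1)

# 100 instructions on an assumed 5-stage pipeline
print(cycles_nonpipelined(100, 5))  # 500 cycles
print(cycles_pipelined(100, 5))     # 104 cycles
```

Note that each instruction still takes 5 cycles to complete; only the program's total time shrinks, which matches the statement above.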
[Figure: Pipelined vs Non-Pipelined Instruction Execution]

The benefits of pipelining:
- High performance
- Efficient use of resources
- Time efficiency
- Fast data delivery
- Reduced process waiting time

Superscalar Execution

The ability to issue multiple independent instructions in parallel in every processor clock cycle. Multiple parallel pipelines are used.

[Figure: Superscalar vs Superpipelined]

Branch Prediction

CPUs initially executed instructions one by one as they came in, but the introduction of pipelining meant that branch instructions could slow the processor down significantly, because the processor has to wait for the conditional jump to be resolved. The processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next. The purpose of the branch predictor is to improve the flow in the instruction pipeline. The predicted path is executed and the results are kept temporarily; if it is later detected that the guess was wrong, the speculatively executed or partially executed instructions are discarded and the pipeline starts over with the correct branch, causing a delay.

Speculative Execution

Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations and keeping the execution engines as busy as possible.

Data Flow Analysis

The processor analyzes which instructions are dependent on each other's results, or data, to create an optimized schedule of instructions.
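As one concrete illustration of branch prediction (the source does not name a specific scheme; this is the classic 2-bit saturating-counter predictor, sketched under that assumption):

```python
def predict_outcomes(outcomes):
    """Simulate a 2-bit saturating-counter predictor for one branch.

    States 0-1 predict 'not taken'; states 2-3 predict 'taken'.
    Returns how many predictions were correct."""
    state = 0  # start strongly not-taken
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct

# A loop branch: taken 9 times, then not taken on exit.
hits = predict_outcomes([True] * 9 + [False])
print(hits)  # 7 of 10 correct
```

The two-step hysteresis is why a single loop exit does not flip the prediction: the predictor stays in a "taken" state, so the next run of the loop is predicted correctly from its first iteration.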
Performance Balance

One difficulty in designing an efficient system is that different components operate at different speeds. For example, DRAM is generally much slower than the processor, and it is necessary to adjust the organization and architecture to compensate for this mismatch. This is why computer benchmarks are used to compare system performance: the overall balance in the system is more important than the raw performance of any one component.

To overcome the imbalance between memory and processor speeds, there are several approaches:
- Increase the number of bits that are retrieved at one time by making DRAMs "wider" rather than "deeper" and by using wide bus data paths (8, 16, 32, and 64-bit systems).
- Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip.
- Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory (memory hierarchy).
- Increase the interconnect bandwidth between processors and memory by using higher-speed buses and a hierarchy of buses to buffer and structure the data flow.

Improvements in Chip Organization and Architecture
- Increase the hardware speed of the processor. This is fundamentally due to shrinking logic gate size: more gates, packed more tightly, raise the clock rate, and the propagation time for signals is reduced.
- Increase the size and speed of caches by dedicating part of the processor chip to them; cache access times drop significantly.
- Change the processor organization and architecture to increase the effective speed of instruction execution, chiefly through parallelism.

Problems with Clock Speed and Logic Density

Traditionally, the dominant factor in performance gains has been increases in clock speed and logic density. However, as clock speed and logic density increase, a number of obstacles become more significant.

Power: As the density of logic and the clock speed on a chip increase, so does the power density. The difficulty of dissipating the heat generated on high-density, high-speed chips is becoming a serious design issue.
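The payoff of placing a cache between the processor and main memory can be sketched with the standard average-access-time calculation (the hit rate and latency figures below are assumptions for illustration, not measurements from the source):

```python
def avg_access_time(hit_rate, cache_ns, dram_ns):
    # Average memory access time: hits are served by the fast cache,
    # misses fall through to the slower DRAM.
    return hit_rate * cache_ns + (1 - hit_rate) * dram_ns

# Assumed latencies: 1 ns cache, 50 ns DRAM, 95% hit rate.
print(avg_access_time(0.95, 1.0, 50.0))  # 3.45 ns on average
```

Even with DRAM fifty times slower than the cache, a high hit rate keeps the average access time close to cache speed, which is why the memory hierarchy approach listed above is effective.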
RC Delay

R = resistance: the difficulty an electrical current has in passing through a conducting material. C = capacitance: the degree to which an insulating material holds a charge. RC delay is the delay in signal speed through the circuit wiring as a result of these two effects. As components on the chip decrease in size, the wire interconnects become thinner, increasing resistance. Also, the wires are closer together, increasing capacitance.

Memory Latency

Memory speeds lag processor speeds.

New Approaches to Improving Performance

With all of the difficulties cited, designers have turned to fundamentally new approaches to improving performance:
- Multicore: multiple processors on the same chip, with a large shared cache.
- Many Integrated Core (MIC)
- Graphics Processing Unit (GPU)

Multicore

The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate. The strategy is to use two simpler processors on the chip rather than one more complex processor. With more than one core on the chip, the number of caches increased, and as caches became larger it made performance sense to create two and then three levels of cache on a chip.

Many Integrated Core (MIC)
- A large number of cores per chip.
- A leap in performance, as well as challenges in developing software to exploit such a large number of cores.
- The multicore and MIC strategy involves a homogeneous collection of general-purpose processors on a single chip.

Graphics Processing Unit (GPU)
- A chip with multiple general-purpose processors plus graphics processing units (GPUs) and specialized cores for video processing and other tasks.
- Traditionally found on a plug-in graphics card, it is used to encode and render 2D and 3D graphics as well as process video.
- Used as a vector processor for a variety of applications that require repetitive computations.
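A common way to quantify how much performance multiple cores can add without raising the clock rate is Amdahl's law (not derived in the source; the parallel fraction below is an assumption for illustration):

```python
def speedup(parallel_fraction, cores):
    # Amdahl's law: the serial fraction of a program limits the gain
    # from adding cores, no matter how many are available.
    return 1.0 / ((1 - parallel_fraction) + parallel_fraction / cores)

# Assume 90% of the work can run in parallel.
print(round(speedup(0.9, 4), 2))     # about 3.08x on 4 cores
print(round(speedup(0.9, 1000), 2))  # approaches the 10x ceiling
```

This is one reason the software challenges mentioned for MIC matter: unless the parallel fraction is very high, adding cores yields diminishing returns.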
Basic Measures of Computer Performance

Performance is one of the key parameters to consider, along with cost, size, security, reliability, and, in some cases, power consumption. Raw speed is far less important than how a processor performs when executing a given application.

Traditional measures of processor speed:
- Clock speed: The speed of a processor is dictated by the pulse frequency produced by a system clock. Clock speed is measured in cycles per second (Hertz).
- Instruction execution rate: The processor will have many different instructions it can perform, and each will take a fixed number of cycles.
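The two measures combine in the standard execution-time relation: time = instruction count x cycles per instruction / clock rate. A minimal sketch (the instruction count, CPI, and clock rate are illustrative assumptions):

```python
def cpu_time_seconds(instruction_count, cpi, clock_hz):
    # Execution time = instructions * average cycles per instruction,
    # divided by the clock rate in cycles per second.
    return instruction_count * cpi / clock_hz

# Assumed workload: 1 billion instructions, average CPI of 2, 2 GHz clock.
print(cpu_time_seconds(1e9, 2.0, 2e9))  # 1.0 second
```

The formula makes the point in the text concrete: a higher clock rate alone does not guarantee better performance if the instruction count or the average cycles per instruction rise for the application at hand.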