Computer System Structure & Performance PDF
Document Details
Uploaded by AngelicTheme
Summary
This document presents a course on the structure and performance of computer systems. It discusses various computer types, architecture categories, and performance parameters. The goal is to provide an understanding of optimizing computer systems. Key topics include performance requirements, optimal computer architecture, and benchmarking.
Full Transcript
Structure of Computer Systems
Course 2: Computer performance and optimality

Performance requirements
- short execution time
- short reaction time to external events
- high memory capacity and speed
- many input/output facilities (interfaces)
- rich development facilities
- small dimensions and specific shapes
- predictability, safety and fault tolerance
- low cost, both absolute and relative

Optimal computer architecture
- a compromise between performance parameters
- depends on the purpose and type of the computer

Computer types (based on purpose):
- General-purpose computers
  - high performance computers (HPC)
  - personal computers
  - mobile computers
- Computers for dedicated purposes
  - scientific computing
  - military computers (safety critical and highly reliable)
  - industrial control and automation (embedded systems)
  - measurement and analysis (e.g. medical devices, intelligent sensors)

Old classification:
- mainframes, e.g. IBM 360/370, Felix 256
- minicomputers, e.g. PDP11, SUN station, Independent, Coral
- microcomputers: microprocessor-based computers (e.g. PC, home computers)

Classification based on architecture:
- single processor computers
- multiprocessor computers:
  - parallel systems
    - multi-core processors
    - symmetric and asymmetric parallel systems
  - distributed systems
    - personal computers and network communication for a specific (common) purpose
    - GRIDs
    - Clouds: computer as a service, storage as a service, platform as a service, software as a service

Optimal performance parameters for different types of computers

HPC – high performance computers:
- highly parallel computers: 1,024 to 10,000,000 cores or processors
- usage: scientific computing (physics, astronomy, bioinformatics, chemistry), simulation (fluid flow, weather), cryptography
- speed: 1 to 20,000 Tflops
- memory capacity: 1 to 700 TBytes
- communication: InfiniBand (2-300 Gb/s), Cray Gemini
- power consumption: 10 kW to 10 MW (Mariselu power station: ~200 MW)
- price: hard to tell
- see the top 500 supercomputers (http://www.top500.org/list/2012/06/100/):
  - no. 1: Fugaku, Japan, 7,630,848 cores, 30 MW
  - no. 2: Summit, IBM Power System, USA, 2,414,592 cores
  - no. 3: Sierra, IBM Power System, USA, 1,572,480 cores
  - no. 4: Sunway TaihuLight, China, 10,649,600 cores

HPC – high performance computers
1+1=3? Where is that bit?
HPC at CERN:
- architecture: GRID
- organization: 3 tiers
- at least 100,000 processors in 32 countries
- serves 5000 scientists
- in UTCN: 128 quad-core processors, 512 cores

Blue Gene (IBM):
- architecture: parallel
- 65,536 dual-core processors
- 360 teraflop peak speed

CG-UTCN – Centrul GRID al UTCN (the UTCN GRID Center):
- 64 processor boards
- 128 quad-core processors, 512 cores
- 1024 virtual processors (hyper-threading)
- storage: 12 TBytes
- price: 2,000,000 RON

PC – personal computers:
- single or multi-core systems: 1-8 cores (1-2 processors)
- usage: engineering, accounting, administration, entertainment, document processing, communication
- speed: 1-200 Gflops
- memory capacity: 1-16 GBytes (internal), 0.5-1 TBytes (external)
- communication: Ethernet (0.1-1 Gb/s)
- power consumption: 400-800 W
- price: 500-1000 USD
- dimensional types: desktop, laptop, tablet, hand-held

Mobile devices:
- single or multi-core systems: 1-4 cores (1 processor)
- usage: communication, entertainment, stand-in for a PC
- speed: 20-600 Mflops
- memory capacity: 0.5-2 GBytes (internal)
- communication: WiFi, Bluetooth (10-100 Mb/s)
- power consumption: limited by the battery's capacity
- price: 1-500 USD
- dimensional limitations

Dedicated and embedded systems:
- single processor systems: microcontroller, DSP (digital signal processor), MSP (mixed signal processor)
- usage: automation, measurement, sensors, medical devices
- speed: 1-20 MIPS
- memory capacity: 128-512 bytes (data), 0-32 Kbytes (program), 1-2 Kbytes EEPROM
- communication: serial RS232, CAN, I2C (300-9600 bits/s)
- power consumption: very low (battery powered), with low power modes (1 μA-10 mA)
- price: 1-20 USD
- dimension: very small packages (8, 16, 28, 40 pins)

Measuring the performance of a computer – benchmark programs
- Definition 1 (Wikipedia): a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it.
- Definition 2: a method of comparing the performance of various computer systems.
- Measuring and assessing the performance of a system is not a trivial task:
  - some computers/CPUs perform better on some tests and worse on others (e.g. good results for image processing but weaker results for database applications)
  - performance should be a weighted average of a number of specific tests

Benchmark programs
- Real programs
  - word processing software
  - user's application software
- Component benchmarks / micro-benchmarks
  - programs designed to measure the performance of a computer's basic components
  - automatic detection of the computer's hardware parameters, such as number of registers, cache size, memory latency
- Micro-benchmarks
  - designed to measure the performance of a very small and specific piece of code
- Kernel
  - contains code that performs a specific basic operation
  - normally abstracted from an actual program
  - popular kernel: Livermore loops (every loop is a mathematical operation)
  - Linpack benchmark (contains basic linear algebra subroutines)
  - results are represented in MFLOPS
- Synthetic benchmarks
  - procedure for programming a synthetic benchmark:
    - take statistics of all types of operations from many application programs
    - get the proportion of each operation
    - write a program based on the proportions above
  - types of synthetic benchmark:
    - Dhrystone: integer arithmetic
    - Whetstone: integer and floating point arithmetic

Other benchmarks
- I/O benchmarks
- Database benchmarks: measure the throughput and response times of database management systems (DBMS)
- Parallel benchmarks: used on machines with multiple cores or processors, or on systems consisting of multiple machines

Issues regarding good benchmarking:
- some processor architectures were designed for the best benchmark results, at the expense of overall performance
- many benchmarks concentrate on computation and less on other aspects, such as memory access time and input/output delays
- benchmarks are not relevant for widely distributed systems
- there is no unique measure of "performance" in computing

Computing the benchmark results

Arithmetic mean:
$$B_{AM} = \frac{1}{n} \sum_{i=1}^{n} t_i$$
where $t_i$ is the execution time of program $i$ from the set of $n$ test programs.

Weighted arithmetic mean:
$$B_{AM} = \frac{1}{n} \sum_{i=1}^{n} w_i \, t_i$$
where $w_i$ is the weight of program $i$ from the set, indicating its frequency of execution. If the $w_i$ are chosen so that on a reference computer the weighted execution time of each benchmark program is equal, the result is NORMALIZATION.

Geometric mean:
$$B_{GM} = \sqrt[n]{\prod_{i=1}^{n} t_i}$$

Normalized geometric mean:
$$B_{GM} = \sqrt[n]{\prod_{i=1}^{n} w_i \, t_i}$$

Effects of normalization: the result depends on the machine used as a reference (A, B or C).

Execution times (s):
| | A | B | C |
|---|---|---|---|
| Program 1 | 1 | 10 | 100 |
| Program 2 | 1000 | 100 | 10000 |
| Arithmetic mean | 500.5 | 55 | 5050 |
| Geometric mean | 31.6 | 31.6 | 1000 |

Normalized to A:
| | A | B | C |
|---|---|---|---|
| Program 1 | 1 | 10 | 100 |
| Program 2 | 1 | 0.1 | 10 |
| Arithmetic mean | 1 | 5.05 | 55 |
| Geometric mean | 1 | 1 | 31.6 |

Normalized to B:
| | A | B | C |
|---|---|---|---|
| Program 1 | 0.1 | 1 | 10 |
| Program 2 | 10 | 1 | 100 |
| Arithmetic mean | 5.05 | 1 | 55 |
| Geometric mean | 1 | 1 | 31.6 |

Normalized to C:
| | A | B | C |
|---|---|---|---|
| Program 1 | 0.01 | 0.1 | 1 |
| Program 2 | 0.1 | 0.01 | 1 |
| Arithmetic mean | 0.055 | 0.055 | 1 |
| Geometric mean | 0.0316 | 0.0316 | 1 |

Conclusions of the previous table

For the arithmetic mean:
- if the reference is computer A: A is as fast as A; B is ~5 times slower than A; C is 55 times slower than A
- if the reference is computer B: A is ~5 times slower than B; B is as fast as B; C is 55 times slower than B
- if the reference is computer C: A is 18 times faster than C; B is 18 times faster than C; C is as fast as C

For the geometric mean:
- if the reference is computer A: A is as fast as A; B is as fast as A; C is ~32 times slower than A
- if the reference is computer B: A is as fast as B; B is as fast as B; C is ~32 times slower than B
- if the reference is computer C: A is ~32 times faster than C; B is ~32 times faster than C; C is as fast as C

Advantages of the geometric mean:
- it is independent of the running times of the individual programs
- it does not matter which machine is used for normalization

Disadvantage of the geometric mean:
- it does not predict execution time

Benchmark programs
- Goal: to write a package of programs that best measures the performance of a computer system
- Solutions:
  - real programs, which solve different classical problems
  - synthetic programs, which produce no practical result but preserve the frequency of instructions measured in real cases

Features of a good benchmark
- Specifies a certain load ("workload"): a clear definition of the operations that are performed.
- Produces at least one metric: a numerical representation of performance. Common values include:
  - time, for example seconds to complete the workload
  - flow, the volume of work completed per unit of time, for example jobs per hour
- Is reproducible: if repeated, it generates the same results.
- Is portable: it can be executed on different systems.
- Can be compared: the obtained values can be compared between systems.
- Verifies the correctness of the operations: checks that a result is generated and that the result is correct. "The results may be as good as you want if the correctness of the results is not verified."
- Has execution rules: a clear definition of necessary and prohibited hardware, software, optimization, adjustment and procedures.

Examples of benchmark programs

Whetstone synthetic program:
- published in 1976 by the National Physical Laboratory (NPL), Great Britain
- preserves the frequency of instructions in scientific and engineering applications
- written in Algol and later in Fortran and Pascal
- floating point instructions have an important role

Dhrystone synthetic program:
- published in 1984
- preserves the frequency of instructions in system programming (e.g. operating system components)
- uses the Ada and C programming languages
- frequency measurements are published
- no emphasis on FP operations

Issues with synthetic benchmarks:
- they do not reflect well the needs of a real application
- some computer architectures were optimized for the best results on synthetic benchmarks, but perform worse on real applications

Kernel benchmark programs:
- based on time-critical components of real applications
- focused on measuring the performance of supercomputers running scientific applications

Examples:
- Livermore Loops: benchmark for parallel computers; 24 "do" loops carrying out different mathematical operations (e.g. solving linear systems, hydrodynamics, matrix operations, etc.)
- Linpack: performs numerical linear algebra

Examples of benchmark programs: SPEC (Standard Performance Evaluation Corporation)
- a non-profit international organization focused on developing standard tools for measuring the performance of computer systems (www.spec.org)
- develops standard sets of benchmarks based on real applications
- benchmark sets contain source code
- there are also tools for generating performance reports

Evolution of SPEC benchmark standards:
- SPEC89: the first benchmark set, released in 1989; benchmark value: geometric mean of execution times normalized to the VAX-11/780 computer
- SPEC92: contains different benchmarks for integer (SPECINT) and floating-point (SPECFP) instructions
- CPU95, CPU2000
- Current version: CPU2006
- Next version: CPUv6

SPEC consists of three interest groups:
- Open Systems Group (OSG): component and system level benchmarks
- High Performance Group (HPG): benchmarks for high-performance computing
- Graphics Performance Characterization Group (GPCG): benchmarks for graphics subsystems

Details for CPU 2017:
- contains 43 tests organized in 4 collections: SPECspeed 2017 Integer, SPECspeed 2017 Floating Point, SPECrate 2017 Integer and SPECrate 2017 Floating Point
- it can measure:
  - speed (SPEC ratio): the time to execute one copy of the benchmark
  - rate (SPEC rate): the number of jobs that can be executed in a given time (e.g. 24h)
- results are combined with the geometric mean
- normalization is made on a Sun Microsystems Ultra 5/10 workstation with a SPARC processor; for this system the result of the measurement is 1

Details for CPU2006: examples of integer benchmarks
- 401.bzip2: compression program based on bzip2
- 403.gcc: C compiler based on gcc 3.2
- 445.gobmk: plays the game of go
- 458.sjeng: chess program
- 462.libquantum: library for the simulation of a quantum computer
- 473.astar: path-finding library for 2D maps (A* algorithm)

Details for CPU2006: examples of floating-point benchmarks
- 435.gromacs: simulates the Newtonian equations of motion for particles
- 444.namd: simulates bio-molecular systems
- 459.GemsFDTD: solves the Maxwell equations in 3D in the time domain
- 465.tonto: quantum chemistry package
- 481.wrf: weather forecasting
- 482.sphinx3: speech recognition

Look up the results for your processor on the Internet.
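
The slides on computing benchmark results state that the geometric mean does not depend on the machine used for normalization, and SPEC likewise combines normalized results with a geometric mean. The following is a minimal Python sketch of that claim, not part of the course material. It uses the execution times of programs 1 and 2 on machines A, B and C from the normalization table; the names `TIMES` and `normalized_scores` are illustrative assumptions. Each score is the measured time divided by the reference machine's time, so lower is better and the reference machine scores 1.

```python
from math import prod

# Execution times in seconds from the normalization table in the slides:
# one entry per machine, listing (program 1, program 2).
TIMES = {"A": [1.0, 1000.0], "B": [10.0, 100.0], "C": [100.0, 10000.0]}


def arithmetic_mean(values):
    """Plain arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    return prod(values) ** (1.0 / len(values))


def normalized_scores(reference, mean):
    """Score each machine as mean(t_i / t_i on the reference machine)."""
    ref_times = TIMES[reference]
    return {
        machine: mean([t / r for t, r in zip(times, ref_times)])
        for machine, times in TIMES.items()
    }


if __name__ == "__main__":
    for reference in TIMES:
        am = normalized_scores(reference, arithmetic_mean)
        gm = normalized_scores(reference, geometric_mean)
        # How many times slower C looks than A under each kind of mean.
        print(
            f"reference {reference}: "
            f"C/A by arithmetic mean = {am['C'] / am['A']:.2f}, "
            f"C/A by geometric mean = {gm['C'] / gm['A']:.2f}"
        )
```

With the arithmetic mean, the apparent slowdown of C relative to A changes with the reference machine (55, about 10.9 and about 18.2 for references A, B and C). With the geometric mean it stays at about 31.6 for every reference, which is the independence property listed among the advantages of the geometric mean.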