Performance Analysis of Computer Systems and Networks Lecture 2 PDF
Document Details
October 6 University
2010
Raj Jain
Summary
This document is a lecture on computer systems performance analysis and computer networks, focusing on techniques for performance evaluation, modeling, and related metrics. It includes examples and an outline of the concepts covered.
Full Transcript
Performance Analysis of Computer Systems and Networks
Lecture 2 - 26/09/2010

Text Book
Raj Jain, "The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling", Wiley, 1991.
Most of the slides are obtained directly from the author (Prof. Raj Jain).

Performance Evaluation?
Performance evaluation:
- applies certain techniques (measurement, analytical/simulation modeling)
- to existing or envisioned systems (in our case: computer systems, communication networks, etc.)
- to assess performance measures of interest (delay, response times, throughput, jitter, processing times, etc.).
Performance is one of the most important non-functional aspects of any (hardware, software) system.

Basic Terms
- System: any collection of hardware, software, and firmware.
- Metrics (measures): the criteria used to evaluate the performance of the system.
- Workloads: the requests made by the users of the system.

CMPE-474 Objectives
- Specifying performance requirements
- Evaluating design alternatives
- Comparing two or more systems
- Determining the optimal value of a parameter (system tuning)
- Finding the performance bottleneck (bottleneck identification)
- Characterizing the load on the system (workload characterization)
- Determining the number and sizes of components (capacity planning)
- Predicting the performance at future loads (forecasting)

Typical Reasons for a Performance Study
- Find performance bottlenecks in existing systems and develop improvements (should I buy more memory or a faster processor?)
- Capacity planning: how many resources should I spend to obtain some desired level of service quality? For example: how much memory should my router have to avoid too many packet losses?
- Performance comparison of systems / algorithms / protocols: given two protocols, which one is better in which respect?
- Your Internet service provider guarantees you a certain minimum bandwidth. Is it keeping that promise?
- Mostly economic reasons (investment decisions)

Performance Modeling
- Many performance evaluation studies require a model of the system under study.
- A model is a simplified and purpose-oriented view of a system (a model is itself a system): it captures only the most essential aspects regarding the system's performance.
- It is easier to learn about the modeling substrate (probability theory, simulation methodology, etc.) than about the modeling process itself.
- Learning to model is mostly a matter of experience and practice, and it always requires a thorough understanding of the systems to be modeled.

Outline
- Overview of performance evaluation: introduction and fundamentals (Example I)
- Measurement techniques and tools (Example II)
- Probability theory / statistics refresher (Example III)
- Queueing theory / modeling examples (Example IV)

Example I
What performance metrics should be used to compare the performance of the following systems:
- Two disk drives?
- Two transaction-processing systems?
- Two packet-retransmission algorithms?

Example II
Which type of monitor (software or hardware) would be more suitable for measuring each of the following quantities:
- Number of instructions executed by a processor?
- Degree of multiprogramming on a timesharing system?
- Response time of packets on a network?

Example III
The number of packets lost on two links was measured for four file sizes (table on the slide). Which link is better?

Example IV
The average response time of a database system is three seconds. During a one-minute observation interval, the idle time on the system was ten seconds. Using a queueing model for the system, determine the following:
- System utilization
- Average service time per query
- Number of queries completed during the observation interval
- Average number of jobs in the system
- Probability of the number of jobs in the system being greater than 10
- 90-percentile response time
- 90-percentile waiting time
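The slides leave Example IV as an exercise. The following is a minimal worked sketch in Python, assuming the system can be treated as an M/M/1 queue (a common assumption for this kind of exercise, not stated explicitly on the slide); the variable names are mine, not from the slides.

```python
import math

# Observed quantities from Example IV.
T = 60.0      # observation interval (s)
idle = 10.0   # measured idle time (s)
R = 3.0       # measured mean response time (s)

busy = T - idle                 # busy time (s)
U = busy / T                    # utilization rho, ~0.833
S = R * (1 - U)                 # mean service time: R = S / (1 - rho) for M/M/1, ~0.5 s
completed = busy / S            # queries completed = busy time / service time, ~100
X = completed / T               # throughput, ~1.67 queries/s
N = U / (1 - U)                 # mean number of jobs in system, ~5 (equals X * R, Little's law)
p_gt_10 = U ** 11               # P(n > 10) = rho^11 for M/M/1, ~0.13
r90 = R * math.log(10)          # 90-percentile response time (exponential with mean R), ~6.9 s
w90 = R * math.log(10 * U)      # 90-percentile waiting time from the M/M/1 waiting-time law, ~6.4 s

print(f"U={U:.3f}, S={S:.2f}s, completed={completed:.0f}, N={N:.1f}")
print(f"P(n>10)={p_gt_10:.3f}, r90={r90:.2f}s, w90={w90:.2f}s")
```

Under these assumptions the utilization is about 0.83, the mean service time 0.5 s, roughly 100 queries complete during the interval, about 5 jobs are in the system on average, and the 90-percentiles of response and waiting time come out near 6.9 s and 6.4 s.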
The Art of Performance Evaluation
Given the same data, two analysts may interpret them differently.
Example 1: The throughputs of two systems A and B, in transactions per second, are as shown on the slide.

Possible Solutions
- Compare the averages. Conclusion: the two systems are equally good.
- Compare the ratios with system B as the base. Conclusion: system A is better than B.

Solutions (Cont)
- Compare the ratios with system A as the base. Conclusion: system B is better than A.
- Similar games can be played in the selection of the workload, in measuring the systems, and in presenting the results. Ratio games! (A small numeric illustration appears below, after Example 2.)

Example 2
Three different computer systems A, B, C are to be compared for the execution times of two programs, P1 and P2. Measurements give the following results (execution times in seconds): program P1: A = 1, B = 10, C = 20; program P2: A = 1000, B = 100, C = 20. Which computer is the best? Standard answer: it depends!

Example 2 (cont)
- If the measure is the total execution time, then C is best.
- If, in a real workload, program P1 runs 500 times and program P2 runs 5 times, the weighted total time is appropriate:
  - total time for A = 500 x 1 sec + 5 x 1000 sec = 5500 sec
  - total time for B = 500 x 10 sec + 5 x 100 sec = 5500 sec
  - total time for C = 500 x 20 sec + 5 x 20 sec = 10100 sec
- We could also "normalize" the results by selecting one system as a "baseline system": depending on the choice, A, B, or C could be the winner!
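The throughput table for Example 1 is not reproduced in the transcript, so the numbers below are purely hypothetical; they only illustrate how averaging raw values versus averaging ratios (with either system as the base) can support three different conclusions from the same data.

```python
# Hypothetical throughputs (transactions/s) on two workloads; not the slide's actual data.
A = {"workload1": 20, "workload2": 10}
B = {"workload1": 10, "workload2": 20}

avg = lambda d: sum(d.values()) / len(d)
print(avg(A), avg(B))                                   # 15.0 vs 15.0 -> "equally good"

# Mean ratio with B as the base: (20/10 + 10/20) / 2 = 1.25 -> "A is better"
ratio_A_over_B = sum(A[w] / B[w] for w in A) / len(A)
# Mean ratio with A as the base: (10/20 + 20/10) / 2 = 1.25 -> "B is better"
ratio_B_over_A = sum(B[w] / A[w] for w in A) / len(A)
print(ratio_A_over_B, ratio_B_over_A)
```

All three conclusions come from the same two measurements; only the choice of summary statistic changes, which is exactly the ratio game the slides warn about.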
Basic Components: System & Model
Features of a system for performance evaluation purposes (diagram on the slide).

Elements of a System (1)
Workload: specifies the arrival of requests which the system is supposed to serve. Examples:
- arrival of packets to a communication network
- arrival of programs to a computer system
- arrival of instructions to a processor
- arrival of read/write requests to a database
Workload characteristics:
- request type (e.g. TCP packet vs. UDP packet vs. ...)
- request size / service time / resource consumption (e.g. packet lengths)
- inter-arrival times of requests
- (statistical) dependence between requests

Elements of a System (2)
Configuration or parameters: in general, all inputs influencing the system's operation. Examples:
- maximum number of retransmissions in ARQ schemes
- time-slice length in a multitasking operating system
Factors: a subset of the parameters which are purposely varied during a performance study to assess their influence.

Elements of a System (3)
Error model: specifies the types and frequencies of failures of system components or communication channels.
The system generates an output, some parts of which are presented to the user; it can also have an internal state, which together with the input determines its operation.
There can be feedback: some parts of the output serve as input.
To obtain the desired performance measures, the output or the observable system state may have to be processed further by some function "f".

Classifications of Systems (1)
There are a number of classifications of systems [A. Willig: TKN Group]:
Static vs. dynamic systems:
- in a static system the output depends only on the current input, not on past inputs or the current time
- a dynamic system might depend on older inputs (memory) or on the current time; a system needs internal state to have memory
Time-varying vs. time-invariant systems:
- time-invariant: the output might depend on the current and past inputs, but not on the current time
- in a time-varying system this restriction is removed

Classifications of Systems (2)
Open systems vs. closed systems:
- open systems have an "outside world" which is not controllable and which might generate workloads, failures, or changes in configuration
- in a closed system everything is under control
Stochastic systems vs. deterministic systems:
- in a stochastic system at least one part of the input or internal state is a random variable / random process, so the outputs are also random
- almost all "real" systems are stochastic systems

Classifications of Systems (3)
Continuous-time systems (CTS) vs. discrete-time systems (DTS):
- in a CTS, state changes might happen at any time
- in a DTS, there is at most a countable number of state changes at certain prescribed instants
We also refer to discrete-time systems as discrete-event systems (DES).

Classifications of Systems (4)
Continuous-state systems vs. discrete-state systems:
- in a continuous-state system the state space (= the set of possible system states) is uncountable
- in a discrete-state system the state space is finite or countably infinite
Computer and communication systems are mostly viewed as dynamic, stochastic, discrete-state / discrete-time systems.

QoS (Quality of Service)
Ultimately, users are interested in their applications running with "satisfactory" or "good" QoS.
QoS is subjective and application-dependent:
- frame rate / level of detail / resolution of picture
- sound / speech quality
- network bandwidth and latency
- etc.
Sometimes additional requirements are posed, e.g.: to guarantee a certain end-to-end delay in a network, you must not exceed some given sending rate. Example: an ISDN B-channel guarantees 64 kbps; anything in excess is dropped.

Typical Performance Measures (Metrics)
System-oriented vs. application-oriented measures:
- system-oriented measures are independent of specific applications
- application-oriented measures might depend in complex ways on system-oriented measures
Example: video conferencing over the Internet:
- application-oriented measures: frame rate, resolution, color depth, SNR, absence of distortions, turnaround times, lip synchronization, ...
- system/network-oriented measures: throughput, delay, jitter, losses, blocking probabilities, ...

Common Performance Metrics (1)
Response time and reaction time (illustrated by a figure on the slide).

Response Time (Cont)
(Figure on the slide.)

Common Performance Metrics (2)
Throughput: rate (requests per unit of time). Examples:
- Jobs per second
- Requests per second
- Millions of Instructions Per Second (MIPS)
- Millions of Floating Point Operations Per Second (MFLOPS)
- Packets Per Second (PPS)
- Bits per second (bps)
- Transactions Per Second (TPS)
Nominal capacity: maximum achievable throughput under ideal workload conditions, e.g. bandwidth in bits per second. The response time may be too high at nominal capacity!
Usable capacity: maximum throughput achievable without exceeding a pre-specified response-time limit.

Common Performance Metrics (3)
Efficiency: the ratio of usable capacity to nominal capacity. Alternatively, the ratio of the performance of an n-processor system to that of a one-processor system is its efficiency.
Utilization: the fraction of time the resource is busy servicing requests. For memory, the average fraction used is reported instead.

Common Performance Metrics (4)
Reliability:
- Probability of errors
- Mean time between errors (error-free seconds)
Availability:
- Mean Time To Failure (MTTF)
- Mean Time To Repair (MTTR)
- Availability = MTTF / (MTTF + MTTR)
A small numeric illustration of these definitions follows below.
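A small numeric illustration of the capacity, efficiency, utilization, and availability definitions above; all values are hypothetical and chosen only to make the formulas concrete.

```python
# Hypothetical numbers, for illustration only.
nominal_capacity = 1000.0   # e.g. requests/s achievable under ideal workload conditions
usable_capacity = 850.0     # max throughput while still meeting the response-time limit
efficiency = usable_capacity / nominal_capacity           # 0.85

busy_time, total_time = 42.0, 60.0                         # seconds
utilization = busy_time / total_time                       # fraction of time the resource is busy, 0.7

mttf, mttr = 2000.0, 4.0                                   # hours
availability = mttf / (mttf + mttr)                        # ~0.998

print(f"efficiency={efficiency:.2f}, utilization={utilization:.2f}, availability={availability:.4f}")
```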
Measures for Communication Networks
Delay:
- end-to-end (one-way) vs. round-trip (two-way) delay of packets
- processing and queueing delays in network elements
- medium access delay in MAC protocols
Jitter: delay variation.
Throughput: the number of requests / packets which go through the network per unit of time.
Goodput: similar to throughput, but without overhead (e.g. control packets and retransmissions are excluded).

Measures for Communication Networks (cont)
Loss rate: the fraction of packets which are lost or erroneous.
Utilization: the fraction of time a communication link / resource is busy. Service providers love high utilizations, but customers often see large delays in highly utilized / loaded systems!
Blocking probability: the probability of getting no (connection-oriented) service; sometimes you get no line when you pick up the phone!
Dropping probability (in cellular systems): the probability that an ongoing call gets lost upon handover.
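A sketch of how the network measures just defined could be computed from raw counters; the counter values below are hypothetical, and the jitter estimate (mean absolute difference of consecutive delays) is only one of several common conventions.

```python
# Hypothetical counters from a one-second measurement window.
bits_sent_total = 9_600_000   # all bits on the link, incl. retransmissions and control packets
bits_delivered = 8_000_000    # application data delivered (no retransmissions, no overhead)
packets_sent = 1000
packets_lost = 12
one_way_delays_ms = [20.1, 22.4, 19.8, 25.0, 21.2]

throughput_bps = bits_sent_total      # per second, since the window is 1 s
goodput_bps = bits_delivered          # overhead excluded
loss_rate = packets_lost / packets_sent
# Simple jitter estimate: mean absolute difference between consecutive delays.
jitter_ms = sum(abs(a - b) for a, b in
                zip(one_way_delays_ms, one_way_delays_ms[1:])) / (len(one_way_delays_ms) - 1)

print(throughput_bps, goodput_bps, loss_rate, round(jitter_ms, 2))
```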
Measures for Computer Systems
Desktop systems (single user):
- response times, graphics performance (e.g. level of detail)
Server systems (multiple users):
- throughput
- reliability
- availability
- utilization
Embedded systems:
- energy consumption
- memory consumption
- system utilization

Selecting Performance Metrics
(Figure on the slide.)

Case Study: Two Congestion Control Algorithms
Service: send packets from a specified source to a specified destination, in order.
Possible outcomes:
- Some packets are delivered in order to the destination.
- Some packets are delivered out of order to the destination.
- Some packets are delivered more than once (duplicates).
- Some packets are dropped on the way (lost packets).

Case Study (Cont)
Performance: for packets delivered in order,
- Time-rate-resource metrics:
  - Response time to deliver the packets
  - Throughput: the number of packets per unit of time
  - Processor time per packet on the source
  - Processor time per packet on the destination
  - Processor time per packet on the intermediate nodes
- Variability of the response time (it can cause retransmissions)
- Response time here is the delay inside the network.

Case Study (Cont)
- Out-of-order packets consume buffers: probability of out-of-order arrivals.
- Duplicate packets consume network resources: probability of duplicate packets.
- Lost packets require retransmission: probability of lost packets.
- Too much loss causes disconnection: probability of disconnect.

Main Performance Evaluation Techniques (1)
- When the system under study already exists and is accessible, we can make measurements.
- When the system does not exist or is too difficult to deal with (e.g. the system is the whole Internet), a performance model must be developed:
  - analytical models use mathematical concepts and notations
  - simulation models are computer programs
- Both kinds of performance models restrict themselves to the most important aspects and leave out many details.

Main Performance Evaluation Techniques (2)
- Do not trust the results of a simulation model until they have been validated by analytical modeling or measurements.
- Do not trust the results of an analytical model until they have been validated by a simulation model or measurements.
- Do not trust the results of a measurement until they have been validated by simulation or analytical modeling.

Select Evaluation Technique
The choice depends upon time, resources, and the desired level of accuracy:
- Analytic modeling: quick, less accurate
- Simulation: medium effort, medium accuracy
- Measurement: typically the most effort, most accurate

Advantages of Measurements
- Saleability: you can always claim that your numbers are "real" and not based on some "suspicious", "arbitrary", or "unjustified" model.
- You do not need to find "reasonable parameters" for intermediate elements as in model-based studies (e.g.: what would be a "reasonable" queueing delay for a router in a VoIP application?).

Disadvantages of Measurements
- Sometimes hard to interpret, unreproducible, and substantial time/effort is needed to set up.
- You have to consider all details; in model-based techniques you can neglect some.
- Workload selection can be tricky (how do you find "representative" workloads?).

Analytical Modeling
Uses mathematical notions and models describing certain aspects of a system.
For modeling computer systems and communication networks, probabilistic models are often used:
- task arrival times to a computer are random
- user inputs are random
- packet arrival times to a network are random
- errors on communication links are random
- ...

Adv. and Disadv. of Analytical Modeling
Disadvantages:
- many systems are too complex for analytical modeling, requiring simplifications / approximations
- a solid background in mathematics / probability theory is needed
Advantages:
- building the model forces a thorough understanding of the system
- can often be set up and evaluated quickly
- even as approximations, analytical models often provide qualitative insights into systems

Simulation Modeling (1)
A simulation model:
- is a computer program, written in a general-purpose or a specific simulation language
- implements the most important aspects of the system under study in a simplified manner
- allows for a greater level of detail than analytical modeling
Random input data produces random output data, so a proper statistical evaluation is needed.

Simulation Modeling (2)
- The accuracy of the results is often specified in terms of confidence intervals (a sketch of such a calculation appears after the next slide).
- High statistical accuracy (small intervals) needs long simulation times.
- Higher variability of the output leads to longer runtimes to reach a given accuracy target.
- Simulation runtimes are an issue!

Adv. and Disadv. of Simulation Modeling
Disadvantages:
- sometimes long simulation times are needed
- model setup and validation/verification time can be significant
Advantages:
- simulation results are often much better reproducible than measurement results
- all parameters are under control (at least in principle)
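As noted under Simulation Modeling (2), simulation accuracy is usually reported as a confidence interval. A minimal sketch, assuming the analyst has the mean output of n independent replications and that these means are approximately normally distributed; the replication data here is synthetic, for illustration only.

```python
import math
import random
import statistics

# Hypothetical mean response times from 10 independent simulation replications.
random.seed(1)
replication_means = [random.gauss(3.0, 0.4) for _ in range(10)]

n = len(replication_means)
mean = statistics.mean(replication_means)
s = statistics.stdev(replication_means)   # sample standard deviation across replications
t_95_9df = 2.262                          # Student-t quantile, 95% two-sided, n-1 = 9 degrees of freedom
half_width = t_95_9df * s / math.sqrt(n)

print(f"95% confidence interval: {mean:.3f} +/- {half_width:.3f}")
# Tighter intervals require more or longer replications,
# especially when the simulation output is highly variable.
```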
Summary Comparison of Techniques
(Table on the slide.)

Common Mistakes (1)
Undefined goals
- There is no such thing as a general model.
- Describe the goals and then design the experiments (don't shoot and then draw the target).
Biased goals
- Don't set out to show that YOUR system is better than HERS (performance analysis is like a jury).
Unrepresentative workload
- The workload should be representative of how the system will be used under various conditions.
- Example: large and small packets? Don't test with only large or only small packets.

Common Mistakes (2)
Wrong evaluation technique
- Use the most appropriate one: analytical model, simulation, or measurement.
Inappropriate level of detail
- Can have too much! Example: modeling a disk.
- Can have too little! Example: an analytic model for a congested router.
No sensitivity analysis
- Analysis is evidence, not fact.
- Need to determine how sensitive the results are to the settings.

Common Mistakes (3)
Improper presentation of results
- It is not the number of graphs that counts, but the number of graphs that help make decisions.
Omitting assumptions and limitations
- Example: one may assume most traffic is TCP, whereas some links may carry significant UDP traffic.
- This may lead to applying results where the assumptions do not hold.

A Systematic Approach
1. State goals and define boundaries
2. Select performance metrics
3. List system and workload parameters
4. Select factors and values
5. Select evaluation techniques
6. Select workload
7. Design experiments
8. Analyze and interpret the data
9. Present the results. Repeat.

Case Study
Consider remote pipes (rpipe) versus remote procedure calls (rpc):
- rpc is like a procedure call, but the procedure is handled on a remote server; the client (caller) blocks until the return.
- rpipe is like a pipe, but the server receives the output on a remote machine; the client process can continue (non-blocking).
Goal: compare the performance of applications using rpipes to that of similar applications using rpcs.

System Definition
Client, server, and network (diagram: Client - Network - Server).
The key component is the "channel", either an rpipe or an rpc.
- Only the subset of the client and server that handles the channel is part of the system.

Services
There is a variety of services that can happen over an rpipe or an rpc.
Choose data transfer as a common one, with data being a typical result of most client-server interactions.
Classify the amount of data as either large or small. Thus, two services:
- Small data transfer
- Large data transfer

Metrics
Limit the metrics to correct operation only (no failures or errors).
Study the service rate and the resources consumed:
A) Elapsed time per call
B) Maximum call rate per unit of time
C) Local CPU time per call
D) Remote CPU time per call
E) Number of bytes sent per call

Parameters
System parameters:
- Speed of CPUs (local, remote)
- Network (speed, reliability / retransmissions)
- Operating system overhead for interfacing with the channels / network
Workload parameters:
- Time between calls
- Number and sizes of parameters
- Number and sizes of results
- Type of channel (rpc, rpipe)
- Other loads (on the CPUs, on the network)

Key Factors
- Type of channel: rpipe or rpc
- Speed of network: short distance (LAN) vs. across the country (WAN)
- Size of parameters: small or large
- Number of calls: 11 values (8, 16, 32, ..., 1024)
All other parameters are fixed.

Evaluation Technique
- Since prototypes exist, use measurement.
- Use analytic modeling based on the measured data for values outside the scope of the experiments conducted.

Workload
- A synthetic program generates the specified channel requests.
- It will also monitor the resources consumed and log the results.

Experimental Design
Full factorial design (all possible combinations of factors):
2 channels, 2 network speeds, 2 sizes, 11 numbers of calls
2 x 2 x 2 x 11 = 88 experiments
A sketch of the enumeration appears at the end of this section.

Data Analysis
- Analysis of variance will be used to quantify the effects of the first three factors: are they different?
- Regression will be used to quantify the effect of the number of consecutive calls: is performance linear? exponential?
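A sketch of enumerating the full factorial design described under Experimental Design above. The factor names follow the slides; the exact 11 values of the number-of-calls factor are illustrative assumptions here, since the slide only sketches the list.

```python
from itertools import product

channels = ["rpipe", "rpc"]
network_speeds = ["LAN (short)", "WAN (cross-country)"]
sizes = ["small", "large"]
# 11 values for the number of consecutive calls, as stated on the slide;
# the specific values below are illustrative.
num_calls = [2 ** k for k in range(11)]   # 1, 2, 4, ..., 1024

# Full factorial design: every combination of every factor level.
experiments = list(product(channels, network_speeds, sizes, num_calls))
print(len(experiments))                   # 2 * 2 * 2 * 11 = 88 experiments

for channel, net, size, n in experiments[:3]:
    print(channel, net, size, n)          # first few experiment configurations
```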