Research Methodology, Lecture 10: Performance Evaluation
Dr. Ayman, 20/12/1445
Some of the material is based on Dr. Cliff Shaffer's notes.

Outline
Performance Evaluation
Introduction
Common Mistakes

Examples 1/2
Evaluate design alternatives
Compare two or more computers, programs, or algorithms (speed, memory, usability)
Determine the optimum value of a parameter (tuning, optimization)
Locate bottlenecks
Characterize the load
Predict performance on future loads
Determine the number and size of components required

Examples 2/2
Which is the best sorting algorithm?
What factors affect data structure visualizations?
Code-tune a program
Which interface design is better?
What are the best parameter values for a biological model?

The Art of Performance Evaluation
Throughput in transactions per second:
  System    Workload 1    Workload 2
  A             20            10
  B             10            20
How does system A compare to system B?

Evaluation Issues
System: hardware, software, network; a clear bounding of the "system" under study
Techniques: measurement, simulation, analytical modeling
Metrics: response time, transactions per second
Workload: the requests a user gives to the system
Statistical techniques
Experimental design: maximize information, minimize the number of experiments

Common Mistakes 1/5
No goals: each model is special purpose; performance problems are vague when first presented
Biased goals: "OUR system is better than THEIRS"
Unsystematic approach
Analysis without understanding: people want guidance, not models
Incorrect performance metrics: want correct metrics, not easy ones
Unrepresentative workload

Common Mistakes 2/5
Wrong evaluation technique: it is easy to become married to one approach
Overlooking important parameters
Ignoring significant factors: parameters that are varied in the study are called factors; there is no use comparing what cannot be changed
Inappropriate experimental design
Inappropriate level of detail

Common Mistakes 3/5
No analysis
Erroneous analysis
No sensitivity analysis: measure the effect of changing a parameter
Ignoring errors in input
Improper treatment of outliers: outliers are values that are too high or too low compared to the majority of values; some should be ignored (cannot happen), some should be retained (key special cases)

Common Mistakes 4/5
Assuming no change in the future
Ignoring variability: the mean is of little significance in the face of high variability
Too complex an analysis: complex models are "interesting" and so get published and studied, but real-world use is simpler and decision makers prefer simpler models

Common Mistakes 5/5
Improper presentation of results: the proper metric for the performance of an analyst is the number of analyses that helped decision makers, not the number of analyses performed
Ignoring social aspects
Omitting assumptions and limitations

The Error of the One-Sided Hypothesis
Consider the hypothesis "X performs better than Y". The danger is the following chain of reasoning: Could this hypothesis be true? I have evidence that the hypothesis might be true. Therefore it is true. What gets ignored is any evidence that the hypothesis might not be (or is not) true.
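To make the one-sided error concrete, here is a minimal Python sketch that examines "X performs better than Y" with a two-sided 95% confidence interval on the mean difference, so that evidence against the hypothesis is weighed as well. The paired throughput numbers are invented for illustration and are not from the lecture.

import math
import statistics

# Hypothetical paired throughput measurements (transactions/s) for
# systems X and Y on the same five workloads; the numbers are made up.
x = [21.0, 18.5, 22.4, 19.8, 20.3]
y = [19.2, 19.0, 21.1, 20.5, 18.9]

diff = [a - b for a, b in zip(x, y)]
mean_d = statistics.fmean(diff)
se = statistics.stdev(diff) / math.sqrt(len(diff))

# Two-sided 95% confidence interval for the mean difference;
# 2.776 is the t critical value for 4 degrees of freedom.
t_crit = 2.776
lo, hi = mean_d - t_crit * se, mean_d + t_crit * se
print(f"mean difference {mean_d:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# Only if the whole interval lies above zero is "X performs better
# than Y" supported; an interval containing zero is evidence that
# the hypothesis may well be false.

With these made-up numbers the interval contains zero, so the data do not support concluding that X is better, even though X wins on three of the five workloads.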
A Systematic Approach
1. State goals and define the system boundaries
2. List services and outcomes
3. Select metrics
4. List parameters (system and workload)
5. Select factors to study
6. Select the evaluation technique
7. Select the workload
8. Design experiments
9. Analyze and interpret the data
10. Present the results. Start over, if necessary!

Technique Selection
Choices: analytical modeling, simulation, and measurement
"Until validated, all evaluation results are suspect." Validate one of these approaches by comparing it against another.
Measurement results are just as susceptible to experimental errors and biases as the other two techniques.
Criteria for technique selection: stage of analysis, time required, tools, accuracy, trade-off evaluation, cost, saleability

Common Performance Metrics 1/3
Response time: the interval between a user's request and the system's response
❑ This simplistic definition assumes requests and responses are instantaneous
Definition 1: the time between the user finishing the request and the system starting the response
Definition 2: the time between the user finishing the request and the system completing the response

Common Performance Metrics 2/3
Throughput: the rate (requests per unit of time) at which requests can be serviced by the system
Throughput generally increases as the load initially increases; eventually it stops increasing and might then decrease
Nominal capacity is the maximum achievable throughput under ideal workload conditions (bandwidth, for computer networks)
Usable capacity is the maximum throughput achievable without violating a limit on response time
Efficiency is the ratio of usable to nominal capacity

Common Performance Metrics 3/3
Utilization: the fraction of time the resource is busy servicing requests; the ratio of busy time to total elapsed time over a given period
Bottleneck: the component with the highest utilization; improving this component often gives the highest payoff
Reliability: probability of errors; mean time between errors
Availability: uptime and downtime; mean uptime (mean time to failure)
Cost/performance ratio

Workloads
A workload is the set of requests made by users of the system under study.
A test workload is any workload used in performance studies.
A real workload is one observed on a real system; it cannot be repeated.
A synthetic workload is a reproduction of a real workload that can be applied to the tested system.
Examples (for CPU performance): addition instruction, instruction mixes, kernels, synthetic programs, application benchmarks

Selecting Workloads
1. Determine the services for the SUT (System Under Test): view the system as a service provider; or is a Component Under Study (CUS) the target?
2. Select the desired level of detail
3. Confirm that the workload is representative
4. Is the workload still valid?
A real-world workload is not repeatable; most workloads are models of real service requests.

Selecting Workloads: Level of Detail
Select the desired level of detail:
Most frequent request
Frequency of request types
Time-stamped sequence of requests (trace)
Average resource demand
❑ It might be necessary to specify the complete probability distribution
Distribution of resource demands

Selecting Workloads: Representativeness
A test workload should be representative of the real application.
The test workload and the real application should match in arrival rate, resource demands, and resource usage profile.

Selecting Workloads: Other Factors
Loading level: a workload might exercise a system
❑ to its full capacity (best case)
❑ beyond its capacity (worst case)
❑ at the load level observed in the real workload (typical case)
Impact of external components
Repeatability
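Tying representativeness and repeatability together, here is a minimal Python sketch of a synthetic workload generator. It matches only the average arrival rate of a hypothetical observed trace and assumes Poisson arrivals with a fixed seed; both the trace values and the Poisson assumption are illustrative, not from the lecture.

import random

def synthetic_arrivals(trace, duration, seed=42):
    """Generate a repeatable synthetic arrival stream whose average
    arrival rate matches an observed trace of request timestamps.
    Assumes Poisson arrivals (exponential inter-arrival times)."""
    rate = (len(trace) - 1) / (trace[-1] - trace[0])  # mean arrivals per second
    rng = random.Random(seed)                         # fixed seed: repeatable workload
    t, arrivals = 0.0, []
    while t < duration:
        t += rng.expovariate(rate)                    # exponential gap between requests
        arrivals.append(t)
    return arrivals

# Hypothetical observed timestamps (seconds); the numbers are made up.
trace = [0.0, 0.8, 1.1, 2.9, 3.4, 4.0, 5.7, 6.1, 7.8, 9.5]
print(len(synthetic_arrivals(trace, duration=60.0)))  # roughly rate * 60 requests

A more representative generator would also reproduce the resource demands and usage profile of the trace, as the representativeness slide requires.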
Some Workload Characterization Techniques 1/3
Averaging: uses the arithmetic mean. Alternatives?
Specifying dispersion: variance
Markov models
Clustering

Some Workload Characterization Techniques 2/3
Markov models
Assume the next request depends only on the last request; the next system state depends only on the current system state
Transition matrices
❑ Example: a typical distribution for some system is about 4 small packets followed by a single large packet
❑ Random chance: the probability of a small packet is always .8 and of a large packet always .2
❑ Markov model: small follows small with probability .75 and large follows small with probability .25; in contrast, small always follows large

Some Workload Characterization Techniques 3/3
Clustering
Separates a population into groups with similar characteristics: minimize the within-group variance while maximizing the between-group variance
Select representatives to simplify further processing

Introduction to Simulation
If the system is not available, a simulation model provides an easy way to predict performance or compare several alternatives
If the system is available, a simulation model may still be preferred over measurements (why?)
Simulation models might fail: they can produce no useful results, or produce misleading results

Common Mistakes in Simulation
Inappropriate level of detail
Improper implementation language
Unverified models
Invalid models (may not represent the real system correctly)
Improperly handled initial conditions
Too-short simulations
Poor random-number generators
Improper selection of seeds

Terminology 1/3
State variables: variables whose values define the state of the system, e.g., the length of the job queue in a CPU scheduling simulation
Event: a change in the system state
Continuous-time and discrete-time models
Continuous: the system state is defined at all times (e.g., CPU scheduling)
Discrete: the system state is defined only at particular instants of time

Terminology 2/3
Continuous-state and discrete-state models: determined by the nature of the state variables
The time spent by students studying a certain subject can take an infinite number of values (continuous state)
The queue length in a CPU scheduling simulation can only assume integer values (discrete-state model)
Discrete-state models are also called discrete-event models
Continuity of time does not imply continuity of state
Deterministic and probabilistic models
The output of the model can be predicted with certainty: deterministic
Different results on repetition for the same set of input parameters: probabilistic

Terminology 3/3
Static and dynamic models: does the system state change with time?
Open and closed models
Input is external to the model and independent of it: open
No external input: closed
Stable and unstable models
A steady state is reached that is independent of time: stable

Programming Language Selection
Simulation language
General-purpose language
Extension of a general-purpose language: extensions as a collection of routines to handle tasks commonly required in simulations
Simulation package: inflexibility

Types of Simulation 1/3
Emulation: a simulation using hardware or firmware, e.g., a terminal emulator or a processor emulator
Monte Carlo simulation: a static simulation, or one without a time axis
Models probabilistic phenomena that do not change characteristics with time
Evaluates non-probabilistic expressions using probabilistic models
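As a minimal illustration of evaluating a non-probabilistic expression with a probabilistic model, the following Python sketch estimates the constant pi by sampling uniform points in the unit square; the example itself is a standard one and is not taken from the lecture.

import random

def estimate_pi(samples=1_000_000, seed=0):
    """Monte Carlo estimate of pi: the fraction of uniform points in the
    unit square that land inside the quarter circle approaches pi/4."""
    rng = random.Random(seed)
    inside = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                 for _ in range(samples))
    return 4 * inside / samples

print(estimate_pi())  # close to 3.1416; the error shrinks like 1/sqrt(samples)

There is no simulated time axis here, which is what makes this a static (Monte Carlo) simulation in the lecture's terminology.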
Types of Simulation 2/3
Trace-driven simulation: uses a trace as input
A trace is a time-ordered record of events on a real system; e.g., a trace of the page-reference patterns of key programs can be used as input to compare different memory-management schemes
Advantages (among others): credibility, easy validation, and an accurate workload
Disadvantages (among others): complexity and being a single point of validation

Types of Simulation 3/3
Discrete-event simulation
Components
❑ Event scheduler
❑ Simulation clock and a time-advancing mechanism
✓ Unit-time versus event-driven approaches
❑ System state variables
❑ Event routines
❑ Input routines, report routines, initialization routines, and trace routines
❑ Dynamic memory management

Analysis of Simulation Results
Model verification techniques
Model validation techniques
Transient removal
Stopping criteria

Model Verification Techniques
Top-down modular design
Anti-bugging
Structured walk-through
Deterministic models
Run simplified cases
Trace: a time-ordered list of events and associated variables
On-line graphic displays
Continuity tests
Degeneracy tests
Consistency tests
Seed independence

Model Validation Techniques
Validate three key aspects of the model: assumptions, input parameter values and distributions, and output values and conclusions
Each of the three aspects may be subjected to a validity test by comparing it with what is obtained from three possible sources: expert intuition, real-system measurements, and theoretical results

Transient Removal 1/5
Steady-state performance is of interest. What constitutes the transient state?
Methods for transient removal are heuristic:
Long runs
Proper initialization
Truncation
Initial data deletion
Moving average of independent replications
Batch means

Transient Removal 2/5
Truncation
Variability during the steady state is less than that during the transient state; measure variability in terms of the range (the min and max of the observations)
Observations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 10, 9, 10, 11, 10, 9, 10, 11, 10, 9
Given a sample of n observations:
❑ Ignore the first l observations
❑ Calculate the min and max of the remaining n - l observations
❑ Repeat until the (l+1)th observation is neither the min nor the max of the remaining observations
(A small code sketch applying this to the observations above appears at the end of this part.)

Transient Removal 3/5
Initial data deletion
The average does not change much as observations are deleted; randomness in the observations causes the averages to change slightly even during the steady state
Average across replications (a replication is a complete run with no change in input parameter values; only the seed value differs):
❑ Get a mean trajectory across replications
❑ Get the overall mean
❑ Delete the first l observations (starting with l = 1) and get the overall mean of the remaining n - l values
❑ Compute the relative change in the overall mean
❑ Determine the knee where the relative-change graph stabilizes

Transient Removal 4/5
Moving average of independent replications
Similar to initial data deletion, but computes the mean over a moving time window instead of the overall mean
Get a mean trajectory across replications
Plot a trajectory of the moving average of successive 2k+1 values (starting with k = 1)
Repeat with k = 2, 3, ... until the plot is sufficiently smooth
Find the knee of the plot
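Before the last transient-removal method, here is a minimal Python sketch of the truncation heuristic from Transient Removal 2/5, applied to the sample observations listed there.

def truncation_transient_length(obs):
    """Truncation heuristic: keep dropping the first l observations until
    the (l+1)th observation is neither the min nor the max of the
    remaining n - l observations; those first l values are the transient."""
    n = len(obs)
    for l in range(n - 1):
        remaining = obs[l:]
        if min(remaining) < obs[l] < max(remaining):
            return l
    return n  # no steady state detected

obs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 10, 9, 10, 11, 10, 9, 10, 11, 10, 9]
l = truncation_transient_length(obs)
print(l, obs[l:])  # l == 9: the first nine observations are treated as transient

With this sample the method drops the initial ramp 1 through 9 and keeps the oscillating steady-state portion.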
Transient Removal 5/5
Batch means
Run a very long simulation and later divide it into several parts (batches) of equal duration
Study the variance of the batch means as a function of the batch size
A long run of N observations is divided into m batches of size n each (start with n = 2, for example):
❑ For each batch, compute a batch mean
❑ Compute the overall mean
❑ Compute the variance of the batch means
❑ Increase n and repeat
❑ Plot the variance as a function of n

Stopping Criteria: Variance Estimation 1/3
The simulation can be run until the confidence interval for the mean response narrows to a desired width
Variance of the sample mean = variance of the observations / n, which is valid only if the observations are independent
Observations in most simulations are not independent
To correctly compute the variance of the mean of correlated observations: independent replications, or batch means

Stopping Criteria: Variance Estimation 2/3
Independent replications
The means of independent replications are independent even though the observations within a single replication are correlated
Conduct m replications of size n + n0 each (n0 is the length of the transient phase):
❑ Compute a mean for each replication
❑ Compute an overall mean across all replications
❑ Calculate the variance of the replicate means
❑ Calculate the confidence interval for the mean response
This discards m * n0 initial observations

Stopping Criteria: Variance Estimation 3/3
Batch means
Run one long simulation, discard the initial transient interval, and divide the remaining observations into several batches
Compute the mean of each batch
Compute the overall mean
Calculate the variance of the batch means
Calculate the confidence interval for the mean response
Only n0 observations are discarded (less waste than independent replications)
Calculate the covariance of successive batch means to find the correct batch size n; the covariance should be small compared to the variance
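The batch-means procedure above can be written down in a few lines. The following Python sketch assumes the transient has already been removed, uses a normal-approximation critical value for the confidence interval, and uses one common convention for the lag-1 covariance of successive batch means; the example data are stand-ins, not lecture material.

import math
import random
import statistics

def batch_means_ci(obs, n_batch, crit=1.96):
    """Batch-means variance estimation: split the post-transient run into
    batches of size n_batch, and use the variance of the batch means to
    form a confidence interval for the mean response."""
    m = len(obs) // n_batch                                   # number of batches
    means = [statistics.fmean(obs[i * n_batch:(i + 1) * n_batch]) for i in range(m)]
    grand = statistics.fmean(means)
    var = statistics.variance(means)                          # variance of batch means
    # Lag-1 covariance of successive batch means; if it is not small
    # compared to var, the batches are too short: increase n_batch.
    cov1 = sum((means[i] - grand) * (means[i + 1] - grand)
               for i in range(m - 1)) / (m - 2)
    half = crit * math.sqrt(var / m)                          # half-width of the CI
    return grand, (grand - half, grand + half), var, cov1

random.seed(1)
obs = [random.expovariate(1.0) for _ in range(10_000)]        # stand-in post-transient data
print(batch_means_ci(obs, n_batch=500))

With independent stand-in data the covariance term is already negligible; with real simulation output one would keep increasing n_batch until it is small relative to the variance, as the slide prescribes.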