Econ 214 Lecture 4 - Measures of Variability PDF
Document Details
University of Ghana
Abel Fumey (PhD)
Summary
This document provides lecture notes for an Econ 214 course covering measures of variability in statistics. It discusses concepts like range, standard deviation, and variance, along with their applications. The notes also introduce percentiles, quartiles, and deciles, and elaborate on skewness.
Full Transcript
ECON 214 ELEMENTS OF STATISTICS FOR ECONOMISTS
Session 4 – Measures of Variability
LECTURER: ABEL FUMEY (PhD)

Session Overview
This session continues the treatment of descriptive statistics. It follows from the measures of central tendency (mean, mode and median).
The session will examine the various measures of variability among data sets. The measures of variability to be examined include the range, standard deviation and variance.
Finally, the session will end with the position of a number within a data set, which will examine the range, interquartile range, and percentiles.

The Concept of Dispersion
Dispersion means variety, diversity, or the amount of variation between scores.
The greater the dispersion of a variable, the greater the differences between the scores.
The more similar the scores are to each other, the lower the measure of dispersion will be.
The less similar the scores are to each other, the higher the measure of dispersion will be.
In general, the more spread out a distribution is, the larger the measure of dispersion will be.
For example, large cities have more diversity: Accra has more diversity than Koforidua.
The taller normal curve has less dispersion than the flatter normal curve.

Measures of Dispersion
There are four main measures of dispersion:
The range
The semi-interquartile range (SIR)
Variance / standard deviation
Coefficient of variation

The Range
The range is defined as the difference between the largest score and the smallest score in the set of data, $X_L - X_S$.
What is the range of the following data: 4 8 1 6 6 2 9 3 6 9?
The largest score ($X_L$) is 9; the smallest score ($X_S$) is 1; the range is $X_L - X_S = 9 - 1 = 8$.

When To Use the Range
The range is used when you have ordinal data or you are presenting your results to people with little or no knowledge of statistics.
The range is rarely used in scientific work as it is fairly insensitive: it depends on only two scores in the set of data, $X_L$ and $X_S$.
Two very different sets of data can have the same range: 1 1 1 1 9 vs 1 3 5 7 9.

The Semi-Interquartile Range
The semi-interquartile range (or SIR) is defined as the difference between the third and first quartiles, divided by two.
The first quartile is the 25th percentile.
The third quartile is the 75th percentile.
SIR = (Q3 - Q1) / 2
The SIR is often used with skewed data as it is insensitive to the extreme scores.

SIR Example
What is the SIR for the following data? 2 4 6 8 10 12 14 20 30 60
Determine the first quartile: the 25th percentile falls between 4 and 6, so 5 is the first quartile.
Determine the third quartile: the 75th percentile falls between 20 and 30, so 25 is the third quartile.
SIR = (Q3 - Q1) / 2 = (25 - 5) / 2 = 10

Variance and Standard Deviation
The most widely used measures of dispersion.
Calculated for interval or ratio variables.

Variance
Variance is defined as the average of the squared deviations:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
What does this formula mean?
First, it says to subtract the mean from each of the scores.
This difference is called a deviate or a deviation score.
The deviate tells us how far a given score is from the typical, or average, score.
Thus, the deviate is a measure of dispersion for a given score.
Square the deviations to avoid negative numbers.

How Do You Interpret Variance?
Variance is the mean of the squared deviation scores.
The larger the variance is, the more the scores deviate, on average, from the mean.
The smaller the variance is, the less the scores deviate, on average, from the mean.

Standard Deviation
When the deviation scores are squared in variance, their unit of measure is squared as well.
E.g. if people's weights are measured in pounds, then the variance of the weights would be expressed in pounds² (or squared pounds).
Since squared units of measure are often awkward to deal with, the square root of variance is often used instead.
The standard deviation is the square root of variance.
Standard deviation = $\sqrt{\text{variance}}$
Variance = $(\text{standard deviation})^2$

Example: Calculating Variance
X      μ      X - μ    (X - μ)²
9      7       2        4
8      7       1        1
6      7      -1        1
5      7      -2        4
8      7       1        1
6      7      -1        1
Σ = 42        Σ = 0    Σ = 12

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{12}{6} = 2$

Coefficient of Variation (CV)
This is also known as the relative standard deviation.
It is a statistical measure of the dispersion of data points around the mean.
The metric is commonly used to compare the data dispersion between distinct series of data.
It is calculated as the ratio of the standard deviation to the mean, and is usually expressed as a percentage.

Coefficient of Variation (CV): Example
A study of the test scores in management principles and the years of service of the employees enrolled in the course resulted in the following statistics:
Mean test score = 200; standard deviation = 40.
Mean years of service = 20; standard deviation = 5.
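
The variance example and the course statistics above can be checked with a short Python sketch. The function names here are illustrative and not part of the lecture, and the variance uses the divide-by-N formula given in these notes.

```python
import math


def population_variance(values):
    """Mean of the squared deviations from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


def coefficient_of_variation(std_dev, mean):
    """Standard deviation relative to the mean, as a percentage."""
    return 100 * std_dev / mean


# Scores from the "Calculating Variance" example.
scores = [9, 8, 6, 5, 8, 6]
print("variance:", population_variance(scores))
print("standard deviation:", population_std(scores))

# Summary statistics from the management-principles study.
print("CV, test scores (%):", coefficient_of_variation(40, 200))
print("CV, years of service (%):", coefficient_of_variation(5, 20))
```

Taking the standard deviation and mean as inputs to the CV function, rather than raw data, mirrors the study example, where only the summary statistics are given.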
CV for test scores = (40/200) × 100 = 20 percent
CV for years of service = (5/20) × 100 = 25 percent
Hence, although test scores had the higher standard deviation, we cannot conclude that they have higher variation in their distribution than years of service. The CV shows that there is, in fact, higher variation in years of service.

CV Illustration
The following are prices of a standard small size pizza in various pizza joints in Accra.

Pizza Price (GHS)    Pizza Price (USD)
6                    1
12                   2
18                   3
24                   4
30                   5
36                   6
42                   7
48                   8
54                   9
60                   10
66                   11

             GHS      USD
Average      36       6
Variance     396      11
Std. Dev     19.90    3.32

What do you notice about the variance and the standard deviation?

Application of CV: Finance
Fred wants to find a new investment for his portfolio. He is looking for a safe investment that provides stable returns. He considers the following options for investment:
Stocks: Fred was offered stock of ABC Corp. It is a mature company with strong operational and financial performance. The volatility of the stock is 10% and the expected return is 14%.
ETFs: Another option is an Exchange Traded Fund (ETF) which tracks the performance of the S&P 500 index. The ETF offers an expected return of 13% with a volatility of 7%.
Bonds: Bonds with excellent credit ratings offer an expected return of 3% with 2% volatility.
By determining the coefficient of variation of different investment securities, an investor identifies the risk-to-reward ratio of each security and arrives at an investment decision.
Generally, an investor prefers a security with a lower coefficient of variation because it provides the best risk-to-reward ratio: low volatility relative to the expected return.
Based on the calculations, which investment would Fred choose and why?

Percentiles, Quartiles and Deciles
The median divides the data arranged in ascending order into two equal halves; it is also the value such that 50% of observations are below and 50% above.
Percentiles divide a set of observations into 100 equal parts.
A percentile is the value such that P% of observations are below and (100 - P)% are above this value.
For example, the 10th percentile is the value such that 10% of observations are below this value and 90% are above.
We saw earlier that we can locate the position of the median using the formula (n+1)/2.
We could write it generally as (n+1)(P/100), and since in this case P = 50, we get (n+1)(50/100) = (n+1)/2.

Quartiles divide a set of observations into 4 equal parts.
Hence there are 3 quartiles: the 1st quartile (which is the same as the 25th percentile); the 2nd quartile (which is the same as the 50th percentile or the median); and the 3rd quartile (which is the same as the 75th percentile).
So we just calculated the 1st quartile (= 25th percentile).

Similarly, deciles divide a set of observations into 10 equal parts, so there are 9 deciles.
The 1st decile is the same as the 10th percentile and the 5th decile is the same as the 50th percentile or the median.
In the same vein, each data set has 99 percentiles, thus dividing the data set into 100 equal parts.
The percentile formula described on the previous slide is applied in calculating quartiles as well as deciles.

Assuming we wish to calculate the 90th percentile, then Lp = (12+1)(90/100) = 11.7.
So the 90th percentile is 70% of the distance between the 11th and 12th observations, which gives 95 + 0.7(96 - 95) = 95.7.

The inter-quartile range is the distance between the third quartile Q3 and the first quartile Q1.
Inter-quartile range = third quartile - first quartile = Q3 - Q1
The percentile range is the distance between two stated percentiles.
The 10-to-90 percentile range is the distance between the 10th and 90th percentiles.
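
As a rough sketch, the location rule Lp = (n+1)(P/100) with interpolation between neighbouring observations could be coded as below. The data set behind the 90th-percentile example is not reproduced in these notes, so the `scores` list is hypothetical: it was chosen only so that it has 12 values whose 11th and 12th ordered observations are 95 and 96. The function name and the handling of locations that fall outside the data are likewise assumptions.

```python
def percentile(values, p):
    """p-th percentile using the (n + 1) * p / 100 location rule."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100      # 1-based position in the ordered data
    if location <= 1:                 # assumption: clamp to the smallest value
        return ordered[0]
    if location >= n:                 # assumption: clamp to the largest value
        return ordered[-1]
    lower = int(location)             # whole part: observation just below
    fraction = location - lower       # share of the gap to the next observation
    below = ordered[lower - 1]
    above = ordered[lower]
    return below + fraction * (above - below)


# Hypothetical data: 12 values, with 95 and 96 as the two largest.
scores = [43, 51, 58, 62, 67, 70, 74, 79, 83, 88, 95, 96]

p90 = percentile(scores, 90)          # location 11.7, between 95 and 96
q1 = percentile(scores, 25)
q3 = percentile(scores, 75)
iqr = q3 - q1                         # inter-quartile range
print("90th percentile:", round(p90, 2))
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Other conventions for locating percentiles exist, so statistical software may return slightly different values for the same data.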
Measure of Skew
Skew is a measure of symmetry in the distribution of scores.
A distribution is symmetric if it looks the same to the left and to the right of the center point.
Normal: skew = 0.
Positive skew (skewed right): the right tail is longer than the left tail.
Negative skew (skewed left): the left tail is longer than the right tail.

The following formula can be used to determine skew:
$s_3 = \frac{\sum (X - \bar{X})^3 / N}{\left( \sum (X - \bar{X})^2 / N \right)^{3/2}}$

If $s_3 < 0$, then the distribution has a negative skew.
If $s_3 > 0$, then the distribution has a positive skew.
If $s_3 = 0$, then the distribution is symmetrical.
The more different $s_3$ is from 0, the greater the skew in the distribution.
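
A minimal sketch of the skew calculation follows, assuming the usual moment form of the formula: the mean cubed deviation divided by the mean squared deviation raised to the power 3/2. The function name is illustrative, and the data sets reuse the two examples from the range discussion plus a mirrored version of the first one.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # mean squared deviation
    m3 = sum((x - mean) ** 3 for x in values) / n   # mean cubed deviation
    return m3 / m2 ** 1.5


symmetric = [1, 3, 5, 7, 9]       # evenly spread around 5
right_tail = [1, 1, 1, 1, 9]      # one large score pulls the right tail out
left_tail = [1, 9, 9, 9, 9]       # mirror image: one small score on the left

for name, data in [("symmetric", symmetric),
                   ("right tail", right_tail),
                   ("left tail", left_tail)]:
    print(name, round(skewness(data), 3))
```

The sign of each result matches the rules above: zero for the symmetric set, positive for the longer right tail, and negative for the longer left tail. The function is undefined when all scores are equal, since the denominator is then zero.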