MAT131 AIU Fall 2024 Lecture 3 PDF
Document Details
Alamein International University
2024
Dr. Mohammad Solayman
Summary
This lecture provides an overview of measures of dispersion in statistics, including range, interquartile range, and standard deviation. It explores how these measures describe the variability or spread of data points around a central tendency. Examples and solutions are included.
Full Transcript
Dr. Mohammad Solayman
Lecture (3)
Measures of Dispersion (Spread or Variability)

Measures of dispersion (variability or spread) describe the variability structure of a data set, that is, how far the observations are scattered from each other or how close together they lie. They describe whether the observations in a given data set are widely dispersed (large dispersion) or concentrated close to each other (small dispersion). For example (mean, median, and mode measure the center; the range measures dispersion):

| Data set | Mean | Median | Mode | Range |
|---|---|---|---|---|
| -5, 5, 5, 15 | 5 | 5 | 5 | 20 |
| -20, 5, 5, 30 | 5 | 5 | 5 | 50 |
| -40, 5, 5, 50 | 5 | 5 | 5 | 90 |

The three data sets have the same value for the three measures of central tendency (mean, median, mode), but their deviations from the center are different.

Note that all measures of dispersion are nonnegative. Also, a measure of dispersion takes the value zero only if all observations in the data set have the same value.

Measures of "absolute dispersion" are used to describe the variability structure of a single group, while measures of "relative dispersion", or "coefficients of variation", are used to compare the dispersion of two or more groups.

1- Measures of Absolute Dispersion

1- Range:
The range is the simplest measure of dispersion to calculate. It is obtained by taking the difference between the largest and the smallest values in a data set.

\[ R = \text{Largest} - \text{Smallest} \]

Example (1)
The following table gives the total areas in square miles of the four western south-central states of the United States.

| State | Total Area (square miles) |
|---|---|
| Arkansas | 53,182 |
| Louisiana | 49,651 |
| Oklahoma | 69,903 |
| Texas | 267,277 |

Find the range of this data set.

Solution:
Range = Max - Min = 267,277 - 49,651 = 217,626 square miles

Using Minitab:
1- Enter the data.
2- Select Stat > Basic Statistics > Display Descriptive Statistics.
3- Select Statistics, and check Range.
4- The result will appear in the Session window.

Some properties of the range:
1) Very simple to compute.
2) Its calculation is based on only two values, the largest and the smallest, and these two values may be outliers or extremes.
3) Sensitive to extreme values.

2- Inter-Quartile Range:
The I.Q.R. measures the range of only the middle 50% of the observations; it discards any information about the first and last quarters of the data.

\[ IQR = Q_3 - Q_1 \]

Example (2)
The following table gives the 2016 market values of five international companies.

| Company | Market value (billions of dollars) |
|---|---|
| PepsiCo | 75 |
| Google | 107 |
| PetroChina | 271 |
| Johnson & Johnson | 138 |
| Intel | 71 |

Find the IQR.

Solution:
1- Enter the data.
2- Select Stat > Basic Statistics > Display Descriptive Statistics.
3- Select Statistics, and check IQR.
4- The result will appear in the Session window.

Some properties of I.Q.R:
1) Depends on only 50% of the data.
2) Insensitive to extreme values or outliers.

Note that: The range and the I.Q.R. each depend on only two values: the smallest and the largest for the range, and Q1 and Q3 for the I.Q.R. A measure of dispersion that makes full use of the information provided by the data will certainly be better.

3- Standard deviation and Variance:

\[ \text{Standard deviation} = \sqrt{\text{Variance}} \]

The standard deviation is the most used measure of dispersion. Its value tells how closely the values of a data set are clustered around the mean. The standard deviation is obtained by taking the positive square root of the variance.

The variance calculated for population data is denoted by \( \sigma^2 \), and the variance calculated for sample data is denoted by \( S^2 \). Consequently, the standard deviation calculated for population data is denoted by \( \sigma \), and the standard deviation calculated for sample data is denoted by \( S \).

The formulas for the sample variance are:

\[ S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}, \qquad S = \sqrt{S^2} \]

Note that:
The values of the variance and the standard deviation are never negative.
The measurement units of the variance are always the square of the measurement units of the original data.

Example (3)
Find the sample standard deviation from the following data: 4, 9, 5

Solution:

| \( x \) | \( x - \bar{x} \) | \( (x - \bar{x})^2 \) | \( x^2 \) |
|---|---|---|---|
| 4 | 4 - 6 = -2 | 4 | 16 |
| 9 | 9 - 6 = 3 | 9 | 81 |
| 5 | 5 - 6 = -1 | 1 | 25 |
| \( \sum x = 18 \) | 0 | 14 | \( \sum x^2 = 122 \) |

Note that:

\[ \bar{x} = \frac{\sum x}{n} = \frac{18}{3} = 6 \]

The sample variance, using the first formula:

\[ S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{14}{3 - 1} = \frac{14}{2} = 7 \]

and using the second formula:

\[ S^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{122 - \frac{(18)^2}{3}}{3 - 1} = \frac{122 - 108}{2} = \frac{14}{2} = 7 \]

The sample standard deviation:

\[ S = \sqrt{7} = 2.646 \]

Using Minitab:
1- Enter the data.
2- Select Stat > Basic Statistics > Display Descriptive Statistics.
3- Select Statistics, and check Variance and Standard deviation.
4- The results will appear in the Session window.

Example (4)
The following data give the numbers of pieces of junk mail received by 10 families during the past month.

41 18 28 11 29 19 14 31 33 36

Find the range, IQR, variance, and standard deviation.

Solution:
1- Enter the data.
2- Select Stat > Basic Statistics > Display Descriptive Statistics.
3- Select Statistics, and check Range, IQR, Variance and Standard deviation.
4- The results will appear in the Session window.

Some properties of S:
1) Depends on all values.
2) Sensitive to extreme values.

Notes:
If all values in a data set are the same, for example 8, 8, 8, 8, 8, then:
▪ The mean = the median.
▪ No mode.
▪ All measures of dispersion = zero.

If \( y = a + b x \), then
▪ \( \text{Range}(y) = |b| \cdot \text{Range}(x) \)
▪ \( \text{I.Q.R}(y) = |b| \cdot \text{I.Q.R}(x) \)
▪ \( \text{Sd}(y) = |b| \cdot \text{Sd}(x) \)

Example (5)
If \( y = 9 - 2x \) and the variance of x is 25, then

\[ \text{Sd}(y) = |-2| \times \text{Sd}(x) = 2 \times \sqrt{25} = 2 \times 5 = 10 \]

Example:
If \( y = 5 - 2x \) and the variance of x is 10, then

\[ \text{Variance}(y) = (-2)^2 \times 10 = 4 \times 10 = 40 \]

2- Measures of Relative Dispersion (Coefficients of Variation)

Coefficient of variation "CV":

\[ CV = \frac{\sigma}{|\mu|} \times 100 \quad \text{or} \quad CV = \frac{S}{|\bar{X}|} \times 100 \quad \text{or} \quad CV = \frac{IQR}{|Q_2|} \times 100 \]

✓ To compare the dispersion of two (or more) groups, we take the ratio of a measure of absolute dispersion to the corresponding measure of central tendency.
✓ Note that coefficients of variation are unitless.

Example:
The mean annual salary in country (A) is $20,000 with a standard deviation of $2,000, while the mean annual salary in country (B) is $4,000 with a standard deviation of $1,000. Which country has less inequality in its salary distribution?

Solution:

| | A | B |
|---|---|---|
| \( \mu \) | 20,000 | 4,000 |
| \( \sigma \) | 2,000 | 1,000 |

\[ CV_A = \frac{\sigma}{|\mu|} \times 100 = \frac{2000}{|20000|} \times 100 = 10\% \]

\[ CV_B = \frac{\sigma}{|\mu|} \times 100 = \frac{1000}{|4000|} \times 100 = 25\% \]

Since \( CV_A < CV_B \), country (A) is relatively less varied than country (B), so country (A) has less inequality in its salary distribution than country (B).
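The worked examples in this lecture can also be checked outside Minitab. The following is a minimal Python sketch and is not part of the original lecture. The helper names `data_range`, `iqr`, and `coefficient_of_variation` are hypothetical, and the quartile rule (`method="exclusive"` in `statistics.quantiles`, which places quartiles at positions based on n + 1) is an assumption; other software or conventions may report slightly different quartiles.

```python
import math
import statistics


def data_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def iqr(values):
    """Inter-quartile range Q3 - Q1.

    Assumption: quartiles follow the "exclusive" (n + 1)-based position
    rule; other conventions can give slightly different results.
    """
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


def coefficient_of_variation(sd, center):
    """CV as a percentage: 100 * sd / |center|."""
    return 100 * sd / abs(center)


# Example (1): range of the four state areas (square miles).
areas = [53_182, 49_651, 69_903, 267_277]
print("Range:", data_range(areas))  # Range: 217626

# Example (2): IQR of the five market values (billions of dollars).
market_values = [75, 107, 271, 138, 71]
print("IQR:", iqr(market_values))

# Example (3): sample variance computed with both formulas.
x = [4, 9, 5]
n = len(x)
mean_x = sum(x) / n
var_definition = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
var_shortcut = (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)
sd_x = math.sqrt(var_definition)
print(var_definition, var_shortcut, round(sd_x, 3))  # 7.0 7.0 2.646

# Example (4): all four measures for the junk-mail data.
junk_mail = [41, 18, 28, 11, 29, 19, 14, 31, 33, 36]
print(data_range(junk_mail), iqr(junk_mail),
      statistics.variance(junk_mail), statistics.stdev(junk_mail))

# Linear transformation y = a + b*x: range and Sd scale by |b|.
a, b = 9, -2
y = [a + b * xi for xi in x]
print(data_range(y) == abs(b) * data_range(x))
print(math.isclose(statistics.stdev(y), abs(b) * statistics.stdev(x)))

# Salary example: the smaller CV indicates less relative dispersion.
cv_a = coefficient_of_variation(2000, 20000)
cv_b = coefficient_of_variation(1000, 4000)
print(cv_a, cv_b)  # 10.0 25.0
```

The sample functions `statistics.variance` and `statistics.stdev` divide by n - 1, which matches the sample formulas used in this lecture; the population versions (`pvariance`, `pstdev`) divide by n instead.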