MAT131 AIU Fall 2024 Lecture 4 (Modified) PDF
Alamein International University
2024
AIU
Dr. Mohammad Solayman
Summary
This document is a lecture on standardized values and measures of skewness in statistics. It includes examples and calculations related to comparing different data sets. A job opportunity example is included.
Full Transcript
Dr. Mohammad Solayman
Lecture (4)

The standardized values (\(Z\))

On certain occasions we need to compare two (or more) values that belong to different data sets, to determine the relative position of each value inside its own group. An important fact about standardized units is that their average is always zero and their standard deviation is always one.

\[ Z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \bar{x}}{S} \]

Example: You are offered two job opportunities in two different countries. The annual salary of the first is $36,000, where the average annual salary is $30,000 with a standard deviation of $4,000. The annual salary of the second is $28,000, where the average annual salary is $22,000 with a standard deviation of $2,000. Which offer is relatively better?

Solution: To compare the two offers, we first need to eliminate the differences between the mean annual salaries and their standard deviations. We do this by computing the standardized score of each offer and then comparing the resulting values.

                          First country    Second country
Annual salary             $36,000          $28,000
Average annual salary     $30,000          $22,000
Standard deviation        $4,000           $2,000

\[ Z_1 = \frac{36000 - 30000}{4000} = 1.5 \text{ units}, \qquad Z_2 = \frac{28000 - 22000}{2000} = 3.0 \text{ units} \]

Thus, the second offer may be considered better: its salary lies 3.0 standard deviations above the local average, compared with 1.5 for the first.

Measures of skewness (\(\beta\))

If one tail of the distribution curve is longer than the other, the distribution is said to be asymmetric or skewed. When the right tail is the longer one, the distribution is said to be skewed to the right (positively skewed). If the left tail is longer, the distribution is said to be skewed to the left (negatively skewed). To study the symmetry of the distribution of a data set we can check the histogram, the dot plot, or the box plot introduced in the next section.
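Returning to the standardized values introduced earlier in this lecture, the job-offer comparison can be checked with a short Python sketch. The function name `z_score` and the five-value list `sample` are illustrative assumptions, not part of the lecture; the sample is made-up data used only to show that standardized values have mean zero and standard deviation one.

```python
import statistics


def z_score(value, mean, std_dev):
    """Standardized value: Z = (value - mean) / standard deviation."""
    return (value - mean) / std_dev


# Job offers from the lecture example.
z_first = z_score(36_000, 30_000, 4_000)    # first country
z_second = z_score(28_000, 22_000, 2_000)   # second country
print(z_first, z_second)                    # 1.5 3.0

# Hypothetical sample (made-up numbers) illustrating the mean-0, SD-1 property.
sample = [10, 20, 30, 40, 50]
mean = statistics.mean(sample)
s = statistics.stdev(sample)                # sample standard deviation (divides by n - 1)
z_values = [z_score(x, mean, s) for x in sample]
print(statistics.mean(z_values), statistics.stdev(z_values))
```

The larger standardized value marks the offer that sits higher relative to its own salary distribution, which is why the second offer wins despite the smaller dollar amount.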
Also, we can compute the skewness coefficient using one of the following two formulas:

\[ \beta = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} \qquad \text{or} \qquad \beta = \frac{\sum \left( \dfrac{x - \bar{x}}{S} \right)^{3}}{n - 1} \]

              β negative               β = 0                    β positive
              Skewed to left           Symmetric                Skewed to right
Quartiles     (Q3 − Q2) < (Q2 − Q1)    (Q3 − Q2) = (Q2 − Q1)    (Q3 − Q2) > (Q2 − Q1)
Averages      Mean < Median < Mode     Mean = Median = Mode     Mean > Median > Mode

[Figure: three histograms of frequency against class, titled "Skewed to Left", "Normal", and "Skewed to Right".]

❖ Using Minitab:

Example (1): The following data are the homework time (in minutes) per night for a group of 25 second-year students:

12 27 30 32 37 39 41 42 42 42 43 43 45 47 47 50 52 55 58 62 70 78 81 85 105

Calculate the coefficient of skewness and comment.

Solution:
1- We sort the data.
2- We find Q1, Q2, Q3.
   Position of Q1 = \(\frac{n+1}{4} = \frac{25+1}{4} = 6.5\), so Q1 = X6 + 0.5 (X7 − X6) = 39 + 0.5 (41 − 39) = 40
   Position of Q2 = \(\frac{n+1}{2} = \frac{25+1}{2} = 13\), so Q2 = X13 = 45
   Position of Q3 = \(\frac{3(n+1)}{4} = \frac{3(25+1)}{4} = 19.5\), so Q3 = X19 + 0.5 (X20 − X19) = 58 + 0.5 (62 − 58) = 60
3- We calculate the coefficient of skewness.

\[ \beta = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{(60 - 45) - (45 - 40)}{60 - 40} = 0.5 \]

Using Minitab:
1- Enter the data.
2- Select Stat > Basic Statistics > Display Descriptive Statistics.
3- Select Statistics, and check First quartile, Median, and Third quartile.
4- The results will appear in the session window.
   [Figure: Minitab session window output.]
5- Then we use these values to calculate the coefficient of skewness:

\[ \beta = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{(60 - 45) - (45 - 40)}{60 - 40} = 0.5 \]

Comment: Since the coefficient of skewness is positive, the distribution is skewed to the right.

The Box-Whiskers Plot

A box plot (also called a box-whiskers plot) is a figure that is built up using 5 values. Yet, it reflects all the important information and features of the data distribution.
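Before moving on to the box plot steps, the quartile positions and the two skewness formulas from the homework-time example can be sketched in Python. The function names `quartile`, `quartile_skewness`, and `moment_skewness` are assumptions made for illustration; the lecture itself computes only the quartile-based coefficient, so no value is quoted here for the second formula.

```python
import statistics


def quartile(sorted_data, k):
    """k-th quartile (k = 1, 2, 3) located at position k(n + 1)/4, counting from 1."""
    n = len(sorted_data)
    position = k * (n + 1) / 4
    whole = int(position)            # e.g. 6 for position 6.5
    fraction = position - whole      # e.g. 0.5 for position 6.5
    if whole < 1:
        return sorted_data[0]
    if whole >= n:
        return sorted_data[-1]
    lower_value = sorted_data[whole - 1]
    upper_value = sorted_data[whole]
    return lower_value + fraction * (upper_value - lower_value)


def quartile_skewness(q1, q2, q3):
    """beta = ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1)."""
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


def moment_skewness(data):
    """beta = sum(((x - mean) / S) ** 3) / (n - 1), with S the sample standard deviation."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return sum(((x - mean) / s) ** 3 for x in data) / (len(data) - 1)


homework = sorted([12, 27, 30, 32, 37, 39, 41, 42, 42, 42, 43, 43, 45,
                   47, 47, 50, 52, 55, 58, 62, 70, 78, 81, 85, 105])

q1, q2, q3 = (quartile(homework, k) for k in (1, 2, 3))
print(q1, q2, q3)                      # 40.0 45.0 60.0
print(quartile_skewness(q1, q2, q3))   # 0.5
print(moment_skewness(homework))       # sign indicates the direction of skewness
```

Both coefficients are read the same way: a positive value indicates skewness to the right, a negative value skewness to the left.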
To draw a box plot:
1- Find the five values: Smallest observation, \(Q_1\), \(Q_2\), \(Q_3\), Largest observation.
2- Calculate two limits (fences):
   Upper Limit (fence) = \(Q_3 + 1.5 \times (Q_3 - Q_1)\)
   Lower Limit (fence) = \(Q_1 - 1.5 \times (Q_3 - Q_1)\)
3- Draw a vertical line (from the Smallest to the Largest observation) to represent the scale of measurement.
4- Draw a box extending from Q1 to Q3.
5- Draw a line across the box at the median.
6- The whiskers are lines that extend from the top and bottom of the box:
   The upper "whisker" is connected to: Min. (Largest observation, Upper Limit)
   The lower "whisker" is connected to: Max. (Smallest observation, Lower Limit)
7- Outliers are points outside the lower and upper limits, plotted with asterisks (*). A value \(y\) is said to be an outlier if \(y < \text{Lower Limit}\) or \(y > \text{Upper Limit}\).

[Figure: labelled box plot showing, from top to bottom, the Largest observation (an outlier marked *), the upper whisker end Min (Largest, Upper), Q3, Q2, Q1, the lower whisker end Max (Smallest, Lower), and the Smallest observation.]

Example (1): Draw the Box Plot from the following ordered data:

Position:   1   2   3   4   5   6   7   8   9   10   11
Value:     50  60  70  72  73  75  88  89  90  105  140

Smallest = 50 (position 1), Q1 at position 3, Q2 at position 6, Q3 at position 9, Largest = 140 (position 11).

Solution:
position(\(Q_1\)) = \(\frac{11+1}{4} = 3\), position(\(Q_2\)) = \(\frac{11+1}{2} = 6\), position(\(Q_3\)) = \(\frac{3(11+1)}{4} = 9\)
✓ \(Q_1 = 70\), \(Q_2 = 75\), \(Q_3 = 90\) and \(IQR = Q_3 - Q_1 = 90 - 70 = 20\).
✓ Upper Limit (fence) = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 90 + 1.5(90 - 70) = 90 + 30 = 120\)
✓ Lower Limit (fence) = \(Q_1 - 1.5 \times (Q_3 - Q_1) = 70 - 1.5(90 - 70) = 70 - 30 = 40\)
✓ Draw a vertical line (from 50 to 140) to represent the scale of measurement.
✓ Draw a box using \(Q_1\), \(Q_2\), \(Q_3\).
✓ The upper "whisker" is connected to: Min. (Largest, Upper Limit) = Min (140, 120) = 120
✓ The lower "whisker" is connected to: Max. (Smallest, Lower Limit) = Max (50, 40) = 50

[Figure: box plot on a vertical scale from 50 to 140.]

Notes:
The distribution is skewed to the right (or positively skewed) because \((Q_3 - Q_2) > (Q_2 - Q_1)\).
140 is an outlier, because 140 > 120 (upper limit).
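The fence, whisker, and outlier rules from the steps above can be sketched in Python and applied to Example (1). The function name `box_plot_limits` and the dictionary keys are assumptions chosen for illustration; the quartiles are passed in as already found from the position formulas.

```python
def box_plot_limits(sorted_data, q1, q3):
    """Fences, whisker ends, and outliers, following the lecture's rules."""
    iqr = q3 - q1
    upper_limit = q3 + 1.5 * iqr
    lower_limit = q1 - 1.5 * iqr
    smallest, largest = sorted_data[0], sorted_data[-1]
    return {
        "upper_limit": upper_limit,
        "lower_limit": lower_limit,
        # Whisker ends as defined in the lecture: Min(Largest, Upper), Max(Smallest, Lower).
        "upper_whisker": min(largest, upper_limit),
        "lower_whisker": max(smallest, lower_limit),
        # A value y is an outlier if y < lower limit or y > upper limit.
        "outliers": [y for y in sorted_data if y < lower_limit or y > upper_limit],
    }


data = [50, 60, 70, 72, 73, 75, 88, 89, 90, 105, 140]
summary = box_plot_limits(data, q1=70, q3=90)
print(summary)
```

This follows the lecture's convention of ending a whisker at the fence when an outlier lies beyond it. Many plotting packages instead end the whisker at the most extreme observation that is still inside the fence, so a software-drawn plot of the same data may show a shorter whisker.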
Example (2): Draw the Box Plot from the following ordered data:

Position:   1   2   3   4   5   6   7   8   9   10   11
Value:     20  60  70  72  73  85  88  89  90  105  110

Smallest = 20 (position 1), Q1 at position 3, Q2 at position 6, Q3 at position 9, Largest = 110 (position 11).

Solution:
position(\(Q_1\)) = \(\frac{11+1}{4} = 3\), position(\(Q_2\)) = \(\frac{11+1}{2} = 6\), position(\(Q_3\)) = \(\frac{3(11+1)}{4} = 9\)
✓ \(Q_1 = 70\), \(Q_2 = 85\), \(Q_3 = 90\) and \(IQR = Q_3 - Q_1 = 90 - 70 = 20\).
✓ Upper Limit (fence) = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 90 + 1.5(90 - 70) = 90 + 30 = 120\)
✓ Lower Limit (fence) = \(Q_1 - 1.5 \times (Q_3 - Q_1) = 70 - 1.5(90 - 70) = 70 - 30 = 40\)
✓ Draw a vertical line (from 20 to 110) to represent the scale of measurement.
✓ Draw a box using \(Q_1\), \(Q_2\), \(Q_3\).
✓ The upper "whisker" is connected to: Min. (Largest, Upper Limit) = Min (110, 120) = 110
✓ The lower "whisker" is connected to: Max. (Smallest, Lower Limit) = Max (20, 40) = 40

[Figure: box plot on a vertical scale from 20 to 110.]

Notes:
The distribution is skewed to the left (or negatively skewed) because \((Q_3 - Q_2) < (Q_2 - Q_1)\).
20 is an outlier, because 20 < 40 (lower limit).

Example (3): Draw the Box Plot from the following ordered data:

Position:   1   2   3   4   5   6   7   8   9   10   11
Value:     50  60  70  72  73  80  88  89  90  105  110

Smallest = 50 (position 1), Q1 at position 3, Q2 at position 6, Q3 at position 9, Largest = 110 (position 11).

Solution:
position(\(Q_1\)) = \(\frac{11+1}{4} = 3\), position(\(Q_2\)) = \(\frac{11+1}{2} = 6\), position(\(Q_3\)) = \(\frac{3(11+1)}{4} = 9\)
✓ \(Q_1 = 70\), \(Q_2 = 80\), \(Q_3 = 90\) and \(IQR = Q_3 - Q_1 = 90 - 70 = 20\).
✓ Upper Limit (fence) = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 90 + 1.5(90 - 70) = 90 + 30 = 120\)
✓ Lower Limit (fence) = \(Q_1 - 1.5 \times (Q_3 - Q_1) = 70 - 1.5(90 - 70) = 70 - 30 = 40\)
✓ Draw a vertical line (from 50 to 110) to represent the scale of measurement.
✓ Draw a box using \(Q_1\), \(Q_2\), \(Q_3\).
✓ The upper "whisker" is connected to: Min. (Largest, Upper Limit) = Min (110, 120) = 110
✓ The lower "whisker" is connected to: Max. (Smallest, Lower Limit) = Max (50, 40) = 50

[Figure: box plot on a vertical scale from 50 to 110.]

Notes:
The distribution is symmetric because \((Q_3 - Q_2) = (Q_2 - Q_1)\).
There are no outliers.
Example (4): Draw the Box Plot from the following ordered data:

Position:   1   2   3   4   5   6   7   8   9   10   11
Value:     10  60  70  72  73  80  88  89  90  105  130

Smallest = 10 (position 1), Q1 at position 3, Q2 at position 6, Q3 at position 9, Largest = 130 (position 11).

Solution:
position(\(Q_1\)) = \(\frac{11+1}{4} = 3\), position(\(Q_2\)) = \(\frac{11+1}{2} = 6\), position(\(Q_3\)) = \(\frac{3(11+1)}{4} = 9\)
✓ \(Q_1 = 70\), \(Q_2 = 80\), \(Q_3 = 90\) and \(IQR = Q_3 - Q_1 = 90 - 70 = 20\).
✓ Upper Limit (fence) = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 90 + 1.5(90 - 70) = 90 + 30 = 120\)
✓ Lower Limit (fence) = \(Q_1 - 1.5 \times (Q_3 - Q_1) = 70 - 1.5(90 - 70) = 70 - 30 = 40\)
✓ Draw a vertical line (from 10 to 130) to represent the scale of measurement.
✓ Draw a box using \(Q_1\), \(Q_2\), \(Q_3\).
✓ The upper "whisker" is connected to: Min. (Largest, Upper Limit) = Min (130, 120) = 120
✓ The lower "whisker" is connected to: Max. (Smallest, Lower Limit) = Max (10, 40) = 40

[Figure: box plot on a vertical scale from 10 to 130.]

Notes:
The distribution is symmetric because \((Q_3 - Q_2) = (Q_2 - Q_1)\).
130 is an outlier, because 130 > 120 (upper limit).
10 is an outlier, because 10 < 40 (lower limit).

❖ Using Minitab:

Example: Draw the Box Plot from the following ordered data using Minitab.

20 60 70 72 73 85 88 89 90 105 110

Solution:
1- Enter the data.
2- Calculate the five values: minimum, maximum, Q1, Q2, Q3. To find the sample statistics, select Stat > Basic Statistics > Display Descriptive Statistics. In the dialogue box that opens, enter the name of the column where the data are stored in the Variables box.
   [Figure: Display Descriptive Statistics dialogue box.]
   Then select the Statistics button, choose the summary measures you want to calculate, and click OK. The result will appear in the session window.
   [Figure: Minitab session window output.]
3- To draw the Box plot, select Graph > Boxplot > Simple and click OK.
   [Figure: the resulting box plot.]

Comment:
The distribution is skewed to the left (or negatively skewed) because \((Q_3 - Q_2) < (Q_2 - Q_1)\).
20 is an outlier, because 20 < 40 (lower limit).
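For readers without Minitab, the five values for the data set 20, 60, ..., 110 can be cross-checked with Python's standard library. This is a sketch under one assumption worth stating: the `"exclusive"` method of `statistics.quantiles` places quartiles using n + 1 positions, which agrees with the lecture's position formulas for this data set. The variable names are illustrative.

```python
import statistics

data = [20, 60, 70, 72, 73, 85, 88, 89, 90, 105, 110]

# Quartiles located with n + 1 positions, as in the lecture's formulas.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
five_values = (min(data), q1, q2, q3, max(data))
print(five_values)                      # (20, 70.0, 85.0, 90.0, 110)

# Fences and outliers, using the rules from the box plot section.
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [y for y in data if y < lower_limit or y > upper_limit]
print(lower_limit, upper_limit, outliers)   # 40.0 120.0 [20]

# Direction of skewness from the two halves of the box.
skewed_left = (q3 - q2) < (q2 - q1)
print(skewed_left)                      # True
```

The output matches the comment in the Minitab example: the distribution is skewed to the left and 20 falls below the lower limit of 40.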
Exercises

Exercise (1): The following is the homework time (in minutes) per night for a group of 25 second-year students:

Position:    1    2    3    4    5    6    7    8    9   10   11   12   13
Time:       12   27   30   32   37   39   41   42   42   42   43   43   45

Position:   14   15   16   17   18   19   20   21   22   23   24   25
Time:       47   47   50   52   55   58   62   70   78   81   85  105

1) Find \(Q_1\), \(Q_2\), and \(Q_3\).
2) Draw the Box-Plot.
3) Is the value "5" an outlier? Why?
4) Determine the direction of the skewness (without calculation).
5) Calculate the coefficient of skewness, and comment.

Exercise (2): Given the following information:

Smallest    First quartile    Median    Third quartile    Largest
2           16                20        24                32

1) Draw the Box-Whiskers plot.
2) Calculate the coefficient of skewness, and comment.
3) Can you find the mean? Why?
4) Calculate a coefficient of variation.

Exercise (3): Given the following measures on the Statistics scores for a group of 25 second-year students:

The smallest 3 values    The first quartile    The median    The third quartile    The largest 3 values
30, 33, 37               40                    45            60                    72, 80, 98

1) Draw the Box-Plot.
2) Find a measure of central tendency and another of homogeneity.
3) Is the value "15" an outlier? Why?
4) Determine the direction of the skewness (without calculation).
5) Calculate the coefficient of skewness, and comment.
6) Is it a difficult exam? Why?

Some Comprehensive Examples

Example (1): An experiment was conducted to compare the efficiency of males and females in executing a certain task. The following Box-Whiskers plots represent the distribution of the execution time (in minutes) of a random sample from each gender:

[Figure: "Boxplot of Female; Male", side-by-side box plots of Time (vertical axis, 5 to 25 minutes) for the Female and Male groups.]

a) Use the figure to find the values of the measures shown in the following table:

            Smallest observation    Q1    Median    Q3    Largest observation
Female
Male

b) Describe the type of skewness in each group. Justify your answer.
c) Determine the outlying observations. Justify your answer.
d) Compare the general level of observations of the males and females.
e) Compare the dispersion (or homogeneity) of the two groups. Justify your answer.
f) Which gender is more efficient in executing the task? Justify your answer.
g) Explain why the upper whisker of the males' plot is not extended to reach 25.
h) Explain why we can use the figure to determine the female mean execution time, while we cannot determine it for the male group.
i) Is it more achievable for a female or a male to execute the task in just one minute?
j) Find the percentage of females who execute the task in more than 10 minutes.

Solution:

a)
            Min    Q1    Median    Q3    Max
Female      5      7     10        13    15
Male        9      11    12        15    25

b)
Female: Symmetric, because \((Q_3 - Q_2) = (Q_2 - Q_1)\): 3 = 3.
Male: Skewed to right, because \((Q_3 - Q_2) > (Q_2 - Q_1)\): 3 > 1.

c)
Female:
Upper Limit = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 13 + 1.5(6) = 13 + 9 = 22\)
Lower Limit = \(Q_1 - 1.5 \times (Q_3 - Q_1) = 7 - 1.5(6) = 7 - 9 = -2\)
15 is not > 22 and 5 is not < −2. No outliers.
Male:
Upper Limit = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 15 + 1.5(4) = 15 + 6 = 21\)
Lower Limit = \(Q_1 - 1.5 \times (Q_3 - Q_1) = 11 - 1.5(4) = 11 - 6 = 5\)
25 > 21, so 25 is an outlier. 9 is not < 5, so 9 is not an outlier.

d)
Female: The median \(Q_2 = 10\), less than the male median.
Male: The median \(Q_2 = 12\).

e)
Female: \(CV = \frac{IQR}{Q_2} \times 100 = \frac{6}{10} \times 100 = 60\%\). More dispersion.
Male: \(CV = \frac{IQR}{Q_2} \times 100 = \frac{4}{12} \times 100 = 33.3\%\).

f) Females are more efficient, because the median time for females is less than the median time for males.

g) The upper "whisker" is connected to: Min. (Largest, Upper Limit) = Min. (25, 21) = 21.

h) If the distribution is symmetric, then Mean = Median = 10 (female), while the distribution of the males is skewed to the right, and then Mean > Median.

i)
Female: 1 is not less than −2 (lower limit), so 1 is not an outlier. Then a female is more [...]
Male: 1 [...]

[...]

Program (1): 40 > 35, 40 is an outlier.
Program (2): 40 > 37, 40 is an outlier.
Program (3): 40 is not > 45, 40 is not an outlier.

(6) Program (3), because 40 is not an outlier.
(7)
Program (1): Symmetric, \((Q_3 - Q_2) = (Q_2 - Q_1)\): 5 = 5.
Program (2): Skewed to right, \((Q_3 - Q_2) > (Q_2 - Q_1)\): 6 > 4.
Program (3): Skewed to left, \((Q_3 - Q_2) < (Q_2 - Q_1)\): 6 < 8.

(8) Program (3), skewed to left.

Example (3): The following table summarizes some measures on the total returns in three common stocks over a 30-year period.

                  Stock (1)    Stock (2)    Stock (3)
First quartile    7            8            10
Median            11           10           16
Third quartile    15           16           18

Required:
(1) Compare the general level of the total returns in the three stocks.
(2) Which stock is the most risky?
(3) For each stock, determine whether a total return of 29 may be considered an outlier or not.
(4) If a risk-taking investor tries to achieve a total return of 29, which stock do you recommend investing in?
(5) For each stock, determine the direction of skewness in the total returns.
(6) Depending on your answer in (5), which stock do you recommend investors to invest in?
(7) For each stock, determine the relationship between the mode and the median of the total returns.

Solution:

(1)
Stock (1): The median \(Q_2 = 11\).
Stock (2): The median \(Q_2 = 10\).
Stock (3): The median \(Q_2 = 16\) (highest).

(2)
Stock (1): \(CV = \frac{IQR}{Q_2} \times 100 = \frac{15 - 7}{11} \times 100 = 72.7\%\)
Stock (2): \(CV = \frac{IQR}{Q_2} \times 100 = \frac{16 - 8}{10} \times 100 = 80\%\). More dispersion; more risky.
Stock (3): \(CV = \frac{IQR}{Q_2} \times 100 = \frac{18 - 10}{16} \times 100 = 50\%\)

(3)
Stock (1): Upper Fence = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 15 + 1.5 \times (8) = 27\). 29 > 27, so 29 is an outlier.
Stock (2): Upper Fence = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 16 + 1.5 \times (8) = 28\). 29 > 28, so 29 is an outlier.
Stock (3): Upper Fence = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 18 + 1.5 \times (8) = 30\). 29 is not > 30, so 29 is not an outlier.

(4) Stock (3), because 29 is not an outlier.

(5)
Stock (1): Symmetric, \((Q_3 - Q_2) = (Q_2 - Q_1)\): 4 = 4.
Stock (2): Skewed to right, \((Q_3 - Q_2) > (Q_2 - Q_1)\): 6 > 2.
Stock (3): Skewed to left, \((Q_3 - Q_2) < (Q_2 - Q_1)\): 2 < 6.

(6) Stock (3), because the distribution is skewed to the left, which means that the majority of its total returns are high.

(7)
Stock (1): Mode = Median (11).
Stock (2): Mode < Median (10).
Stock (3): Mode > Median (16).
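The three-stock comparison in Example (3) repeats the same three calculations per column, so it can be sketched compactly in Python. The helper names `quartile_cv`, `upper_fence`, and `skew_direction`, and the `report` dictionary, are assumptions for illustration; the formulas are the ones used in the solution above.

```python
# (Q1, median, Q3) for each stock, taken from the table in Example (3).
stocks = {
    "Stock (1)": (7, 11, 15),
    "Stock (2)": (8, 10, 16),
    "Stock (3)": (10, 16, 18),
}


def quartile_cv(q1, q2, q3):
    """Quartile-based coefficient of variation: IQR / median x 100."""
    return (q3 - q1) * 100 / q2


def upper_fence(q1, q3):
    """Upper fence: Q3 + 1.5 x (Q3 - Q1)."""
    return q3 + 1.5 * (q3 - q1)


def skew_direction(q1, q2, q3):
    """Compare the upper and lower halves of the box."""
    upper_half, lower_half = q3 - q2, q2 - q1
    if upper_half > lower_half:
        return "skewed to right"
    if upper_half < lower_half:
        return "skewed to left"
    return "symmetric"


target = 29
report = {}
for name, (q1, q2, q3) in stocks.items():
    fence = upper_fence(q1, q3)
    report[name] = {
        "cv_percent": round(quartile_cv(q1, q2, q3), 1),
        "upper_fence": fence,
        "target_is_outlier": target > fence,
        "skewness": skew_direction(q1, q2, q3),
    }
    print(name, report[name])

most_risky = max(report, key=lambda name: report[name]["cv_percent"])
print(most_risky)   # Stock (2)
```

Keeping the quartiles in one table and applying the same functions to every column avoids the kind of copy-and-paste slips that easily creep into side-by-side hand calculations.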
Example (4): Given the following information about the students' Math scores in sections A and B:

Section           A     B
Smallest          40    50
First quartile    62    65
Median            70    70
Third quartile    72    85
Largest           80    100

1) Draw the Box-Plots for both sections.
2) Compare the general levels in the two sections.
3) Compare the stability of the scores in the two sections.
4) Calculate the coefficient of skewness in each section.

Solution:

1)
Section A:
✓ Upper Limit (fence) = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 72 + 1.5(72 - 62) = 72 + 15 = 87\)
✓ Lower Limit (fence) = \(Q_1 - 1.5 \times (Q_3 - Q_1) = 62 - 1.5(72 - 62) = 62 - 15 = 47\)
✓ Draw a box using \(Q_1\), \(Q_2\), \(Q_3\).
✓ The upper "whisker" is connected to: Min. (Largest, Upper Limit) = Min. (80, 87) = 80
✓ The lower "whisker" is connected to: Max. (Smallest, Lower Limit) = Max (40, 47) = 47
✓ 40 is an outlier, because 40 < 47 (lower limit).

Section B:
✓ Upper Limit (fence) = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 85 + 1.5(85 - 65) = 85 + 30 = 115\)
✓ Lower Limit (fence) = \(Q_1 - 1.5 \times (Q_3 - Q_1) = 65 - 1.5(85 - 65) = 65 - 30 = 35\)
✓ Draw a box using \(Q_1\), \(Q_2\), \(Q_3\).
✓ The upper "whisker" is connected to: Min. (Largest, Upper Limit) = Min. (100, 115) = 100
✓ The lower "whisker" is connected to: Max. (Smallest, Lower Limit) = Max (50, 35) = 50

[...]
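Parts 2) to 4) of Example (4) can be worked from the table with formulas already used in this lecture: the median for the general level, the quartile-based coefficient of variation from the earlier comprehensive examples for stability, and the quartile-based skewness coefficient. The following is an illustrative working under those assumptions, not the lecturer's own solution; the subscripts A and B are notation added here.

```latex
\[
\begin{aligned}
\text{General level:}\quad & Q_{2,A} = Q_{2,B} = 70 \\[4pt]
\text{Stability:}\quad & CV_A = \frac{72 - 62}{70} \times 100 \approx 14.3\%, \qquad
                         CV_B = \frac{85 - 65}{70} \times 100 \approx 28.6\% \\[4pt]
\text{Skewness:}\quad & \beta_A = \frac{(72 - 70) - (70 - 62)}{72 - 62} = \frac{2 - 8}{10} = -0.6 \\
                      & \beta_B = \frac{(85 - 70) - (70 - 65)}{85 - 65} = \frac{15 - 5}{20} = 0.5
\end{aligned}
\]
```

On these figures the two sections have the same general level, the scores in section A are more stable (smaller coefficient of variation), section A is skewed to the left, and section B is skewed to the right.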
Answers to exercises

Exercise (1):
1) \(Q_1 = 39 + 0.5(41 - 39) = 40\), \(Q_2 = 45\), \(Q_3 = 58 + 0.5(62 - 58) = 60\)
2) Upper Limit (fence) = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 60 + 1.5(60 - 40) = 60 + 30 = 90\)
   Lower Limit (fence) = \(Q_1 - 1.5 \times (Q_3 - Q_1) = 40 - 1.5(60 - 40) = 40 - 30 = 10\)
   [Figure: box plot on a vertical scale from 12 to 105.]
3) 5 < 10 (lower limit), so 5 is an outlier.
4) \((Q_3 - Q_2) > (Q_2 - Q_1)\): (60 − 45) > (45 − 40). Skewed to right.
5) \(\beta = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{15 - 5}{20} = 0.5\) (skewed to right, or positively skewed).

Exercise (2):
1) \((Q_3 - Q_1) = (24 - 16) = 8\)
▪ Upper Limit (fence) = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 24 + 1.5(8) = 24 + 12 = 36\)
▪ Lower Limit (fence) = \(Q_1 - 1.5 \times (Q_3 - Q_1) = 16 - 1.5(8) = 16 - 12 = 4\)
▪ Draw a vertical line (from 2 to 32) to represent the scale of measurement.
▪ Draw a box using \(Q_1\), \(Q_2\), \(Q_3\).
▪ The upper "whisker" is connected to: Min. (Largest, Upper Limit) = Min (32, 36) = 32
▪ The lower "whisker" is connected to: Max. (Smallest, Lower Limit) = Max (2, 4) = 4
▪ The graph:
   [Figure: box plot on a vertical scale from 2 to 32.]
2) \(\beta = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{(24 - 20) - (20 - 16)}{24 - 16} = \frac{4 - 4}{8} = \frac{0}{8} = 0\) (symmetric distribution).
3) Because the distribution is symmetric, Mean = Median = 20.
4) \(CV = \frac{Q_3 - Q_1}{Q_2} \times 100 = \frac{8}{20} \times 100 = 40\%\)

Exercise (3):
1) Smallest = 30, Largest = 98.
▪ Upper Limit (fence) = \(Q_3 + 1.5 \times (Q_3 - Q_1) = 60 + 1.5(60 - 40) = 60 + 30 = 90\)
▪ Lower Limit (fence) = \(Q_1 - 1.5 \times (Q_3 - Q_1) = 40 - 1.5(60 - 40) = 40 - 30 = 10\)
▪ Draw a vertical line (from 30 to 98) to represent the scale of measurement.
▪ Draw a box using \(Q_1\), \(Q_2\), \(Q_3\).
▪ The upper "whisker" is connected to: Min. (Largest, Upper Limit) = Min (98, 90) = 90
▪ The lower "whisker" is connected to: Max. (Smallest, Lower Limit) = Max (30, 10) = 30
▪ The graph:
   [Figure: box plot on a vertical scale from 30 to 98.]
2) The median \(Q_2 = 45\) is a measure of central tendency, and the measure of homogeneity is the interquartile range: \(IQR = Q_3 - Q_1 = 60 - 40 = 20\).
3) 15 is not less than 10 (lower fence), so 15 is not an outlier.
4) The distribution is skewed to the right (or positively skewed), because \((Q_3 - Q_2) > (Q_2 - Q_1)\): (60 − 45) > (45 − 40).
5) \(\beta = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{(60 - 45) - (45 - 40)}{60 - 40} = \frac{15 - 5}{20} = \frac{10}{20} = 0.5\) (skewed to right).
6) The distribution is skewed to the right, so the majority get low scores and the exam is difficult.