Data Description: Measures of Variation

Clara Al Kosseifi

Goals

The arithmetic means of the two samples are both 200 mg/dL. However, the two samples appear radically different. The difference lies in the greater variability, or spread, of one sample relative to the other. Several different measures can be used to describe the variability of a sample.

Range

The range is the difference between the largest and the smallest observations in a sample. It is very easy to compute, but it is very sensitive to extreme observations. Another disadvantage of the range is that it depends on the sample size (n): the larger n is, the larger the range tends to be. Therefore, we cannot compare ranges from data sets of differing size.

Range: example

Quantiles/Percentiles

The pth percentile is the value $V_p$ such that p percent of the sample points are less than or equal to $V_p$. The median, being the 50th percentile, is a special case of a quantile.

To compute percentiles, the sample points must be ordered. This can be tedious by hand if n is even moderately large. Frequently used percentiles are the quartiles (25th, 50th, and 75th percentiles), the quintiles (20th, 40th, 60th, and 80th percentiles), and the deciles (10th, 20th, ..., 90th percentiles).

Variance and standard deviation

Measures of data variation are based on the difference, or distance, between each data value and the mean. This difference or distance is called the deviation.
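The range and percentile definitions above, and the deviations just introduced, can be illustrated with a small Python sketch. The percentile rule used here (take the next ordered value when np/100 is not an integer, otherwise average the two neighbouring ordered values) is one common convention and is an assumption, not necessarily the rule used on the original slides; library functions such as Python's statistics.quantiles use other interpolation rules and can give slightly different values. The function names and the two samples are hypothetical, chosen only so that both samples have mean 200.

```python
def sample_range(data):
    """Range: the largest observation minus the smallest observation."""
    return max(data) - min(data)


def percentile(data, p):
    """p-th percentile V_p for an integer p with 0 < p < 100.

    Convention assumed here: sort the data and let k = floor(n * p / 100).
    If n * p / 100 is not an integer, V_p is the (k + 1)-th smallest value;
    if it is an integer, V_p is the average of the k-th and (k + 1)-th smallest values.
    """
    if not (isinstance(p, int) and 0 < p < 100):
        raise ValueError("p must be an integer strictly between 0 and 100")
    x = sorted(data)  # percentiles require the sample points to be ordered
    k, remainder = divmod(len(x) * p, 100)  # integer arithmetic avoids rounding issues
    if remainder == 0:
        # the 1-based k-th and (k + 1)-th values are x[k - 1] and x[k] (0-based)
        return (x[k - 1] + x[k]) / 2
    # the 1-based (k + 1)-th value is x[k] (0-based)
    return x[k]


def deviations(data):
    """Deviation of each value from the arithmetic mean."""
    mean = sum(data) / len(data)
    return [xi - mean for xi in data]


# Hypothetical samples, both with arithmetic mean 200
sample_a = [190, 195, 200, 205, 210]
sample_b = [150, 175, 200, 225, 250]

print(sample_range(sample_a), sample_range(sample_b))  # 20 100
print([percentile(sample_b, p) for p in (25, 50, 75)])  # [175, 200, 225]
print(sum(deviations(sample_a)))  # 0.0
```

Both samples share the same mean, yet the second has five times the range, which is the kind of difference the measures of variation are meant to capture. The last line shows why the deviations cannot simply be averaged: they always sum to zero.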
Example: cholesterol measurements

The deviations from the mean always sum to zero: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$. To eliminate this problem, we square the deviations, sum the squares, and divide by n − 1 (where n is the total number of data values). This gives the sample variance

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

and the standard deviation $s = \sqrt{s^2}$. If the values are close to the mean, the variance $s^2$ is small; if the values are far from the mean, the variance $s^2$ is large.

Properties of the variance and standard deviation

If we create a translated sample $x_1 + c, \ldots, x_n + c$ by adding a constant c to each data point, the variance and standard deviation remain the same, because the positions of the points in the sample relative to one another remain the same. That is, if $y_i = x_i + c$, then $s_y^2 = s_x^2$ and $s_y = s_x$.

If we create a rescaled sample $c x_1, \ldots, c x_n$ by multiplying each data point by a constant c, the arithmetic mean of the rescaled sample is also rescaled, and so are the measures of spread. That is, if $y_i = c x_i$, then $\bar{y} = c \bar{x}$, $s_y^2 = c^2 s_x^2$, and $s_y = |c| s_x$.

Shortcut formulas

To save time when the repeated subtracting and squaring in the original formulas become tedious, we can use shortcut formulas that are mathematically equivalent to them, for example

$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$

The coefficient of variation (CV)

It is useful to relate the arithmetic mean and the standard deviation to each other:

$CV = \frac{s}{\bar{x}} \times 100\%$

The CV is most useful for comparing the variability of several different samples, each with a different arithmetic mean: comparing the CVs gives a more accurate comparison than comparing the standard deviations. The CV also remains the same regardless of the units used, because if the units change by a factor c, both the mean and the standard deviation change by the factor c, so their ratio does not change.

Grouped data

Consider the data set in Table 2.7, which represents the birthweights from 100 consecutive deliveries at a Boston hospital. The simplest way to display the data is to generate a frequency distribution.
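The variance, standard deviation, shortcut formula, translation and rescaling properties, and CV described above can be checked numerically with the Python sketch below. The function names are introduced only for this illustration, and the sample values are hypothetical; the standard library's statistics.variance and statistics.stdev compute the same (n − 1) sample quantities.

```python
import math


def mean(data):
    """Arithmetic mean xbar."""
    return sum(data) / len(data)


def variance(data):
    """Sample variance: s^2 = sum((x_i - xbar)^2) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    xbar = mean(data)
    return sum((xi - xbar) ** 2 for xi in data) / (n - 1)


def variance_shortcut(data):
    """Shortcut form: (sum(x_i^2) - (sum(x_i))^2 / n) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    total = sum(data)
    total_of_squares = sum(xi * xi for xi in data)
    return (total_of_squares - total * total / n) / (n - 1)


def std_dev(data):
    """Standard deviation: s = sqrt(s^2)."""
    return math.sqrt(variance(data))


def coefficient_of_variation(data):
    """CV = (s / xbar) * 100, in percent; only meaningful when the mean is positive."""
    return std_dev(data) / mean(data) * 100


sample = [190, 195, 200, 205, 210]  # hypothetical values
translated = [xi + 10 for xi in sample]  # add c = 10 to every point
rescaled = [2 * xi for xi in sample]  # multiply every point by c = 2

print(variance(sample), variance_shortcut(sample))  # 62.5 62.5
print(variance(translated))  # 62.5: translation leaves the variance unchanged
print(variance(rescaled))  # 250.0: rescaling by c multiplies the variance by c^2 = 4
print(math.isclose(coefficient_of_variation(sample), coefficient_of_variation(rescaled)))  # True
```

The shortcut form gives the same value here, but in floating-point arithmetic it can lose precision when the values are large relative to their spread (the two large sums nearly cancel), which is why the definition-based version is usually preferred in code.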
A frequency distribution is an ordered display of each value in a data set together with its frequency, that is, the number of times that value occurs in the data set. For a value b in a sample of size n, the frequency (Count) is the number of times b occurs; the relative frequency (Percent) is $\frac{\text{count}}{n} \times 100$; the cumulative frequency (CumCnt) is the number of data points in the sample that are less than or equal to b; and the cumulative percent (CumPct) is $\frac{\text{CumCnt}}{n} \times 100$.

Grouped data: steps

The data could be grouped into broader categories:
1. Subdivide the data into k intervals, starting at some lower bound $y_1$ and ending at some upper bound $y_{k+1}$.
2. The first interval is from $y_1$ inclusive to $y_2$ exclusive, the second is from $y_2$ inclusive to $y_3$ exclusive, and so on; the kth and last interval is from $y_k$ inclusive to $y_{k+1}$ exclusive.
3. The group intervals are generally chosen to be of equal width.
4. Count the number of units that fall in each interval.

Group interval       Frequency (count)   Percent   CumCnt   CumPct
29.5 ≤ x < 69.5                      5         5        5        5
69.5 ≤ x < 89.5                     10        10       15       15
89.5 ≤ x < 99.5                     11        11       26       26
99.5 ≤ x < 109.5                    19        19       45       45
109.5 ≤ x < 119.5                   17        17       62       62
119.5 ≤ x < 129.5                   20        20       82       82
129.5 ≤ x < 139.5                   12        12       94       94
139.5 ≤ x < 169.5                    6         6      100      100
Total                              100       100

For example, the percent for the first interval is 5/100 × 100 = 5. Because n = 100 here, each Percent equals its Count and each CumPct equals its CumCnt.

Summary

Variance and standard deviation can be used to determine the spread of the data: if the variance or standard deviation is large, the data are more dispersed. They are useful for comparing two (or more) data sets to determine which is more (most) variable, and they are used to determine the consistency of a variable. When two variables with different units are to be compared, we use the coefficient of variation.
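The grouping steps and the Count, Percent, CumCnt, and CumPct columns described above can be reproduced with a short Python sketch. It assumes half-open intervals [y_j, y_(j+1)) as in step 2 and reuses the interval bounds of the grouped table; the raw birthweights of Table 2.7 are not reproduced here, so the twelve data values are hypothetical and only illustrate the mechanics. The function name group_frequencies is likewise introduced only for this example.

```python
from bisect import bisect_right


def group_frequencies(data, bounds):
    """Frequency table over half-open intervals [bounds[j], bounds[j + 1]).

    bounds must be sorted ascending: y_1 < y_2 < ... < y_(k+1).
    Returns one row per interval: (lower, upper, count, percent, cum_count, cum_percent).
    """
    k = len(bounds) - 1
    counts = [0] * k
    for x in data:
        if not bounds[0] <= x < bounds[-1]:
            raise ValueError(f"{x} lies outside [{bounds[0]}, {bounds[-1]})")
        # bisect_right returns how many bounds are <= x; subtracting 1 gives the index
        # of the interval whose lower bound is inclusive and upper bound exclusive
        counts[bisect_right(bounds, x) - 1] += 1

    n = len(data)
    rows, cum_count = [], 0
    for j, count in enumerate(counts):
        cum_count += count
        rows.append((bounds[j], bounds[j + 1], count,
                     100 * count / n, cum_count, 100 * cum_count / n))
    return rows


# Interval bounds from the grouped table; the data values are hypothetical
bounds = [29.5, 69.5, 89.5, 99.5, 109.5, 119.5, 129.5, 139.5, 169.5]
data = [58, 75, 88, 96, 104, 107, 112, 118, 121, 125, 133, 141]

print(f"{'Group interval':<20}{'Count':>6}{'Percent':>9}{'CumCnt':>8}{'CumPct':>8}")
for lower, upper, count, pct, cum_count, cum_pct in group_frequencies(data, bounds):
    label = f"{lower} <= x < {upper}"
    print(f"{label:<20}{count:>6}{pct:>9.1f}{cum_count:>8}{cum_pct:>8.1f}")
```

Using a binary search over the sorted bounds keeps the lower-inclusive, upper-exclusive rule in one place, so a value that falls exactly on a boundary such as 69.5 is always counted in the higher interval.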
