Statistics Lecture Notes - Measures PDF

2.4 MEASURES

Some numbers give quantitative information about the magnitude of the observations (measures of central tendency), the variability of the observations (measures of variability) and the shape of their distribution (measures of shape), relative to a given random variable X. In the following, we consider both simple data $x_1, x_2, \dots, x_n$ (i.e. data in which each value occurs with unit frequency) and weighted data $x_1, x_2, \dots, x_r$ with absolute frequencies $f_1, f_2, \dots, f_r$, where $f_1 + f_2 + \cdots + f_r = n$ and n is the number of data.

Remark 2.1 In the case of a frequency distribution with grouped data, $x_1, x_2, \dots, x_r$ represent the central values of the classes, $f_1, f_2, \dots, f_r$ the corresponding absolute frequencies, and r is the number of classes.

2.4.1 MEASURES OF CENTRAL TENDENCY

The most common measures of central tendency are the averages.

Definition 2.2 The arithmetic mean, or simple mean, is given by

for simple data:
$$E(X) = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\left(x_1 + x_2 + \cdots + x_n\right) \tag{3}$$

for weighted data:
$$E(X) = \bar{x} = \frac{1}{n}\sum_{i=1}^{r} x_i f_i = \frac{1}{n}\left(x_1 f_1 + x_2 f_2 + \cdots + x_r f_r\right) \tag{4}$$

The arithmetic mean is denoted by $\mu$ when it refers to the population and by $\bar{x}$ when it refers to a sample.

Definition 2.3 The geometric mean is defined only for positive data and is given by

for simple data:
$$E_G(X) = \sqrt[n]{\prod_{i=1}^{n} x_i} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} \tag{5}$$

for weighted data:
$$E_G(X) = \sqrt[n]{\prod_{i=1}^{r} x_i^{f_i}} = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_r^{f_r}} \tag{6}$$

Definition 2.4 The quadratic mean, or root mean square, is given by

for simple data:
$$E_Q(X) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2} = \sqrt{\frac{1}{n}\left(x_1^2 + x_2^2 + \cdots + x_n^2\right)} \tag{7}$$

for weighted data:
$$E_Q(X) = \sqrt{\frac{1}{n}\sum_{i=1}^{r} x_i^2 f_i} = \sqrt{\frac{1}{n}\left(x_1^2 f_1 + x_2^2 f_2 + \cdots + x_r^2 f_r\right)} \tag{8}$$

Definition 2.5 The harmonic mean is given by

for simple data:
$$E_H(X) = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} \tag{9}$$

for weighted data:
$$E_H(X) = \frac{n}{\sum_{i=1}^{r} \frac{f_i}{x_i}} = \frac{n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_r}{x_r}} \tag{10}$$

Definition 2.6 The median is the value that occupies the central position in an ordered set of data, that is, the value such that, when the data are sorted in increasing order, the same number of values lie on its left and on its right.

Property 2.7 The median is not affected by extreme values, so it is used when we want to mitigate the effect of very high or very low extreme values. The median is also used with ordinal scales. To calculate the median we have to:
– sort the values;
– if the sample has an odd number of data, take the central value, in position (n + 1)/2; for example, the median of the ordered data 1, 4, 5, 7, 20 is 5;
– if the sample has an even number of data, take the average of the two middle values (positions n/2 and n/2 + 1); for example, the median of the ordered data 1, 4, 5, 7, 20, 21 is (5 + 7)/2 = 6.

N.B. To calculate the median of a frequency distribution for grouped data, see the calculation of quantiles.

Definition 2.8 The mode is the most frequent value of a distribution.

Property 2.9 The mode is not affected by the presence of extreme values, but in frequency distributions for grouped data it is very sensitive to the way the classes are constructed. It serves only descriptive purposes.

Definition 2.10 A frequency distribution is called unimodal if it has a single mode, bimodal if it also has a secondary mode (see Figure 9), and multimodal if it has more than two modes.

[Figure 9: Example of a bimodal frequency distribution.]

Definition 2.11 The mid-range is the arithmetic mean of the smallest and the largest observed values.

Property 2.12 The mid-range can be calculated quickly even with a large number of data. It should be used only when there are no outliers (e.g. erroneous values), since otherwise its value can be badly distorted.
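The averages, median, mode and mid-range above can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function names (`arithmetic_mean`, `modes`, and so on) are hypothetical, not taken from the notes. Passing a list of frequencies `fs` switches a mean from the simple-data formula to the weighted one; leaving it out assumes unit frequencies.

```python
import math
from collections import Counter


def _freqs(xs, fs):
    """Unit frequencies for simple data; otherwise check the given weights."""
    if fs is None:
        return [1] * len(xs)
    if len(fs) != len(xs):
        raise ValueError("xs and fs must have the same length")
    return list(fs)


def arithmetic_mean(xs, fs=None):
    """Eq. (3) for simple data, Eq. (4) for weighted data."""
    fs = _freqs(xs, fs)
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)


def geometric_mean(xs, fs=None):
    """Eqs. (5)-(6); positive data only. Logarithms avoid overflow in the product."""
    fs = _freqs(xs, fs)
    if any(x <= 0 for x in xs):
        raise ValueError("the geometric mean is defined only for positive data")
    return math.exp(sum(f * math.log(x) for x, f in zip(xs, fs)) / sum(fs))


def quadratic_mean(xs, fs=None):
    """Eqs. (7)-(8): root mean square."""
    fs = _freqs(xs, fs)
    return math.sqrt(sum(x * x * f for x, f in zip(xs, fs)) / sum(fs))


def harmonic_mean(xs, fs=None):
    """Eqs. (9)-(10); assumes no value is zero."""
    fs = _freqs(xs, fs)
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))


def median(xs):
    """Median of simple data, following the odd/even rule of Property 2.7."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]          # position (n + 1)/2, as a 0-based index
    return (s[n // 2 - 1] + s[n // 2]) / 2  # positions n/2 and n/2 + 1


def modes(xs):
    """Values tied for the highest frequency (a lower secondary peak is not detected)."""
    counts = Counter(xs)
    top = max(counts.values())
    return [x for x, c in counts.items() if c == top]


def mid_range(xs):
    """Arithmetic mean of the smallest and largest observations."""
    return (min(xs) + max(xs)) / 2


data = [1, 4, 5, 7, 20]
print(arithmetic_mean(data))              # 7.4
print(median(data), median(data + [21]))  # 5 6.0
print(mid_range(data))                    # 10.5
```

As a check on the weighted formulas, `arithmetic_mean([1, 2, 3], [2, 1, 1])` gives 1.75, the same as the simple mean of `[1, 1, 2, 3]`.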
For example, the mid-range is used in meteorology to compute the monthly average of precipitation or of temperature from a data series, since extreme values are unlikely to be present.

Definition 2.13 Quantiles and percentiles are measures of non-central tendency with purely descriptive purposes. More precisely, the q-quantile (0 < q < 1) is the value x such that a fraction q of the observations are smaller than x; equivalently, it is the value x whose cumulative relative frequency equals q. The q-quantile corresponds to the (100q)-percentile.

Property 2.14 To compute the q-quantile (0 < q < 1) for grouped data, we assume that the data are uniformly distributed within each class. Let α < β be the two consecutive cumulative relative frequencies such that α < q ≤ β (if no such α exists, we set α = 0), and let a < b be the upper actual endpoints of the corresponding classes (if α = 0, then a is the lower actual endpoint of the first class). Then the q-quantile is the value x such that
$$\frac{q - \alpha}{\beta - \alpha} = \frac{x - a}{b - a}.$$
This is the equation of the straight line in the (x, q) plane passing through the points (a, α) and (b, β); solving for x gives
$$x = a + (b - a)\,\frac{q - \alpha}{\beta - \alpha}.$$
For example, for the frequency distribution of Table 5, whose cumulative relative frequency is shown in Figure 10, to compute the 0.05-quantile we have q = 0.05, α = 0.025, β = 0.1, a = 79, b = 99, and so
$$\frac{0.05 - 0.025}{0.1 - 0.025} = \frac{x - 79}{99 - 79} \;\Rightarrow\; x = 79 + 20 \cdot \frac{0.025}{0.075} \approx 85.67,$$
that is, the 0.05-quantile is approximately 85.7, as can also be seen from Figure 10.

Remark 2.15 The median of a distribution of grouped data coincides with the 0.5-quantile.

[Figure 10: The cumulative relative frequency of the height of the 40 plants of Table 4. Vertical axis: cumulative relative frequency (pCi); horizontal axis: height of plants.]

2.4.2 MEASURES OF VARIABILITY

Definition 2.16 The range is the difference between the maximum and the minimum value of the data.
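The interpolation rule of Property 2.14 (quantiles of grouped data) can be sketched as below. This is a minimal illustration under the same uniform-within-class assumption; the function names and the representation of classes as a list of actual endpoints `edges` plus absolute frequencies `freqs` are hypothetical choices, and the small frequency table in the usage lines is made up, not Table 5.

```python
from itertools import accumulate


def interpolate_quantile(q, alpha, beta, a, b):
    """Solve (q - alpha)/(beta - alpha) = (x - a)/(b - a) for x."""
    return a + (b - a) * (q - alpha) / (beta - alpha)


def grouped_quantile(q, edges, freqs):
    """q-quantile of grouped data, assuming values are uniform within each class.

    edges: actual class endpoints [e0, e1, ..., er]; class k runs from e(k-1) to e(k).
    freqs: absolute frequencies f1, ..., fr of the r classes.
    """
    if not 0 < q < 1:
        raise ValueError("q must satisfy 0 < q < 1")
    if len(edges) != len(freqs) + 1:
        raise ValueError("need one more endpoint than classes")
    n = sum(freqs)
    cum = [c / n for c in accumulate(freqs)]  # cumulative relative frequencies
    alpha, a = 0.0, edges[0]  # before the first class: alpha = 0, a = its lower endpoint
    for beta, b in zip(cum, edges[1:]):
        if alpha < q <= beta:  # this class contains the q-quantile
            return interpolate_quantile(q, alpha, beta, a, b)
        alpha, a = beta, b
    raise AssertionError("unreachable: the last cumulative frequency is 1")


# Values quoted in the 0.05-quantile example of the notes.
print(round(interpolate_quantile(0.05, 0.025, 0.1, 79, 99), 2))  # 85.67

# Hypothetical grouped data: classes 60-80, 80-100, 100-120 with 1, 3, 6 observations.
edges, freqs = [60, 80, 100, 120], [1, 3, 6]
print(round(grouped_quantile(0.5, edges, freqs), 2))  # median (Remark 2.15): 103.33
```

Classes with zero frequency are skipped automatically, because then α = β and the condition α < q ≤ β cannot hold, so the division by β − α is never zero.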
Property 2.17 The range is intuitive and easy to compute, especially when the data are sorted. However, it does not measure how the data are distributed within the range, it is affected by the presence of outliers, and it is a purely descriptive measure.

Definition 2.18 The mean absolute deviation, or average deviation, is given by

for simple data:
$$S_m = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}| \tag{11}$$

for weighted data:
$$S_m = \frac{1}{n}\sum_{i=1}^{r} |x_i - \bar{x}|\, f_i \tag{12}$$

Definition 2.19 The median absolute deviation is the average of the absolute deviations of the observations from their median; it is calculated as above, replacing the arithmetic mean with the median.

Definition 2.20 The sum of the squared deviations from the mean (SQ) is the basis of the measures of data variability, and it can be calculated in two different ways:

Method I:
– for simple data (the sum of the squared deviations from the mean):
$$SQ = \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{13}$$
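The variability measures above (range, mean absolute deviation, median absolute deviation and SQ by Method I) can be sketched in Python as follows. It is a minimal illustration with hypothetical function names; `statistics.median` from the standard library supplies the median. The median absolute deviation follows Definition 2.19 exactly as stated in these notes (an average of absolute deviations from the median).

```python
import statistics


def data_range(xs):
    """Definition 2.16: maximum minus minimum."""
    return max(xs) - min(xs)


def mean_absolute_deviation(xs, fs=None):
    """Eqs. (11)-(12): average |x_i - mean|, weighted by f_i if frequencies are given."""
    fs = [1] * len(xs) if fs is None else fs
    n = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / n
    return sum(abs(x - mean) * f for x, f in zip(xs, fs)) / n


def median_absolute_deviation(xs):
    """Definition 2.19 as stated here: the *average* of |x_i - median| (simple data)."""
    # Caution: many software packages define the median absolute deviation as the
    # *median* of these absolute deviations instead; check which one you need.
    med = statistics.median(xs)
    return sum(abs(x - med) for x in xs) / len(xs)


def sum_of_squares(xs):
    """Eq. (13), Method I for simple data: sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


data = [1, 4, 5, 7, 20]
print(data_range(data))                         # 19
print(round(mean_absolute_deviation(data), 2))  # 5.04
print(median_absolute_deviation(data))          # 4.4
print(round(sum_of_squares(data), 2))           # 217.2
```

For the sample data, the mean is 7.4 and the median is 5, so the deviations from the median (4, 1, 0, 2, 15) average to 4.4, smaller than the mean absolute deviation of 5.04. The single large value 20 inflates the mean-based measure more.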
