Statistics Lecture Notes - Measures PDF
Summary
This document discusses various measures in statistics, including arithmetic mean, geometric mean, quadratic mean, and harmonic mean for simple and weighted data. It covers the definitions and formulas for each measure, highlighting the applications of these measures in various statistical contexts.
Full Transcript
2.4 MEASURES

Some numbers give quantitative information about the magnitude of the observations (measures of central tendency), the variability of the observations (measures of variability) and the shape of the observations (measures of shape), relative to a given random variable $X$.

In the following, we will consider both simple data $x_1, x_2, \dots, x_n$ (i.e. data that occur with unitary frequency) and weighted data $x_1, x_2, \dots, x_r$ with absolute frequencies $f_1, f_2, \dots, f_r$, where $f_1 + f_2 + \dots + f_r = n$ and $n$ is the number of data.

Remark 2.1 In the case of a frequency distribution with grouped data, $x_1, x_2, \dots, x_r$ represent the central values of the classes, $f_1, f_2, \dots, f_r$ the corresponding absolute frequencies, and $r$ is the number of classes.

2.4.1 MEASURES OF CENTRAL TENDENCY

The most common measures of central tendency are the averages.

Definition 2.2 The arithmetic mean or simple mean is given by

- for simple data:
\[ E(X) = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\,(x_1 + x_2 + \dots + x_n); \tag{3} \]
- for weighted data:
\[ E(X) = \bar{x} = \frac{1}{n}\sum_{i=1}^{r} x_i f_i = \frac{1}{n}\,(x_1 f_1 + x_2 f_2 + \dots + x_r f_r). \tag{4} \]

We note that the arithmetic mean is denoted by $\mu$ when it refers to the population, and by $\bar{x}$ when it refers to the sample.

Definition 2.3 The geometric mean is defined only for positive data and is given by

- for simple data:
\[ E_G(X) = \sqrt[n]{\prod_{i=1}^{n} x_i} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}; \tag{5} \]
- for weighted data:
\[ E_G(X) = \sqrt[n]{\prod_{i=1}^{r} x_i^{f_i}} = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_r^{f_r}}. \tag{6} \]

Definition 2.4 The quadratic mean or root mean square is given by

- for simple data:
\[ E_Q(X) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2} = \sqrt{\frac{1}{n}\,(x_1^2 + x_2^2 + \dots + x_n^2)}; \tag{7} \]
- for weighted data:
\[ E_Q(X) = \sqrt{\frac{1}{n}\sum_{i=1}^{r} x_i^2 f_i} = \sqrt{\frac{1}{n}\,(x_1^2 f_1 + x_2^2 f_2 + \dots + x_r^2 f_r)}. \tag{8} \]

Definition 2.5 The harmonic mean is given by

- for simple data:
\[ E_H(X) = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}; \tag{9} \]
- for weighted data:
\[ E_H(X) = \frac{n}{\sum_{i=1}^{r} \frac{f_i}{x_i}} = \frac{n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_r}{x_r}}. \tag{10} \]

Definition 2.6 The median is the value that occupies the central position in an ordered set of data, that is, the value such that, when the data are sorted in increasing order, the same number of values lies on its left and on its right.

Property 2.7 The median is not affected by extreme values, so it is used when we want to mitigate the effect of very high or very low extreme values. The median is also used in the case of ordinal scales.

To calculate the median we have to:

- sort the values;
- if the sample has an odd number of data, take the central value of the data, in position $(n + 1)/2$; for example, the median of the ordered data 1, 4, 5, 7, 20 is 5;
- if the sample has an even number of data, take the average of the two middle values (positions $n/2$ and $n/2 + 1$); for example, the median of the ordered data 1, 4, 5, 7, 20, 21 is $(5 + 7)/2 = 6$.

N.B. To calculate the median of a frequency distribution for grouped data, see the calculation of quantiles.

Definition 2.8 The mode is the most frequent value of a distribution.

Property 2.9 The mode is not affected by the presence of extreme values, but in frequency distributions for grouped data it is very sensitive to the way the classes are constructed. It has only a descriptive purpose.

Definition 2.10 A frequency distribution is called unimodal if it has a unique mode, bimodal if it has a secondary mode (see Figure 9), and multimodal if it has more than two modes.

Figure 9: Example of a bimodal frequency distribution.

Definition 2.11 The mid-range is the arithmetic mean of the smallest and the largest of the observed values.

Property 2.12 The mid-range can be calculated rapidly even with a large number of data. It is used only when there are no outliers (wrong values), since an outlier would strongly distort the value of the mid-range.
For example, it is not used in meteorology for the calculation of the monthly average of precipitation or the monthly average of temperature in the case of a data series, since the presence of extreme values is unlikely.

Definition 2.13 The quantile and the percentile are measures of non-central tendency with exclusively descriptive purposes. More precisely, a $q$-quantile ($0 < q < 1$) is the value $x$ such that a fraction $q$ of the observations is smaller than $x$ or, equivalently, the value $x$ whose cumulative relative frequency is equal to $q$. Note that the $q$-quantile corresponds to the $(q \cdot 100)$-percentile.

Property 2.14 To compute the $q$-quantile ($0 < q < 1$) for grouped data, we assume a uniform distribution of the data within each class. Let $\alpha < \beta$ be the unique two consecutive cumulative relative frequencies such that $\alpha < q \le \beta$ (if $\alpha$ does not exist, we put $\alpha = 0$), and let $a < b$ be the upper actual endpoints of the corresponding classes (if $\alpha = 0$, then $a$ is the lower actual endpoint of the first class). Then the $q$-quantile is the value $x$ such that
\[ \frac{q - \alpha}{\beta - \alpha} = \frac{x - a}{b - a}. \]
Note that this formula is the equation of the straight line in the $(x, q)$ plane passing through the two points $(a, \alpha)$ and $(b, \beta)$.

For example, considering the frequency distribution of Table 5, the cumulative relative frequency has the graph shown in Figure 10. To compute the 0.05-quantile we have $q = 0.05$, $\alpha = 0.025$, $\beta = 0.1$, $a = 80$, $b = 100$, and so
\[ \frac{0.05 - 0.025}{0.1 - 0.025} = \frac{x - 80}{100 - 80} \;\Rightarrow\; x = 80 + 20 \cdot \frac{0.025}{0.075} \approx 86.7, \]
that is, the 0.05-quantile is approximately 86.7, as we can also see from the following figure.

Remark 2.15 The median of a distribution of grouped data coincides with the 0.5-quantile.

Figure 10: The cumulative relative frequency of the height of the 40 plants of Table 4 (vertical axis: cumulative relative frequency pCi, from 0 to 1; horizontal axis: height of plants, from 60 to 200).
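The definitions of this section can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names `weighted_means` and `grouped_quantile` are hypothetical, and the class endpoints passed to the quantile example (in particular the lower endpoint 60 of the first class) are assumptions read off the horizontal axis of Figure 10, listing only the two classes needed for the 0.05-quantile.

```python
import math
import statistics


def weighted_means(values, freqs):
    """Arithmetic, geometric, quadratic and harmonic means of weighted data,
    following formulas (4), (6), (8) and (10). For simple data, pass
    frequencies that are all equal to 1."""
    n = sum(freqs)  # n = f_1 + ... + f_r, the number of data
    pairs = list(zip(values, freqs))
    arithmetic = sum(x * f for x, f in pairs) / n
    # n-th root of the product, computed through logarithms;
    # this requires strictly positive data, as in Definition 2.3.
    geometric = math.exp(sum(f * math.log(x) for x, f in pairs) / n)
    quadratic = math.sqrt(sum(x * x * f for x, f in pairs) / n)
    harmonic = n / sum(f / x for x, f in pairs)
    return arithmetic, geometric, quadratic, harmonic


def grouped_quantile(q, endpoints, cum_rel_freqs):
    """q-quantile for grouped data by linear interpolation (Property 2.14).

    endpoints[0] is the lower endpoint of the first class and
    endpoints[k] (k >= 1) is the upper endpoint of the k-th class;
    cum_rel_freqs[k - 1] is the cumulative relative frequency
    reached at endpoints[k].
    """
    alpha, a = 0.0, endpoints[0]  # alpha = 0 before the first class
    for b, beta in zip(endpoints[1:], cum_rel_freqs):
        if alpha < q <= beta:
            # Straight line through (a, alpha) and (b, beta).
            return a + (b - a) * (q - alpha) / (beta - alpha)
        alpha, a = beta, b
    raise ValueError("q exceeds the last cumulative relative frequency")


# Simple data from the median example, each with unitary frequency.
data = [1, 4, 5, 7, 20]
a_mean, g_mean, q_mean, h_mean = weighted_means(data, [1] * len(data))
print(a_mean, g_mean, q_mean, h_mean)

# Median of an odd and of an even number of data.
print(statistics.median([1, 4, 5, 7, 20]))      # 5
print(statistics.median([1, 4, 5, 7, 20, 21]))  # 6.0

# 0.05-quantile of the worked example: approximately 86.67.
print(grouped_quantile(0.05, [60, 80, 100], [0.025, 0.1]))
```

For positive data that are not all equal, the four averages satisfy harmonic < geometric < arithmetic < quadratic, which gives a quick sanity check on the printed values.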
2.4.2 MEASURES OF VARIABILITY

Definition 2.16 The range is the difference between the maximum and the minimum value of the data.

Property 2.17 The range is intuitive and easy to compute, especially when the data is sorted. It is unable to measure how the data are distributed within the range. It is affected by the presence of outliers and it is a purely descriptive measure.

Definition 2.18 The mean absolute deviation or average deviation is given by

- for simple data:
\[ S_m = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|; \tag{11} \]
- for weighted data:
\[ S_m = \frac{1}{n}\sum_{i=1}^{r} |x_i - \bar{x}|\, f_i. \tag{12} \]

Definition 2.19 The median absolute deviation is the average of the absolute deviations of the observations from their median and it is calculated as above, by replacing the arithmetic mean with the median.

Definition 2.20 The sum of the squared deviations from the mean (SQ) is the basis of the data variability measures and it can be calculated in two different ways:

I METHOD:

- for simple data (it is the sum of the squared deviations from the mean):
\[ SQ = \sum_{i=1}^{n} (x_i - \bar{x})^2; \tag{13} \]