BSc Statistics and Probability: Lecture Script (2024)
Pia Domschke
Summary
This document is the lecture script for a BSc course in Statistics and Probability, dated January 29, 2024. It covers descriptive statistics, probability theory, and inferential statistics, and includes references for further reading and an index.
Full Transcript
Bachelor of Science
Statistics and Probability
Module coordinator: Pia Domschke
As of January 29, 2024

Course overview

I Descriptive Statistics
1 Statistical attributes and variables
2 Measures to describe statistical distributions
3 Two-dimensional distributions
4 Linear regression

II Probability Theory
5 Combinatorics and counting principles
6 Fundamentals of probability theory
7 Random variables in one dimension
8 Multidimensional random variables
9 Stochastic models and special distributions
10 Limit theorems

III Inferential Statistics
11 Point estimators for parameters of a population
12 Interval estimators
13 Statistical testing

IV Appendix
Literature
Index

Literature:
[Anderson et al., 2012] Anderson, D. R., Sweeney, D. J., Williams, T. A., Camm, J. D., and Cochran, J. J. (2012). Statistics for Business and Economics. South-Western Cengage Learning, 12th edition.
[Bleymüller, 2012] Bleymüller, J. (2012). Statistik für Wirtschaftswissenschaftler. Vahlen, 16th edition.
[Newbold et al., 2013] Newbold, P., Carlson, W. L., and Thorne, B. M. (2013). Statistics for Business and Economics. Pearson, 8th edition.
[Schira, 2016] Schira, J. (2016). Statistische Methoden der VWL und BWL – Theorie und Praxis. Pearson, 5th edition.

Part I – Descriptive Statistics

1 Statistical attributes and variables
1.1 Statistical units and populations
1.2 Attributes/characteristics and their values
1.3 Subpopulations, random samples
1.4 Statistical distribution
1.5 Frequency and distribution function
1.6 Frequency density and histograms

According to [Schira, 2016], chapter 1; see also [Anderson et al., 2012], chapters 1 and 2, and [Newbold et al., 2013], chapter 1.
1.1 Statistical units and populations

Definition: Statistical units are the objects whose attributes are of interest in a given context and are observed, surveyed, or measured within the scope of an empirical investigation.

Which statistical units belong to a statistical population is essentially determined by objective and precise identification criteria (IC) relating to
1. time,
2. space, and
3. objectivity.

Examples of statistical units: motor vehicles, buildings, horses, students, civil servants, farms, branches, apples, sales, marriages, births, accidents, bank accounts, etc.

Definition: The set

  \Omega := \{ \omega \mid \omega \text{ fulfills (IC)} \}

of all statistical units ω that fulfill the well-defined identification criteria (IC) is called the population. Synonyms are statistical mass and collective.

Examples of statistical populations:
- Traffic accidents in Bavaria in 2002
- Traffic accidents with personal injury in Germany in 1999
- Students in the lecture on Tuesday, 25.02.2014 at 2:15 pm at Frankfurt School of Finance and Management, Germany
- Registered bankruptcies of building companies in North Rhine-Westphalia in April 2002

1.2 Attributes/characteristics and their values

The statistical units ω themselves are not of direct interest, but rather some of their attributes M(ω). Distinguishable manifestations of a characteristic are called characteristic values or modes.

Examples:
- The characteristic gender has the possible values {male, female, diverse}.
- The characteristic eye color has the possible values {blue, green, grey, brown}.
- For the characteristic body weight of adult humans, all values between 30 and 300 kg have to be allowed as possible values.

Statistical variable

Definition: A statistical variable assigns a real number x to a statistical unit ω, or to its characteristic M(ω); thus x = X(ω) = f(M(ω)). X(ω) is a real-valued function of the characteristic values M(ω), and thus of the statistical units:

  X : \Omega \to \mathbb{R}, \quad \omega \mapsto X(\omega) = x

Difference between M and X: X(ω) is always a real number, whereas M(ω) can also be "green" or "married". If the characteristic is numerical, characteristic and statistical variable are often used as synonyms, although strictly speaking they do not denote precisely the same thing. One often simply says "the statistical variable X" or "the characteristic X".

Types of attributes

Qualitative characteristics are, e.g., gender, religious belief, legal form of companies.
Quantitative characteristics or variables are either
- discrete, e.g. age (in whole years), number of children, or
- continuous, e.g. income, living space, length of a line.

Measurement levels (scales), in increasing degree of measurability:
- Nominally measurable variables: nominal attributes are always qualitative. Examples: religion, nationality, profession, color of a car, legal structure of a company.
- Ordinally measurable variables: there exists a natural or meaningful way to determine a ranking. Examples: intelligence quotient, school grades, table positions in the German football league.
- Cardinally measurable variables: for cardinally scaled variables, also the difference between outcomes is meaningful. Examples: GDP, investment, inflation, costs, revenue, and profit.
1.3 Subpopulations, random samples

Possibilities to sample the values of a characteristic:
- Full sampling (in many cases not feasible)
- Partial sampling

Definition: Each proper subset Ω* of Ω is called a subpopulation or sample of the whole population. Subpopulations are called random samples if chance played a significant role in the selection of the elements.

Pure random sampling: each part of the population has the same chance of being selected into the random sample.

Representative random sampling: the aim is to select a subpopulation that is representative of the whole population. As the structure of the characteristic we are interested in is unknown before sampling, we try to ensure that the sample is representative with respect to other characteristics for which we assume a certain "statistical relationship" to the characteristic under investigation.

Example: A research institute creates an election forecast. For this purpose, 3000 eligible voters are asked the so-called Sunday question: "Which party would you vote for if there were elections next Sunday?" To get more realistic results, the random sample is selected on a representative basis: other characteristics are taken into consideration that could have a statistical influence on party preference. The share of women in the sample should match the share of women among all eligible voters. The age structure should also conform to that of the whole population. This already makes the sample quite representative for this purpose. It would certainly still be important to take the geographical distribution into account, to avoid a situation where too many respondents happen to live in Baden-Württemberg. Furthermore, it would be good if the professional structure were at least analogous in the categories workers, employees, civil servants, and self-employed. And of course students must be in the sample, otherwise Green voters might be underrepresented.

1.4 Statistical distribution

Raw data table:

  Elements:           ω1, ω2, ..., ωi, ..., ωn
  Observed outcomes:  x1, x2, ..., xi, ..., xn

Definition: The (finite) sequence of the n values x1, x2, ..., xi, ..., xn with x_i = X(ω_i) for i = 1, ..., n is called the observation series of the variable X, or simply the data set X.

If the order of the observations does not matter, it is often helpful to sort and renumber the variable values:

  x_1 \le x_2 \le x_3 \le \dots \le x_i \le \dots \le x_n

Example: n = 20 observations

  1.6 1.6 3.0 3.0 3.0 3.0 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 5.0 5.0 5.0 5.0 5.0 5.0

with k = 4 different outcomes: 1.6, 3.0, 4.1, 5.0.

Absolute and relative frequency

Definition: The absolute frequency

  n_i := \mathrm{absF}(X = x_i)

indicates how often the statistical variable X takes a certain value x_i. The relative frequency

  h_i := \mathrm{relF}(X = x_i) = \frac{n_i}{n}, \quad 0 < h_i \le 1

indicates the share of the characteristic value x_i in the population.

Definition: The tables

  x_1  x_2  ...  x_k            x_1  x_2  ...  x_k
  n_1  n_2  ...  n_k    and     h_1  h_2  ...  h_k

with \sum_{i=1}^{k} n_i = n and \sum_{i=1}^{k} h_i = 1 are called the absolute and relative frequency distribution of the statistical variable X, respectively.
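The counting in these definitions is mechanical enough to script. A minimal Python sketch (an addition, not part of the original script) that reproduces the n = 20 data set introduced above; its output matches the frequency tables in the example that follows:

```python
# Absolute and relative frequencies for the n = 20 example observations.
from collections import Counter

data = [1.6, 1.6, 3.0, 3.0, 3.0, 3.0,
        4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1,
        5.0, 5.0, 5.0, 5.0, 5.0, 5.0]

n = len(data)
abs_freq = dict(sorted(Counter(data).items()))          # n_i = absF(X = x_i)
rel_freq = {x: n_i / n for x, n_i in abs_freq.items()}  # h_i = n_i / n

print(abs_freq)   # {1.6: 2, 3.0: 4, 4.1: 8, 5.0: 6}
print(rel_freq)   # {1.6: 0.1, 3.0: 0.2, 4.1: 0.4, 5.0: 0.3}
assert sum(abs_freq.values()) == n
assert abs(sum(rel_freq.values()) - 1) < 1e-12           # the h_i sum to one
```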
Example:

  x_i:  1.6  3.0  4.1  5.0         x_i:  1.6  3.0  4.1  5.0
  n_i:    2    4    8    6         h_i:  0.1  0.2  0.4  0.3

  \sum_{i=1}^{4} n_i = 20, \qquad \sum_{i=1}^{4} h_i = 1

[Figure: bar graph of this frequency distribution, with absolute frequencies n_i and relative frequencies h_i over the values x_i.]

1.5 Frequency and distribution function

Definition: The function

  h(x) = \begin{cases} h_i, & \text{if } x = x_i \\ 0, & \text{otherwise} \end{cases}

is called the (relative) frequency function of the statistical variable X. The function

  H(x) = \sum_{x_i \le x} h(x_i)

is called the empirical cumulative distribution function of the statistical variable X.

Example:

  x_i:  1.6  3.0  4.1  5.0
  h_i:  0.1  0.2  0.4  0.3
  H_i:  0.1  0.3  0.7  1.0

[Figure: graphs of the frequency function h(x) and of the distribution function H(x), a step function.]

Properties of the empirical cumulative distribution function

The empirical cumulative distribution function H is
1. everywhere at least continuous to the right,

  \lim_{\Delta x \to 0^+} H(x + \Delta x) = H(x)

(at jumps it is only continuous to the right),
2. monotonically increasing, H(a) ≤ H(b) if a < b,
3. and has lower limit 0 and upper limit 1:

  \lim_{x \to -\infty} H(x) = 0, \qquad \lim_{x \to \infty} H(x) = 1

Further properties of the distribution function
1. For a < b, the difference

  H(b) - H(a) = \mathrm{relF}(a < X \le b)

specifies the relative frequency of observed values of the variable X that are greater than a but not greater than b.
2. The function value at a point x indicates the relative frequency with which values less than or equal to x occur in the data set:

  H(x) = \mathrm{relF}(X \le x)

3. At each point, the values of the frequency function are obtained from the empirical distribution function as the difference

  h(x) = H(x) - \lim_{\Delta x \to 0^+} H(x - \Delta x)

1.6 Frequency density and histograms

Due to limitations in practice (e.g. for the income distribution of Germany as a whole), one often forms class intervals or layers. This leads to the notions of class size, class frequency, distribution function of the classes, approximation by polygons, frequency density of the classes, frequency density function and histogram, and approximation by smooth curves (continuous density function).

Formation of class intervals or layers with appropriately selected class limits ξ0, ξ1, ξ2, ..., ξm: the m sections have the class sizes

  \Delta_i := \xi_i - \xi_{i-1}, \quad i = 1, \dots, m

and the class frequency of the values in each size class is

  h_i := \mathrm{relF}(\xi_{i-1} < X \le \xi_i), \quad i = 1, \dots, m

Note: [Schira, 2016] uses right-hand inclusion, while [Anderson et al., 2012] and [Newbold et al., 2013] use left-hand inclusion.

Definition: By assigning the class frequencies to the upper limits of the classes (an alternative possibility would be to assign them to the class centers), the following frequency table can be drawn from the values:

  ξ_1  ξ_2  ...  ξ_m
  h_1  h_2  ...  h_m      with \sum_{i=1}^{m} h_i = 1

and hence the so-called distribution function of the classes, H_K(x).

Exercise: What does the distribution function for the classes shown on the slide look like? (Choose an appropriate upper limit for the final class.)
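As an illustration, here is a short Python sketch (an addition, not from the script) of the frequency function h and the step function H defined above; it reproduces the table values of the running example:

```python
# Frequency function h(x) and empirical CDF H(x) for the example distribution.
xs = [1.6, 3.0, 4.1, 5.0]
hs = [0.1, 0.2, 0.4, 0.3]

def h(x):
    """Relative frequency function: h_i if x equals some x_i, else 0."""
    return dict(zip(xs, hs)).get(x, 0.0)

def H(x):
    """Empirical CDF: sum of h(x_i) over all x_i <= x; a right-continuous step function."""
    return sum(hi for xi, hi in zip(xs, hs) if xi <= x)

print(H(3.0))           # ≈ 0.3  (= relF(X <= 3.0))
print(H(5.0) - H(3.0))  # ≈ 0.7  (= relF(3.0 < X <= 5.0))
```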
By focussing on the upper limits (or any other single point) of the classes, we lose information about the distribution within the classes. The assumption of a uniform distribution within each class leads to the definition below.

Definition: Let H_K(x) be the distribution function of a characteristic X obtained by size classes with upper class limits ξ1, ξ2, ..., ξm. Then the ratio

  \bar{h}_i = \frac{H_K(\xi_i) - H_K(\xi_{i-1})}{\xi_i - \xi_{i-1}} = \frac{h_i}{\Delta_i}

is called the (average) frequency density of the i-th size class (i = 1, ..., m).

Approximating the distribution function H_K by a polygonal line H̄(x) and taking the derivative of H̄(x) leads to the (average) frequency density function h̄(x) of the size classes:

  \bar{h}(x) := \frac{d\bar{H}(x)}{dx}

Its graph is called a histogram. The area of a column corresponds to the relative class frequency; the total area of the columns of the histogram is one.

Approximating the distribution function of the classes H_K by a smooth curve H̃(x) and taking the derivative of H̃(x) leads to a density function:

  \tilde{h}(x) := \frac{d\tilde{H}(x)}{dx}
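To make the density definition concrete, the following Python sketch computes average frequency densities for a small set of class limits and class frequencies. The numbers are invented for illustration (the script's income example is not reproduced here); the point is that each histogram column has height h_i / Δ_i, so its area equals the relative class frequency:

```python
# Average frequency densities from hypothetical class limits and frequencies.
xi = [0, 10, 20, 40, 80]     # hypothetical class limits xi_0, ..., xi_4
h  = [0.2, 0.3, 0.3, 0.2]    # hypothetical relative class frequencies

densities = [h_i / (b - a) for h_i, a, b in zip(h, xi[:-1], xi[1:])]
areas = [d * (b - a) for d, a, b in zip(densities, xi[:-1], xi[1:])]

print(densities)                        # ≈ [0.02, 0.03, 0.015, 0.005]
assert abs(sum(areas) - 1.0) < 1e-12    # total histogram area is one
```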
Control questions
1. What is the difference between characteristic and variable?
2. What different types of scales are there? Examples!
3. Why are mainly representative random samples taken into account in practice?
4. What are the properties of the step function? What is its information content?
5. Why is the formation of size classes often necessary?
6. What is the implicit assumption underlying the approximating distribution function H̄(x)?
7. What is the difference between a bar chart (look up the definition if necessary) and a histogram? Under what condition do they look the same?

2 Measures to describe statistical distributions
2.1 Measures to describe statistical distributions
2.2 Measures of central tendency
2.3 Mode
2.4 Arithmetic mean
2.5 Geometric mean
2.6 Harmonic mean
2.7 Measures of variation
2.8 Variance and standard deviation
2.9 Robust measures
2.10 Median
2.11 Quartiles
2.12 Quantiles
2.13 Five-point summary and box plot

According to [Schira, 2016], chapter 2; see also [Anderson et al., 2012], chapter 3, and [Newbold et al., 2013], chapter 2.

Measures to describe statistical distributions

Especially for statistical data sets with many different characteristic values, one would like to describe the entire distribution of the characteristic with the help of a few numbers. Such numbers are called measures or parameters of a distribution. We distinguish between
- measures of location and
- measures of dispersion.

Measures of central tendency

Definition: A measure of central tendency or measure of location is a parameter used to describe the distribution of a random variable and provides a "typical" value. In particular, it describes the location of the data set, i.e. where, or in which order of magnitude, the values of the variable are located.

Mode

Definition: A number x_Mod with h(x_Mod) ≥ h(x_i) for all i is called the mode or modal value of an empirical data set. It is a useful measure of location especially for purely qualitative characteristics.

Examples:
- The data set 2 3 3 4 4 4 5 6 has the mode x_Mod = 4 (and is thus unimodal).
- There are two "most frequent" values in the data set 1 2 3 3 3 4 5 6 6 6 7, namely the values 3 and 6; such a distribution is bimodal. In general, the mode is the value that occurs with the highest frequency.

Arithmetic mean

Definition: The value

  \bar{x} := \frac{1}{n} \sum_{j=1}^{n} x_j

is called the arithmetic mean, mean value, or average of a statistical distribution. Alternative calculation using absolute or relative frequencies:

  \bar{x} = \frac{1}{n} \sum_{j=1}^{k} n_j x_j = \sum_{j=1}^{k} h_j x_j

Properties of the arithmetic mean
1. Central property:

  \sum_{j=1}^{n} (x_j - \bar{x}) = 0

2. Shifting all values of a data set X by the constant value a shifts the arithmetic mean by exactly this value:

  y_i := x_i + a \;\Rightarrow\; \bar{y} = \bar{x} + a

3. Multiplying all values of a data set X by the constant factor b multiplies the arithmetic mean by exactly this factor:

  z_i := b \cdot x_i \;\Rightarrow\; \bar{z} = b \cdot \bar{x}

Geometric mean

Definition: The geometric mean is

  G_X := \sqrt[n]{x_1 \cdot x_2 \cdots x_n}, \quad x_i > 0

For the geometric mean, the individual characteristic values are multiplied and the n-th root is taken of the product. It is only defined if all values of the data set X are positive. The logarithm of the geometric mean equals the arithmetic mean of the logarithms (important for the calculation of the overall return on an investment):

  \log G_X = \frac{1}{n} \sum_{i=1}^{n} \log x_i

Example: For the data set X with the values 2, 6, 12, 9, the geometric mean is G_X = 6 and the arithmetic mean is x̄ = 7.25.

Note: For any data set with only positive values, the geometric mean is always smaller than the arithmetic mean unless all the values in the data set are the same.

Example: In five consecutive years, the turnover Y (given in thousand €) of a company developed as follows. [Table omitted in this transcript; from the calculation below, the turnover grows from 1200 k€ to 2142 k€ with year-on-year growth factors 1.20, 0.85, 1.40, and 1.25.]

Question: Which mean is best suited for the calculation of the average growth?
Arithmetic mean:

  \overline{1+r} = \frac{1.20 + 0.85 + 1.40 + 1.25}{4} = 1.175

Geometric mean:

  G_{1+r} = \sqrt[4]{1.20 \cdot 0.85 \cdot 1.40 \cdot 1.25} = 1.1559

An average increase in turnover of 17.5 % would result in a turnover of 2287 k€ in 2001, whereas an average increase of 15.59 % results in the actual value of 2142 k€.

Example: stock return. A financial advisor claims: "The share is a top investment, with an average return of 25 %."

Arithmetic mean:

  \overline{1+r} = \frac{1}{2}\big((1 + 100\,\%) + (1 - 50\,\%)\big) = 1 + 25\,\% \;\Rightarrow\; 25\,\%

Geometric mean:

  G_{1+r} = \sqrt{2 \cdot 0.5} = 1 \;\Rightarrow\; 0\,\%

Harmonic mean

Definition: From the values x_i > 0 of a data set, one can calculate the reciprocals 1/x_i and then the arithmetic mean of these reciprocals,

  \frac{1}{n}\left(\frac{1}{x_1} + \dots + \frac{1}{x_n}\right).

Taking the reciprocal of the result again yields the so-called harmonic mean

  H_X := \frac{n}{\sum_{j=1}^{n} \frac{1}{x_j}}

Example: For the data set X with the values 2, 6, 12, 9, the harmonic mean is H_X = 4.645, the geometric mean is G_X = 6, and the arithmetic mean is x̄ = 7.25.

Note: For every data set with (different) positive values, it can be shown that H_X < G_X < x̄.

Example: Two trucks travel at speeds of v1 = 60 km/h and v2 = 80 km/h on the highway. The average speed (arithmetic mean) is

  \bar{v} = \frac{1}{2}\left(60\,\tfrac{km}{h} + 80\,\tfrac{km}{h}\right) = 70\,\tfrac{km}{h}.

To estimate the (average) transport time t̄, and thus transport capacities and transport costs, for a distance of, say, Hamburg to Duisburg, one might divide the corresponding distance d = 420 km by this value and obtain

  \frac{d}{\bar{v}} = \frac{420\,km}{70\,km/h} = 6\,h,

a wrong value. Indeed, the transport times of the two trucks are t1 = 7 h and t2 = 5.25 h, so the average transport time is t̄ = 6.125 h. If, instead, one divides the distance by the harmonic mean

  H_V = \frac{2}{\frac{1}{60\,km/h} + \frac{1}{80\,km/h}} = \frac{480}{7}\,\tfrac{km}{h} \approx 68.57\,\tfrac{km}{h},

then one obtains the correct result:

  \frac{d}{H_V} = \frac{420\,km}{\frac{480}{7}\,km/h} = 6.125\,h = \bar{t}.

If you are doing a salary calculation based on an hourly wage, this question is highly relevant.

Question: Why is the first calculation wrong? In this example we want to calculate an average transport time for a fixed distance d. The problem with the average speed is that it is not valid over the whole time: the faster truck arrives already after 5.25 h and then stops while the other one is still moving. For the calculation of the mean transport time, the speeds appear in the denominator due to the relation t_i = d / v_i. This leads to the harmonic mean:

  \bar{t} = \frac{1}{n}(t_1 + \dots + t_n) = \frac{1}{n}\left(\frac{d}{v_1} + \dots + \frac{d}{v_n}\right) = d \cdot \frac{1}{n}\left(\frac{1}{v_1} + \dots + \frac{1}{v_n}\right) = d \cdot \frac{1}{H_V} = \frac{d}{H_V}.

If, in contrast, we want to know how far the trucks have come on (arithmetic) average after a certain time t, the calculation

  \bar{d} = \frac{1}{n}(d_1 + \dots + d_n) = \frac{1}{n}(t \cdot v_1 + \dots + t \cdot v_n) = t \cdot \frac{1}{n}(v_1 + \dots + v_n) = t \cdot \bar{v}

is the correct one.
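The worked examples above can be checked with a few lines of Python. This sketch is an addition to the script; the initial turnover of 1200 k€ is inferred from the example's figures (2142 k€ divided by the product of the growth factors):

```python
# Arithmetic, geometric, and harmonic means on the examples above.
from math import prod

# Growth example: year-on-year factors 1.20, 0.85, 1.40, 1.25
factors = [1.20, 0.85, 1.40, 1.25]
arith = sum(factors) / len(factors)          # 1.175   -> 17.5 % p.a., too high
geom  = prod(factors) ** (1 / len(factors))  # ≈ 1.1559 -> 15.59 % p.a.
print(arith, geom)
print(1200 * prod(factors))                  # ≈ 2142: the actual turnover path
print(1200 * geom ** 4)                      # ≈ 2142: the geometric mean reproduces it

# Truck example: mean transport time over d = 420 km needs the harmonic mean
speeds = [60, 80]
harm = len(speeds) / sum(1 / v for v in speeds)  # 480/7 ≈ 68.57 km/h
print(420 / harm)                                 # 6.125 h = mean transport time
```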
Measures of variation

[Figure: three histograms showing distributions with different degrees of spread or variability.]

The extent of the spread, variation, or dispersion of a distribution needs to be expressed as a measure. Descriptive statistics provides several measures of variation:

Definition: The range is the difference between the largest and the smallest value in a data set:

  \text{range} := x_{max} - x_{min}

Definition: The so-called mean absolute deviation

  \mathrm{MAD} := \frac{1}{n} \sum_{j=1}^{n} |x_j - \bar{x}|

is calculated as the arithmetic mean of the absolute deviations of the characteristic values from their mean.

We recall the median and the quartiles Q1 ≤ Q2 = x_Med ≤ Q3, which divide the ordered data set into four approximately equally sized parts.

Definition: The difference

  \mathrm{IQR} := Q_3 - Q_1

is known as the interquartile range.

Definition: The arithmetic mean of the deviations of the quartiles from the median is called the quartile deviation or semi-interquartile range:

  \mathrm{QD} := \frac{(Q_3 - Q_2) + (Q_2 - Q_1)}{2} = \frac{\mathrm{IQR}}{2}

Example: For a data set with n = 14 values, we look for the quartile deviation. As the median we take the arithmetic mean of the two middle neighbours and obtain Q2 = x_Med = 26.8.

Variance and standard deviation

These are the most important measures of variation in statistics:

Definition: The average quadratic deviation from the arithmetic mean,

  s_X^2 := \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})^2,

is called the empirical variance, or in short the variance, of an observed data set X. Calculation using relative frequencies:

  s_X^2 = \sum_{j=1}^{k} h_j (x_j - \bar{x})^2

Definition: The positive root of the variance,

  s_X := +\sqrt{s_X^2},

is called the standard deviation.

Example: variance calculation using relative frequencies. The following distribution is given:

  x_i:  4    5    6
  h_i:  1/4  1/2  1/4

The arithmetic mean is x̄ = 5. The variance is

  s_X^2 = (4-5)^2 \cdot \tfrac{1}{4} + (5-5)^2 \cdot \tfrac{1}{2} + (6-5)^2 \cdot \tfrac{1}{4} = \tfrac{1}{4} + 0 + \tfrac{1}{4} = \tfrac{1}{2}

and the standard deviation is

  s_X = \sqrt{\tfrac{1}{2}} = \tfrac{1}{\sqrt{2}} \approx 0.7071.

Example: For the data set

  3 5 9 9 6 6 3 7 7 6 7 6 5 7 6 9 6 5 3 5

consisting of 20 numbers, a working table yields: arithmetic mean 6, variance 3.1, standard deviation 1.761.

The variance can also be calculated for density functions.

Example: Assume that the distribution of a statistical variable has been approximated by a frequency density function h(x) on the interval 0 < x < 2, given by the parabola

  h(x) = \begin{cases} \frac{3}{2}\left(x - \frac{1}{2}x^2\right), & x \in (0, 2) \\ 0, & \text{otherwise.} \end{cases}

As the graph shows, for symmetry reasons the mean value is equal to 1. To calculate it, the summation symbol is replaced by the integral:

  \bar{x} = \int_{-\infty}^{\infty} x \cdot h(x)\, dx

In the same way, the variance is calculated by

  s_X^2 = \int_{-\infty}^{\infty} (x - \bar{x})^2 h(x)\, dx

Hence we calculate the variance as the definite integral

  s_X^2 = \int_0^2 (x - 1)^2 \cdot \frac{3}{2}\left(x - \frac{1}{2}x^2\right) dx
        = \frac{3}{2} \int_0^2 \left(x - \frac{5}{2}x^2 + 2x^3 - \frac{1}{2}x^4\right) dx
        = \frac{3}{2} \left[\frac{1}{2}x^2 - \frac{5}{6}x^3 + \frac{1}{2}x^4 - \frac{1}{10}x^5\right]_0^2
        = \frac{3}{2}\left(\frac{4}{2} - \frac{40}{6} + \frac{16}{2} - \frac{32}{10}\right) = 3 - 10 + 12 - \frac{24}{5} = \frac{1}{5}

and the standard deviation as its root: s_X = \sqrt{1/5} \approx 0.4472.
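The integrals above can be verified numerically. A small Python sketch (added for illustration) approximates the mean and variance of the parabolic density with a simple midpoint rule:

```python
# Numerical check of the density example: mean ≈ 1, variance ≈ 1/5.
def h(x):
    """Parabolic frequency density on (0, 2), zero elsewhere."""
    return 1.5 * (x - 0.5 * x * x) if 0 < x < 2 else 0.0

N = 100_000                      # number of midpoint-rule subintervals
dx = 2 / N
xs = [(k + 0.5) * dx for k in range(N)]

mean = sum(x * h(x) for x in xs) * dx                 # ≈ 1.0
var  = sum((x - mean) ** 2 * h(x) for x in xs) * dx   # ≈ 0.2
print(mean, var, var ** 0.5)                          # ≈ 1.0, 0.2, 0.4472
```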
Properties of the variance
1. The variance is always greater than or equal to zero:

  s_X^2 \ge 0

2. Shifting all values of a data set X by a constant value a leaves the variance unchanged:

  y_i := x_i + a \;\Rightarrow\; s_Y^2 = s_X^2

3. Multiplying all values of a data set X by a constant factor b multiplies the variance by the square of this factor:

  z_i := b \cdot x_i \;\Rightarrow\; s_Z^2 = b^2 \cdot s_X^2 \qquad \text{(note: } s_Z = |b|\, s_X \text{)}

Steiner's translation theorem: For each constant d ∈ ℝ it holds that

  \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})^2 = \frac{1}{n} \sum_{j=1}^{n} (x_j - d)^2 - (\bar{x} - d)^2 \qquad (1)

where (x̄ − d) is the shift (translation) from the mean.

Properties of the variance (cont'd)
4. In the special case d = 0, the following formula for the simplified calculation of the variance is obtained:

  s_X^2 = \frac{1}{n} \sum_{j=1}^{n} x_j^2 - \bar{x}^2 = \overline{x^2} - \bar{x}^2 \qquad (2)

Exercise: Use formula (2) to recalculate the variances of the preceding examples.

Minimum property of the variance

Since in the translation theorem (1) the term (x̄ − d)² can never be negative, for every d ≠ x̄ it always holds that

  \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})^2 < \frac{1}{n} \sum_{j=1}^{n} (x_j - d)^2.

This means that the average quadratic deviation from the arithmetic mean x̄ is always smaller than the average quadratic deviation from any other value d (minimum property). Multiplying by n, we get for the sum of squared deviations, or residual sum of squares (RSS), from any d ∈ ℝ:

  \mathrm{RSS}(d) := \sum_{j=1}^{n} (x_j - d)^2 \ge \sum_{j=1}^{n} (x_j - \bar{x})^2.

That is, RSS becomes minimal at x̄. This provides an alternative definition of the mean:

Definition: \mathrm{RSS}(d) \to \min_d \;\Rightarrow\; principle of least squares.

Coefficient of variation

Definition: The quotient of the standard deviation and the absolute value of the mean of a data set with x̄ ≠ 0,

  \mathrm{CV}_X := \frac{s_X}{|\bar{x}|},

is called the coefficient of variation. The coefficient of variation is a relative measure: it measures the dispersion relative to the level, or absolute size, of the data set and thus makes data sets with different scales comparable.

Example: Over a period of 250 trading days, the Volkswagen share price had a mean value of 174.56 € and a standard deviation of 10.28 €. For the same period, a standard deviation of 4.68 € with a mean value of 36.96 € is determined for the BMW AG share. The two coefficients of variation, as a measure of the volatility of the share prices, are

  \mathrm{CV}_X = \frac{10.28\,€}{174.56\,€} = 0.0589 \text{ for VW} \qquad \text{and} \qquad \mathrm{CV}_Y = \frac{4.68\,€}{36.96\,€} = 0.1266 \text{ for BMW.}

Thus, despite a lower absolute standard deviation, the BMW stock is more volatile in relative terms.
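Formula (2) and the coefficient of variation are easy to check in code. The following Python sketch (not from the script) computes the variance of the 20-number example both ways:

```python
# Variance via the definition and via the shortcut formula (2), plus CV.
data = [3, 5, 9, 9, 6, 6, 3, 7, 7, 6, 7, 6, 5, 7, 6, 9, 6, 5, 3, 5]
n = len(data)

mean = sum(data) / n                                   # 6.0
var_def = sum((x - mean) ** 2 for x in data) / n       # 3.1
var_short = sum(x * x for x in data) / n - mean ** 2   # 3.1, by s² = mean(x²) − mean(x)²
sd = var_def ** 0.5                                    # ≈ 1.761
cv = sd / abs(mean)                                    # coefficient of variation ≈ 0.2935
print(mean, var_def, var_short, sd, cv)
```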
Robust measures

The measures presented so far are quite sensitive to outliers: strong deviations of individual values significantly influence them. This is not the case with so-called robust measures.

Definition: Starting with the raw data x = (x1, x2, ..., xn) of a data set of size n, the characteristic values x_i are arranged in ascending order: x_(1) ≤ x_(2) ≤ ... ≤ x_(n). The resulting list x_( ) = (x_(1), x_(2), ..., x_(n)) is called an ordered sample (for the raw data x).

Annotation: In the following, the parentheses in the index are omitted for ordered samples. It should always be clear from the context whether a data set is ordered or not.

Median

First order and renumber the observed values:

  x_1 \le x_2 \le x_3 \le \dots \le x_i \le \dots \le x_n.

The new indexing is also called the "rank".

Definition: A number x_Med with

  \mathrm{relF}(X \le x_{Med}) \ge 50\,\% \qquad \text{and} \qquad \mathrm{relF}(X \ge x_{Med}) \ge 50\,\%

is called the median or central value of the empirical data set X. For an even number n it can happen that the median is not uniquely determined. For a unique value we define

  x_{Med} = \begin{cases} x_{\frac{n+1}{2}}, & \text{if } n \text{ odd} \\ \frac{1}{2}\left(x_{\frac{n}{2}} + x_{\frac{n}{2}+1}\right), & \text{if } n \text{ even} \end{cases}

Example: The data set

  4 7 7 7 12 12 13 16 19 23 23 97

has the arithmetic mean x̄ = 20 and the median x_Med = 12.5. Outliers tend to influence the mean, but not the median.

Annotation: With the definition of the median via the relative frequencies above, one obtains the potentially non-unique characterization

  x_{Med} = x_{\frac{n+1}{2}} \text{ if } n \text{ odd}, \qquad x_{\frac{n}{2}} \le x_{Med} \le x_{\frac{n}{2}+1} \text{ if } n \text{ even}.

Strictly speaking, in the previous example, 12 or 13 or 12.2 would also be medians, since they divide the data in the middle just as well.

In practice, the arithmetic mean and the median are the most important measures of location for a given distribution. Colloquially, however, a distinction is not always made between the two, especially when it comes to income or wealth distributions.

Example: Median income is the income for which there are just as many people with a higher income as with a lower income. Median income, which is explicitly not identical with average income, is used in the social sciences and economics, for example to undertake poverty calculations. It is more robust against outliers in a sample and is therefore often preferred to the arithmetic mean (average). Think about when and why average income differs from median income.

Quartiles

In addition to the median, two other values can be defined that further divide the ordered statistical data set:

Definition: The characteristic values of the data set are arranged in ascending order, x1 ≤ x2 ≤ ... ≤ xn, and divided into four segments with (as far as possible) the same number of values. The three values

  Q_1 \le Q_2 = x_{Med} \le Q_3

are called quartiles and are defined in such a way that they lie between the four segments, just as the median x_Med does. Consequently, about 50 % of the observations are found between Q1 and Q3. Median and quartiles generalize to quantiles.

Quantiles

Definition: A number x_[q] with 0 < q < 1 is called a q-quantile if it splits the data set X such that at least 100·q % of the observed values are less than or equal to x_[q] and, at the same time, at least 100·(1−q) % are greater than or equal to x_[q], that is:

  \mathrm{relF}(X \le x_{[q]}) \ge q \qquad \text{and} \qquad \mathrm{relF}(X \ge x_{[q]}) \ge 1 - q.

Special quantiles:
- Quartiles: Q1 = x_[0.25] (lower quartile), Q2 = x_[0.5] = x_Med (median), Q3 = x_[0.75] (upper quartile)
- Deciles: x_[0.1], x_[0.2], x_[0.3], ..., x_[0.9]
- Percentiles: x_[0.01], x_[0.02], x_[0.03], ..., x_[0.99]

Calculation of the q-quantiles: For continuously approximated distribution functions, the q-quantile satisfies

  H(x_{[q]}) = q,

which yields the q-quantile from the inverse of the distribution function,

  x_{[q]} = H^{-1}(q).

This also works for step-function-shaped distribution functions if one directly hits a jump.
However, if one lands on a stair step, the inverse function is not uniquely determined. Then, in fact, every value between the adjacent jumps is a q-quantile: x_i ≤ x_[q] ≤ x_{i+1}. To obtain a unique value, one then usually takes the arithmetic mean of both jump points, x_[q] = ½(x_i + x_{i+1}).

For a data set, the q-quantile can also be determined without the detour via the graph of the distribution function. The q-quantile of an ordered data set x1, ..., xn is determined by

  x_{[q]} = \begin{cases} \frac{1}{2}\left(x_{nq} + x_{nq+1}\right), & \text{if } n \cdot q \text{ is an integer} \\ x_{\lceil nq \rceil}, & \text{otherwise} \end{cases}

Here ⌈n·q⌉ means that the number n·q is rounded up to the nearest integer.

Example (Table: turnover of large industrial companies in Germany, 2005, in million €), ordered ascending by rank:

   1  Vattenfall           10 543      11  Deutsche BP        37 432
   2  EnBW                 10 769      12  RWE                40 518
   3  ZF Friedrichshafen   10 833      13  Bosch              41 461
   4  Henkel               11 974      14  ThyssenKrupp       42 064
   5  Continental          13 837      15  BASF               42 745
   6  MAN                  14 671      16  BMW                46 656
   7  Hochtief             14 854      17  E.ON               51 854
   8  RAG AG               21 869      18  Siemens            75 445
   9  Shell Deutschland    24 300      19  Volkswagen         95 268
  10  Bayer                27 383      20  DaimlerChrysler   149 776

Since n·q = 20 · 0.80 = 16 is an integer,

  x_{[0.80]} = \frac{1}{2}(x_{16} + x_{17}) = \frac{1}{2}(46\,656 + 51\,854) = 49\,255.

[Figure: bar chart of the turnover of the 20 industrial companies with the 80 % quantile of 49 255 marked.]

Five-point summary and box plot

The distribution of a data set can be analyzed quite well with only a few values. In practice, one often uses the so-called five-point summary

  (x_{min},\; x_{[0.25]},\; x_{Med},\; x_{[0.75]},\; x_{max}).

It divides the data set into four parts, each containing about a quarter of the observed values. It contains the median as a measure of location, and the range and interquartile range IQR as measures of variation.

Definition: The graphical representation of the five-point summary is called a box plot.

[Figure: box plot showing x_min, x_[0.25], x_Med, x_[0.75], and x_max.]
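The quantile rule above translates directly into code. This Python sketch (an addition, not from the script) implements it with 1-based indexing and reproduces the 80 % quantile of the turnover example, together with a five-point summary:

```python
# The script's q-quantile rule: midpoint of x_(nq) and x_(nq+1) if n*q is
# an integer, otherwise x_(ceil(nq)).
from math import ceil

def quantile(sorted_data, q):
    n = len(sorted_data)
    nq = n * q
    if abs(nq - round(nq)) < 1e-9:               # n*q is an integer (up to float noise)
        k = int(round(nq))
        return 0.5 * (sorted_data[k - 1] + sorted_data[k])
    return sorted_data[ceil(nq) - 1]

turnover = sorted([149776, 95268, 75445, 51854, 46656, 42745, 42064,
                   41461, 40518, 37432, 27383, 24300, 21869, 14854,
                   14671, 13837, 11974, 10833, 10769, 10543])

print(quantile(turnover, 0.80))    # 49255.0, as in the example
five = (turnover[0], quantile(turnover, 0.25), quantile(turnover, 0.5),
        quantile(turnover, 0.75), turnover[-1])
print(five)                        # five-point summary (min, Q1, median, Q3, max)
```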
Control questions
1. What is the central property of the arithmetic mean?
2. When is the arithmetic and when is the geometric mean used, and why?
3. How does the variance change if all values of a data set are converted from DM to euro?
4. Describe the translation theorem as a property of the variance. What formula results from the special case d = 0?
5. What is the minimum property of the variance? What does the principle of least squares mean in this context?
6. What does the coefficient of variation mean? Which measure from portfolio theory in business administration comes to mind?

3 Two-dimensional distributions
3.1 Scatterplot and joint distribution
3.2 Marginal distributions
3.3 Conditional distributions and statistical correlations
3.4 Covariance
3.5 Correlation coefficient
3.6 Rank correlation

According to [Schira, 2016], chapter 3; see also [Anderson et al., 2012], chapter 3.5, and [Newbold et al., 2013], chapter 2.4.

Scatterplot and joint distribution

Multi-dimensional statistics: each statistical unit ω_i of a population Ω can have a variety of characteristics.

Definition: Univariate statistics takes only one characteristic or variable into account. Multivariate statistics considers several variables for each unit ω_i:

  X_1(\omega_i), X_2(\omega_i), \dots, X_m(\omega_i)

Example: For a person ω_i we measure the duration of education X1(ω_i) and the income X2(ω_i) five years after the end of education.

The simplest case: two variables X(ω_i) and Y(ω_i) are of interest. The result is paired data (x_i, y_i) for each ω_i. These can be represented as points P1 := (x1, y1), P2 := (x2, y2), ..., Pn := (xn, yn) in a scatter plot.

Definition: The contingency table represents the joint distribution of the statistical variables X and Y in a concise way:

           y_1   y_2   ...  y_j   ...  y_l  |  Σ
  x_1      n_11  n_12  ...  n_1j  ...  n_1l |  n_1·
  x_2      n_21  n_22  ...  n_2j  ...  n_2l |  n_2·
  ...                                       |  ...
  x_i      n_i1  n_i2  ...  n_ij  ...  n_il |  n_i·    (sum of row i)
  ...                                       |  ...
  x_k      n_k1  n_k2  ...  n_kj  ...  n_kl |  n_k·
  Σ        n_·1  n_·2  ...  n_·j  ...  n_·l |  n       (bottom row: column sums; n = total sum)

Here

  n_{ij} = \mathrm{absF}(X = x_i \cap Y = y_j)

is the absolute frequency with which the combination (x_i, y_j) was observed, and

  n_{i\cdot} = \sum_{j=1}^{l} n_{ij}, \qquad n_{\cdot j} = \sum_{i=1}^{k} n_{ij}

are the absolute frequencies with which x_i or y_j was observed: the marginal frequencies.

Real-life example: routes of three soccer players at Bayern Munich. Task: match the players 1. Thomas Müller, 2. Franck Ribéry, 3. Arjen Robben to their respective contingency tables A, B, C. This representation of a contingency table is also called a heat map. Solution: 1B, 2C, 3A.

A representation with relative frequencies is also common. For this purpose, the absolute frequencies, including the marginal frequencies, are divided by n. Here

  h_{ij} = \mathrm{relF}(X = x_i \cap Y = y_j)

is the relative frequency with which the combination (x_i, y_j) was observed, and

  h_{i\cdot} = \sum_{j=1}^{l} h_{ij}, \qquad h_{\cdot j} = \sum_{i=1}^{k} h_{ij}

the relative (marginal) frequencies with which x_i or y_j was observed; the total sum is 1.

Marginal distributions

Definition: The one-dimensional distributions

  h_{i\cdot} = \mathrm{relF}(X = x_i) = \frac{n_{i\cdot}}{n}, \quad i = 1, \dots, k

and

  h_{\cdot j} = \mathrm{relF}(Y = y_j) = \frac{n_{\cdot j}}{n}, \quad j = 1, \dots, l

are called the marginal distributions of the statistical variables X and Y.

Calculation of mean and variance: the mean and variance of the individual components X and Y of two- or multidimensional variables are easily calculated from the marginal distributions (for cardinally measurable variables):

  \bar{x} = \sum_{i=1}^{k} h_{i\cdot}\, x_i, \qquad \bar{y} = \sum_{j=1}^{l} h_{\cdot j}\, y_j

  s_X^2 = \sum_{i=1}^{k} h_{i\cdot} (x_i - \bar{x})^2, \qquad s_Y^2 = \sum_{j=1}^{l} h_{\cdot j} (y_j - \bar{y})^2
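These marginal formulas are easy to evaluate by machine. The following Python sketch (not part of the script) computes marginals and their moments from a table of relative frequencies; it uses the joint distribution of the worked example on the next slides, so the numbers can be compared:

```python
# Marginal distributions and their moments from a joint table h_ij
# (rows: values of X, columns: values of Y).
x_vals, y_vals = [30, 60], [1, 2, 4]
H = [[0.12, 0.12, 0.16],
     [0.08, 0.18, 0.34]]

hx = [sum(row) for row in H]         # marginal of X: row sums -> [0.4, 0.6]
hy = [sum(col) for col in zip(*H)]   # marginal of Y: column sums -> [0.2, 0.3, 0.5]

x_mean = sum(h * x for h, x in zip(hx, x_vals))                 # 48.0
y_mean = sum(h * y for h, y in zip(hy, y_vals))                 # 2.8
x_var = sum(h * (x - x_mean) ** 2 for h, x in zip(hx, x_vals))  # 216.0
y_var = sum(h * (y - y_mean) ** 2 for h, y in zip(hy, y_vals))  # 1.56
print(hx, hy, x_mean, x_var, y_mean, y_var)
```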
Example: abstract calculation example for a two-dimensional frequency distribution.

Characteristic values for X: x1 = 30, x2 = 60; and for Y: y1 = 1, y2 = 2, y3 = 4.
Observed data: (30, 1), (30, 2), (60, 4), (30, 2), (60, 1), (30, 4), ..., (60, 2).
Sort and count: 24 × (30, 1), 24 × (30, 2), 32 × (30, 4), ..., 68 × (60, 4).

Contingency table:

        Y:    1    2    4  |  Σ
  X = 30:    24   24   32  |   80
  X = 60:    16   36   68  |  120
  Σ:         40   60  100  |  200

The relative frequencies are obtained by dividing all values by n = 200:

        Y:     1     2     4   |  marginal distribution of X
  X = 30:   0.12  0.12  0.16   |  0.4
  X = 60:   0.08  0.18  0.34   |  0.6
  Σ:        0.20  0.30  0.50   |  1      (bottom row: marginal distribution of Y)

The marginal distribution of X:

  x_i:   30   60
  h_i·:  0.4  0.6

  \bar{x} = \sum_{i=1}^{k} h_{i\cdot}\, x_i = 48, \qquad s_X^2 = \sum_{i=1}^{k} h_{i\cdot} (x_i - \bar{x})^2 = 216

The marginal distribution of Y:

  y_j:   1    2    4
  h_·j:  0.2  0.3  0.5

  \bar{y} = \sum_{j=1}^{l} h_{\cdot j}\, y_j = 2.8, \qquad s_Y^2 = \sum_{j=1}^{l} h_{\cdot j} (y_j - \bar{y})^2 = 1.56

Conditional distributions and statistical correlations

We now consider the distribution of X, given that (conditional on) Y has a fixed value y_j.

Definition: Normalizing the columns of the contingency table to a column sum of 1 leads to a total of l one-dimensional distributions for j = 1, ..., l. These are called the conditional distributions of X (conditional on Y = y_j):

  h_{i|Y=y_j} = \mathrm{relF}(X = x_i \mid Y = y_j) = \frac{h_{ij}}{h_{\cdot j}}.

Similarly, normalizing the rows to a row sum of 1 for i = 1, ..., k leads to the conditional distributions of Y (conditional on X = x_i):

  h_{j|X=x_i} = \mathrm{relF}(Y = y_j \mid X = x_i) = \frac{h_{ij}}{h_{i\cdot}}.

Example: For the joint distribution of the previous numerical example, there are three conditional distributions of X, next to the marginal distribution of X:

  X    h_{i|Y=1}  h_{i|Y=2}  h_{i|Y=4}  h_i·
  30   0.60       0.40       0.32       0.4
  60   0.40       0.60       0.68       0.6
  Σ    1          1          1          1

... and two conditional distributions of Y, next to the marginal distribution of Y:

  Y            1      2      4     Σ
  h_{j|X=30}  0.300  0.300  0.400  1
  h_{j|X=60}  0.133  0.300  0.567  1
  h_·j        0.2    0.3    0.5    1

Observation: the conditional distributions differ. This gives an indication of a dependence of the statistical variables X and Y.

Definition: If the joint distribution h_ij of the statistical variables X and Y is equal to the product of the two marginal distributions,

  h_{ij} = h_{i\cdot} \cdot h_{\cdot j} \quad \text{for } i = 1, \dots, k \text{ and } j = 1, \dots, l,

then X and Y are called statistically independent. Otherwise, there is a statistical correlation. We can distinguish between linear and nonlinear statistical correlations.

Properties of independent statistical variables: for independent statistical variables, the conditional distributions are identical and equal to the marginal distribution. Thus, for all j = 1, ..., l conditional distributions of X it holds that

  h_{i|Y=y_j} = h_{i\cdot}, \quad i = 1, \dots, k,

and for all i = 1, ..., k conditional distributions of Y,

  h_{j|X=x_i} = h_{\cdot j}, \quad j = 1, \dots, l.
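The factorization criterion can be checked cell by cell. A short Python sketch (added for illustration) tests it on the example's joint distribution:

```python
# Independence check: does h_ij = h_i. * h_.j hold for every cell?
H = [[0.12, 0.12, 0.16],
     [0.08, 0.18, 0.34]]
hx = [sum(row) for row in H]
hy = [sum(col) for col in zip(*H)]

independent = all(abs(H[i][j] - hx[i] * hy[j]) < 1e-12
                  for i in range(len(hx)) for j in range(len(hy)))
print(independent)   # False: e.g. h_11 = 0.12, but h_1. * h_.1 = 0.4 * 0.2 = 0.08
```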
Practical example: stock returns in the US (since 1963). Are daily stock returns on announcement days (AD = FED meetings and labor statistics releases) different from those on non-announcement days (ND)?

Absolute frequencies:

       ≤−2 %  (−2 %,−1 %]  (−1 %,0 %]  (0 %,1 %]  (1 %,2 %]  >2 %  |  Σ
  AD     27        63          305        385        111      31   |    922
  ND    286       963         4341       5349       1003     264   |  12206
  Σ     313      1026         4646       5734       1114     295   |  13128

Relative frequencies:

       ≤−2 %   (−2 %,−1 %]  (−1 %,0 %]  (0 %,1 %]  (1 %,2 %]  >2 %   |  Σ
  AD   0.0021     0.0048      0.0232      0.0293     0.0085   0.0024 |  0.0702
  ND   0.0218     0.0734      0.3307      0.4074     0.0764   0.0201 |  0.9298
  Σ    0.0238     0.0782      0.3539      0.4368     0.0849   0.0225 |  1

Conditional distributions:

              ≤−2 %   (−2 %,−1 %]  (−1 %,0 %]  (0 %,1 %]  (1 %,2 %]  >2 %   |  Σ
  h_{j|X=AD}  0.0293     0.0683      0.3308      0.4176     0.1204   0.0336 |  1
  h_{j|X=ND}  0.0234     0.0789      0.3556      0.4382     0.0822   0.0216 |  1
  h_·j        0.0238     0.0782      0.3539      0.4368     0.0849   0.0225 |  1

The conditional distributions differ: daily returns in the U.S. depend on whether the day is an announcement day or not.

Question: Given independence, what would the joint distribution look like if we keep the marginal distributions the same? (Fill in the cells of the relative frequency table so that each entry is the product of its marginals, with row sums 0.0702 and 0.9298 and the column sums as above.)

Mean of the sum and the difference: The elements ω_i, i = 1, ..., n, of a statistical mass Ω of extent n have been analyzed for two characteristics, and the statistical variables x_i = X(ω_i) and y_i = Y(ω_i) have been collected as paired data. From both variables, the means and the variances have been calculated. It holds that the mean value of a sum (difference) is equal to the sum (difference) of the mean values:

  \overline{x + y} = \bar{x} + \bar{y}, \qquad \overline{x - y} = \bar{x} - \bar{y}

This is true regardless of the joint distribution, and equally for statistically independent and statistically dependent variables.

Variance of the sum and the difference: The variance is calculated by applying the binomial formula. Variance of the sum:

  s_{X+Y}^2 = s_X^2 + s_Y^2 + 2 \cdot \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})(y_j - \bar{y})

Variance of the difference:

  s_{X-Y}^2 = s_X^2 + s_Y^2 - 2 \cdot \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})(y_j - \bar{y})

Special case:

  s_{X \pm Y}^2 = s_X^2 + s_Y^2, \quad \text{if } c_{XY} := \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})(y_j - \bar{y}) = 0

Covariance

Definition: The quantity calculated from the n pairs of values (x_i, y_i),

  c_{XY} := \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})(y_j - \bar{y}),

is called the empirical covariance or, in short, the covariance between the statistical variables X and Y. It is a measure of the linear dependency between X and Y. Simplified calculation:

  c_{XY} = \frac{1}{n} \sum_{j=1}^{n} x_j y_j - \bar{x}\,\bar{y} = \overline{x \cdot y} - \bar{x} \cdot \bar{y}

[Figure: illustration of the covariance in a scatter plot.]

The covariance can also be calculated using the relative frequencies from the contingency table:

  c_{XY} = \sum_{i=1}^{k} \sum_{j=1}^{l} h_{ij} (x_i - \bar{x})(y_j - \bar{y})

Simplified calculation:

  c_{XY} = \underbrace{\sum_{i=1}^{k} \sum_{j=1}^{l} h_{ij}\, x_i y_j}_{\overline{x \cdot y}} - \bar{x} \cdot \bar{y}
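The contingency-table formula for the covariance is again directly computable. This Python sketch (an addition, not from the script) evaluates the simplified formula on the running example; the value 3.6 reappears in the correlation example of the next section:

```python
# Covariance from the contingency table: c_XY = mean(x*y) - x̄ * ȳ.
x_vals, y_vals = [30, 60], [1, 2, 4]
H = [[0.12, 0.12, 0.16],
     [0.08, 0.18, 0.34]]

xy_mean = sum(H[i][j] * x_vals[i] * y_vals[j]
              for i in range(2) for j in range(3))             # 138.0
x_mean = sum(sum(row) * x for row, x in zip(H, x_vals))        # 48.0
y_mean = sum(sum(col) * y for col, y in zip(zip(*H), y_vals))  # 2.8

cov = xy_mean - x_mean * y_mean
print(cov)   # ≈ 3.6
```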
Covariance and dependency

Proposition: If two variables X and Y are statistically independent, then the covariance c_XY between them is zero. This proposition is not reversible, because the covariance measures only the linear part of the statistical dependence.

- Correct: X and Y are independent ⇒ c_XY = 0
- Correct: c_XY ≠ 0 ⇒ X and Y are linearly dependent
- Incorrect: c_XY = 0 ⇒ X and Y are independent

Correlation coefficient

Definition: The ratio

  r_{XY} := \frac{c_{XY}}{s_X \cdot s_Y}

is called the (empirical) correlation coefficient between X and Y.

[Figure: three scatter plots with correlation coefficients r_XY = 0.97, r_XY = −0.52, and r_XY = 0.06.]

Properties: The correlation coefficient is a normalized measure of the strength of the linear statistical relationship:

  -1 \le r_{XY} \le 1

The absolute value of the correlation coefficient remains unchanged if one or both variables are transformed linearly. Suppose that U := a1 + b1·X and V := a2 + b2·Y with b1, b2 ≠ 0. Then we obtain

  r_{UV} = \frac{c_{UV}}{s_U \cdot s_V} = \frac{b_1 b_2 \cdot c_{XY}}{|b_1|\, s_X \cdot |b_2|\, s_Y} = \frac{b_1 b_2}{|b_1| \cdot |b_2|}\, r_{XY},

which means that |r_UV| = |r_XY|.

Example: calculation of the correlation coefficient. For the joint distribution from the numerical example above, one obtains for the covariance, using the simplified calculation,

  c_{XY} = \sum_{i=1}^{2} \sum_{j=1}^{3} h_{ij} (x_i - \bar{x})(y_j - \bar{y}) = \sum_{i=1}^{2} \sum_{j=1}^{3} h_{ij}\, x_i y_j - \bar{x}\,\bar{y} = 138 - 48 \cdot 2.8 = 138 - 134.4 = 3.6

The correlation coefficient is thus

  r_{XY} = \frac{3.6}{\sqrt{216} \cdot \sqrt{1.56}} = 0.1961,

which indicates a weak positive correlation.

Examples: [Figure: scatter plots of goals against vs. age of trainer, body weight vs. body height, and goals scored vs. league position.]

Note: The covariance or the correlation coefficient does not necessarily indicate a causal relationship between the characteristics. The available observations merely show a statistical tendency, which could also be purely by chance. Correlation of the day: www.correlated.org :-)

Correlation vs. causality: [Figure: scatter plot of son's height (inch) vs. father's height (inch); the two variables are correlated, but in which direction does causality run, if at all?]

Rank correlation

Besides the correlation coefficient according to Bravais-Pearson, there is another one, namely the one according to Spearman, also called the rank correlation coefficient.

Definition: The rank correlation coefficient named after Charles Edward Spearman,

  r^{Sp}_{XY} := r_{\mathrm{rg}(X), \mathrm{rg}(Y)},

is the correlation coefficient between the ranks of the observations. It is used for ordinally scaled characteristics.

Example: school grades. The following table shows the results of the Abitur examinations of ten students in the subjects German (characteristic G) and History (characteristic H). The maximum achievable score is 15 in each case.

  Pupil i   German (G)   History (H)   rg(G)     rg(H)
  1             13           15        4         1
  2             14            8        2.5 (2)   4 (3)
  3              8            1        9         10
  4             10            7        7         6.5 (6)
  5             15            9        1         2
  6              1            5        10        9
  7             14            8        2.5 (3)   4 (4)
  8             12            7        5         6.5 (7)
  9              9            6        8         8
  10            11            8        6         4 (5)

Question: Are the grades correlated? Does good performance in German go along with good knowledge of history? First, we determine the rankings for each student in each of the two subjects. To do this, we arrange the students according to the results they obtained in the subjects.
Students with the same result are assigned the arithmetic mean of the rankings they would have received if they had been arranged randomly (given in parentheses in the table). This may result in rankings like 2.5 or 6.5. Then we compute the variances, standard deviations, and covariance of the ranks and obtain, with

  r^{Sp}_{GH} = \frac{6.95}{2.8636 \cdot 2.8284} = 0.8581,

a fairly strong positive correlation, which was to be expected. (Compare: r_GH = 0.549 for the raw scores.)

Question: When should the rank correlation be preferred to the ordinary correlation?

Control questions
1. What is the difference between univariate and multivariate statistics? Think of an example of bivariate statistics.
2. What is the structure and function of contingency tables? Are there also contingency tables for more than two characteristics?
3. How many marginal distributions does a 3-dimensional statistical distribution have?
4. When is the variance of a sum smaller than the sum of the variances?
5. What is statistical independence? What is the relationship between covariance and independence?
6. What is the meaning of the correlation coefficient? Does an empirical correlation coefficient of 0 imply that there is no factual relationship between the characteristics under consideration?
7. What is a rank correlation? How do you measure it?

4 Linear regression
4.1 The regression line
4.2 Properties of regression lines
4.3 Nonlinear and multiple regression

According to [Schira, 2016], chapter 4; see also [Anderson et al., 2012], chapter 14, and [Newbold et al., 2013], chapter 11.

The regression line

Correlation and regression calculation: covariance and correlation coefficient are mere measures, and in correlation analysis the statistical variables (X, Y) are treated completely symmetrically. The regression calculation goes one step further: the average linear relationship between the characteristic values of a two-dimensional statistical variable (X, Y) is now to be represented by a linear function, i.e. a straight line

  y = a + b x

in the scatter plot. Here we distinguish between an (in the mathematical sense) independent variable X and a dependent variable Y. This straight line is supposed to be a mean straight line, that is, it should pass through the observed characteristic values (x_i, y_i) in such a way that it indicates the location and main direction of the point cloud in the scatter plot.

[Figure: scatter plot with fitted regression line; the vertical distances e_i between the points and the line y = a + bx are the "deviations".]

The method of least squares (LSM) uniquely assigns such a mean straight line to the scatter plot.
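To preview what the least-squares method produces, here is a Python sketch (an addition to the script) using the standard closed-form estimates b = c_XY / s_X² and a = ȳ − b·x̄, which is presumably what the script derives in the subsections that follow; the data points below are hypothetical:

```python
# Least-squares estimates for the regression line y = a + b*x.
def regression_line(x, y):
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    c_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / n  # covariance
    s_xx = sum((xi - x_mean) ** 2 for xi in x) / n                         # variance of X
    b = c_xy / s_xx              # slope
    a = y_mean - b * x_mean      # intercept: the line passes through (x̄, ȳ)
    return a, b

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical observations scattered around y = 2x
a, b = regression_line(x, y)
print(a, b)                      # intercept ≈ 0.05, slope ≈ 1.99
```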