Stochastics for Engineers 2024–2025
Universität Klagenfurt, 2024
Benjamin A. Robinson, Michaela Szölgyenyi
Summary
These lecture notes provide an introduction to stochastics for engineers. Topics include probability theory on finite probability spaces; discrete distributions such as the uniform, Bernoulli, binomial, geometric, and Poisson distributions; and conditional distributions and independence. The notes discuss practical examples, such as coin flipping and dice rolling, and include code samples in Python.
Stochastics for Engineers
Winter Semester 2024–25
Benjamin A. Robinson, Michaela Szölgyenyi

Preamble

Acknowledgements

The authors would like to thank two student assistants for preparing the manuscript of these lecture notes. The manuscript was composed by Christiane Liebminger and proofread by Lisa Wulz.

Contents

0 Introduction
1 Introductory examples
  1.1 Flipping a coin
  1.2 Rolling a dice
2 Discrete probability spaces
  2.1 Examples of discrete distributions
3 Conditional distribution and independence

§0 Introduction

Stochastics is a subdiscipline of mathematics comprising probability theory and statistics. Both areas study phenomena influenced by randomness. Probability theory engages with the mathematical, theoretical side, while statistics is concerned with collecting data and deducing statements about the real world from it. The theoretical foundation of statistics is in turn tightly connected to probability theory; conversely, the motivation for studying theoretical questions often comes from statistics.

We are interested in the question: "What are probabilities?"

Laplace experiment: In a Laplace experiment the probability of an event is the number of favourable cases divided by the number of possible cases. For a die,

    P({3}) = 1/6.

This only works in finite probability spaces (# cases < ∞) and under the assumption of a uniform distribution.

Frequentist: We roll the die 1000 times and observe the outcomes. Suppose the number 3 occurs 400 times. Then the probability is set to 400/1000 = 0.4. Possible problems with this approach are that the data set could be too small or that the die could be unfair.

Subjective: The professor claims that 50% of the students are awake in today's class; hence the probability of being awake in class is 1/2. She certainly did not run an experiment. This value results from experience and personal judgement. (In Bayesian statistics this would be called a prior distribution.)

In the 1930s, probability theory was axiomatised by the Russian mathematician Andrej Kolmogorov. We will meet these axioms soon.

§1 Introductory examples

In the following chapter, finite probability spaces are motivated by introductory examples.

1.1 Flipping a coin

We start by flipping a coin. To this end we consider a finite set Ω = {Head, Tail}, called the sample space, and a function P : P(Ω) → R that assigns to each subset of Ω its probability. Here

    P(Ω) = { ∅, {Head}, {Tail}, {Head, Tail} }.

When using a fair coin, the probabilities are

▷ P(∅) = 0,
▷ P({Head}) = P({Tail}) = 1/2,
▷ P({Head, Tail}) = P({Head}) + P({Tail}) = 1.

If we do not know whether the coin is fair, we may start a test series: under equal conditions we repeat the single experiment N times, e.g., N = 100. It is important to choose N sufficiently large; a good size of N depends on the experiment.

We can let the computer do the coin toss. In Python a single toss works as follows:

    import random
    random.choice(["Head", "Tail"])

We repeat the experiment N = 10000 times and obtain a random sample with a universe of N = 10000.

    Result    Head    Tail
    Amount    4931    5069

Table 1.1: Result of flipping a coin with N = 10000 repetitions.

The frequency of Head in the random sample is 4931, that of Tail is 5069. The relative frequency of Head is 4931/N = 0.4931 and that of Tail is 5069/N = 0.5069. These values are also called empirical probabilities.
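The counts in Table 1.1 come from one particular run; a fresh simulation will give slightly different numbers. A minimal sketch of the whole test series (the variable names are our own):

    import random
    from collections import Counter

    N = 10000
    sample = [random.choice(["Head", "Tail"]) for _ in range(N)]

    counts = Counter(sample)  # absolute frequencies
    for outcome in ["Head", "Tail"]:
        # relative frequency = empirical probability of the outcome
        print(outcome, counts[outcome], counts[outcome] / N)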
How can we generate random numbers on a computer? The simplest way to generate pseudo-random numbers x_1, x_2, ..., x_n ∈ (0, 1) which are uniformly distributed are linear congruential generators (going back to D. H. Lehmer, 1948).

Algorithm 1: generating random numbers
INPUT: choose a seed x_0 ∈ {1, ..., m − 1}, choose m a (large) prime, choose a ∈ N
OUTPUT: x_1, x_2, ..., x_n
for k ∈ 1, ..., n do
    x_k = a · x_{k−1} mod m
end for

The recursion produces integers x_k ∈ {1, ..., m − 1}; dividing by m then yields pseudo-random numbers x_k/m ∈ (0, 1).

How to choose a: a should satisfy that the smallest k ∈ N for which a^k ≡ 1 mod m is given by k = m − 1 (that is, a is a primitive root modulo m). Under such a "good" parameter choice, you will be able to generate n = m − 1 pseudo-random numbers before they start to repeat (the period is m − 1).
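A minimal Python sketch of this generator, using the classic "minimal standard" parameters m = 2^31 − 1 (a prime) and a = 16807 (a primitive root modulo m):

    def lehmer(n, seed=1, a=16807, m=2**31 - 1):
        # Generate n pseudo-random numbers in (0, 1) via x_k = a * x_{k-1} mod m.
        x = seed  # x_0 in {1, ..., m - 1}
        numbers = []
        for _ in range(n):
            x = (a * x) % m        # integer recursion
            numbers.append(x / m)  # map to (0, 1)
        return numbers

    print(lehmer(5))

Because a is a primitive root modulo the prime m, the integer sequence only starts repeating after m − 1 steps.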
1.2 Rolling a dice

For rolling a dice with 6 faces the sample space is Ω = {1, 2, 3, 4, 5, 6}. It is sufficient to define P only for the subsets of Ω that contain one element: {1}, {2}, {3}, {4}, {5}, {6}. In the case of a fair dice,

    P({1}) = P({2}) = ... = P({6}) = 1/6.

Again we run a test series with N = 10000 in Python:

    import random
    dice = [random.randint(1, 6) for i in range(10000)]

    Result      1      2      3      4      5      6
    Amount   1431   1712   1613   1543   1880   1821

Table 1.2: Result of rolling a dice with N = 10000 repetitions.

Now we want to plot these data in a histogram. In Python:

    import matplotlib.pyplot as plt
    plt.hist(dice, bins=6)
    plt.show()

Figure 1.1: Histogram of the random experiment "rolling a dice".

We are ready to do some basic statistics.

Mean:

    \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i,

where the x_i are the realisations in our random sample. In our case,

    \bar{x} = \frac{1}{10000} \sum_{i=1}^{10000} x_i = 3.6192.

Empirical variance: It measures the deviation from the mean and is given by

    \tilde{v} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 = \frac{1}{10000} \sum_{i=1}^{10000} (x_i - 3.6192)^2.

The variant

    v = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2

is unbiased.

Empirical standard deviation:

    s = \sqrt{v}, respectively \tilde{s} = \sqrt{\tilde{v}}.

Median: First, the sample needs to be sorted, yielding the sorted random sample y. Then

    \operatorname{median}(x) := \begin{cases} y_{\frac{n+1}{2}}, & n \text{ odd}, \\ y_{\frac{n}{2}}, & n \text{ even}. \end{cases}

In our example, median(x) = 4.

There are many more statistical quantities out there which we are not going to discuss in this lecture; a short script computing the quantities above is sketched below. We are then ready to put our considerations onto a theoretical basis.
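A minimal sketch of these computations using Python's standard statistics module (note that statistics.median_low matches the convention y_{n/2} for even n, whereas statistics.median would average the two middle values):

    import random
    import statistics

    N = 10000
    dice = [random.randint(1, 6) for _ in range(N)]

    x_bar   = statistics.mean(dice)        # mean
    v_tilde = statistics.pvariance(dice)   # empirical variance (divides by N)
    v       = statistics.variance(dice)    # unbiased variance (divides by N - 1)
    s       = statistics.stdev(dice)       # standard deviation, sqrt(v)
    med     = statistics.median_low(dice)  # y_{n/2} for even n

    print(x_bar, v_tilde, v, s, med)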
§2 Discrete probability spaces

We want to give our thoughts a theoretical basis and define a notion of probability that satisfies certain axioms.

Definition 2.1: sample space, outcomes, events
We call Ω ⊆ N_0 = {0, 1, 2, ...} a sample space. The elements ω ∈ Ω are called outcomes, and the subsets A ⊆ Ω events. Let P(Ω) denote the power set of Ω, i.e. the collection of all subsets of Ω (or equivalently the collection of all events).

Definition 2.2: probability
A probability is a map P : P(Ω) → R that satisfies the following three statements, which are known as Kolmogorov's axioms:
1. For each event A ⊆ Ω the probability lies between 0 and 1, that is 0 ≤ P(A) ≤ 1;
2. The "sure" event Ω has probability 1, that is P(Ω) = 1;
3. Let I be a countable index set (e.g., I = N) and let (A_i)_{i∈I} be a collection of pairwise disjoint events, i.e. A_i ∩ A_j = ∅ for all i, j ∈ I with i ≠ j. Then the probability of the union of these events is equal to the sum of their probabilities, i.e.

    P\Bigl( \bigcup_{i \in I} A_i \Bigr) = \sum_{i \in I} P(A_i).

Definition 2.3: discrete probability space
A discrete probability space (Ω, p) consists of a sample space Ω, which is an at most countable set, and a probability function p : Ω → [0, 1] satisfying

    \sum_{\omega \in \Omega} p(\omega) = 1.

The function P : P(Ω) → [0, 1] given for all A ⊆ Ω by

    P(A) = \sum_{\omega \in A} p(\omega)

is called a discrete probability (measure/distribution). We may then also write (Ω, P) to describe the same discrete probability space.

Example 2.4 (Rolling a dice)
Consider again the experiment of rolling a dice from Section 1.2. The sample space is Ω = {1, 2, 3, 4, 5, 6} and the probability function p : Ω → [0, 1] is given by p(ω) = 1/6 for all ω ∈ Ω. The event of "rolling an odd number" is given by A = {1, 3, 5} ∈ P(Ω) and we can calculate its probability by

    P(A) = p(1) + p(3) + p(5) = 1/6 + 1/6 + 1/6 = 1/2.  △

For any finite set A, let #A denote the number of elements in A.

Example 2.5 (Uniform distribution)
Let Ω be a finite set and define p : Ω → [0, 1] by p(ω) = 1/#Ω for all ω ∈ Ω. The associated probability P is called the uniform distribution on Ω. For every event A ⊆ Ω,

    P(A) = \sum_{\omega \in A} p(\omega) = \sum_{\omega \in A} \frac{1}{\#\Omega} = \frac{\#A}{\#\Omega} = \frac{\#\text{ outcomes in } A}{\#\text{ total outcomes}}.  △

Remark. We know that events are subsets of Ω. That is the reason why we use set operations. Let A, B ⊆ Ω:

    A and B                 A ∩ B
    A or B                  A ∪ B
    A minus B               A \ B
    not A                   Ω \ A = A^c
    A and B are disjoint    A ∩ B = ∅
    A subset of B           A ⊂ B

Table 2.1: Set operations for logical expressions.

Theorem 2.6
Let (Ω, P) be a discrete probability space. Let A, B ⊆ Ω and let A_1, A_2, ... ⊆ Ω. Then it holds that
▷ P(∅) = 0,
▷ A ⊆ B ⇒ P(A) ≤ P(B),
▷ P(A) + P(B) = P(A ∪ B) + P(A ∩ B),
▷ P( ⋃_{i∈N} A_i ) ≤ Σ_{i∈N} P(A_i),
▷ P(A^c) = 1 − P(A).

Definition 2.7
Let Ω be a set and A, (A_k)_{k∈N} subsets of Ω. We write
▷ A_k ↓ A for k → ∞, if A_{k+1} ⊆ A_k for every k ∈ N and ⋂_{k∈N} A_k = A;
▷ A_k ↑ A for k → ∞, if A_k ⊆ A_{k+1} for every k ∈ N and ⋃_{k∈N} A_k = A.

Proposition 2.8
Let (Ω, P) be a discrete probability space. Then for all subsets A, (A_k)_{k∈N} of Ω with A_k ↑ A or A_k ↓ A,

    \lim_{k \to \infty} P(A_k) = P(A).

2.1 Examples of discrete distributions

Uniform distribution: The uniform distribution on a finite set Ω is given for all ω ∈ Ω by

    p(\omega) = \frac{1}{\#\Omega}.

Bernoulli distribution: Let q ∈ [0, 1] and Ω = {0, 1}. The probability function p is defined by

    p(\omega) = \begin{cases} q, & \omega = 1, \\ 1 - q, & \omega = 0. \end{cases}

We call q the probability of success of a Bernoulli experiment.

Binomial distribution: Let n ∈ N, q ∈ [0, 1] and Ω = {0, 1, 2, ..., n}. Then for all ω ∈ Ω, p is defined by

    p(\omega) = \binom{n}{\omega} q^{\omega} (1 - q)^{n - \omega}.

Then p(ω) is the probability of a total of ω successes when a Bernoulli experiment with success probability q is repeated n times.

Geometric distribution: Let q ∈ [0, 1] and Ω = N. The probability function p is given for all ω ∈ Ω by

    p(\omega) = q \, (1 - q)^{\omega - 1}.

Then p(ω) is the probability that the first success occurs on the ω-th attempt when repeating a Bernoulli experiment with success probability q.

Poisson distribution: Let λ ∈ (0, ∞) and Ω = N_0 = N ∪ {0}. The probability function p is given for all ω ∈ Ω by

    p(\omega) = \frac{\lambda^{\omega}}{\omega!} \, e^{-\lambda}.

For example, the number of trees in a sparse forest is Poisson distributed, but so is the number of customers entering a store per hour. Another important meaning of the Poisson distribution comes from the observation that it is the limiting distribution of the binomial distribution as n → ∞ with success probabilities q_n → 0, made precise in the following theorem.

Theorem 2.9: Poisson approximation
Let λ ∈ (0, ∞) and q_n ∈ [0, 1], n ∈ N, be such that lim_{n→∞} n · q_n = λ. Let ((Ω_n, p_n))_{n∈N} be a sequence of discrete probability spaces with Ω_n = {0, 1, ..., n} and the binomial distribution with parameters (n, q_n). Let further (Ω, p) be a discrete probability space with Ω = N_0 and the Poisson distribution with parameter λ. Then for all ω ∈ Ω,

    \lim_{n \to \infty} p_n(\omega) = p(\omega).
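A minimal numerical illustration of this theorem (our own sketch, not from the notes, with λ = 2 and q_n = λ/n; math.comb requires Python 3.8+):

    import math

    lam = 2.0  # lambda

    def binom_pmf(n, q, w):
        # p_n(w) = C(n, w) * q^w * (1 - q)^(n - w)
        return math.comb(n, w) * q**w * (1 - q)**(n - w)

    def poisson_pmf(lam, w):
        # p(w) = lam^w / w! * exp(-lam)
        return lam**w / math.factorial(w) * math.exp(-lam)

    for n in (10, 100, 1000):
        q_n = lam / n  # so that n * q_n = lambda for every n
        print(n, [round(binom_pmf(n, q_n, w), 5) for w in range(5)])
    print("Poisson:", [round(poisson_pmf(lam, w), 5) for w in range(5)])

As n grows, each row of binomial probabilities approaches the Poisson probabilities printed in the last line.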
§3 Conditional distribution and independence

In many cases some information is already given, and it should be included in the calculation of probabilities.

Definition 3.1: conditional distribution
Let (Ω, P) be a discrete probability space and A, B ⊆ Ω with P(B) > 0. Then

    P(A \mid B) = \frac{P(A \cap B)}{P(B)}

is called the conditional distribution of A given B.

Proposition 3.2
Let (Ω, P) be a discrete probability space and let B ⊆ Ω with P(B) > 0. Then P( · | B) is a discrete probability measure on P(Ω).

Proof. To prove that P( · | B) is a discrete probability measure, we have to check the Kolmogorov axioms from Definition 2.2. For any A ⊆ Ω we have A ∩ B ⊆ B, and so P(A ∩ B) ≤ P(B). Also P(A ∩ B) ≥ 0 and P(B) > 0. Thus 0 ≤ P(A | B) ≤ 1. Furthermore,

    P(\Omega \mid B) = \frac{P(\Omega \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1.

Now let (A_i)_{i∈I} be pairwise disjoint subsets of Ω. Then

    P\Bigl( \bigcup_{i \in I} A_i \,\Bigm|\, B \Bigr) = \frac{P\bigl( \bigl( \bigcup_{i \in I} A_i \bigr) \cap B \bigr)}{P(B)} = \frac{P\bigl( \bigcup_{i \in I} (A_i \cap B) \bigr)}{P(B)} = \sum_{i \in I} \frac{P(A_i \cap B)}{P(B)} = \sum_{i \in I} P(A_i \mid B).

Theorem 3.3: Law of total probability
Let (Ω, P) be a discrete probability space. Let (B_i)_{i∈I} be events with P(B_i) > 0 for all i ∈ I, such that (B_i)_{i∈I} are pairwise disjoint and Ω = ⋃_{i∈I} B_i. Then for any event A ⊆ Ω,

    P(A) = \sum_{i \in I} P(A \mid B_i) \, P(B_i).

Proof. Using the pairwise disjointness of the B_i and Definition 3.1, we calculate

    P\Bigl( A \cap \bigcup_{i \in I} B_i \Bigr) = P\Bigl( \bigcup_{i \in I} (A \cap B_i) \Bigr) = \sum_{i \in I} P(A \cap B_i) = \sum_{i \in I} P(A \mid B_i) \, P(B_i).

We conclude by noting that

    P(A) = P(A \cap \Omega) = P\Bigl( A \cap \bigcup_{i \in I} B_i \Bigr).

Theorem 3.4: Bayes' law
Let A ⊆ Ω with P(A) > 0 and let B_1, B_2, B_3, ... ⊆ Ω be events which are pairwise disjoint with P(B_i) > 0 for all i ∈ I and Ω = ⋃_{i∈I} B_i. Then it holds for all j ∈ I that

    P(B_j \mid A) = \frac{P(A \mid B_j) \, P(B_j)}{P(A)} = \frac{P(A \mid B_j) \, P(B_j)}{\sum_{i \in I} P(A \mid B_i) \, P(B_i)}.

Proof. For all j ∈ I we have, by Definition 3.1 and Theorem 3.3,

    \frac{P(A \mid B_j) \, P(B_j)}{\sum_{i \in I} P(A \mid B_i) \, P(B_i)} = \frac{P(A \cap B_j)}{P\bigl( A \cap \bigcup_{i \in I} B_i \bigr)} = \frac{P(A \cap B_j)}{P(A)} = P(B_j \mid A),

where we used that ⋃_{i∈I} B_i = Ω.

Example 3.5
0.1% of the population suffer from a certain rare disease. There is a test to diagnose this disease with the following properties: 99% of people who have the disease will test positive, and 1% of people who are healthy will also get a positive test result. What is the probability that a person is sick given that the test is positive?

Let
Ω ... set of all tested persons,
B_1 ... set of all healthy persons,
B_2 ... set of all sick persons,
A ... set of all persons with a positive test result.

We know B_1 ∪ B_2 = Ω, P(B_2) = 0.001, P(A | B_1) = 0.01, P(A | B_2) = 0.99, and P(B_1) = 1 − P(B_2) = 0.999. Now we apply Theorem 3.4:

    P(B_2 \mid A) = \frac{P(A \mid B_2) \, P(B_2)}{P(A \mid B_1) \, P(B_1) + P(A \mid B_2) \, P(B_2)} = \frac{0.99 \cdot 0.001}{0.01 \cdot 0.999 + 0.99 \cdot 0.001} \approx 0.0902.

Even though the test seems to be fairly precise, if the result is positive then the probability that you are really sick is a rather moderate 9.02%.  △

Lemma 3.6
Let (Ω, P) be a discrete probability space. For all n ∈ N and A_1, ..., A_n ⊆ Ω with P(A_1 ∩ ··· ∩ A_{n−1}) ≠ 0 it holds that

    P(A_1 \cap \dots \cap A_n) = P(A_1) \, P(A_2 \mid A_1) \, P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \dots \cap A_{n-1}) = \prod_{i=1}^{n} P\Bigl( A_i \,\Bigm|\, \bigcap_{j=1}^{i-1} A_j \Bigr).

Definition 3.7: Independence of events
Let (Ω, P) be a discrete probability space. Let A, B ⊆ Ω.
▷ A and B are independent, if P(A ∩ B) = P(A) · P(B).
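As a small illustration of this definition (an example of our own choosing, not from the notes): on a fair dice, the events A = {2, 4, 6} ("rolling an even number") and B = {1, 2, 3, 4} are independent, which we can confirm with exact arithmetic:

    from fractions import Fraction

    omega = {1, 2, 3, 4, 5, 6}

    def P(event):
        # uniform distribution: P(A) = #A / #omega
        return Fraction(len(event), len(omega))

    A = {2, 4, 6}
    B = {1, 2, 3, 4}

    print(P(A & B))     # 1/3
    print(P(A) * P(B))  # (1/2) * (2/3) = 1/3, so A and B are independent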