Mathematics and Statistical Foundations for Machine Learning PDF

Summary

This document is a set of lecture notes on the mathematical and statistical foundations of machine learning, data science, and cyber security. Topics covered in these notes include mutually exclusive events, conditional probability, and independent events. The notes were presented on August 22, 2024, at SRM University AP.

Full Transcript


Mathematics and Statistical Foundations for Machine Learning (FIC 504), Data Science (FIC 506), and Cyber Security (FIC 507), Part 2
Dr. Tapan Kumar Hota
August 22, 2024
[email protected]
Level 2, Room No 8, S R Block

Table of contents
1. Recall
2. Disjoint and non-disjoint outcomes
3. Some Propositions
4. Independent Events
5. Conclusion

Recall

Sample Space and Events
- Sample space: The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.
- Event: Any subset E of the sample space is known as an event.
- Null event: An event that contains no outcomes and hence cannot occur. It is denoted by ∅.
- Complementary event: For any event E, we define the new event E^c, referred to as the complement of E, to consist of all outcomes in the sample space S that are not in E.

For any two events E and F of a sample space S:
- E ∪ F consists of all outcomes that are in E, in F, or in both E and F.
- EF = E ∩ F, called the intersection of E and F, consists of all outcomes that are in both E and F.
- Mutually exclusive events: If EF = ∅, then E and F are said to be mutually exclusive.

Axioms of Probability
- There are several possible interpretations of probability (e.g., the frequentist interpretation and the Bayesian interpretation), but they (almost) completely agree on the mathematical rules probability must follow.
- P(E) denotes the probability of event E, with 0 ≤ P(E) ≤ 1.
- The modern axiomatic approach to probability theory: consider an experiment whose sample space is S. For each event E of the sample space S, we assume that a number P(E) is defined and satisfies the following three axioms:
  - Axiom 1: 0 ≤ P(E) ≤ 1.
  - Axiom 2: P(S) = 1.
  - Axiom 3: For any sequence of mutually exclusive events E1, E2, ... (that is, events for which Ei Ej = ∅ when i ≠ j),
    P(E1 ∪ E2 ∪ ...) = P(E1) + P(E2) + ...,
    i.e., the probability of the union of the sequence equals the sum of the individual probabilities.
- We refer to P(E) as the probability of the event E (we assume P(E) is defined for all events E).

Disjoint and non-disjoint outcomes

Disjoint (mutually exclusive) outcomes cannot happen at the same time:
- The outcome of a single coin toss cannot be both a head and a tail.
- A student cannot both pass and fail a class.
- A single card drawn from a deck cannot be both an ace and a queen.

Non-disjoint outcomes can happen at the same time:
- A student can get an A in Stats and an A in Econ in the same semester.

Example 1: Union of Non-Mutually Exclusive Events
What is the probability of drawing a jack or a red card from a well-shuffled full deck?
P(jack or red) = P(jack) + P(red) − P(jack and red) = 4/52 + 26/52 − 2/52 = 28/52.
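As a quick check of the inclusion-exclusion count in Example 1, the short Python sketch below (an illustration added to these notes, not part of the slides) enumerates a 52-card deck and compares the direct count of "jack or red" with P(jack) + P(red) − P(jack and red); the rank and suit labels are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Build a 52-card deck as (rank, suit) pairs; hearts and diamonds are the red suits.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))

jack = {card for card in deck if card[0] == "J"}
red = {card for card in deck if card[1] in ("hearts", "diamonds")}

n = len(deck)
direct = Fraction(len(jack | red), n)   # count the union directly
incl_excl = Fraction(len(jack), n) + Fraction(len(red), n) - Fraction(len(jack & red), n)

assert direct == incl_excl == Fraction(28, 52)   # both give 28/52 = 7/13
print(direct)
```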
Some Propositions

Proposition 3.1
Show that P(E^c) = 1 − P(E).
Proof: Since E and E^c are always mutually exclusive and E ∪ E^c = S, Axioms 2 and 3 give
1 = P(S) = P(E ∪ E^c) = P(E) + P(E^c), so P(E^c) = 1 − P(E).
In other words, the probability that an event does not occur is 1 minus the probability that it does occur.

Proposition 3.2
If E ⊂ F, then P(E) ≤ P(F).
Proof: Since E ⊂ F, we can express F as F = E ∪ (E^c ∩ F). Because E and E^c ∩ F are mutually exclusive, Axiom 3 gives P(F) = P(E) + P(E^c ∩ F). By Axiom 1, P(E^c ∩ F) ≥ 0, and therefore P(E) ≤ P(F).

Proposition 3.3 (law of inclusion-exclusion)
Show that P(E ∪ F) = P(E) + P(F) − P(E ∩ F).
Proof:
- Note that E ∪ F = E ∪ (E^c ∩ F), and E and E^c ∩ F are mutually exclusive, so Axiom 3 gives
  P(E ∪ F) = P(E ∪ (E^c ∩ F)) = P(E) + P(E^c ∩ F).   (1)
- Furthermore, since F = (E ∩ F) ∪ (E^c ∩ F), Axiom 3 again gives
  P(F) = P(E ∩ F) + P(E^c ∩ F), so P(E^c ∩ F) = P(F) − P(E ∩ F).   (2)
- Combining Equations (1) and (2), P(E ∪ F) = P(E) + P(F) − P(E ∩ F).

Example 2
Sai is taking two books along on his holiday vacation. With probability 0.5 he will like the first book, with probability 0.4 he will like the second book, and with probability 0.3 he will like both books. What is the probability that he likes neither book?
Solution:
- Let Bi denote the event that Sai likes book i, where i = 1, 2.
- The probability that Sai likes at least one of the books is
  P(B1 ∪ B2) = P(B1) + P(B2) − P(B1 ∩ B2) = 0.5 + 0.4 − 0.3 = 0.6.
- The probability that he likes neither book is
  P(B1^c ∩ B2^c) = P((B1 ∪ B2)^c) = 1 − P(B1 ∪ B2) = 0.4.

Equally Likely Events
- In many experiments, it is natural to assume that all outcomes in the sample space are equally likely to occur.
- Consider an experiment whose sample space S is a finite set, say S = {1, 2, ..., N}. Then it is often natural to assume that P({1}) = P({2}) = ... = P({N}).
- From Axioms 2 and 3, P({i}) = 1/N for i = 1, 2, ..., N. (Why?)
- From this observation and Axiom 3, for any event E,
  P(E) = (number of outcomes in E) / (number of outcomes in S).   (3)

Example 3
If two dice are rolled, what is the probability that the sum of the upturned faces will equal 7?
Solution:
- Assume that all 36 possible outcomes are equally likely.
- Since there are 6 outcomes, namely (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), and (6, 1), that result in the sum of the dice being equal to 7, the desired probability is 6/36 = 1/6.
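Equation (3) can be applied mechanically to Example 3 by listing the sample space and counting; the Python sketch below is a minimal illustration added to these notes, not part of the original slides.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered outcomes of rolling two dice.
S = list(product(range(1, 7), repeat=2))

# Event E: the upturned faces sum to 7.
E = [outcome for outcome in S if sum(outcome) == 7]

# Equation (3): P(E) = (number of outcomes in E) / (number of outcomes in S).
print(E)                         # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(Fraction(len(E), len(S)))  # 1/6
```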
Example 4
If 3 balls are "randomly drawn" from a bowl containing 6 white and 5 black balls, what is the probability that one of the balls is white and the other two are black?
Solution:
- If we regard the order in which the balls are selected as relevant, the sample space consists of 11 × 10 × 9 = 990 outcomes.
- There are 6 × 5 × 4 = 120 outcomes in which the first ball selected is white and the other two are black.
- There are 5 × 6 × 4 = 120 outcomes in which the first is black, the second is white, and the third is black.
- There are 5 × 4 × 6 = 120 outcomes in which the first two are black and the third is white.
- Hence, assuming that "randomly drawn" means that each outcome in the sample space is equally likely to occur, the desired probability is
  (120 + 120 + 120) / 990 = 4/11.

Example 5
If 3 balls are "randomly drawn" from a bowl containing 6 white and 5 black balls, what is the probability that one of the balls is white and the other two are black?
Solution:
- Let us solve the same problem by regarding the outcome of the experiment as the unordered set of drawn balls.
- There are C(11, 3) outcomes in the sample space, where C(n, k) denotes the number of ways to choose k objects from n.
- The desired probability is
  C(6, 1) C(5, 2) / C(11, 3) = 4/11.

Example 6
A committee of 5 is to be selected from a group of 6 men and 9 women. If the selection is made randomly, what is the probability that the committee consists of 3 men and 2 women?
Solution:
- There are 15 people in total, and a committee of 5 is to be selected.
- Each of the C(15, 5) possible committees is equally likely to be selected, so the desired probability is
  C(6, 3) C(9, 2) / C(15, 5) = 240/1001.

Independent Events

Conditional probability

Definition 1
Two events are independent if knowing the outcome of one provides no useful information about the outcome of the other.

Conditional Probability Notation: The conditional probability that E occurs given that F has occurred is denoted by P(E | F).

Example 7
Two dice are rolled. Let E denote the event that the sum of the dice is 8 and F the event that the first die is a 3. Given that the first die is a 3, what is the probability that the sum of the two dice equals 8?
Solution:
- Each of the 36 possible outcomes is equally likely to occur, so each has probability 1/36.
- Given that the initial die is a 3, there can be at most 6 possible outcomes of our experiment, namely (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), and (3, 6).
- Given that the first die is a 3, the (conditional) probability of each of these outcomes is 1/6. Hence P(E | F) = 1/6.

Definition 2
If P(F) > 0, then
  P(E | F) = P(E ∩ F) / P(F).   (4)
In other words, we have the multiplication rule
  P(E ∩ F) = P(EF) = P(F) P(E | F).   (5)
More generally,
  P(E1 E2 ... En) = P(E1) P(E2 | E1) P(E3 | E1 E2) ... P(En | E1 E2 ... En-1).   (6)
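Definition 2 can be checked numerically on Example 7: with equally likely outcomes, Equation (4) reduces to counting outcomes in E ∩ F and in F. A minimal Python sketch, added here for illustration:

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # the 36 equally likely outcomes of two dice

E = {o for o in S if sum(o) == 8}         # sum of the dice is 8
F = {o for o in S if o[0] == 3}           # first die shows a 3

# Equation (4): P(E | F) = P(E ∩ F) / P(F), with each probability a count over 36.
p_E_given_F = Fraction(len(E & F), len(S)) / Fraction(len(F), len(S))
print(p_E_given_F)                        # 1/6, agreeing with Example 7
```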
Example 8
A student is taking a one-hour-time-limit makeup examination. Suppose the probability that the student will finish the exam in less than x hours is x/2, for all 0 ≤ x ≤ 1. Given that the student is still working after 0.75 hour, what is the conditional probability that the full hour is used?
Solution:
- Let Lx denote the event that the student finishes the exam in less than x hours, 0 ≤ x ≤ 1.
- Let F be the event that the student uses the full hour, i.e., the student is not finished in less than 1 hour, so P(F) = P(L1^c) = 1 − P(L1) = 0.5.
- We need to find P(F | L0.75^c). By Equation (4),
  P(F | L0.75^c) = P(F ∩ L0.75^c) / P(L0.75^c) = P(F) / P(L0.75^c) = ?

Example 9
A coin is flipped twice. Assuming that all four points in the sample space S = {(h, h), (h, t), (t, h), (t, t)} are equally likely, what is the conditional probability that both flips land on heads, given that (a) the first flip lands on heads? (b) at least one flip lands on heads?
Solution:
- Let E = {(h, h)} be the event that both flips land on heads.
- Let F = {(h, h), (h, t)} be the event that the first flip lands on heads.
- Let G = {(h, h), (h, t), (t, h)} be the event that at least one flip lands on heads.
- Now use Equation (4) to compute P(E | F) and P(E | G).

Example 10
An ordinary deck of 52 playing cards is randomly divided into 4 piles of 13 cards each. Compute the probability that each pile has exactly 1 ace.
Solution: Let Ei be the event that pile i has exactly 1 ace, i = 1, 2, 3, 4.
- Use the multiplication rule, Equation (6), to find the desired result:
  P(E1 E2 E3 E4) = P(E1) P(E2 | E1) P(E3 | E1 E2) P(E4 | E1 E2 E3).
- Start with a deck of 52 cards, 4 of which are aces and 48 of which are non-aces. For the first pile we choose 1 ace and 12 non-aces:
  P(E1) = C(4, 1) C(48, 12) / C(52, 13) = ?
- Given that E1 has occurred, we are left with a deck of 39 cards, 3 of which are aces and 36 of which are non-aces:
  P(E2 | E1) = C(3, 1) C(36, 12) / C(39, 13) = ?
- Next, given that E1 and E2 have occurred, the sample space is reduced to a deck of 26 cards, 2 of which are aces and 24 of which are non-aces:
  P(E3 | E1 E2) = C(2, 1) C(24, 12) / C(26, 13) = ?
- Finally, given that E1, E2, and E3 have occurred, we are left with a deck of 13 cards, 1 of which is an ace and 12 of which are non-aces:
  P(E4 | E1 E2 E3) = C(1, 1) C(12, 12) / C(13, 13) = ?
- Now evaluate P(E1 E2 E3 E4) = P(E1) P(E2 | E1) P(E3 | E1 E2) P(E4 | E1 E2 E3) = ? (A numerical evaluation is sketched below.)

Independent Events

Definition 3
Two events E and F are said to be independent if P(E ∩ F) = P(E) P(F), and dependent if they are not independent.
- Observe that for two independent events E and F, combining this with Equation (4) (equivalently, P(E ∩ F) = P(F) P(E | F)) gives
  P(E | F) = P(E).   (7)
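To fill in the "?" placeholders of Example 10 numerically, the sketch below (an addition to these notes, assuming exact fractions are wanted) evaluates each conditional factor with math.comb and multiplies them as in Equation (6).

```python
from fractions import Fraction
from math import comb

# Each factor has the form C(aces, 1) * C(cards - aces, 12) / C(cards, 13):
# one ace and 12 non-aces chosen from the cards remaining before that pile is dealt.
factors = [
    Fraction(comb(aces, 1) * comb(cards - aces, 12), comb(cards, 13))
    for cards, aces in [(52, 4), (39, 3), (26, 2), (13, 1)]
]

# Multiplication rule, Equation (6):
# P(E1 E2 E3 E4) = P(E1) P(E2 | E1) P(E3 | E1 E2) P(E4 | E1 E2 E3).
p = Fraction(1)
for f in factors:
    p *= f

print(p)         # 2197/20825
print(float(p))  # roughly 0.105
```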
Conclusion

Summary
- Mutually exclusive events
- Conditional probability and independent events

Next Lecture
- Applications of conditional probability
- Introduce the law of total probability
- Solve problems related to Bayes' theorem

Reference: Sheldon Ross, A First Course in Probability, 7th Edition, Pearson, 2006.

Thank you
