Mathematics and Statistical Foundations for Machine Learning
SRM University AP
2024
Dr. Tapan Kumar Hota
Summary
These are lecture notes on mathematics and statistical foundations for machine learning, data science, and cyber security, part 3. The document covers concepts like conditional probability, independent events, the total probability formula, and Bayes' theorem.
Full Transcript
Mathematics and Statistical Foundations for Machine Learning (FIC 504), Data Science (FIC 506), Cyber Security (FIC 507), Part 3

Dr. Tapan Kumar Hota
August 28, 2024
[email protected]
Level 2, Room No 8, S R Block

Table of contents
1. Recall
2. Total Probability
3. Bayes' Theorem
4. Conclusion

Recall: Some Propositions

Proposition 1.1 Show that $P(E^c) = 1 - P(E)$.

Proposition 1.2 If $E \subset F$, then $P(E) \le P(F)$.

Proposition 1.3 (law of inclusion-exclusion) Show that $P(E \cup F) = P(E) + P(F) - P(E \cap F)$.

Recall: Conditional Probability & Independent Events

Definition 1 Two events are independent if knowing the outcome of one provides no useful information about the outcome of the other. In particular, two events $E$ and $F$ are said to be independent if $P(E \cap F) = P(E)P(F)$, and dependent if they are not independent.

Conditional Probability

Notation: The conditional probability that $E$ occurs given that $F$ has occurred is denoted by $P(E \mid F)$.

Definition 2 If $P(F) > 0$, then
$$P(E \mid F) = \frac{P(E \cap F)}{P(F)}. \tag{1}$$
In other words, we have (multiplication rule)
$$P(E \cap F) = P(EF) = P(F)\,P(E \mid F). \tag{2}$$
More generally,
$$P(E_1 E_2 \cdots E_n) = P(E_1)\,P(E_2 \mid E_1)\,P(E_3 \mid E_1 E_2) \cdots P(E_n \mid E_1 E_2 \cdots E_{n-1}). \tag{3}$$
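The definitions recalled above (independence, conditional probability, and the multiplication rule) can be checked by brute-force enumeration on a small sample space. The sketch below is only an illustration: the two-dice experiment, the events, and the helper names `prob`, `cond_prob`, and `both` are assumptions made for this example and do not appear in the lecture.

```python
from fractions import Fraction
from itertools import product

# Assumed experiment: two fair dice, 36 equally likely ordered outcomes.
SAMPLE_SPACE = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event), where the event is a predicate on outcomes."""
    favourable = sum(1 for outcome in SAMPLE_SPACE if event(outcome))
    return Fraction(favourable, len(SAMPLE_SPACE))


def both(e, f):
    """The intersection of two events."""
    return lambda outcome: e(outcome) and f(outcome)


def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given), as in the definition."""
    p_given = prob(given)
    if p_given == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(both(event, given)) / p_given


def first_even(outcome):
    return outcome[0] % 2 == 0


def sum_is_seven(outcome):
    return outcome[0] + outcome[1] == 7


def sum_is_eight(outcome):
    return outcome[0] + outcome[1] == 8


p_joint = prob(both(first_even, sum_is_seven))

# Independence test: P(E and F) == P(E) P(F).
independent_7 = p_joint == prob(first_even) * prob(sum_is_seven)
independent_8 = prob(both(first_even, sum_is_eight)) == prob(first_even) * prob(sum_is_eight)

# Multiplication rule: P(E and F) == P(F) P(E | F).
multiplication_rule_holds = p_joint == prob(sum_is_seven) * cond_prob(first_even, sum_is_seven)

print(independent_7, independent_8, multiplication_rule_holds)
```

In this assumed experiment, "first die is even" turns out to be independent of "the sum is 7" but dependent on "the sum is 8", which shows that independence is a property of the specific pair of events rather than of the experiment.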
Total Probability

Example 1 For any two events $E$ and $F$ with $0 < P(F) < 1$ (so that both $P(E \mid F)$ and $P(E \mid F^c)$ are defined), we have
$$P(E) = P(E \mid F)P(F) + P(E \mid F^c)P(F^c).$$

Proof: Note that $E = (E \cap F) \cup (E \cap F^c)$ and $(E \cap F) \cap (E \cap F^c) = \emptyset$. We know from the multiplication rule that $P(E \cap F) = P(E \mid F)P(F)$. The rest of the proof is left as an exercise.

Note: This is an extremely useful formula, because it often enables us to determine the probability of an event by first "conditioning" upon whether or not some second event has occurred. That is, there are many instances in which it is difficult to compute the probability of an event directly, but it is straightforward to compute it once we know whether or not some second event has occurred.

Definition 3 (Partition of sample space $S$) The set of events $\{E_1, E_2, \dots, E_n\}$ is said to be a partition of the sample space $S$ if
$$\bigcup_{j=1}^{n} E_j = S \quad \text{and} \quad E_i \cap E_j = \emptyset \ \text{ for } i \ne j.$$
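One possible way to finish the proof of Example 1 that was left as an exercise is sketched here. It assumes the additivity of $P$ over disjoint events (an axiom not restated in these notes) and applies the multiplication rule once to $F$ and once to $F^c$; the second application needs $P(F^c) > 0$, that is, $P(F) < 1$.

```latex
\begin{align*}
P(E) &= P\big((E \cap F) \cup (E \cap F^c)\big) \\
     &= P(E \cap F) + P(E \cap F^c)
        && \text{(the two sets are disjoint)} \\
     &= P(E \mid F)\,P(F) + P(E \mid F^c)\,P(F^c)
        && \text{(multiplication rule for } F \text{ and for } F^c\text{)}.
\end{align*}
```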
Example 2 (Total Probability Formula) Let $\{E_1, E_2, \dots, E_n\}$ form a partition of the sample space $S$ and $0 < P(E_j) < 1$, $j = 1, 2, \dots, n$. Then for any event $F$,
$$P(F) = \sum_{j=1}^{n} P(F \mid E_j)\,P(E_j).$$

Proof: Note that $F = \bigcup_{j=1}^{n} (F \cap E_j)$, where the sets $F \cap E_j$ are pairwise disjoint. Thus $P(F) = \sum_{j=1}^{n} P(F \cap E_j)$; now use the multiplication rule.

Bayes' Theorem

It is not uncommon to see the conditional probabilities $P(A \mid B)$ and $P(B \mid A)$ confused with each other.

Example 3 Suppose that in some group of lung cancer patients we see a large percentage of smokers. If we define $B$ to be the event that a person is a smoker and $A$ to be the event that a person has lung cancer, then all we can conclude is that in our group of people $P(B \mid A)$ is large. But we cannot conclude from just this information that smoking increases the chance of lung cancer; i.e., that $P(A \mid B)$ is large.
In order to calculate a conditional probability $P(A \mid B)$ when we know the other conditional probability $P(B \mid A)$, a simple formula known as Bayes' theorem is useful. [1]

Theorem 1 (Bayes' theorem) Let $\{E_1, E_2, \dots, E_n\}$ be a partition of a sample space $S$. Let $F$ be some fixed event. Then
$$P(E_j \mid F) = \frac{P(F \mid E_j)\,P(E_j)}{\sum_{i=1}^{n} P(F \mid E_i)\,P(E_i)}. \tag{4}$$

[1] The formula is named after the Reverend Thomas Bayes, who (essentially) obtained this formula in the eighteenth century.
Example 4 A single die is rolled. Find the conditional probability of obtaining an even number given that a number greater than three has shown.

Solution: Let $E$ be the event that an even number shows, and $F$ be the event that a number greater than three shows. We want $P(E \mid F)$, where $E = \{2, 4, 6\}$ and $F = \{4, 5, 6\}$, which implies $E \cap F = \{4, 6\}$. Therefore,
(a) $P(F) = 3/6$ and $P(E \cap F) = 2/6$;
(b) $P(E \mid F) = \dfrac{P(E \cap F)}{P(F)} = \dfrac{2/6}{3/6} = \dfrac{2}{3}$.
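The answer to Example 4 can also be checked empirically: among simulated rolls that land in $F$ (greater than three), the fraction that are even should be close to 2/3. This is a minimal simulation sketch; the function name `estimate_conditional`, the number of rolls, and the seed are arbitrary choices made for illustration.

```python
import random


def estimate_conditional(n_rolls=200_000, seed=0):
    """Estimate P(even | greater than three) for a fair die by simulation."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    count_f = 0        # rolls where F = {4, 5, 6} occurred
    count_e_and_f = 0  # rolls where both E = {2, 4, 6} and F occurred
    for _ in range(n_rolls):
        roll = rng.randint(1, 6)
        if roll > 3:
            count_f += 1
            if roll % 2 == 0:
                count_e_and_f += 1
    # Relative frequency of E among the rolls in which F occurred.
    return count_e_and_f / count_f


estimate = estimate_conditional()
print(f"estimated P(E | F) = {estimate:.4f}, exact value = {2 / 3:.4f}")
```

Conditioning on $F$ amounts to discarding every roll outside $F$ and renormalising, which is the reasoning behind the ratio $P(E \cap F)/P(F)$.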
Example 5 At a community college, 65% of the students subscribe to Amazon Prime, 50% subscribe to Netflix, and 20% subscribe to both. If a student is chosen at random, find the following probabilities:
(a) the student subscribes to Amazon Prime given that he subscribes to Netflix;
(b) the student subscribes to Netflix given that he subscribes to Amazon Prime.

Solution: Let $A$ be the event that the student subscribes to Amazon Prime, and $N$ be the event that the student subscribes to Netflix. First identify the probabilities and events given in the problem:
P(student subscribes to Amazon Prime) = $P(A) = 0.65$
P(student subscribes to Netflix) = $P(N) = 0.50$
P(student subscribes to both Amazon Prime and Netflix) = $P(A \cap N) = 0.20$

Then use the conditional probability rule:
(a) $P(A \mid N) = \dfrac{P(A \cap N)}{P(N)} = \dfrac{0.20}{0.50} = \dfrac{2}{5}$,
(b) $P(N \mid A) = \dfrac{P(A \cap N)}{P(A)} = \dfrac{0.20}{0.65} = \dfrac{4}{13}$.
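The arithmetic in Example 5 can be reproduced with exact fractions, which also makes it easy to check whether the two subscriptions are independent in the sense of Definition 1. The variable names below are illustrative assumptions; the numbers are those given in the problem.

```python
from fractions import Fraction

p_a = Fraction(65, 100)        # P(A): subscribes to Amazon Prime
p_n = Fraction(50, 100)        # P(N): subscribes to Netflix
p_a_and_n = Fraction(20, 100)  # P(A ∩ N): subscribes to both

p_a_given_n = p_a_and_n / p_n  # part (a): P(A | N)
p_n_given_a = p_a_and_n / p_a  # part (b): P(N | A)

# Definition 1: A and N are independent exactly when P(A ∩ N) = P(A) P(N).
independent = p_a_and_n == p_a * p_n

print(p_a_given_n, p_n_given_a, independent)
```

Here $P(A)P(N) = 0.325 \ne 0.20 = P(A \cap N)$, so the events are dependent; equivalently, $P(A \mid N) = 0.4$ differs from $P(A) = 0.65$.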
Example 6 Suppose we have three bags that each contain 100 marbles:
(a) Bag 1 has 75 red and 25 blue marbles,
(b) Bag 2 has 60 red and 40 blue marbles, and
(c) Bag 3 has 45 red and 55 blue marbles.
Let us choose one of the bags at random and then pick a marble from the chosen bag, also at random. What is the probability that the chosen marble is red?

Solution: Let $R$ be the event that the chosen marble is red. Let $B_i$ be the event that we choose Bag $i$. We already know that
$$P(R \mid B_1) = 0.75, \quad P(R \mid B_2) = 0.60, \quad P(R \mid B_3) = 0.45.$$
We choose our partition as $B_1, B_2, B_3$. Note that this is a valid partition because, firstly, the $B_i$'s are disjoint (only one of them can happen), and secondly, their union is the entire sample space, as one of the bags will be chosen for sure, i.e., $P(B_1 \cup B_2 \cup B_3) = 1$. Since the bag is chosen at random, $P(B_i) = \tfrac{1}{3}$ for each $i$. Using the law of total probability, we can write
$$\begin{aligned}
P(R) &= P(R \mid B_1)P(B_1) + P(R \mid B_2)P(B_2) + P(R \mid B_3)P(B_3) \\
     &= (0.75)\tfrac{1}{3} + (0.60)\tfrac{1}{3} + (0.45)\tfrac{1}{3} = 0.60.
\end{aligned}$$

Example 7 A life insurance agency believes that its clients who are senior citizens can be divided into two classes: those who are in good health and those who are not in good health. The agency reports that a senior citizen in good health will pass away within a one-year period with probability 0.09, whereas this probability is 0.26 for a senior citizen who is not in good health. Suppose that 10% of the agency's senior citizens are in good health. Given that a senior citizen passed away, what is the probability that the senior citizen was not in good health?

Solution: Let $E$ = {senior citizen is not in good health} and $F$ = {senior citizen passed away}. Then $P(E) = 0.9$, $P(E^c) = 0.1$, $P(F \mid E) = 0.26$ and $P(F \mid E^c) = 0.09$, so
$$P(E \mid F) = \frac{P(E \cap F)}{P(F)} = \frac{0.9(0.26)}{0.1(0.09) + 0.9(0.26)} \approx 0.9630.$$
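Examples 6 and 7 both follow the same pattern: a partition with known prior probabilities, and a known conditional probability of the observed event given each member of the partition. A minimal sketch of that pattern is given below. The function names `total_probability` and `posterior` are hypothetical helpers introduced for illustration, and exact `Fraction` inputs are assumed so that the check that the priors sum to 1 is not disturbed by floating-point rounding.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """P(F) = sum_j P(F | E_j) P(E_j) for a partition E_1, ..., E_n."""
    if sum(priors) != 1:  # exact comparison; assumes Fraction inputs
        raise ValueError("priors of a partition must sum to 1")
    return sum(lik * pri for lik, pri in zip(likelihoods, priors))


def posterior(priors, likelihoods, j):
    """Bayes' theorem: P(E_j | F) for the partition member with index j."""
    return likelihoods[j] * priors[j] / total_probability(priors, likelihoods)


# Example 6: three equally likely bags; P(red | bag i) from the marble counts.
bag_priors = [Fraction(1, 3)] * 3
red_given_bag = [Fraction(75, 100), Fraction(60, 100), Fraction(45, 100)]
p_red = total_probability(bag_priors, red_given_bag)

# Example 7: index 0 = good health, index 1 = not in good health.
health_priors = [Fraction(1, 10), Fraction(9, 10)]
death_given_health = [Fraction(9, 100), Fraction(26, 100)]
p_not_good_given_death = posterior(health_priors, death_given_health, 1)

print(p_red, p_not_good_given_death, float(p_not_good_given_death))
```

With exact arithmetic the two answers are 3/5 and 26/27, which agree with the decimal values 0.60 and 0.9630 obtained in the worked solutions.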
Example 8 Suppose a 15-minute rapid antigen test for the SARS-CoV-2 virus is 80.2% effective in detecting the virus when it is present. However, the test also yields a false positive 8% of the time. Assume that 3.39% of people living in Hyderabad have the virus. Suppose a person living in Hyderabad takes the test and the test comes back positive. Find the probability that the person actually has the virus.

Solution: Denote $E$ = {has the virus} and $F$ = {tested positive}. Thus, we need to find $P(E \mid F)$.

In the notation of Bayes' theorem, let $B_1$ = {the person has the virus}, $B_2$ = {the person does not have the virus} and $A$ = {the person tests positive}, so that $B_1 = E$ and $A = F$. The given data are $P(B_1) = 0.0339$, $P(B_2) = 0.9661$, $P(A \mid B_1) = 0.802$ and $P(A \mid B_2) = 0.08$. We are asked to find $P(B_1 \mid A)$. By Bayes' theorem,
$$\begin{aligned}
P(B_1 \mid A) &= \frac{P(B_1)\,P(A \mid B_1)}{\sum_{j=1}^{2} P(B_j)\,P(A \mid B_j)} \\
              &= \frac{P(B_1)\,P(A \mid B_1)}{P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2)} \\
              &= \frac{0.0339(0.802)}{0.0339(0.802) + 0.9661(0.08)} \\
              &\approx 0.2602.
\end{aligned}$$
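The computation in Example 8 is a two-event instance of Bayes' theorem and can be packaged as a small function. This is a sketch only: the name `positive_predictive_value` and its parameter names are assumptions introduced for illustration, not terminology used in the lecture.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(virus | positive test) via Bayes' theorem with a two-event partition."""
    # P(B1) P(A | B1): has the virus and tests positive.
    true_positive = prevalence * sensitivity
    # P(B2) P(A | B2): does not have the virus but tests positive.
    false_positive = (1 - prevalence) * false_positive_rate
    return true_positive / (true_positive + false_positive)


ppv = positive_predictive_value(
    prevalence=0.0339, sensitivity=0.802, false_positive_rate=0.08
)
print(f"P(virus | positive) = {ppv:.4f}")
```

The posterior is only about 26% because the virus is rare in this population: the false-positive term, 0.9661 × 0.08 ≈ 0.0773, is nearly three times the true-positive term, 0.0339 × 0.802 ≈ 0.0272.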
Conclusion

Summary
- Conditional probability and independent events: conditional probability is helpful in calculating probabilities when some partial information concerning the result of an experiment is available.
- Introduced the total probability formula.
- Solved problems related to Bayes' theorem.

Next Lecture: Tutorial

Reference: Sheldon Ross, A First Course in Probability, 7th Edition, Pearson, 2006.

Thank you