Mathematics and Statistical Foundations for Machine Learning (FIC 504), Data Science (FIC 506), Cyber Security (FIC 507) PDF

Summary

Mathematics and Statistical Foundations for Machine Learning (FIC 504), Data Science (FIC 506), Cyber Security (FIC 507), part-1 lecture notes, from SRM University AP, August 21, 2024. The course covers probability, random variables, statistics, linear algebra, matrix decompositions, and continuous optimization.

Full Transcript

Mathematics and Statistical Foundations for Machine Learning (FIC 504), Data Science (FIC 506), Cyber Security (FIC 507), part-1
Dr. Tapan Kumar Hota
August 21, 2024
[email protected]
Level 2, Room No 8, S R Block

Table of contents
1. Course Overview
2. Introduction to Probability
3. Axioms of Probability
4. Law of large numbers
5. Conclusion

Course Overview

Introduction
• Course codes:
  • FIC 504: Mathematics and Statistical Foundations for Machine Learning
  • FIC 506: Mathematics and Statistical Foundations for Data Science
  • FIC 507: Mathematical Foundations for Cyber Security
• Number of credits: 3
• Prerequisite: Good knowledge of calculus and linear algebra, and college-level physics.
• Textbooks:
  • Sheldon Ross, A First Course in Probability, 7th Edition, Pearson, 2006.
  • J. Medhi, Stochastic Processes, 3rd Edition, New Age International, 2009.
  • S.M. Ross, Stochastic Processes, 2nd Edition, Wiley, 1996.
  • Kenneth M Hoffman, Ray Kunze, Linear Algebra, 2nd Edition, Pearson.
  • Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong, Mathematics for Machine Learning.

Objectives
• Introduce mathematical tools that are useful in developing new algorithms for machine learning.
• Introduce matrix methods and the optimization process.
• Analyze data using least squares classification and regression.
Syllabus

UNIT I: PROBABILITY
Classical, relative frequency and axiomatic definitions of probability, addition rule and conditional probability, multiplication rule, total probability, Bayes' Theorem, and independence.

UNIT II: RANDOM VARIABLES & STATISTICS
Discrete, continuous, and mixed random variables, probability mass, probability density and cumulative distribution functions, mathematical expectation, moments, moment generating function, Central Limit Theorem, Confidence Interval, Hypothesis testing.

UNIT III: LINEAR ALGEBRA
Finite dimensional vector spaces over a field; linear combination, linear dependence, and independence; basis and dimension; inner-product spaces, linear transformations; matrix representation of linear transformations. Eigenvalues and eigenvectors (Matrix and Transformations), rank and nullity, inverse and linear transformation, Cayley-Hamilton Theorem.

UNIT IV: MATRIX DECOMPOSITIONS
Determinant and Trace, LU-Decomposition, QR-Decomposition, Cholesky Decomposition, Eigen decomposition and Diagonalization, Singular Value Decomposition, Matrix Approximation and Jordan Canonical Form.

UNIT V: CONTINUOUS OPTIMIZATION
Foundations of Optimization (Basic terminology and Definitions), Optimization Using Gradient Descent, Constrained Optimization and Lagrange Multipliers, Convex optimization.

Assessment

Sl No   Assessment   Conducting Marks   Converting Marks
-----   ----------   ----------------   ----------------
1       CLA-1        20                 10
2       CLA-2        20                 10
3       CLA-3        20                 10
4       Midterm      25                 20
5       Endterm      100                50

• Midterm: 30-09-2024 to 04-10-2024
• Last date of class: 28-11-2024
• Office hours and discussion hours: Monday and Wednesday, 3 to 5 pm. Please send an email with your queries before you visit.
• Attendance policy: 75% attendance is needed to appear in the final exam.

Introduction to Probability

We introduce the concept of the probability of an event and then show how probabilities can be computed in certain situations. As a preliminary, however, we need the concept of the sample space and the events of an experiment.

Random processes
• A random process is a situation in which we know what outcomes could happen, but we don't know which particular outcome will happen.
• Examples: coin tosses, die rolls, Spotify shuffle, whether the stock market goes up or down tomorrow, etc.
• It can be helpful to model a process as random even if it is not truly random.
Sample Space and Events
• Consider an experiment whose outcome is not predictable with certainty. Although the outcome of the experiment will not be known in advance, let us suppose that the set of all possible outcomes is known.
• Sample space: The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.

Examples
1. If the outcome of an experiment is the determination of the sex of a newborn child, then S = {g, b}, where the outcome g means that the child is a girl and b that it is a boy.
2. If the outcome of an experiment is the order of finish in a race among the 7 horses having post positions 1, 2, 3, 4, 5, 6, and 7, then S = {all 7! permutations of (1, 2, 3, 4, 5, 6, 7)}. The outcome (2, 3, 1, 6, 5, 4, 7) means, for instance, that the number 2 horse comes in first, then the number 3 horse, then the number 1 horse, and so on.
3. If the experiment consists of flipping two coins, then the sample space consists of the following four points: S = {(H, H), (H, T), (T, H), (T, T)}. The outcome will be (H, H) if both coins are heads, (H, T) if the first coin is heads and the second tails, (T, H) if the first is tails and the second heads, and (T, T) if both coins are tails.
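The three example sample spaces above are small enough to build by machine. The following is a minimal sketch, assuming Python's standard `itertools` module; the variable names are illustrative and are not part of the lecture.

```python
from itertools import permutations, product
from math import factorial

# Example 1: sex of a newborn child.
newborn_space = {"g", "b"}

# Example 3: flipping two coins. Each outcome is an ordered pair
# (first coin, second coin), so (H, T) and (T, H) are different outcomes.
two_coin_space = set(product("HT", repeat=2))

# Example 2: order of finish among horses with post positions 1..7.
# Every outcome is a permutation of (1, ..., 7); we count them lazily.
num_race_outcomes = sum(1 for _ in permutations(range(1, 8)))

print(sorted(newborn_space))
print(sorted(two_coin_space))
print(num_race_outcomes == factorial(7))
```

Representing outcomes as tuples keeps the order information that Examples 2 and 3 depend on.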
• Event: Any subset E of the sample space is known as an event. In other words, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.
• Union: For any two events E and F of a sample space S, we define the new event E ∪ F to consist of all outcomes that are either in E or in F or in both E and F.
• Intersection: For any two events E and F, we may also define the new event EF = E ∩ F, called the intersection of E and F, to consist of all outcomes that are both in E and in F.
• Null event: An event that does not contain any outcomes and hence cannot occur. We refer to it as the null event and denote it by ∅.
• Mutually exclusive events: If EF = ∅, then E and F are said to be mutually exclusive.
• Complementary event: For any event E, we define the new event E^c, referred to as the complement of E, to consist of all outcomes in the sample space S that are not in E.
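The definitions above map directly onto set operations. As a small sketch on the two-coin sample space, assuming Python's built-in `set` type, with illustrative events E, F, and G that are not taken from the lecture:

```python
from itertools import product

# Sample space for flipping two coins.
S = set(product("HT", repeat=2))

# Illustrative events (assumed for this sketch).
E = {s for s in S if s[0] == "H"}   # first coin is heads
F = {s for s in S if s[1] == "H"}   # second coin is heads
G = {("T", "T")}                    # both coins are tails

union = E | F            # outcomes in E or F or both
intersection = E & F     # outcomes in both E and F (written EF)
complement_E = S - E     # outcomes of S that are not in E


def mutually_exclusive(a, b):
    """Return True when the intersection of a and b is the null event."""
    return len(a & b) == 0


print(sorted(union))
print(sorted(intersection))
print(sorted(complement_E))
print(mutually_exclusive(E, F), mutually_exclusive(union, G))
```

Here E and F share the outcome (H, H), so they are not mutually exclusive, while E ∪ F and G have no outcome in common.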
The operations of forming unions, intersections, and complements of events obey certain rules similar to the rules of algebra. We list a few of these rules:
• Commutative laws: E ∪ F = F ∪ E, E ∩ F = F ∩ E.
• Associative laws: (E ∪ F) ∪ G = E ∪ (F ∪ G), (E ∩ F) ∩ G = E ∩ (F ∩ G).
• Distributive laws: (E ∪ F) ∩ G = (E ∩ G) ∪ (F ∩ G), (E ∩ F) ∪ G = (E ∪ G) ∩ (F ∪ G).

Figure 1: Venn diagrams of (E ∪ F) ∩ G = (E ∩ G) ∪ (F ∩ G).

The following useful relationships between the three basic operations of forming unions, intersections, and complements are known as De Morgan's laws:

$$\left(\bigcup_{j=1}^{n} E_j\right)^c = \bigcap_{j=1}^{n} E_j^c, \qquad \left(\bigcap_{j=1}^{n} E_j\right)^c = \bigcup_{j=1}^{n} E_j^c \tag{1}$$
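The distributive laws and De Morgan's laws can be checked by brute force on a small sample space. The sketch below assumes a four-point sample space chosen only for illustration and tests every triple of events, that is, the case n = 3 of equation (1). It is a sanity check, not a proof.

```python
from itertools import combinations

# A small sample space, assumed for illustration only.
S = frozenset(range(1, 5))


def all_events(space):
    """Return every subset of the sample space, i.e. every event."""
    items = sorted(space)
    return [
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    ]


def complement(event):
    """Outcomes of S that are not in the event."""
    return S - event


events = all_events(S)  # 2**4 = 16 events

distributive_ok = all(
    ((E | F) & G == (E & G) | (F & G)) and ((E & F) | G == (E | G) & (F | G))
    for E in events for F in events for G in events
)

de_morgan_ok = all(
    (complement(E | F | G) == complement(E) & complement(F) & complement(G))
    and (complement(E & F & G) == complement(E) | complement(F) | complement(G))
    for E in events for F in events for G in events
)

print(len(events), distributive_ok, de_morgan_ok)
```

Using `frozenset` makes the events immutable, which keeps the exhaustive loop free of accidental modification.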
Axioms of Probability
• There are several possible interpretations of probability, but they (almost) completely agree on the mathematical rules probability must follow.
• P(E) = probability of event E
• 0 ≤ P(E) ≤ 1
• Frequentist interpretation:
  • The probability of an outcome is the proportion of times the outcome would occur if we observed the random process an infinite number of times.
• Bayesian interpretation:
  • A Bayesian interprets probability as a subjective degree of belief: for the same event, two people could have different viewpoints and so assign different probabilities.
  • Largely popularized by revolutionary advances in computational technology and methods during the last twenty years.

• Rather than defining probability through one particular interpretation, it is more reasonable to assume a set of simpler and more self-evident axioms about probability. This is the modern axiomatic approach to probability theory.
• Consider an experiment whose sample space is S. For each event E of the sample space S, we assume that a number P(E) is defined and satisfies the following three axioms:
  • Axiom 1: 0 ≤ P(E) ≤ 1 (the probability that the outcome of the experiment is an outcome in E is some number between 0 and 1).
  • Axiom 2: P(S) = 1 (with probability 1, the outcome will be a point in the sample space S).
  • Axiom 3: For any sequence of mutually exclusive events E_1, E_2, ... (that is, events for which E_i E_j = ∅ when i ≠ j),

$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$$

    (for mutually exclusive events, the probability of at least one of these events occurring is just the sum of their respective probabilities).

We refer to P(E) as the probability of the event E (we assume P(E) to be defined for all events E).

Example
Which of the following events would you be most surprised by?
(a) exactly 3 heads in 10 coin flips
(b) exactly 3 heads in 100 coin flips
(c) exactly 3 heads in 1000 coin flips

Example
If our experiment consists of tossing a coin and if we assume that a head is as likely to appear as a tail, then we would have P({H}) = P({T}) = 1/2.

Example
If a die is rolled and we suppose that all six sides are equally likely to appear, then we would have P({1}) = P({2}) = P({3}) = P({4}) = P({5}) = P({6}) = 1/6.
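The "exactly 3 heads" question can be made quantitative under two assumptions that the slides do not state explicitly: the coin is fair and the flips are independent. Then all 2^n head/tail sequences are equally likely, and the number of sequences with exactly k heads is the binomial coefficient C(n, k). The following sketch uses the hypothetical helper name `prob_exact_heads` and computes the three probabilities exactly.

```python
from fractions import Fraction
from math import comb


def prob_exact_heads(n, k):
    """P(exactly k heads in n flips), assuming a fair coin and independent flips."""
    # comb(n, k) sequences have exactly k heads, out of 2**n equally likely sequences.
    return Fraction(comb(n, k), 2 ** n)


for n in (10, 100, 1000):
    p = prob_exact_heads(n, 3)
    print(f"n = {n:4d}: P(exactly 3 heads) is about {float(p):.3e}")

# The events "exactly k heads" for k = 0..10 are mutually exclusive and their
# union is the whole sample space, so by Axioms 2 and 3 they must sum to 1.
total = sum(prob_exact_heads(10, k) for k in range(11))
print(total)
```

Exact fractions are used so that the check against Axioms 2 and 3 does not depend on floating-point rounding.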
Law of large numbers

Definition
The law of large numbers (LLN) states that as more observations are collected, the proportion of occurrences with a particular outcome, p̂_n, converges to the probability of that outcome, p.

Example
When tossing a fair coin, if heads comes up on each of the first 10 tosses, what do you think the chance is that another head will come up on the next toss? 0.5, less than 0.5, or more than 0.5?

H H H H H H H H H H ?

• The probability is still 0.5: there is still a 50% chance that another head will come up on the next toss. P(H on 11th toss) = P(T on 11th toss) = 0.5
• The coin is not "due" for a tail.
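The convergence of p̂_n to p can be observed in a simulation. This is a rough sketch, assuming a fair coin (p = 0.5) simulated with Python's standard `random` module; the function name `running_proportions`, the seed, and the sample sizes are illustrative choices.

```python
import random


def running_proportions(num_tosses, seed=0):
    """Return the proportion of heads after each of num_tosses simulated fair tosses."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = 0
    proportions = []
    for n in range(1, num_tosses + 1):
        if rng.random() < 0.5:  # each toss is heads with probability 0.5
            heads += 1
        proportions.append(heads / n)
    return proportions


props = running_proportions(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(props[n - 1], 4))
```

Each simulated toss ignores all earlier tosses, so the proportion settles near 0.5 because early streaks are diluted by later data, not because later tosses correct for them.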
A common misunderstanding of the LLN is that random processes are supposed to compensate for whatever happened in the past. This is not true; the mistaken belief is called the gambler's fallacy (or the law of averages).

Conclusion

Summary
• Gave an overview of the course.
• Defined the terms sample space, event, and mutually exclusive events.
• Stated the axiomatic definition of probability.

Next Lecture
• A few properties and propositions related to probability
• Problems based on these propositions and properties
• Conditional probability
• Bayes' theorem

Reference: Sheldon Ross, A First Course in Probability, 7th Edition, Pearson, 2006

Thank you
