Lecture 1: Introduction to Practical AI
Houman Homayoun
Summary
These lecture notes on practical AI cover machine learning, its history, its major types and applications, and the required mathematical background, with examples and diagrams.
Full Transcript
EEC 289Q Practical AI
Houman Homayoun

Machine Learning and AI

AI does not always imply a learning-based system:
- Symbolic reasoning
- Rule-based systems
- Tree search, etc.

A learning-based system is learned from data, which gives it more flexibility and makes it good at solving pattern-recognition problems.

History of Machine Learning

- 1957: Perceptron algorithm (implemented as a circuit!)
- 1959: Arthur Samuel wrote a learning-based checkers program that could defeat him
- 1969: Minsky and Papert's book Perceptrons (limitations of linear models)
- 1980s: some foundational ideas; connectionist psychologists explored neural models of cognition
- 1984: Leslie Valiant formalized the problem of learning as PAC learning
- 1988: backpropagation (re-)discovered by Geoffrey Hinton and colleagues
- 1988: Judea Pearl's book Probabilistic Reasoning in Intelligent Systems introduced Bayesian networks
- 1990s: the "AI Winter", a time of pessimism and low funding; still, many key techniques emerged: Markov chain Monte Carlo, variational inference, kernels and support vector machines, boosting, convolutional networks, reinforcement learning
- 2000s: applied AI fields (vision, NLP, etc.) adopted ML
- 2010s: deep learning
- 2010-2012: neural nets smashed previous records in speech-to-text and object recognition, driving increasing adoption by the tech industry
- 2016: AlphaGo defeated the human Go champion
- 2018-now: generating photorealistic images and videos
- 2020: GPT-3 language model
- Now: increasing attention to ethical and societal implications

Types of Machine Learning

- Supervised learning: data and labels (targets) are available for training. Example: cats vs. dogs classification.
- Reinforcement learning (semi-supervised): the learning system interacts with the world and learns to maximize a scalar reward signal. Example: game playing (chess, backgammon, Go, ...).
- Unsupervised learning: no information about intended outputs; data exploration, i.e. finding patterns, tendencies, ...
  Example: clustering.

Machine Learning Applications

- Computer vision: object detection, semantic segmentation, pose estimation
- NLP: language translation, predictive text, personal assistants, spam filtering
- Regression: forecasting time-series data (stock market, weather, ...)

Mathematics for Machine Learning

Machine learning draws heavily on calculus, probability, and linear algebra. Examples of linear algebra and basic math in machine learning:

- Linear regression: linear algebra is used to find the line of best fit for the data.
- Principal component analysis (PCA): linear algebra is used to find the principal components of a dataset.
- Gradient descent: calculus is used to find the minimum of a cost function.
- Bayes' theorem: probability theory is used to calculate the probability of an event given prior knowledge, as in Bayesian networks.

Expected Value

The expected value (expectation) of a variable $X$ with respect to a probability distribution $P(X)$ is the average (mean) value when $X$ is drawn from $P(X)$:

$\mathbb{E}_{X \sim P(X)}[X] = \sum_x P(x)\,x$

The mean is the most common measure of central tendency of a distribution. For a discrete random variable, $\mu = \mathbb{E}[X] = \sum_i P(x_i)\,x_i$; this is analogous to the mean of a sample of $N$ observations, $\mu = \frac{1}{N}\sum_i x_i$.

Variance

Variance measures how much the values of $X$ deviate from the expected value as we sample values of $X$ from $P(X)$:

$\mathrm{Var}(X) = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big]$, estimated from a sample as $\frac{1}{N-1}\sum_i (x_i - \mu)^2$

When the variance is low, the values of $X$ cluster near the expected value. Variance is commonly denoted $\sigma^2$.

Example of Variance

Data (from the bar-chart slides): 10, 12, 13, 17, 20, 24.

Mean: $(10 + 12 + 13 + 17 + 20 + 24) / 6 = 16$

Item | Value | Deviation from mean | Squared deviation
1    | 10    | 10 - 16 = -6        | 36
2    | 12    | 12 - 16 = -4        | 16
3    | 13    | 13 - 16 = -3        | 9
4    | 17    | 17 - 16 = 1         | 1
5    | 20    | 20 - 16 = 4         | 16
6    | 24    | 24 - 16 = 8         | 64
Sum  |       |                     | 142

Variance: $142 / (6 - 1) = 28.4$

Standard Deviation

The standard deviation is the square root of the variance: $\sqrt{28.4} \approx 5.3$.

[Figure: two bar charts of the data, both with mean 16; one has standard deviation 5.3, the other a smaller standard deviation, illustrating tighter clustering around the mean.]

Sample vs Population

Sample variance: $\frac{\sum (x_i - \bar{x})^2}{n - 1}$    Population variance: $\frac{\sum (x_i - \mu)^2}{N}$

Dividing by $n - 1$ instead of $n$ is known as Bessel's correction.

Population and Samples

A population has size $N$; a sample of the population has size $n$.

Mean: for the population we calculate a parameter, $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$; for a sample we calculate a statistic, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$.

Variance: for the population, $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$; for a sample, $S_n^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$.

Variance of Samples

Biased variance: $S_n^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$ (larger denominator, smaller value)

Unbiased variance: $S_{n-1}^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ (smaller denominator, larger value)

So why divide by n - 1?

[Figure: a population of N = 14 points with mean μ, and a sample of n = 3 points with sample mean x̄. In one draw the sample mean falls close to μ (small difference); in another, the sampled points happen to cluster together, so x̄ differs hugely from μ.]

Because $\bar{x}$ is computed from the sample itself, the deviations $(x_i - \bar{x})$ tend to be smaller than the deviations from the true mean $\mu$, so dividing by $n$ systematically underestimates the variance; dividing by $n - 1$ corrects for this bias.
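The worked variance example above can be reproduced in a few lines. This is a minimal sketch in plain Python (the function names are illustrative), including the biased/unbiased split introduced by Bessel's correction:

```python
# Worked variance example from the slides: data {10, 12, 13, 17, 20, 24}.
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs, sample=True):
    m = mean(xs)
    ss = sum((x - m) ** 2 for x in xs)   # sum of squared deviations
    # Bessel's correction: divide by n-1 for a sample, by n for a population.
    return ss / (len(xs) - 1) if sample else ss / len(xs)

data = [10, 12, 13, 17, 20, 24]
print(mean(data))                     # 16.0
print(variance(data))                 # 28.4   (sum of squares 142, over 5)
print(variance(data, sample=False))   # ~23.67 (142 over 6)
print(math.sqrt(variance(data)))      # ~5.33, the standard deviation
```

Note how the unbiased (sample) estimate is larger than the biased (population) one, exactly as the comparison of denominators above predicts.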
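The mathematics list earlier notes that calculus is used to find the minimum of a cost function via gradient descent. Here is a hedged one-variable sketch; the quadratic cost $(w - 3)^2$, its gradient $2(w - 3)$, and the learning rate are invented for illustration:

```python
# Minimal gradient descent on f(w) = (w - 3)^2, whose gradient is 2(w - 3).
def gradient_descent(grad, w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)   # step downhill along the negative gradient
    return w

w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_min, 4))   # 3.0, the minimizer of (w - 3)^2
```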
[Figure: scatter plots showing an example of a positive relationship and an example of a negative relationship between X and Y.]

Covariance between X and Y

$\mathrm{Cov}(x, y) = \mathbb{E}\big[(x - \mu_x)(y - \mu_y)\big]$

When $x > \mu_x$ tends to occur together with $y > \mu_y$ (both factors positive) and $x < \mu_x$ with $y < \mu_y$ (both factors negative), the products are positive, so $\mathrm{Cov}(x, y)$ is positive.

When $x > \mu_x$ tends to occur with $y < \mu_y$ (positive times negative) and $x < \mu_x$ with $y > \mu_y$ (negative times positive), the products are negative, so $\mathrm{Cov}(x, y)$ is negative.

What is Covariance?

Covariance describes the relationship between $x$ and $y$. With positive covariance, when $x$ increases, $y$ tends to increase as well; with negative covariance, when $x$ increases, $y$ tends to decrease.

What if x and y have units? In that case the covariance does not tell us much about the strength of the relationship, because the units do not cancel: when $x$ and $y$ carry units, the covariance carries their product as its unit.

Correlation between X and Y

$\mathrm{Corr}(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x) \cdot \mathrm{Var}(y)}}$

The numerator and denominator have the same units, so the correlation between two random variables is unit-free. Furthermore, the magnitude of the covariance never exceeds the denominator (by the Cauchy-Schwarz inequality), which bounds the ratio.
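The sign argument above can be checked numerically. A hedged sketch of the sample covariance (the data are invented for illustration; the $n-1$ denominator matches the sample variance):

```python
# Sample covariance, illustrating the sign of Cov(x, y).
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    # Average product of deviations from the means, with Bessel's n-1.
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

x = [1, 2, 3, 4, 5]
print(cov(x, [2, 4, 6, 8, 10]))   # 5.0: positive, y rises with x
print(cov(x, [10, 8, 6, 4, 2]))   # -5.0: negative, y falls as x rises
```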
The Range of Correlation

$-1 \le \mathrm{Corr}(x, y) \le 1$

- $\mathrm{Corr}(x, y) = 1$: x and y are perfectly positively correlated
- $\mathrm{Corr}(x, y) = 0$: x and y have no linear association
- $\mathrm{Corr}(x, y) = -1$: x and y are perfectly negatively correlated

Correlation Coefficient

[Figure: example scatter plots for different values of R; image credit: sphweb.bumc.bu.edu]

Example

X: 1 2 3 4 5 6
Y: 2 4 7 9 12 14

X   | Y  | XY  | X^2 | Y^2
1   | 2  | 2   | 1   | 4
2   | 4  | 8   | 4   | 16
3   | 7  | 21  | 9   | 49
4   | 9  | 36  | 16  | 81
5   | 12 | 60  | 25  | 144
6   | 14 | 84  | 36  | 196
Sum | 21 | 48  | 211 | 91  | 490

so $\sum X = 21$, $\sum Y = 48$, $\sum XY = 211$, $\sum X^2 = 91$, $\sum Y^2 = 490$, and

$R = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\big[n\sum X^2 - (\sum X)^2\big]\big[n\sum Y^2 - (\sum Y)^2\big]}} = \frac{6(211) - 21(48)}{\sqrt{\big[6(91) - 21^2\big]\big[6(490) - 48^2\big]}} \approx 0.998$

$R^2 \approx 0.9968$

[Figure: scatter plot of Y vs. X with a best-fit trendline; R^2 = 0.9968.]
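The slide's computation of R can be reproduced directly. A minimal sketch of the Pearson correlation coefficient using the same sum formula:

```python
# Pearson R for the slide's data via the computational (sum) formula.
import math

def pearson_r(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2))

X = [1, 2, 3, 4, 5, 6]
Y = [2, 4, 7, 9, 12, 14]
r = pearson_r(X, Y)
print(round(r, 3), round(r * r, 3))   # 0.998 0.997
```

The value lies near the upper end of the $[-1, 1]$ range, matching the nearly perfect positive linear trend in the scatter plot.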