Lecture 6 - Probabilistic Reasoning PDF

Document Details

Uploaded by OptimisticWilliamsite4412

KTH Royal Institute of Technology

André Pereira

Tags

probabilistic reasoning, artificial intelligence, bayesian networks, probability

Summary

This lecture covers probabilistic reasoning, focusing on Bayesian networks and conditional independence. The document includes visual representations and examples to illustrate the concepts.

Full Transcript

DD2380 Artificial Intelligence
Probabilistic Reasoning (We start at 15:15)
André Pereira

Slide Credits
Based on original slides from Patric Jensfelt and Iolanda Leite, KTH. Based partly on materials from http://ai.berkeley.edu; Kevin Murphy (MIT, UBC, Google); Danica Kragic (KTH); and W. Burgard, C. Stachniss, M. Bennewitz and K. Arras, when at Albert-Ludwigs-Universität Freiburg.

Reading Instructions
Chapters 13-15, Russell & Norvig.

Outline
- Probabilities: motivation, notation and recap, Bayes rule, conditional independence
- Probabilistic graphical models: Bayesian networks
- Sequential data: Markov models and hidden Markov models (next lecture)

Motivation: Why Do We Use Probabilities?

What is Probability/Uncertainty
Probability quantifies the likelihood of an event happening in the face of uncertainty. Uncertainty plays an important role in sensor interpretation, sensor fusion, map making, path planning, self-localization, control, etc.

Real-World Applications – Autonomous Car
Can the car cross the intersection safely? It combines observations from the car itself (sensor models, statistics from different roads, weather models, ...) with observations from others. Can I cross with 99% safety? 99.99999% safety?

Real-World Applications – Diagnosing Diseases
A doctor knows how common a certain disease is, its connection with factors such as age, sex and habits, and its connection with measurements, e.g., temperature. Observe, then diagnose.

Pervasive Power of Probabilities

Probability Recap 1/3
- Probability of event X: p(X), with p(X) ∈ [0,1] (i.e., 0 ≤ p(X) ≤ 1) and 1 = Σ_X p(X).
- p(¬X) is the probability that X is false: p(X) = 1 − p(¬X).
- Joint probability of X AND Y: p(X,Y).
- Conditional probability of X GIVEN Y: p(X|Y).

Probability Recap 2/3
- Product rule: p(X,Y) = p(Y|X) p(X)
- Sum rule (marginalization): p(X) = Σ_Y p(X,Y)

Sum Rule (Marginalization)
Joint distribution p(T,W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3
Marginal distributions: p(T) = Σ_W p(T,W) gives p(hot) = 0.5, p(cold) = 0.5; p(W) = Σ_T p(T,W) gives p(sun) = 0.6, p(rain) = 0.4. In general: p(X) = Σ_Y p(X,Y).

Law of Total Probability (Conditioning)
Combining p(X) = Σ_Y p(X,Y) (sum rule) with p(X,Y) = p(X|Y) p(Y) (product rule) gives p(X) = Σ_Y p(X|Y) p(Y).

Conditional Probability
Given two events A and B with P(B) ≠ 0, the conditional probability of A given B is denoted P(A|B) and defined as P(A|B) = P(A ∩ B) / P(B), i.e., the probability of the intersection of A and B, given that B has occurred.

Conditional Probability (Weather Example)
P(W=sun | T=cold) = P(W=sun, T=cold) / P(T=cold) = 0.2 / 0.5 = 0.4
Sum rule: P(T=cold) = P(W=sun, T=cold) + P(W=rain, T=cold) = 0.2 + 0.3 = 0.5
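As a small sanity check of the recap formulas, here is a sketch that recomputes the marginals and the conditional P(W=sun | T=cold) from the joint table above (the dictionary layout and variable names are my own):

```python
# Sum rule, product rule and conditional probability on the weather table above.
# The joint p(T, W) is the one from the slide; the code structure is illustrative.
from collections import defaultdict

joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

# Sum rule (marginalization): p(T) = sum_W p(T, W) and p(W) = sum_T p(T, W)
p_T, p_W = defaultdict(float), defaultdict(float)
for (t, w), p in joint.items():
    p_T[t] += p
    p_W[w] += p

print({k: round(v, 2) for k, v in p_T.items()})        # {'hot': 0.5, 'cold': 0.5}
print({k: round(v, 2) for k, v in p_W.items()})        # {'sun': 0.6, 'rain': 0.4}

# Conditional probability: P(W=sun | T=cold) = P(W=sun, T=cold) / P(T=cold)
print(round(joint[("cold", "sun")] / p_T["cold"], 2))  # 0.4
```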
Conditional Dependence: Applications in Artificial Intelligence
- Natural language processing: understanding pronouns. In the sentence "There's a cat and a dog. John approached it because he heard barking.", the likelihood that "it" refers to "dog" depends on the context provided by "heard barking".
- Robotics: path planning must consider obstacles. A cluttered environment reduces the likelihood of direct navigation.
- Computer vision: recognizing street signs. If the system detects a bright backdrop typical of signboards, it also anticipates contrasting dark regions indicative of text or symbols. Recognizing one feature increases the likelihood of the other being present.

Recognizing Street Signs Example
Knowing what to look for gives an idea of what to expect to measure: we could expect both very bright and very dark pixels to belong to street signs.

Probabilistic Inference
Compute a desired probability from other known probabilities (e.g., a conditional from the joint). We generally compute conditional probabilities, e.g., P(on time | no reported accidents) = 0.90. These represent the agent's beliefs given the evidence. Probabilities change with new evidence:
P(on time | no accidents, 5 a.m.) = 0.95
P(on time | no accidents, 5 a.m., raining) = 0.80
Observing new evidence causes beliefs to be updated.

Bayes Rule
Allows us to update our beliefs about the probability of an event, given new information:
P(A|B) = P(B|A) P(A) / P(B)
- P(A|B): posterior (probability of A given that B is observed)
- P(B|A): likelihood (probability of observing B given that A is true)
- P(A): prior (initial probability of A)
- P(B): evidence (overall probability of observing B)
Why is it useful? The probability of A given some evidence B can be expressed in factors that are sometimes easier to determine. It is the foundation of many AI systems.

Bayes Rule Derivation (two different options)
From P(A|B) = P(A,B) / P(B) (conditional probability), use the product rule to replace the joint probability with P(B|A) P(A).
Alternatively, rewrite the conditional probabilities as P(A,B) = P(A|B) P(B) = P(B|A) P(A), divide P(A|B) P(B) = P(B|A) P(A) by P(B), and get P(A|B) = P(B|A) P(A) / P(B).

Bayes Rule using Normalization
P(A|B) = P(B|A) P(A) / P(B)
Expressing P(B) by conditioning over A: P(B) = Σ_A P(B|A) P(A), so
P(A|B) = P(B|A) P(A) / Σ_A P(B|A) P(A)
With η = 1 / Σ_A P(B|A) P(A): P(A|B) = η P(B|A) P(A)
See page 493 of the book (Section 13.3) for a concrete example.

Bayes Rule Example
Vision system for detecting zebras, Z. Prior: p(Z) = 0.02 (a zebra appears in 2% of images). A detector for "stripey areas" gives yes/no observations, O. Detector performance: p(O|Z) = 0.8 (true positive), p(O|¬Z) = 0.1 (false positive, e.g., a gate).
Exercise 1: Calculate p(Z|O). What does it represent? What does the result of that probability tell us?

Bayes Rule Example Solution
p(Z|O) represents the probability that there is a zebra if our detector says there is one.
P(Z|O) = P(O|Z) P(Z) / P(O) = P(O|Z) P(Z) / (P(O|Z) P(Z) + P(O|¬Z) P(¬Z))
       = (0.8 × 0.02) / (0.8 × 0.02 + 0.1 × 0.98) = 0.1404

Bayes Rule Example Discussion
p(Z|O) = 0.1404. Intuition tells most people that the detector is much better than this, i.e., we would expect a much higher p(Z|O) since the detector is correct in 80% of the cases. However, only 1 out of 50 images contains a zebra, so 49 out of 50 do not, and the detector is not perfect. Failing to account for negative evidence properly is a typical failing of human intuitive reasoning.
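A short numeric check of Exercise 1, using the normalization form of Bayes rule from the slides (the function and variable names are my own):

```python
# Bayes rule with normalization: P(Z|O) = η P(O|Z) P(Z), with η = 1 / Σ_Z P(O|Z) P(Z)
def posterior(prior_z, p_o_given_z, p_o_given_not_z):
    """Probability of a zebra given that the stripey-area detector fired."""
    numerator = p_o_given_z * prior_z                       # P(O|Z) P(Z)
    evidence = numerator + p_o_given_not_z * (1 - prior_z)  # P(O) by conditioning over Z
    return numerator / evidence

p_z_given_o = posterior(prior_z=0.02, p_o_given_z=0.8, p_o_given_not_z=0.1)
print(round(p_z_given_o, 4))  # 0.1404 -- much lower than the 0.8 most people expect
```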
Conditional Independence
Unconditional (absolute) independence is very rare (why?). Conditional independence is our most basic and robust form of knowledge about uncertain environments.
Example with P(Traffic), P(Umbrella), P(Rain): rain causes traffic AND people carrying umbrellas, so T is conditionally independent of U given R.

Conditional Independence Formulas
If X is conditionally independent of Y given Z:
P(X|Y,Z) = P(X|Z)
Which also means:
P(X,Y|Z) = P(X|Y,Z) P(Y|Z)   {product rule: P(X,Y) = P(X|Y) P(Y)}
         = P(X|Z) P(Y|Z)     {conditional independence}
NOTE: this is not the same as P(X,Y) = P(X) P(Y).

Conditional Independence Example
P(Toothache, Cavity, Catch). If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
P(Catch | Toothache, Cavity) = P(Catch | Cavity)
Catch is conditionally independent of Toothache given Cavity. Equivalent statements:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)

Probability Recap 3/3
- Conditional probability: P(x|y) = P(x,y) / P(y)
- Product rule: p(x,y) = p(y|x) p(x)
- Chain rule: P(X1, X2, ..., Xn) = P(X1) P(X2|X1) P(X3|X1,X2) ... = Π_{i=1..n} P(Xi | X1, ..., Xi−1)
- X, Y independent if and only if ∀x,y: P(x,y) = P(x) P(y)
- X and Y are conditionally independent given Z (X⫫Y|Z) if and only if ∀x,y,z: P(x,y|z) = P(x|z) P(y|z)

Break

Probabilistic Graphical Models
A compact representation of the joint distribution over a set of variables, and a graphical representation that helps analyze and structure probability information. Each variable is encoded as a node; conditional independence assumptions are encoded in the arcs. Here: a Bayesian network, a directed acyclic graph (DAG) over nodes A, B, C, D, E.

Bayesian Network (example graph over A, B, C, D, E)
- A is the root node; B, D, E are leaf nodes.
- A is the parent of B and C; B and C are children of A; A is an ancestor of D and E.
- A "causes" B and C: the value of A influences the values of B and C. The arrow means "has direct influence over"; A has direct influence over B and C.

Bayesian Network
B and C are dependent. However, they are conditionally independent given A.
Q: Write down the formula that relates A, B and C: P(B,C|A) = ?
P(B,C|A) = P(B|C,A) P(C|A) = P(B|A) P(C|A)
The same holds for the Rain → Traffic, Rain → Umbrella network. If we do not know A, knowing something about B tells us something about C (it tells us about A, which tells us about C), but if we know A, then knowing B does not tell us anything more about C than we already knew because of A.

Bayesian Network
C depends on A, and E depends on A and C. However, E is conditionally independent of A given C; that is, C captures all the information in A relevant to determining E.

Flow of Probabilistic Influence (when can X influence Y?)
[Figure slides, not reproduced in this transcript.]
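Before moving on to joint distributions, here is a small numeric sketch of the conditional independence point from the Rain → Traffic, Rain → Umbrella network above. The CPT numbers are invented for illustration and are not from the lecture:

```python
# Rain -> Traffic, Rain -> Umbrella with made-up CPTs (illustrative values only).
p_r = {True: 0.3, False: 0.7}          # P(R)
p_t_given_r = {True: 0.8, False: 0.2}  # P(T=true | R)
p_u_given_r = {True: 0.9, False: 0.1}  # P(U=true | R)

def p_t(t, r):
    return p_t_given_r[r] if t else 1 - p_t_given_r[r]

def p_u(u, r):
    return p_u_given_r[r] if u else 1 - p_u_given_r[r]

# Joint from the network factorization: P(R,T,U) = P(R) P(T|R) P(U|R)
joint = {(r, t, u): p_r[r] * p_t(t, r) * p_u(u, r)
         for r in (True, False) for t in (True, False) for u in (True, False)}

# Given R, T and U are independent: P(T,U|R) == P(T|R) P(U|R)
r = True
lhs = joint[(r, True, True)] / p_r[r]
rhs = p_t(True, r) * p_u(True, r)
print(abs(lhs - rhs) < 1e-12)   # True

# Without conditioning on R, they are NOT independent: P(T,U) != P(T) P(U)
p_tu = sum(joint[(r, True, True)] for r in (True, False))
p_t_marg = sum(joint[(r, True, u)] for r in (True, False) for u in (True, False))
p_u_marg = sum(joint[(r, t, True)] for r in (True, False) for t in (True, False))
print(round(p_tu, 3), round(p_t_marg * p_u_marg, 3))  # ~0.23 vs ~0.129 -- they differ
```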
Joint Distribution
A probabilistic model of a domain must represent the Joint Probability Distribution (JPD), i.e., the probability of every possible event as defined by the combination of the values of all the variables. Bayesian networks achieve compactness by factoring the JPD into local, conditional distributions for each variable given its parents.
Let's factorize P(A,B,C,D). Remember the chain rule:
P(X1, X2, ..., Xn) = Π_{i=1..n} P(Xi | X1, ..., Xi−1)
Tip: work from the top and factor out A, then B, C and D.

Joint Distribution Example
For the pictured graph (A is the parent of B and C; C is the parent of D):
P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)
C is conditionally independent of B given A, and D is conditionally independent of A, B given C, so
P(A,B,C,D) = P(A) P(B|A) P(C|A) P(D|C)
In general: P(X1, X2, ..., Xn) = Π_{i=1..n} P(Xi | Parents(Xi))
We could have applied the chain or product rule in any order, but making use of the conditional independencies gives the most compact result; other factorizations are possible, but not as compact.

Exercise 2: Factorize the Graph
p(G1, G2, G3, G4, G5) = ?
Remember: P(X1, X2, ..., Xn) = Π_{i=1..n} P(Xi | Parents(Xi))
The answer conditions each Gi on its parents in the pictured graph: one root term, three terms with a single parent, and one term with three parents. (The exact parent indices come from the figure, which is not reproduced in this transcript.)

Zebra Example as a Bayesian Network
Vision system for detecting zebras, Z. Prior: p(Z) = 0.02 (zebra in 2% of images). A detector for "stripey areas" gives yes/no observations, O, with p(O|Z) = 0.8 (true positive) and p(O|¬Z) = 0.1 (false positive, e.g., a gate). Draw it as a Bayesian network. Which order? The existence of the zebra influences the observation and not vice versa: Z → O.

Alarm Example
You have an alarm. It reacts to burglaries but is sometimes triggered by small earthquakes. Two neighbors, John and Mary, call when they hear the alarm (not the earthquake or burglary!). John calls almost every time there is an alarm but can confuse it with the phone. Mary plays loud music and sometimes misses the alarm, but rarely mixes it up with other things.
Draw the Bayesian network! What are the variables?
Simplification: John and Mary calling does not depend on what triggered the alarm.
Conditional probability tables: the values are not specified in the text above; they are made up here.

Final Exercise: Alarm Example Calculation
Calculate P(J, M, A, ¬B, ¬E), with J: JohnCalls, M: MaryCalls, A: Alarm, B: Burglary, E: Earthquake.
Remember that p(¬X) = 1 − p(X) and P(X1, X2, ..., Xn) = Π_{i=1..n} P(Xi | Parents(Xi)).
Solution:
P(J, M, A, ¬B, ¬E) = P(¬B) P(¬E) P(A|¬B,¬E) P(J|A) P(M|A)
                   = 0.99 × 0.98 × 0.001 × 0.90 × 0.70 ≈ 0.0006

Tip
When constructing the network, try to use an ordering based on cause → symptom (causal) rather than symptom → cause (diagnostic). You need to specify fewer numbers, and the numbers are easier to get. Example: Alarm → MaryCalls means we have to specify p(MaryCalls|Alarm), which is a lot easier than p(Alarm|MaryCalls) for MaryCalls → Alarm.
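A minimal sketch of the final exercise above, evaluating the network factorization with the CPT entries used in the slide's solution (the variable names are my own; the full conditional probability tables are not reproduced in this transcript):

```python
# P(J, M, A, ¬B, ¬E) = P(¬B) P(¬E) P(A|¬B,¬E) P(J|A) P(M|A)
# Only the entries needed for this query are listed here.
p_not_b = 0.99                  # P(¬B)
p_not_e = 0.98                  # P(¬E)
p_a_given_not_b_not_e = 0.001   # P(A | ¬B, ¬E)
p_j_given_a = 0.90              # P(J | A)
p_m_given_a = 0.70              # P(M | A)

joint = (p_not_b * p_not_e * p_a_given_not_b_not_e
         * p_j_given_a * p_m_given_a)
print(round(joint, 6))  # ~0.000611, i.e. about 0.0006 as on the slide
```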
Key Points on Bayesian Networks
- Representation of the joint probability distribution: through factorization, Bayesian networks provide a compact representation of the joint probability distribution over a set of variables. This makes understanding and constructing the joint distribution more manageable.
- Encoding of a collection of independence statements: the structure of a Bayesian network encodes conditional independence relations among variables. If two nodes (variables) in the network are conditionally independent given some evidence, then no direct connection exists between them given that evidence. These independence statements can simplify calculations significantly.
- Helpful in designing inference procedures: knowing the structure and the conditional probability distributions, one can perform various inferential tasks with a Bayesian network, such as querying the probability of a particular variable given evidence.

Reasoning over Time or Space
Often we want to reason about a sequence of observations: speech recognition, robot localization, user attention, medical monitoring, ... We need to introduce time (or space) into our models.

Sequential Data – Example 1
Measurement of a time series. Example: sign recognition. Measured: the drawn path. Wanted: characters.

Sequential Data – Example 2
Measurement of a time series. Example: speech recognition. Measured: the audio signal. Wanted: words/sentences.

Next Lecture
Additional (highly recommended) study material: http://learn-ai.web.app/ – a page developed by one of the previous course TAs with detailed (and visually appealing) explanations of next Wednesday's lecture exercises plus the Forward algorithm. Visit the page before the HMM tutorial sessions. There is also a tutorial on Canvas containing the algorithmic implementations of these algorithms. Don't forget to complete the quiz on Bayesian Networks to consolidate the knowledge from today's lecture.

End of Taming Uncertainty, Part 1/2
