Inference in Bayesian Networks
Summary
These slides provide an overview of inference in Bayesian networks. The document covers concepts like efficient inference, d-separation, the chain rule, quantitative inference, using joint distributions, and variable elimination. Examples are also presented.
Full Transcript
Inference in Bayesian networks
Artificial intelligence
Kristóf Karacs, PPKE-ITK

Recap
- Bayesian networks
- Combination of evidence
- Types of connections
- d-separation

Outline
- Efficient inference
  - d-separation theorem
  - Chain rule
- Quantitative inference
- Using joint distributions
- Variable elimination
- Approximate inference

Theorem
- If A and B are d-separated given evidence e, then P(A | e) = P(A | B, e)
- Enables efficient inference

Chain rule
- Variables: V1, …, Vn; values: v1, …, vn
- P(V1 = v1, V2 = v2, …, Vn = vn) = ∏_{i=1..n} P(Vi = vi | Parents(Vi))

Using the chain rule
- Network: A → C, B → C, C → D, with CPTs P(A), P(B), P(C | A,B), P(D | C)
- P(ABCD) = P(A=true, B=true, C=true, D=true)
- P(ABCD) = P(D | ABC) P(ABC)
          = P(D | C) P(ABC)
          = P(D | C) P(C | AB) P(AB)
          = P(D | C) P(C | AB) P(A) P(B)

Icy roads
- "Inspector Smith is waiting for Holmes and Watson, who are driving (separately) to meet him. It is winter. His secretary tells him that Watson has had an accident. He says, 'It must be that the roads are icy. I bet that Holmes will have an accident too. I should go to lunch.' But his secretary says, 'No, the roads are not icy, look at the window.' So he says, 'I guess I had better wait for Holmes.'"

Conditional probability tables (CPT)
- I: road is icy; H: Holmes crashes; W: Watson crashes
- Network: I → H, I → W
- P(I=T) = 0.3, P(I=F) = 0.7
- P(H=T | I=T) = 0.8, P(H=F | I=T) = 0.2; P(H=T | I=F) = 0.1, P(H=F | I=F) = 0.9
- P(W=T | I=T) = 0.8, P(W=F | I=T) = 0.2; P(W=T | I=F) = 0.1, P(W=F | I=F) = 0.9

Icy roads: queries
(Network figure: I → H, I → W, with P(I) = 0.3, P(H | I) = P(W | I) = 0.8, P(H | ¬I) = P(W | ¬I) = 0.1)
- P(W) = P(W | I) P(I) + P(W | ¬I) P(¬I) = 0.8 * 0.3 + 0.1 * 0.7 = 0.31
- P(I | W) = P(W | I) P(I) / P(W) = 0.8 * 0.3 / 0.31 ≈ 0.77
- P(H | W) = P(H | W,I) P(I | W) + P(H | W,¬I) P(¬I | W)
           = P(H | I) P(I | W) + P(H | ¬I) P(¬I | W)
           = 0.8 * 0.77 + 0.1 * 0.23 ≈ 0.639
- P(H | W, ¬I) = P(H | ¬I) = 0.1

Wet lawns
- "Sherlock Holmes wakes up to find his lawn wet. He wonders if it has rained or if he left his sprinkler on. He looks at his neighbor Watson's lawn and sees that it is wet as well. So, he concludes, it must have rained."
- Network: S → H, R → H, R → W, where S is the sprinkler, R is rain, H is Holmes's lawn being wet, and W is Watson's lawn being wet
- P(S) = 0.1, P(R) = 0.2
- P(W | R) = 1.0, P(W | ¬R) = 0.2
- P(H | R,S) = 1.0, P(H | R,¬S) = 1.0, P(H | ¬R,S) = 0.9, P(H | ¬R,¬S) = 0.1
- Queries: P(R | H), P(S | H), P(W | H); P(R | H,W), P(S | H,W)

Types of inference
- Exact inference
- Approximate inference

Possible queries
- P(X = x0 | E = e)
- What value of x maximizes P(X = x | E = e)?
- The probability distribution P(X | E = e)
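To make the icy-roads calculations concrete, here is a minimal brute-force sketch; it is not part of the original slides, and the function names and structure are illustrative assumptions. It answers queries of the form P(X = x0 | E = e) by summing the joint distribution defined by the chain rule:

```python
# Brute-force inference on the icy-roads network I -> H, I -> W.
# Sketch only; names are illustrative, CPT numbers are taken from the slides.
from itertools import product

P_I = {True: 0.3, False: 0.7}          # P(I)
P_H_GIVEN_I = {True: 0.8, False: 0.1}  # P(H = true | I)
P_W_GIVEN_I = {True: 0.8, False: 0.1}  # P(W = true | I)

def joint(i, h, w):
    """P(I=i, H=h, W=w) = P(I) * P(H | I) * P(W | I) by the chain rule."""
    p_h = P_H_GIVEN_I[i] if h else 1 - P_H_GIVEN_I[i]
    p_w = P_W_GIVEN_I[i] if w else 1 - P_W_GIVEN_I[i]
    return P_I[i] * p_h * p_w

def query(event, evidence=None):
    """P(event | evidence): sum the joint over all worlds consistent with the
    evidence, and over those also consistent with the event; return the ratio."""
    evidence = evidence or {}
    num = den = 0.0
    for i, h, w in product([True, False], repeat=3):
        world = {"I": i, "H": h, "W": w}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(i, h, w)
        den += p
        if all(world[k] == v for k, v in event.items()):
            num += p
    return num / den

print(query({"W": True}))                           # P(W)          = 0.31
print(query({"I": True}, {"W": True}))              # P(I | W)     ~= 0.774
print(query({"H": True}, {"W": True}))              # P(H | W)     ~= 0.642
print(query({"H": True}, {"W": True, "I": False}))  # P(H | W, ¬I)  = 0.1
```

With exact arithmetic this gives P(I | W) ≈ 0.774 and P(H | W) ≈ 0.642; the slides' 0.77 and 0.639 come from rounding P(I | W) to two decimals before the last step.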
Using joint distribution
- Sum the joint distribution over the variables not involved in the query; e.g. for a network over A, B, C, D:
  P(D = d) = ∑_{a ∈ dom(A)} ∑_{b ∈ dom(B)} ∑_{c ∈ dom(C)} P(A = a ∧ B = b ∧ C = c ∧ D = d)
- A conditional query is a ratio of two such sums: P(d | e) = P(d, e) / P(e), where both P(d, e) and P(e) are obtained by summing the joint over the remaining variables

Variable elimination
- Example network: the chain A → B → C → D
- P(D) = ∑_{A,B,C} P(A, B, C, D) = ∑_{A,B,C} P(A) P(B | A) P(C | B) P(D | C)
- Push the summations inward:
  P(D) = ∑_C P(D | C) ∑_B P(C | B) ∑_A P(A) P(B | A)
- Eliminate A: P(D) = ∑_C P(D | C) ∑_B P(C | B) f1(B), where f1(B) = ∑_A P(A) P(B | A); the remaining network is B → C → D
- Eliminate B: P(D) = ∑_C P(D | C) f2(C), where f2(C) = ∑_B P(C | B) f1(B); the remaining network is C → D
- Eliminate C: P(D) = f3(D), where f3(D) = ∑_C P(D | C) f2(C)

Variable elimination algorithm
- Given a Bayesian network and an elimination order X1, …, Xm for the non-query variables, compute the query distribution as follows.
- For i = m down to 1:
  - Remove all factors that mention Xi
  - Multiply those factors, getting a value for each combination of the mentioned variables
  - Sum over Xi
  - Put this new factor into the factor set

Example
P(d) = ∑_{A,B,L,T,S,V} P(d | a,b) P(a | t,l) P(b | s) P(l | s) P(s) P(t | v) P(v)
     = ∑_{A,B} P(d | a,b) ∑_{L,T} P(a | t,l) ∑_S P(b | s) P(l | s) P(s) ∑_V P(t | v) P(v)
     = ∑_{A,B} P(d | a,b) ∑_{L,T} P(a | t,l) f1(t) ∑_S P(b | s) P(l | s) P(s)
     = ∑_{A,B} P(d | a,b) ∑_L f2(b,l) ∑_T P(a | t,l) f1(t)
     = ∑_{A,B} P(d | a,b) ∑_L f2(b,l) f3(a,l)
     = ∑_A ∑_B P(d | a,b) f4(a,b)

Variable elimination: properties
- Generally requires exponential time (O(n² bᵏ))
- A bad elimination order can generate huge factors
- Finding the best order is NP-hard
  - Heuristic: choose the variable that results in the smallest next factor (greedy method)
- Linear time for singly connected networks (polytrees)
  - There is only one (undirected) path between any two nodes
  - Always eliminate variables with no parents

Exercise

Inference in multiply connected DAGs
- Clustering
  - Transforms the network into a probabilistically equivalent polytree by joining certain nodes of the network
  - Useful when computing many a posteriori probabilities
- Stochastic simulation (Monte Carlo)
  - Estimates the probabilities by generating samples using the probability distribution defined by the network

Clustering
- Network: C → R, C → S, and R, S → W
- P(C) = 0.5
- P(R | C) = 0.8, P(R | ¬C) = 0.2
- P(S | C) = 0.1, P(S | ¬C) = 0.5
- P(W | R,S) = 1.0, P(W | R,¬S) = 1.0, P(W | ¬R,S) = 0.9, P(W | ¬R,¬S) = 0.1

Clustering (after joining R and S into one node R+S)
- P(C) = 0.5
- P(R+S = x | C):
    x     R,S    R,¬S   ¬R,S   ¬R,¬S
    C     0.08   0.72   0.02   0.18
    ¬C    0.1    0.1    0.4    0.4
- P(W | R,S) as above: 1.0, 1.0, 0.9, 0.1

Monte Carlo (sampling)
- Iterative sampling by making draws for each variable
  - Draws are based on the CPTs
  - Start from the root nodes
  - For children, use the drawn values of their parents
- After many rounds, relative frequencies can be calculated

Monte Carlo (on the same network as above)
- P(W | C) = P(C, W) / P(C)
- Estimate from samples: P*(W | C) = #(C, W) / #C, i.e. the number of samples in which both C and W are true divided by the number of samples in which C is true

Importance sampling
- Problem: rare events will not be well represented
- Solution: importance sampling
  - Samples from a distribution biased towards the rare values
  - The output is weighted to correct for the bias
  - Weights are determined by likelihood ratios
  - Fast convergence
  - Can handle huge networks

Exercises
- Wet lawns with numbers
- Variable elimination (a code sketch follows at the end of this transcript)
- Monte Carlo (a code sketch follows at the end of this transcript)

Outline
- Efficient inference
  - d-separation theorem
  - Chain rule
- Quantitative inference
- Using joint distributions
- Variable elimination
- Multiply connected networks
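As a companion to the variable elimination algorithm and exercise above, here is a minimal Python sketch. It is not from the slides: the factor representation and the names make_factor, multiply, sum_out and eliminate are illustrative assumptions, and the CPT numbers for the chain A → B → C → D are made up.

```python
# Variable elimination sketch for the chain A -> B -> C -> D (illustrative only).
from itertools import product

def make_factor(vars_, table):
    """A factor: an ordered variable list plus a table mapping value tuples
    (in that order) to numbers."""
    return {"vars": list(vars_), "table": dict(table)}

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    vars_ = list(dict.fromkeys(f["vars"] + g["vars"]))
    table = {}
    for vals in product([True, False], repeat=len(vars_)):
        assign = dict(zip(vars_, vals))
        fv = f["table"][tuple(assign[v] for v in f["vars"])]
        gv = g["table"][tuple(assign[v] for v in g["vars"])]
        table[vals] = fv * gv
    return {"vars": vars_, "table": table}

def sum_out(f, var):
    """Sum a factor over one of its variables."""
    idx = f["vars"].index(var)
    keep = [v for v in f["vars"] if v != var]
    table = {}
    for vals, p in f["table"].items():
        key = tuple(v for j, v in enumerate(vals) if j != idx)
        table[key] = table.get(key, 0.0) + p
    return {"vars": keep, "table": table}

def eliminate(factors, order):
    """As on the algorithm slide: for each variable in the elimination order,
    pull out the factors that mention it, multiply them, sum the variable out,
    and put the resulting factor back into the factor set."""
    factors = list(factors)
    for x in order:
        related = [f for f in factors if x in f["vars"]]
        factors = [f for f in factors if x not in f["vars"]]
        prod = related[0]
        for f in related[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(prod, x))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# The chain A -> B -> C -> D with illustrative (made-up) CPTs.
fA = make_factor(["A"], {(True,): 0.6, (False,): 0.4})                   # P(A)
fB = make_factor(["A", "B"], {(True, True): 0.7, (True, False): 0.3,
                              (False, True): 0.2, (False, False): 0.8})  # P(B | A)
fC = make_factor(["B", "C"], {(True, True): 0.9, (True, False): 0.1,
                              (False, True): 0.4, (False, False): 0.6})  # P(C | B)
fD = make_factor(["C", "D"], {(True, True): 0.5, (True, False): 0.5,
                              (False, True): 0.1, (False, False): 0.9})  # P(D | C)

p_D = eliminate([fA, fB, fC, fD], order=["A", "B", "C"])
print(p_D["table"])  # P(D=True) works out to 0.36, P(D=False) to 0.64 here
```

With the order A, B, C each factor produced by summing out mentions only a single variable, which is why elimination on a chain (and on polytrees generally) stays cheap; a bad order on a multiply connected network would create much larger intermediate factors.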
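And a minimal forward-sampling sketch for the Monte Carlo slides; again this is not from the deck and the function names are assumptions, while the CPT values are those of the C, R, S, W network above.

```python
# Forward (ancestral) sampling on the network C -> R, C -> S, {R, S} -> W.
# Sketch only; names are illustrative, CPT numbers are taken from the slides.
import random

P_C = 0.5
P_R_GIVEN_C = {True: 0.8, False: 0.2}
P_S_GIVEN_C = {True: 0.1, False: 0.5}
P_W_GIVEN_RS = {(True, True): 1.0, (True, False): 1.0,
                (False, True): 0.9, (False, False): 0.1}

def sample_once(rng):
    """Draw each variable in topological order, conditioning children on the
    already drawn values of their parents."""
    c = rng.random() < P_C
    r = rng.random() < P_R_GIVEN_C[c]
    s = rng.random() < P_S_GIVEN_C[c]
    w = rng.random() < P_W_GIVEN_RS[(r, s)]
    return c, r, s, w

def estimate_w_given_c(n=100_000, seed=0):
    """P*(W | C) = #(C, W) / #C, as on the Monte Carlo slide."""
    rng = random.Random(seed)
    n_c = n_cw = 0
    for _ in range(n):
        c, r, s, w = sample_once(rng)
        if c:
            n_c += 1
            n_cw += w
    return n_cw / n_c

print(estimate_w_given_c())
```

With these CPTs the exact value is P(W | C) = ∑_{r,s} P(r | C) P(s | C) P(W | r,s) = 0.836, so the printed estimate should land close to that; evidence on rare events is exactly where plain forward sampling degrades and the importance sampling described above helps.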