Adversarial Search Strategies in Games

Document Details

Pázmány Péter Katolikus Egyetem

Kristóf Karacs

Summary

These lecture notes cover adversarial search strategies in games, including minimax search, cutoff search, alpha-beta pruning, and expectimax search. The document also includes a categorization of games and a worked example based on a trivial card game.

Full Transcript


Adversarial search - Strategies in games
Artificial intelligence
Kristóf Karacs, PPKE-ITK

Recap
- What is intelligence?
- Agent model
- Problem solving by search
  - Non-informed search strategies
  - Informed search strategies
  - Local and online search

Outline
- Modeling two-player games
- Game theoretic value
- Minimax search
- Cutoff search
- Pruning, alpha-beta search
- Expectimax search

Categorization of games
- Number of players (2 or more)
- Competitive or cooperative
- Zero sum (game theory): total gains = total losses
- Discrete or continuous
- Finite or infinite
- Deterministic or stochastic
- Perfect or partial information

Playing the game
- Choosing the best move on each turn
  - Episodic search (no backtracking)
- Conventions
  - Turns alternate
  - Player 1 moves first

Search problem
A game is given as a tuple (S, S0, succ: S → P(S), F, V: F → ℝ), where
- S is a finite set of states (a state includes the player due to move)
- S0 is the initial state
- succ() is the successor (follower) states function
- F is the set of terminal states
- V is the value function for terminal states

Example: A trivial card game
- Deal four playing cards out, face up
- Players take cards alternating
- The player with the highest even sum scores that amount

(Figure: the entire search space of the game.)

Game theoretic value
The game theoretic value of a state is the value of the terminal state that will be reached from it if both players play optimally. It can be computed bottom-up:

Let D = maximum depth of the game tree
For i = D down to 1:
  For each node n at depth i:
    If n is a terminal node: GTV(n) = V(n)
    Else, if Player 1 is due to move at n: GTV(n) = max over s in succ(n) of GTV(s)
    Else (Player 2 must be due to move): GTV(n) = min over s in succ(n) of GTV(s)

Minimax search
- Generalization of the classic divide-and-choose method for sharing gold/cake/etc.
- Recursive search method (DFS-like)
  - For own moves, choose the state that maximizes the game theoretic value
  - For the opponent's moves, choose the state that minimizes the game theoretic value

Minimax algorithm
- First assign the values associated with terminal states
- Then propagate the values toward the root node using the minimax decision
- Essentially computes the game theoretic value:

GTV(S) =
  if S is terminal: return V(S)
  else:
    let {S1, S2, ..., Sk} = succ(S)
    let Vi = GTV(Si) for each i
    if player-to-move(S) == 1: return max_i(Vi)
    else: return min_i(Vi)

(Figures: "Moving the scores" - a sequence of slides propagating the leaf values up the tree.)

Exercise: Nim
- There are some piles of matches
- On each turn a player may remove any number of matches, but at least one, from a single pile
- The last person to remove a match loses (misère game)
- In II-Nim, one begins with two piles, each with two matches ( ii ii )

II-Nim state space
All states (pile contents plus the player due to move):
( _ , _ )-A  ( _ , i )-A  ( _ , ii )-A
( i , _ )-A  ( i , i )-A  ( i , ii )-A
( ii , _ )-A ( ii , i )-A ( ii , ii )-A
( _ , _ )-B  ( _ , i )-B  ( _ , ii )-B
( i , _ )-B  ( i , i )-B  ( i , ii )-B
( ii , _ )-B ( ii , i )-B ( ii , ii )-B

Some states are equivalent due to symmetry (e.g. ( _ , ii )-A and ( ii , _ )-A). Merge them using a canonical description (e.g. the left pile is never larger than the right), leaving:
( _ , _ )-A  ( _ , i )-A  ( _ , ii )-A
( i , i )-A  ( i , ii )-A ( ii , ii )-A
( _ , _ )-B  ( _ , i )-B  ( _ , ii )-B
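The GTV recursion above can be sketched directly in Python, using II-Nim as the example game. This is a minimal illustration, not the lecture's own code: the names `gtv` and `successors` and the state encoding (a sorted tuple of pile sizes plus the player to move, i.e. the canonical form suggested above) are assumptions.

```python
# State = (piles, player): piles is a sorted tuple (canonical form), player is 'A' or 'B'.

def successors(state):
    """All states reachable by removing >= 1 matches from a single pile."""
    piles, player = state
    nxt = 'B' if player == 'A' else 'A'
    succ = set()
    for i, pile in enumerate(piles):
        for take in range(1, pile + 1):
            new = list(piles)
            new[i] = pile - take
            succ.add((tuple(sorted(new)), nxt))  # sort to merge symmetric states
    return succ

def gtv(state):
    """Game theoretic value for player A (+1 = A wins, -1 = B wins)."""
    piles, player = state
    if all(p == 0 for p in piles):
        # Misere rule: the opponent took the last match and therefore lost,
        # matching V(_,_)-A = +1 and V(_,_)-B = -1 above.
        return +1 if player == 'A' else -1
    values = [gtv(s) for s in successors(state)]
    return max(values) if player == 'A' else min(values)

print(gtv(((2, 2), 'A')))  # -1: from (ii ii)-A the first player loses with optimal play
```

The recursion visits each canonical state once per path, which is fine for a game this small; larger games would memoize `gtv` on the canonical state.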
( i , i )-B  ( i , ii )-B  ( ii , ii )-B

II-Nim formal definition
S = { ( _ , _ )-A, ( _ , i )-A, ( _ , ii )-A, ( i , i )-A, ( i , ii )-A, ( ii , ii )-A,
      ( _ , _ )-B, ( _ , i )-B, ( _ , ii )-B, ( i , i )-B, ( i , ii )-B, ( ii , ii )-B }
S0 = ( ii , ii )-A
succ:
  succ( _ , i )-A   = { ( _ , _ )-B }
  succ( _ , i )-B   = { ( _ , _ )-A }
  succ( _ , ii )-A  = { ( _ , _ )-B, ( _ , i )-B }
  succ( _ , ii )-B  = { ( _ , _ )-A, ( _ , i )-A }
  succ( i , i )-A   = { ( _ , i )-B }
  succ( i , i )-B   = { ( _ , i )-A }
  succ( i , ii )-A  = { ( _ , i )-B, ( _ , ii )-B, ( i , i )-B }
  succ( i , ii )-B  = { ( _ , i )-A, ( _ , ii )-A, ( i , i )-A }
  succ( ii , ii )-A = { ( _ , ii )-B, ( i , ii )-B }
  succ( ii , ii )-B = { ( _ , ii )-A, ( i , ii )-A }
F = { ( _ , _ )-A, ( _ , _ )-B }
V( _ , _ )-A = +1
V( _ , _ )-B = -1

II-Nim game tree
(Figures: a sequence of slides builds the tree from the root ( ii ii )-A and backs the terminal values up level by level. The backed-up values include ( _ i )-B = +1, ( i i )-A = +1, ( _ ii )-A = +1, ( _ i )-A = -1, ( _ ii )-B = -1, ( i ii )-B = -1, and finally ( ii ii )-A = -1: the first player loses with optimal play.)

Real games
- Search space is too large
- Real-time decision requirement
- Chess
  - Branching factor is ~35
  - Capacity: < 2 million states per move (at 10k states/sec for 3 minutes), which allows a 4-ply lookahead: 35^4 = 1,500,625, while 35^5 = 52,521,875
  - Average humans can look ahead 6-8 plies
- A guaranteed solution is not possible
- Solution: heuristic evaluation function

Cutoff search
- Use an evaluation function
  - Estimate the guaranteed score
  - Search the space only to a certain depth
  - Depth chosen to limit the time taken
- Put the estimated values at the ends of the paths
- Propagate them to the top as before

Evaluation function
- Estimates the game theoretic value of a state
- Enables comparing different states
- Search + evaluation function combines many estimates, which is good for noise filtering

Example: Scores in chess
- Assign weights to pieces:
  - Pawn → 1
  - Knight → 3
  - Bishop → 3
  - Rook → 5
  - Queen → 9
- Position also matters in real-life evaluation functions

Example position:
- Black: 5 pawns * 1 = 5, 1 bishop * 3 = 3, 2 rooks * 5 = 10; total = 18
- White: 5 pawns * 1 = 5, 1 rook * 5 = 5; total = 10
- Net scores: Black 18 - 10 = 8, White 10 - 18 = -8

Evaluation function for the card-game example
- Odd cards: zero
- Even cards: actual value
- In this case the evaluation function chooses the 10, which is the worst choice

Problems with evaluation functions
- Horizon effect (Hans Berliner, 1973)
  - Good and bad possibilities deeper than the search horizon cannot be taken into account
  - The effect can be negative or positive
  - Possible solution: reduce the number of initial moves to look at, thus pushing the horizon farther
- Non-quiescent states: likely to change drastically
  - Wild swings in the evaluation function
  - E.g. captures in chess when using the sample evaluation function
  - Solution: expand such states until quiescent positions are reached

Pruning
- Visit as many board states as possible
- Skip bad branches (prune them)
  - Their best value is still worse than that of other branches
  - Example: having your queen taken in chess
- Alpha-beta pruning
  - Can be used for an entire search or for cutoff search
  - Recognizes surely inferior branches

Idea of alpha-beta pruning
(Figure: a MAX node with value >= 2 over MIN children with values = 2 and <= 1, above leaves 2, 5 and 1.)
- The MIN value (1) is already smaller than the MAX value of the parent (2)
- The MIN value can only decrease further
- The MAX value is only allowed to increase
- So there is no point in searching further below this node

Terminology
- Temporary values at MAX nodes are called alpha values
- Temporary values at MIN nodes are called beta values

Principles
- If an alpha value is greater than or equal to the beta value of a descendant node, then no more children of that descendant need to be considered
- If a beta value is less than or equal to the alpha value of a descendant node, then no more children of that descendant need to be considered

The general cutoff rule
- At a B-move node, let alpha = max of all of A's choices expanded on the current path, and beta = min of B's choices, including those at the current node. Cut off when beta <= alpha.
- The converse rule applies at an A-move node.
(Figure: an alternating a/b path v1 ... v7 illustrating the rule: with alpha = max(v1, v3, v5), if min(v6, v7) <= alpha, it is worthless to search the tree from the current node or from the sibling on its right.)

How much do we gain?
- Assuming a uniform branching factor b, minimax examines O(b^h) nodes; so does alpha-beta in the worst case
- But alpha-beta is sensitive to the order of nodes
- The gain is maximal when
  - the MIN children of a MAX node are ordered in decreasing backed-up value
  - the MAX children of a MIN node are ordered in increasing backed-up value
- Then alpha-beta examines O(b^(h/2)) nodes [Knuth and Moore, 1975]
- But this would require an oracle
- If nodes are ordered at random, the average number of nodes examined is ~O(b^(3h/4))

Alpha-beta pruning for Player 1
Given a node N which can be chosen by Player 1: if there is another node X along any path such that
(a) X can be chosen by Player 2,
(b) X is on a higher level than N, and
(c) X has been shown to guarantee a worse score for Player 1 than N,
then the parent of N can be pruned.
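The pruning rule above, combined with the cutoff idea, can be sketched as a depth-limited alpha-beta search. This is a minimal illustration under assumptions: the game interface (`successors`, `is_terminal`, `evaluate`) and the toy two-level tree are invented for the example, not taken from the lecture.

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing, successors, is_terminal, evaluate):
    if depth == 0 or is_terminal(state):
        return evaluate(state)          # estimated (or exact) game theoretic value
    if maximizing:
        value = -math.inf
        for s in successors(state):
            value = max(value, alphabeta(s, depth - 1, alpha, beta, False,
                                         successors, is_terminal, evaluate))
            alpha = max(alpha, value)   # alpha: best value MAX can force so far
            if beta <= alpha:           # cutoff: MIN above never allows this branch
                break
        return value
    else:
        value = math.inf
        for s in successors(state):
            value = min(value, alphabeta(s, depth - 1, alpha, beta, True,
                                         successors, is_terminal, evaluate))
            beta = min(beta, value)     # beta: best value MIN can force so far
            if beta <= alpha:
                break
        return value

# Toy tree matching the pruning idea above: a MAX root over two MIN nodes with
# leaves (2, 5) and (1, 9). After seeing leaf 1 (< alpha = 2), leaf 9 is cut off.
tree = {'root': ['m1', 'm2'], 'm1': [2, 5], 'm2': [1, 9]}
succ = lambda s: tree.get(s, [])
term = lambda s: s not in tree
ev = lambda s: s                        # leaves evaluate to themselves
print(alphabeta('root', 2, -math.inf, math.inf, True, succ, term, ev))  # 2
```

With perfect move ordering this search examines O(b^(h/2)) nodes instead of minimax's O(b^h), as noted above.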
Similarly, given a node N which can be chosen by Player 2: if there is another node X along any path such that
(a) X can be chosen by Player 1,
(b) X is on a higher level than N, and
(c) X has been shown to guarantee a better score for Player 1 than N,
then the parent of N can be pruned.

(Figure: alpha-beta pruning applied to the four-card game, with alternating Player 1 and Player 2 levels.)

Games with chance
- Many games have an element of chance (e.g. backgammon)
- Guaranteed scores can no longer be calculated
- Solution: calculate expected scores using probabilities

Expectimax search
- Based on the minimax tree
- For each random event, an extra chance node is added for each possible outcome that changes the board states reachable after the event
- Moving score values up through a chance node: E(n) = Σ_i p(i) * s(i), the probability-weighted average of the successor scores

A simple game with chance
- Deal four cards face up
- Player 1 chooses a card
- Player 2 throws a die
  - If it is a six, Player 2 chooses a card, swaps it with Player 1's and keeps Player 1's card
  - If it is not a six, Player 2 just chooses a card
- Player 1 chooses the next card
- Player 2 takes the last card

(Figures: expectimax diagram and calculations for this game.)

Games played by computer
- Games played perfectly
  - Connect Four, noughts & crosses (tic-tac-toe), draughts (checkers)
  - The best move is pre-calculated for each board state (small number of possible board states)
- Games played at superhuman level
  - Backgammon, chess, go
  - Scrabble, Tetris
- Games played badly
  - Bridge, Ulti, soccer :)

Game complexity

Game               State-space complexity   Game-tree complexity   Branching factor
Nine men's morris  ~10^10                   ~10^50                 10
Checkers           ~10^20                   ~10^31                 2.8
Rubik's cube       ~10^19                   -                      12
Chess              ~10^47                   ~10^123                35
Go (9x9)           ~10^38                   -                      -
Go (19x19)         ~10^171                  ~10^360                250
Gomoku (15x15)     ~10^105                  ~10^70                 210

Summary
- Modeling two-player games
- Game theoretic value
- Minimax search
- Cutoff search
- Pruning, alpha-beta search
- Expectimax search
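The expectimax rule E(n) = Σ_i p(i) * s(i) from the games-with-chance section can be sketched as follows. The tree encoding and node kinds are illustrative assumptions; the chance node is modeled on the die throw in the simple card game (one outcome for "six" with probability 1/6, one for everything else with probability 5/6), with invented leaf scores.

```python
def expectimax(node):
    """Back a score up a tree of 'max', 'min', 'chance' and 'leaf' nodes."""
    kind = node['kind']
    if kind == 'leaf':
        return node['value']
    values = [expectimax(c) for c in node['children']]
    if kind == 'max':
        return max(values)
    if kind == 'min':
        return min(values)
    # chance node: probability-weighted average, E(n) = sum_i p(i) * s(i)
    return sum(p * v for p, v in zip(node['probs'], values))

# Player 1 picks between a safe score of 3 and a gamble resolved by a die:
# 'six' (p = 1/6) is worth 0, any other face (p = 5/6) is worth 6.
gamble = {'kind': 'chance', 'probs': [1 / 6, 5 / 6],
          'children': [{'kind': 'leaf', 'value': 0},
                       {'kind': 'leaf', 'value': 6}]}
root = {'kind': 'max', 'children': [{'kind': 'leaf', 'value': 3}, gamble]}
print(expectimax(root))  # ~5: the gamble's expected value beats the safe 3
```

Unlike minimax, the backed-up value at a chance node is not a guaranteed score but an expectation, which is why expectimax values are only meaningful on average over many plays.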
