Adversarial Search Strategies in Games - PDF
Document Details
Uploaded by SuperiorSard5855
Pázmány Péter Katolikus Egyetem
Kristóf Karacs
Tags
Summary
These lecture notes cover adversarial search strategies in games, including concepts such as minimax search, cutoff search, pruning, and expectimax search. The document also includes a categorization of games and an example of a trivial card game.
Full Transcript
Adversarial search - Strategies in games Artificial intelligence Kristóf Karacs PPKE-ITK Recap n What is intelligence? n Agent model n Problem solving by search ¨ Non-informed search strategies ¨ Informed search strategies...
Adversarial search - Strategies in games Artificial intelligence Kristóf Karacs PPKE-ITK Recap n What is intelligence? n Agent model n Problem solving by search ¨ Non-informed search strategies ¨ Informed search strategies ¨ Local and online search 1 Outline n Modeling two player games n Game theoretic value n Minimax search n Cutoff search n Pruning, alpha-beta search n Expectimax search Categorization of games n Number of players (2 or higher) n Competitive or cooperative n Zero sum (game theory) ¨ Total gains = Total losses n Discrete or continuous n Finite or infinite n Deterministic or stochastic n Perfect or partial information 2 Playing the game n Choosing the best move on each turn ¨ Episodic search (no backtracking) n Conventions ¨ Turns alternate ¨ Player 1 moves first Search problem n (S, S0, succ(): S ® P(S), F, V(): F ® ) ¨S a finite set of states (state includes player due to move) ¨ S0 initial state ¨ succ() follower states function ¨F terminal states ¨V value function for terminal states 3 Example: A trivial card game n Deal four playing cards out, face up n Players take cards alternating n The player with the highest even sum scores the amount Entire search space 4 Game theoretic value n Game theoretic value of a state is the value of a terminal state that will be reached if both players play optimally Let D = max depth of game tree For i = D to 1 For each node n at depth i If n is a terminal node, then = ( ) Else If Player 1 is due to move at node n, then = max ∈ Else (Player 2 must be due to move) = m ∈ Minimax search n Generalization of the classic divide and choose method to share gold/cake/etc. n Recursive search method (DFS like) ¨ For own moves choose the state that maximizes the game theoretic value ¨ For the moves of the opponent choose the state that minimizes the game theoretic value 5 Minimax algorithm n At first assign the values associated with terminal states n Then move the values toward the root node using minimax decision n Basically calculates game theoretic value GTV(S) = if (S is terminal) return V(S) else let { S1, S2, … Sk } = succ(S) let Vi = GTV(Si) for each i if (player-to-move(S) == 1) return max(Vi) else return min(Vi) Moving the scores 6 Moving the scores Moving the scores 7 Moving the scores Exercise: Nim n There are some piles of matches n On each turn one may remove any number of matches, but at least one from a single pile n The last person to remove a match loses (misère game) n In II-Nim, one begins with two piles, each with two matches ( ii ii ) 8 II-Nim state space ( _ , _ )-A ( _ , i )-A ( _ , ii )-A ( i , _ )-A ( i , i )-A ( i , ii )-A n Equivalent states due ( ii , _ )-A ( ii , i )-A ( ii , ii )-A to symmetry (e.g. ( _ , _ )-B ( _ , i )-B ( _ , ii )-B (_,ii)-A and (ii,_)-A) ( i , _ )-B ( i , i )-B ( i , ii )-B ( ii , _ )-B ( ii , i )-B ( ii , ii )-B ( _ , _ )-A ( _ , i )-A ( _ , ii )-A n Merge them using a ( i , i )-A ( i , ii )-A canonical description ( ii , ii )-A (e.g. left pile never ( _ , _ )-B ( _ , i )-B ( _ , ii )-B larger than right)! ( i , i )-B ( i , ii )-B ( ii , ii )-B II-Nim formal definition S = ( _ , _ )-A ( _ , i )-A ( _ , ii )-A ( i , i )-A ( i , ii )-A ( ii , ii )-A ( _ , _ )-B ( _ , i )-B ( _ , ii )-B ( i , i )-B ( i , ii )-B ( ii , ii )-B S0 = ( ii , ii )-A succ() = succ(_,i)-A = { (_,_)-B } succ(_,i)-B = { (_,_)-A } succ(_,ii)-A = { (_,_)-B , (_,i)-B } succ(_,ii)-B = { (_,_)-A , (_,i)-A } succ(i,i)-A = { (_,i)-B } succ(i,i)-B = { (_,i)-A } succ(i,ii)-A = { (_,i)-B (_,ii)-B (i,i)-B} succ(i,ii)-B = { (_,i)-A , (_,ii)-A (i,i)-A } succ(ii,ii)-A = { (_,ii)-B , (i,ii)-B } succ(ii,ii)-B = { (_,ii)-A , (i,ii)-A } F = ( _ , _ )-A ( _ , _ )-B V = V( _ , _ )-A = +1 V( _ , _ )-B = -1 9 II-Nim Game Tree (ii ii) A (i ii) B (- ii) B (- ii) A (i i) A (- i) A (- i) A (- -) A +1 (- i) B (- -) B -1 (- i) B (- -) B -1 (- -) B -1 (- -) A +1 (- -) A +1 II-Nim Game Tree (ii ii) A (i ii) B (- ii) B (- ii) A (i i) A (- i) A (- i) A (- -) A +1 (- i) B +1 (- -) B -1 (- i) B (- -) B -1 (- -) B -1 (- -) A +1 (- -) A +1 10 II-Nim Game Tree (ii ii) A (i ii) B (- ii) B (- ii) A (i i) A (- i) A (- i) A (- -) A +1 (- i) B +1 (- -) B -1 (- i) B +1 (- -) B -1 (- -) B -1 (- -) A +1 (- -) A +1 II-Nim Game Tree (ii ii) A (i ii) B (- ii) B (- ii) A +1 (i i) A +1 (- i) A -1 (- i) A -1 (- -) A +1 (- i) B +1 (- -) B -1 (- i) B +1 (- -) B -1 (- -) B -1 (- -) A +1 (- -) A +1 11 II-Nim Game Tree (ii ii) A (i ii) B -1 (- ii) B -1 (- ii) A +1 (i i) A +1 (- i) A -1 (- i) A -1 (- -) A +1 (- i) B +1 (- -) B -1 (- i) B +1 (- -) B -1 (- -) B -1 (- -) A +1 (- -) A +1 II-Nim Game Tree (ii ii) A -1 (i ii) B -1 (- ii) B -1 (- ii) A +1 (i i) A +1 (- i) A -1 (- i) A -1 (- -) A +1 (- i) B +1 (- -) B -1 (- i) B +1 (- -) B -1 (- -) B -1 (- -) A +1 (- -) A +1 12 Real games n Search space is too large n Real-time decision requirement n Chess ¨ Branching factor is ~35 ¨ Allows for a 4-ply look ahead n Capacity: < 2 million states per move (at 10k states/sec for 3 minutes) n 354 = 1 500 625; 355 = 52 521 875 ¨ Average humans can look ahead 6-8 plies n Guaranteed solution not possible n Solution: heuristic evaluation function Cutoff search n Use an evaluation function ¨ Estimatethe guaranteed score ¨ Draw search space to a certain depth ¨ Depth chosen to limit the time taken n Put the estimated values at the end of paths n Propagate them to the top as before 13 Evaluation function n Estimates game theoretic value of a state n Enables comparing different states n Search + evaluation function ¨ Combines many estimates ® good for noise filtering Example: Scores in chess n Assigning weights to pieces ¨ Pawn ®1 ¨ Knight ® 3 ¨ Bishop ® 3 ¨ Rook ®5 ¨ Queen ® 9 n Position also matters in real-life evaluation functions 14 Example: Scores in chess n Black ¨ 5 pawns * 1 = 5 ¨ 1 bishop * 3 = 3 å =18 ¨ 2 rooks * 5 = 10 n White ¨ 5 pawns * 1 = 5 ¨ 1 rook * 5 = 5 å =10 n Net scores ¨ Black: 18-10 = 8 ¨ White: 10-18 = -8 Evaluation function for the example n Odd cards: zero n Even cards: actual value ¨ In this case the evaluation function chooses 10 … which is the worst choice 15 Problems with evaluation functions n Horizon effect (Hans Berliner, 1973) ¨ Good and bad possibilities in a search space deeper than the horizon cannot be taken into account ¨ Possible solution: reduce the number of initial moves to look at, thus pushing the horizon farther ¨ Can be negative and positive n Non-quiescent states à likely to change drastically ¨ Wild swings in the evaluation function ¨ E.g.: captures in chess when using the sample evaluation function ¨ Solution: expand the state until quiescent positions are reached Pruning n Visit as many board states as possible n Skip bad branches (prune them) ¨ Bestvalue is still worse than other branches ¨ Example: having your queen taken in chess n Alpha-beta pruning ¨ Can be used for entire search or cutoff search ¨ Recognize surely inferior branches 16 Idea of Alpha-Beta pruning MAX ³2 n The MIN-value (1) is already smaller than the MAX-value of the parent (2) MIN £2 =2 £1 n The MIN-value can only decrease further n The MAX-value is only allowed to increase n No point in computing MAX further below this 2 5 1 node Terminology Alpha-value n Temporary values at MAX ³2 ¨ MAX-nodes are called Beta-value Alpha-values ¨ MIN-nodes are called Beta-values MIN £2 =2 £1 MAX 2 5 1 17 Principles n If an Alpha-value is greater than or equal to the Beta-value of a descendant node, then no more children of the descendant need to be considered n If a Beta-value is less than or equal to the Alpha-value of a descendant node, then no more children of the descendant need to be considered The general cutoff rule ( )-a In example: let α = max(v1, v3, v1 ? ( )-b v5). If min(v6, v7)≤α, then we can v2 be certain that it is worthless searching the tree from the v3 ( )-a ? ? current node or the sibling on its right. ( )-b ? ? In general: if at a B-move node, let α = max of all A’s choices v4 ( )-a expanded on current path. Let β = min of B’s choices, including those at current node. Cutoff is v5 β ≤ α. ( )-b v6 In general: Converse rule at an ? A-move node. v7 Current Node 18 How much do we gain? n Assuming a uniform branching factor b, minimax examines O(bh) nodes ¨ So does alpha-beta in the worst-case n But: alpha-beta is sensitive to the order of nodes n The gain for alpha-beta is maximum when n the MIN children of a MAX node are ordered in decreasing backed up values n the MAX children of a MIN node are ordered in increasing backed up values n Then alpha-beta examines O(bh/2) nodes [Knuth and Moore, 1975] n But this requires an oracle n If nodes are ordered at random, then the average number of nodes examined by alpha-beta is ~O(b3h/4) Alpha-Beta Pruning for Player 1 n Given a node N which can be chosen by Player 1, then if there is another node, X, along any path, such that (a) X can be chosen by Player 2 (b) X is on a higher level than N and (c) X has been shown to guarantee a worse score for Player 1 than N then the parent of N can be pruned. 19 Alpha-Beta Pruning for Player 1 n Given a node N which can be chosen by Player 2, then if there is another node, X, along any path, such that (a) X can be chosen by Player 1 (b) X is on a higher level than N and (c) X has been shown to guarantee a better score for Player 1 than N then the parent of N can be pruned. Alpha-beta pruning for the four-card game Player 1 Player 2 20 Games with chance n Many games have an element of chance (e.g. backgammon) n Guaranteed scores can no longer be calculated n Solution: calculate expected scores using probability Expectimax Search n Based on minimax tree n For random events an extra node is added for each possible outcome that changes the possible board states after the event n Moving score values up through a chance node ¨ E(n) = å p(n)*s(n) 21 A simple game with chance n Deal four cards face up n Player 1 chooses a card n Player 2 throws a die ¨ If it’s a ‘six’, then player 2 chooses a card, swaps it with player 1’s and keeps player 1’s card ¨ If it’s not a ‘six’, then player 2 just chooses a card n Player 1 chooses next card n Player 2 takes the last card Expectimax Diagram 22 Expectimax Calculations Games Played by Computer n Games played perfectly ¨ Connect four, noughts & crosses (tic-tac-toe), draughts (checkers) ¨ Best move pre-calculated for each board state n Small number of possible board states n Games played at superhuman level ¨ Backgammon, chess, go ¨ Scrabble, Tetris n Games played badly ¨ Bridge, Ulti, soccer :) 23 Game complexity Game State-space Game-tree Branching factor complexity complexity Nine man’s morris ~ 1010 ~ 1050 10 Checkers ~ 1020 ~ 1031 2.8 Rubik’s cube ~ 1019 12 Chess ~ 1047 ~ 10123 35 Go (9x9) ~ 1038 Go (19x19) ~ 10171 ~ 10360 250 Gomoku (15x15) ~ 10105 ~ 1070 210 Summary n Modeling two player games n Game theoretic value n Minimax search n Cutoff search n Pruning, alpha-beta n Expectimax 24