Chapter 5: Adversarial Search and Games
Summary
This document provides an introduction to adversarial search and games within the context of artificial intelligence. It covers topics such as perfect play, minimax, alpha-beta pruning, and heuristic evaluation functions. Examples of games and their respective game trees are also included.
Introduction to Artificial Intelligence
Chapter 5: Adversarial Search and Games

IN WHICH WE EXAMINE THE PROBLEMS THAT ARISE WHEN WE TRY TO PLAN AHEAD IN A WORLD WHERE OTHER AGENTS ARE PLANNING AGAINST US

Outline
♦ Games
♦ Perfect play – minimax decisions – α–β pruning
♦ Imperfect real-time decisions
♦ Stochastic games
♦ Partially observable games

Games vs. search problems

Games are to AI as Grand Prix racing is to automobile design.

A game can be formally defined as a kind of search problem with the following elements:
S0: the initial state, which specifies how the game is set up at the start.
PLAYER(s) or TO-MOVE(s): defines which player has the move in state s.
ACTIONS(s): returns the set of legal moves in state s.
RESULT(s, a): the transition model, which defines the state resulting from taking move a in state s.
TERMINAL-TEST(s): true when the game is over and false otherwise. States where the game has ended are called terminal states.
UTILITY(s, p): a utility function (also called an objective or payoff function) that defines the final numeric value to player p of a game that ends in terminal state s. In chess, the outcome is a win (1), loss (0), or draw (1/2); in backgammon, payoffs range from 0 to 192.

An "unpredictable" opponent means the solution must be a strategy, specifying a move for every possible opponent reply. Time limits mean we are unlikely to reach the goal, so we must approximate.

Two-player zero-sum games

The games most commonly studied within AI (such as chess and Go) are what game theorists call deterministic, two-player, turn-taking, perfect-information, zero-sum games. "Perfect information" is a synonym for "fully observable," and "zero-sum" means that what is good for one player is just as bad for the other: there is no "win-win" outcome.
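The formal elements above can be sketched as a small Python class. This is an illustrative sketch, not from the slides: it uses the simple game of Nim (players alternately take 1–3 stones; whoever takes the last stone wins), and the class and method names mirror the formal definitions.

```python
# Illustrative sketch of the formal game definition, using Nim.
# A state is (stones_left, player_to_move); players are 'MAX' and 'MIN'.

class Nim:
    def initial_state(self):           # S0
        return (7, 'MAX')

    def to_move(self, state):          # PLAYER(s) / TO-MOVE(s)
        return state[1]

    def actions(self, state):          # ACTIONS(s): legal numbers of stones to take
        stones, _ = state
        return [n for n in (1, 2, 3) if n <= stones]

    def result(self, state, action):   # RESULT(s, a): the transition model
        stones, player = state
        return (stones - action, 'MIN' if player == 'MAX' else 'MAX')

    def terminal_test(self, state):    # TERMINAL-TEST(s)
        return state[0] == 0

    def utility(self, state, player):  # UTILITY(s, p): 1 for a win, 0 for a loss
        # The player to move when no stones remain has lost:
        # the opponent took the last stone.
        loser = state[1]
        return 0 if player == loser else 1


game = Nim()
s = game.initial_state()
print(game.actions(s))                 # [1, 2, 3]
print(game.result(s, 2))               # (5, 'MIN')
```

Any game fitting the definition (chess, tic-tac-toe, backgammon) exposes this same interface; only the state representation and the bodies of the methods change.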
Types of games

                        deterministic                       chance
perfect information     chess, checkers, go, othello        backgammon, monopoly
imperfect information   battleships, blind tic-tac-toe      bridge, poker, scrabble, nuclear war

The initial state, ACTIONS function, and RESULT function define the state space graph: a graph in which the vertices are states and the edges are moves, and a state may be reached by multiple paths. We can superimpose a search tree over part of that graph to determine what move to make. We define the complete game tree as a search tree that follows every sequence of moves all the way to a terminal state. The game tree may be infinite if the state space itself is unbounded or if the rules of the game allow infinitely repeating positions.

Game tree (2-player, deterministic, turns)

[Figure: part of the game tree for tic-tac-toe (noughts and crosses).]

Optimal Decisions in Games: Minimax Search

Perfect play for deterministic, perfect-information games.
Idea: choose the move to the position with the highest minimax value = the best achievable payoff against best play (assuming that both players play optimally).
E.g., a 2-ply game: the word ply is used to unambiguously mean one move by one player, bringing us one level deeper in the game tree. The utilities of the terminal states in this game range from 2 to 14.
Minimax Search algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  return the a in ACTIONS(state) maximizing MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← ∞
  for a, s in SUCCESSORS(state) do v ← MIN(v, MAX-VALUE(s))
  return v

Now that we can compute MINIMAX(s), we can turn it into a search algorithm that finds the best move for MAX by trying all actions and choosing the one whose resulting state has the highest MINIMAX value. The algorithm is recursive: it proceeds all the way down to the leaves of the tree and then backs the minimax values up through the tree as the recursion unwinds. The minimax algorithm performs a complete depth-first exploration of the game tree.

Properties of minimax

Complete?? Yes, if the tree is finite (chess has specific rules to ensure this)
Optimal?? Yes, against an optimal opponent. Otherwise??
Time complexity?? O(b^m)
Space complexity?? O(bm) (depth-first exploration)

For chess, b ≈ 35 and m ≈ 100 for "reasonable" games (average m ≈ 80) ⇒ an exact solution is completely infeasible.

But do we need to explore every path?
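The depth-first minimax recursion can be sketched compactly in Python. This is a hedged sketch, not the slides' pseudocode verbatim: the game tree is hard-coded as nested lists, where a leaf is a number (its utility) and an internal node is a list of children, with MAX and MIN alternating by depth.

```python
# Minimal runnable minimax on a hard-coded tree.
# Leaf = numeric utility; internal node = list of children.

def minimax(node, maximizing=True):
    """Minimax value of a node via complete depth-first exploration."""
    if isinstance(node, (int, float)):        # terminal state: return UTILITY
        return node
    # Back up values as the recursion unwinds.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The 2-ply example: MAX chooses among three MIN nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))   # 3: best achievable payoff against best play
```

The MIN nodes back up min(3, 12, 8) = 3, min(2, 4, 6) = 2, and min(14, 5, 2) = 2, and MAX then picks the maximum, 3, matching the chapter's example.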
Optimal decisions in multiplayer games

How do we extend the minimax idea to multiplayer games? Replace the single value at each node with a vector of values. For example, in a three-player game with players A, B, and C, a vector ⟨vA, vB, vC⟩ is associated with each node, and the UTILITY function returns a vector of utilities. In general, the backed-up value of a node n is the utility vector of the successor state with the highest value for the player choosing at n.

Alliances: players must balance the immediate advantage of breaking an alliance against the long-term disadvantage of being perceived as untrustworthy. Alliances are made and broken as the game proceeds.

Alpha–Beta Pruning

The number of game states is exponential in the depth of the tree. No algorithm can completely eliminate the exponent, but we can sometimes effectively cut it in half, computing the correct minimax decision without examining every state by pruning large parts of the tree that make no difference to the outcome.
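The vector-valued backup rule can be sketched as follows. This is an illustrative sketch with a made-up three-player tree, not an example from the slides: a leaf holds a utility vector (uA, uB, uC), an internal node is a pair (player_index, children), and the mover picks the child whose backed-up vector is best for them.

```python
# Hedged sketch of vector-valued minimax for a three-player game.
# Leaf = tuple of numeric utilities; internal node = (player_index, children).

def multi_minimax(node):
    if isinstance(node, tuple) and all(isinstance(x, (int, float)) for x in node):
        return node                    # terminal: return the utility vector
    player, children = node
    # Back up the successor vector with the highest value for the mover.
    return max((multi_minimax(c) for c in children), key=lambda v: v[player])

# C (index 2) moves at the lower nodes; A (index 0) moves at the root.
n1 = (2, [(1, 2, 6), (4, 2, 3)])   # C prefers (1, 2, 6): 6 > 3 for C
n2 = (2, [(6, 1, 2), (7, 4, 1)])   # C prefers (6, 1, 2): 2 > 1 for C
root = (0, [n1, n2])
print(multi_minimax(root))          # (6, 1, 2): A compares 1 vs. 6
```

Note that two-player zero-sum minimax is the special case where the vector is ⟨v, −v⟩, so maximizing one component is the same as minimizing the other.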
α–β pruning example

[Figure: five stages of α–β pruning on the two-ply tree with MIN-node leaves 3, 12, 8; 2, x, y; and 14, 5, 2. After the leaf 2 is seen at the second MIN node, its remaining leaves x and y are pruned.]

MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
              = max(3, min(2, x, y), 2)
              = max(3, z, 2)    where z = min(2, x, y) ≤ 2
              = 3

In other words, the value of the root, and hence the minimax decision, is independent of the values of the pruned leaves x and y.

Why is it called α–β?
α is the best value (to MAX) found so far off the current path. If the value V of the current node turns out to be worse than α, MAX will avoid that node ⇒ prune that branch. Define β similarly for MIN.

The α–β algorithm

function ALPHA-BETA-DECISION(state) returns an action
  return the a in ACTIONS(state) maximizing MIN-VALUE(RESULT(state, a), −∞, +∞)

function MAX-VALUE(state, α, β) returns a utility value
  inputs: state, current state in game
          α, the value of the best alternative for MAX along the path to state
          β, the value of the best alternative for MIN along the path to state
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

function MIN-VALUE(state, α, β) returns a utility value
  same as MAX-VALUE but with the roles of α and β reversed

Properties of α–β

Pruning does not affect the final result.
Good move ordering improves the effectiveness of pruning.
With "perfect ordering," the time complexity is O(b^(m/2)) ⇒ α–β doubles the solvable depth.
α–β is a simple example of the value of reasoning about which computations are relevant (a form of meta-reasoning).
A typical ordering function: try captures first, then threats, then forward moves, and then backward moves.
Unfortunately, 35^50 is still impossible!
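The α–β recursion can be made concrete on the same hard-coded tree used earlier (leaves are numbers, internal nodes are lists). In this sketch, a `visited` list records every leaf actually evaluated, so the pruning is visible; x and y are stand-in leaf values, chosen arbitrarily since they are never examined.

```python
# Runnable sketch of alpha-beta search with a leaf-visit counter.

def alphabeta(node, alpha=float('-inf'), beta=float('inf'),
              maximizing=True, visited=None):
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)       # record each leaf actually evaluated
        return node
    if maximizing:
        v = float('-inf')
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:              # MIN would never let play reach here
                return v
            alpha = max(alpha, v)
    else:
        v = float('inf')
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True, visited))
            if v <= alpha:             # MAX would never let play reach here
                return v
            beta = min(beta, v)
    return v

x, y = 4, 6                                 # arbitrary: these leaves get pruned
tree = [[3, 12, 8], [2, x, y], [14, 5, 2]]
seen = []
print(alphabeta(tree, visited=seen))        # 3, same as full minimax
print(seen)                                 # [3, 12, 8, 2, 14, 5, 2]: x, y skipped
```

After the first MIN node backs up 3, α = 3 at the root; the second MIN node returns as soon as it sees the leaf 2 ≤ α, so x and y are never evaluated, exactly as in the worked example above.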
Heuristic evaluation functions

[Figure: two chess positions from the same equivalence classes of states (two-pawn/one-pawn material). In one, White is slightly better; in the other, Black is winning.]

Eval(s) = expected value of the position. A common choice is a linear weighted sum of features:

Eval(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s)

e.g., w1 = 9 with f1(s) = (number of white queens) − (number of black queens), etc. This assumes that the contribution of each feature is independent of the values of the other features.

Cutting off search

Standard approach:
Use CUTOFF-TEST instead of TERMINAL-TEST, e.g., a depth limit (perhaps adding quiescence search: search "interesting" positions, such as those with capture moves available, to a greater depth than "quiet" ones).
Use EVAL instead of UTILITY, i.e., an evaluation function that estimates the desirability of a position.

H-MINIMAX(s, d) =
  EVAL(s)                                            if CUTOFF-TEST(s, d)
  max a∈ACTIONS(s) H-MINIMAX(RESULT(s, a), d + 1)    if PLAYER(s) = MAX
  min a∈ACTIONS(s) H-MINIMAX(RESULT(s, a), d + 1)    if PLAYER(s) = MIN

Minimax may make an error

The minimax heuristic selects the optimal move only provided that the leaf-node evaluations are exactly correct. There is also wasted effort: in a position with just one legal move, alpha-beta search still generates and evaluates a large tree.
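The H-MINIMAX definition above can be sketched directly. This is a hedged sketch on the same nested-list tree as before; the evaluation function is purely illustrative (it scores an unexpanded node as the average of its descendants' utilities), standing in for a real feature-based Eval.

```python
# Depth-limited minimax (H-MINIMAX) with a cutoff test and an eval function.

def h_minimax(node, depth, limit, maximizing, evaluate):
    """Return EVAL at the cutoff; otherwise back up values as in minimax."""
    if isinstance(node, (int, float)) or depth == limit:   # CUTOFF-TEST(s, d)
        return evaluate(node)                              # EVAL instead of UTILITY
    values = [h_minimax(c, depth + 1, limit, not maximizing, evaluate)
              for c in node]
    return max(values) if maximizing else min(values)

def evaluate(node):
    """Illustrative eval: a leaf is its utility; an internal node is estimated
    as the average of its children's evaluations."""
    if isinstance(node, (int, float)):
        return node
    return sum(evaluate(c) for c in node) / len(node)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(h_minimax(tree, 0, 2, True, evaluate))   # full depth: exact value 3
print(h_minimax(tree, 0, 1, True, evaluate))   # cutoff at depth 1: an estimate
```

With the cutoff at depth 1, the three MIN nodes are scored by their averages (23/3, 4, 7) rather than their true minimax values (3, 2, 2), illustrating how an inexact Eval can change which move looks best.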
Nondeterministic games: backgammon

[Figure: a backgammon board, with the points numbered 1–24.]

Schematic game tree

[Figure: a schematic game tree for backgammon, with chance nodes for the dice rolls interleaved between the MAX and MIN move nodes.]

Algorithm for nondeterministic games

EXPECTIMINIMAX, which computes the expected value of a position, gives perfect play. It is just like MINIMAX, except that we must also handle chance nodes:

EXPECTIMINIMAX(s) =
  UTILITY(s)                                 if TERMINAL-TEST(s)
  max a EXPECTIMINIMAX(RESULT(s, a))         if PLAYER(s) = MAX
  min a EXPECTIMINIMAX(RESULT(s, a))         if PLAYER(s) = MIN
  Σ r P(r) EXPECTIMINIMAX(RESULT(s, r))      if PLAYER(s) = CHANCE

where r is a possible dice roll (chance event).

Eval for games of chance: use Monte Carlo simulation to evaluate a position, playing thousands of games with random dice. The resulting win percentage is a good approximation of the value of the position.
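The four-case EXPECTIMINIMAX recursion can be sketched on a hard-coded tree. This is an illustrative sketch, not from the slides: a leaf is a numeric utility, ('max', children) and ('min', children) are player nodes, and a chance node is ('chance', [(probability, child), ...]).

```python
# Hedged sketch of EXPECTIMINIMAX with explicit chance nodes.

def expectiminimax(node):
    if isinstance(node, (int, float)):             # TERMINAL: return UTILITY
        return node
    kind, children = node
    if kind == 'max':                              # PLAYER(s) = MAX
        return max(expectiminimax(c) for c in children)
    if kind == 'min':                              # PLAYER(s) = MIN
        return min(expectiminimax(c) for c in children)
    # PLAYER(s) = CHANCE: probability-weighted sum over outcomes r
    return sum(p * expectiminimax(c) for p, c in children)

# MAX chooses between a sure payoff of 3 and a fair coin flip over {0, 10}.
tree = ('max', [3, ('chance', [(0.5, 0), (0.5, 10)])])
print(expectiminimax(tree))   # 5.0: the gamble's expected value beats 3
```

The chance node backs up 0.5·0 + 0.5·10 = 5.0, so under expected-value play MAX prefers the gamble to the sure 3, which is exactly the Σ P(r)·EXPECTIMINIMAX case at work.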