Introduction to Artificial Intelligence

Document Details


Uploaded by DependableNephrite3169

University of Liège

2024

Pouria Katouzian

Tags

artificial intelligence introduction to AI AI concepts computer science

Summary

These notes provide an introduction to artificial intelligence, covering topics such as intelligent agents, problem-solving through search, adversarial search in games, probabilistic reasoning, and machine learning. The content includes definitions, examples, and multiple-choice questions, making it suitable for a university-level course.

Full Transcript


Introduction to Artificial Intelligence Pouria Katouzian October 2024 1 Contents 1 Intelligent Agents 7 1.1 Définition............................... 7 1.1.1 Sensors............................ 7 1.1.2 Actuators........................... 7 1.1.3 Policy:............................. 7 1.2 Agents and Environments...................... 7 1.3 Environment:............................. 7 1.4 Percepts and Actions......................... 7 1.5 Agent................................. 7 1.5.1 Behavior:........................... 7 1.5.2 Policy:............................. 8 1.5.3 Pacman World Example:.................. 8 1.5.4 Percept-to-Action Mapping:................. 8 1.5.5 Table Representation:.................... 8 1.6 Agent Architectures......................... 8 1.6.1 Reflex Agents......................... 8 1.6.2 Simple Reflex Agents..................... 8 1.6.3 Model-Based Reflex Agents................. 8 1.6.4 Utility-Based Agents..................... 9 1.7 Rationality and Performance.................... 9 1.7.1 Rationality.......................... 9 1.7.2 Performance Measure.................... 9 2 Solving Problems by Searching 10 2.1 Planning Agents........................... 10 2.1.1 Characteristics........................ 10 2.2 Search Problem............................ 10 2.2.1 Initial State:......................... 10 2.2.2 Actions:............................ 10 2.2.3 Goal State:.......................... 10 2.2.4 Path Cost:.......................... 10 2.3 Uninformed Search Methods..................... 11 2.3.1 Depth-First Search (DFS).................. 11 2.3.2 Breadth-First Search (BFS)................. 11 2.3.3 Uniform-Cost Search..................... 11 2.4 Informed Search Methods...................... 11 2.4.1 Heuristics........................... 12 2.5 Reflex Agents............................. 12 2.5.1 Characteristics........................ 12 2.5.2 Limitations.......................... 12 2.5.3 Example............................ 12 2.6 Problem-Solving Agents....................... 12 2.6.1 Assumptions......................... 12 2.6.2 Working............................ 12 2 2.6.3 Examples........................... 13 2.7 Q/A.................................. 13 2.8 Multiple Choice Questions...................... 14 3 Games and Adversarial Search 19 3.1 Introduction to Adversarial Search................. 19 3.2 Games as Multi-Agent Environments................ 19 3.3 Types of Games............................ 19 3.4 Formal Definition of Games..................... 20 3.5 Zero-Sum Games........................... 20 3.6 Minimax Algorithm......................... 20 3.7 Properties of Minimax........................ 20 3.8 Efficiency of Minimax........................ 21 3.9 Alpha-Beta Pruning......................... 21 3.10 Properties of Alpha-Beta Pruning.................. 21 3.11 Game Tree Size and Complexity.................. 21 3.12 Transposition Tables......................... 21 3.13 13. Imperfect Real-Time Decisions................. 22 3.14 Evaluation Functions......................... 22 3.15 Quiescence and Horizon Effect................... 22 3.16 Multi-Agent Games......................... 22 3.17 Stochastic Games........................... 22 3.18 18. Expectiminimax Algorithm................... 23 3.19 Monte Carlo Tree Search (MCTS)................. 23 3.20 Exploration vs. Exploitation in MCTS............... 23 3.21 State-of-the-Art Game Programs (Checkers, Chess, Go)..... 23 3.22 AlphaGo and AlphaGo Zero..................... 
23 3.23 Summary of Adversarial Search in AI............... 24 3.24 Multiple Choice Questions: Games and Adversarial Search.... 24 4 Solving Problems by Searching 33 4.1 Random Variables.......................... 33 4.1.1 Definition........................... 33 4.1.2 Types............................. 33 4.1.3 Example............................ 33 4.2 Probability Distributions....................... 33 4.2.1 Definition........................... 33 4.2.2 Example............................ 33 4.3 Inference................................ 33 4.3.1 Definition........................... 33 4.3.2 Example............................ 33 4.4 Independence............................. 34 4.4.1 Definition........................... 34 4.4.2 Example............................ 34 4.5 The Bayes’ Rule........................... 34 4.5.1 Definition........................... 34 3 4.5.2 Example............................ 34 4.6 Quantifying Uncertainty....................... 34 4.6.1 Definition........................... 34 4.6.2 Example............................ 34 4.7 Principle of Maximum Expected Utility.............. 34 4.7.1 Example............................ 35 4.8 Probabilistic Assertions....................... 35 4.8.1 Example............................ 35 4.9 Frequentism vs. Bayesianism.................... 35 4.9.1 Frequentism.......................... 35 4.9.2 Bayesianism.......................... 35 4.10 Kolmogorov’s Axioms........................ 35 4.10.1 Example............................ 35 4.11 Marginal Distributions........................ 36 4.11.1 Example............................ 36 4.12 Conditional Distributions...................... 36 4.12.1 Definition........................... 36 4.12.2 Example............................ 36 4.13 Probabilistic Inference........................ 36 4.13.1 Definition........................... 36 4.13.2 Example............................ 36 4.14 Inference by Enumeration...................... 36 4.14.1 Definition........................... 36 4.14.2 Example............................ 36 4.15 Product Rule............................. 37 4.15.1 Definition........................... 37 4.15.2 Example............................ 37 4.16 Chain Rule.............................. 37 4.16.1 Definition........................... 37 4.16.2 Example............................ 37 4.17 Independence and Conditional Independence........... 37 4.17.1 Definition........................... 37 4.17.2 Example............................ 37 4.18 Naive Bayes Model.......................... 37 4.18.1 Definition........................... 37 4.18.2 Example............................ 37 4.19 Bayesian Inference.......................... 38 4.19.1 Definition........................... 38 4.19.2 Example............................ 38 4.20 Examples of Probabilistic Inference................. 38 4.20.1 Ghostbusters Problem.................... 38 4.20.2 Diagnostic Probability.................... 38 4.21 AI Applications in Science...................... 38 4.21.1 Definition........................... 38 4.21.2 Example............................ 38 4.22 Multiple-Choice Questions...................... 39 4 5 Probabilistic Reasoning - Detailed Explanation and Examples 48 5.1 Bayesian Networks.......................... 48 5.2 Semantics of Bayesian Networks.................. 48 5.3 Construction of Bayesian Networks................. 48 5.4 Independence Relations in Bayesian Networks........... 49 5.5 Inference in Bayesian Networks................... 
49 5.6 Parameter Learning......................... 49 5.7 Representing Uncertain Knowledge................. 50 5.8 Example Bayesian Networks..................... 50 5.9 Inference by Enumeration...................... 50 5.10 Variable Elimination......................... 50 5.11 d-Separation............................. 51 5.12 Approximate Inference........................ 51 5.13 Parameter Learning with Maximum Likelihood Estimation (MLE) 51 5.14 Bayesian Parameter Learning.................... 52 5.15 Maximum a Posteriori (MAP) Estimation............. 52 5.16 Multiple Choice Questions on Probabilistic Reasoning...... 53 6 Reasoning Over Time in Artificial Intelligence 62 6.1 Markov Models............................ 62 6.2 Markov Processes........................... 62 6.3 First-order Markov Processes.................... 62 6.4 Sensor Markov Assumption..................... 62 6.5 Stationarity Assumption....................... 62 6.6 Inference Tasks............................ 62 6.7 Prediction............................... 62 6.8 Filtering................................ 63 6.9 Smoothing............................... 63 6.10 Most Likely Explanation....................... 63 6.11 Bayes Filter.............................. 63 6.12 Forward-Backward Algorithm.................... 63 6.13 Hidden Markov Models (HMMs).................. 64 6.14 Kalman Filter............................. 64 6.15 Particle Filter............................. 64 6.16 Dynamic Bayesian Networks (DBNs)................ 65 6.17 Applications.............................. 65 6.18 Summary............................... 65 6.19 Multiple Choice Questions...................... 65 7 Machine Learning and Neural Networks 70 7.1 Detailed Explanation of Topics................... 70 7.2 Introduction to Machine Learning and Neural Networks..... 70 7.3 Learning Agents........................... 70 7.4 Machine Learning........................... 70 7.5 Linear Regression........................... 71 7.6 Logistic Regression.......................... 71 5 7.7 Gradient Descent........................... 71 7.8 Deep Learning............................ 71 7.9 Single-Input and Single-Output Networks............. 71 7.10 Multivariate Outputs and Inputs.................. 72 7.11 Deep Networks............................ 72 7.12 Convolutional Neural Networks (CNNs).............. 72 7.13 Recurrent Neural Networks (RNNs)................ 72 7.14 Transformers............................. 72 7.15 Scaling Laws............................. 73 7.16 Applications of AI.......................... 73 7.17 Summary............................... 73 7.18 Multiple Choice Questions: Machine Learning and Neural Networks 74 8 Introduction to Reinforcement Learning (RL) 79 8.1 Markov Decision Processes (MDPs)................. 79 8.2 Algorithms for Solving MDPs.................... 80 8.3 Reinforcement Learning Paradigms................. 80 8.4 Model-Free Reinforcement Learning................ 81 8.5 Q-Learning.............................. 81 8.6 Generalizing Across States...................... 81 8.7 Exploration Strategies........................ 81 8.8 Applications of Reinforcement Learning.............. 82 8.9 Multiple-Choice Questions on Reinforcement Learning...... 83 9 Detailed Explanation of Reinforcement Learning Topics 88 9.1 Introduction to Reinforcement Learning (RL)........... 88 9.2 Markov Decision Processes (MDPs)................. 89 9.3 Algorithms for Solving MDPs.................... 89 9.4 Value Iteration............................ 
89 9.5 Policy Iteration............ 90 9.6 Key Ideas in Reinforcement Learning............ 90 9.7 Types of Reinforcement Learning............ 90 9.8 Passive RL............ 90 9.9 Active RL............ 90 9.10 Model-Based vs. Model-Free Estimation............ 91 9.11 Model-Based Estimation............ 91 9.12 Model-Free Estimation............ 91 9.13 Temporal-Difference (TD) Learning............ 91 9.14 Q-Learning............ 91 9.15 Generalizing Across States............ 91 9.16 Deep Q-Learning (DQN)............ 92 9.17 Applications of Reinforcement Learning............ 92 9.18 Exploration Strategies............ 92 9.19 Summary of RL Goals............ 92 9.20 Multiple-Choice Questions on Reinforcement Learning............ 93

1 Intelligent Agents

1.1 Definition
An intelligent agent is an autonomous entity that perceives its environment through sensors and acts upon it using actuators. The actions of an agent are guided by a pre-defined policy or decision-making logic. Agent's Components: An intelligent agent typically consists of:

1.1.1 Sensors
These receive information from the environment.

1.1.2 Actuators
These enable the agent to perform actions in the environment.

1.1.3 Policy:
A function or rule that maps percepts (sensory inputs) to actions.

1.2 Agents and Environments

1.3 Environment:
The environment in which an agent operates can vary from very simple, such as a grid world, to highly complex, like real-world robotics.

1.4 Percepts and Actions
The agent observes the environment through percepts, which could be anything from sensory data (like camera inputs for robots) to simpler inputs in a simulated world. Based on these percepts, the agent takes actions to influence the environment. Examples: An example often discussed in this section is a Pacman-like game, where the agent has percepts like its current location and the state of food in the environment.

1.5 Agent

1.5.1 Behavior:
The agent's behavior is described by its policy, a function that maps sequences of percepts to actions.

1.5.2 Policy:
This policy is central to how the agent acts. For example, if the agent perceives certain information (like the location of food in Pacman), the policy will determine whether it moves left, right, or stays still.

1.5.3 Pacman World Example:
A simplified environment like a two-cell world with a Pacman agent is used as an example. The percepts might be the position and state of the food, and the actions might include moving left, right, or eating.

1.5.4 Percept-to-Action Mapping:
In this example, the agent's policy is illustrated with a Pacman world where Pacman perceives its environment (like whether there is food) and takes actions accordingly (such as moving left or right, or eating).

1.5.5 Table Representation:
This section describes how the policy can be represented as a table, where each possible sequence of percepts leads to a defined action. Example table:
Percept: (left cell, no food) -> Action: go right
Percept: (left cell, food) -> Action: eat

1.6 Agent Architectures

1.6.1 Reflex Agents
These agents select actions based solely on the current percept. Reflex agents act without considering the history of percepts or future consequences.
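To make the table representation concrete, here is a minimal Python sketch of a table-driven reflex agent for the two-cell Pacman world. The percept encoding and the names POLICY and reflex_agent are illustrative assumptions, not code from the course.

# Sketch of a table-driven reflex agent for the two-cell Pacman world.
# Each percept is a pair (cell, food_in_cell); the policy is a lookup table.
POLICY = {
    ("left", False): "go right",
    ("left", True): "eat",
    ("right", False): "go left",
    ("right", True): "eat",
}

def reflex_agent(percept):
    # A reflex agent maps the current percept directly to an action,
    # with no percept history and no lookahead.
    return POLICY[percept]

print(reflex_agent(("left", True)))    # -> eat
print(reflex_agent(("right", False)))  # -> go left

The table needs one entry per distinct percept, which is manageable only in very small worlds such as this two-cell example.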
1.6.2 Simple Reflex Agents
These act based on condition-action rules, like "if the agent sees an obstacle, turn left."

1.6.3 Model-Based Reflex Agents
These agents maintain an internal model of the world, allowing them to take more informed actions.

1.6.4 Utility-Based Agents
These agents consider future states and assign utility values to them. Their goal is to maximize the expected utility of their actions, making them more sophisticated decision-makers.

1.7 Rationality and Performance

1.7.1 Rationality
An agent is said to be rational if it chooses actions that maximize its expected performance measure, given the information it has. Rationality doesn't imply perfection but rather that the agent makes the best possible decision based on its knowledge and objectives.

1.7.2 Performance Measure
This refers to how well an agent's actions lead to desired outcomes. For example, in the Pacman world, the agent might be evaluated based on how much food it eats while avoiding obstacles. The performance measure defines what "success" looks like for the agent in its environment.

2 Solving Problems by Searching

2.1 Planning Agents
Planning agents are a type of intelligent agent that decide their actions by considering the consequences and future states. Unlike reflex agents, which respond to the current percept, planning agents create a sequence of actions (a plan) to achieve a specific goal.

2.1.1 Characteristics
Planning agents use models of the environment to forecast future states. They can anticipate the results of different actions and choose the best sequence of steps. For example, a chess-playing agent considers many possible moves in advance to decide the best strategy. In a maze-solving scenario, a planning agent would plan its entire path to the goal before moving, as opposed to reacting at each step.

2.2 Search Problem
A search problem is a formal representation of a goal-seeking problem that involves finding a sequence of actions from a given start state to a goal state. A search problem typically has the following components:
2.2.1 Initial State: The starting point for the agent.
2.2.2 Actions: Possible actions the agent can take.
2.2.3 Goal State: The condition the agent aims to achieve.
2.2.4 Path Cost: The cost associated with taking a particular sequence of actions.
Examples: Navigating a maze, finding the shortest route between cities, or solving a puzzle like the 8-puzzle.

2.3 Uninformed Search Methods
Uninformed search strategies use only the information available in the problem definition and do not rely on additional knowledge about the goal's location.

2.3.1 Depth-First Search (DFS)
Explores as far as possible along a branch before backtracking. Utilizes a LIFO (Last In, First Out) stack to keep track of the nodes. Not guaranteed to find the shortest path, but memory efficient. Advantage: Memory-efficient. Disadvantage: Can get stuck in loops or take a long path if the goal is deep in the tree.

2.3.2 Breadth-First Search (BFS)
Explores all the nodes at the present depth before moving on to nodes at the next depth level. Utilizes a FIFO (First In, First Out) queue. Guaranteed to find the shortest path in unweighted graphs but can consume a lot of memory. Advantage: Always finds the shortest path in terms of number of steps (if the path cost is uniform). Disadvantage: High memory usage due to the need to keep track of all nodes at a given level.

2.3.3 Uniform-Cost Search
Expands the node with the lowest path cost.
It is optimal and complete, provided that the cost of each action is non-negative. Used for cases where the cost of each step varies. Advantage: Guarantees finding the least-cost solution. Disadvantage: Can be slower if many paths have similar costs.

2.4 Informed Search Methods
Informed search methods use heuristic functions to guide the search process, aiming to reach the goal faster by making more informed decisions. A* search is a best-first search algorithm that uses the evaluation function f(n) = g(n) + h(n), where g(n) is the cost from the start to the current node and h(n) is the heuristic estimate of the cost from the current node to the goal.

2.4.1 Heuristics
A heuristic is an estimate of the cost to reach the goal from a particular state. Heuristic functions help reduce the search space, allowing the agent to focus on the most promising paths. Example: In a navigation problem, the straight-line distance between two points is a common heuristic. Advantage: A* finds the optimal solution if the heuristic is admissible (never overestimates the true cost). Disadvantage: Can consume a lot of memory in complex problems.

2.5 Reflex Agents
Reflex agents act based on the current percept without considering the history of percepts or future consequences (they ignore the rest of the percept history).

2.5.1 Characteristics
These agents follow condition-action rules, such as "if there is an obstacle in front, turn left."

2.5.2 Limitations
Reflex agents are limited in complex environments because they lack planning and do not consider future goals or changes in the environment (they can get stuck or repeat actions in complex scenarios).

2.5.3 Example
A vacuum-cleaner agent that moves to the next dirty spot when it senses dirt nearby.

2.6 Problem-Solving Agents
Problem-solving agents use search algorithms to plan sequences of actions to achieve a goal.

2.6.1 Assumptions
The environment is often considered single-agent (only one active agent), observable (the agent has complete information about the environment), deterministic (actions have predictable effects), and known (all relevant information is available to the agent).

2.6.2 Working
Problem-solving agents build a search tree, where each node represents a state and the edges between nodes represent actions. The agent's task is to search this tree to find a path from the initial state to the goal state.

2.6.3 Examples
A navigation agent in a maze searches for a path from the start to the goal by considering all possible moves and their consequences. A chess-playing agent considers all possible moves to find the optimal strategy to win the game.

2.7 Q/A
1) A reflex agent considers future consequences of its actions before making a decision. False
2) Breadth-first search guarantees finding the shortest path in an unweighted graph. True
3) Depth-first search is always guaranteed to find the shortest path in a search problem. False
4) A* search is both optimal and complete as long as the heuristic used is admissible. True
5) A planning agent takes actions without considering future consequences. False
6) Problem-solving agents assume that the environment is single-agent, deterministic, and known. True
7) Uniform-cost search is a type of uninformed search that expands the node with the lowest cost.
True 8) Heuristics in informed search are always accurate estimates of the actual cost to reach the goal. False 9) Reflex agents use condition-action rules to make decisions based on the cur- rent percept. True 10) Informed search methods use heuristics to improve the efficiency of the search process. True 13 2.8 Multiple Choice Questions 1. Which of the following are characteristics of planning agents? (a) They act without considering future consequences. (b) They generate sequences of actions to achieve a goal. * (c) They always use condition-action rules. (d) They use a model of the environment to predict future states. * 2. Which search methods are classified as uninformed search methods? (a) A* search (b) Depth-first search * (c) Breadth-first search * (d) Uniform-cost search * 3. A* search uses which of the following to determine the next node to ex- plore? (a) Depth of the node (b) Total path cost (g(n)) * (c) Heuristic estimate to the goal (h(n)) * (d) Random selection of nodes 4. Which of the following are true for reflex agents? (a) They rely on condition-action rules to make decisions. * (b) They can plan sequences of actions to reach a goal. (c) They act based on the current percept. * (d) They always consider the future consequences of their actions. 5. What are some characteristics of problem-solving agents? (a) They consider the consequences of their actions. * (b) They assume the environment is known and deterministic. * (c) They always use a random search strategy. (d) They take decisions based on the hypothesized future out- comes of actions. * 14 6. Which of the following is NOT a characteristic of a problem-solving agent? (a) Considers future actions (b) Acts only on the current percept * (c) Uses a model of the world (d) Operates in a known environment 7. What are the key components of a search problem? (a) Initial state * (b) Actions * (c) Goal state * (d) Feedback from the environment 8. Depth-first search: (a) Always finds the shortest path (b) Explores the deepest nodes first * (c) Is guaranteed to find a solution if one exists * (d) Uses a queue to track nodes 9. What is the primary difference between uninformed and informed search methods? (a) Uninformed search has no knowledge of the goal location (b) Informed search methods use heuristics * (c) Uninformed search methods are faster (d) Informed search methods expand nodes randomly 10. Heuristics are used in informed search methods to: (a) Estimate the cost from a node to the goal * (b) Reduce the size of the search space * (c) Guarantee finding the shortest path (d) Select actions randomly 11. In the A* algorithm, the cost function f (n) is a combination of: (a) The depth of the node and the branching factor (b) The path cost to reach the node and the estimated cost to reach the goal * (c) The number of explored nodes and the current node’s depth (d) Random selection and exploration 15 12. Breadth-first search is guaranteed to find the shortest path in which type of graph? (a) Weighted graphs (b) Unweighted graphs * (c) Directed graphs (d) Cyclic graphs 13. Which of the following are types of uninformed search methods? (a) Uniform-cost search * (b) A* search (c) Depth-first search * (d) Breadth-first search * 14. Planning agents differ from reflex agents because: (a) They consider only the current percept (b) They plan sequences of actions to reach a goal * (c) They act based on condition-action rules (d) They use no information about the future state 15. 
In a uniform-cost search, nodes are expanded based on: (a) The number of nodes in the tree (b) The lowest path cost * (c) The depth of the node (d) The estimated distance to the goal 16. Reflex agents: (a) Always consider the future consequences of their actions (b) Act based on condition-action rules * (c) Use heuristics to find the best action (d) Can solve complex planning problems 17. Which of the following are examples of informed search? (a) Breadth-first search (b) Depth-first search (c) A* search * (d) Greedy best-first search * 16 18. The goal of a rational agent is to: (a) Maximize its performance based on a given performance measure * (b) Randomly explore the environment (c) Always minimize the number of actions (d) Always explore the entire search space 19. In problem-solving agents, what assumptions are typically made about the environment? (a) It is observable and deterministic * (b) It is multi-agent and stochastic (c) It is unknown and unpredictable (d) It is single-agent and fully known * 20. Which of the following are examples of uninformed search? (a) Uniform-cost search * (b) A* search (c) Depth-first search * (d) Iterative deepening search * 21. A rational agent acts to: (a) Minimize the number of actions (b) Maximize the expected outcome based on its performance measure * (c) Randomly select actions to explore new possibilities (d) Always achieve the highest possible reward in every situation 22. A* search is considered optimal when: (a) The path cost function g(n) is greater than the heuristic h(n) (b) The heuristic h(n) is admissible (it never overestimates the true cost) * (c) It expands nodes in order of their depth (d) It selects the node with the lowest heuristic value at every step 17 23. Problem-solving agents are different from reflex agents because: (a) They do not consider future consequences (b) They use a search strategy to plan sequences of actions * (c) They cannot act in a known environment (d) They do not use percepts to make decisions 24. A reflex agent can be described as: (a) Always planning the next sequence of actions (b) Acting based on current percepts only * (c) Using heuristics to make decisions (d) Always considering future possibilities 25. In a search problem, the goal state is defined as: (a) The desired outcome the agent is trying to reach * (b) The state that minimizes the total path cost (c) The first state explored by the search algorithm (d) The state with the fewest actions 18 3 Games and Adversarial Search 3.1 Introduction to Adversarial Search Definition: Adversarial search involves decision-making in environments where an agent competes against one or more opponents. This search is crucial in game-playing scenarios where agents aim to maximize their utility while minimizing their opponents’. Example: In a game of tic-tac-toe, both players take turns aiming to either win or block the other, demonstrating adversarial search. 3.2 Games as Multi-Agent Environments Definition: In multi-agent environments, multiple agents interact. In competitive games, these agents have opposing goals, leading to adversar- ial scenarios. Example: Chess is a two-player game where one player’s gain (capturing a piece) is another player’s loss. 3.3 Types of Games Definition: Games can be classified based on factors such as the num- ber of players (single vs. multi-agent), nature of information (perfect vs. imperfect), and randomness (deterministic vs. stochastic). 
Example: – Deterministic, perfect information: Chess, where all informa- tion is known to both players. – Stochastic, imperfect information: Poker, where players don’t see others’ cards, and the outcome depends on chance. 19 3.4 Formal Definition of Games Components: – Players: The participants in the game. – Actions: The moves each player can make. – Utilities: The outcomes or scores for each player. – Initial State: The starting setup of the game. – Terminal States: States where the game ends. Example: In tic-tac-toe: – Players are X and O. – Actions are marking empty cells. – Utilities are win, lose, or draw. – Terminal states occur when the board is full or someone wins. 3.5 Zero-Sum Games Definition: In zero-sum games, the total utility for all players sums to zero, meaning one player’s gain is another’s loss. Example: In a game of checkers, each piece captured by one player is a direct loss to the other. 3.6 Minimax Algorithm Definition: Minimax is an algorithm used in decision-making for zero- sum games. It involves a recursive search through the game tree, where players maximize their minimum guaranteed utility (the ”minimax” value). Example: In tic-tac-toe, minimax explores possible moves and selects the one that minimizes the opponent’s best response, ensuring the best outcome for the current player. 3.7 Properties of Minimax Optimality: Minimax is optimal in two-player zero-sum games if both players play optimally. Determinism: It is well-suited for deterministic games where all possible moves are known. Example: In chess, minimax can theoretically calculate the optimal move by exploring the entire game tree, though this is computationally infeasible without pruning. 20 3.8 Efficiency of Minimax Definition: While minimax guarantees optimal moves, it is computation- ally expensive due to its exhaustive search. Complexity: The time complexity is O(bm ), where b is the branching factor and m is the depth of the tree. Example: In checkers, minimax might explore billions of possible moves, making it impractical without optimizations like alpha-beta pruning. 3.9 Alpha-Beta Pruning Definition: Alpha-beta pruning is a technique to reduce the number of nodes evaluated by the minimax algorithm. It eliminates branches that will not affect the final decision, optimizing the search. Example: In chess, alpha-beta pruning skips moves that do not influence the final outcome, such as moves that already yield a better result. 3.10 Properties of Alpha-Beta Pruning Efficiency: It reduces the number of nodes searched to O(bm/2 ) under ideal conditions, effectively doubling the depth the algorithm can search. Order of Moves: The efficiency of alpha-beta pruning depends on the order in which moves are evaluated. Example: In a tic-tac-toe game, exploring moves that immediately lead to winning or losing outcomes first can significantly speed up the search. 3.11 Game Tree Size and Complexity Definition: The game tree size represents the total number of possible moves. The complexity depends on factors like the branching factor and the depth of the game tree. Example: In chess, the game tree is vast, with an estimated branching factor of around 35 and a maximum depth of 80, making exhaustive search infeasible. 3.12 Transposition Tables Definition: Transposition tables store previously computed positions to avoid redundant calculations, particularly in games with many repeated states. 
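The following Python sketch, not taken from the notes, shows how minimax (Section 3.6), alpha-beta pruning (Section 3.9), and a transposition table can fit together. The game interface used here (key, is_terminal, utility, actions, result) is a hypothetical placeholder.

# Depth-limited minimax with alpha-beta pruning and a transposition table.
# The `game` object is an assumed interface, not an API from the course.
def alphabeta(game, state, depth, alpha=float("-inf"), beta=float("inf"),
              maximizing=True, table=None):
    if table is None:
        table = {}
    key = (game.key(state), depth, maximizing)
    if key in table:                      # transposition table hit: reuse value
        return table[key]
    if depth == 0 or game.is_terminal(state):
        value = game.utility(state)       # evaluation at the cutoff or terminal state
    elif maximizing:
        value = float("-inf")
        for action in game.actions(state):
            child = game.result(state, action)
            value = max(value, alphabeta(game, child, depth - 1,
                                         alpha, beta, False, table))
            alpha = max(alpha, value)
            if alpha >= beta:             # beta cutoff: prune remaining moves
                break
    else:
        value = float("inf")
        for action in game.actions(state):
            child = game.result(state, action)
            value = min(value, alphabeta(game, child, depth - 1,
                                         alpha, beta, True, table))
            beta = min(beta, value)
            if beta <= alpha:             # alpha cutoff
                break
    table[key] = value                    # memoize for positions reached again
    return value

Keying the table on the position (plus remaining depth and player to move) is what lets positions reached through different move orders be evaluated only once.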
Example: In chess, transposition tables help by storing positions reached through different move sequences but resulting in the same board config- uration. 21 3.13 13. Imperfect Real-Time Decisions Definition: In real-time scenarios, players must make decisions quickly without exploring the entire game tree. Algorithms must approximate the best move within time constraints. Example: In fast-paced games like StarCraft, decisions are based on partial information and time constraints, requiring heuristics and approx- imations. 3.14 Evaluation Functions Definition: Evaluation functions estimate the desirability of a game state when a terminal state cannot be reached within time limits. They assign scores based on features like material balance in chess. Example: In tic-tac-toe, an evaluation function might assign higher scores for configurations with two marks in a row and an open space for the third. 3.15 Quiescence and Horizon Effect Quiescence: A technique that extends the search at unstable positions to avoid misleading evaluations due to impending significant moves. Horizon Effect: A limitation where an algorithm cannot see beyond a certain depth, potentially missing critical moves. Example: In chess, quiescence search might explore captures to avoid a misleading evaluation based on temporary material imbalance. 3.16 Multi-Agent Games Definition: Multi-agent games involve more than two players, often re- quiring different strategies like forming alliances or betraying other players. Example: In Risk, players may form temporary alliances to counter a stronger opponent but ultimately compete to control the map. 3.17 Stochastic Games Definition: Stochastic games include elements of chance, requiring strate- gies that incorporate probabilities. Example: In Monopoly, rolling dice introduces randomness, affecting movement and outcomes, which players must account for in their strate- gies. 22 3.18 18. Expectiminimax Algorithm Definition: An extension of minimax that incorporates chance nodes representing probabilistic outcomes, used in stochastic games. Example: In backgammon, expectiminimax accounts for dice rolls at chance nodes, combining player moves and random outcomes in the deci- sion process. 3.19 Monte Carlo Tree Search (MCTS) Definition: MCTS is a heuristic search algorithm that uses random sam- pling to explore game states. It builds a search tree based on simulated outcomes, gradually improving its accuracy. Example: In Go, MCTS simulates random moves to estimate the value of different actions, allowing it to handle the vast search space. 3.20 Exploration vs. Exploitation in MCTS Definition: MCTS balances between exploring new moves (exploration) and exploiting known good moves (exploitation) to refine its estimates. Example: In a Go game, MCTS might explore untried moves early on but focus more on promising moves as simulations continue, improving accuracy. 3.21 State-of-the-Art Game Programs (Checkers, Chess, Go) Definition: Advanced game programs use sophisticated algorithms like MCTS, alpha-beta pruning, and neural networks to achieve high-level play. Example: IBM’s Deep Blue, Google’s AlphaGo, and Chinook are exam- ples of programs that excelled at chess, Go, and checkers, respectively. 3.22 AlphaGo and AlphaGo Zero Definition: AlphaGo and AlphaGo Zero are AI systems developed by DeepMind to play Go at a superhuman level. AlphaGo used supervised learning on human games, while AlphaGo Zero used self-play. 
Example: AlphaGo defeated a world champion Go player using deep neural networks and MCTS. AlphaGo Zero, learning solely through self- play, surpassed AlphaGo and other Go-playing systems. 23 3.23 Summary of Adversarial Search in AI Overview: Adversarial search techniques like minimax, alpha-beta prun- ing, and MCTS form the backbone of AI systems in game playing. These methods enable AI to perform complex decision-making and strategy for- mation. Example: AI uses adversarial search in various fields, from board games to real-time strategy games, where agents must predict and counter op- ponents’ actions. 3.24 Multiple Choice Questions: Games and Adversarial Search 1. Adversarial search involves decision-making in: (a) Environments with no competition (b) Environments where agents compete against opponents ⋆ (c) Single-agent environments (d) Real-world physical environments 2. A game like tic-tac-toe is an example of: (a) A cooperative search (b) An adversarial search ⋆ (c) A multi-player search (d) A random search 3. Multi-agent environments involve: (a) Only one player (b) Players with aligned goals (c) Multiple players, often with opposing goals ⋆ (d) Agents who ignore each other 4. Which of the following is a deterministic game with perfect information? (a) Poker (b) Risk (c) Chess ⋆ (d) Monopoly 24 5. In the formal definition of a game, terminal states are: (a) The initial setup of the game (b) The states where the game ends ⋆ (c) Moves that lead to winning (d) States that occur in the middle of the game 6. In zero-sum games: (a) One player’s gain is another player’s loss ⋆ (b) All players gain equally (c) The total utility can be positive (d) All moves have the same value 7. The minimax algorithm is primarily used for: (a) Finding the optimal move in zero-sum games ⋆ (b) Maximizing random outcomes (c) Determining probabilities (d) Constructing game trees 8. Minimax assumes that: (a) Players will make random moves (b) Both players play optimally ⋆ (c) Only the first player plays optimally (d) All players cooperate 9. Which of the following is true about minimax? (a) It explores the entire game tree for optimal moves ⋆ (b) It only explores a subset of the game tree (c) It always finds a winning move (d) It works best with non-deterministic games 10. The time complexity of minimax is: (a) O(m) (b) O(bm ) ⋆ (c) O(b × m) (d) O(b/m) 25 11. Alpha-beta pruning improves minimax by: (a) Pruning branches that won’t affect the final decision ⋆ (b) Exploring the entire game tree (c) Ignoring opponent moves (d) Focusing only on random moves 12. Alpha-beta pruning is most effective when: (a) Moves are evaluated randomly (b) Moves are evaluated in the best possible order ⋆ (c) Moves are evaluated last (d) The game tree is small 13. Alpha-beta pruning can reduce the time complexity to approximately: (a) O(bm/2 ) ⋆ (b) O(bm ) (c) O(m) (d) O(b) 14. The game tree size is influenced by: (a) The players’ choices (b) The branching factor and the depth ⋆ (c) The initial state only (d) Terminal states only 15. Transposition tables are used to: (a) Store previously computed positions to avoid redundancy ⋆ (b) Change the order of moves (c) Add randomness to the game (d) Estimate the game tree size 16. Imperfect real-time decisions are required in: (a) Turn-based games only (b) Static games (c) Real-time strategy games ⋆ (d) Deterministic games 26 17. 
Evaluation functions are necessary when: (a) The game is simple (b) The search cannot reach terminal states within time limits ⋆ (c) The game has few moves (d) Alpha-beta pruning is used 18. A good evaluation function in tic-tac-toe might: (a) Assign higher scores for rows with two marks and an open space ⋆ (b) Only evaluate completed rows (c) Ignore diagonal rows (d) Count all empty spaces 19. Quiescence search aims to: (a) Extend search at unstable positions to avoid misleading eval- uations ⋆ (b) Shorten the game tree (c) Increase the branching factor (d) Limit evaluation functions 20. The horizon effect occurs when: (a) An algorithm cannot see beyond a certain depth ⋆ (b) The evaluation function is perfect (c) Terminal states are reached (d) The branching factor is low 21. Multi-agent games often involve: (a) Multiple players who may form alliances or betray each other ⋆ (b) Only two players (c) Players with the same objectives (d) No competitive elements 22. Which of the following is an example of a stochastic game? (a) Chess or Checkers (b) Tic-tac-toe (c) Monopoly ⋆ 27 23. The expectiminimax algorithm is used in games that: (a) Involve randomness and probabilistic outcomes ⋆ (b) Are deterministic (c) Have perfect information (d) Only involve two players 24. In backgammon, expectiminimax accounts for: (a) Dice rolls as chance nodes ⋆ (b) Only player decisions (c) The number of pieces (d) Non-probabilistic outcomes 25. Monte Carlo Tree Search (MCTS) relies on: (a) Deterministic moves (b) Random sampling to explore game states ⋆ (c) Only known positions (d) Alpha-beta pruning 26. Exploration in MCTS refers to: (a) Trying new moves to discover their potential ⋆ (b) Repeating known moves (c) Focusing on the best-known move (d) Minimizing the search area 27. State-of-the-art game programs often use: (a) Techniques like MCTS, alpha-beta pruning, and neural net- works ⋆ (b) Only minimax (c) Pure random strategies (d) No heuristic methods 28. AlphaGo Zero differs from AlphaGo by: (a) Learning entirely through self-play without human data ⋆ (b) Relying on human games for training (c) Using only minimax (d) Not using any neural networks 28 29. Adversarial search is essential in: (a) Scenarios where agents compete to minimize each other’s utility ⋆ (b) Cooperative games only (c) Single-player puzzles (d) Random guessing games 30. In a zero-sum game, if Player A gains 5 points: (a) Player B loses 5 points ⋆ (b) Player B also gains 5 points (c) The total points remain unaffected (d) Player B gains no points 31. The branching factor in a game tree refers to: (a) The average number of moves available at each node ⋆ (b) The number of players (c) The depth of the tree (d) The utility of moves 32. Which of the following best describes AlphaGo’s training approach? (a) It used human games combined with reinforcement learning ⋆ (b) It relied solely on random moves (c) It required no prior game data (d) It used only minimax 33. Monte Carlo Tree Search uses exploration and exploitation to: (a) Balance between trying new moves and using known good moves ⋆ (b) Avoid randomness (c) Focus only on terminal states (d) Reduce the game tree size 34. An evaluation function in chess might consider: (a) Material balance and piece positioning ⋆ (b) Only the number of moves (c) Random positions (d) Terminal states only 29 35. Which of the following is true about transposition tables? 
(a) They help avoid recalculating positions that have been pre- viously evaluated ⋆ (b) They store all possible moves (c) They increase the game tree size (d) They limit the number of players 36. Quiescence search prevents: (a) Misleading evaluations due to imminent significant moves ⋆ (b) The use of evaluation functions (c) The horizon effect (d) Opponent predictions 37. A horizon effect in chess might result in: (a) Missing a critical move just beyond the search depth ⋆ (b) Calculating all moves perfectly (c) Avoiding checkmate (d) Overestimating the search depth 38. In multi-agent games like Risk, players may: (a) Form temporary alliances and later compete for individual goals ⋆ (b) Always cooperate (c) Share all resources equally (d) Ignore each other 39. Stochastic games involve elements of: (a) Randomness and chance ⋆ (b) Perfect information (c) Single-player actions (d) Deterministic outcomes only 40. The expectiminimax algorithm is essential in games like: (a) Poker, where there is both chance and decision-making ⋆ (b) Chess, which is fully deterministic (c) Tic-tac-toe, with no randomness (d) Checkers, with perfect information 30 41. Monte Carlo Tree Search improves over time by: (a) Running more simulations to better estimate move values ⋆ (b) Reducing the branching factor (c) Ignoring unknown moves (d) Focusing only on winning moves 42. Alpha-beta pruning is more effective when: (a) Moves are evaluated in an optimal order ⋆ (b) Moves are selected randomly (c) Terminal states are not reached (d) Players make suboptimal decisions 43. Which of the following is a characteristic of adversarial search in AI? (a) It involves agents trying to maximize their gain while min- imizing their opponent’s ⋆ (b) It involves no competition (c) It only applies to cooperative games (d) It always requires human input 44. The horizon effect can be reduced by: (a) Extending the search at critical points using quiescence search ⋆ (b) Limiting the search depth (c) Focusing on terminal states only (d) Increasing the branching factor 45. Which is a state-of-the-art game AI that uses MCTS? (a) AlphaGo ⋆ (b) Deep Blue (c) IBM Watson (d) Chinook 46. AlphaGo Zero’s learning approach involved: (a) Self-play with reinforcement learning, without human data ⋆ (b) Training on historical games only (c) Using a single heuristic (d) Relying on minimax exclusively 31 47. Transposition tables are particularly useful in games with: (a) Many repeated states reached through different sequences of moves ⋆ (b) Only two players (c) Simple move sets (d) No chance elements 48. In MCTS, exploitation involves: (a) Using moves known to be good based on previous simula- tions ⋆ (b) Trying random moves (c) Increasing the branching factor (d) Avoiding terminal states 49. The use of adversarial search extends beyond games to fields like: (a) Cybersecurity, where AI must counteract attackers ⋆ (b) Basic arithmetic problems (c) Linear programming (d) Non-competitive fields 50. Which of the following AI systems demonstrated state-of-the-art play in chess? (a) Deep Blue ⋆ (b) AlphaGo (c) AlphaGo Zero (d) IBM Watson 32 4 Solving Problems by Searching 4.1 Random Variables 4.1.1 Definition A random variable is a variable whose possible values are numerical outcomes of a random phenomenon. It maps outcomes from the sample space to a specific range, typically real numbers. 4.1.2 Types Types: There are discrete random variables (e.g., rolling a die) and continuous random variables (e.g., measuring temperature). 
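As a small illustrative sketch (not from the notes), the Python snippet below contrasts the two types: a discrete random variable (a die roll) and a continuous one (a noisy temperature reading). The specific numbers are made up.

# Discrete versus continuous random variables, purely for intuition.
import random

def roll_die():
    return random.randint(1, 6)        # discrete: one of finitely many values

def read_temperature():
    return random.gauss(20.0, 2.0)     # continuous: any real value near 20 degrees

samples = [roll_die() for _ in range(10_000)]
for face in range(1, 7):
    # The relative frequency of each face approaches 1/6 as the sample grows.
    print(face, samples.count(face) / len(samples))
print("one temperature reading:", round(read_temperature(), 2))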
4.1.3 Example
In a coin toss, the random variable X could represent the outcome, where X = 1 for heads and X = 0 for tails.

4.2 Probability Distributions

4.2.1 Definition
A probability distribution assigns a probability to each possible value of a random variable. For discrete variables, this is often presented as a probability mass function (PMF), while for continuous variables, it's a probability density function (PDF).

4.2.2 Example
For a fair six-sided die, the probability distribution is P(X = x) = 1/6 for each face value x from 1 to 6.

4.3 Inference

4.3.1 Definition
Inference in probability is about calculating unknown probabilities using known ones. It allows us to make decisions or predictions based on observed data.

4.3.2 Example
If we know the probability that it rains given cloudy weather, P(Rain | Cloudy), we can infer the likelihood of rain based on observed cloudiness.

4.4 Independence

4.4.1 Definition
Two events are independent if the occurrence of one does not influence the probability of the other. In mathematical terms, P(A ∩ B) = P(A) × P(B).

4.4.2 Example
The result of a coin flip (heads or tails) is independent of rolling a die. So, P(Heads) × P(3 on a die) = P(Heads and 3).

4.5 The Bayes' Rule

4.5.1 Definition
Bayes' Rule is a fundamental theorem that relates conditional probabilities. It updates the probability of a hypothesis based on new evidence:
P(H|E) = P(E|H) · P(H) / P(E)

4.5.2 Example
In medical diagnostics, if P(Disease) = 0.01, P(Positive Test|Disease) = 0.9, and P(Positive Test) = 0.05, then the probability of having the disease given a positive test is:
P(Disease|Positive Test) = P(Positive Test|Disease) · P(Disease) / P(Positive Test) = (0.9 · 0.01) / 0.05 = 0.18

4.6 Quantifying Uncertainty

4.6.1 Definition
Quantifying uncertainty involves using probabilities to represent and manage unknowns in a system. It allows AI systems to make informed decisions even when not all information is available.

4.6.2 Example
Weather forecasting quantifies uncertainty by predicting the probability of rain. A forecast may say there's a 70% chance of rain, quantifying our uncertainty about the weather.

4.7 Principle of Maximum Expected Utility
This principle suggests that a rational agent should choose actions that maximize the expected utility, which is calculated as the weighted average of all possible outcomes, with probabilities as weights.

4.7.1 Example
In a game with a 50% chance of winning $100 or losing $50, the expected utility is: (0.5 × 100) + (0.5 × −50) = 25. A rational agent would take the action if it leads to the highest expected utility.

4.8 Probabilistic Assertions
Probabilistic assertions express beliefs about events in uncertain environments. They summarize an agent's belief in the likelihood of an event.

4.8.1 Example
An assertion like P(earthquake) = 0.02 means the agent believes there's a 2% chance of an earthquake, based on known information.

4.9 Frequentism vs. Bayesianism

4.9.1 Frequentism
Views probabilities as the long-run frequencies of events. For example, a 50% probability of heads means that in a large number of coin tosses, half should result in heads.

4.9.2 Bayesianism
Treats probabilities as degrees of belief or confidence. For example, if a weather forecast says 60% chance of rain, it reflects confidence in rain occurring, given current conditions, not necessarily based on historical frequencies.

4.10 Kolmogorov's Axioms
These are three foundational principles of probability:
Non-Negativity: P(A) ≥ 0 for any event A.
Normalization: P (Ω) = 1, where Ω is the sample space. Additivity: For mutually exclusive events A and B, P (A ∪ B) = P (A) + P (B). 4.10.1 Example If we roll a fair die, P (Even) + P (Odd) = 1 because these are mutually exclusive and exhaustive events. 35 4.11 Marginal Distributions A marginal distribution represents the probabilities of a subset of variables from a joint distribution, obtained by summing or integrating over other variables. 4.11.1 Example In a table showing probabilities of weather and temperature, the marginal dis- tribution for weather shows probabilities for each weather type, regardless of temperature. 4.12 Conditional Distributions 4.12.1 Definition A conditional distribution gives the probability of one variable given the value of another. 4.12.2 Example P (Rain|Cloudy) = 0.6 indicates that if it’s cloudy, there’s a 60% chance of rain. 4.13 Probabilistic Inference 4.13.1 Definition The process of deriving probabilities for unknown variables based on known relationships and observations. 4.13.2 Example Given symptoms and their probabilities with diseases, probabilistic inference can calculate the likelihood of a disease based on observed symptoms. 4.14 Inference by Enumeration 4.14.1 Definition Summing over all possible outcomes that match the evidence to calculate prob- abilities. 4.14.2 Example To find P (Rain), given some joint distribution over weather types, sum the probabilities of all events that include rain. 36 4.15 Product Rule 4.15.1 Definition The probability of both events A and B happening is P (A∩B) = P (A)·P (B|A). 4.15.2 Example If P (Rain) = 0.3 and P (Wet Road|Rain) = 0.8, then P (Rain and Wet Road) = 0.24. 4.16 Chain Rule 4.16.1 Definition A generalization of the product rule that breaks down the joint probability of multiple events into a product of conditional probabilities. 4.16.2 Example P (A ∩ B ∩ C) = P (A) · P (B|A) · P (C|A ∩ B). 4.17 Independence and Conditional Independence 4.17.1 Definition Two events are independent if P (A|B) = P (A); conditionally independent if P (A|B, C) = P (A|C). 4.17.2 Example In a medical diagnosis, symptoms are conditionally independent given the dis- ease but not necessarily independent on their own. 4.18 Naive Bayes Model 4.18.1 Definition Assumes all features are independent given the class, simplifying probability computations. 4.18.2 Example In spam detection, words in an email are treated as independent given whether it’s spam or not, simplifying the classification. 37 4.19 Bayesian Inference 4.19.1 Definition Uses Bayes’ Rule to update the probability of a hypothesis based on new evi- dence. 4.19.2 Example Diagnosing diseases with new test results, where each new piece of evidence updates the likelihood of a disease. 4.20 Examples of Probabilistic Inference 4.20.1 Ghostbusters Problem Inferring the ghost’s location using probabilistic evidence (e.g., sensor readings). 4.20.2 Diagnostic Probability Using symptoms to infer the probability of diseases. 4.21 AI Applications in Science 4.21.1 Definition AI applies probabilistic methods to various scientific fields for inference and prediction under uncertainty. 4.21.2 Example Bayesian inference in exoplanet characterization uses observed data to infer atmospheric properties, handling uncertainties in measurements. 38 4.22 Multiple-Choice Questions 1. What is a discrete random variable? 
(a) A variable that can take any value within a range (b) A variable with a finite number of distinct values ⋆ (c) A variable that represents continuous data (d) A variable with no defined probability distribution 2. A probability distribution for a continuous random variable is called: (a) Probability Mass Function (PMF) (b) Probability Density Function (PDF) ⋆ (c) Cumulative Distribution Function (CDF) (d) Frequency Distribution 3. In the context of Bayesian inference, P (A | B) represents: (a) The joint probability of A and B (b) The conditional probability of B given A (c) The conditional probability of A given B ⋆ (d) The probability of A 4. Two events are independent if: (a) P (A ∩ B) = P (A) + P (B) (b) P (A | B) = P (A) ⋆ (c) P (A | B) = P (B | A) (d) P (A ∩ B) = 0 5. The Bayes’ Rule is used to: (a) Find the probability of an event given its complement (b) Update prior beliefs with new evidence ⋆ (c) Calculate joint probabilities (d) Determine independence between two events 6. Which of the following reflects a frequentist view of probability? (a) Probability represents a degree of belief (b) Probability is a measure of ignorance (c) Probability is the long-run frequency of events ⋆ (d) Probability can change with new information 39 7. Kolmogorov’s second axiom states that: (a) The probability of the sample space is 1 (b) The probability of any event is non-negative (c) The probability of mutually exclusive events is additive ⋆ (d) The probability of complementary events sums to 1 8. A marginal distribution can be obtained by: (a) Dividing joint probabilities by conditional probabilities (b) Summing over the conditional distributions (c) Summing the probabilities over all possible values of the other variables ⋆ (d) Multiplying the probabilities of independent events 9. If P (A) = 0.3, P (B) = 0.5, and A and B are independent, then P (A ∩ B) =: (a) 0.15 ⋆ (b) 0.35 (c) 0.6 (d) 0.8 10. In a probability distribution, the sum of all probabilities must equal: (a) 0 (b) 1 ⋆ (c) Any positive value (d) Any real number 11. The Principle of Maximum Expected Utility suggests that agents should: (a) Minimize risk (b) Maximize the probability of an outcome (c) Choose actions with the highest expected utility ⋆ (d) Avoid uncertain situations 12. A conditional distribution describes: (a) The distribution of multiple variables (b) The distribution of one variable given the value of another ⋆ (c) The sum of probabilities across all variables (d) Independent events 40 13. Which of the following is true for a Naive Bayes Model? (a) All features are dependent given the class (b) It requires a large amount of training data (c) It assumes all features are conditionally independent given the class ⋆ (d) It does not use conditional probabilities 14. Which method can be used to find P (A | B) using P (B | A), P (A), and P (B)? (a) Kolmogorov’s axioms (b) Chain Rule (c) Bayes’ Rule ⋆ (d) Product Rule 15. Probabilistic inference is used to: (a) Predict future events without using probabilities (b) Make decisions based on observed data ⋆ (c) Measure the accuracy of a prediction (d) Increase uncertainty in an AI system 16. The Chain Rule allows us to: (a) Calculate marginal distributions (b) Express a joint distribution as a product of conditional prob- abilities ⋆ (c) Combine two independent events (d) Update beliefs with new evidence 17. 
In Bayesian inference, a posterior probability: (a) Is calculated without any prior knowledge (b) Remains constant regardless of new evidence (c) Is updated based on observed evidence ⋆ (d) Represents the probability of new evidence given the hypothesis 18. In probabilistic terms, ”ignorance” can contribute to uncertainty because: (a) It increases the number of random variables (b) It affects the probabilities of events being unknown ⋆ (c) It forces all events to be treated as independent (d) It changes conditional probabilities into joint probabilities 41 19. Which of the following is NOT a property of conditional independence? (a) P (A | B, C) = P (A | C) (b) P (B | A, C) = P (B | C) (c) P (A, B | C) = P (A | C) · P (B | C) (d) P (A, B) = P (A | B) · P (B) ⋆ 20. A joint distribution provides: (a) The probability of each event separately (b) A summary of the expected utility for each action (c) The probability of all combinations of multiple random vari- ables ⋆ (d) The probability of one event given another 21. An event with a probability of 0: (a) Is impossible ⋆ (b) Has occurred (c) Is guaranteed (d) Is highly likely 22. Frequentism interprets probability as: (a) A measure of uncertainty (b) The likelihood of an event based on subjective belief (c) A degree of confidence (d) The long-run frequency of occurrence of an event ⋆ 23. The Product Rule states that P (A ∩ B) can be calculated as: (a) P (A) + P (B | A) (b) P (A) · P (B) (c) P (A) · P (B | A) ⋆ (d) P (A | B) + P (B) 24. Which of the following represents a probability mass function (PMF)? (a) P (X) = 1 − e−x for x ≥ 0 λk e−λ (b) P (X = k) = k! for k = 0, 1, 2,... ⋆ 2 (c) P (X = x) = x (d) P (X = x) = sin(x) 42 25. In a probabilistic model, P (A | B) = 0.8 means: (a) There’s an 80% chance that B occurs if A does (b) There’s an 80% chance that A occurs if B does ⋆ (c) A and B are independent events (d) A and B are mutually exclusive 26. The Naive Bayes model works well in practice because: (a) It assumes complete dependence among features (b) It simplifies calculations by assuming conditional indepen- dence ⋆ (c) It only works with binary data (d) It does not require a large dataset 27. Conditional probability can be found using which rule? (a) Kolmogorov’s second axiom (b) Chain Rule (c) Product Rule ⋆ (d) Independence Rule 28. A probability density function (PDF) applies to: (a) Discrete random variables (b) Continuous random variables ⋆ (c) Both discrete and continuous variables (d) Non-random variables 29. Which of the following is an example of an inference problem? (a) Predicting tomorrow’s temperature ⋆ (b) Measuring current temperature (c) Checking if a statement is true (d) Calculating the sum of two numbers 30. Kolmogorov’s first axiom states that: (a) P (A) ≥ 0 for any event A ⋆ (b) P (Ω) = 1 (c) P (A ∪ B) = P (A) + P (B) for mutually exclusive events (d) P (A | B) = P (B | A) 43 31. If P (A ∪ B) = P (A) + P (B), then: (a) A and B are independent (b) A and B are mutually exclusive ⋆ (c) A and B are conditionally independent (d) A and B are always true 32. An agent using the Maximum Expected Utility principle should: (a) Choose the action with the highest utility only (b) Choose the action with the highest probability of success (c) Choose the action with the highest expected utility, consid- ering all possible outcomes ⋆ (d) Avoid uncertain outcomes 33. 
33. The cumulative distribution function (CDF) represents: (a) The likelihood of a specific outcome (b) The probability that a variable takes on a value less than or equal to a certain value ⋆ (c) The sum of all probabilities (d) The joint probability of multiple variables
34. Which rule can simplify the calculation of a joint probability if events are conditionally independent? (a) Chain Rule (b) Bayes' Rule (c) Independence Rule (d) Conditional Independence ⋆
35. Bayesian inference updates which of the following probabilities? (a) Prior probability (b) Marginal probability (c) Joint probability (d) Posterior probability ⋆
36. A conditional probability distribution over random variables given fixed values for others is called: (a) Marginal Distribution (b) Conditional Distribution ⋆ (c) Joint Distribution (d) Independence
37. The naive Bayes model assumes: (a) Features are dependent on each other (b) Features are independent of each other given the class ⋆ (c) All features are binary (d) It does not use conditional probabilities
38. Probabilistic assertions allow an agent to: (a) Measure exact outcomes with certainty (b) Express uncertainty about propositions ⋆ (c) Guarantee outcomes (d) Calculate only joint probabilities
39. The marginal distribution of a random variable can be found by: (a) Dividing by the prior probability (b) Summing over all values of other variables in the joint distribution ⋆ (c) Multiplying by conditional probabilities (d) Using the product rule
40. The process of calculating P (A ∪ B) when A and B are not mutually exclusive is given by: (a) P (A ∪ B) = P (A) + P (B) (b) P (A ∪ B) = P (A) + P (B) − P (A ∩ B) ⋆ (c) P (A ∪ B) = P (A) × P (B) (d) P (A ∪ B) = P (A | B) + P (B)
41. In probabilistic inference, evidence refers to: (a) Variables whose values are unknown (b) Variables with assigned probabilities based on observations ⋆ (c) A measure of prior belief (d) Probabilities that do not change with new data
42. In probabilistic terms, ignorance typically leads to: (a) Higher certainty (b) Greater uncertainty ⋆ (c) Independence (d) Reduced sample space
43. A joint probability distribution for two independent events can be written as: (a) P (A ∩ B) = P (A | B) · P (B) (b) P (A ∩ B) = P (A) · P (B) ⋆ (c) P (A ∩ B) = P (A) + P (B) (d) P (A ∩ B) = P (A) ÷ P (B)
44. In Bayesian inference, a prior probability represents: (a) Updated belief after observing evidence (b) The initial belief before any evidence is observed ⋆ (c) The probability of the evidence given the hypothesis (d) The likelihood of multiple events
45. A probability density function (PDF) is typically used to: (a) Calculate probabilities for discrete random variables (b) Define probabilities for continuous random variables ⋆ (c) Find joint probabilities (d) Estimate expected utility
46. Independence between two variables means: (a) P (A | B) = P (B | A) (b) P (A ∩ B) = 0 (c) P (A | B) = P (A) ⋆ (d) P (A) + P (B) = 1
47. Conditional independence is useful in probabilistic models because: (a) It reduces the number of parameters needed ⋆ (b) It always leads to independence between all variables (c) It eliminates the need for marginal distributions (d) It makes events mutually exclusive
48. A chain rule in probability is useful for: (a) Simplifying joint probabilities into conditional probabilities ⋆ (b) Calculating marginal probabilities only (c) Finding independence among events (d) Estimating prior probabilities
49. Which of the following can be an example of Bayesian inference?
(a) Using a frequency table to find probabilities (b) Updating the likelihood of rain given new weather data ⋆ (c) Estimating the joint probability of multiple events (d) Summing probabilities across all possible outcomes 50. The Bayes’ Rule is foundational in AI because: (a) It measures random variables (b) It enables updating beliefs based on evidence ⋆ (c) It calculates exact probabilities without data (d) It identifies mutually exclusive events 47 5 Probabilistic Reasoning - Detailed Explana- tion and Examples 5.1 Bayesian Networks Definition: A Bayesian Network (BN) is a graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Each node represents a random variable, and each edge represents a probabilistic dependency. Components: 1. Nodes: Represent random variables. 2. Edges: Indicate conditional dependencies. 3. Conditional Probability Tables (CPTs): Specify the probability of each node given its parents. Example: In a medical diagnosis model, nodes could represent symptoms (e.g., Fever, Cough), diseases (e.g., Flu), and risk factors (e.g., Travel). An edge from Flu to Fever represents that the likelihood of a fever depends on the presence of the flu. 5.2 Semantics of Bayesian Networks Definition: The semantics of a Bayesian Network describe how the network structure corresponds to the factorization of the joint probability distribution of the variables. The joint probability of all variables is the product of the conditional probabilities of each variable given its parents. Factorization: For nodes A, B, and C, if A → B → C, the joint probability is P (A, B, C) = P (A) × P (B|A) × P (C|B). Example: In a network for weather conditions, with nodes representing Rain, Traffic, and Delays, we have P (Rain, Traffic, Delays) = P (Rain) × P (Traffic|Rain) × P (Delays|Traffic). 5.3 Construction of Bayesian Networks Definition: Constructing a Bayesian Network involves identifying relevant vari- ables, determining dependencies, and assigning conditional probability tables based on domain knowledge and data. Steps: 1. Identify Variables: Determine the set of variables relevant to the prob- lem. 2. Define Dependencies: Use expert knowledge or data to identify condi- tional dependencies among variables. 48 3. Specify CPTs: Define the conditional probabilities for each variable given its parents. Example: To construct a BN for car breakdown prediction, identify vari- ables like Engine Health, Battery Status, and Breakdown Risk. Define depen- dencies, such as Battery Status depending on Engine Health, and fill out CPTs for each variable. 5.4 Independence Relations in Bayesian Networks Definition: Independence relations in Bayesian Networks describe how the structure of the network implies conditional independence among variables. Two nodes are conditionally independent given a third node if they are d-separated by that node. Conditional Independence: In a network A → B → C, A and C are conditionally independent given B. Example: In a network with nodes for Weather, Road Conditions, and Ac- cident, Weather and Accident are conditionally independent given Road Condi- tions. Knowing the road condition gives all needed information about accident probability, regardless of weather. 5.5 Inference in Bayesian Networks Definition: Inference in Bayesian Networks involves calculating the probability of a particular variable given observed evidence. This is essential for decision- making and predictions. Exact Inference Methods: 1. 
Enumeration: Summing over all possible configurations of the non- evidence variables. 2. Variable Elimination: Systematically eliminates variables from the net- work to simplify calculations. Example: In a medical diagnosis network, if we know a patient has a Fever and Cough, inference allows calculating the probability of various diseases like Flu or Pneumonia based on this evidence. 5.6 Parameter Learning Definition: Parameter learning involves determining the probabilities in the CPTs of a Bayesian Network using data. This can be done using methods like Maximum Likelihood Estimation (MLE) or Bayesian Parameter Estimation. Example: In a Bayesian Network for predicting stock prices, parameter learning could use historical data to estimate probabilities for price changes given various market indicators. 49 5.7 Representing Uncertain Knowledge Definition: Bayesian Networks are a powerful tool for representing uncertain knowledge. They allow encoding probabilistic dependencies and capturing un- certainty about the state of the world. Example: In a cybersecurity context, a Bayesian Network could repre- sent uncertain knowledge about potential threats, such as likelihoods of various types of attacks (phishing, malware) given different security signals (e.g., firewall alerts). 5.8 Example Bayesian Networks Example 1: A Bayesian Network for predicting house prices might include nodes for Location, Market Conditions, House Size, and Price. Each node in- fluences the next, creating a network that reflects how these factors interdepen- dently affect pricing. Example 2: In a network for an online recommendation system, nodes might include User Preferences, Purchase History, and Product Ratings, with dependencies reflecting how preferences and past purchases influence product recommendations. 5.9 Inference by Enumeration Definition: Inference by Enumeration is an exact inference technique that involves summing over all possible values of non-evidence variables to calculate probabilities. Steps: 1. Fix the values for evidence variables. 2. Enumerate all combinations of non-evidence variables. 3. Sum the probabilities over these combinations. Example: In a network with variables for Flu, Fever, and Cough, given Fever as evidence, enumeration sums probabilities over all combinations of Flu and Cough to find the likelihood of Flu. 5.10 Variable Elimination Definition: Variable Elimination is a method to simplify inference by eliminat- ing variables one at a time, summing over them, and using intermediate factors to reduce computation. Steps: 1. Identify the order of elimination. 2. Sum out the variables one by one. 50 3. Multiply the resulting factors to compute the final probability. Example: In a Bayesian Network for predicting academic success, given variables like Study Time, Grades, and Exam Performance, variable elimination can efficiently compute the probability of Exam Performance given observed Grades and Study Time. 5.11 d-Separation Definition: d-Separation is a criterion to determine whether two nodes in a Bayesian Network are conditionally independent given a set of nodes. It involves ”blocking” paths using observed nodes to check for independence. Rules: 1. Chain Structure: A → B → C: A and C are independent given B. 2. Fork Structure: A ← B → C: A and C are independent given B. 3. Collider Structure: A → B ← C: A and C are dependent given B. 
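The three structures above can be checked numerically by brute-force enumeration over a small joint distribution. The following is a minimal Python sketch (not from the course notes); the CPT numbers and helper names are invented purely for illustration.

from itertools import product

def chain_joint(p_a, p_b_given_a, p_c_given_b):
    # P(a, b, c) = P(a) * P(b | a) * P(c | b) for the chain A -> B -> C
    return {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
            for a, b, c in product((0, 1), repeat=3)}

def independent_given_b(joint, tol=1e-9):
    # Check P(A, C | B = b) == P(A | B = b) * P(C | B = b) for every value of b.
    for b in (0, 1):
        p_b = sum(p for (aa, bb, cc), p in joint.items() if bb == b)
        for a, c in product((0, 1), repeat=2):
            p_ac = sum(p for (aa, bb, cc), p in joint.items()
                       if aa == a and bb == b and cc == c) / p_b
            p_a = sum(p for (aa, bb, cc), p in joint.items()
                      if aa == a and bb == b) / p_b
            p_c = sum(p for (aa, bb, cc), p in joint.items()
                      if bb == b and cc == c) / p_b
            if abs(p_ac - p_a * p_c) > tol:
                return False
    return True

# Arbitrary illustrative CPTs for the chain A -> B -> C.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
p_c_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
print(independent_given_b(chain_joint(p_a, p_b_given_a, p_c_given_b)))  # True (chain rule)

# Collider A -> B <- C: P(a, b, c) = P(a) * P(c) * P(b | a, c), again with made-up numbers.
p_c = {0: 0.5, 1: 0.5}
p_b_given_ac = {(a, c): {1: 0.1 + 0.3 * (a + c), 0: 0.9 - 0.3 * (a + c)}
                for a, c in product((0, 1), repeat=2)}
collider = {(a, b, c): p_a[a] * p_c[c] * p_b_given_ac[(a, c)][b]
            for a, b, c in product((0, 1), repeat=3)}
print(independent_given_b(collider))  # False: conditioning on a collider couples A and C

The chain case returns True for every value of B, while the collider case returns False, which is the "explaining away" effect behind the third rule.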
Example: In a network with nodes for Weather, Traffic, and Accidents, d-Separation can determine if Weather and Accidents are independent given Traffic observations.
5.12 Approximate Inference
Definition: Approximate inference methods provide estimates of probabilities when exact inference is computationally expensive. Techniques include sampling methods like Monte Carlo or using variational inference.
Example: In a Bayesian Network for climate modeling with thousands of variables, approximate inference might estimate the probability of extreme weather events like hurricanes by sampling.
5.13 Parameter Learning with Maximum Likelihood Estimation (MLE)
Definition: MLE estimates parameters by finding the values that maximize the likelihood of the observed data given the model.
Steps: 1. Define the likelihood function based on data. 2. Maximize the likelihood by adjusting the parameters.
Example: In a Bayesian Network for marketing, MLE could be used to estimate the probability that a user purchases a product based on historical purchase data.
5.14 Bayesian Parameter Learning
Definition: Bayesian Parameter Learning updates parameter estimates by incorporating prior beliefs and observed data, yielding a posterior distribution over parameters.
Example: In a Bayesian Network for disease prediction, using prior knowledge about disease prevalence combined with new patient data, Bayesian parameter learning refines the probability estimates for various diseases.
5.15 Maximum a Posteriori (MAP) Estimation
Definition: MAP estimation finds the parameter values that maximize the posterior distribution, combining observed data with prior knowledge. It's similar to MLE but includes prior beliefs.
Steps: 1. Calculate the posterior distribution using Bayes' Rule. 2. Maximize the posterior to estimate parameters.
Example: In a spam detection system, MAP estimation could determine the probability that an email is spam by combining prior probabilities of spam with the likelihood of observing certain words in the email.
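As a concrete illustration of how MLE, Bayesian parameter learning, and MAP estimation differ for a single CPT entry, here is a short Python sketch. It assumes a Bernoulli parameter with a Beta(alpha, beta) prior; the counts and prior values are made up for illustration and are not taken from the notes.

def mle(k, n):
    # Maximum likelihood: relative frequency of the observed data.
    return k / n

def posterior_mean(k, n, alpha=2.0, beta=2.0):
    # Bayesian estimate: mean of the Beta(alpha + k, beta + n - k) posterior.
    return (k + alpha) / (n + alpha + beta)

def map_estimate(k, n, alpha=2.0, beta=2.0):
    # MAP estimate: mode of the same posterior (valid when alpha + k > 1 and beta + n - k > 1).
    return (k + alpha - 1) / (n + alpha + beta - 2)

k, n = 3, 10  # e.g. 3 spam emails observed out of 10 (illustrative counts)
print(mle(k, n))             # 0.30
print(posterior_mean(k, n))  # 5/14 ≈ 0.357
print(map_estimate(k, n))    # 4/12 ≈ 0.333

With a Beta(alpha, beta) prior and k successes in n trials, the posterior is Beta(alpha + k, beta + n − k); MAP is its mode, the Bayesian estimate shown is its mean, and MLE ignores the prior entirely.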
5.16 Multiple Choice Questions on Probabilistic Reasoning
1. In a Bayesian Network, nodes represent: (a) Variables that are independent (b) Random variables ⋆ (c) Only observable variables (d) Conditional probabilities
2. A Bayesian Network must be: (a) A directed cyclic graph (b) A directed acyclic graph (DAG) ⋆ (c) A directed acyclic graph (d) An undirected graph
3. The purpose of Conditional Probability Tables (CPTs) in Bayesian Networks is to: (a) Show all possible outcomes (b) Specify probabilities for each node given its parents ⋆ (c) Determine variable independence (d) Organize the network structure
4. The joint probability distribution of a Bayesian Network is computed as: (a) The sum of all probabilities (b) The product of all probabilities (c) The product of the conditional probabilities of each variable given its parents ⋆ (d) The average of conditional probabilities
5. In a Bayesian Network, if A → B → C, the joint probability is represented as: (a) P (A, B, C) = P (A) + P (B|A) + P (C|B) (b) P (A, B, C) = P (A) × P (B|A) × P (C|B) ⋆ (c) P (A, B, C) = P (A|B) × P (B|C) (d) P (A, B, C) = P (C|B) × P (B)
6. Constructing a Bayesian Network requires which of the following steps? (a) Identifying variables and assigning probabilities (b) Building conditional probabilities randomly (c) Identifying relevant variables, defining dependencies, and specifying CPTs ⋆ (d) Testing all possible network structures
7. Which of the following represents conditional independence in Bayesian Networks? (a) A and B are independent if they are in the same network (b) A and C are conditionally independent given B in A → B → C ⋆ (c) A and C are always independent (d) A and B are dependent only if B is observed
8. In a Bayesian Network, inference involves: (a) Removing variables from the network (b) Calculating probabilities of variables given evidence ⋆ (c) Adding nodes to increase accuracy (d) Ignoring conditional dependencies
9. The process of determining CPTs from data is known as: (a) Model building (b) Parameter learning ⋆ (c) Inference (d) Variable elimination
10. Which of the following statements is true about Bayesian Networks? (a) They can contain cycles (b) All variables must have the same number of parents (c) They represent uncertain knowledge using probabilities ⋆ (d) They do not account for conditional independence
11. Inference by Enumeration is: (a) An approximation method (b) An exact inference method by summing over all non-evidence variables ⋆ (c) Used only for discrete variables (d) Faster than approximate methods
12. In Variable Elimination, the process involves: (a) Summing out variables one by one ⋆ (b) Eliminating variables randomly (c) Summing all variables simultaneously (d) Using d-separation to find independent variables
13. d-Separation helps determine: (a) The causal relationship between variables (b) The strength of dependence between variables (c) Conditional independence given a set of variables ⋆ (d) Which variables to eliminate in inference
14. Which structure in d-Separation implies A and C are conditionally independent given B? (a) Fork structure A ← B → C (b) Collider structure A → B ← C (c) Chain structure A → B → C ⋆ (d) No structure can imply this
15. Approximate Inference methods are used when: (a) The Bayesian Network is small (b) Exact inference is computationally expensive ⋆ (c) The network is acyclic (d) There is no observed evidence
16. An example of an approximate inference method is: (a) Enumeration (b) Monte Carlo sampling ⋆ (c) Variable Elimination (d) CPT estimation
17. Maximum Likelihood Estimation (MLE) aims to: (a) Minimize likelihood for observed data (b) Adjust prior probabilities only (c) Maximize the likelihood of observed data given the model ⋆ (d) Estimate unobserved variables
18. Parameter Learning can involve: (a) Removing dependencies between nodes (b) Using data to determine probabilities in CPTs ⋆ (c) Building new variables (d) Randomly assigning CPT values
19. Bayesian Parameter Learning differs from MLE because: (a) It includes prior knowledge in the estimation ⋆ (b) It ignores prior knowledge (c) It only works with large datasets (d) It does not use data
20. Which of the following is true about d-Separation? (a) It requires all nodes to be observed (b) It helps determine conditional independence by blocking paths ⋆ (c) It can only be used with exact inference (d) It requires a cyclic network
21. In a Bayesian Network, if A and C are independent given B, this means: (a) B is irrelevant to A and C (b) Observing B gives all necessary information about A and C ⋆ (c) A and C are always independent (d) B directly influences both A and C
22. Which of the following is a method of exact inference? (a) Variable Elimination ⋆ (b) Sampling (c) Monte Carlo (d) Bayesian estimation
23.
Inference by Enumeration is: (a) Always faster than other methods (b) Accurate but computationally expensive ⋆ (c) Used only for continuous variables (d) Dependent on d-separation 56 24. Variable Elimination improves efficiency by: (a) Eliminating variables systematically to simplify calculations ⋆ (b) Using all variables in the final calculation (c) Relying on d-separation (d) Ignoring conditional dependencies 25. A Bayesian Network with the structure A → B → C implies: (a) A is conditionally independent of C given B (b) A directly affects B, which then affects C ⋆ (c) B is independent of A (d) C is directly affected by A 26. Maximum a Posteriori (MAP) Estimation uses: (a) Only observed data (b) Both observed data and prior beliefs ⋆ (c) Data for exact inference only (d) Only prior beliefs without data 27. Approximate inference is often used when: (a) The network has few variables (b) The network is too large for exact inference ⋆ (c) All variables are observed (d) Exact inference is sufficient 28. Bayesian Networks are well-suited for representing: (a) Uncertain knowledge with probabilistic dependencies ⋆ (b) Only deterministic events (c) Only observable knowledge (d) Linear relationships exclusively 29. d-Separation relies on: (a) CPT calculations (b) Blocking paths between nodes ⋆ (c) Variable elimination (d) Enumeration of all nodes 57 30. The difference between Bayesian and Maximum Likelihood Parameter Learning is that: (a) Bayesian learning uses only the data (b) Bayesian learning incorporates prior knowledge ⋆ (c) MLE uses priors (d) MLE ignores the data 31. If P (A) = 0.4, P (B|A) = 0.5, and P (B|¬A) = 0.2, in Bayesian terms: (a) A and B are conditionally independent (b) The conditional probabilities affect B given A ⋆ (c) A directly determines B (d) B determines A 32. An advantage of Bayesian Networks is: (a) They can model complex dependencies among variables ⋆ (b) They do not require CPTs (c) They are only useful for binary variables (d) They can be cyclic 33. Which of the following best describes MAP estimation? (a) It maximizes the posterior probability, considering both prior and observed data ⋆ (b) It ignores the prior (c) It only considers observed data (d) It only estimates priors 34. Variable Elimination is an efficient method because it: (a) Reduces computation by systematically removing variables ⋆ (b) Calculates all probabilities simultaneously (c) Ignores dependencies (d) Eliminates evidence variables only 35. The purpose of approximate inference is to: (a) Provide probability estimates when exact calculations are infeasible ⋆ (b) Ignore observed data (c) Sum probabilities across all variables (d) Calculate exact probabilities 58 36. The role of CPTs in Bayesian Networks is to: (a) Define the probability of each node given its parents ⋆ (b) Describe the network structure (c) Determine the number of nodes (d) Ensure acyclic connections 37. d-Separation is a method used to: (a) Determine conditional independence based on network struc- ture ⋆ (b) Simplify CPTs (c) Eliminate variables (d) Calculate exact probabilities 38. The term “factorization” in Bayesian Networks refers to: (a) Simplifying CPTs (b) Breaking down joint probabilities into products of condi- tional probabilities ⋆ (c) Eliminating variables from CPTs (d) Creating new conditional dependencies 39. In Bayesian Networks, an edge between two nodes implies: (a) Independence (b) A direct probabilistic dependency ⋆ (c) Conditional independence (d) Complete uncertainty 40. 
A Bayesian Network can represent: (a) Both observable and hidden variables ⋆ (b) Only hidden variables (c) Only observable variables (d) Only dependent variables 41. MAP estimation combines: (a) CPTs and network structure (b) Prior information and observed data ⋆ (c) Exact and approximate inference (d) MLE and sampling methods 59 42. In a Fork structure, A ← B → C, A and C are: (a) Conditionally independent given B ⋆ (b) Conditionally dependent (c) Always dependent (d) Always independent 43. Approximate inference techniques often use: (a) Exact enumeration (b) Sampling methods like Monte Carlo ⋆ (c) Variable elimination only (d) CPT simplification 44. Which is true for Bayesian Parameter Learning? (a) It only uses observed data (b) It updates beliefs based on both data and prior knowledge ⋆ (c) It requires no prior information (d) It always finds exact probabilities 45. The Bayesian Network structure A → B ← C is known as: (a) Collider ⋆ (b) Chain (c) Fork (d) D-separator 46. Bayesian Networks allow: (a) Cycles to simplify dependencies (b) Probabilistic reasoning with conditional dependencies ⋆ (c) Exact reasoning only (d) Random structures 47. The purpose of d-Separation is to: (a) Determine which nodes are conditionally independent ⋆ (b) Build CPTs (c) Optimize inference (d) Eliminate sampling needs 60 48. A Bayesian Network can be used to: (a) Model uncertain events and their dependencies ⋆ (b) Ensure data is always accurate (c) Simplify exact inference (d) Remove conditional dependencies 49. Which best describes Variable Elimination? (a) A method that reduces complexity by summing out vari- ables in order ⋆ (b) A sampling method (c) Only used for small networks (d) Ignores evidence variables 50. In the context of Bayesian Networks, MAP estimation is used to: (a) Find the parameter values that maximize the posterior dis- tribution ⋆ (b) Calculate joint probabilities (c) Remove dependencies (d) Ignore evidence 61 6 Reasoning Over Time in Artificial Intelligence 6.1 Markov Models Definition: Markov models are probabilistic models that describe systems evolving over time. The key assumption is that the future state depends only on the current state and not on the past states (Markov Property). Mathematical Representation: P (Xt |Xt−1 , Xt−2 ,..., X0 ) = P (Xt |Xt−1 ) Example: Weather prediction: States: Sunny, Rainy. Transition probabilities: P (Sunnyt+1 |Sunnyt ) = 0.8, P (Rainyt+1 |Sunnyt ) = 0.2. 6.2 Markov Processes 6.3 First-order Markov Processes Definition: A stochastic process where the probability of transitioning to the next state depends only on the current state. Example: A robot moving between rooms with probabilities based on its cur- rent location. 6.4 Sensor Markov Assumption Definition: Observations at time t depend only on the state at time t. Example: A robot’s sensor detects walls or doors, influenced only by its current room. 6.5 Stationarity Assumption Definition: The transition probabilities do not change over time. Example: The probability of transitioning from Sunny to Rainy remains con- sistent across days. 6.6 Inference Tasks 6.7 Prediction Definition: Compute the probability distribution of future states given the current state. Example: Predict tomorrow’s weather given today’s weather. 62 6.8 Filtering Definition: Estimate the current state given all past observations. Example: Determining a robot’s location based on sensor readings up to the present. 6.9 Smoothing Definition: Infer past states given observations up to the current time. 
Example: Retrospectively determining a vehicle's path using GPS data.
6.10 Most Likely Explanation
Definition: Find the sequence of states that best explains the observations.
Example: Identifying the sequence of user actions based on observed clicks on a website.
6.11 Bayes Filter
Definition: A recursive algorithm used to estimate the belief state P (Xt | e1:t ), where Xt is the state and e1:t are the observations up to time t.
Steps: 1. Predict Step: P̂ (Xt ) = Σ_{Xt−1} P (Xt | Xt−1 ) P (Xt−1 ). 2. Update Step: P (Xt ) = η P (et | Xt ) P̂ (Xt ).
Example: Tracking a car's position using GPS data, where the belief is updated with each sensor reading.
6.12 Forward-Backward Algorithm
Definition: An algorithm for smoothing in Hidden Markov Models. It computes the probability of states at each time step using forward and backward messages.
Steps: Forward Pass: Compute P (e1:t , Xt ) recursively. Backward Pass: Compute P (et+1:T | Xt ) recursively.
Example: Estimating the path of a hurricane using past and future weather data.
6.13 Hidden Markov Models (HMMs)
Definition: A statistical model where the system's states are hidden, but observations depend probabilistically on these states.
Structure: States: Hidden variables (Xt ). Observations: Visible variables (et ). Transition Model: P (Xt | Xt−1 ). Sensor Model: P (et | Xt ).
Example: Speech recognition: States: Words or phonemes. Observations: Acoustic signals. Transition Model: Probability of one phoneme following another. Sensor Model: Probability of a signal given a phoneme.
6.14 Kalman Filter
Definition: An algorithm for linear systems with Gaussian noise. It estimates the system's state and updates it using predictions and observations.
Steps: 1. Prediction: Estimate the next state. 2. Update: Adjust the estimate using the observation.
Example: Apollo spacecraft navigation used Kalman filters to estimate the spacecraft's position and velocity.
6.15 Particle Filter
Definition: A non-parametric approach to approximate the posterior distribution using a set of weighted samples (particles).
Process: 1. Sample particles from the prior distribution. 2. Update weights based on observations. 3. Resample particles to maintain diversity.
Example: Robot localization, where particles represent potential positions of the robot.
6.16 Dynamic Bayesian Networks (DBNs)
Definition: A generalization of HMMs that allows multiple state variables and complex dependencies.
Components: Nodes: Represent variables at each time step. Edges: Represent temporal dependencies.
Example: Modeling gene regulation, where variables represent gene expressions at different time points.
6.17 Applications
ICU Monitoring: Use temporal models to monitor patient vitals and predict critical events.
Weather Forecasting: Use HMMs or DBNs to predict future weather conditions based on historical data.
Example: Predicting rainfall intensity using past observations and atmospheric pressure readings.
6.18 Summary
Temporal Models: Use probabilistic reasoning to handle changes over time. Belief State Maintenance: Update beliefs with new evidence. Inference Tasks: Filtering, prediction, smoothing, and finding the most likely explanation. Transition and Sensor Models: Core to all reasoning over time.
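A minimal Python sketch of the filtering recursion (the Bayes filter of Section 6.11) applied to the Sunny/Rainy chain from Section 6.1. The Sunny-row transition probabilities match the example above; the Rainy-row probabilities and the umbrella sensor model are assumptions invented for this sketch.

STATES = ("Sunny", "Rainy")

# Transition model P(X_t | X_{t-1}); the Sunny row is from Section 6.1, the Rainy row is assumed.
TRANSITION = {
    "Sunny": {"Sunny": 0.8, "Rainy": 0.2},
    "Rainy": {"Sunny": 0.4, "Rainy": 0.6},
}

# Sensor model P(e_t | X_t) for an "umbrella seen?" observation (assumed numbers).
SENSOR = {
    "Sunny": {True: 0.1, False: 0.9},
    "Rainy": {True: 0.8, False: 0.2},
}

def bayes_filter_step(belief, observation):
    # Predict: sum over previous states of P(X_t | x_{t-1}) * P(x_{t-1})
    predicted = {x: sum(TRANSITION[prev][x] * belief[prev] for prev in STATES)
                 for x in STATES}
    # Update: multiply by P(e_t | X_t) and renormalize with eta
    unnormalized = {x: SENSOR[x][observation] * predicted[x] for x in STATES}
    eta = 1.0 / sum(unnormalized.values())
    return {x: eta * p for x, p in unnormalized.items()}

belief = {"Sunny": 0.5, "Rainy": 0.5}   # uniform initial belief
for umbrella in (True, True, False):    # a short illustrative observation sequence
    belief = bayes_filter_step(belief, umbrella)
    print(belief)

Each call performs one predict step followed by one update step and renormalizes, so the returned dictionary is the belief state P(Xt | e1:t) after the latest observation.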
6.19 Multiple Choice Questions
1. A Markov Model assumes: (a) The future depends on all past states. (b) The future depends only on the current state. ⋆ (c) The future is independent of the current state. (d) All states are equally probable.
2. In a first-order Markov process: (a) Transition probabilities depend on all previous states. (b) Transition probabilities depend only on the current state. ⋆ (c) Transition probabilities are fixed over time. (d) Observations depend on future states.
3. The Sensor Markov Assumption states that: (a) Observations depend on past and future states. (b) Observations depend only on the current state. ⋆ (c) Observations are independent of the state. (d) Observations depend on all previous states.
4. What does the stationarity assumption imply? (a) Probabilities change over time. (b) Transition probabilities remain constant over time. ⋆ (c) Observations depend on future states. (d) States are uniformly distributed.
5. Which of the following is an inference task in temporal reasoning? (a) Sorting states. (b) Filtering. ⋆ (c) Labeling variables. (d) Calculating CPTs.
6. Prediction involves: (a) Estimating past states given observations
