Introduction to Artificial Intelligence
Document Details
University of Liège
2024
Pouria Katouzian
Summary
This document is an introduction to artificial intelligence presented as a bank of multiple-choice questions with marked answers. It is organized into sections on intelligent agents, games and adversarial search, solving problems by searching, probabilistic reasoning, reasoning over time, machine learning and neural networks, and reinforcement learning.
Full Transcript
Introduction to Artificial Intelligence Pouria Katouzian October 2024 1 Contents 1 Intelligent Agents 3 1.1 Multiple Choice Questions...................... 3 2 Games and Adversarial Search 8 2.1 Introduction to Adversarial Search................. 8 3 Solving Problems by Searching 17 3.1 Multiple-Choice Questions...................... 17 4 Probabilistic Reasoning - Detailed Explanation and Examples 26 4.1 Multiple Choice Questions on Probabilistic Reasoning...... 26 5 Reasoning Over Time in Artificial Intelligence 35 5.1 Multiple Choice Questions...................... 35 6 Machine Learning and Neural Networks 40 6.1 Multiple Choice Questions: Machine Learning and Neural Networks 40 7 Introduction to Reinforcement Learning (RL) 45 7.1 Multiple-Choice Questions on Reinforcement Learning...... 45 8 Detailed Explanation of Reinforcement Learning Topics 50 8.1 Multiple-Choice Questions on Reinforcement Learning...... 50 2 1 Intelligent Agents 1.1 Multiple Choice Questions 1. Which of the following are characteristics of planning agents? (a) They act without considering future consequences. (b) They generate sequences of actions to achieve a goal. * (c) They always use condition-action rules. (d) They use a model of the environment to predict future states. * 2. Which search methods are classified as uninformed search methods? (a) A* search (b) Depth-first search * (c) Breadth-first search * (d) Uniform-cost search * 3. A* search uses which of the following to determine the next node to ex- plore? (a) Depth of the node (b) Total path cost (g(n)) * (c) Heuristic estimate to the goal (h(n)) * (d) Random selection of nodes 4. Which of the following are true for reflex agents? (a) They rely on condition-action rules to make decisions. * (b) They can plan sequences of actions to reach a goal. (c) They act based on the current percept. * (d) They always consider the future consequences of their actions. 5. What are some characteristics of problem-solving agents? (a) They consider the consequences of their actions. * (b) They assume the environment is known and deterministic. * (c) They always use a random search strategy. (d) They take decisions based on the hypothesized future out- comes of actions. * 3 6. Which of the following is NOT a characteristic of a problem-solving agent? (a) Considers future actions (b) Acts only on the current percept * (c) Uses a model of the world (d) Operates in a known environment 7. What are the key components of a search problem? (a) Initial state * (b) Actions * (c) Goal state * (d) Feedback from the environment 8. Depth-first search: (a) Always finds the shortest path (b) Explores the deepest nodes first * (c) Is guaranteed to find a solution if one exists * (d) Uses a queue to track nodes 9. What is the primary difference between uninformed and informed search methods? (a) Uninformed search has no knowledge of the goal location (b) Informed search methods use heuristics * (c) Uninformed search methods are faster (d) Informed search methods expand nodes randomly 10. Heuristics are used in informed search methods to: (a) Estimate the cost from a node to the goal * (b) Reduce the size of the search space * (c) Guarantee finding the shortest path (d) Select actions randomly 11. 
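Questions 8 and 12 above turn on the data structure behind each uninformed search: breadth-first search expands nodes from a FIFO queue and therefore finds a shortest path in an unweighted graph, while depth-first search expands the deepest node first. A minimal Python sketch, using a small made-up adjacency-list graph (the graph and node names are illustrative, not from the text):

from collections import deque

def breadth_first_search(graph, start, goal):
    # Uninformed search: expand the shallowest unexplored node first.
    # The FIFO queue is what makes this breadth-first; in an unweighted
    # graph the first path that reaches the goal has the fewest edges.
    frontier = deque([[start]])
    explored = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in explored:
                explored.add(neighbor)
                frontier.append(path + [neighbor])
    return None

# Illustrative graph; popping from the right end instead (a LIFO stack)
# would give depth-first search, which does not guarantee a shortest path.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(breadth_first_search(graph, "A", "E"))   # ['A', 'B', 'D', 'E']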
In the A* algorithm, the cost function f (n) is a combination of: (a) The depth of the node and the branching factor (b) The path cost to reach the node and the estimated cost to reach the goal * (c) The number of explored nodes and the current node’s depth (d) Random selection and exploration 4 12. Breadth-first search is guaranteed to find the shortest path in which type of graph? (a) Weighted graphs (b) Unweighted graphs * (c) Directed graphs (d) Cyclic graphs 13. Which of the following are types of uninformed search methods? (a) Uniform-cost search * (b) A* search (c) Depth-first search * (d) Breadth-first search * 14. Planning agents differ from reflex agents because: (a) They consider only the current percept (b) They plan sequences of actions to reach a goal * (c) They act based on condition-action rules (d) They use no information about the future state 15. In a uniform-cost search, nodes are expanded based on: (a) The number of nodes in the tree (b) The lowest path cost * (c) The depth of the node (d) The estimated distance to the goal 16. Reflex agents: (a) Always consider the future consequences of their actions (b) Act based on condition-action rules * (c) Use heuristics to find the best action (d) Can solve complex planning problems 17. Which of the following are examples of informed search? (a) Breadth-first search (b) Depth-first search (c) A* search * (d) Greedy best-first search * 5 18. The goal of a rational agent is to: (a) Maximize its performance based on a given performance measure * (b) Randomly explore the environment (c) Always minimize the number of actions (d) Always explore the entire search space 19. In problem-solving agents, what assumptions are typically made about the environment? (a) It is observable and deterministic * (b) It is multi-agent and stochastic (c) It is unknown and unpredictable (d) It is single-agent and fully known * 20. Which of the following are examples of uninformed search? (a) Uniform-cost search * (b) A* search (c) Depth-first search * (d) Iterative deepening search * 21. A rational agent acts to: (a) Minimize the number of actions (b) Maximize the expected outcome based on its performance measure * (c) Randomly select actions to explore new possibilities (d) Always achieve the highest possible reward in every situation 22. A* search is considered optimal when: (a) The path cost function g(n) is greater than the heuristic h(n) (b) The heuristic h(n) is admissible (it never overestimates the true cost) * (c) It expands nodes in order of their depth (d) It selects the node with the lowest heuristic value at every step 6 23. Problem-solving agents are different from reflex agents because: (a) They do not consider future consequences (b) They use a search strategy to plan sequences of actions * (c) They cannot act in a known environment (d) They do not use percepts to make decisions 24. A reflex agent can be described as: (a) Always planning the next sequence of actions (b) Acting based on current percepts only * (c) Using heuristics to make decisions (d) Always considering future possibilities 25. In a search problem, the goal state is defined as: (a) The desired outcome the agent is trying to reach * (b) The state that minimizes the total path cost (c) The first state explored by the search algorithm (d) The state with the fewest actions 7 2 Games and Adversarial Search 2.1 Introduction to Adversarial Search 1. 
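Question 11 above defines the A* evaluation function as f(n) = g(n) + h(n), and question 22 notes that A* is optimal when h is admissible. A short sketch of that idea, where the grid, the unit step costs, and the Manhattan-distance heuristic are assumptions chosen for the example:

import heapq

def a_star(neighbors, start, goal, h):
    # Expand the node with the lowest f(n) = g(n) + h(n), where g(n) is the
    # path cost so far and h(n) estimates the remaining cost. With an
    # admissible h (never overestimates), the returned cost is optimal.
    frontier = [(h(start), 0, start)]
    best_g = {start: 0}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        for nxt, step_cost in neighbors(node):
            new_g = g + step_cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt))
    return None

# Illustrative 3x3 grid with unit step costs; Manhattan distance is an
# admissible heuristic here because it never overestimates the true cost.
def grid_neighbors(cell):
    x, y = cell
    moves = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [((a, b), 1) for a, b in moves if 0 <= a < 3 and 0 <= b < 3]

goal = (2, 2)
manhattan = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])
print(a_star(grid_neighbors, (0, 0), goal, manhattan))   # 4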
A game like tic-tac-toe is an example of: (a) A cooperative search (b) An adversarial search ⋆ (c) A multi-player search (d) A random search 2. Multi-agent environments involve: (a) Only one player (b) Players with aligned goals (c) Multiple players, often with opposing goals ⋆ (d) Agents who ignore each other 3. Which of the following is a deterministic game with perfect information? (a) Poker (b) Risk (c) Chess ⋆ (d) Monopoly 8 4. In the formal definition of a game, terminal states are: (a) The initial setup of the game (b) The states where the game ends ⋆ (c) Moves that lead to winning (d) States that occur in the middle of the game 5. In zero-sum games: (a) One player’s gain is another player’s loss ⋆ (b) All players gain equally (c) The total utility can be positive (d) All moves have the same value 6. The minimax algorithm is primarily used for: (a) Finding the optimal move in zero-sum games ⋆ (b) Maximizing random outcomes (c) Determining probabilities (d) Constructing game trees 7. Minimax assumes that: (a) Players will make random moves (b) Both players play optimally ⋆ (c) Only the first player plays optimally (d) All players cooperate 8. Which of the following is true about minimax? (a) It explores the entire game tree for optimal moves ⋆ (b) It only explores a subset of the game tree (c) It always finds a winning move (d) It works best with non-deterministic games 9. The time complexity of minimax is: (a) O(m) (b) O(bm ) ⋆ (c) O(b × m) (d) O(b/m) 9 10. Alpha-beta pruning improves minimax by: (a) Pruning branches that won’t affect the final decision ⋆ (b) Exploring the entire game tree (c) Ignoring opponent moves (d) Focusing only on random moves 11. Alpha-beta pruning is most effective when: (a) Moves are evaluated randomly (b) Moves are evaluated in the best possible order ⋆ (c) Moves are evaluated last (d) The game tree is small 12. Alpha-beta pruning can reduce the time complexity to approximately: (a) O(bm/2 ) ⋆ (b) O(bm ) (c) O(m) (d) O(b) 13. The game tree size is influenced by: (a) The players’ choices (b) The branching factor and the depth ⋆ (c) The initial state only (d) Terminal states only 14. Transposition tables are used to: (a) Store previously computed positions to avoid redundancy ⋆ (b) Change the order of moves (c) Add randomness to the game (d) Estimate the game tree size 15. Imperfect real-time decisions are required in: (a) Turn-based games only (b) Static games (c) Real-time strategy games ⋆ (d) Deterministic games 10 16. Evaluation functions are necessary when: (a) The game is simple (b) The search cannot reach terminal states within time limits ⋆ (c) The game has few moves (d) Alpha-beta pruning is used 17. A good evaluation function in tic-tac-toe might: (a) Assign higher scores for rows with two marks and an open space ⋆ (b) Only evaluate completed rows (c) Ignore diagonal rows (d) Count all empty spaces 18. Quiescence search aims to: (a) Extend search at unstable positions to avoid misleading eval- uations ⋆ (b) Shorten the game tree (c) Increase the branching factor (d) Limit evaluation functions 19. The horizon effect occurs when: (a) An algorithm cannot see beyond a certain depth ⋆ (b) The evaluation function is perfect (c) Terminal states are reached (d) The branching factor is low 20. Multi-agent games often involve: (a) Multiple players who may form alliances or betray each other ⋆ (b) Only two players (c) Players with the same objectives (d) No competitive elements 21. Which of the following is an example of a stochastic game? 
(a) Chess or Checkers (b) Tic-tac-toe (c) Monopoly ⋆ 11 22. The expectiminimax algorithm is used in games that: (a) Involve randomness and probabilistic outcomes ⋆ (b) Are deterministic (c) Have perfect information (d) Only involve two players 23. In backgammon, expectiminimax accounts for: (a) Dice rolls as chance nodes ⋆ (b) Only player decisions (c) The number of pieces (d) Non-probabilistic outcomes 24. Monte Carlo Tree Search (MCTS) relies on: (a) Deterministic moves (b) Random sampling to explore game states ⋆ (c) Only known positions (d) Alpha-beta pruning 25. Exploration in MCTS refers to: (a) Trying new moves to discover their potential ⋆ (b) Repeating known moves (c) Focusing on the best-known move (d) Minimizing the search area 26. State-of-the-art game programs often use: (a) Techniques like MCTS, alpha-beta pruning, and neural net- works ⋆ (b) Only minimax (c) Pure random strategies (d) No heuristic methods 27. AlphaGo Zero differs from AlphaGo by: (a) Learning entirely through self-play without human data ⋆ (b) Relying on human games for training (c) Using only minimax (d) Not using any neural networks 12 28. Adversarial search is essential in: (a) Scenarios where agents compete to minimize each other’s utility ⋆ (b) Cooperative games only (c) Single-player puzzles (d) Random guessing games 29. In a zero-sum game, if Player A gains 5 points: (a) Player B loses 5 points ⋆ (b) Player B also gains 5 points (c) The total points remain unaffected (d) Player B gains no points 30. The branching factor in a game tree refers to: (a) The average number of moves available at each node ⋆ (b) The number of players (c) The depth of the tree (d) The utility of moves 31. Which of the following best describes AlphaGo’s training approach? (a) It used human games combined with reinforcement learning ⋆ (b) It relied solely on random moves (c) It required no prior game data (d) It used only minimax 32. Monte Carlo Tree Search uses exploration and exploitation to: (a) Balance between trying new moves and using known good moves ⋆ (b) Avoid randomness (c) Focus only on terminal states (d) Reduce the game tree size 33. An evaluation function in chess might consider: (a) Material balance and piece positioning ⋆ (b) Only the number of moves (c) Random positions (d) Terminal states only 13 34. Which of the following is true about transposition tables? (a) They help avoid recalculating positions that have been pre- viously evaluated ⋆ (b) They store all possible moves (c) They increase the game tree size (d) They limit the number of players 35. Quiescence search prevents: (a) Misleading evaluations due to imminent significant moves ⋆ (b) The use of evaluation functions (c) The horizon effect (d) Opponent predictions 36. A horizon effect in chess might result in: (a) Missing a critical move just beyond the search depth ⋆ (b) Calculating all moves perfectly (c) Avoiding checkmate (d) Overestimating the search depth 37. In multi-agent games like Risk, players may: (a) Form temporary alliances and later compete for individual goals ⋆ (b) Always cooperate (c) Share all resources equally (d) Ignore each other 38. Stochastic games involve elements of: (a) Randomness and chance ⋆ (b) Perfect information (c) Single-player actions (d) Deterministic outcomes only 39. The expectiminimax algorithm is essential in games like: (a) Poker, where there is both chance and decision-making ⋆ (b) Chess, which is fully deterministic (c) Tic-tac-toe, with no randomness (d) Checkers, with perfect information 14 40. 
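The minimax and alpha-beta questions above (optimal play in zero-sum games, O(b^m) time for plain minimax, roughly O(b^(m/2)) with good move ordering) can be sketched as follows; the two-ply game tree and its leaf utilities are made up for illustration:

def alphabeta(state, alpha, beta, maximizing, children, value):
    # Minimax with alpha-beta pruning: returns the value of the state
    # assuming both players play optimally. Branches that cannot change
    # the final decision are cut off, which with good move ordering reduces
    # the O(b^m) cost of plain minimax to roughly O(b^(m/2)).
    kids = children(state)
    if not kids:
        return value(state)
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, alpha, beta, False, children, value))
            alpha = max(alpha, best)
            if alpha >= beta:        # MIN will never let play reach here
                break
        return best
    best = float("inf")
    for child in kids:
        best = min(best, alphabeta(child, alpha, beta, True, children, value))
        beta = min(beta, best)
        if alpha >= beta:            # MAX already has a better option elsewhere
            break
    return best

# Made-up two-ply game tree: leaves hold utilities for the MAX player.
tree = {"root": ["L", "R"], "L": [3, 5], "R": [2, 9]}
children = lambda s: tree.get(s, []) if isinstance(s, str) else []
value = lambda s: s                  # leaves are their own utility
print(alphabeta("root", float("-inf"), float("inf"), True, children, value))  # 3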
Monte Carlo Tree Search improves over time by: (a) Running more simulations to better estimate move values ⋆ (b) Reducing the branching factor (c) Ignoring unknown moves (d) Focusing only on winning moves 41. Alpha-beta pruning is more effective when: (a) Moves are evaluated in an optimal order ⋆ (b) Moves are selected randomly (c) Terminal states are not reached (d) Players make suboptimal decisions 42. Which of the following is a characteristic of adversarial search in AI? (a) It involves agents trying to maximize their gain while min- imizing their opponent’s ⋆ (b) It involves no competition (c) It only applies to cooperative games (d) It always requires human input 43. The horizon effect can be reduced by: (a) Extending the search at critical points using quiescence search ⋆ (b) Limiting the search depth (c) Focusing on terminal states only (d) Increasing the branching factor 44. Which is a state-of-the-art game AI that uses MCTS? (a) AlphaGo ⋆ (b) Deep Blue (c) IBM Watson (d) Chinook 45. AlphaGo Zero’s learning approach involved: (a) Self-play with reinforcement learning, without human data ⋆ (b) Training on historical games only (c) Using a single heuristic (d) Relying on minimax exclusively 15 46. Transposition tables are particularly useful in games with: (a) Many repeated states reached through different sequences of moves ⋆ (b) Only two players (c) Simple move sets (d) No chance elements 47. In MCTS, exploitation involves: (a) Using moves known to be good based on previous simula- tions ⋆ (b) Trying random moves (c) Increasing the branching factor (d) Avoiding terminal states 48. The use of adversarial search extends beyond games to fields like: (a) Cybersecurity, where AI must counteract attackers ⋆ (b) Basic arithmetic problems (c) Linear programming (d) Non-competitive fields 49. Which of the following AI systems demonstrated state-of-the-art play in chess? (a) Deep Blue ⋆ (b) AlphaGo (c) AlphaGo Zero (d) IBM Watson 16 3 Solving Problems by Searching 3.1 Multiple-Choice Questions 1. What is a discrete random variable? (a) A variable that can take any value within a range (b) A variable with a finite number of distinct values ⋆ (c) A variable that represents continuous data (d) A variable with no defined probability distribution 2. A probability distribution for a continuous random variable is called: (a) Probability Mass Function (PMF) (b) Probability Density Function (PDF) ⋆ (c) Cumulative Distribution Function (CDF) (d) Frequency Distribution 3. In the context of Bayesian inference, P (A | B) represents: (a) The joint probability of A and B (b) The conditional probability of B given A (c) The conditional probability of A given B ⋆ (d) The probability of A 4. Two events are independent if: (a) P (A ∩ B) = P (A) + P (B) (b) P (A | B) = P (A) ⋆ (c) P (A | B) = P (B | A) (d) P (A ∩ B) = 0 5. The Bayes’ Rule is used to: (a) Find the probability of an event given its complement (b) Update prior beliefs with new evidence ⋆ (c) Calculate joint probabilities (d) Determine independence between two events 6. Which of the following reflects a frequentist view of probability? (a) Probability represents a degree of belief (b) Probability is a measure of ignorance (c) Probability is the long-run frequency of events ⋆ (d) Probability can change with new information 17 7. 
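Question 5 in this section describes Bayes' Rule as updating prior beliefs with new evidence, i.e. P(A | B) = P(B | A) P(A) / P(B). A small worked example in Python, with invented test-accuracy numbers:

def posterior(prior, likelihood, false_positive_rate):
    # Bayes' Rule: P(H | e) = P(e | H) P(H) / P(e), where the evidence term
    # P(e) is obtained by summing over both hypotheses.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Invented numbers: 1% prior (prevalence), 90% sensitivity, 5% false positives.
print(round(posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05), 3))
# 0.154: the 1% prior is updated upward by the positive evidence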
Kolmogorov’s second axiom states that: (a) The probability of the sample space is 1 (b) The probability of any event is non-negative (c) The probability of mutually exclusive events is additive ⋆ (d) The probability of complementary events sums to 1 8. A marginal distribution can be obtained by: (a) Dividing joint probabilities by conditional probabilities (b) Summing over the conditional distributions (c) Summing the probabilities over all possible values of the other variables ⋆ (d) Multiplying the probabilities of independent events 9. If P (A) = 0.3, P (B) = 0.5, and A and B are independent, then P (A ∩ B) =: (a) 0.15 ⋆ (b) 0.35 (c) 0.6 (d) 0.8 10. In a probability distribution, the sum of all probabilities must equal: (a) 0 (b) 1 ⋆ (c) Any positive value (d) Any real number 11. The Principle of Maximum Expected Utility suggests that agents should: (a) Minimize risk (b) Maximize the probability of an outcome (c) Choose actions with the highest expected utility ⋆ (d) Avoid uncertain situations 12. A conditional distribution describes: (a) The distribution of multiple variables (b) The distribution of one variable given the value of another ⋆ (c) The sum of probabilities across all variables (d) Independent events 18 13. Which of the following is true for a Naive Bayes Model? (a) All features are dependent given the class (b) It requires a large amount of training data (c) It assumes all features are conditionally independent given the class ⋆ (d) It does not use conditional probabilities 14. Which method can be used to find P (A | B) using P (B | A), P (A), and P (B)? (a) Kolmogorov’s axioms (b) Chain Rule (c) Bayes’ Rule ⋆ (d) Product Rule 15. Probabilistic inference is used to: (a) Predict future events without using probabilities (b) Make decisions based on observed data ⋆ (c) Measure the accuracy of a prediction (d) Increase uncertainty in an AI system 16. The Chain Rule allows us to: (a) Calculate marginal distributions (b) Express a joint distribution as a product of conditional prob- abilities ⋆ (c) Combine two independent events (d) Update beliefs with new evidence 17. In Bayesian inference, a posterior probability: (a) Is calculated without any prior knowledge (b) Remains constant regardless of new evidence (c) Is updated based on observed evidence ⋆ (d) Represents the probability of new evidence given the hypothesis 18. In probabilistic terms, ”ignorance” can contribute to uncertainty because: (a) It increases the number of random variables (b) It affects the probabilities of events being unknown ⋆ (c) It forces all events to be treated as independent (d) It changes conditional probabilities into joint probabilities 19 19. Which of the following is NOT a property of conditional independence? (a) P (A | B, C) = P (A | C) (b) P (B | A, C) = P (B | C) (c) P (A, B | C) = P (A | C) · P (B | C) (d) P (A, B) = P (A | B) · P (B) ⋆ 20. A joint distribution provides: (a) The probability of each event separately (b) A summary of the expected utility for each action (c) The probability of all combinations of multiple random vari- ables ⋆ (d) The probability of one event given another 21. An event with a probability of 0: (a) Is impossible ⋆ (b) Has occurred (c) Is guaranteed (d) Is highly likely 22. Frequentism interprets probability as: (a) A measure of uncertainty (b) The likelihood of an event based on subjective belief (c) A degree of confidence (d) The long-run frequency of occurrence of an event ⋆ 23. 
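Questions 8, 9, and 20 above cover joint distributions, marginalization, and independence. A minimal sketch with an invented joint table over two binary variables (the variable names and probabilities are illustrative only):

# Invented joint distribution P(Rain, Traffic) over two binary variables.
joint = {(True, True): 0.24, (True, False): 0.06,
         (False, True): 0.28, (False, False): 0.42}

def p_rain(r):
    # Marginal P(Rain = r): sum the joint over all values of Traffic.
    return sum(p for (rain, _), p in joint.items() if rain == r)

def p_traffic_given_rain(t, r):
    # Conditional P(Traffic = t | Rain = r) = P(r, t) / P(r)  (product rule).
    return joint[(r, t)] / p_rain(r)

print(round(p_rain(True), 2))                        # 0.3
print(round(p_traffic_given_rain(True, True), 2))    # 0.24 / 0.3 = 0.8
# Independence would require P(Rain) * P(Traffic) to equal P(Rain, Traffic):
p_traffic = joint[(True, True)] + joint[(False, True)]
print(p_rain(True) * p_traffic == joint[(True, True)])   # False, so dependent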
The Product Rule states that P (A ∩ B) can be calculated as: (a) P (A) + P (B | A) (b) P (A) · P (B) (c) P (A) · P (B | A) ⋆ (d) P (A | B) + P (B) 24. Which of the following represents a probability mass function (PMF)? (a) P (X) = 1 − e−x for x ≥ 0 λk e−λ (b) P (X = k) = k! for k = 0, 1, 2,... ⋆ 2 (c) P (X = x) = x (d) P (X = x) = sin(x) 20 25. In a probabilistic model, P (A | B) = 0.8 means: (a) There’s an 80% chance that B occurs if A does (b) There’s an 80% chance that A occurs if B does ⋆ (c) A and B are independent events (d) A and B are mutually exclusive 26. The Naive Bayes model works well in practice because: (a) It assumes complete dependence among features (b) It simplifies calculations by assuming conditional indepen- dence ⋆ (c) It only works with binary data (d) It does not require a large dataset 27. Conditional probability can be found using which rule? (a) Kolmogorov’s second axiom (b) Chain Rule (c) Product Rule ⋆ (d) Independence Rule 28. A probability density function (PDF) applies to: (a) Discrete random variables (b) Continuous random variables ⋆ (c) Both discrete and continuous variables (d) Non-random variables 29. Which of the following is an example of an inference problem? (a) Predicting tomorrow’s temperature ⋆ (b) Measuring current temperature (c) Checking if a statement is true (d) Calculating the sum of two numbers 30. Kolmogorov’s first axiom states that: (a) P (A) ≥ 0 for any event A ⋆ (b) P (Ω) = 1 (c) P (A ∪ B) = P (A) + P (B) for mutually exclusive events (d) P (A | B) = P (B | A) 21 31. If P (A ∪ B) = P (A) + P (B), then: (a) A and B are independent (b) A and B are mutually exclusive ⋆ (c) A and B are conditionally independent (d) A and B are always true 32. An agent using the Maximum Expected Utility principle should: (a) Choose the action with the highest utility only (b) Choose the action with the highest probability of success (c) Choose the action with the highest expected utility, consid- ering all possible outcomes ⋆ (d) Avoid uncertain outcomes 33. The cumulative distribution function (CDF) represents: (a) The likelihood of a specific outcome (b) The probability that a variable takes on a value less than or equal to a certain value ⋆ (c) The sum of all probabilities (d) The joint probability of multiple variables 34. Which rule can simplify the calculation of a joint probability if events are conditionally independent? (a) Chain Rule (b) Bayes’ Rule (c) Independence Rule (d) Conditional Independence ⋆ 35. Bayesian inference updates which of the following probabilities? (a) Prior probability (b) Marginal probability (c) Joint probability (d) Posterior probability ⋆ 36. A conditional probability distribution over random variables given fixed values for others is called: (a) Marginal Distribution (b) Conditional Distribution ⋆ (c) Joint Distribution or Independence 22 37. The naive Bayes model assumes: (a) Features are dependent on each other (b) Features are independent of each other given the class ⋆ (c) All features are binary (d) It does not use conditional probabilities 38. Probabilistic assertions allow an agent to: (a) Measure exact outcomes with certainty (b) Express uncertainty about propositions ⋆ (c) Guarantee outcomes (d) Calculate only joint probabilities 39. The marginal distribution of a random variable can be found by: (a) Dividing by the prior probability (b) Summing over all values of other variables in the joint dis- tribution ⋆ (c) Multiplying by conditional probabilities (d) Using the product rule 40. 
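The starred option in question 24 is the Poisson probability mass function, P(X = k) = λ^k e^(−λ) / k! for k = 0, 1, 2, .... A quick numeric check (λ = 2 is an arbitrary choice) confirms that the values are non-negative and sum to 1, as question 10 earlier requires of a probability distribution:

import math

def poisson_pmf(k, lam):
    # Probability mass function of a Poisson random variable:
    # P(X = k) = lam**k * exp(-lam) / k!   for k = 0, 1, 2, ...
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 2.0                                                 # arbitrary rate
print(round(sum(poisson_pmf(k, lam) for k in range(60)), 10))   # 1.0
print(round(poisson_pmf(0, lam), 4))                      # e**(-2), about 0.1353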
The process of calculating P (A ∪ B) when A and B are not mutually exclusive is given by: (a) P (A ∪ B) = P (A) + P (B) (b) P (A ∪ B) = P (A) + P (B) − P (A ∩ B) ⋆ (c) P (A ∪ B) = P (A) × P (B) (d) P (A ∪ B) = P (A | B) + P (B) 41. In probabilistic inference, evidence refers to: (a) Variables whose values are unknown (b) Variables with assigned probabilities based on observations ⋆ (c) A measure of prior belief (d) Probabilities that do not change with new data 42. In probabilistic terms, ignorance typically leads to: (a) Higher certainty (b) Greater uncertainty ⋆ (c) Independence (d) Reduced sample space 23 43. A joint probability distribution for two independent events can be written as: (a) P (A ∩ B) = P (A | B) · P (B) (b) P (A ∩ B) = P (A) · P (B) ⋆ (c) P (A ∩ B) = P (A) + P (B) (d) P (A ∩ B) = P (A) ÷ P (B) 44. In Bayesian inference, a prior probability represents: (a) Updated belief after observing evidence (b) The initial belief before any evidence is observed ⋆ (c) The probability of the evidence given the hypothesis (d) The likelihood of multiple events 45. A probability density function (PDF) is typically used to: (a) Calculate probabilities for discrete random variables (b) Define probabilities for continuous random variables ⋆ (c) Find joint probabilities (d) Estimate expected utility 46. Independence between two variables means: (a) P (A | B) = P (B | A) (b) P (A ∩ B) = 0 (c) P (A | B) = P (A) ⋆ (d) P (A) + P (B) = 1 47. Conditional independence is useful in probabilistic models because: (a) It reduces the number of parameters needed ⋆ (b) It always leads to independence between all variables (c) It eliminates the need for marginal distributions (d) It makes events mutually exclusive 48. A chain rule in probability is useful for: (a) Simplifying joint probabilities into conditional probabilities ⋆ (b) Calculating marginal probabilities only (c) Finding independence among events (d) Estimating prior probabilities 24 49. Which of the following can be an example of Bayesian inference? (a) Using a frequency table to find probabilities (b) Updating the likelihood of rain given new weather data ⋆ (c) Estimating the joint probability of multiple events (d) Summing probabilities across all possible outcomes 50. The Bayes’ Rule is foundational in AI because: (a) It measures random variables (b) It enables updating beliefs based on evidence ⋆ (c) It calculates exact probabilities without data (d) It identifies mutually exclusive events 25 4 Probabilistic Reasoning - Detailed Explana- tion and Examples 4.1 Multiple Choice Questions on Probabilistic Reasoning 1. In a Bayesian Network, nodes represent: (a) Variables that are independent (b) Random variables ⋆ (c) Only observable variables (d) Conditional probabilities 2. A Bayesian Network must be: (a) A directed cyclic graph (b) A directed acyclic graph (DAG) ⋆ (c) A directed acyclic graph (d) An undirected graph 3. The purpose of Conditional Probability Tables (CPTs) in Bayesian Net- works is to: (a) Show all possible outcomes (b) Specify probabilities for each node given its parents ⋆ (c) Determine variable independence (d) Organize the network structure 4. The joint probability distribution of a Bayesian Network is computed as: (a) The sum of all probabilities (b) The product of all probabilities (c) The product of the conditional probabilities of each variable given its parents ⋆ (d) The average of conditional probabilities 5. 
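Question 4 of this section states that the joint distribution of a Bayesian Network is the product of each variable's conditional probability given its parents; for the chain A → B → C used in the next question this is P(A, B, C) = P(A) · P(B|A) · P(C|B). A tiny numeric check with invented CPT values:

# Invented CPTs for binary variables in the chain A -> B -> C.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
p_c_given_b = {True: {True: 0.6, False: 0.4}, False: {True: 0.5, False: 0.5}}

def joint(a, b, c):
    # Factorization over the DAG: P(A, B, C) = P(A) * P(B | A) * P(C | B).
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# A valid joint distribution: the eight entries sum to 1.
total = sum(joint(a, b, c) for a in (True, False)
            for b in (True, False) for c in (True, False))
print(round(total, 10))                    # 1.0
print(round(joint(True, True, True), 3))   # 0.3 * 0.8 * 0.6 = 0.144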
In a Bayesian Network, if A → B → C, the joint probability is represented as: (a) P (A, B, C) = P (A) + P (B|A) + P (C|B) (b) P (A, B, C) = P (A) × P (B|A) × P (C|B) ⋆ (c) P (A, B, C) = P (A|B) × P (B|C) (d) P (A, B, C) = P (C|B) × P (B) 26 6. Constructing a Bayesian Network requires which of the following steps? (a) Identifying variables and assigning probabilities (b) Building conditional probabilities randomly (c) Identifying relevant variables, defining dependencies, and specifying CPTs ⋆ (d) Testing all possible network structures 7. Which of the following represents conditional independence in Bayesian Networks? (a) A and B are independent if they are in the same network (b) A and C are conditionally independent given B in A → B → C ⋆ (c) A and C are always independent (d) A and B are dependent only if B is observed 8. In a Bayesian Network, inference involves: (a) Removing variables from the network (b) Calculating probabilities of variables given evidence ⋆ (c) Adding nodes to increase accuracy (d) Ignoring conditional dependencies 9. The process of determining CPTs from data is known as: (a) Model building (b) Parameter learning ⋆ (c) Inference (d) Variable elimination 10. Which of the following statements is true about Bayesian Networks? (a) They can contain cycles (b) All variables must have the same number of parents (c) They represent uncertain knowledge using probabilities ⋆ (d) They do not account for conditional independence 11. Inference by Enumeration is: (a) An approximation method (b) An exact inference method by summing over all non-evidence variables ⋆ (c) Used only for discrete variables or Faster than approximate methods 27 12. In Variable Elimination, the process involves: (a) Summing out variables one by one ⋆ (b) Eliminating variables randomly (c) Summing all variables simultaneously (d) Using d-separation to find independent variables 13. d-Separation helps determine: (a) The causal relationship between variables (b) The strength of dependence between variables (c) Conditional independence given a set of variables ⋆ (d) Which variables to eliminate in inference 14. Which structure in d-Separation implies A and C are conditionally inde- pendent given B? (a) Fork structure A ← B → C (b) Collider structure A → B ← C (c) Chain structure A → B → C ⋆ (d) No structure can imply this 15. Approximate Inference methods are used when: (a) The Bayesian Network is small (b) Exact inference is computationally expensive ⋆ (c) The network is acyclic (d) There is no observed evidence 16. An example of an approximate inference method is: (a) Enumeration (b) Monte Carlo sampling ⋆ (c) Variable Elimination (d) CPT estimation 17. Maximum Likelihood Estimation (MLE) aims to: (a) Minimize likelihood for observed data (b) Adjust prior probabilities only (c) Maximize the likelihood of observed data given the model ⋆ (d) Estimate unobserved variables 28 18. Parameter Learning can involve: (a) Removing dependencies between nodes (b) Using data to determine probabilities in CPTs ⋆ (c) Building new variables (d) Randomly assigning CPT values 19. Bayesian Parameter Learning differs from MLE because: (a) It includes prior knowledge in the estimation ⋆ (b) It ignores prior knowledge (c) It only works with large datasets (d) It does not use data 20. Which of the following is true about d-Separation? (a) It requires all nodes to be observed (b) It helps determine conditional independence by blocking paths ⋆ (c) It can only be used with exact inference (d) It requires a cyclic network 21. 
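Questions 11 and 12 above contrast inference by enumeration with variable elimination; the simpler of the two, enumeration, answers a query by summing the joint distribution over the hidden variables. For the same A → B → C chain, P(C | A = true) can be computed by summing out B; the CPT numbers below are again invented for illustration:

# Invented CPTs for the chain A -> B -> C (binary variables).
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
p_c_given_b = {True: {True: 0.6, False: 0.4}, False: {True: 0.5, False: 0.5}}

def p_c_given_a(a):
    # Inference by enumeration: sum the joint over the hidden variable B
    # for each value of the query variable C, then normalize.
    scores = {c: sum(p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
                     for b in (True, False))
              for c in (True, False)}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print({c: round(p, 2) for c, p in p_c_given_a(True).items()})
# P(C=True | A=True) = 0.8*0.6 + 0.2*0.5 = 0.58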
In a Bayesian Network, if A and C are independent given B, this means: (a) B is irrelevant to A and C (b) Observing B gives all necessary information about A and C ⋆ (c) A and C are always independent (d) B directly influences both A and C 22. Which of the following is a method of exact inference? (a) Variable Elimination ⋆ (b) Sampling (c) Monte Carlo (d) Bayesian estimation 23. Inference by Enumeration is: (a) Always faster than other methods (b) Accurate but computationally expensive ⋆ (c) Used only for continuous variables (d) Dependent on d-separation 29 24. Variable Elimination improves efficiency by: (a) Eliminating variables systematically to simplify calculations ⋆ (b) Using all variables in the final calculation (c) Relying on d-separation (d) Ignoring conditional dependencies 25. A Bayesian Network with the structure A → B → C implies: (a) A is conditionally independent of C given B (b) A directly affects B, which then affects C ⋆ (c) B is independent of A (d) C is directly affected by A 26. Maximum a Posteriori (MAP) Estimation uses: (a) Only observed data (b) Both observed data and prior beliefs ⋆ (c) Data for exact inference only (d) Only prior beliefs without data 27. Approximate inference is often used when: (a) The network has few variables (b) The network is too large for exact inference ⋆ (c) All variables are observed (d) Exact inference is sufficient 28. Bayesian Networks are well-suited for representing: (a) Uncertain knowledge with probabilistic dependencies ⋆ (b) Only deterministic events (c) Only observable knowledge (d) Linear relationships exclusively 29. d-Separation relies on: (a) CPT calculations (b) Blocking paths between nodes ⋆ (c) Variable elimination (d) Enumeration of all nodes 30 30. The difference between Bayesian and Maximum Likelihood Parameter Learning is that: (a) Bayesian learning uses only the data (b) Bayesian learning incorporates prior knowledge ⋆ (c) MLE uses priors (d) MLE ignores the data 31. If P (A) = 0.4, P (B|A) = 0.5, and P (B|¬A) = 0.2, in Bayesian terms: (a) A and B are conditionally independent (b) The conditional probabilities affect B given A ⋆ (c) A directly determines B (d) B determines A 32. An advantage of Bayesian Networks is: (a) They can model complex dependencies among variables ⋆ (b) They do not require CPTs (c) They are only useful for binary variables (d) They can be cyclic 33. Which of the following best describes MAP estimation? (a) It maximizes the posterior probability, considering both prior and observed data ⋆ (b) It ignores the prior (c) It only considers observed data (d) It only estimates priors 34. Variable Elimination is an efficient method because it: (a) Reduces computation by systematically removing variables ⋆ (b) Calculates all probabilities simultaneously (c) Ignores dependencies (d) Eliminates evidence variables only 35. The purpose of approximate inference is to: (a) Provide probability estimates when exact calculations are infeasible ⋆ (b) Ignore observed data (c) Sum probabilities across all variables (d) Calculate exact probabilities 31 36. The role of CPTs in Bayesian Networks is to: (a) Define the probability of each node given its parents ⋆ (b) Describe the network structure (c) Determine the number of nodes (d) Ensure acyclic connections 37. d-Separation is a method used to: (a) Determine conditional independence based on network struc- ture ⋆ (b) Simplify CPTs (c) Eliminate variables (d) Calculate exact probabilities 38. 
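Questions 26, 30, and 33 above contrast maximum likelihood estimation with MAP estimation, which folds in a prior. The standard coin-flip illustration (the observed counts and the Beta(2, 2) prior are arbitrary choices, not from the text) shows how the prior pulls the MAP estimate away from the pure-data MLE:

def mle(heads, flips):
    # Maximum likelihood: the estimate that uses only the observed data.
    return heads / flips

def map_estimate(heads, flips, a=2, b=2):
    # Maximum a posteriori with a Beta(a, b) prior on the coin bias:
    # the posterior is Beta(heads + a, tails + b); its mode is returned here.
    return (heads + a - 1) / (flips + a + b - 2)

heads, flips = 9, 10                          # arbitrary observed data
print(mle(heads, flips))                      # 0.9, data only
print(round(map_estimate(heads, flips), 3))   # 0.833, the prior pulls it toward 0.5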
The term “factorization” in Bayesian Networks refers to: (a) Simplifying CPTs (b) Breaking down joint probabilities into products of condi- tional probabilities ⋆ (c) Eliminating variables from CPTs (d) Creating new conditional dependencies 39. In Bayesian Networks, an edge between two nodes implies: (a) Independence (b) A direct probabilistic dependency ⋆ (c) Conditional independence (d) Complete uncertainty 40. A Bayesian Network can represent: (a) Both observable and hidden variables ⋆ (b) Only hidden variables (c) Only observable variables (d) Only dependent variables 41. MAP estimation combines: (a) CPTs and network structure (b) Prior information and observed data ⋆ (c) Exact and approximate inference (d) MLE and sampling methods 32 42. In a Fork structure, A ← B → C, A and C are: (a) Conditionally independent given B ⋆ (b) Conditionally dependent (c) Always dependent (d) Always independent 43. Approximate inference techniques often use: (a) Exact enumeration (b) Sampling methods like Monte Carlo ⋆ (c) Variable elimination only (d) CPT simplification 44. Which is true for Bayesian Parameter Learning? (a) It only uses observed data (b) It updates beliefs based on both data and prior knowledge ⋆ (c) It requires no prior information (d) It always finds exact probabilities 45. The Bayesian Network structure A → B ← C is known as: (a) Collider ⋆ (b) Chain (c) Fork (d) D-separator 46. Bayesian Networks allow: (a) Cycles to simplify dependencies (b) Probabilistic reasoning with conditional dependencies ⋆ (c) Exact reasoning only (d) Random structures 47. The purpose of d-Separation is to: (a) Determine which nodes are conditionally independent ⋆ (b) Build CPTs (c) Optimize inference (d) Eliminate sampling needs 33 48. A Bayesian Network can be used to: (a) Model uncertain events and their dependencies ⋆ (b) Ensure data is always accurate (c) Simplify exact inference (d) Remove conditional dependencies 49. Which best describes Variable Elimination? (a) A method that reduces complexity by summing out vari- ables in order ⋆ (b) A sampling method (c) Only used for small networks (d) Ignores evidence variables 50. In the context of Bayesian Networks, MAP estimation is used to: (a) Find the parameter values that maximize the posterior dis- tribution ⋆ (b) Calculate joint probabilities (c) Remove dependencies (d) Ignore evidence 34 5 Reasoning Over Time in Artificial Intelligence 5.1 Multiple Choice Questions 1. A Markov Model assumes: (a) The future depends on all past states. (b) The future depends only on the current state. ⋆ (c) The future is independent of the current state. (d) All states are equally probable. 35 2. In a first-order Markov process: (a) Transition probabilities depend on all previous states. (b) Transition probabilities depend only on the current state. ⋆ (c) Transition probabilities are fixed over time. (d) Observations depend on future states. 3. The Sensor Markov Assumption states that: (a) Observations depend on past and future states. (b) Observations depend only on the current state. ⋆ (c) Observations are independent of the state. (d) Observations depend on all previous states. 4. What does the stationarity assumption imply? (a) Probabilities change over time. (b) Transition probabilities remain constant over time. ⋆ (c) Observations depend on future states. (d) States are uniformly distributed. 5. Which of the following is an inference task in temporal reasoning? (a) Sorting states. (b) Filtering. ⋆ (c) Labeling variables. (d) Calculating CPTs. 6. 
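The first-order Markov assumption and the sensor Markov assumption in the questions above are exactly what make recursive filtering possible: each step needs only the previous belief and the current observation. A minimal discrete Bayes filter for a hypothetical two-state weather model (all transition and sensor probabilities are invented):

# Hypothetical two-state model: the hidden state is 'rain' or 'sun',
# the observation is whether an umbrella is seen.
states = ("rain", "sun")
transition = {"rain": {"rain": 0.7, "sun": 0.3},          # P(next | current)
              "sun": {"rain": 0.3, "sun": 0.7}}
sensor = {"rain": {"umbrella": 0.9, "no_umbrella": 0.1},  # P(obs | state)
          "sun": {"umbrella": 0.2, "no_umbrella": 0.8}}

def filter_step(belief, observation):
    # Predict with the transition model (first-order Markov assumption),
    # then update with the sensor model (sensor Markov assumption) and
    # normalize so the belief stays a probability distribution.
    predicted = {s: sum(belief[prev] * transition[prev][s] for prev in states)
                 for s in states}
    updated = {s: sensor[s][observation] * predicted[s] for s in states}
    total = sum(updated.values())
    return {s: v / total for s, v in updated.items()}

belief = {"rain": 0.5, "sun": 0.5}                        # uniform prior
for obs in ["umbrella", "umbrella", "no_umbrella"]:
    belief = filter_step(belief, obs)
print({s: round(p, 3) for s, p in belief.items()})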
Prediction involves: (a) Estimating past states given observations. (b) Computing the probability of future states given the current state. ⋆ (c) Inferring the current state given past observations. (d) Determining the most likely sequence of states. 7. Filtering is the task of: (a) Estimating the current state given all past observations. ⋆ (b) Estimating future states. (c) Identifying the sequence of states. (d) Constructing a Bayesian Network. 36 8. Smoothing is used to: (a) Estimate current states. (b) Infer past states given observations up to the current time. ⋆ (c) Compute future states. (d) Determine the prior distribution. 9. Most likely explanation identifies: (a) The probability of future states. (b) Past states given no observations. (c) The sequence of states that best explains the observations. ⋆ (d) Transition probabilities. 10. Bayes Filter is used for: (a) Exact inference only. (b) Recursive belief state estimation. ⋆ (c) Sampling particles. (d) Creating Bayesian Networks. 11. In the predict step of a Bayes filter: (a) The next state is estimated based on the current state. ⋆ (b) Observations are incorporated into the belief. (c) Particles are resampled. (d) Past states are updated. 12. In the update step of a Bayes filter: (a) Future states are estimated. (b) Observations are used to refine the belief. ⋆ (c) Weights are assigned to particles. (d) Temporal dependencies are ignored. 13. The Forward-Backward algorithm is used for: (a) Prediction only. (b) Filtering only. (c) Smoothing. ⋆ (d) Sampling. 37 14. Hidden Markov Models (HMMs) are characterized by: (a) Observations being independent of states. (b) Deterministic transitions between states. (c) Hidden states and observations that depend on states. ⋆ (d) States being directly observable. 15. In an HMM, the transition model represents: (a) The probability of transitioning between hidden states. ⋆ (b) The probability of observations given states. (c) The joint distribution of states. (d) The prior probabilities of observations. 16. The sensor model in an HMM represents: (a) Transition probabilities. (b) The probability of observations given hidden states. ⋆ (c) The joint distribution of all variables. (d) The most likely sequence of states. 17. Kalman filters are suitable for: (a) Non-linear systems with uniform noise. (b) Systems without observations. (c) Linear systems with Gaussian noise. ⋆ (d) Systems with deterministic transitions. 18. Kalman filters involve: (a) Sampling-based estimation. (b) Prediction and update steps. ⋆ (c) Enumeration of all possible states. (d) Ignoring noise in observations. 19. Particle filters are used for: (a) Exact inference in large systems. (b) Linear systems with Gaussian noise. (c) Approximate inference in non-linear systems. ⋆ (d) Systems without state transitions. 38 20. In a particle filter, resampling: (a) Eliminates noisy observations. (b) Maintains diversity in particles. ⋆ (c) Avoids transitions between states. (d) Ignores observations. 21. Dynamic Bayesian Networks (DBNs) generalize: (a) Only Markov Models. (b) Reinforcement learning models. (c) Deterministic systems. (d) Hidden Markov Models and Kalman Filters. ⋆ 22. DBNs differ from HMMs by: (a) Eliminating hidden states. (b) Ignoring temporal dependencies. (c) Allowing multiple state variables and complex dependen- cies. ⋆ (d) Using non-stationary distributions. 23. Smoothing in HMMs is achieved using: (a) Kalman Filters. (b) Particle Sampling. (c) Transition Models only. (d) Forward-Backward Algorithm. ⋆ 24. 
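Questions 17, 18, and 25 above describe Kalman filters as predict/update estimators for linear systems with Gaussian noise. A one-dimensional sketch under a random-walk motion model; the noise variances and measurement values are made up for the example:

def kalman_1d(mean, var, measurements, process_var=1.0, sensor_var=2.0):
    # One-dimensional Kalman filter over a random-walk state.
    for z in measurements:
        # Predict step: the motion model adds Gaussian process noise,
        # so the variance of the belief grows.
        var = var + process_var
        # Update step: blend prediction and measurement via the Kalman gain.
        gain = var / (var + sensor_var)
        mean = mean + gain * (z - mean)
        var = (1 - gain) * var
    return mean, var

mean, var = kalman_1d(mean=0.0, var=10.0, measurements=[1.2, 0.9, 1.1])
print(round(mean, 3), round(var, 3))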
Which of the following is an application of HMMs? (a) Weather prediction. (b) ICU Monitoring. (c) Particle Filtering. (d) Speech recognition. ⋆ 25. Kalman filters were famously used in: (a) Weather prediction systems. (b) Google Search Algorithms. (c) Apollo spacecraft navigation. ⋆ (d) Particle filtering in robotics. 39 6 Machine Learning and Neural Networks 6.1 Multiple Choice Questions: Machine Learning and Neural Networks 1. What is reinforcement learning (RL)? (a) A method to memorize data (b) A supervised learning technique (c) A paradigm where agents learn through interaction with the environment ⋆ (d) A method to optimize neural networks 2. What is the main goal of reinforcement learning? (a) Minimize the error rate (b) Maximize cumulative rewards ⋆ (c) Create a deterministic system (d) Predict future rewards 3. Which of the following is NOT an RL key concept? (a) Exploration (b) Supervision ⋆ (c) Exploitation (d) Regret 4. In RL, what does the agent learn from? (a) Fixed data (b) Rewards ⋆ (c) Explicit instructions (d) Labeled datasets 5. What does the term ‘exploration’ mean in RL? (a) Using known information to maximize rewards (b) Trying new actions to gather more information ⋆ (c) Randomly acting without strategy (d) Minimizing the chance of failure 40 6. What is a Markov Decision Process (MDP)? (a) A framework for supervised learning (b) A mathematical model for decision-making under uncer- tainty ⋆ (c) A technique for data optimization (d) A neural network model 7. Which of these is NOT part of an MDP? (a) States (S) (b) Actions (A) (c) Reward Function (R) (d) Cost Function (C) ⋆ 8. What does the reward function (R) do in an MDP? (a) Defines the transition probabilities (b) Maps rewards to actions (c) Provides immediate rewards for states ⋆ (d) Assigns a penalty to bad decisions 9. What is the purpose of the discount factor (γ)? (a) Penalize negative rewards (b) Balance immediate and future rewards ⋆ (c) Improve the transition probabilities (d) Ensure only recent actions matter 10. In a self-driving car example, what can be considered a state? (a) Stop, go, turn (b) The car’s position at a red light or green light ⋆ (c) The probability of reaching a goal (d) Distance traveled 11. What is the Bellman equation used for? (a) Predicting future states (b) Iteratively computing state values ⋆ (c) Defining policies (d) Optimizing rewards 41 12. What is the main step in value iteration? (a) Directly update policies (b) Use the Bellman equation to compute state utilities ⋆ (c) Approximate actions through linear regression (d) Collect experience in the environment 13. What is the key difference between value iteration and policy iteration? (a) Value iteration directly computes policies (b) Policy iteration alternates between evaluation and improve- ment ⋆ (c) Value iteration uses state-action pairs (d) Policy iteration requires labeled data 14. What does the policy evaluation step in policy iteration do? (a) Computes the utilities of states under the current policy ⋆ (b) Updates transition probabilities (c) Maximizes the Q-values of all states (d) Estimates future rewards 15. In RL, what is meant by ‘regret’ ? (a) Accumulating penalties for bad actions ⋆ (b) A situation where no rewards are received (c) Learning optimal policies without mistakes (d) Over-reliance on supervised learning 16. Which paradigm assumes a fixed policy? (a) Active RL (b) Passive RL ⋆ (c) Supervised RL (d) Unsupervised RL 17. What trade-off is essential in Active RL? (a) Utility vs. Reward (b) Exploration vs. 
Exploitation ⋆ (c) Convergence vs. Divergence (d) Policy vs. Value 42 18. What does temporal-difference (TD) learning combine? (a) Monte Carlo methods and supervised learning (b) Dynamic programming and Monte Carlo methods ⋆ (c) Gradient descent and value iteration (d) Policy iteration and direct utility estimation 19. What parameter in TD learning controls the rate of updates? (a) Discount factor (b) Learning rate ⋆ (c) Reward multiplier (d) Transition probability 20. What is updated in the TD update equation? (a) Policies (b) Utilities of states ⋆ (c) Transition functions (d) Reward functions 21. What is the key characteristic of model-free RL? (a) No explicit modeling of transition and reward functions ⋆ (b) Direct estimation of state utilities (c) Fixed policies (d) Direct optimization of Q-values 22. In Q-Learning, what is updated after every transition? (a) State utilities (b) State-action pair values ⋆ (c) Policy values (d) Transition probabilities 23. What is the Q-Learning update equation? (a) V (s) ← V (s) + α(r + γV (s′ ) − V (s)) (b) Q(s, a) ← Q(s, a) + α(r + γ maxa′ Q(s′ , a′ ) − Q(s, a)) ⋆ (c) V (s) = R(s) + γ maxa s′ P (s′ |s, a)V (s′ ) P (d) π(s) = argmaxa Q(s, a) 43 24. Why is Q-Learning considered off-policy? (a) It follows the optimal policy during training (b) It learns the optimal policy while following another ⋆ (c) It never learns policies (d) It requires a model of 25. What problem does feature-based representation solve? (a) Long-term reward estimation (b) High-dimensional state spaces ⋆ (c) Overfitting in state utilities (d) Policy approximation 26. How does ϵ-greedy strategy encourage exploration? (a) By choosing random actions with probability 1 − ϵ (b) By choosing random actions with probability ϵ ⋆ (c) By avoiding optimal actions (d) By maximizing rewards immediately 27. What is the limitation of basic Q-Learning? (a) Requires pre-defined policies (b) Fails in high-dimensional environments ⋆ (c) Cannot handle exploration (d) Overfits to training data 28. What does Deep Q-Learning replace the Q-table with? (a) A transition matrix (b) A neural network ⋆ (c) A feature vector (d) A Monte Carlo estimator 29. Which of these is an example of reinforcement learning? (a) Training a neural network on labeled images (b) Teaching a robot to play chess by letting it practice ⋆ (c) Clustering unlabeled data into groups (d) Using a decision tree for predictions 30. What is the purpose of generalizing across states? (a) Reduce memory usage ⋆ (b) Improve accuracy in supervised learning (c) Increase reward values (d) Avoid convergence 44 7 Introduction to Reinforcement Learning (RL) 7.1 Multiple-Choice Questions on Reinforcement Learn- ing 1. What is reinforcement learning (RL)? (a) A method to memorize data (b) A supervised learning technique (c) A paradigm where agents learn through interaction with the environment ⋆ (d) A method to optimize neural networks 2. What is the main goal of reinforcement learning? (a) Minimize the error rate (b) Maximize cumulative rewards ⋆ (c) Create a deterministic system (d) Predict future rewards 3. Which of the following is NOT an RL key concept? (a) Exploration (b) Supervision ⋆ (c) Exploitation (d) Regret 4. In RL, what does the agent learn from? (a) Fixed data (b) Rewards ⋆ (c) Explicit instructions (d) Labeled datasets 5. What does the term ‘exploration’ mean in RL? 
(a) Using known information to maximize rewards (b) Trying new actions to gather more information ⋆ (c) Randomly acting without strategy (d) Minimizing the chance of failure 45 6. What is a Markov Decision Process (MDP)? (a) A framework for supervised learning (b) A mathematical model for decision-making under uncer- tainty ⋆ (c) A technique for data optimization (d) A neural network model 7. Which of these is NOT part of an MDP? (a) States (S) (b) Actions (A) (c) Reward Function (R) (d) Cost Function (C) ⋆ 8. What does the reward function (R) do in an MDP? (a) Defines the transition probabilities (b) Maps rewards to actions (c) Provides immediate rewards for states ⋆ (d) Assigns a penalty to bad decisions 9. What is the purpose of the discount factor (γ)? (a) Penalize negative rewards (b) Balance immediate and future rewards ⋆ (c) Improve the transition probabilities (d) Ensure only recent actions matter 10. In a self-driving car example, what can be considered a state? (a) Stop, go, turn (b) The car’s position at a red light or green light ⋆ (c) The probability of reaching a goal (d) Distance traveled 11. What is the Bellman equation used for? (a) Predicting future states (b) Iteratively computing state values ⋆ (c) Defining policies (d) Optimizing rewards 46 12. What is the main step in value iteration? (a) Directly update policies (b) Use the Bellman equation to compute state utilities ⋆ (c) Approximate actions through linear regression (d) Collect experience in the environment 13. What is the key difference between value iteration and policy iteration? (a) Value iteration directly computes policies (b) Policy iteration alternates between evaluation and improve- ment ⋆ (c) Value iteration uses state-action pairs (d) Policy iteration requires labeled data 14. What does the policy evaluation step in policy iteration do? (a) Computes the utilities of states under the current policy ⋆ (b) Updates transition probabilities (c) Maximizes the Q-values of all states (d) Estimates future rewards 15. In RL, what is meant by ‘regret’ ? (a) Accumulating penalties for bad actions ⋆ (b) A situation where no rewards are received (c) Learning optimal policies without mistakes (d) Over-reliance on supervised learning 16. Which paradigm assumes a fixed policy? (a) Active RL (b) Passive RL ⋆ (c) Supervised RL (d) Unsupervised RL 17. What trade-off is essential in Active RL? (a) Utility vs. Reward (b) Exploration vs. Exploitation ⋆ (c) Convergence vs. Divergence (d) Policy vs. Value 47 18. What does temporal-difference (TD) learning combine? (a) Monte Carlo methods and supervised learning (b) Dynamic programming and Monte Carlo methods ⋆ (c) Gradient descent and value iteration (d) Policy iteration and direct utility estimation 19. What parameter in TD learning controls the rate of updates? (a) Discount factor (b) Learning rate ⋆ (c) Reward multiplier (d) Transition probability 20. What is updated in the TD update equation? (a) Policies (b) Utilities of states ⋆ (c) Transition functions (d) Reward functions 21. What is the key characteristic of model-free RL? (a) No explicit modeling of transition and reward functions ⋆ (b) Direct estimation of state utilities (c) Fixed policies (d) Direct optimization of Q-values 22. In Q-Learning, what is updated after every transition? (a) State utilities (b) State-action pair values ⋆ (c) Policy values (d) Transition probabilities 23. What is the Q-Learning update equation? 
(a) V (s) ← V (s) + α r + γV (s′ ) − V (s) (b) Q(s, a) ← Q(s, a) + α r + γ maxa′ Q(s′ , a′ ) − Q(s, a) ⋆ (c) V (s) = R(s) + γ maxa s′ P (s′ |s, a)V (s′ ) P (d) π(s) = argmaxa Q(s, a) 48 24. Why is Q-Learning considered off-policy? (a) It follows the optimal policy during training (b) It learns the optimal policy while following another ⋆ (c) It never learns policies (d) It requires a model of the environment 25. What problem does feature-based representation solve? (a) Long-term reward estimation (b) High-dimensional state spaces ⋆ (c) Overfitting in state utilities (d) Policy approximation 26. How does ϵ-greedy strategy encourage exploration? (a) By choosing random actions with probability 1 − ϵ (b) By choosing random actions with probability ϵ ⋆ (c) By avoiding optimal actions (d) By maximizing rewards immediately 27. What is the limitation of basic Q-Learning? (a) Requires pre-defined policies (b) Fails in high-dimensional environments ⋆ (c) Cannot handle exploration (d) Overfits to training data 28. What does Deep Q-Learning replace the Q-table with? (a) A transition matrix (b) A neural network ⋆ (c) A feature vector (d) A Monte Carlo estimator 29. Which of these is an example of reinforcement learning? (a) Training a neural network on labeled images (b) Teaching a robot to play chess by letting it practice ⋆ (c) Clustering unlabeled data into groups (d) Using a decision tree for predictions 30. What is the purpose of generalizing across states? (a) Reduce memory usage ⋆ (b) Improve accuracy in supervised learning (c) Increase reward values (d) Avoid convergence 49 8 Detailed Explanation of Reinforcement Learn- ing Topics 8.1 Multiple-Choice Questions on Reinforcement Learn- ing 1. What does an agent aim to maximize in reinforcement learning? (a) Errors (b) Loss (c) Rewards ⋆ (d) Complexity 2. What is the role of the environment in reinforcement learning? (a) Provide explicit instructions (b) Provide feedback to the agent through states and rewards ⋆ (c) Generate labeled data (d) Train neural networks 3. Which of the following is NOT a component of an MDP? (a) States (S) (b) Actions (A) (c) Discount factor (γ) (d) Activation function ⋆ 4. What does the discount factor (γ) control in reinforcement learning? (a) The learning rate (b) The importance of future rewards ⋆ (c) The exploration strategy (d) The number of actions 5. What is a Markov Decision Process (MDP)? (a) A mathematical framework for decision-making under un- certainty ⋆ (b) A supervised learning technique (c) A clustering algorithm (d) A data compression model 50 6. In RL, what does exploration mean? (a) Always selecting the highest reward action (b) Trying new actions to discover their effects ⋆ (c) Avoiding random actions (d) Minimizing regret 7. Which algorithm iteratively updates state values using the Bellman equa- tion? (a) Policy iteration (b) Gradient descent (c) Value iteration ⋆ (d) Linear regression 8. In policy iteration, what happens during the policy evaluation step? (a) State utilities are computed under the current policy ⋆ (b) The policy is randomly modified (c) Actions are added to the policy (d) Rewards are discounted 9. Which of the following is a model-free learning algorithm? (a) Linear programming (b) Q-Learning ⋆ (c) K-means clustering (d) Dynamic programming 10. What does the reward function R(s) represent? (a) The probability of a state (b) The immediate reward received after taking an action in a state ⋆ (c) The policy function (d) The optimal action 11. 
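The Q-Learning update marked correct in question 23 of Section 7, Q(s, a) ← Q(s, a) + α(r + γ max_a' Q(s', a') − Q(s, a)), combined with ε-greedy exploration, gives the usual tabular learner. A minimal sketch in which the state names, action set, and hyperparameter values are assumptions for illustration:

import random
from collections import defaultdict

def epsilon_greedy(q, state, actions, epsilon=0.1):
    # Explore a random action with probability epsilon, otherwise exploit
    # the action with the highest current Q-value.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    # Off-policy TD update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

# Toy usage with hypothetical state names; unseen (state, action) pairs
# default to a Q-value of 0.
q = defaultdict(float)
actions = ["left", "right"]
chosen = epsilon_greedy(q, "s0", actions)
q_update(q, "s0", chosen, reward=1.0, next_state="s1", actions=actions)
print(dict(q))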
Which of these is a key advantage of temporal-difference (TD) learning? (a) Updates are made after every step, combining dynamic pro- gramming and Monte Carlo methods ⋆ (b) Requires a model of the environment (c) Needs labeled data (d) Directly predicts actions 51 12. In the Q-Learning update rule, what does maxa′ Q(s′ , a′ ) represent? (a) The minimum reward in the next state (b) The transition probability (c) The maximum future reward possible from the next state ⋆ (d) The learning rate 13. What does ϵ-greedy exploration do? (a) Chooses the best action with probability ϵ (b) Chooses a random action with probability 1 − ϵ (c) Balances random exploration and exploitation ⋆ (d) Selects only high-reward actions 14. What is the purpose of generalizing across states? (a) Avoid convergence (b) Handle large state spaces efficiently ⋆ (c) Increase rewards (d) Minimize learning rates 15. What is the primary function of a discount factor in RL? (a) To limit the agent’s decisions (b) To prioritize immediate rewards over distant ones ⋆ (c) To increase computational speed (d) To improve policy evaluation 16. What distinguishes Q-Learning as off-policy? (a) It only learns suboptimal policies (b) It learns the optimal policy while following another policy during exploration ⋆ (c) It requires a complete model of the environment (d) It updates rewards instead of values 17. Which algorithm uses a neural network to approximate Q-values? (a) Linear regression (b) Policy iteration (c) Deep Q-Learning (DQN) ⋆ (d) Supervised learning 52 18. What does the Bellman equation represent in reinforcement learning? (a) The gradient of loss functions (b) A recursive relationship for computing state utilities ⋆ (c) A method for feature extraction (d) A supervised learning objective 19. What is a key feature of passive RL? (a) No rewards are used (b) Exploration strategies are applied (c) A fixed policy is followed ⋆ (d) States are explored randomly 20. What is the role of a transition model P (s′ |s, a)? (a) Defines the probability of reaching a new state after an ac- tion ⋆ (b) Represents rewards for actions (c) Predicts Q-values directly (d) Selects actions for the policy 21. Which of these is an example of RL exploration? (a) Always choosing the highest reward action (b) Trying an action with uncertain outcomes⋆ (c) Minimizing reward penalties (d) Avoiding states with high values 22. What is a limitation of basic Q-Learning? (a) It cannot solve small state spaces (b) It is not compatible with rewards (c) It struggles with high-dimensional state spaces⋆ (d) It relies entirely on transition models 23. How does Deep Q-Learning differ from Q-Learning? (a) It does not require rewards (b) It avoids using policies (c) It uses a neural network to approximate Q-values⋆ (d) It minimizes the importance of future rewards 53 24. What does optimistic initialization encourage? (a) Exploration of unknown actions⋆ (b) Faster exploitation (c) Random transitions (d) Discounting future rewards 25. Which type of learning combines dynamic programming and Monte Carlo methods? (a) Supervised learning (b) Deep learning (c) Temporal-difference (TD) learning⋆ (d) Policy iteration 26. In reinforcement learning, what does the agent learn from? (a) Explicit instructions (b) Rewards provided by the environment⋆ (c) Predefined rules (d) Supervised feedback 27. What is the purpose of the policy improvement step in policy iteration? 
27. What is the purpose of the policy improvement step in policy iteration?
(a) Approximate state values
(b) Update the policy to maximize utilities ⋆
(c) Randomly modify the policy
(d) Minimize the discount factor
28. Which of these is a key drawback of TD learning?
(a) It requires a model of the environment
(b) It may require many updates to converge ⋆
(c) It ignores future rewards
(d) It cannot handle high-dimensional states
29. What is the relationship between exploration and exploitation?
(a) They always conflict
(b) A balance is needed to optimize long-term rewards ⋆
(c) Exploitation prevents regret
(d) Exploration always yields higher rewards
30. In the Bellman equation, what does the term R(s) represent?
(a) The total reward from the start
(b) The immediate reward received at state s ⋆
(c) The learning rate
(d) The maximum reward possible
31. Which of the following is true for model-based RL?
(a) It avoids learning transition probabilities
(b) It explicitly learns the transition and reward models ⋆
(c) It does not handle stochastic environments
(d) It directly optimizes Q-values
32. What is a primary advantage of ϵ-greedy exploration?
(a) It always maximizes rewards
(b) It requires no tuning
(c) It balances random exploration and exploitation ⋆
(d) It only explores at the start
33. What is the primary difference between active and passive RL?
(a) Active RL learns the optimal policy, while passive RL follows a fixed policy ⋆
(b) Passive RL uses supervised learning
(c) Active RL ignores rewards
(d) Passive RL involves deep learning
34. What does the term “off-policy” mean in Q-Learning?
(a) The policy is fixed
(b) The learning algorithm updates without exploration
(c) The agent learns the optimal policy while exploring with a different policy ⋆
(d) The learning rate is ignored
35. Why is a neural network used in Deep Q-Learning?
(a) To provide rewards
(b) To approximate Q-values in large state spaces ⋆
(c) To implement transition probabilities
(d) To optimize rewards directly
36. In reinforcement learning, what does the term “regret” refer to?
(a) The cost of choosing suboptimal actions ⋆
(b) Ignoring future rewards
(c) Overfitting to training data
(d) Penalizing all actions
37. What is the main purpose of feature-based representation in RL?
(a) To increase the size of the Q-table
(b) To generalize across large state-action spaces ⋆
(c) To eliminate rewards
(d) To reduce the need for exploration
38. How does the agent interact with the environment in RL?
(a) By observing rewards only
(b) By taking actions and receiving feedback in the form of rewards ⋆
(c) By minimizing the Bellman equation
(d) By predefining the transition model
39. Which of the following is an application of reinforcement learning?
(a) Regression modeling
(b) Training robots to perform tasks ⋆
(c) Image classification
(d) Clustering data
40. What is the role of the learning rate α in Q-Learning?
(a) To control the discount factor
(b) To determine the step size for updating Q-values ⋆
(c) To maximize immediate rewards
(d) To avoid exploration
41. Which of the following is an example of exploration in RL?
(a) Always exploiting known rewards
(b) Trying a new action with unknown outcomes ⋆
(c) Following a fixed policy
(d) Reducing the learning rate
42. What is the primary advantage of temporal-difference (TD) learning?
(a) It avoids random exploration
(b) It updates values incrementally without needing the entire trajectory ⋆
(c) It eliminates the need for rewards
(d) It reduces convergence time significantly
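Questions 37 and 40 above concern feature-based representation and the learning rate α. Under assumed features and hyperparameters, a minimal sketch of linear Q-value approximation looks as follows; the feature function, the toy states and actions, and the example transition are all hypothetical.

ALPHA = 0.05   # learning rate
GAMMA = 0.9    # discount factor

def features(state, action):
    """Hypothetical feature vector f(s, a) for a one-dimensional state."""
    direction = 1.0 if action == "right" else -1.0
    return [1.0, float(state), float(state) * direction]

weights = [0.0, 0.0, 0.0]   # one weight per feature, shared across all states

def q_value(state, action):
    """Approximate Q(s, a) as the dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features(state, action)))

def td_update(state, action, reward, next_state, actions):
    """Nudge each weight along its feature, proportionally to the TD error."""
    target = reward + GAMMA * max(q_value(next_state, a) for a in actions)
    error = target - q_value(state, action)
    for i, f in enumerate(features(state, action)):
        weights[i] += ALPHA * error * f

# Hypothetical transition: moving "right" from state 2 to state 3 with reward 1.0.
td_update(2, "right", 1.0, 3, actions=["left", "right"])

Because the weights are shared across all states, experience in one state generalizes to similar states (questions 14 and 37); Deep Q-Learning pushes the same idea further by letting a neural network learn the features.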
43. What does the term “policy” mean in reinforcement learning?
(a) The set of states available to the agent
(b) The strategy that defines the agent’s actions at each state ⋆
(c) The reward function
(d) The utility of a state
44. Why is reinforcement learning suitable for gaming AI?
(a) It relies on supervised data
(b) It can learn optimal strategies through trial and error ⋆
(c) It requires minimal computation
(d) It avoids future rewards
45. In model-free RL, what is directly estimated?
(a) Transition probabilities
(b) State-action values (Q-values) ⋆
(c) Neural network weights
(d) The optimal policy directly
46. Which of these challenges is addressed by Deep Q-Learning?
(a) The absence of reward functions
(b) Handling large and complex state spaces ⋆
(c) Improving policy iteration
(d) Reducing computational complexity
47. What is the purpose of the Bellman equation in MDPs?
(a) To minimize regret
(b) To evaluate exploration strategies
(c) To compute the utility of states recursively ⋆
(d) To predict future actions
48. In RL, what does the term “trial and error” imply?
(a) The agent learns by attempting actions and receiving feedback ⋆
(b) The agent avoids suboptimal actions
(c) The agent predicts rewards perfectly
(d) The agent follows a supervised approach
49. What does the term “Q” in Q-Learning represent?
(a) The transition model
(b) The quality of a state-action pair ⋆
(c) The reward function
(d) The learning rate
50. Why is the ϵ-greedy strategy effective?
(a) It balances random exploration with exploitation of known rewards ⋆
(b) It avoids suboptimal actions
(c) It guarantees immediate convergence
(d) It reduces the need for discount factors
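Finally, the ϵ-greedy strategy from question 50 is easy to sketch together with the trial-and-error loop of question 48. The single-state toy environment, EPSILON, and the other constants below are illustrative assumptions rather than anything prescribed by the questions.

import random

EPSILON = 0.1            # probability of exploring a random action
ALPHA, GAMMA = 0.1, 0.9  # learning rate and discount factor

actions = ["left", "right"]
Q = {("s0", a): 0.0 for a in actions}

def epsilon_greedy(state):
    """Explore with probability ϵ, otherwise exploit the best known action."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def step(state, action):
    """Toy environment: 'right' pays 1.0, 'left' pays nothing; the state never changes."""
    return state, (1.0 if action == "right" else 0.0)

state = "s0"
for _ in range(1000):                      # trial and error: act, observe, update
    action = epsilon_greedy(state)
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = next_state

After enough trials Q[("s0", "right")] ends up higher than Q[("s0", "left")], yet the occasional random action keeps the agent checking the alternative, which is the balance between exploration and exploitation that questions 29 and 50 describe.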