Class Test 2 (CS2103) Question.pdf

Full Transcript

Class Test (CS2103) | Max-Marks: 25 | Time-Limit: 45 mins | Date: 17-09-2024
Roll: ___________________________ Name: ___________________________________

Answer the questions (Q1 to Q4) based on the grid world.

Consider the stochastic (4 x 3) Grid world where the “intended” outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. Transitions into any non-terminal state have a reward of -0.04. Assume that the rewards are undiscounted. Given the current estimate of utilities for states as follows: (round numerical answers to 6 decimals)

[Q1] (NUM) (2 marks) Calculate the probability of the agent being in state (2, 1) after taking the Up action two times starting at state (1, 1).
[ANS]: ______________________

[Q2] (NUM) (4 marks) Calculate the utility for states (1, 1) and (3, 2).
[ANS]: U(1, 1) = __________________ U(3, 2) = _________________

[Q3] (NUM) (2 marks) Based on the new utility calculated for states (1, 1) and (3, 2) in the previous question, find out the optimal actions for those states, i.e., the optimal actions to take when the agent is in that state. The action must be one of (Up, Down, Right, Left).
[ANS]: π(1, 1) = __________________ π(3, 2) = __________________

[Q4] (MCQ) (2 marks) We assumed the reward R(s) = -0.04. What should be the value of R(s) such that an optimally acting agent will never reach a terminal state?
[A] R(s) < 0
[B] R(s) = 0
[C] R(s) > 0
[D] optimal agent will always reach a terminal state

[Q5] (MCQ) (2 marks) Assume that an MDP has an initial state S*. According to the Markovian Property, the probability of reaching a state S’ from a state S should depend:
[A] on State S only
[B] on State S’ only
[C] on State S* only
[D] on State S* and S

[Q6] (MCQ) (2 marks) Fill in the blanks: An optimal policy is the one which yields highest _____. In a/an _____ horizon MDP, the optimal policy is independent of the starting state if the rewards are ______.
[A] reward, infinite, discounted
[B] expected utility, finite, undiscounted
[C] reward, finite, undiscounted
[D] expected utility, infinite, discounted

[Q7] (MCQ) (3 marks) Select the correct Bellman equation for Utility
[A] U(s) = max_a [ R(s, a) + γ Σ_{s'} P(s' | a, s) U(s') ]
[B] U(s) = max_a Σ_{s'} P(s' | a, s) [ R(s, a, s') + γ U(s') ]
[C] U(s) = R(s) + γ max_a Σ_{s'} P(s' | a, s) U(s')
[D] All of the above are correct

[Q8] {MSQ} (4 marks) Use the provided truth table to select correct logical entailments.

p  q  r  |  p → q∨r  |  p → r  |  q → r
1  1  1  |     1     |    1    |    1
1  1  0  |     1     |    0    |    0
1  0  1  |     1     |    1    |    1
1  0  0  |     0     |    0    |    1
0  1  1  |     1     |    1    |    1
0  1  0  |     1     |    1    |    0
0  0  1  |     1     |    1    |    1
0  0  0  |     1     |    1    |    1

[A] {p → q∨r} ⊨ (p → r)
[B] {p → r} ⊨ (p → q∨r)
[C] {q → r} ⊨ (p → q∨r)
[D] {p → q∨r, q → r} ⊨ (p → r)

[Q9] (MCQ) (2 marks) An inference algorithm that derives only entailed sentences is called _____ and if it can derive any sentence that is entailed, it is called _____. Hence, if KB is true in the real world, then any sentence α derived from KB by a _____ inference procedure is also true in the real world.
[A] sound, complete, sound
[B] complete, truth-preserving, sound
[C] truth-preserving, sound, complete
[D] complete, sound, truth-preserving

[Q10] {MSQ} (2 marks) Let Γ and Δ be sets of sentences in Propositional Logic, and let φ and ψ be individual sentences in Propositional Logic. Select the correct statements from the following:
[A] If Γ ⊨ φ and Δ ⊨ φ, then Γ ∩ Δ ⊨ φ
[B] If Γ ⊨ φ and Δ ⊨ φ, then Γ ∪ Δ ⊨ φ
[C] If Γ ⊨ φ and Δ ⊭ φ, then Γ ∪ Δ ⊨ φ
[D] If Γ ⊭ ψ, then Γ ⊨ ¬ψ
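
The truth-table method that Q8 relies on can be sketched in Python: Γ ⊨ φ holds exactly when every assignment that makes all sentences in Γ true also makes φ true. This is only an illustrative sketch. The helper names `implies` and `entails` and the `candidates` dictionary are assumptions introduced here and do not appear in the test paper.

```python
from itertools import product


def implies(a, b):
    """Material implication: a -> b is false only when a is true and b is false."""
    return (not a) or b


def entails(premises, conclusion, n_vars=3):
    """Return True if every assignment satisfying all premises satisfies the conclusion."""
    for values in product([True, False], repeat=n_vars):
        if all(prem(*values) for prem in premises) and not conclusion(*values):
            return False  # found a row where the premises hold but the conclusion fails
    return True


# The three compound sentences from the Q8 truth table, over variables (p, q, r).
p_implies_q_or_r = lambda p, q, r: implies(p, q or r)
p_implies_r = lambda p, q, r: implies(p, r)
q_implies_r = lambda p, q, r: implies(q, r)

# Hypothetical labels mirroring the four options of Q8: (premises, conclusion).
candidates = {
    "A": ([p_implies_q_or_r], p_implies_r),
    "B": ([p_implies_r], p_implies_q_or_r),
    "C": ([q_implies_r], p_implies_q_or_r),
    "D": ([p_implies_q_or_r, q_implies_r], p_implies_r),
}

results = {label: entails(prem, concl) for label, (prem, concl) in candidates.items()}

for label in sorted(results):
    print(label, results[label])
```

The same `entails` helper works for any number of propositional variables by changing `n_vars`, at the cost of checking all 2^n rows.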
