Full Transcript

Artificial Intelligence (AI) is a branch of computer science that aims to create systems capable of performing tasks that typically require human intelligence. These tasks include understanding natural language, recognizing patterns, learning from experience, reasoning, and making decisions. AI systems are designed to mimic cognitive functions associated with human minds, such as learning from data, adapting to new situations, and solving problems. They rely on algorithms and computational models to process large amounts of data, identify patterns, and make predictions or decisions based on that information. There are various approaches to AI, including symbolic AI, which involves programming computers to manipulate symbols based on predefined rules, and machine learning, which enables systems to learn from data without being explicitly programmed. Deep learning, a subset of machine learning, utilizes artificial neural networks with many layers (hence "deep") to analyse complex data and extract meaningful insights.

Ethical Problems
Bias and Discrimination: AI systems can perpetuate or even exacerbate biases present in their training data, leading to unfair treatment of certain groups.
Privacy Violations: AI technologies can intrude on individuals’ privacy by collecting, storing, and analyzing vast amounts of personal data without consent.
Lack of Accountability: It can be unclear who is responsible for the actions of AI systems, especially when they cause harm or operate in unexpected ways.
Dehumanization: Over-reliance on AI can lead to dehumanization in various sectors, such as customer service and caregiving, by replacing meaningful human contact.
Approach to Limit Impact: Implementing rigorous ethical guidelines, promoting transparency in AI algorithms and data usage, and ensuring that AI systems are regularly audited for bias and compliance with privacy laws.

Legal Problems
Intellectual Property Issues: Determining the ownership of AI-generated content or inventions can be complex.
Liability for Harm: Legal challenges arise in assigning liability when AI systems cause damage or harm, whether physically, financially, or emotionally.
Compliance with International Laws: AI systems operating across borders must navigate varying international regulations, which can be problematic.
Approach to Limit Impact: Crafting clear legal frameworks that define the rights and responsibilities associated with AI outputs, and establishing international agreements to manage cross-border AI interactions.

Social Problems
Job Displacement: AI can lead to the displacement of workers, especially in sectors like manufacturing and administrative jobs, potentially increasing unemployment rates.
Erosion of Human Skills: Overdependence on AI can lead to a decline in certain human skills, particularly those related to decision-making and problem-solving.
Social Manipulation: AI can be used to manipulate social and political scenarios, such as through deepfakes or algorithmically curated content that can influence public opinions and elections.
Approach to Limit Impact: Developing policies that support workforce transition through retraining programs, maintaining balances in human-AI roles to preserve essential skills, and regulating the use of AI in sensitive areas such as media and political campaigns.

Agents and Environments: In the context of artificial intelligence, an agent is anything that can perceive its environment through sensors and act upon that environment through actuators.
An environment is everything outside the agent that can be sensed and affected by the agent's actions. Agents and environments interact continuously, with the agent receiving input from the environment through sensors and producing output through actuators. Rationality: Rationality in AI refers to the ability of an agent to select actions that maximize its expected performance measure, given its knowledge and beliefs about the world. A rational agent is one that makes decisions that are expected to lead to the best outcome, based on its understanding of the environment and its goals. PEAS: PEAS stands for Performance measure, Environment, Actuators, and Sensors. It is a framework used to define the design specifications of an intelligent agent. Performance measure: This specifies what the agent is trying to achieve or optimize. It defines the criteria for success. Environment: This describes the context in which the agent operates, including the objects, entities, and conditions that the agent can sense and affect. Actuators: These are the mechanisms through which the agent can take actions or manipulate the environment. Sensors: These are the mechanisms through which the agent can perceive or gather information about the environment. Environment Types: Environments in AI can be categorized into different types based on their characteristics: Fully observable vs. partially observable: In a fully observable environment, the agent's sensors provide complete information about the state of the environment. In a partially observable environment, some aspects of the environment may be hidden from the agent's sensors. Deterministic vs. stochastic: In a deterministic environment, the next state of the environment is completely determined by the current state and the actions taken by the agent. In a stochastic environment, there is some randomness or uncertainty in the transition between states. Episodic vs. sequential: In an episodic environment, the agent's actions only affect the immediate reward or outcome, and each episode is independent of previous episodes. In a sequential environment, the agent's actions have long-term consequences, and the current state depends on previous states and actions. Static vs. dynamic: In a static environment, the environment does not change while the agent is deliberating. In a dynamic environment, the environment can change while the agent is acting. Agent Types: Agents can also be classified into different types based on their design and behavior: Simple reflex agents: These agents select actions based solely on the current percept (input) and predefined rules or mappings from percepts to actions. Model-based reflex agents: These agents maintain an internal model of the world and use it to plan and reason about actions. Goal-based agents: These agents have explicit goals or objectives and take actions that are expected to achieve those goals. Utility-based agents: These agents evaluate actions based on their expected utility or value and select actions that maximize expected utility. Learning agents: These agents improve their performance over time by learning from experience and adapting their behavior based on feedback from the environment. Simple Reflex Agents: Architecture: Simple reflex agents operate based on a set of condition-action rules without maintaining any internal state representation. Decision-making Process: They select actions solely based on the current percept from the environment and the rules provided. 
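Before the description of these agents continues below, here is a minimal sketch of a simple reflex agent in Python. The two-location vacuum world, the percept format and the rule table are all invented for illustration; only the condition-action idea itself comes from the notes.

```python
# Minimal simple reflex agent: maps the current percept to an action using a
# fixed set of condition-action rules (no internal state, no learning).
# The two-location "vacuum world" domain here is purely illustrative.

def simple_reflex_vacuum_agent(percept):
    location, status = percept          # e.g. ("A", "Dirty")
    if status == "Dirty":               # rule 1: clean a dirty square
        return "Suck"
    if location == "A":                 # rule 2: otherwise move to the other square
        return "Right"
    return "Left"

# Example run on a short sequence of percepts
if __name__ == "__main__":
    for percept in [("A", "Dirty"), ("A", "Clean"), ("B", "Dirty"), ("B", "Clean")]:
        print(percept, "->", simple_reflex_vacuum_agent(percept))
```

Because the agent consults only the current percept and a fixed rule table, it cannot learn or adapt, which is exactly the limitation noted next.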
Learning Mechanisms: Typically, these agents do not learn but rely on pre-defined rules provided by programmers or designers. Adaptation Abilities: Limited adaptability as they can't learn from experience or dynamically adjust their behavior. Reflex Agents with State: Architecture: These agents maintain an internal state representation of the world to make more informed decisions. Decision-making Process: They consider both the current percept and the internal state when selecting actions, often using state-action rules or condition-action rules. Learning Mechanisms: While they may not necessarily learn, they can adapt by updating their internal state representation based on new information. Adaptation Abilities: They can adapt to changes in the environment by updating their internal models and adjusting their behavior accordingly. Model-Based Reflex Agent: Architecture: These agents maintain an internal state representation of the world and use it to make decisions. Decision-making Process: They select actions based on both the current percept and the internal state representation, enabling more sophisticated decision-making. Learning Mechanisms: They may incorporate learning algorithms to update their internal state representation based on experience. Adaptation Abilities: They can adapt to changes in the environment by updating their internal models and adjusting their behaviour accordingly. Goal-Based Agents: Architecture: Goal-based agents maintain internal goals or objectives and take actions to achieve those goals. Decision-making Process: They consider the current state of the environment, their internal goals, and a strategy for achieving those goals. Learning Mechanisms: These agents may incorporate learning algorithms to refine their goal representation or their strategies for achieving those goals. Adaptation Abilities: Highly adaptable as they can dynamically adjust their goals and strategies based on changing circumstances. Utility-Based Agents: Architecture: Utility-based agents evaluate actions based on their utility or desirability and select the action with the highest expected utility. Decision-making Process: They consider not only goals but also the potential consequences of their actions, weighing the trade-offs between different outcomes. Learning Mechanisms: They may incorporate learning algorithms to estimate the utilities of different actions based on past experiences or expert knowledge. Adaptation Abilities: Highly adaptable as they can dynamically adjust their behavior based on changes in the environment or in their utility functions. Learning Agent: Architecture: Learning agents interact with an environment, selecting actions to maximize cumulative rewards over time. Decision-making Process: They learn optimal strategies through trial and error, receiving feedback in the form of rewards or penalties from the environment. Learning Mechanisms: They employ learning algorithms such as Q-learning or deep Q-networks to update their policies based on rewards received. Adaptation Abilities: Highly adaptable as they continuously learn and update their strategies based on new experiences. Tree Search algorithm A strategy is defined by picking the order of node expansion Strategies are evaluated along the following dimensions: completeness—does it always find a solution if one exists? time complexity—number of nodes generated/expanded space complexity—maximum number of nodes in memory optimality—does it always find a least-cost solution? 
Time and space complexity are measured in terms of:
b—maximum branching factor of the search tree
d—depth of the least-cost solution
m—maximum depth of the state space (may be ∞)
Uninformed strategies use only the information available in the problem definition, without any additional knowledge about the states.

Breadth-first search
Breadth-first search is a tree traversal algorithm that explores nodes level by level. Using a queue to store frontier nodes supports the behaviour of this search. Depth-first search, by contrast, is a tree traversal algorithm that explores the tree deeply, branch by branch.
Description: BFS explores all neighbor nodes at the present depth level before moving to the nodes at the next level. It systematically expands outward from the initial state until it finds the goal state.
Strategy: BFS uses a FIFO (First-In-First-Out) queue to store the frontier nodes (a minimal BFS sketch in Python follows at the end of this passage).
Advantages: Guarantees the shortest path to the goal in terms of number of edges.
Disadvantages: Requires significant memory space to store the frontier nodes, especially for large search spaces.

Uniform-cost search
Uniform Cost Search (UCS) is a type of uninformed search that performs a search based on the lowest path cost. UCS helps us find the path from the starting node to the goal node with the minimum path cost.
Description: UCS expands the node with the lowest path cost (i.e., cumulative cost from the initial state) rather than just the shallowest node.
Strategy: UCS uses a priority queue ordered by the cumulative path cost to store the frontier nodes.
Advantages: Finds the optimal solution in terms of the least total cost.
Disadvantages: May expand many nodes with low costs before finding the optimal solution, leading to high time complexity.

Depth-first search
Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible along each branch before backtracking. Extra memory, usually a stack, is needed to keep track of the nodes discovered so far along the current branch, which supports backtracking.
Description: DFS explores as far as possible along each branch before backtracking.
Strategy: DFS uses a LIFO (Last-In-First-Out) stack to store the frontier nodes.
Advantages: Requires less memory compared to BFS as it only needs to store nodes along the current path.
Disadvantages: Not guaranteed to find the shortest path, and can get stuck in infinite loops if the search space contains cycles.

Depth-limited search
DLS explores the tree in depth-first fashion but never expands nodes beyond a fixed depth limit. If the goal state is not found within the current depth limit, the search fails (or the limit is increased and the search repeated).
Description: DLS is a variant of DFS that limits the maximum depth of exploration.
Strategy: DLS terminates the search when it reaches a specified depth limit, preventing infinite loops.
Advantages: Prevents DFS from getting stuck in infinite loops in cyclic search spaces.
Disadvantages: May miss the solution if the depth limit is too shallow.

Iterative deepening search
In computer science, iterative deepening search, or more specifically iterative deepening depth-first search, is a state space/graph search strategy in which a depth-limited version of depth-first search is run repeatedly with increasing depth limits until the goal is found.
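Before continuing with iterative deepening, here is that BFS sketch. It assumes the state space is given as a plain adjacency dictionary; the example graph and node names are invented for illustration.

```python
from collections import deque

def breadth_first_search(graph, start, goal):
    """Return a shortest path (fewest edges) from start to goal, or None.
    graph: dict mapping a node to an iterable of its neighbours."""
    frontier = deque([start])        # FIFO queue of frontier nodes
    parent = {start: None}           # also serves as the explored set
    while frontier:
        node = frontier.popleft()    # expand the shallowest node first
        if node == goal:             # goal found: reconstruct the path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parent:      # skip already-reached nodes
                parent[neighbour] = node
                frontier.append(neighbour)
    return None

# Illustrative graph only (not from the lecture)
graph = {"S": ["A", "B"], "A": ["C"], "B": ["C", "G"], "C": ["G"], "G": []}
print(breadth_first_search(graph, "S", "G"))   # ['S', 'B', 'G']
```

Swapping the FIFO deque for a LIFO stack gives depth-first search, and swapping it for a priority queue ordered by cumulative path cost gives uniform-cost search.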
Description: IDS is a combination of DFS and DLS. It repeatedly performs DFS with increasing depth limits until the goal state is found. Strategy: IDS systematically increases the depth limit with each iteration. Advantages: Retains the memory efficiency of DFS while guaranteeing the completeness of BFS. Disadvantages: Redundant exploration of nodes at lower levels can increase time complexity, especially in large search spaces. Best-first search Greedy search – looks for the fastest/slowest growth/descent towards the goal node before the expansion Description: Greedy search is a heuristic-based search algorithm that selects the node that appears to be the most promising at each step. It prioritizes nodes that are closest to the goal state according to a heuristic function, without considering the entire path or the cost incurred to reach that node. Strategy: At each step, greedy search expands the node that is estimated to be the closest to the goal state based on a heuristic evaluation function. Heuristic Function: Greedy search relies heavily on the heuristic function, which provides an estimate of the cost or distance from a given state to the goal state. This heuristic guides the search by biasing it towards nodes that appear to be more promising based on the heuristic evaluation. Advantages: Greedy search is often memory-efficient and can be computationally fast, especially when the search space is relatively small. Disadvantages: Greedy search does not guarantee optimality or completeness. It may get stuck in local optima or fail to find the optimal solution if the heuristic function is not admissible or if it leads the search down the wrong path. Completeness: Greedy search is not guaranteed to find a solution, especially if it encounters a dead end or if the heuristic function leads it astray. Optimality: Greedy search is not guaranteed to find the optimal solution, as it makes decisions based solely on local information without considering the entire search space or the cost of reaching the goal. A∗ search – uses heuristic function which calculates an estimation of the cost for expansion at each node of the search space Description: A* search is an informed search algorithm that efficiently finds the shortest path from a start state to a goal state in a weighted graph or tree. It intelligently combines information about the cost to reach a node from the start state (known as g-value) and an estimate of the cost to reach the goal state from that node (known as h-value). Strategy: A* search selects the node with the lowest combination of the cost-so-far (g-value) and the estimated cost-to-go (h-value) at each step. It prioritizes nodes that have a lower total estimated cost (f-value = g-value + h-value), aiming to minimize the total cost to reach the goal state. Heuristic Function: A* search relies on a heuristic function (h-value) that provides an estimate of the cost or distance from a given state to the goal state. This heuristic guides the search by biasing it towards nodes that appear to be more promising based on the estimated total cost. Advantages: A* search is both complete (guaranteed to find a solution if one exists) and optimal (guaranteed to find the optimal solution with the minimum total cost). It combines the efficiency of greedy search with the optimality of uniform-cost search, making it highly effective in finding optimal solutions in many problem domains. 
Disadvantages: A* search may encounter performance issues or become inefficient if the heuristic function is not admissible (overestimates the true cost) or not consistent (violates the triangle inequality). It may require significant memory and computation resources, especially in large search spaces with complex heuristic functions.
Completeness: A* search is complete, meaning it will always find a solution if one exists.
Optimality: A* search is optimal, meaning it will always find the optimal solution with the minimum total cost if the heuristic function is admissible (never overestimates the true cost) and consistent (satisfies the triangle inequality).

Admissible heuristic
Role of Heuristic Function:
Purpose: A heuristic function estimates the cost or utility of reaching a goal state from a given state in a search problem. It guides the search process by providing a measure of "closeness" to the goal.
Effect on Search: Heuristic functions influence the search process by biasing it towards states that appear more promising based on the heuristic evaluation. This can significantly reduce the search space and improve efficiency.
Use in Different Search Methods: Heuristic functions are commonly used in informed search algorithms such as A* search, where they guide the search towards the most promising paths based on the estimated cost to reach the goal.
A heuristic function, also simply called a heuristic, is a function that ranks alternatives in search algorithms at each branching step based on available information to decide which branch to follow. For example, it may approximate the exact solution. The objective of a heuristic is to produce a solution in a reasonable time frame that is good enough for solving the problem at hand. This solution may not be the best of all the solutions to this problem, or it may simply approximate the exact solution. But it is still valuable because finding it does not require a prohibitively long time. Heuristics may produce results by themselves, or they may be used in conjunction with optimization algorithms to improve their efficiency.

Relaxed Problem
Admissible heuristics can be derived from the exact solution cost of a relaxed version of the problem.
If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then h1(n) gives the shortest solution.
If the rules are relaxed so that a tile can move to any adjacent square, then h2(n) gives the shortest solution.
Key point: the optimal solution cost of a relaxed problem is no greater than the optimal solution cost of the real problem. This is useful advice on how to formulate suitable heuristics (a small A* sketch using such a relaxed-problem heuristic follows at the end of this passage).

Week 6a and 6b: Action Planning as an AI problem
Planning in AI is the problem of finding a sequence of primitive actions to achieve some goal. The sequence of actions is the system's plan, which can then be executed. Action planning is needed in many AI systems:
Autonomous agents which demonstrate intelligent behaviour (robots, drones, driverless vehicles, etc.)
AI programs which automate rational thinking (theorem provers, expert systems, schedulers, NLP systems, etc.)
Hybrid systems which extend the human ability to operate in hostile environments (surgical devices, underwater video cameras, etc.)
Planning research has been central to AI from the beginning, partly because of practical interest but also because of the “intelligence” features of human planners. Large logistics problems, operational planning, robotics, scheduling etc.
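Before moving on to planning, here is that A* sketch: A* on the 8-puzzle using h2, the Manhattan-distance heuristic derived from the "move to any adjacent square" relaxation. This is my own illustration rather than lecture code, and the start state is an arbitrary easy instance chosen so the example runs instantly.

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 is the blank tile

def h2(state):
    """Manhattan-distance heuristic: admissible because it is the exact cost
    of the relaxed problem in which a tile may move to any adjacent square."""
    dist = 0
    for index, tile in enumerate(state):
        if tile != 0:
            goal_index = tile - 1
            dist += abs(index // 3 - goal_index // 3) + abs(index % 3 - goal_index % 3)
    return dist

def neighbours(state):
    """All states reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            swap = 3 * r + c
            new = list(state)
            new[blank], new[swap] = new[swap], new[blank]
            yield tuple(new)

def a_star(start):
    """Return the number of moves in an optimal solution (path cost g at the goal)."""
    frontier = [(h2(start), 0, start)]        # priority queue ordered by f = g + h
    best_g = {start: 0}
    while frontier:
        f, g, state = heapq.heappop(frontier)
        if state == GOAL:
            return g
        if g > best_g.get(state, float("inf")):
            continue                          # stale queue entry, skip it
        for nxt in neighbours(state):
            new_g = g + 1                     # every move costs 1
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h2(nxt), new_g, nxt))
    return None

print(a_star((1, 2, 3, 4, 0, 6, 7, 5, 8)))    # an easy instance: optimal cost 2
```

Replacing h2 with the zero heuristic turns this into uniform-cost search; replacing f = g + h with h alone turns it into greedy best-first search.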
A number of international Conferences on Planning Bi-annual Planning competition Planning Domain Definition Language (PDDL) PDDL is a human-readable format for problems in automated planning that gives a description of the possible states of the world, a description of the set of possible actions, a specific initial state of the world, and a specific set of desired goals. The Planning Domain Definition Language (PDDL) is a subset of First-Order Predicate Logic (FOL) which is more expressive than propositional logic. It allows for more detailed representation of the states. The solution of the planning problem (the plan) is a sequence of actions leading to the goal state from an initial state PDDL is derived from the planning language of Stanford Research Institute Problem Solver (STRIPS). Initial and goal states s, described using predicates over state variables. A set of Actions(s) described in terms of their preconditions and effects on the states. Closed world assumption principle (CWA): Unmentioned state variables are assumed false. Action: Fly(from, to) Precondition: At(p, from), Plane(p), Airport(from), Airport(to) Effect: ¬At(p, from), At(p, to) Components of a PDDL planning task: Objects: Things in the world that interest us. Predicates: Properties of objects that we are interested in; can be true or false. Initial state: The state of the world that we start in. Goal specification: Things that we want to be true. Actions/Operators: Ways of changing the state of the world. PDDL / STRIPS operators or OPERATOR / Action Representation In the context of PDDL (Planning Domain Definition Language), "operators" generally refer to actions and their associated preconditions and effects. These operators describe the possible actions that can be taken within a planning problem. "Action" refers to a basic unit of change or transformation that can occur in a planning problem. Actions represent the ways in which the state of the world can be altered during the execution of a plan. Actions in PDDL are fundamental to defining the structure of planning problems and specifying how they can be solved through automated planning techniques. They provide a formal way to describe the actions available to an agent and the conditions under which they can be executed to achieve a desired goal. Operators contain three components: Action description Precondition - conjunction of positive literals Effect - conjunction of positive or negative literals which describe how situation changes when operator is applied Restricted language for describing the states using unary and binary predicates A restricted language for describing states using unary and binary predicates typically involves defining a set of predicates along with their arguments to represent properties of objects or relationships between them. Unary predicates have one argument, while binary predicates have two arguments. Example: Unary Predicates: Unary predicates represent properties or attributes of individual objects in the domain.Examples: At(x): Object x is located at a specific position. Robot(x): Object x is a robot. BatteryLow(x): Object x has a low battery. Binary Predicates: Binary predicates represent relationships or connections between pairs of objects in the domain.Examples: Adjacent(x, y): Object x is adjacent to object y. Carries(x, y): Object x carries object y. Connected(x, y): Object x is connected to object y. 
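The Fly operator above can be mirrored directly in code. Below is a small sketch of a ground (fully instantiated) STRIPS/PDDL-style action with preconditions, an add list and a delete list, applied to a state under the closed-world assumption; the plane and airport facts are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """A ground STRIPS-style operator: applicable if all preconditions hold in
    the state; applying it deletes the negative effects and adds the positive ones."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    delete_effects: frozenset

    def applicable(self, state):
        # Closed-world assumption: any fact not in the state is false.
        return self.preconditions <= state

    def apply(self, state):
        return (state - self.delete_effects) | self.add_effects

# Illustrative ground instance of the Fly(from, to) operator
fly_p1_lhr_jfk = Action(
    name="Fly(P1, LHR, JFK)",
    preconditions=frozenset({"At(P1, LHR)", "Plane(P1)", "Airport(LHR)", "Airport(JFK)"}),
    add_effects=frozenset({"At(P1, JFK)"}),
    delete_effects=frozenset({"At(P1, LHR)"}),
)

initial = frozenset({"At(P1, LHR)", "Plane(P1)", "Airport(LHR)", "Airport(JFK)"})
if fly_p1_lhr_jfk.applicable(initial):
    print(sorted(fly_p1_lhr_jfk.apply(initial)))
```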
Restricted language ⇒ efficient algorithm
When working with a restricted language for describing states using unary and binary predicates, an efficient algorithm for planning or reasoning about these states can exploit the simplicity and structure of the language. To design such an algorithm, keep these considerations in mind: exploit the predicate structure, symbolic reasoning, constraint propagation techniques, incremental updates and heuristic search.

Static planning
Forward search in state space (progression)
Starts the search for actions from the current state and plans subsequent actions towards the goal state.
Considers all actions that are applicable in a given state (too many unnecessary searches), so it is not efficient!
Forward chaining follows a bottom-up strategy, going from bottom to top. It uses known facts to start from the initial state (facts) and works toward the goal state, or conclusion. The forward chaining method is also known as data-driven because we achieve our objective by employing available data. The forward chaining method is widely used in expert systems such as CLIPS, business rule systems and manufacturing rule systems. It uses a breadth-first search as it has to go through all the facts first. It can be used to draw multiple conclusions (a minimal forward-chaining sketch follows at the end of this passage).

Heuristics for forward state-space search
For forward state-space search there are a number of domain-independent heuristics:
1. Relaxing actions: ignore-preconditions heuristic, ignore-delete-consequence heuristic
2. Using state abstractions: reduce the state space by considering only more general states (taxonomic inheritance) rather than all specific states (purely relational)
Programs that have won the bi-annual Planning competition have often used fast forward search with heuristics (FF), or planning graphs (GraphPlan), or constraint-satisfaction planners (SAT).

Backward search in state space (regression)
Starts the search for actions from the goal state and plans the preceding actions back towards the initial state.
Considers only actions that are relevant; the actions and the states can be indexed to lower the complexity of the search for an action in each state by filtering the alternative continuations.
Backward chaining uses a top-down strategy, going from top to bottom. The modus ponens inference rule is used as the basis for the backward chaining process. This rule states that if both the conditional statement (p->q) and the antecedent (p) are true, then we can infer the consequent (q). In backward chaining, the goal is broken into sub-goals to prove the facts are true. It is called a goal-driven approach, as a list of goals decides which rules are selected and used. The backward chaining algorithm is used in game theory, automated theorem-proving tools, inference engines, proof assistants and various AI applications. The backward-chaining method mostly uses a depth-first search strategy for proof.

Heuristics for backward state-space search
In backward state-space search, the goal is to find a sequence of actions or transitions that lead from a goal state back to an initial state. Heuristic search algorithms aim to efficiently explore the state space by guiding the search towards promising regions based on heuristic estimates of the remaining cost to reach the initial state. Heuristics commonly used in backward state-space search: inverse actions, goal distance heuristic, relaxed planning graph heuristic, abstraction heuristic, pattern database heuristic, symbolic heuristic.
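Here is that forward-chaining sketch. It is a propositional, data-driven illustration only: rules are written as (premises, conclusion) pairs, and the facts and rule names are invented; real rule engines such as CLIPS are far more sophisticated.

```python
def forward_chaining(facts, rules):
    """facts: set of known atoms; rules: list of (premises, conclusion) pairs.
    Returns the set of all atoms derivable by repeatedly applying the rules."""
    known = set(facts)
    changed = True
    while changed:                      # keep sweeping until a fixed point is reached
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)   # fire the rule: all its premises are satisfied
                changed = True
    return known

# Illustrative rule base only
rules = [
    (["battery_ok", "bulbs_ok"], "lights_work"),
    (["battery_ok", "starter_ok"], "engine_starts"),
    (["engine_starts", "fuel_ok"], "car_runs"),
]
print(forward_chaining({"battery_ok", "starter_ok", "fuel_ok"}, rules))
# {'battery_ok', 'starter_ok', 'fuel_ok', 'engine_starts', 'car_runs'}
```

Backward chaining would instead start from a goal atom such as car_runs and recursively try to prove each premise, which is why it naturally follows a depth-first, goal-driven strategy.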
Planning graphs and incremental progression The main disadvantage of state-space search is the size of the search tree (exponential). Also, the heuristics are not admissible in general (difficult to find suitable). The planning graph is organized in alternating levels of possible states Si and applicable actions Ai. Links between levels represent preconditions and effects whereas links within the levels express conflicts (mutex-links). Problems of static planning Linear planning and Sussman anomaly Planners in the early 1970s considered the plan to include totally ordered action sequences problems were decomposed in subgoals the resulting subplans were combined together in some order But linear planning is incomplete! there are some very simple problems it cannot handle due to interaction between goals (i.e., Sussman anomaly) a complete planner must be able to interleave subplans and thus, to modify the subgoals when they are combined The Sussman anomaly is a problem in artificial intelligence, first described by Gerald Sussman, that illustrates a weakness of noninterleaved planning algorithms, which were prominent in the early 1970s. Most modern planning systems are not restricted to noninterleaved planning and thus can handle this anomaly. While the significance/value of the problem is now a historical one, it is still useful for explaining why planning is non-trivial. In the problem, three blocks (labeled A, B, and C) rest on a table. The agent must stack the blocks such that A is atop B, which in turn is atop C. However, it may only move one block at a time. The problem starts with B on the table, C atop A, and A on the table: However, noninterleaved planners typically separate the goal (stack A atop B atop C) into subgoals, such as: get A atop B get B atop C Suppose the planner starts by pursuing Goal 1. The straightforward solution is to move C out of the way, then move A atop B. But while this sequence accomplishes Goal 1, the agent cannot now pursue Goal 2 without undoing Goal 1, since both A and B must be moved atop C: If instead the planner starts with Goal 2, the most efficient solution is to move B. But again, the planner cannot pursue Goal 1 without undoing Goal 2: Non-linear methods and partial planning Hierarchical planning the planning is divided into phases based on the taxonomic classification of actions First the more abstract actions are planned The specific implementation of the abstract actions are planned subsequently Partial-order planning, state-of-the-art during the 1980s and 90s used for specific tasks, such as operations scheduling also used when it is important for humans to understand the plans - e.g., operational plans for spacecraft and Mars rovers are checked by human operators before uploaded to the vehicles Planning and Action in Dynamic World The real world is changing (planning in real-time) Contingent Planning: Contingent planning involves generating a plan in advance by accounting for all possible contingencies or future states of the world. This approach requires considering a wide range of possible scenarios and generating a plan that can handle each of them effectively. While contingent planning can be comprehensive, it may lead to overly complex plans and may not be suitable for highly dynamic environments where the future states are uncertain. Conditional Planning: Conditional planning allows the planner to check the actual state of the world in real-time and then generate a plan based on that current state. 
This approach enables more flexibility as plans can be tailored to the specific circumstances of the current environment. However, conditional planning still requires some level of preplanning and may not handle all possible contingencies as comprehensively as contingent planning. Algorithms: backward chaining with an additional rule which guarantees that every premise is satisfied; AND–OR tree search where all perceptions form AND branches (very similar to the backward-chaining algorithm).
Continuous Planning: Continuous planning involves a cycle of execution monitoring and replanning in real time. The system continuously monitors the execution of the current plan and the state of the world. If unexpected changes occur or if the current plan becomes invalid, the system dynamically replans to generate a new plan that addresses the current situation. This approach is well suited for highly dynamic environments where changes occur frequently and unpredictably. Continuous planning requires efficient algorithms for real-time replanning and may involve trade-offs between plan optimality and computational complexity.

What Can Go Wrong
Incomplete information: unknown preconditions, disjunctive effects.
Incorrect information: current state incorrect, missing/incorrect postconditions in operators.
Qualification problem: we can never finish listing all the required preconditions and possible conditional outcomes of actions.
Solutions:
Contingent or sensorless planning: devise a plan that works regardless of state or outcome, considering all eventualities.
Conditional planning: plan to obtain information (observation actions), devise a subplan for each contingency, and combine the subplans.
Continuous planning/replanning: assume normal states and outcomes, check the progress during execution, and replan if necessary.

Execution Monitoring and Replanning
The current plan fails when the preconditions of the remaining plan are not met. Preconditions of the remaining plan = all preconditions of remaining steps not achieved by remaining steps = all causal links crossing the current time point; when these are violated it becomes impossible to continue the plan. On failure, replan to achieve the open conditions from the current state.
IPEM (Integrated Planning, Execution, and Monitoring): keep updating Start to match the current state; links from actions are replaced by links from Start when done.
Example: Evacuation from a tall building in the case of fire. Plan immediate action along the escape route. After each change of floor check if the escape route is still clear to continue. In the case of an obstacle, replan to find an alternative escape route. In a case similar to 11th September, jump through the window…

Markov Decision Processes
Who is Markov? Russian mathematician Andrey Markov (1856-1922). Whenever we say that a system is Markovian it means something along the lines of: the future does not depend on the past given the present. The Markov decision process (MDP) is a mathematical framework used for modelling decision-making problems where the outcomes are partly random and partly controllable. It’s a framework that can address most reinforcement learning (RL) problems.

Components of an MDP and Markov Decision Policies
MDPs are defined by the following components:
States(S): These are the possible conditions or configurations in which the agent can find itself. Each state provides the agent with complete information about the past relevant to future decisions.
Actions(A): In each state, the agent can choose from a set of possible actions. The choice of action may depend on the state.
Transition Function(P): This function defines the probability of moving from one state to another given an action. It's often denoted as 𝑃(𝑠′∣𝑠,𝑎), meaning the probability of moving to state 𝑠′ from state 𝑠 after taking action 𝑎.
Reward Function(R): This function gives the immediate reward (or penalty) received after transitioning from one state to another via an action. It can be specific to the state and action (𝑅(𝑠,𝑎)), or it can depend on the state transition (𝑅(𝑠,𝑎,𝑠′)).
Start State: This is where the agent begins the decision process. There may also be a set of initial states with probabilities.
Terminal State: These are special states where the process ends. The agent's task is often to reach one of these states while maximizing some cumulative measure of reward.

Markov property in the context of Markov Decision Processes
Non-Markovian Processes: If a process is not Markovian, the probability of transitioning to the next state, 𝑃(𝑠𝑡+1∣𝑠𝑡,𝑎𝑡,𝑠𝑡−1,𝑎𝑡−1,𝑠𝑡−2,𝑎𝑡−2…𝑠0), depends on the entire history of past states and actions. This means that to predict the future, you need to know the complete sequence of previous states and actions.
Markovian Processes: For a process that is Markovian, the next state depends only on the current state and the action taken in that state. This is denoted as 𝑃(𝑠𝑡+1∣𝑠𝑡,𝑎𝑡). This property simplifies the analysis and computation in decision processes because it does not require keeping track of the entire history, just the current state and action.
Practical Implication: The practical implication of a state being Markovian is that the current state 𝑠t encapsulates all relevant information from the past needed to predict the future. This implies that the state representation is designed in such a way that no additional historical data is required to make decisions or predictions about what happens next.

Other MDP Concepts
Rewards
A reward is a scalar feedback signal given to an agent based on the actions it takes in specific states. It reflects the desirability of an outcome from the agent's perspective. The goal of the agent is often to maximize the cumulative reward it receives over time.
Types of Rewards
Immediate Reward: Given immediately after an action is taken in a state. It is often defined as 𝑅(𝑠,𝑎) where 𝑠 is the current state and 𝑎 is the action taken.
Future Reward: Rewards that the agent expects to gain in the future as a result of its current actions. The calculation of future rewards often involves considering the probabilities of transitioning to various possible next states and the rewards associated with those states.
Role in Decision-Making: Rewards are critical for shaping the behaviour of an agent. Positive rewards incentivize the agent to take actions that lead to positive outcomes. Negative rewards (or penalties) discourage actions that might lead to undesirable outcomes.
Reward Functions
The reward function in an MDP defines the reward received after transitioning from one state to another via a chosen action. It can be specific to the action and the resulting state: R(s,a,s′). This form specifies the reward for moving from state 𝑠 to state 𝑠′ due to action 𝑎. The design of the reward function is crucial in Markov Decision Processes as it significantly impacts an agent's learning and performance. Sparse rewards, given intermittently, contrast with “dense” rewards, which are issued at every step and can facilitate faster learning but are harder to design.
Reward shaping modifies the reward function to make desired outcomes more apparent and immediate, aiding in more effective learning. Challenges include the complexity of designing an appropriate reward function and the credit assignment problem, where it's difficult to link delayed rewards to specific actions. Overall, a well-designed reward structure is essential for guiding agent behaviour towards achieving set objectives. Living Reward or Living Cost Living Reward/Living Cost: This is a reward or cost associated with each step or action taken by an agent within an environment. It's used to incentivize or penalize certain behaviours. A living reward might be positive, encouraging continued action, while a living cost, often negative, discourages certain actions by imposing a cost on each move. Transition Function (Stochastic) Stochastic Transition Function: In an MDP, the transition function determines the probability of moving from one state to another given an action. It's stochastic because the outcome is typically probabilistic, meaning that even if the agent performs the same action in the same state multiple times, it might end up in different subsequent states. This function captures the uncertainty and variability of the environment. Sequence of Rewards Sequence of Rewards: Refers to the series of rewards an agent collects over time as it moves from state to state and takes actions within an MDP. Analyzing sequences of rewards helps in understanding the cumulative benefits or costs of different strategies or actions over time. Discounting Discounting: This is a technique used to prioritize immediate rewards over future rewards. In many decision-making processes, future rewards are "discounted" at a certain rate, which decreases their present value. This is typically represented by a discount factor 𝛾(where 0 ≤ 𝛾 < 1). Discounting helps in handling infinite sequences of rewards by making the sum of these rewards finite. Infinite Utilities Infinite Utilities in theoretical models, especially when considering long-term or indefinite durations, utilities can potentially become infinite. Discounting is a critical method used to manage these scenarios, ensuring that the cumulative utility remains finite and computable. Quantities Quantities in the context of MDPs, quantities often refer to measurable aspects like the number of steps taken, rewards received, or changes in state values. These metrics are crucial for evaluating the performance of policies and understanding the dynamics of the decision-making process. Relationship Between Values Relationship Between Values in MDPs, the relationship between the values of states is governed by the Bellman equations. These equations express the value of a state as the maximum expected return from that state, considering all possible actions, the probabilities of resulting states, and their respective values. This relationship forms the backbone of both value iteration and policy iteration techniques used to solve MDPs. Solving an MDP To solve an MDP means to find an optimal policy that maximizes the cumulative reward. There are several methods for this: Using Heuristic Search To solve a Markov Decision Process (MDP) using heuristic search, we must start by selecting a heuristic that approximates the value of states or the effectiveness of actions. This might be based on domain-specific knowledge or a simplified model of the MDP. 
Implement heuristic search techniques such as greedy algorithms, which choose actions based on immediate rewards and heuristic estimates of future state values, or Monte Carlo Tree Search (MCTS), which uses random simulations to explore promising decision paths. Evaluate the effectiveness of these strategies continuously, adjusting the heuristic or search parameters to optimize performance. This approach enables faster convergence on effective policies by focusing computational efforts on the most promising parts of the state space and actions, making it manageable to address larger or more complex problems.

Using Value Iteration
Value Iteration is an algorithm used to find the optimal policy in a Markov Decision Process (MDP) by updating the state values iteratively. It begins by initializing all state values, typically to zero. The core of the algorithm is the Bellman equation, which updates the value of each state based on the maximum expected utility of taking each possible action. This utility is calculated by considering the probabilities of transitioning to other states, the rewards associated with those transitions, and the discounted future values of those states. Specifically, the update rule is
V_{k+1}(s) = max_a Σ_{s′} P(s′|s,a) [ R(s,a,s′) + γ V_k(s′) ],
where γ is the discount factor that balances immediate and future rewards. The iterations continue until the change in values between iterations falls below a predefined small threshold, indicating that the values have converged. Once convergence is achieved, the optimal policy can be derived by choosing the action that maximizes the expected utility for each state. This method ensures that the derived policy maximizes the total expected return from any given state (a small value-iteration sketch in Python follows at the end of this passage).

Using Policy Iteration
Policy Iteration is a method for solving Markov Decision Processes (MDPs) that involves two main steps: evaluating a given policy and improving it iteratively until the policy converges to the optimal one. Here’s a straightforward explanation:
1. Start with an Initial Policy: Begin by initializing a policy, typically by randomly assigning an action to each state.
2. Policy Evaluation: Compute the value of each state under the current policy. This involves solving the Bellman equation for the policy, which for each state is
V^π(s) = Σ_{s′} P(s′|s,π(s)) [ R(s,π(s),s′) + γ V^π(s′) ].
This equation calculates the expected return for each state under the current policy π, accounting for the transition probabilities P(s′|s,π(s)), the rewards R(s,π(s),s′), and the discount factor γ.
3. Policy Improvement: Once the state values are computed, update the policy by choosing the action that maximizes the expected utility for each state, given the current value estimates:
π′(s) = argmax_a Σ_{s′} P(s′|s,a) [ R(s,a,s′) + γ V^π(s′) ].
Then check whether the policy has changed. If it has, repeat the policy evaluation with the new policy. If the policy remains unchanged, it implies convergence to the optimal policy.
4. Convergence to Optimal Policy: The process of evaluation and improvement repeats until the policy does not change between iterations, indicating that the optimal policy has been found.
This method efficiently finds the policy that maximizes the expected return from all states over time, adjusting the policy based on detailed evaluations of the possible outcomes of each action.

9 Learning from examples
Introduction to Machine Learning (ML)
Machine Learning is about creating new facts without being able to infer them logically from the existing knowledge and experience.
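Before moving on to machine learning, here is that value-iteration sketch. The update follows the Bellman rule quoted above; the tiny two-state MDP (its states, actions, transition probabilities and rewards) is entirely made up for illustration.

```python
def value_iteration(states, actions, P, R, gamma=0.9, theta=1e-6):
    """P[(s, a)] is a list of (next_state, probability); R[(s, a, s2)] is the reward.
    Returns the converged state values and a greedy policy extracted from them."""
    V = {s: 0.0 for s in states}                      # initialise all values to zero
    while True:
        delta = 0.0
        for s in states:
            q = [sum(p * (R[(s, a, s2)] + gamma * V[s2]) for s2, p in P[(s, a)])
                 for a in actions[s]]
            best = max(q)                             # Bellman update: best expected utility
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:                             # values have converged
            break
    policy = {s: max(actions[s],
                     key=lambda a: sum(p * (R[(s, a, s2)] + gamma * V[s2])
                                       for s2, p in P[(s, a)]))
              for s in states}
    return V, policy

# Tiny made-up MDP: two states, the agent can "stay" or "go".
states = ["home", "work"]
actions = {"home": ["stay", "go"], "work": ["stay", "go"]}
P = {("home", "stay"): [("home", 1.0)],
     ("home", "go"):   [("work", 0.9), ("home", 0.1)],
     ("work", "stay"): [("work", 1.0)],
     ("work", "go"):   [("home", 1.0)]}
R = {("home", "stay", "home"): 0.0,
     ("home", "go", "work"): 1.0, ("home", "go", "home"): 0.0,
     ("work", "stay", "work"): 2.0,
     ("work", "go", "home"): 0.0}
V, policy = value_iteration(states, actions, P, R)
print({s: round(v, 2) for s, v in V.items()}, policy)
```

Policy iteration would instead alternate a full evaluation of the current policy with the same greedy improvement step used at the end of this sketch.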
Machine learning can be useful in a number of tasks which require knowledge:
Detection: discovering implicitly present interference from the outside world
Classification: grouping the items into categories, groups or classes according to certain discriminating characteristics
Recognition: establishing the class of an item based on common attributes
Identification: unambiguously recognizing an item based on unique attributes
Prediction: predicting the appearance of a particular object, class or pattern
There are three types of feedback that can accompany the inputs, and that determine the three main types of learning:
Supervised Learning: agent observes input-output pairs and learns a function that maps from input to output *** output prediction capability ***
Unsupervised Learning: agent processes data input and learns patterns in the input without any explicit feedback *** input classification capability ***
Reinforcement Learning: agent learns from a series of reinforcements: rewards & punishments *** improvement over time ***

Supervised Learning
Training set of N examples of input-output mappings (x1, y1), (x2, y2), ..., (xN, yN), where y = f(x).
Model h is a hypothesis about the world which approximates the true function f; it is drawn from a hypothesis space H of possible functions (a model of the data, drawn from a model class H).
Consistency hypothesis: h must be such that for each xi in the training set h(xi) = yi. In reality, we learn only the best-fit function, for which h(xi) ≈ yi.
The true measure of the quality of a hypothesis depends on how well it handles inputs it has not yet seen during the training. We evaluate it by applying the function to the input from a test set (xj, yj) and comparing the predicted output h(xj) to the actual yj.

Supervised Learning Quality
Bias: the tendency of a predictive hypothesis to deviate from the expected value when averaged over different training sets.
Underfitting: the hypothesis fails to find a pattern in the data, possibly due to insufficient training.
Overfitting: the hypothesis fits the particular training data too closely and therefore performs poorly on unseen test data.
Variance: the amount of change in the hypothesis due to fluctuation in the training data.
Bias–variance tradeoff: a choice between more complex, low-bias hypotheses that fit the training data well and simpler, low-variance hypotheses that may generalize better.

Supervised Learning - Decision Trees
A decision tree is a representation of a function that maps a vector of attribute values to a single output value—a “decision.” It reaches its decision by performing a sequence of tests, starting at the root and following the appropriate branch until a leaf is reached. Each internal node in the tree corresponds to a test of the value of one of the input attributes, the branches from the node are labeled with the possible values of the attribute, and the leaf nodes specify what value is to be returned by the function.
Aim: find a small tree consistent with the training examples.
Idea: (recursively) choose the "most significant" attribute (the one that comes closest to determining the decision) as the root of every subtree (a small attribute-selection sketch based on information gain follows at the end of this passage).

Decision Trees – Applicability
Decision trees can be made more widely useful by handling the following complications: missing data, continuous and multivalued input attributes, continuous-valued output attributes.
Decision trees are also unstable in that adding just one new example can change the test at the root, which changes the entire tree.
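Here is that attribute-selection sketch: entropy and information gain computed over a toy training set. The attributes, labels and data rows are invented, and a full decision-tree learner would apply this selection recursively at every subtree.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target="WillWait"):
    """Expected reduction in entropy of the target from splitting on `attribute`."""
    base = entropy([e[target] for e in examples])
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

# Toy training set (invented for illustration)
examples = [
    {"Hungry": "Yes", "Raining": "No",  "WillWait": "Yes"},
    {"Hungry": "Yes", "Raining": "Yes", "WillWait": "Yes"},
    {"Hungry": "No",  "Raining": "No",  "WillWait": "No"},
    {"Hungry": "No",  "Raining": "Yes", "WillWait": "No"},
    {"Hungry": "Yes", "Raining": "No",  "WillWait": "Yes"},
    {"Hungry": "No",  "Raining": "Yes", "WillWait": "No"},
]
for attr in ("Hungry", "Raining"):
    print(attr, round(information_gain(examples, attr), 3))
# "Hungry" perfectly predicts the target here, so it has the highest gain
# and would be chosen as the root test.
```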
Model Selection and Optimization
The task of finding a good hypothesis can be split into two subtasks: model selection chooses a good hypothesis space, and optimization (training) finds the best hypothesis within that space. The set of data can be split into a training set to create the hypothesis, and a test set to evaluate it.
Error rate: the proportion of times that h(x) ≠ y for a sample (x, y).
When considering different models, three data sets are needed:
A training set to train candidate models.
A validation set, also known as a development set or dev set, to evaluate the candidate models and choose the best one.
A test set to do a final unbiased evaluation of the best model.
When there is an insufficient amount of data to create three sets, use k-fold cross-validation: split the data into k equal subsets (popular values for k are 5 and 10) and perform k rounds of learning; on each round, 1/k of the data is held out as a validation set and the remaining examples are used as the training set (a minimal k-fold splitting sketch follows at the end of this passage).
The criterion for selection is to minimize the loss function rather than maximize the utility function. The loss function L(x, y, ŷ) is defined as the amount of utility lost by predicting that h(x) = ŷ when the correct answer is f(x) = y:
L(x, y, ŷ) = Utility(result of using y given input x) − Utility(result of using ŷ given input x).
A simplified version of the loss function, which is independent of x, is L(y, ŷ). The learning agent maximizes its expected utility by choosing the hypothesis that minimizes expected loss over all input–output pairs it will see.

Parametric Models
Parametric model: a learning model that summarizes data with a set of parameters of fixed size (independent of the number of training examples).
Nonparametric Models
Nonparametric model: a model that cannot be characterized by a bounded set of parameters.
Example: the simplest nonparametric learning method is table lookup: take all the training examples, put them in a lookup table, and then when asked for h(x), see if x is in the table; if it is, return the corresponding y.

Support vector machines (SVM)
SVMs retain three attractive properties over deep learning networks and random forests, which are much more complex methods:
SVMs construct a maximum margin separator - a decision boundary that has the largest possible distance to example points
SVMs create a linear separating hyperplane
SVMs are nonparametric - the separating hyperplane is defined by a set of examples
Instead of minimizing expected empirical loss on the training data, SVMs attempt to minimize expected generalization loss.
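Here is that k-fold splitting sketch. The train and error_rate functions are hypothetical placeholders supplied by the caller; the majority-class "learner" and the toy data set exist only to make the example runnable.

```python
def k_fold_cross_validation(examples, k, train, error_rate):
    """examples: list of (x, y) pairs; train(training_set) -> hypothesis h;
    error_rate(h, validation_set) -> proportion of errors. Returns the mean error."""
    folds = [examples[i::k] for i in range(k)]     # k roughly equal subsets
    total = 0.0
    for i in range(k):
        validation = folds[i]                      # hold out 1/k of the data
        training = [e for j, fold in enumerate(folds) if j != i for e in fold]
        h = train(training)
        total += error_rate(h, validation)
    return total / k

# Tiny illustration: "learn" the majority class and measure its error rate.
def train_majority(training):
    labels = [y for _, y in training]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def error_rate(h, validation):
    return sum(h(x) != y for x, y in validation) / len(validation)

data = [(i, "even" if i % 2 == 0 else "odd") for i in range(20)]
print(k_fold_cross_validation(data, 5, train_majority, error_rate))  # about 0.5, as expected
```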
Other non-parametric methods
Locality-sensitive hashing (LSH)
Nonparametric regression
The kernel trick

Developing Machine Learning Systems
Problem formulation: define the problem, the input, output and loss function; choose metrics that should be tracked.
Data collection, assessment, and management: when data are limited, data augmentation can help; for unbalanced classes, undersample the majority and over-sample the minority; feature engineering; exploratory data analysis (EDA).
Model selection and training: receiver operating characteristic (ROC) curve; confusion matrix.
Trust, interpretability, and explainability: source control; testing; review, monitoring, accountability; inspect the actual model and understand why it got a particular answer for an input.
Operation, monitoring, and maintenance: monitor your performance on live data; nonstationarity—the world changes over time.

10 Knowledge based learning
A Logical Formulation of Learning
Learning by extension of the goal predicate: a certain hypothesis predicts that a set of examples will be examples of the goal predicate; based on this prediction the goal predicate is extended to include the examples.
Learning by ruling out wrong hypotheses: hypotheses that are not consistent with the examples can be ruled out.
A false negative for the hypothesis - the hypothesis says it should be negative but in fact it is positive.
A false positive for the hypothesis - the hypothesis says it should be positive but in fact it is negative.
Learning by searching for the current-best-hypothesis: maintain a single hypothesis, and adjust it as new examples arrive in order to maintain consistency. If there is a new example that is a false negative, generalize by extending the hypothesis to include the false negative; if there is a new example that is a false positive, specialize by restricting the hypothesis to exclude the false positive example.
Generalization & specialization are logical relationships between hypotheses, i.e. if hypothesis h1, with definition C1, is a generalization of hypothesis h2 with definition C2, we have ∀x C2(x) ⇒ C1(x); to construct a generalization we must find a definition C1 that is logically implied by C2.
Steps of the learning process: a consistent hypothesis; a false negative; the hypothesis is generalized; a false positive; the hypothesis is specialized.

Least-commitment search (continuation)
In order for the general structure of the boundary sets to be sufficient for representing the version space without the need to explore it entirely, they must have two properties:
Every consistent hypothesis (other than those in the boundary sets) is more specific than some member of the G-set, and more general than some member of the S-set.
Every hypothesis more specific than some member of the G-set and more general than some member of the S-set is a consistent hypothesis.

Knowledge in Learning
A modern approach to AI is to design agents that already know something about the solution and are trying to learn some more during the process of solving problems.
Explanation based learning, or EBL – Explanation-based learning (EBL) extracts general rules from single examples by explaining the examples and generalizing the explanation.
The goal predicate is formulated using fixed features.
The hypothesis is formulated logically from the background knowledge.
The agent does not actually learn anything factually new from the examples; it relies only on the background knowledge.
Relevance-based learning, or RBL
Relevance-based learning (RBL) uses prior knowledge in the form of determinations to identify the relevant attributes.
The goal predicate accounts for the relevance of a set of features.
The background knowledge together with the observations allows the agent to infer a new, general rule that explains the observations, and it uses it to formulate the hypothesis.
The agent effectively uses a deductive form of learning and cannot create new knowledge starting from scratch; it needs some initial knowledge.
Knowledge-based inductive learning, or KBIL
Knowledge-based inductive learning (KBIL) finds inductive hypotheses that explain sets of observations with the help of background knowledge: the background knowledge together with the new hypothesis combined explain the examples.
KBIL is studied mainly in the field of inductive logic programming, or ILP.
Prior knowledge plays two key roles in reducing the complexity of learning.

Explanation-Based Learning
A cumulative learning process which uses not only the initial background knowledge, but also its extension over time as a result of the learning process. The extension of the knowledge is based on extracting general rules from individual observations.
Memorization: accumulate a database of input–output pairs and look the answer up when the output for an input is required. EBL instead creates general rules that cover an entire class of cases.
EBL general process:
1. Given an example, construct a proof that the goal predicate applies to the example using the available background knowledge.
2. In parallel, construct a generalized proof tree for the parametrized goal using the same inference steps as in the original proof.
3. Construct a new rule whose left-hand side consists of the leaves of the proof tree and whose right-hand side is the parametrized goal (after applying the necessary bindings from the generalized proof).
4. Drop any conditions from the left-hand side that are true regardless of the values of the parameters in the goal.

Learning Using Relevance Information
Inductive logic programming (ILP) techniques perform KBIL on knowledge that is expressed in first-order logic.
Functional Dependencies or Determinations?
Functional dependence is a strict form of relevance of the attributes which describe the problem. Example: given nationality, language is fully determined; special syntax formalizes this as Nationality(x, n) ≻ Language(x, l).
Determinations specify a sufficient basis vocabulary to construct hypotheses concerning the target predicate. Example: the nationality, the parents’ nationalities and the place of birth determine the language. A determination P ≻ Q says that if any examples match on hypothesis P, then they must also match on hypothesis Q.
A determination is consistent with a set of examples if every pair that matches on the predicates on the left-hand side also matches on the goal predicate (a small consistency-check sketch follows at the end of this passage).

Inductive Logic Programming
A top-down inductive learning method. Inductive logic programming (ILP) combines inductive methods for learning with the full power of first-order inductive logic (FOIL) for logical inference. The hypotheses are represented as logic programs; given the examples as facts in the programs, the logical inference behind the program interpreter derives as conclusions the rules to solve the problem.
Use first-order literals instead of attributes to represent the examples (logically, they are facts).
Use clauses to represent the hypothesis (logically, they are heuristics).
Start the inference with a clause which represents a very general rule for solving the problem (the initial hypothesis).
Gradually specialize the clause so that it matches the facts (fits the examples).
Three kinds of literals can be added:
Literals built using predicates: the literal can be negated or unnegated, any existing predicate (including the goal predicate) can be used, the arguments must all be variables, and the literal must include at least one variable from an earlier literal or from the head of the clause.
Equality and inequality literals built using comparisons: such literals are used to relate new variables coming from the examples to variables which already appear in the clause.
Arithmetic comparisons: when dealing with functions of variables, literals such as x > y and y ≤ z can be added; their truth will be determined by computational means.
CHOOSE-LITERAL uses a heuristic based on the idea of information gain (Ockham’s razor to eliminate irrelevant hypotheses).

Inductive learning with inverse deduction
An inverse resolution step takes a resolvent C and produces two clauses C1 and C2, such that C is the result of resolving C1 and C2. Alternatively, it may take a resolvent C and clause C1 and produce a clause C2 such that C is the result of resolving C1 and C2.
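Here is that consistency-check sketch for a determination P ≻ Q: for every pair of examples that agree on all left-hand-side attributes, they must also agree on the goal attribute. The attribute names echo the Nationality/Language example above, but the records themselves are made up.

```python
from itertools import combinations

def determination_consistent(examples, lhs_attributes, goal_attribute):
    """A determination lhs ≻ goal is consistent with the examples if every pair of
    examples matching on all lhs attributes also matches on the goal attribute."""
    for e1, e2 in combinations(examples, 2):
        if all(e1[a] == e2[a] for a in lhs_attributes):
            if e1[goal_attribute] != e2[goal_attribute]:
                return False
    return True

# Invented examples in the spirit of Nationality(x, n) ≻ Language(x, l)
examples = [
    {"Nationality": "Brazilian", "BirthPlace": "Rio",       "Language": "Portuguese"},
    {"Nationality": "Brazilian", "BirthPlace": "Sao Paulo", "Language": "Portuguese"},
    {"Nationality": "Japanese",  "BirthPlace": "Osaka",     "Language": "Japanese"},
]
print(determination_consistent(examples, ["Nationality"], "Language"))   # True
print(determination_consistent(examples, ["BirthPlace"], "Language"))    # True (vacuously: no pair matches)
```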
