Understanding Intelligence and AI

Questions and Answers

Which of the following is a characteristic of a 'supercritical mind' according to Turing?

  • Fails to sustain a chain reaction of thoughts.
  • Operates solely on subcritical plutonium.
  • Generates new ideas beyond what it is given. (correct)
  • Processes input passively without innovation.

What is the primary aim of AI alignment?

  • To ensure AI systems always achieve goals regardless of human values.
  • To prioritize correctness over logic or value in AI decision-making.
  • To match AI's goals with human intentions. (correct)
  • To design AI systems that can wish for things.

What is a key difference between Expert Systems and Decision Support Systems (DSS)?

  • Expert Systems focus on data processing and analytics, while DSS emphasize reasoning.
  • There is no significant difference between Expert Systems and DSS.
  • Expert Systems primarily support human decision-making, while DSS aim to make decisions independently.
  • Expert Systems aim to make the decisions whereas DSS support human decision-making. (correct)

Which characteristic distinguishes Agents from Expert Systems?

  • Agents exhibit proactive behavior, responding dynamically to their environment, whereas Expert Systems rely on user input. (correct)

What is the role of the Inference Engine in an expert system?

  • To apply rules to facts to derive conclusions. (correct)

Which of the following is NOT a characteristic of probabilistic reasoning in expert systems?

  • Assigns binary logic to decisions. (correct)

What is the primary purpose of using Naive Bayes in classification problems?

  • To simplify probability calculations by assuming attributes are conditionally independent. (correct)

In the context of graphical models, what is a key difference between Bayesian Networks and Markov Networks?

  • Bayesian Networks use directed graphs, while Markov Networks use undirected graphs. (correct)

What is the main goal of 'marginalizing through variable elimination'?

  • To compute exact probabilities efficiently by removing variables step by step. (correct)

Which of the following best describes the Loopy Belief Propagation algorithm?

  • An iterative message-passing algorithm used for approximate probabilistic inference in networks with cycles. (correct)

Which of the following presents a disadvantage of Expert Systems?

  • They struggle with common sense reasoning. (correct)

Which concept is considered a 'mental attitude' often attributed to agents?

  • Belief (correct)

In the context of agents, what does 'active behaviour' encapsulate that 'passive behaviour' does not?

  • "When", "With whom", "Whether at all". (correct)

What is a key characteristic of Simple Reactive Agents?

  • Intelligence emerges from the interaction between simple behaviors and direct observations of the environment. (correct)

What is the main focus of Subsumption Architecture (SRA) in agent design?

  • A hierarchical set of behaviors to accomplish tasks (correct)

When is a model-based agent needed?

  • When immediate observations are not sufficient to determine the best next action. (correct)

What are the main problems with Logic-Based Agents (LBA)?

  • Building a sufficiently complete model of a complex environment is very hard. (correct)

Goal-based agents come into play when...?

  • You need agents that can alter goals depending on environmental response (correct)

In a horizontally layered architecture, what does a mediator function do?

  • A function that is needed if actions contradict each other (correct)

What does the acronym MARL stand for?

  • Multi-Agent Reinforcement Learning (correct)

When should AI systems coordinate?

  • As computer applications become more complex and dynamic. (correct)

Which of the following best describes 'Task and Result Sharing' as a coordination mechanism?

  • The complex solution is divided up into subtasks; agents produce partial solutions that are synthesized into a final solution. (correct)

In the contract net model, what role(s) can an agent assume?

  • Both a manager and a contractor (correct)

A key principle behind FA/C (Functionally Accurate Cooperation) is that one should not aim to build a system in which only...

  • Completely accurate information is exchanged among cooperating entities. (correct)

What is the primary focus of Joint Planning in multi-agent systems?

  • Coordinating their plans to achieve a shared goal. (correct)

In Partial Global Planning (PGP), what is one key limitation?

  • Local actions may be executed without joint agreement. (correct)

What role does the fitness parameter play in Evolutionary Algorithms?

  • Fitness is usually the criterion that needs to be optimized. (correct)

What is the role of selection in Evolutionary Algorithms?

  • To steer reproduction based on solution probabilities for survival. (correct)

What is the name of the tool that holds several limited-size tournaments?

  • Tournament selection (correct)

What is the name of the process in which two parents share their genetic code and generate offspring?

  • Crossover (correct)

Given a mutation probability for a single bit of $P_m = 0.1$, which of the listed schemata has a higher probability of surviving mutation: Schema A: 10*0 (where * means the value does not matter) or Schema B: 10**0?

  • Schema A (correct)

When designing the genotype, which is preferable: having genes close together, or far apart and independent?

  • A and B (correct)

Compared to Supervised Learning, how does Reinforcement Learning acquire feedback during training?

  • Through rewards and self-improvement. (correct)

What does the temporal discounting factor, $\gamma$, represent in reinforcement learning?

  • It puts some time pressure on the agent (getting a reward sooner is better than getting it later). (correct)

What are the characteristics of the state value $V(S)$ and the action value $Q(S, A)$?

  • $V(S)$ represents the expected reward from being in state $S$, whereas $Q(S, A)$ represents the expected reward from taking action $A$ in state $S$. (correct)

Which of the following most accurately represents the goal of Model-Free Policy Evaluation?

  • Learning from experience (correct)

What can be used with existing estimates of state-values in order to produce new samples of state-values?

  • Bootstrapping: temporal difference learning (correct)

How does SARSA (State-Action-Reward-State-Action) compare with other action-value updates?

  • It updates based on the action actually taken by the current policy. (correct)

If $T \to 0$, what type of Boltzmann policy emerges?

  • Greedy. (correct)

In the context of Q-learning, if an agent cannot perform lifelong exploration, what is one of the limitations?

  • It requires sufficient exploration to learn the right values (correct)

In what types of environments can the policy value function be calculated?

  • Episodic or continuous (correct)

What does $\nabla_\theta \pi(s,a,\theta)$ represent?

  • Policy gradient theorem. (correct)

Flashcards

AI as a Field

The everyday notion of human intelligence is used as a starting point. The goal is computational precision of this notion, resulting in a multidisciplinary field.

Turing Test Idea

A computer program is intelligent if its responses are indistinguishable from those of a human.

Supercriticality

The ability of a mind to amplify ideas and produce a self-sustaining cascade of thoughts, leading to active thinking and innovation.

Visionary Approach

Focuses on creating AI systems that produce intelligent behavior in a way similar to humans or animals. Often leads to debates on Weak vs. Strong AI.

Pragmatic Approach

Aims to build AI that exhibits behavior comparable to natural intelligence, without necessarily replicating human-like thought processes.

Chinese Room Argument

A thought experiment arguing that systems producing correct answers don't necessarily understand them; it supports the Weak AI position.

AI Alignment

Focuses on ensuring that an AI's goals match human intentions, emphasizing careful goal-setting to avoid unintended consequences.

Expert System

A computer system emulating human expert decision-making, using knowledge and inference to solve specialized problems that are difficult and require expertise.

Inference Engine

Applies rules to facts to derive conclusions, using knowledge representation languages such as Description Logics to represent knowledge.

Probabilistic Reasoning

Unlike traditional rule-based systems, it allows handling uncertainty using probabilities and assigning confidence values to decisions.

Naive Bayes

Simplifies probability calculations by assuming attributes are conditionally independent given the class, making calculations easier.

Marginalization vs. Maximization

Marginalization computes the probability of a variable by summing over all possible values of the unobserved variables; maximization finds the most probable value.

Loopy Belief Propagation

Iterative message-passing algorithm used for approximate probabilistic inference in networks with cycles, by exchanging belief updates between variables and factors.

Pros and Cons of Expert Systems

Consistency, memory, logic, and reproducibility are advantages. Lacks common sense, creativity, adaptability, and requires manual maintenance.

Agent Properties

Autonomous, Adapting, Situated, Pursuing Goals, Persisting, Create/Set goals.

Logic Based Agent

Symbolic or knowledge-based systems where logical deductions are the foundation rather than reacting to sensor input without reasoning.

Symbol Grounding

Coupling perception with symbolic facts, connecting abstract representations to real-world sensory data.

Simple Reactive Agents

Behavioral and situated systems, where intelligence emerges from interaction and direct observation, not just thinking or reasoning.

Subsumption architecture (SRA)

Consists of selecting behaviours through a structured hierarchy.

Model Based Agent

Models the world state to determine best action; updates predictions based on environmental observations.

Goal Based Agent

Involves explicit evaluation and determination of goals to influence action selection based on environmental response.

Coordination Definition

Managing dependencies between activities, where agents are aware of their dependencies on others and adjust accordingly.

Task and Result Sharing

Breaks down a complex problem into smaller subtasks, distributed among multiple agents, then combines partial solutions to construct the final result.

Blackboard Model

A coordination model where agents collaborate by reading and writing to a shared memory space, contributing partial solutions and incrementally refining a problem's solution.

Contract Net Model

An early negotiation protocol where nodes act as managers and contractors, often used when the individuals' local knowledge is incomplete, uncertain and inconsistent.

Joint Planning

Deals with how multiple agents coordinate their plans to achieve a shared goal.

Partial Global Planning (PGP)

A general coordination schema that extends the FA/C principle by allowing agents to represent and reason about actions and interactions of other agents.

Population of Candidate Solutions

Randomly generated sets of possible solutions. They are sufficiently diverse and large, acting as a basis for better solutions that evolve through each generation.

Genotype vs. Phenotype

An encoding of the potential solution and the actual tested instance.

Fitness-Based Survival

Fitness determines the selection probabilities for generating offspring and for survival of the genotype, i.e. inclusion in the mating pool for the next generation.

Fitness Wheel Tuning

The tuning of the selection pressure; Boltzmann and Gibbs distribution enable this.

Mutation

Involves random changes in the genotype, offering new search regions, and increases population diversity through changes in individuals' attributes.

Schema

A shared property between some genotypes, indicating fixed value positions or positions where the value does not matter.

The Schema Theorem

Schemata with shorter defining lengths and lower order have a higher probability of surviving crossover.

Building Blocks Theorem

Genetic algorithms work by identifying and recombining building blocks: well-performing, low-order schemata.

SNES strategy

Separable Natural Evolution Strategy; aims to increase diversity and prevent sub-optimal convergence.

Reinforcement Learning

RL learns from reward, trial, and error versus from a dataset.

Reward (RL)

The set of rewards/punishments the agent may receive in its current state.

RL Problem Goal

A situation-to-action mapping, i.e. a policy; the goal is to learn the policy that yields the most advantageous path of rewards.

Discounting Reward

Ensures sum of rewards becomes finite; adds time pressure, rewarding sooner over later.

Study Notes

Lecture 1 - Intelligence

  • There is no universally accepted definition of intelligence and it can take different specific forms, including social, emotional, senso-motoric, and mental intelligence
  • Artificial Intelligence (AI) uses the everyday notion of human intelligence as a starting point, focusing on computational precision
  • AI is a multidisciplinary field involving the Design, Analysis, and Application of computer-based systems that meet intelligence criteria
  • The Turing Test, also known as the Imitation Game, was proposed by Alan Turing in 1950 as a demonstration for AI
  • The test aims to define intelligence through indistinguishability from human intelligence
  • A computer program is considered intelligent if its responses are indistinguishable from a human's
  • Turing explored supercriticality in intelligence, drawing an analogy to nuclear fission
  • A subcritical mind doesn't generate new ideas, while a supercritical mind amplifies them
  • This raises the question of designing machines that actively think and innovate, not just process inputs
  • An operational definition is needed, driven by visionary and pragmatic approaches
  • The visionary approach focuses on creating AI systems with human-like intelligent behavior, while the pragmatic approach aims to replicate natural intelligence without necessarily mimicking human thought processes
  • The Strong AI vs Weak AI debate centers around these motivations, questioning whether AI can truly "think" or just mimic intelligence
  • The Chinese Room experiment argues that a system producing correct answers doesn't necessarily understand them, supporting the Weak AI position
  • Perspectives on behavior and understanding diverge, with some arguing that only behavior matters (Weak AI view) and others asserting the necessity of true understanding (Strong AI view)
  • Intelligence can be defined by acting humanly (imitating human behavior), acting rationally (doing the "right" things with a goal), thinking humanly (Cognitive Modelling), and thinking rationally (symbolic approach to AI)
  • Goal, value, and AI alignment are essential, defining the "right" action while considering correctness, logic, or value
  • Misaligned objectives can lead to unexpected consequences, emphasizing careful goal-setting
  • AI Alignment ensures that an AI's goals align with human intentions, addressing concerns in applications like self-driving cars and chatbots where unintended behavior can have consequences

Lecture 2 - Expert Systems

  • Expert systems emulate human expert decision-making in specialized domains, using knowledge and inference
  • They differ from intelligent systems by often lacking real-world interaction and embodiment
  • Expert systems use rule-based decision making, whereas intelligent systems integrate sensors, embodiment, and real-world interaction

Expert Systems vs Other Systems

  • Expert systems combine knowledge and reasoning to make decisions, while Decision Support Systems (DSS's) focus on data processing and analytics to assist users
  • Key difference: Expert Systems make decisions, DSS supports human decision-making
  • Expert systems rely on user input and lack proactivity, while agents are autonomous and interact with their environment dynamically
  • Key difference: Expert Systems require human guidance, while agents operate independently

How Expert Systems Work

  • Core components: Knowledge Base that stores rules and facts, and an inference engine that applies rules to facts for conclusions
  • Knowledge representation languages, like Description Logics, exist
  • Probabilistic reasoning handles uncertainty and offers flexible decisions

Probabilistic Models

  • Bayes' Rule updates beliefs based on new evidence
  • Naïve Bayes assumes independent attributes for simplified probability calculations (see the sketch after this list)
  • Graphical models, such as Bayesian Networks, use directed graphs with conditional probability tables
  • Markov Networks use undirected graphs with cliques
  • Factor Graphs offer a hybrid representation, grouping random variables into conditionally independent cliques
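
To make the Naïve Bayes bullet concrete, here is a minimal sketch of how the conditional-independence assumption turns the posterior into a product of per-attribute likelihoods; the class priors, attribute names, and counts are invented purely for illustration.

```python
# Minimal Naive Bayes sketch: P(class | attrs) is proportional to
# P(class) * product of P(attr_i | class), assuming attributes are
# conditionally independent given the class. The counts below are invented.

priors = {"yes": 9 / 14, "no": 5 / 14}
likelihoods = {
    ("outlook=sunny", "yes"): 2 / 9, ("outlook=sunny", "no"): 3 / 5,
    ("wind=strong", "yes"): 3 / 9,   ("wind=strong", "no"): 3 / 5,
}

def naive_bayes(attrs):
    scores = {}
    for c, prior in priors.items():
        score = prior
        for a in attrs:
            score *= likelihoods[(a, c)]
        scores[c] = score
    total = sum(scores.values())              # normalise so the posteriors sum to 1
    return {c: s / total for c, s in scores.items()}

print(naive_bayes(["outlook=sunny", "wind=strong"]))  # posterior over {yes, no}
```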

Marginalization vs Maximization

  • Marginalization computes the probability of a variable by summing over all possibilities
  • Maximization finds the most probable value
  • Marginalizing with Variable Elimination computes accurate probabilities efficiently by eliminating variables step-by-step
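
A small sketch contrasting the two operations on an invented two-variable joint distribution; the probability table is illustrative only.

```python
# Marginalization vs. maximization on a toy joint distribution P(A, B).
# The table is invented purely to illustrate the two operations.
P = {("a0", "b0"): 0.30, ("a0", "b1"): 0.10,
     ("a1", "b0"): 0.15, ("a1", "b1"): 0.45}

# Marginalization: P(A=a) = sum over b of P(A=a, B=b)
marginal_A = {a: sum(p for (ai, _), p in P.items() if ai == a) for a in ("a0", "a1")}

# Maximization: the single most probable joint assignment
map_assignment = max(P, key=P.get)

print(marginal_A)        # {'a0': 0.4, 'a1': 0.6}
print(map_assignment)    # ('a1', 'b1')
```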

Loopy Belief Propagation

  • Loopy Belief Propagation is an iterative algorithm for approximate probabilistic inference in networks with cycles

Expert Systems Pros and Cons

  • Advantages: Includes consistency, memory, logic, reproducibility
  • Disadvantages: Limited common sense, creativity, adaptability, and reliance on manual updates

Lecture 3 - Agents

  • Agents can be autonomous, adaptive, situated, goal-oriented, and persistent
  • Mental attitudes, such as beliefs, intentions, and desires, are associated with agents
  • Examples include agents interacting with the environment or offering autonomous advice, like digital assistants

Agent Concepts

  • Agents and objects share identity, state, and passive behavior but agents uniquely possess active behavior
  • They can be logic-based (using symbolic AI) or reactive (responding to stimuli)
  • Minimal agents should perceive their environment and act through actuators, intelligent agents add pro-activeness, reactivity, and social ability
  • Agent environments vary by accessibility, determinism, episodicity, dynamics, and action space

Agent Architectures

  • Logic-based agents use symbolic representation and logical deduction, while reactive agents respond directly to their environment
  • Planning/search agents use an environmental model to determine actions
  • Agents can be categorized as simple reactive, model-based, or goal-based
  • Model-based agents use beliefs about the environment, updating these based on predictions and observations (see the sketch after this list)
  • Goal-based agents involve explicit goal evaluation and determination
  • Architectures can be layered horizontally or vertically for proactive and reactive behavior
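
A compact sketch of the difference between a simple reactive agent and a model-based agent, using an assumed two-square vacuum world; the percepts, locations, and actions are invented for illustration.

```python
# Sketch of two agent types in a toy two-square vacuum world.
# Percepts, locations, and actions are invented for illustration.

def reactive_agent(percept):
    """Simple reactive agent: maps the current percept directly to an action."""
    location, dirty = percept
    if dirty:
        return "suck"
    return "right" if location == "A" else "left"

class ModelBasedAgent:
    """Keeps a belief about squares it has not observed recently."""
    def __init__(self):
        self.belief = {"A": "unknown", "B": "unknown"}

    def act(self, percept):
        location, dirty = percept
        self.belief[location] = "dirty" if dirty else "clean"   # update the model
        if dirty:
            return "suck"
        other = "B" if location == "A" else "A"
        # Only travel if the model says the other square may still need cleaning.
        if self.belief[other] != "clean":
            return "right" if location == "A" else "left"
        return "noop"

print(reactive_agent(("A", True)))   # suck
agent = ModelBasedAgent()
print(agent.act(("A", False)))       # right (B might still be dirty)
```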

Lecture 4 - Multi Agent Coordination

  • Coordination involves managing dependencies between activities, where agents are aware of their interdependencies
  • Key aspects of coordination: environment (diversity, dynamics, predictability), cooperating entities (number, homogeneity, goals), and cooperation (frequency, levels, patterns)
  • Coordination theory focuses on goals, activities, actors, and interdependencies to understand this domain
  • Types of Interdependencies: prerequisite, shared resource, and simultaneity

Why Coordinate

  • The Principle of bounded rationality, limited human processing capacity, and the increasing complexity of computer applications all necessitate it

Coordination Models

  • Basic models include client-server, involving requests and responses between processes
  • Task and result sharing is used to divide a large process into sub processes

Blackboard Model

  • Blackboard Model allows agents to share memory and contribute together to generate a solution

The Contract Net Model

  • The Contract Net Model is when a manager announces tasks, contractors bid, and the best bid is selected
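
A minimal sketch of the announce-bid-award cycle described above; the task, contractor names, and cost estimates are invented, and a real Contract Net implementation would negotiate via messages rather than direct function calls.

```python
# Contract Net sketch: a manager announces a task, contractors bid,
# and the best (here: lowest-cost) bid wins. All values are illustrative.

def contract_net(task, contractors):
    # 1. Task announcement: every contractor estimates its cost for the task.
    bids = {name: estimate(task) for name, estimate in contractors.items()}
    # 2. Bidding + awarding: the manager selects the best bid.
    winner = min(bids, key=bids.get)
    return winner, bids

contractors = {
    "agent_1": lambda task: 10.0,   # fixed illustrative cost estimates
    "agent_2": lambda task: 7.5,
    "agent_3": lambda task: 12.0,
}
print(contract_net("deliver-part", contractors))   # ('agent_2', {...})
```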

FA/C Principle

  • The FA/C Principle (Functionally Accurate Cooperation) serves as a guideline when working with individual local knowledge that is incomplete, uncertain, and inconsistent

Joint Planning

  • Joint Planning addresses how multiple agents coordinate plans towards one larger goal
  • Taxonomy of planning: single-component or multi-component approaches
  • Relationships among plans: positive (Equality, Subsumption, Favorableness) and negative (Resource conflicts, Incompatibility)

Partial Global Planning (PGP)

  • PGP extends the FA/C principle by allowing agents to reason about actions while decentralized
  • Local actions may occur without joint agreement, based on relatively simple abstraction

Lecture 5 - Evolutionary Algorithms

  • Solutions can occur through an evolutionary process
  • Each encoding must be translated into a functional phenotype

Populations

  • Involves a population of candidate solutions, which needs to be sufficiently diverse and large
  • Population will act as the basis for better solutions, and improves each generation

Genotype vs. Phenotype

  • In evolutionary algorithms, a genotype represents a potential solution's encoding, while the phenotype is the actual solution derived from it
  • The simplest form of a genotype is a bit string, but other representations exist
  • Genotype gives rise to Phenotype, Phenotype can be tested for performance

Selection: Fitness

  • Fitness is usually the criterion that needs to be optimized. The evaluation of fitness can be quite involved
  • The cost of evaluating fitness limits the number of individuals that can make up the population
  • Solution: Sampling may be used to speed things up

Fitness Based

  • The performance of candidate solutions is used to steer selection probabilities for survival, and reproduction probabilities to generate offspring and survival of the genotype
  • Includes: Roulette Wheel Selection, with fitness-wheel tuning via a Boltzmann or Gibbs distribution
  • Also includes: Tournament Selection
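
A short sketch of the two fitness-based selection mechanisms named above; the population and fitness values are invented.

```python
# Fitness-based selection sketch: roulette-wheel (fitness-proportional) and
# tournament selection. The population and fitness values are invented.
import random

population = ["A", "B", "C", "D"]
fitness = {"A": 1.0, "B": 4.0, "C": 2.0, "D": 3.0}

def roulette_wheel(pop, fit):
    total = sum(fit[i] for i in pop)
    return random.choices(pop, weights=[fit[i] / total for i in pop], k=1)[0]

def tournament(pop, fit, k=2):
    contestants = random.sample(pop, k)          # one limited-size tournament
    return max(contestants, key=lambda i: fit[i])

print(roulette_wheel(population, fitness))
print(tournament(population, fitness, k=3))
```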

Reproduction

  • Occurs through crossover and mutation
  • Two parents share their genetic code and generate offspring

Crossover

  • Includes: Single point crossover, Double point crossover, and Uniform crossover
  • There are many other possibilities dependent on genome-structure
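
A minimal sketch of single-point crossover together with bit-flip mutation on bit-string genotypes; the genome length and mutation rate are illustrative.

```python
# Single-point crossover and bit-flip mutation on bit-string genotypes.
import random

def single_point_crossover(parent_a, parent_b):
    point = random.randint(1, len(parent_a) - 1)           # random cut point
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]

def mutate(genome, p_m=0.1):
    return [1 - g if random.random() < p_m else g for g in genome]

a, b = [1, 1, 1, 1, 1], [0, 0, 0, 0, 0]
child1, child2 = single_point_crossover(a, b)
print(child1, child2, mutate(child1))
```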

Schemata

  • Theory offers some insight into why good solutions survive and get combined into even better solutions
  • Dealt with at the base level with elementary bit strings

Schema - definition

  • Consider binary genotypes
  • Binary numbers indicate fixed value positions
  • *'s indicate positions for which the value does not matter. Think of a schema as a shared property between (some) genotypes

Schema Theorem

  • Can be used to reason about the disruptive properties of mutation and crossover: high-order schemata have a low probability of surviving mutation, and schemata with long defining lengths have a low probability of surviving single-point crossover
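
Using the standard schema-theorem disruption bounds, a schema $H$ with order $o(H)$ (number of fixed positions) and defining length $\delta(H)$ over genomes of length $l$ survives with probability roughly

$$P_{\text{mutation}}(H) = (1 - p_m)^{o(H)}, \qquad P_{\text{crossover}}(H) \ge 1 - p_c \, \frac{\delta(H)}{l - 1}.$$

For example (invented numbers), with $p_m = 0.1$ a schema of order 3 survives mutation with probability $0.9^3 \approx 0.73$, and with $p_c = 1$ and $l = 5$ a schema with defining length 3 survives single-point crossover with probability at least $1 - 3/4 = 0.25$.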

Dealing with problems

  • Crossover and mutation can have destructive consequences; to counter this we can use Elitism
  • Elitism means having the k best individuals transferred to the population of the next generation, which avoids losing good solutions

Cooperative Coevolution

  • This provides a way to split a big problem into smaller, independent parts that can be solved separately
  • This makes the search for the best solution faster and easier; the partial solutions are later combined into a full solution

Separable Natural Evolution Strategy

  • Maintains a Gaussian (natural) distribution for each gene in a genotype, treating each gene as independent
  • An individual from the population is generated by sampling each gene separately
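
A minimal sketch of the separable sampling step: each gene gets its own Gaussian, and an individual is drawn by sampling every gene independently (the means, sigmas, and population size are invented).

```python
# Separable sampling sketch: each gene has its own Gaussian (mean, sigma),
# and an individual is generated by sampling every gene independently.
import random

means  = [0.0, 1.0, -0.5]       # one (mean, sigma) pair per gene (illustrative)
sigmas = [1.0, 0.5,  2.0]

def sample_individual():
    return [random.gauss(m, s) for m, s in zip(means, sigmas)]

population = [sample_individual() for _ in range(5)]
print(population)
```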

Complex Solutions

  • Evolutionary algorithms don't scale well with problem complexity
  • The building block theorem basically states that partial solutions should combine into the full solution
  • Which means that the problem is divisible into smaller sub-problems

Fitness Requirements

  • A fitness function needs to be able to separate partial solutions from non-solutions, and there needs to be a spread of fitness values

Lecture 6, 7, 8 - Reinforcement Learning

  • Reinforcement Learning can be compared to supervised learning and unsupervised learning
  • Reward or punishment is expressed as a numerical value
  • Learning can either be Behavioral Cloning (supervised learning from demonstrations) or learning from interactions (Reinforcement Learning)

Hardships With RL

  • There are no labelled learning examples, rewards aren't immediate, and rewards can be sparse

RL - Terminology

  • State: Where am I now? What does the world look like?
  • Action: What will I do?
  • Transition: How does this change where I am?
  • Reward: What do I get for being here and doing what I did?
  • Policy: How should I behave?

A Reinforcement Learning Problem

  • State: Which square the agent is in, where visible cards are located
  • Action: Moving, hit or stick, selecting open squares
  • Transition: Moving to a square, getting a card, adding an O to the chosen square
  • Reward: Depends on whether it gives a positive reward, a negative reward, or no reward at all
  • Policy: Learning what to do, i.e. a situation-to-action mapping

Notations

Transitions: all is well if the probability distributions are stationary. Discounted rewards: mathematically ensure that the sum of rewards becomes finite, and add some time pressure on the agent.
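
Written out, the discounted return from time step $t$ is the standard geometric sum

$$G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad 0 \le \gamma < 1,$$

so with bounded rewards $|r| \le r_{\max}$ the sum is finite, $|G_t| \le r_{\max} / (1 - \gamma)$, and earlier rewards count more than later ones.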

State-Dependent Rewards

  • A reward that depends only on the state might not be very useful
  • A Q-table stores the expected return for each state-action pair, given a specific policy, which turns the problem into selecting the optimal action in every state

Policies

  • The policy is the learning goal for the RL agent: a mapping from states to probability distributions over actions

Methods

  • Specify how to change $\pi$ with the experience collected by the agent in order to maximize $G$ for all states $S$

Bandit Problem

  • The challenge lies in balancing exploration, trying different actions to learn their rewards, and exploitation, choosing the best-known action to maximize gains
  • The multi-armed bandit problem has only one state
  • If there are states but no transitions, this creates a contextual bandit

Strategies

  • You need to try all options; strategies include epsilon-greedy and Upper Confidence Bounds

Value Based Methods

  • Value-based methods are Reinforcement Learning (RL) approaches that focus on learning a value function to make decisions
  • Includes: estimating state values, planning problems and terminal states; can also include zero rewards until the goal is reached, and the value of terminal states

Deterministic Environment

  • Transition probabilities are either 0 or 1, leaving the policy as the only origin of uncertainty

Bellman Equation

  • The equation breaks down a state's value into immediate rewards and future state values, weighted by their probabilities
  • Can solve a system of equations to find the values for all states.
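
In the usual notation, the Bellman equation for the state value under a policy $\pi$ reads

$$v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \, v_\pi(s') \bigr],$$

one such equation per state, which is exactly the linear system referred to above.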

Iterations

  • We start with a guess and update the values in steps until we reach the correct values
  • A sweep is where we keep updating state values

Sweeps and Bellman Error

A sweep means updating all state values once, and we keep doing sweeps until the values stabilize.
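
A minimal sketch of sweep-based iterative policy evaluation on an invented three-state chain (a deterministic "move right" policy with reward 1 on reaching the terminal state); the MDP, discount factor, and stopping threshold are assumptions for illustration.

```python
# Sweep-based iterative policy evaluation on a tiny 3-state chain MDP (invented).
states = [0, 1, 2]            # state 2 is terminal
gamma, theta = 0.9, 1e-6      # discount factor and stopping threshold

V = {s: 0.0 for s in states}
while True:
    bellman_error = 0.0
    for s in states[:-1]:                     # one sweep: update every non-terminal state
        next_s = s + 1
        reward = 1.0 if next_s == 2 else 0.0
        new_v = reward + gamma * V[next_s]    # deterministic policy and transitions
        bellman_error = max(bellman_error, abs(new_v - V[s]))
        V[s] = new_v
    if bellman_error < theta:                 # stop once the values have stabilized
        break

print(V)   # roughly {0: 0.9, 1: 1.0, 2: 0.0}
```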

Algorithms

  • Iterative Policy Evaluation. Given: MDP dynamics $p$ and a policy $\pi$. Find: $v_\pi(s)$ for all $s \in S$

Policy Iteration involves two alternating algorithms, policy evaluation and policy improvement; each iteration guarantees that the new policy performs at least as well as the previous one.

Monte Carlo

  • Monte Carlo sampling estimates the value of taking action $a$ in state $s$ by following a policy $\pi$: fix the first action of a rollout and then follow the policy for the remainder. Note that this is not necessarily a standalone algorithm; it is a method for estimating value functions within RL

TD Learning

  • Q-values will be filled in as actions are chosen by the RL agent

SARSA

Stands for State-Action-Reward-State-Action; an on-policy temporal difference algorithm.
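
A sketch of a single SARSA update; the Q-table entries, learning rate, and discount factor are invented, and the next action is assumed to come from the current (e.g. epsilon-greedy) policy.

```python
# One SARSA update: the target uses the action a' actually selected by the
# current policy in the next state. All values are illustrative.
alpha, gamma = 0.1, 0.9

def sarsa_update(Q, s, a, r, s_next, a_next):
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

Q = {("s0", "left"): 0.0, ("s0", "right"): 0.0,
     ("s1", "left"): 0.5, ("s1", "right"): 0.2}
sarsa_update(Q, "s0", "right", r=1.0, s_next="s1", a_next="left")
print(Q[("s0", "right")])   # 0.1 * (1.0 + 0.9 * 0.5) = 0.145
```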

Boltzmann

Boltzmann Exploration is a more informed alternative to ε-greedy; you need to tune the temperature and let it decay according to a schedule.
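
A minimal sketch of Boltzmann (softmax) action selection; the Q-values and temperatures are invented, and the printed probabilities show the greedy and uniform limits.

```python
# Boltzmann (softmax) exploration: action probabilities proportional to
# exp(Q / T). As T -> 0 this approaches greedy selection; as T grows it
# approaches uniform random. Q-values and temperatures are illustrative.
import math

def boltzmann_probs(q_values, temperature):
    prefs = [math.exp(q / temperature) for q in q_values]
    total = sum(prefs)
    return [p / total for p in prefs]

q = [1.0, 2.0, 0.5]
print(boltzmann_probs(q, temperature=1.0))   # clearly favours action 1
print(boltzmann_probs(q, temperature=0.1))   # nearly greedy on action 1
print(boltzmann_probs(q, temperature=100))   # nearly uniform
```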

Learning Algorithms

Is learning an optimal strategy always ideal? SARSA can learn to adapt.

Q Learning

  • Q-learning improves on SARSA by updating towards the value of the optimal action instead of the action actually taken
  • The agent consults its current Q-values for the next state and takes the maximum to form the update target
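
A sketch of a single Q-learning update for comparison with the SARSA sketch above; the values are again invented. The only difference is that the target uses the maximum over next actions.

```python
# One Q-learning update: the target uses the maximal Q-value in the next
# state (the optimal action) rather than the action actually taken.
alpha, gamma = 0.1, 0.9

def q_learning_update(Q, s, a, r, s_next, actions):
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = {("s0", "left"): 0.0, ("s0", "right"): 0.0,
     ("s1", "left"): 0.5, ("s1", "right"): 0.2}
q_learning_update(Q, "s0", "right", r=1.0, s_next="s1", actions=["left", "right"])
print(Q[("s0", "right")])   # 0.1 * (1.0 + 0.9 * 0.5) = 0.145
```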

Direct Policy Search Methods

  • Direct Policy Search is an alternative to Q-learning because it directly optimizes the policy without needing to estimate value functions
  • Direct Policy Search uses the frequency of visited states to create a State Distribution to replace using v(s)
  • You then find the steady-state distribution, which remains stable over time

Policy Optimization

  • Genetic algorithms and other evolutionary techniques can be used to improve policies over time as they evolve; Separable Natural Evolution Strategy maintains a Gaussian distribution for each policy parameter

Using Neuroevolution

Neuroevolution is a variant of evolutionary policy search applied to neural networks: it uses a direct encoding of the network parameters. Alternatively, a policy gradient can be used to adjust the parameters so that the probabilities of good actions increase.
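
A minimal REINFORCE-style sketch of increasing the probability of well-rewarded actions with a softmax policy; the one-state problem, reward values, and learning rate are invented, and the gradient of the log-softmax policy is written out explicitly.

```python
# Minimal REINFORCE-style sketch: a softmax policy over two actions in a
# one-state problem. Reward values and learning rate are illustrative.
import math, random

theta = [0.0, 0.0]                      # one preference parameter per action
alpha = 0.1
rewards = [1.0, 0.0]                    # action 0 is better (invented)

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

for _ in range(500):
    probs = softmax(theta)
    a = random.choices([0, 1], weights=probs)[0]
    r = rewards[a]
    # grad of log pi(a) w.r.t. theta_k is (1[k == a] - probs[k]) for a softmax policy
    for k in range(2):
        theta[k] += alpha * r * ((1 if k == a else 0) - probs[k])

print(softmax(theta))   # probability of the better action approaches 1
```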

Lecture 9 - Multi Agent Reinforcement Learning

  • Multi-Agent Reinforcement Learning involves complex tasks over many agents that need to cooperate
  • A naive approach treats the whole system as a single agent whose joint state and action spaces are built from the individual agents' spaces, $\langle S_e, S_{A_1}, \ldots, S_{A_m} \rangle$; the joint action space grows exponentially with the number of agents

Cooperating MARL

  • Multiple agents cooperate towards a shared goal; rewards can be shared across the system

Credit Assignment Problem

  • In value-learning methods, the question is which states and actions led to the reward
  • In a multi-agent setting there is the additional question of which agents' actions led to this reward; this can sometimes create a competitive MAS

Best Response Theory

  • A best response is a strategy that is optimal given the strategies of the other agents, as a way to stay ahead

Nash Equilibrium

  • A Nash equilibrium is a set of strategies in which no agent can improve by changing only its own strategy; multiple Nash equilibria can exist

Non-stationary Environments

  • From each agent's point of view the system is no longer Markovian; the approach is to apply reinforcement learning with specialized learning algorithms

Normal-Form Games

  • An example game is the Prisoner's Dilemma: two prisoners have committed a crime and must each decide whether to confess, without being able to communicate with each other

Eyes-Wide-Shut Approach

  • The other agents are simply ignored and single-agent learning is applied; this loses convergence guarantees, but it might still help in some settings

Matching Pennies

  • Two agents try to outwit each other through their actions; the payoffs sum to zero. You have to find algorithms and parameter settings that work here

Stable through simplicity

The learning automata approach is more direct: it directly adjusts action probabilities.

Actions for automata

  • Actions that do well get a higher probability; actions that do not do well get a lower probability
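
A sketch of a linear reward-inaction style probability update, which is one way to realize "actions that do well get higher probability"; the learning rate and reward value are invented.

```python
# Learning-automaton update sketch (linear reward-inaction style): the chosen
# action's probability moves towards 1 in proportion to the reward received,
# while the other probabilities are scaled down. Learning rate is illustrative.
def automaton_update(probs, chosen, reward, rate=0.1):
    step = rate * reward
    return [p + step * (1.0 - p) if i == chosen else p * (1.0 - step)
            for i, p in enumerate(probs)]

probs = [0.5, 0.5]
probs = automaton_update(probs, chosen=0, reward=1.0)
print(probs)   # [0.55, 0.45] - action 0 did well, so its probability rises
```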

Handling States with Automata

  • Automata can be extended to handle actions with states; variants use average reward, cross learning, and probability vectors
