Questions and Answers
Which of the following is a characteristic of a 'supercritical mind' according to Turing?
- Fails to sustain a chain reaction of thoughts.
- Operates solely on subcritical plutonium.
- Generates new ideas beyond what it is given. (correct)
- Processes input passively without innovation.
What is the primary aim of AI alignment?
- To ensure AI systems always achieve goals regardless of human values.
- To prioritize correctness over logic or value in AI decision-making.
- To match AI's goals with human intentions. (correct)
- To design AI systems that can wish for things.
What is a key difference between Expert Systems and Decision Support Systems (DSS)?
- Expert Systems focus on data processing and analytics, while DSS emphasize reasoning.
- There is no significant difference between Expert Systems and DSS.
- Expert Systems primarily support human decision-making, while DSS aim to make decisions independently.
- Expert Systems aim to make the decisions whereas DSS support human decision-making. (correct)
Which characteristic distinguishes Agents from Expert Systems?
What is the role of the Inference Engine in an expert system?
Which of the following is NOT a characteristic of probabilistic reasoning in expert systems?
What is the primary purpose of using Naive Bayes in classification problems?
In the context of graphical models, what is a key difference between Bayesian Networks and Markov Networks?
What is the main goal of 'marginalizing through variable elimination'?
Which of the following best describes the Loopy Belief Propagation algorithm?
Which of the following presents a disadvantage of Expert Systems?
Which concept is considered a 'mental attitude' often attributed to agents?
In the context of agents, what does 'active behaviour' encapsulate that 'passive behaviour' does not?
What is a key characteristic of Simple Reactive Agents?
What is the main focus of Subsumption Architecture (SRA) in agent design?
When is a model-based agent needed?
What are the main problems with Logic-Based Agents (LBA)?
Goal-based agents come into play when...?
In a horizontally layered architecture, what does a mediator function do?
What does the acronym MARL stand for?
When should AI systems coordinate?
Which of the following best describes 'Task and Result Sharing' as a coordination mechanism?
In the contract net model, what role(s) can an agent assume?
A key principle behind FA/C (Functionally Accurate Cooperation) is that one should not aim to build a system in which only...
What is the primary focus of Joint Planning in multi-agent systems?
In Partial Global Planning (PGP), what is one key limitation?
What role does the fitness parameter play in Evolutionary Algorithms?
What is the role of selection in Evolutionary Algorithms?
What is the name of the selection method that holds several limited-size tournaments?
What is the name of the process in which two parents share their genetic code and generate offspring?
Which of the listed schemata has a higher probability of surviving mutation, given a per-bit mutation probability $P_m = 0.1$: Schema A: 10*0 (where * means the value does not matter) or Schema B: 10**0?
When designing the genotype, is it preferable to have genes close together, or far apart and independent?
Compared to Supervised Learning, how does Reinforcement Learning acquire feedback during training?
What does the temporal discounting factor, $\gamma$, represent in reinforcement learning?
What are the characteristics of the state value $V(S)$ and the action value $Q(S, A)$?
Which of the following most accurately represents the goal of Model-Free Policy Evaluation?
What can be used with existing estimates of state-values in order to produce new samples of state-values?
How does the SARSA (State-Action-Reward-State-Action) update compare to other action-value updates?
If $T \to 0$, what type of Boltzmann policy emerges?
Within the context of Q-learning: if an agent cannot perform lifelong exploration, what is one of its limitations?
The policy value function can be calculated in multiple types of environments; which types?
What does $\nabla_\theta \pi(s,a,\theta)$ represent?
Flashcards
AI as a Field
The everyday notion of human intelligence is used as a starting point. The goal is computational precision of this notion, resulting in a multidisciplinary field.
Turing Test Idea
A computer program is intelligent if its responses are indistinguishable from those of a human.
Supercriticality
The ability of a mind to amplify ideas and produce a self-sustaining cascade of thoughts, leading to active thinking and innovation.
Visionary Approach
Aims at creating AI systems with genuinely human-like intelligent behavior.
Pragmatic Approach
Aims to replicate natural intelligence without necessarily mimicking human thought processes.
Chinese Room Argument
A thought experiment arguing that a system producing correct answers doesn't necessarily understand them, supporting the Weak AI position.
AI Alignment
Ensuring an AI system's goals match human intentions.
Expert System
A system that emulates human expert decision-making in a specialized domain, using knowledge and inference.
Inference Engine
The component of an expert system that applies rules to facts to derive conclusions.
Probabilistic Reasoning
Reasoning that handles uncertainty and offers flexible decisions.
Naive Bayes
A classifier that assumes independent attributes for simplified probability calculations.
Marginalization vs. Maximization
Marginalization computes a variable's probability by summing over all possibilities; maximization finds the most probable value.
Loopy Belief Propagation
An iterative algorithm for approximate probabilistic inference in networks with cycles.
Pros and Cons of Expert Systems
Pros: consistency, memory, logic, reproducibility. Cons: limited common sense, creativity, and adaptability, plus reliance on manual updates.
Agent Properties
Autonomous, adaptive, situated, goal-oriented, and persistent.
Logic Based Agent
An agent that uses symbolic representation and logical deduction to select actions.
Symbol Grounding
The problem of connecting an agent's internal symbols to the real-world things they refer to.
Simple Reactive Agents
Agents that respond directly to environmental stimuli, without an internal model.
Subsumption architecture (SRA)
A layered reactive architecture in which higher behavior layers can subsume (override) lower ones.
Model Based Agent
An agent that maintains beliefs about the environment, updated from predictions and observations.
Goal Based Agent
An agent that explicitly evaluates and pursues goals to determine its actions.
Coordination Definition
Managing dependencies between activities, with agents aware of their interdependencies.
Task and Result Sharing
Dividing a large task into subtasks and sharing the results.
Blackboard Model
Agents share a common memory and contribute together to generate a solution.
Contract Net Model
A manager announces tasks, contractors bid, and the best bid is selected.
Joint Planning
How multiple agents coordinate their plans towards one larger goal.
Partial Global Planning (PGP)
Extends the FA/C principle by letting decentralized agents reason about each other's actions.
Population of Candidate Solutions
The set of candidate solutions an evolutionary algorithm maintains; it must be sufficiently large and diverse.
Genotype vs. Phenotype
The genotype is a solution's encoding; the phenotype is the actual solution derived from it.
Fitness-Based Survival
Candidate performance steers the probabilities of survival and of reproduction.
Fitness Wheel Tuning
Tuning selection pressure on the roulette wheel, e.g., via a Boltzmann or Gibbs distribution over fitness.
Mutation
A random alteration of genes in a genotype.
Schema
A template over genotypes in which some positions are fixed and the rest (marked *) do not matter.
The Schema Theorem
Reasons about how likely schemata are to survive the disruptive effects of mutation and crossover.
Building Blocks Theorem
States that partial solutions should combine into the full solution, so the problem must be divisible into sub-problems.
SNES strategy
Separable Natural Evolution Strategy: maintains a Gaussian distribution per gene and samples each gene independently.
Reinforcement Learning
Learning how to act from interaction, guided by numerical rewards and punishments.
Reward (RL)
A numerical value indicating how good it was to be in a state and act as the agent did.
RL Problem Goal
Find a policy that maximizes the (discounted) cumulative reward in all states.
Discounting Reward
Weighting future rewards by $\gamma$ so the sum of rewards stays finite, adding time pressure on the agent.
Study Notes
Lecture 1 - Intelligence
- There is no universally accepted definition of intelligence; it can take different specific forms, including social, emotional, sensorimotor, and mental intelligence
- Artificial Intelligence (AI) uses the everyday notion of human intelligence as a starting point, focusing on computational precision
- AI is a multidisciplinary field involving the Design, Analysis, and Application of computer-based systems that meet intelligence criteria
- The Turing Test, also known as the Imitation Game, was proposed by Alan Turing in 1950 as an operational test for AI
- The test aims to define intelligence through indistinguishability from human intelligence
- A computer program is considered intelligent if its responses are indistinguishable from a human's
- Turing explored supercriticality in intelligence, drawing an analogy to nuclear fission
- A subcritical mind doesn't generate new ideas, while a supercritical mind amplifies them
- This raises the question of designing machines that actively think and innovate, not just process inputs
- An operational definition is needed, driven by visionary and pragmatic approaches
- The visionary approach focuses on creating AI systems with human-like intelligent behavior, while the pragmatic approach aims to replicate natural intelligence without necessarily mimicking human thought processes
- The Strong AI vs Weak AI debate centers around these motivations, questioning whether AI can truly "think" or just mimic intelligence
- The Chinese Room experiment argues that a system producing correct answers doesn't necessarily understand them, supporting the Weak AI position
- Perspectives on behavior and understanding diverge, with some arguing that only behavior matters (Weak AI view) and others asserting the necessity of true understanding (Strong AI view)
- Intelligence can be defined by acting humanly (imitating human behavior), acting rationally (doing the "right" things with a goal), thinking humanly (Cognitive Modelling), and thinking rationally (symbolic approach to AI)
- Goal, value, and AI alignment are essential, defining the "right" action while considering correctness, logic, or value
- Misaligned objectives can lead to unexpected consequences, emphasizing careful goal-setting
- AI alignment ensures that an AI system's goals align with human intentions, addressing concerns in applications like self-driving cars and chatbots, where unintended behavior can have real consequences
Lecture 2 - Expert Systems
- Expert systems emulate human expert decision-making in specialized domains, using knowledge and inference
- They differ from intelligent systems by often lacking real-world interaction and embodiment
- Expert systems use rule-based decision making, whereas intelligent systems integrate sensors, embodiment, and real-world interaction
Expert Systems vs Other Systems
- Expert systems combine knowledge and reasoning to make decisions, while Decision Support Systems (DSSs) focus on data processing and analytics to assist users
- Key difference: Expert Systems make decisions, DSS supports human decision-making
- Expert systems rely on user input and lack proactivity, while agents are autonomous and interact with their environment dynamically
- Key difference: Expert Systems require human guidance, while agents operate independently
How Expert Systems Work
- Core components: Knowledge Base that stores rules and facts, and an inference engine that applies rules to facts for conclusions
- Knowledge representation languages, like Description Logics, exist
- Probabilistic reasoning handles uncertainty and offers flexible decisions
Probabilistic Models
- Bayes' Rule updates beliefs based on new evidence
- Naïve Bayes assumes independent attributes for simplified probability calculations (a small sketch follows this list)
- Graphical models, such as Bayesian Networks, use directed graphs with conditional probability tables
- Markov Networks use undirected graphs with cliques
- Factor Graphs offer a hybrid representation, grouping random variables into conditionally independent cliques
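To make the Naïve Bayes bullet above concrete, here is a minimal sketch; the spam/ham classes, the attribute tables, and every probability value are made-up for illustration:

```python
# A minimal Naive Bayes sketch using Bayes' rule with the independence
# assumption: P(c | x1, x2) is proportional to P(c) * P(x1 | c) * P(x2 | c).
# All probabilities below are made-up numbers.

priors = {"spam": 0.4, "ham": 0.6}
# P(word appears | class), one table per attribute, assumed independent
likelihoods = {
    "spam": {"offer": 0.7, "meeting": 0.1},
    "ham":  {"offer": 0.2, "meeting": 0.6},
}

def classify(words):
    scores = {}
    for c in priors:
        score = priors[c]
        for w in words:
            score *= likelihoods[c][w]   # independence assumption
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}  # normalize

print(classify(["offer"]))             # spam is more probable
print(classify(["meeting", "offer"]))  # ham is more probable
```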
Marginalization vs Maximization
- Marginalization computes the probability of a variable by summing over all possibilities
- Maximization finds the most probable value
- Marginalizing with Variable Elimination computes accurate probabilities efficiently by eliminating variables step-by-step
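A small sketch of the two operations on a made-up joint distribution $P(A, B)$ over two binary variables (all numbers hypothetical):

```python
# Marginalization vs. maximization on a tiny made-up joint distribution.

joint = {  # P(a, b), sums to 1
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.15, (1, 1): 0.45,
}

# Marginalization: P(A = a) = sum over b of P(a, b)
p_a = {a: sum(p for (a2, b), p in joint.items() if a2 == a) for a in (0, 1)}
print(p_a)  # {0: 0.4, 1: 0.6}

# Maximization: the most probable joint assignment
print(max(joint, key=joint.get))  # (1, 1)
```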
Loopy Belief Propagation
- Loopy Belief Propagation is an iterative algorithm for approximate probabilistic inference in networks with cycles
Expert Systems Pros and Cons
- Advantages: Includes consistency, memory, logic, reproducibility
- Disadvantages: Limited common sense, creativity, adaptability, and reliance on manual updates
Lecture 3 - Agents
- Agents can be autonomous, adaptive, situated, goal-oriented, and persistent
- Mental attitudes, such as beliefs, intentions, and desires, are associated with agents
- Examples include interacting with the environment, or offering autonomous advice, like digital assistants
Agent Concepts
- Agents and objects share identity, state, and passive behavior but agents uniquely possess active behavior
- They can be logic-based (using symbolic AI) or reactive (responding to stimuli)
- Minimal agents should perceive their environment and act through actuators; intelligent agents add pro-activeness, reactivity, and social ability
- Agent environments vary by accessibility, determinism, episodicity, dynamics, and action space
Agent Architectures
- Logic-based agents use symbolic representation and logical deduction, while reactive agents respond directly to their environment
- Planning/search agents use an environmental model to determine actions
- Agents can be categorized as simple reactive, model-based, or goal-based
- Model-based agents use beliefs about the environment, updating these based on predictions and observations
- Goal-based agents involve explicit goal evaluation and determination
- Architectures can be layered horizontally or vertically for proactive and reactive behavior
Lecture 4 - Multi Agent Coordination
- Coordination involves managing dependencies between activities, where agents are aware of their interdependencies
- Key aspects of coordination: environment (diversity, dynamics, predictability), cooperating entities (number, homogeneity, goals), and cooperation (frequency, levels, patterns)
- Coordination theory focuses on goals, activities, actors, and interdependencies to understand this domain
- Types of Interdependencies: prerequisite, shared resource, and simultaneity
Why Coordinate
- The Principle of bounded rationality, limited human processing capacity, and the increasing complexity of computer applications all necessitate it
Coordination Models
- Basic models include client-server, involving requests and responses between processes
- Task and result sharing is used to divide a large process into sub-processes
Blackboard Model
- Blackboard Model allows agents to share memory and contribute together to generate a solution
The Contract Net Model
- In the Contract Net Model, a manager announces tasks, contractors bid, and the best bid is selected
FA/C Principle
- The FA/C Principle (Functionally Accurate Cooperation) serves as a guideline when agents work with individual local knowledge that is incomplete, uncertain, and inconsistent
Joint Planning
- Joint Planning addresses how multiple agents coordinate plans towards one larger goal
- Taxonomy of planning: single-component or multi-component approaches
- Relationships among plans: positive (Equality, Subsumption, Favorableness) and negative (Resource conflicts, Incompatibility)
Partial Global Planning (PGP)
- PGP extends the FA/C principle by allowing agents to reason about actions while decentralized
- Local actions may occur without joint agreement, based on relatively simple abstraction
Lecture 5 - Evolutionary Algorithms
- Solutions can occur through an evolutionary process
- Each encoding must be translated into a functional phenotype
Populations
- Evolution involves a population of candidate solutions, which needs to be sufficiently diverse and large
- The population acts as the basis for better solutions, and improves each generation
Genotype vs. Phenotype
- In evolutionary algorithms, a genotype represents a potential solution's encoding, while the phenotype is the actual solution derived from it
- The simplest form of a genotype is a bit string, but other representations exist
- The genotype gives rise to the phenotype; the phenotype can be tested for performance
Selection: Fitness
- Fitness is usually the criterion that needs to be optimized. The evaluation of fitness can be quite involved
- Expensive fitness evaluation limits the number of individuals that can make up the population
- Solution: sampling may be used to speed up evaluation
Fitness Based
- The performance of candidate solutions is used to steer survival probabilities and reproduction probabilities for generating offspring, and thereby the survival of the genotype
- Includes: Roulette Wheel Selection with fitness wheel tuning (e.g., a Boltzmann or Gibbs distribution); see the sketch after this list
- Also includes: Tournament Selection
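Minimal sketches of Roulette Wheel Selection and Tournament Selection as named above; the population and fitness values are made-up:

```python
import random

population = [("A", 1.0), ("B", 3.0), ("C", 6.0)]  # (genotype, fitness)

def roulette_wheel(pop):
    # Selection probability proportional to fitness.
    total = sum(f for _, f in pop)
    return random.choices(pop, weights=[f / total for _, f in pop])[0]

def tournament(pop, k=2):
    # Hold a limited-size tournament: sample k individuals, keep the fittest.
    contestants = random.sample(pop, k)
    return max(contestants, key=lambda ind: ind[1])

print(roulette_wheel(population))  # "C" wins about 60% of the time
print(tournament(population))      # the fitter contestant wins each round
```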
Reproduction
- Occurs through crossover and mutation
- Two parents share their genetic code and generate offspring
Crossover
- Includes: Single point crossover, Double point crossover, and Uniform crossover
- There are many other possibilities dependent on genome-structure
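Minimal sketches of the three listed crossover operators on bit-string genotypes (the example parents are made-up):

```python
import random

p1 = [1, 1, 1, 1, 1, 1]
p2 = [0, 0, 0, 0, 0, 0]

def single_point(p1, p2):
    # Swap everything after one random cut point.
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def double_point(p1, p2):
    # Swap the segment between two random cut points.
    i, j = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def uniform(p1, p2):
    # Pick each gene from either parent with probability 1/2.
    picks = [random.random() < 0.5 for _ in p1]
    c1 = [a if t else b for a, b, t in zip(p1, p2, picks)]
    c2 = [b if t else a for a, b, t in zip(p1, p2, picks)]
    return c1, c2

print(single_point(p1, p2))
```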
Schemata
- Schema theory offers some insight into why good solutions survive and get combined into even better solutions
- It is dealt with at the base level using elementary bit strings
Schema - definition
- Consider binary genotypes
- Binary numbers indicate fixed value positions
- *'s indicate positions for which the value does not matter
- Think of a schema as a shared property between (some) genotypes
Schemata Theory
- Can be used to reason about the disruptive properties of mutation and crossover: schemata of high order have a low probability of surviving mutation, and schemata of long defining length have a low probability of surviving single-point crossover
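A worked example, assuming the standard schema-theorem result that survival under per-bit mutation depends only on the schema's order $o(H)$, i.e. $P(\text{survive}) = (1 - P_m)^{o(H)}$. It also answers the quiz question above about schemata 10*0 and 10**0:

```python
# Probability that a schema survives per-bit mutation:
#   P(survive mutation) = (1 - P_m) ** o(H)
# where o(H) counts the fixed (non-*) positions.

P_m = 0.1  # per-bit mutation probability, as in the quiz question

def order(schema: str) -> int:
    """Number of fixed (non-*) positions in the schema."""
    return sum(1 for c in schema if c != '*')

def mutation_survival(schema: str, p_m: float) -> float:
    return (1 - p_m) ** order(schema)

for s in ("10*0", "10**0"):
    print(s, order(s), round(mutation_survival(s, P_m), 4))
# Both schemata have order 3, so both survive mutation with probability
# 0.9 ** 3 = 0.729; they differ only in defining length, which matters
# for single-point crossover, not for mutation.
```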
Dealing with Problems
- Crossover and mutation can have destructive consequences; to counter this we can use elitism
- Elitism means transferring the k best individuals to the next generation's population, which avoids losing good solutions
Cooperative Coevolution
- This provides a way to split a big problem into smaller, independent parts that can be solved separately
- This makes the search for the best solution faster and easier; the partial solutions are later combined into a full solution
Separable Natural Evolution Strategy
- Maintains a Gaussian (natural) distribution for each gene in the genotype, treating the genes as independent
- An individual from the population is generated by sampling each gene separately
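A minimal sketch of the sampling step, with made-up per-gene means and standard deviations; the natural-gradient update of the distribution parameters is omitted:

```python
import numpy as np

# Separable NES-style sampling: one Gaussian per gene, sampled independently.
rng = np.random.default_rng(0)

mu = np.zeros(5)      # per-gene means (made-up)
sigma = np.ones(5)    # per-gene standard deviations (made-up)

def sample_individual(mu, sigma):
    # Each gene is drawn from its own independent Gaussian.
    return rng.normal(mu, sigma)

population = np.array([sample_individual(mu, sigma) for _ in range(10)])
print(population.shape)  # (10, 5)
# In full SNES, mu and sigma would then be updated along the natural
# gradient of expected fitness; only the sampling step is sketched here.
```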
Complex Solutions
- Evolutionary algorithms don't scale well with problem complexity
- The building block theorem states that partial solutions should combine into the full solution
- This requires the problem to be divisible into smaller sub-problems
Fitness Requirements
- A fitness function needs to be able to separate partial solutions from non-solutions, and there needs to be a spread of fitness values
Lecture 6, 7, 8 - Reinforcement Learning
- Reinforcement Learning is a learning paradigm alongside supervised and unsupervised learning
- Feedback is a reward or punishment: a numerical value
- Learning behavior can be done with Supervised Learning (Behavioral Cloning) or with Reinforcement Learning, which learns from interactions
Hardships With RL
- There are no labeled learning examples; rewards aren't immediate, and rewards can be sparse
RL - Terminology
- State: Where am I now? What does the world look like?
- Action: What will I do?
- Transition: How does this change where I am?
- Reward: What do I get for being here and doing what I did?
- Policy: How should I behave?
A Reinforcement Learning Problem
- State: which square the agent is in, where visible cards are located
- Action: moving, hit or stick, selecting open squares
- Transition: moving to a square, getting a card, an O appearing in the chosen square
- Reward: may be positive, negative, or absent
- Policy: learning what to do amounts to learning a situation-to-action mapping, i.e., a policy
Notations
- Transitions: all is well if the probability distributions are stationary
- Discounted rewards: discounting mathematically ensures that the sum of rewards stays finite, and adds some time pressure on the agent
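A small illustration of the discounted return $G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots$; the reward sequences are made-up:

```python
def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for r in reversed(rewards):   # fold from the back: G = r + gamma * G'
        g = r + gamma * g
    return g

print(discounted_return([0, 0, 1]))  # 0.81 -- the reward arrives two steps late
print(discounted_return([1, 0, 0]))  # 1.0  -- the same reward, received now
```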
State-Dependent Rewards
- A reward for merely being in a state might not be very useful
- A Q-table stores the expected return for each state-action pair, given a specific policy, turning behavior selection into the problem of choosing the optimal action in every state
Policies
- The policy is the learning goal of the RL agent: a mapping from states to probability distributions over actions
Methods
- A learning method specifies how to change $\pi$ with the experience collected by the agent, in order to maximize $G$ for all states $s \in S$
Bandit Problem
- The challenge lies in balancing exploration, trying different actions to learn their rewards, and exploitation, choosing the best-known action to maximize gains
- The multi-armed bandit problem has only one state
- If there are states but no transitions, this creates a contextual bandit
Strategies
- You need to try all options: use $\epsilon$-greedy exploration, or Upper Confidence Bounds (UCB)
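A minimal $\epsilon$-greedy bandit sketch; the three arms and their reward distributions are made-up for illustration:

```python
import random

# Epsilon-greedy: with probability epsilon explore a random arm,
# otherwise exploit the arm with the best estimated value.

epsilon = 0.1
n_arms = 3
counts = [0] * n_arms          # pulls per arm
values = [0.0] * n_arms        # running mean reward per arm

def true_reward(arm):          # hypothetical environment
    return random.gauss([0.1, 0.5, 0.3][arm], 1.0)

for step in range(1000):
    if random.random() < epsilon:
        arm = random.randrange(n_arms)                     # explore
    else:
        arm = max(range(n_arms), key=lambda a: values[a])  # exploit
    r = true_reward(arm)
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]         # incremental mean

print([round(v, 2) for v in values])  # estimates approach [0.1, 0.5, 0.3]
```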
Value Based Methods
- Value-based methods are Reinforcement Learning (RL) approaches that focus on learning a value function to make decisions
- Includes: estimating state values, planning problems and terminal states; can also include zero rewards until the goal, and the value of terminal states
Deterministic Environment
- Without transition uncertainty, transition probabilities are either 0 or 1, removing one origin of uncertainty
Bellman Equation
- The equation breaks down a state's value into immediate rewards and future state values, weighted by their probabilities
- Can solve a system of equations to find the values for all states.
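For reference, a standard statement of the Bellman equation for the state-value function under a policy $\pi$ (a sketch in common textbook notation; the lecture's symbols may differ):

$$v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\,\big[r(s, a, s') + \gamma\, v_\pi(s')\big]$$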
Iterations
- We start with a guess and update the values in steps until we reach the correct values
- A sweep is one pass in which we keep updating state values
Sweeps and Bellman Error
- A sweep means updating all state values once; we keep doing sweeps until the values stabilize, i.e., the Bellman error becomes small
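A minimal iterative policy evaluation sketch showing sweeps and the Bellman error; the two-state MDP, its transition probabilities, and rewards are all made-up:

```python
gamma = 0.9
states = [0, 1]
# p[s] = list of (probability, next_state, reward) under the fixed policy
p = {
    0: [(0.8, 1, 1.0), (0.2, 0, 0.0)],
    1: [(1.0, 0, 0.5)],
}

V = {s: 0.0 for s in states}          # initial guess
for sweep in range(1000):
    bellman_error = 0.0
    for s in states:
        new_v = sum(prob * (r + gamma * V[s2]) for prob, s2, r in p[s])
        bellman_error = max(bellman_error, abs(new_v - V[s]))
        V[s] = new_v
    if bellman_error < 1e-8:          # values have stabilized
        break

print({s: round(v, 3) for s, v in V.items()})
```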
Algorithms
- Iterative Policy Evaluation. Given: MDP dynamics $p$ and a policy $\pi$. Find: $v_\pi(s)$ for all $s \in S$
- Policy Iteration alternates two algorithms, policy evaluation and policy improvement, and guarantees that each new policy performs at least as well as the previous one
Monte Carlo
- Monte Carlo sampling estimates the value of taking action $a$ in state $s$ while following a policy $\pi$: fix the first action of the rollout, then follow the policy for the remainder
- Note that it is not necessarily a standalone algorithm; it is a method for estimating value functions within RL
TD Learning
- Q-values are filled in as the actions are chosen by the RL agent
SARSA
- Stands for State-Action-Reward-State-Action: an on-policy temporal difference algorithm
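A minimal sketch of the SARSA update rule; the step size, discount factor, and the example transition are made-up:

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.9
Q = defaultdict(float)

def sarsa_update(Q, s, a, r, s2, a2):
    # On-policy: bootstrap from the action a2 the agent actually takes next.
    td_target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

sarsa_update(Q, s=0, a=1, r=1.0, s2=1, a2=0)  # one made-up transition
print(Q[(0, 1)])  # 0.1
```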
Boltzmann
- Boltzmann exploration is a more informed alternative to $\epsilon$-greedy
- The temperature $T$ needs to be tuned to decay according to a schedule
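A minimal Boltzmann (softmax) action-selection sketch with made-up Q-values; it also illustrates the quiz question above: as $T \to 0$ the policy becomes greedy:

```python
import math

def boltzmann_policy(q_values, T):
    # Softmax over Q-values at temperature T.
    prefs = [math.exp(q / T) for q in q_values]
    total = sum(prefs)
    return [p / total for p in prefs]

q = [1.0, 2.0, 0.5]
print(boltzmann_policy(q, T=1.0))   # graded preference for action 1
print(boltzmann_policy(q, T=0.1))   # nearly greedy: action 1 dominates
```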
Learning Algorithms
- Is learning the optimal strategy always ideal? SARSA, being on-policy, can learn to adapt to its own exploration behavior
Q Learning
- An improvement over SARSA: it updates towards the value of the optimal next action instead of the one actually taken
- The agent consults its current Q-values to find the value of the optimal next action (off-policy learning)
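A minimal sketch of the Q-learning update; note the max over next actions, in contrast to SARSA's use of the action actually taken (parameters and the transition are made-up):

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.9
actions = [0, 1]
Q = defaultdict(float)

def q_learning_update(Q, s, a, r, s2):
    # Off-policy: bootstrap from the best Q-value in the next state.
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

q_learning_update(Q, s=0, a=1, r=1.0, s2=1)   # one made-up transition
print(Q[(0, 1)])  # 0.1
```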
Direct Policy Search Methods
- Direct Policy Search is an alternative to Q-learning because it directly optimizes the policy without needing to estimate value functions
- Direct Policy Search uses the frequency of visited states to build a state distribution, replacing the use of $v(s)$
- One then finds the steady-state distribution, which satisfies the property of being stable over time
Policy Optimization
- Can involve genetic algorithms, among other techniques; genetic algorithms improve policies over time as they evolve
- Separable Natural Evolution Strategies maintain a Gaussian distribution for each policy parameter
Using Neuroevolution
- Neuroevolution is a variant of evolutionary policy search for neural networks; it uses a direct encoding of the network parameters as the genotype
- Alternatively, a policy gradient can be used to adjust the parameters in the direction that increases the probability of good actions
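A minimal REINFORCE-style policy-gradient sketch for a softmax policy over two actions in a single state; the returns and all parameters are made-up, and the gradient uses the standard softmax identity $\nabla_{\theta_k} \log \pi(a) = \mathbb{1}_{\{k=a\}} - \pi_k$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)          # one preference per action
alpha = 0.1

def policy(theta):
    e = np.exp(theta - theta.max())   # softmax, numerically stable
    return e / e.sum()

for episode in range(500):
    probs = policy(theta)
    a = rng.choice(2, p=probs)
    G = 1.0 if a == 1 else 0.0          # made-up returns: action 1 is better
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0               # gradient of log softmax
    theta += alpha * grad_log_pi * G    # REINFORCE update

print(policy(theta))  # probability mass shifts towards action 1
```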
Lecture 9 - Multi Agent Reinforcement Learning
- Multi-Agent Reinforcement Learning involves many tasks and many players that need to cooperate; it is complex
- A single central agent would need to know what the overall model looks like
- To use single-agent RL in a multi-agent setting, the model must be built from $\langle S, A_1, \dots, A_m \rangle$: the joint action space grows exponentially with the number of agents
Cooperative MARL
- Multiple agents cooperate towards a goal; rewards can be shared across the system
Credit Assignment Problem
- In value learning methods, the question is which actions led to a given reward
- In the multi-agent case there is the additional question of which agents' actions led to the reward; this can sometimes create competitive multi-agent systems
Best Response Theory
- A best response is a strategy that is optimal given the other agents' strategies; playing one is a way to stay ahead
Nash Equilibrium
- A Nash equilibrium is a joint strategy in which no agent can improve by changing only its own strategy; multiple Nash equilibria can exist
Non-Stationary Environments
- With other learning agents present, the system is no longer Markovian; the remedy is to apply reinforcement learning with specialized learning algorithms
Normal-Form Games
- An example game is the Prisoner's Dilemma: two prisoners have committed a crime and must each decide whether to confess, without being able to communicate with each other
Eyes-Wide-Shut Approach
- The other agents are ignored and plain single-agent learning is applied; this loses convergence guarantees, but it might still help in some settings
Matching Pennies
- Two agents try to outwit each other through their choice of action; the payoffs sum to zero, and suitable algorithms and parameter settings have to be found
Stability Through Simplicity
- The learning automata approach is a more direct method that adjusts action probabilities
Actions for Automata
- Actions that do well get a higher probability; actions that do not do well get a lower probability
State Automata
- Extend automata to handle actions per state; update variants include average reward, cross learning, and vector-based updates
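A minimal learning-automaton sketch using the linear reward-inaction ($L_{R-I}$) update, one standard automata scheme; the environment and learning rate are made-up:

```python
import random

lr = 0.05
probs = [0.5, 0.5]                 # action-selection probabilities

def environment(action):           # hypothetical: action 1 succeeds more often
    return random.random() < (0.8 if action == 1 else 0.2)

for step in range(2000):
    a = random.choices([0, 1], weights=probs)[0]
    if environment(a):             # on reward, shift probability towards a
        for i in range(len(probs)):
            if i == a:
                probs[i] += lr * (1 - probs[i])
            else:
                probs[i] -= lr * probs[i]
    # on no reward, L_{R-I} leaves the probabilities unchanged

print([round(p, 2) for p in probs])   # probability mass moves to action 1
```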