Foundation of AI and ML (4351601)
Document Details
Sir Bhavsinhji Polytechnic Institute
H. P. Jagad
Summary
These lecture notes cover the foundations of artificial intelligence and machine learning. The document details different types of machine learning, including supervised, unsupervised, and reinforcement learning. It also discusses concepts such as well-posed learning problems and different machine learning algorithms. The notes are targeted at undergraduate-level students.
Full Transcript
Foundation of AI and ML (4351601) - Unit 2: Foundation of ML
H. P. Jagad, Lecturer (IT), Sir BPTI, Bhavnagar
http://hpjagad.blogspot.com

Human Learning
Learning is the process by which individuals acquire knowledge, skills and understanding through experiences, observations and interactions with the world around them. It is a complex and lifelong process involving:
- Acquisition of knowledge
- Processing and understanding
- Practice and repetition
- Feedback and correction
- Application and experience
- Reflection: thinking about one's own learning process
- Metacognition: thinking about one's own thinking and learning
- Social interaction and collaboration

Introduction to Machine Learning
- The term was first introduced by Arthur Samuel in 1959 at IBM.
- Using sample historical data (training data), ML algorithms build a mathematical model that helps in making predictions or decisions without being explicitly programmed.
- ML is a subset of AI that enables a machine to automatically learn from data, improve its performance from past experience and make predictions.
- An ML system learns from historical data, builds prediction models and, whenever it receives new data, predicts the output for it.

Well-Posed Learning Problem
- Tom Mitchell defines ML as: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."
- A problem is called well posed if it has a task T, a performance measure P and experience E.
- The framework involves three questions:
  1. What is the problem?
  2. Why does the problem need to be solved?
  3. How to solve the problem?
- Example, the handwriting recognition problem:
  T = recognizing and classifying handwritten words within images
  P = percentage of words accurately classified
  E = a database of handwritten words with given classifications

Types of Machine Learning
- Supervised learning (defined labels): Classification and Regression
- Unsupervised learning (no labels): Clustering (grouping of similar items) and Association (dependency between items)
- Reinforcement learning

Supervised Learning
- The machine is trained using a "labelled" dataset, and based on that training it predicts the output.
- The training data contains a labelled category for each of the available images, and the machine is trained to understand the images by their features, such as shape and size.
- If the image of a triangle (test data) is given as input and the machine is asked to identify the object, the well-trained machine checks all the features and predicts the output.
- Supervised learning is also called predictive learning.

Supervised Learning - 1. Classification
- The machine learns from training data and uses that knowledge to assign labels (categories) to new, unseen test data.
- The target variable is categorical; the goal is to predict the class or category of the target variable based on the input variables.
- The output variable is categorical, such as Yes or No, Male or Female, Red or Blue, etc.
- Popular ML algorithms: Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine and k-Nearest Neighbours.
- y = f(x), where y is a categorical output.
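To make the classification workflow concrete, here is a minimal sketch (not part of the original notes). It assumes scikit-learn is available and uses a Decision Tree, one of the algorithms listed above, on the built-in Iris dataset.

```python
# Minimal classification sketch (assumes scikit-learn is installed).
# A Decision Tree learns y = f(x) from labelled training data and
# then predicts categories for unseen test data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                 # features and categorical labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)   # training phase
y_pred = model.predict(X_test)                           # predict unseen test data
print("Accuracy:", accuracy_score(y_test, y_pred))       # performance measure P
```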
Supervised Learning - 2. Regression
- The target variable is a real (continuous) value; the goal is to predict the value of the target variable based on the input variables.
- Regression estimates the relationship between the target and the independent variables and is used to find trends in data.
- By performing regression we can determine the most important factor, the least important factor and how each factor affects the others.
- It is used to predict continuous output variables such as market trends, weather, exam marks or sales revenue.
- In simple linear regression there is one predictor, while in multiple linear regression several predictors can be used: Y = aX + b.

Applications of Supervised Learning
- Predicting game results based on historical data.
- Medical diagnosis based on past data labelled with disease conditions.
- Predicting the price of stocks or real estate based on location, size, historical data and market trends.
- Text classification: analysing text data such as emails, articles or messages and classifying them into categories, e.g. spam or not spam.
- Speech recognition.

Unsupervised Machine Learning
- There is no labelled training data to learn from and no specific predictions are made; no supervision is needed.
- The aim is to group or categorize the unsorted dataset according to similarities, patterns and differences.
- The machine is instructed to find the hidden patterns in the input dataset.
- The focus is on pattern discovery or knowledge discovery.

Unsupervised Learning - 1. Clustering
- A large set of data is grouped into clusters of smaller sets of similar data: similar items are put in one cluster and dissimilar items in another.
- The algorithm finds similar patterns in the unlabelled dataset, such as shape, size, colour or behaviour, and divides the data according to the presence or absence of those patterns.
- If the distance between data items is small, they are considered part of the same cluster; a large distance suggests that the items do not belong to the same cluster. This is called distance-based clustering.
- Examples: grouping customers by their purchasing behaviour; network analysis for identifying plagiarism and copyright violations.
- Clustering vs. classification: the two are similar, but the difference is the type of dataset used. Classification works with a labelled dataset, whereas clustering works with an unlabelled dataset.

Unsupervised Learning - 2. Association
- Association finds interesting relations among variables within a large dataset.
- It finds the dependency of one data item on another and maps those variables accordingly so that maximum profit can be generated.
- It determines the sets of items that occur together in the dataset, which makes marketing strategy more effective; for example, people who buy item X (say bread) also tend to purchase item Y (butter or jam).
- Examples: market basket analysis, web usage and recommendation systems.

Reinforcement Learning
- An AI agent (a software component) automatically explores its surroundings by trial and error, taking actions, learning from experience and improving its performance.
- The agent is rewarded for each good action and punished for each bad action; the goal is to maximize the rewards.
- It is a feedback-based process, similar to the way a human being or an animal learns; for example, a child learns many things through experience.
- Example, playing a game: the game is the environment, the moves of the agent at each step define the states, and the goal of the agent is to get a high score.
- It is a type of ML in which an intelligent agent (a computer program) interacts with the environment and learns to act within it.
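The agent/environment/state/reward vocabulary above can be illustrated with a small interaction loop. This sketch is added for illustration and is not part of the notes; the ToyGame environment, its step() method and the random policy are all invented.

```python
# Hypothetical sketch of the RL interaction loop: the agent repeatedly
# takes an action, receives feedback (reward), and moves to a new state.
import random

class ToyGame:
    """A made-up environment: reach position 5 to win (+10); each move costs -1."""
    def __init__(self):
        self.state = 0
    def step(self, action):                  # action is +1 (right) or -1 (left)
        self.state = max(0, self.state + action)
        done = self.state == 5
        reward = 10 if done else -1          # good action rewarded, others penalised
        return self.state, reward, done

env = ToyGame()
state, total_reward, done = 0, 0, False
while not done:
    action = random.choice([-1, +1])         # a (very naive) policy
    state, reward, done = env.step(action)   # feedback from the environment
    total_reward += reward
print("Episode finished with total reward:", total_reward)
```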
Reinforcement Learning (continued)
- It is a core part of AI, and AI agents work on the concept of reinforcement learning.
- There is no need to pre-program the agent; it learns from its own experience without any human intervention.
- The agent keeps doing three things: take an action, change state (or remain in the same state), and get feedback. In this way it learns and explores the environment.
- The agent learns which actions lead to positive feedback (rewards) and which actions lead to negative feedback (penalties).
- Applications: video games (AlphaGo), resource management, robotics, self-driving cars.
- Example RL algorithms: Q-learning, Monte Carlo methods and policy gradients, used to find the best policy.

Key Components of RL
1. Agent: an entity that can explore the environment and act upon it.
2. Environment: the physical world in which the agent operates.
3. State: the current situation of the agent at a specific time.
4. Reward: feedback given by the environment to the agent for a particular action. It can be a numeric value and is given according to the good and bad actions taken by the agent. The main objective is to maximize the total reward for good actions. A reward can change the policy: if an action selected by the agent leads to a low reward, the policy may change to select other actions in the future.
   R(S) = reward for simply being in state S.
   R(S, a) = reward for being in state S and taking action a.
   R(S, a, S') = reward for being in state S, taking action a and ending up in state S'.
5. Action: the moves taken by the agent within the environment.
6. Policy: the strategy applied by the agent to choose the next action based on the current state.
7. Value: the future reward that an agent would receive by taking an action in a particular state.
8. Q-value (action value): similar to value, but it takes one additional parameter, the current action a; Q stands for Quality. Initially, in the Action(a) x State(s) table, the Q-value for each pair (s, a) is 0.
The aim is to find the optimal policy that maximizes the expected cumulative reward over time; this process is called value iteration / policy iteration.
Example: agent = self-driving car system; environment = road, traffic signals, other vehicles; state = the traffic signal shows red, so the car should stop (other situations: start the car, reverse the car, etc.); action = when the signal is green, the car should start to move.

Key Features (Characteristics) of RL
1. There is no supervisor and no pre-existing knowledge: the agent learns through its own interactions with the environment, based on trial and error.
2. Sequential decision making: RL involves making a series of decisions in a dynamic world, and the agent may sometimes receive delayed feedback.
3. The agent takes the next action and changes state according to the feedback from the previous action.
4. The environment is stochastic (random), and the agent needs to explore it to obtain the maximum positive reward.
5. Time plays a major role in reinforcement problems.
6. The data the agent receives next is determined by its own actions.

Approaches to Implement RL - 1. Policy Based
- A policy describes the behaviour of the agent at a given time; it maps states of the environment to actions and is the core element (the heart) of RL.
- It is the selection of which action to take based on the current state. The aim is to find the optimal policy that maximizes the expected cumulative future reward.
- Two types of policy:
  1. Deterministic policy, π : S → A. For each state s ∈ S it gives the action a ∈ A that the agent will choose while in state s; for any given state, the policy π always produces the same action.
  2. Stochastic (random) policy, π : S × A → [0, 1]. For each state s ∈ S and action a ∈ A it gives the probability π(a|s) that the agent chooses action a while in state s; for any given state, each action may be selected with a possibly non-zero probability.
- Examples: policy gradient methods such as REINFORCE and Proximal Policy Optimization (PPO).
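The difference between a deterministic and a stochastic policy can be shown in a few lines. This sketch is not from the notes; the traffic-light states, actions and probabilities are assumed values chosen only for illustration.

```python
# Hypothetical illustration of deterministic vs. stochastic policies.
import random

# Deterministic policy pi: S -> A  (same action for a given state, every time)
deterministic_pi = {"red": "stop", "green": "go"}

# Stochastic policy pi: S x A -> [0, 1]  (a probability for each state-action pair)
stochastic_pi = {
    "red":   {"stop": 0.9, "go": 0.1},
    "green": {"stop": 0.2, "go": 0.8},
}

def act_deterministic(state):
    return deterministic_pi[state]

def act_stochastic(state):
    probs = stochastic_pi[state]
    return random.choices(list(probs), weights=probs.values())[0]

print(act_deterministic("red"))   # always "stop"
print(act_stochastic("red"))      # "stop" with probability 0.9, "go" with 0.1
```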
Example: Policy and Reward in RL
- The reward function R for a small grid with fruit looks like this:
  R(no fruit) = -1, R(pear) = +5, R(apple) = +10
- Two candidate policies:
  π1 = down, right, right → pear
  π2 = right, right, right, down, down, down → apple
- The agent then has to select between the two policies by comparing total rewards:
  π1 : -1 - 1 + 5 = +3
  π2 : -1 - 1 - 1 - 1 - 1 + 10 = +5

Approaches to Implement RL - 2. Value Based
- Value is a prediction of future reward, whereas a reward is the immediate signal for each good or bad action: reward = immediate response, value = long-term response. The goal of estimating values is to achieve more reward.
- Here we try to maximize the (optimal) value at a state under any policy.
- The value function tells how good a situation or action is and how much reward the agent can expect from a given state-action pair: value = expected cumulative reward.
- Examples: Q-learning, Deep Q-Network (DQN). A DQN approximates the action-value (Q) function in a Q-learning framework with a neural network.

Markov Decision Process (MDP)*
- According to the Markov property, if the agent is in the current state s1, performs action a1 and moves to state s2, then the transition from s1 to s2 depends only on the current state and action; future states do not depend on past actions, rewards or states.
- For example, in a chess game the players focus only on the current state and do not need to remember past actions or states.
- An MDP is a tuple of four elements (S, A, Pa, Ra):
  a finite set of states S,
  a finite set of actions A,
  the reward Ra received after transitioning from state S to state S' due to action a,
  the transition probability Pa.
* For reference only.

Q-Learning*
- Q-learning is an RL algorithm that finds an optimal action-selection policy for any finite Markov decision process (MDP). It is a popular model-free RL algorithm.
- It learns a policy that tells the agent which action should be taken, and under what circumstances, to maximize the reward.
- Q-table: a Q-table (matrix) is created while performing Q-learning. The table is indexed by state-action pairs [s, a] and its values are initialized to zero. After each action the table is updated and the Q-values are stored in it. Q stands for the quality of an action.

Approaches to Implement RL - 3. Model Based
- A model predicts what the environment will do next; with its help one can get an idea of how the environment will behave. For example, given a state and an action, a model can predict the next state and reward.
- It is used for planning: considering future situations before actually experiencing them.
- A virtual model of the environment is created, and the agent explores that model to learn about the environment.
- Example algorithms:
  Monte Carlo Tree Search (MCTS): figures out the best move from a set of moves by selecting → expanding → simulating → updating the nodes of a tree to find the final solution.
  Model Predictive Control (MPC): used to control a process so that it provides a predictive output while satisfying a set of constraints.
- a) Model-based algorithms are used in situations where we have complete knowledge of the environment and of the outcome of actions in that environment; they are suitable for fixed or static environments.
- b) Model-free algorithms: the agent carries out multiple actions, multiple times, and learns from the outcomes. Based on this learning experience it tries to decide on a policy (strategy) for carrying out actions with the aim of getting the optimal reward. They are applied to environments with a dynamic nature.
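The Q-table idea and the model-free description above can be tied together with a sketch of the standard tabular Q-learning update, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). The snippet is not from the notes; the chain environment and the values of alpha, gamma and epsilon are assumptions made for illustration.

```python
# Hypothetical tabular Q-learning sketch on a tiny chain environment.
import random

n_states, actions = 6, [0, 1]              # action 0 = left, 1 = right; state 5 is the goal
Q = [[0.0, 0.0] for _ in range(n_states)]  # Q-table initialized to zero
alpha, gamma, epsilon = 0.1, 0.9, 0.2      # assumed hyperparameters

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy choice between exploration and exploitation
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda x: Q[s][x])
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 10 if s_next == n_states - 1 else -1
        # Q-learning update rule
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print("Learned Q-values for the start state:", Q[0])   # 'right' should score higher
```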
Approaches to Implement RL - 4. Actor-Critic Methods
- A hybrid approach that uses both value-based (critic) and policy-based (actor) methods.
- The actor decides which action should be taken, and the critic tells the actor how good the action was and how it should adjust.
- Example: the Advantage Actor-Critic (A2C) algorithm introduces the advantage function A(s, a), which measures the advantage of taking action a in state s over the expected value of that state under the current policy, i.e. how much better an action is compared to the average action in that state.
- Asynchronous Advantage Actor-Critic with Generalized Advantage Estimation (A3C-GAE): in A2C a single actor-critic pair interacts with the environment and updates its policy based on the experience it gathers, while A3C uses multiple actor-critic pairs operating simultaneously, each interacting with a separate copy of the environment and collecting data independently.

Approaches to Implement RL - 5. Exploration-Exploitation Techniques
- The agent must balance exploration and exploitation; this helps it make better online decisions.
- Exploration: the agent focuses primarily on improving its knowledge about each action rather than on getting more reward, so that it can obtain long-term benefits.
- Exploitation: a greedy approach in which the agent tries to get more reward by using the estimated values rather than the actual values; the agent makes the best decision based on its current information.
- The agent should explore the environment and gather new information while exploiting its current knowledge.
- Example: the ε-greedy (epsilon-greedy) method balances exploration and exploitation by choosing between them randomly. Epsilon is the probability of choosing to explore; the agent exploits most of the time, with a small chance of exploring (a short sketch follows at the end of this section).
- Other methods: Upper Confidence Bound (UCB) and Thompson Sampling.

Types of RL
1. Positive reinforcement: adding something to increase the tendency that the expected behaviour will occur again. It has a positive impact on the behaviour of the agent and increases the strength of that behaviour. It maximizes performance and sustains the change for a long period, but too much positive reinforcement may lead to an overload of states that can reduce the results.
2. Negative reinforcement: the opposite of positive reinforcement; it increases the tendency that a specific behaviour will occur again by avoiding a negative condition. Depending on the situation and behaviour it can be more effective than positive reinforcement, but it provides reinforcement only to meet the minimum behaviour.
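As referenced above, here is a minimal ε-greedy sketch on a three-armed bandit. It is not from the notes; the arms, their reward probabilities and the value of epsilon are assumed for illustration.

```python
# Hypothetical epsilon-greedy sketch on a 3-armed bandit.
# With probability epsilon the agent explores (random arm); otherwise it
# exploits the arm with the highest estimated value so far.
import random

true_reward_prob = [0.2, 0.5, 0.8]   # assumed, unknown to the agent
estimates = [0.0, 0.0, 0.0]          # estimated value of each arm
counts = [0, 0, 0]
epsilon = 0.1                        # probability of exploring

for step in range(1000):
    if random.random() < epsilon:
        arm = random.randrange(3)                        # explore
    else:
        arm = max(range(3), key=lambda a: estimates[a])  # exploit
    reward = 1 if random.random() < true_reward_prob[arm] else 0
    counts[arm] += 1
    # incremental average update of the estimated value
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print("Estimated values:", [round(v, 2) for v in estimates])
print("Best arm found:", estimates.index(max(estimates)))
```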