Machine Learning - Blida 1 University
Summary
These lecture notes cover the fundamental concepts and algorithms in machine learning, including supervised, unsupervised, and reinforcement learning. The document provides an overview of various topics, from defining machine learning to specific algorithms like K-Means and Naïve Bayes, and details the process of building a decision tree model. Emphasis is on practical applications and examples.
Full Transcript
Blida 1 University
M2 Computer & Network Engineering
Advanced Topics in Computer Systems and Networks
Introduction to Machine Learning

Contents
▪ Definition of machine learning
▪ Basic concepts and terms
▪ Types of machine learning algorithms
▪ Supervised learning
▪ Classification
▪ Regression
▪ Unsupervised learning
▪ Clustering
▪ Reinforcement learning

Definition of Machine Learning

Machine learning algorithms (1)
◆ Machine learning (including deep learning) is the study of learning algorithms. A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
[Figure: the learning algorithm takes data (experience E) and a task T, and is judged by a performance measure P.]

Machine learning algorithms (2)
[Figure: historical data (experience) is used to train a model by induction; the trained model then takes new data as input and predicts the unknown attribute of future cases.]

Basic concepts and terms (1)
◆ A dataset is split into a training set (used to fit the model) and a test set (used to evaluate it).

Basic concepts and terms (2)
◆ Terms:
P: positive examples, involving the major interesting classes
N: negative examples (all other examples)
TP: true positive examples (positive examples that are correctly classified by the classifier)
TN: true negative examples (negative examples that are correctly classified by the classifier)
FP: false positive examples (negative examples that are incorrectly marked as positive)
FN: false negative examples (positive examples that are incorrectly marked as negative)
These counts are arranged in a confusion matrix:

                    Predicted: P | Predicted: N | Total
Observed: P              TP      |      FN      |   P
Observed: N              FP      |      TN      |   N
Total                  TP+FP     |    FN+TN     |  P+N

Basic concepts and terms (3)
Measure                                    | Formula
Accuracy (correct classification rate)     | (TP + TN) / (P + N)
Error rate (false classification rate)     | (FP + FN) / (P + N)
Specificity (true negative rate)           | TN / N

Basic concepts and terms: example
We have trained a machine learning model to identify whether the object in an image is a cat, and we now use 200 images to test its performance. Among the 200 images, 170 contain cats and 30 do not. The model identifies 160 images as containing cats and 40 as not containing cats. The resulting confusion matrix is:

                    Predicted: cat | Predicted: no cat | Total
Observed: cat             140      |        30         |  170
Observed: no cat           20      |        10         |   30
Total                     160      |        40         |  200

So TP = 140, FN = 30, FP = 20, TN = 10, which gives an accuracy of (140 + 10) / 200 = 75%, an error rate of (20 + 30) / 200 = 25%, and a specificity of 10 / 30 ≈ 33.3%.

Types of machine learning algorithms (1)
By philosophy, machine learning algorithms fall into four families:
▪ Supervised learning
▪ Unsupervised learning
▪ Semi-supervised learning
▪ Reinforcement learning

Types of machine learning algorithms (2)
Supervised learning algorithms

Classification: Decision trees
Decision trees are supervised learning algorithms used for both classification and regression tasks. The main idea of decision trees is to find the descriptive features that contain the most information about the target feature, and then to split the dataset along the values of these features so that the target feature values of the resulting sub-datasets are as pure as possible. The descriptive feature that separates the target feature values most purely is said to be the most informative one. This process of finding the most informative feature is repeated until a stopping criterion is reached, at which point we end up in so-called leaf nodes. The leaf nodes contain the predictions the model will make for new query instances. This is possible because the model has learned the underlying structure of the training data and hence can, under some assumptions, make predictions about the target feature value (class) of unseen query instances.

Classification: Building a decision tree model
An attribute selection measure (ASM) is a heuristic for selecting the splitting criterion that partitions the data in the best possible manner.
1) Select a test for the root node, and create a branch for each possible outcome of the test.
2) Split the instances into subsets, one for each branch extending from the node.
3) Repeat recursively for each branch, using only the instances that reach that branch.
4) Stop the recursion for a branch once all of its instances have the same class.

Classification: Decision trees example
[Worked example figures.]

Classification: Exploiting a decision tree model
Estimating class probabilities: a decision tree can also estimate the probability that an instance belongs to a particular class k. It first traverses the tree to find the leaf node for the instance, then returns the ratio of training instances of class k in that node.
─ Best known algorithms: ID3 & C4.5 (Quinlan), CART (Breiman, Friedman, Olshen & Stone)
─ Very fast to train and evaluate, and relatively easy to interpret
─ But: accuracy is often not state-of-the-art (compared to other classifiers)

Classification: Naïve Bayes classifier
Bayesian classifiers use Bayes' theorem to find the most likely class of an object, assuming that all the attributes are independent (not usually true). This is just an approximation we make to be able to make predictions; it is called the "naïve Bayes" assumption, hence the name.

Classification: Naïve Bayes (Bayes' theorem for classification)
Training a naïve Bayes classifier amounts to estimating its parameters.
─ Very fast
─ Works well with text, and with problems where the attributes are independent
─ Can do well even when the number of attributes far exceeds the number of instances

Unsupervised Algorithms

Unsupervised Algorithms: Clustering
Clustering is the process of grouping similar items together:
─ Objects within a cluster should be very similar to each other, but very different from the objects of other clusters
─ That is, intra-cluster similarity is high and inter-cluster similarity is low
─ Clustering is an important human activity, used from early childhood in distinguishing between different items such as cars and cats, animals and plants, etc.

Clustering evaluation
❑ Manual inspection
❑ Benchmarking on existing labels
❑ Cluster quality measures:
  – distance measures
  – high similarity within a cluster, low across clusters

Unsupervised Algorithms: Clustering distance functions
Simplest case: one numeric attribute A
– Distance(X, Y) = A(X) − A(Y)
Several numeric attributes:
– Distance(X, Y) = Euclidean distance between X and Y
Are all attributes equally important?
– Weighting the attributes might be necessary

Unsupervised Algorithms: K-Means
k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
Strengths: simple iterative method; the user provides "K".
Weaknesses: often too simple, which can give bad results; it is difficult to guess the correct "K".

K-means clustering: basic algorithm
Step 0: select K.
Step 1: randomly select the initial cluster seeds (in the lecture example, seed 1 = 650 and seed 2 = 200). An initial cluster seed represents the "mean value" of its cluster.
Step 2: calculate the distance from each object to each cluster seed. What type of distance should we use? Squared Euclidean distance.
Step 3: assign each object to the closest cluster.
Step 4: compute the new centroid for each cluster (in the example, cluster seed 1 becomes 708.9 and cluster seed 2 becomes 214.2).
Iterate:
– calculate the distance from the objects to the cluster centroids
– assign the objects to the closest cluster
– recalculate the new centroids
Stop based on a convergence criterion:
– no change in the clusters, or
– the maximum number of iterations is reached

K-means issues
– The distance measure is squared Euclidean, so the scale should be similar in all dimensions; rescaling the data may be necessary.
– It is not good for nominal data. Why? Differences between nominal values are not meaningful distances, so a mean is not a sensible cluster center.
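The iterative procedure above can be sketched in plain Python for the one-dimensional case. Only the two seed values (650 and 200) come from the lecture example; the data values below are illustrative assumptions:

```python
def kmeans_1d(values, seeds, max_iter=100):
    """Minimal 1-D k-means: assign each value to the nearest centroid
    (squared Euclidean distance), recompute each centroid as the mean
    of its cluster, and repeat until the centroids stop changing."""
    centroids = list(seeds)
    for _ in range(max_iter):
        # Steps 2-3: assign each value to the closest centroid
        clusters = [[] for _ in centroids]
        for x in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: (x - centroids[i]) ** 2)
            clusters[nearest].append(x)
        # Step 4: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # convergence: no change
            break
        centroids = new_centroids
    return centroids, clusters

# Illustrative data; seeds 650 and 200 as in the lecture example
values = [180, 210, 230, 250, 600, 650, 700, 720]
centroids, clusters = kmeans_1d(values, seeds=[650, 200])
```

With this data the algorithm converges after one centroid update, illustrating the "no change in clusters" stopping criterion from the steps above.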
– The approach tries to minimize the within-cluster sum of squares error (WCSS), with the implicit assumption that the SSE is similar for each group.

WCSS
The overall WCSS is given by:

WCSS = Σ_{i=1..k} Σ_{x ∈ Ci} ‖x − μi‖²

where μi is the centroid (mean) of cluster Ci. The goal is to find the smallest WCSS. Does this depend on the initial seed values? Possibly.

Reinforcement Learning
Reinforcement learning (RL) is a machine learning approach that deals with sequential decision-making, aiming to map situations to actions in a way that maximizes the associated reward. Unlike supervised learning, where explicit instructions are given after each system action, in the RL framework the learner, known as an agent, is not provided with explicit guidance on which actions to take at each timestep t. The RL agent must explore through trial and error to determine which actions yield the highest rewards. The standard theoretical framework for RL is based on a Markov Decision Process (MDP), which extends the concept of a Markov process and is used to model decision-making based on states, actions, and rewards.

Reinforcement Learning Algorithms

Reinforcement Algorithms: Q-Learning
Key terms:
– Agents: entities that operate within an environment, making decisions and taking actions.
– States: variables that identify an agent's current position in the environment.
– Actions: operations undertaken by the agent in specific states.
– Rewards: positive or negative responses provided to the agent based on its actions.
– Episodes: instances where an agent concludes its actions, marking the end of an episode.
– Q-values: metrics used to evaluate actions at specific states.

Q-learning algorithm pseudocode:

Input: a set of states s and actions a
Output: final state
BEGIN
  For each state s and action a
    Q(si, ai) = 0        // initialize the Q-table with zeros
  End For
  Randomly choose an initial state s
  While the termination criterion is not reached do
    Choose the best action at for the current state st from the Q-table
    Execute the chosen action at and get the intermediate reward R
    Get the new state st+1 reached by executing at
    Get the maximum Q-value for st+1 from the Q-table
    Update the Q-table using the Bellman equation
    Update the state st to st+1
  End While
  Return the final state
END

The Q-table is a repository of the rewards associated with optimal actions for each state in a given environment. It serves as a guide for the agent, helping it determine which actions are likely to yield the best outcomes. As the agent interacts with the environment, the Q-table is dynamically updated to reflect the agent's evolving understanding, enabling more informed decision-making.

Methods for determining Q-values: there are two methods for determining Q-values.
– Temporal difference: calculated by comparing the current state and action values with the previous ones.
– Bellman's equation: a recursive formula introduced by Richard Bellman in 1957, used to calculate the value of a given state and determine its optimal position. It is particularly influential in the context of Q-learning and optimal decision-making in a Markov Decision Process (MDP).

Reinforcement Algorithms: Q-Learning
Bellman's equation, as used in the Q-learning update, is expressed as:

Q(st, at) = Q(st, at) + λ [ Rt+1 + γ · maxa Q(st+1, a) − Q(st, at) ]

where:
– Q(st, at) is the Q-value for the given state-action pair;
– Rt+1 is the immediate reward for taking action at in state st;
– λ is the learning rate;
– γ is the discount factor, representing the importance of future rewards;
– maxa Q(st+1, a) is the maximum Q-value over all possible actions a in the next state st+1.

Thanks
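The Q-learning pseudocode and Bellman update can be sketched in Python on a small toy environment. The four-state chain below is an illustrative assumption (it is not from the lecture), and epsilon-greedy exploration is a standard addition: the pseudocode always chooses the best action from the Q-table, but some random exploration is needed so the agent can discover rewards in the first place. `lam` stands for the learning rate λ of the slides.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               lam=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning following the lecture pseudocode.
    step(s, a) -> (next_state, reward, done) plays the environment."""
    # Initialize the Q-table with zeros
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False                 # initial state of each episode
        while not done:
            # Epsilon-greedy: usually the best action from the Q-table,
            # occasionally a random one (exploration)
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s_next, r, done = step(s, a)
            # Bellman update:
            # Q(st,at) += lam * (Rt+1 + gamma * max_a Q(st+1,a) - Q(st,at))
            Q[s][a] += lam * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

# Toy 4-state chain: action 1 moves right, action 0 moves left;
# reaching the last state yields reward 1 and ends the episode.
def step(s, a):
    s_next = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    done = s_next == 3
    return s_next, (1.0 if done else 0.0), done

random.seed(0)
Q = q_learning(n_states=4, n_actions=2, step=step)
policy = [max(range(2), key=lambda i: Q[s][i]) for s in range(4)]
```

After training, the greedy policy read off the Q-table moves right in every non-terminal state, and the Q-values decay by the factor γ per step away from the reward, as the discounted Bellman recursion predicts.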