Introduction to Machine Learning Lecture Notes PDF

Document Details


University of Science and Technology

2023

Noureldien Abdelrahman Noureldien

Tags

machine learning, unsupervised learning, machine learning algorithms, data science

Summary

These lecture notes cover unsupervised learning in machine learning. Topics include clustering, visualization, dimensionality reduction, anomaly detection, association rule learning, semi-supervised and reinforcement learning, batch versus online learning, and instance-based versus model-based learning. The notes are from the University of Science and Technology and were created in 2023.

Full Transcript


**University of Science and Technology**
**Faculty of Computer Science and Information Technology**
**Department of Computer Science, Semester 8**
**Subject: Introduction to Machine Learning**
Lecture (2): Basic Definitions and Concepts
**Instructor: Prof. Noureldien Abdelrahman Noureldien. Date: 8-3-2023**

**1.4.2 Unsupervised learning**

In unsupervised learning, as you might guess, the training data is unlabeled (Figure 1-7). The system tries to learn without a teacher.

a. ***Unsupervised Tasks***

Unsupervised learning tasks fall under the following broad areas:

- Clustering
- Visualization
- Dimensionality reduction
- Anomaly detection
- Association rule-mining

1. **Clustering**

Clustering tasks try to find patterns of similarity and relationships among data samples in our dataset and then cluster these samples into various groups, such that each group or cluster of data samples has some similarity, based on the inherent attributes or features. These methods are completely unsupervised because **they try to cluster data by looking at the data features without any prior training**, supervision, or knowledge about data attributes, associations, and relationships. (A minimal clustering sketch appears at the end of this subsection.)

2. **Visualization**

**Visualization** algorithms present data in a 2D or 3D form. You feed them a lot of complex, unlabeled data, and **they output a 2D or 3D representation of the data that can easily be plotted**. These algorithms try to preserve as much structure as they can (e.g., trying to keep separate clusters in the input space from overlapping in the visualization), so you can understand how the data is organized and perhaps identify unsuspected patterns.

3. **Dimensionality reduction**

These learning algorithms attempt ***to simplify the data without losing too much information***. One way to do this is to merge several correlated features into one. For example, a car's mileage may be strongly correlated with its age, so the dimensionality reduction algorithm will merge them into one feature. **This is called feature extraction.** It is often a good idea to reduce the dimension of **the training data using a dimensionality reduction algorithm before feeding it to another Machine Learning algorithm** (such as a supervised learning algorithm). *It will run much faster, the data will take up less disk and memory space, and in some cases it may also perform better*. Popular algorithms for dimensionality reduction include Principal Component Analysis (PCA), Nearest Neighbors, and Discriminant Analysis.
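As an illustration of clustering, here is a minimal sketch using K-Means (listed among the algorithms later in this lecture). The use of scikit-learn and the synthetic two-feature data are assumptions for illustration, not part of the notes:

```python
# Minimal clustering sketch: group unlabeled samples by feature similarity.
import numpy as np
from sklearn.cluster import KMeans

# Synthetic, unlabeled data: two blobs of 100 samples with 2 features each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (100, 2)),
               rng.normal(5, 1, (100, 2))])

# K-Means looks only at the features; no labels or supervision are given.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)      # cluster index assigned to each sample

print(labels[:5])
print(kmeans.cluster_centers_)      # the two discovered group centers
```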
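For visualization, a hedged sketch using t-SNE (also listed later in this lecture); the 10-dimensional input is invented purely to have something to project:

```python
# Project high-dimensional data down to 2D so it can be plotted.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))      # 300 samples, 10 features

# t-SNE tries to keep similar samples close together in the 2D output.
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_2d.shape)                   # (300, 2), ready for a scatter plot
```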
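And for dimensionality reduction, a minimal PCA sketch (again with invented feature counts):

```python
# Keep only the 2 directions that carry the most variance in the data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))      # 300 samples, 10 features

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                  # (300, 2)
print(pca.explained_variance_ratio_)    # fraction of variance each component keeps
```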
4. **Anomaly detection**

Another important unsupervised task is anomaly detection: **for example, detecting unusual credit card transactions to prevent fraud**, catching manufacturing defects, **or automatically removing outliers from a dataset before feeding it to another learning algorithm**. ***The system is shown mostly normal instances during training, so it learns to recognize them; when it sees a new instance, it can tell whether it looks like a normal one or whether it is likely an anomaly*** (see Figure 1-10).

The process of anomaly ***detection is also termed outlier detection***: we are interested in finding occurrences of rare events or observations that do not normally occur, based on historical data samples. Unsupervised learning methods can be used for anomaly detection by training the algorithm on a dataset of normal, non-anomalous data samples. Once it learns the necessary data representations, patterns, and relations among attributes in normal samples, it can use that learned knowledge to classify any new data sample as anomalous or normal. (A minimal sketch appears just below.)

5. **Association rule learning**

Finally, another common unsupervised task is association rule learning, in which the goal is to dig into large amounts of data ***and discover interesting relations between attributes***. For example, suppose you own a supermarket. Running an association rule miner on your sales logs may reveal that people who *purchase barbecue sauce and potato chips also tend to buy steak*. Thus, you may want to place these items close to each other. **Association rule-mining is also often termed *market basket analysis***, which is used to analyze customer shopping patterns. **Association rules help in detecting and predicting transactional patterns based on the knowledge gained from training transactions**. Using this technique, we can answer questions like which items people tend to buy together, thereby indicating frequent itemsets.
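The anomaly detection sketch promised above, using Isolation Forest (one of the algorithms listed in the next section); scikit-learn and the synthetic data are assumptions:

```python
# Train on mostly normal instances, then flag new samples that look unusual.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
X_train = rng.normal(0, 1, (500, 2))        # normal, non-anomalous samples

detector = IsolationForest(contamination=0.01, random_state=7)
detector.fit(X_train)

X_new = np.array([[0.1, -0.2],              # looks like the training data
                  [8.0, 8.0]])              # far outside it
print(detector.predict(X_new))              # 1 = normal, -1 = anomaly
```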
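And a market basket sketch for association rule-mining. This one assumes the third-party mlxtend library (not mentioned in the notes) together with pandas; the baskets are invented:

```python
# Find items frequently bought together, then derive "if X then Y" rules.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded transactions: each row is one shopping basket.
baskets = pd.DataFrame(
    [[True, True, True],
     [True, True, False],
     [True, True, True],
     [False, False, True]],
    columns=["barbecue sauce", "potato chips", "steak"],
)

frequent = apriori(baskets, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "confidence"]])
```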
b. ***Unsupervised Algorithms***

Here are some of the most important unsupervised learning algorithms (most of these are covered in Chapter 8 and Chapter 9):

- **Clustering**
  - K-Means
  - DBSCAN
  - Hierarchical Cluster Analysis (HCA)
- **Anomaly detection**
  - One-class SVM
  - Isolation Forest
- **Visualization and dimensionality reduction**
  - Principal Component Analysis (PCA)
  - Kernel PCA
  - Locally-Linear Embedding (LLE)
  - t-distributed Stochastic Neighbor Embedding (t-SNE)
- **Association rule learning**
  - Apriori
  - Eclat

**1.4.3 Semi-supervised learning**

Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data. This is called semi-supervised learning (Figure 1-11). (A minimal sketch appears at the end of this lecture.)

**1.4.4 Reinforcement Learning**

Reinforcement Learning is very different. The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return. It must then learn by itself what the best strategy, called a policy, is to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation. (A tiny sketch appears at the end of this lecture.)

**1.5 Batch and Online Learning**

Another criterion used to classify Machine Learning systems is whether or not the system can learn incrementally from a stream of incoming data.

**1.5.1 Batch learning**

In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data. **This will generally take a lot of time and computing resources, so it is typically done offline**. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. **This is called offline learning**. If you want **a batch learning system** to know about new data (such as a new type of spam), you need to train a new version of the system from scratch on the full dataset (not just the new data, but also the old data), then stop the old system and replace it with the new one.

**1.5.2 Online learning**

In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives (see Figure 1-13). (A mini-batch sketch appears at the end of this lecture.)

Online learning is great for systems **that receive data as a continuous flow (e.g., stock prices) and need to adapt to change rapidly.** It is also a good option if you have limited computing resources: once an online learning system has learned about new data instances, it does not need them anymore, so you can discard them (unless you want to be able to roll back to a previous state and "replay" the data). This can save a huge amount of space.

Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine's main memory (**this is called out-of-core learning**). The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data (see Figure 1-14).

One important parameter of online learning systems is how fast they should adapt to changing data: this is called the **learning rate**. If you set a high learning rate, your system will rapidly adapt to new data, but it will also tend to quickly forget the old data. Conversely, if you set a low learning rate, the system will learn more slowly, but it will also be less sensitive to noise in the new data or to sequences of non-representative data points (outliers).

A big challenge ***with online learning is that if bad data is fed to the system, the system's performance will gradually decline***. If we are talking about a live system, your clients will notice. For example, bad data could come from a malfunctioning sensor on a robot, or from someone spamming a search engine to try to rank high in search results. To reduce this risk, you need to monitor your system closely and promptly switch learning off if you detect a drop in performance. You may also want to monitor the input data and react to abnormal data (e.g., using an anomaly detection algorithm).

**1.6 Instance-Based Versus Model-Based Learning**

**1.6.1 Instance-Based Learning**

A system is said to be instance-based when it learns the training data by heart and then generalizes or predicts for new data on the basis of some similarity measure or shared feature between the new instance and the stored instances.

**1.6.2 Model-Based Learning**

A system is called model-based when it learns from the data and builds a model with some parameters, then predicts the output using this trained model. Without going into the mathematics, you can imagine the model as an equation, with the parameter (theta) and the input data (x) as variables in it. Using an optimization technique like Gradient Descent, we find an optimal value of theta. Then, when we substitute both the optimized parameter and the input value (i.e., the test data) into the model, we get the best output or prediction.
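To make this concrete, here is a minimal sketch that fits a one-feature linear model with Gradient Descent. The data, the true parameter values, and the learning rate are all invented for illustration:

```python
# Model-based learning sketch: fit y = theta0 + theta1 * x with gradient descent.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3.0 + 2.0 * x + rng.normal(0, 1, 100)   # noisy line, true theta = (3, 2)

theta0, theta1 = 0.0, 0.0
lr = 0.01                                    # learning rate
for _ in range(2000):
    err = theta0 + theta1 * x - y            # prediction error on the data
    # Step each parameter down the gradient of the mean squared error.
    theta0 -= lr * 2 * err.mean()
    theta1 -= lr * 2 * (err * x).mean()

print(theta0, theta1)                        # should be close to 3 and 2
print(theta0 + theta1 * 4.0)                 # prediction for a new input x = 4
```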
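For contrast, an instance-based sketch using k-nearest neighbors, which simply stores the training instances and predicts by similarity (scikit-learn and the numbers are assumptions):

```python
# Instance-based learning sketch: predict by majority vote of nearest neighbors.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Tiny labeled dataset with one feature.
X = np.array([[1.0], [1.2], [0.9], [5.0], [5.2], [4.8]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)                        # "learning by heart": the data is stored

# Each new instance takes the majority label of its 3 nearest neighbors.
print(knn.predict([[1.1], [5.1]]))   # expected: [0 1]
```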
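The mini-batch sketch promised in Section 1.5.2, assuming scikit-learn's SGDClassifier; the simulated stream and its labeling rule are invented:

```python
# Online learning sketch: train incrementally, one mini-batch at a time.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(learning_rate="constant", eta0=0.01)  # eta0 = learning rate
classes = np.array([0, 1])
rng = np.random.default_rng(3)

# Simulate a stream; each chunk could come from disk (out-of-core) or a live feed.
for _ in range(100):
    X_batch = rng.normal(size=(32, 4))
    y_batch = (X_batch[:, 0] > 0).astype(int)             # toy labeling rule
    model.partial_fit(X_batch, y_batch, classes=classes)  # one fast, cheap step

print(model.predict(rng.normal(size=(3, 4))))
```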
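The semi-supervised sketch promised in Section 1.4.3, assuming scikit-learn's LabelPropagation, which marks unlabeled samples with -1; the two blobs are invented:

```python
# Semi-supervised sketch: many unlabeled samples, only two labeled ones.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(4, 0.5, (50, 2))])
y = np.full(100, -1)            # -1 marks a sample as unlabeled
y[0], y[50] = 0, 1              # one labeled sample per blob

model = LabelPropagation()
model.fit(X, y)                 # labels spread to nearby unlabeled points
print(model.predict([[0.2, 0.1], [3.9, 4.2]]))   # expected: [0 1]
```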
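Finally, the reinforcement learning sketch promised in Section 1.4.4: a tabular Q-learning agent in an invented 5-cell corridor, where the learned policy is simply the best action in each state:

```python
# Reinforcement learning sketch: an agent learns to walk right to the reward.
import numpy as np

n_states, n_actions = 5, 2              # cells 0..4; actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))     # estimated value of each action per state
alpha, gamma, eps = 0.5, 0.9, 0.2       # step size, discount, exploration rate
rng = np.random.default_rng(9)

for _ in range(200):                    # episodes
    s = 0
    while s != n_states - 1:            # the last cell is the rewarded goal
        if rng.random() < eps:          # explore: random action
            a = int(rng.integers(n_actions))
        else:                           # exploit: best known action, random ties
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == n_states - 1 else 0.0
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

# The policy: best action in each non-terminal state (all should be 1, "right").
print(Q.argmax(axis=1)[:-1])
```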
