Chapter 1: The Machine Learning Landscape
Uploaded by FaithfulSymbol
Summary
This chapter provides an introduction to machine learning, covering its objectives, key concepts, applications, and various types of machine learning. It explores supervised learning, including tasks like classification and regression. Additionally, unsupervised learning techniques are introduced, highlighting clustering and anomaly detection.
Full Transcript
The Machine Learning Landscape
CHAPTER 1
HANDS-ON MACHINE LEARNING WITH SCIKIT-LEARN, KERAS, AND TENSORFLOW, 2/ED

Objectives
◦ What is ML?
◦ Why use ML?
◦ Types of ML Systems

What is Machine Learning?
Machine Learning is the science (and art) of programming computers so they can learn from data.
“[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.” (Arthur Samuel, 1959)
“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.” (Tom Mitchell, 1997)

Why Use Machine Learning?
Consider how you would write a spam filter using traditional programming techniques.
1. First you would look at what spam typically looks like. You might notice that some words or phrases (such as “4U”, “credit card”, “free”, and “amazing”) tend to come up a lot in the subject. Perhaps you would also notice a few other patterns in the sender’s name, the email’s body, and so on.
2. You would write a detection algorithm for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns are detected.
3. You would test your program, and repeat steps 1 and 2 until it is good enough.

Why Use Machine Learning?
◦ The Traditional Approach
◦ The Machine Learning Approach
◦ Automatically Adapting to Changes
◦ Machine Learning can help humans learn

Why Use Machine Learning?
To summarize, Machine Learning is great for:
◦ Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
◦ Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution.
◦ Fluctuating environments: a Machine Learning system can adapt to new data.
◦ Getting insights about complex problems and large amounts of data.

Types of Machine Learning Systems
There are so many different types of Machine Learning systems that it is useful to classify them in broad categories based on:
◦ Whether or not they are trained with human supervision (supervised, unsupervised, semi-supervised, and Reinforcement Learning)
◦ Whether or not they can learn incrementally on the fly (online versus batch learning)
◦ Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do (instance-based versus model-based learning)

Supervised/Unsupervised Learning
Machine Learning systems can be classified according to the amount and type of supervision they get during training. There are four major categories: supervised learning, unsupervised learning, semi-supervised learning, and Reinforcement Learning.

Supervised learning
In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels.
◦ A typical supervised learning task is classification. The spam filter is a good example of this: it is trained with many example emails along with their class (spam or ham), and it must learn how to classify new emails.
◦ Another typical task is to predict a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.) called predictors. This sort of task is called regression.

Here are some of the most important supervised learning algorithms:
◦ k-Nearest Neighbors
◦ Linear Regression
◦ Logistic Regression
◦ Support Vector Machines (SVMs)
◦ Decision Trees and Random Forests
◦ Neural networks

Unsupervised learning
In unsupervised learning, as you might guess, the training data is unlabeled. The system tries to learn without a teacher.
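
The difference between labeled and unlabeled training data can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the chapter's own code: the toy feature values, the function names (fit_supervised, fit_unsupervised, predict), and the choice of a nearest-centroid classifier and a one-dimensional k-means are all hypothetical stand-ins, and the standard library is used in place of Scikit-Learn to keep the example self-contained.

```python
from statistics import mean

# Hypothetical toy data: one feature per email (e.g. a count of "spammy" words).
features = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
# Labels are the "desired solutions"; only the supervised learner sees them.
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]


def fit_supervised(xs, ys):
    """Supervised: use the labels to compute one centroid per class."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}


def predict(centroids, x):
    """Assign x to the class (or cluster) whose centroid is nearest."""
    return min(centroids, key=lambda c: abs(centroids[c] - x))


def fit_unsupervised(xs, k=2, n_iter=10):
    """Unsupervised: 1-D k-means (assumes k >= 2); it never sees the labels."""
    # Naive initialisation: k evenly spaced points from the sorted data.
    s = sorted(xs)
    centroids = {i: s[i * (len(s) - 1) // (k - 1)] for i in range(k)}
    for _ in range(n_iter):
        groups = {i: [] for i in range(k)}
        for x in xs:
            groups[predict(centroids, x)].append(x)
        # Keep the old centroid if a cluster ends up empty.
        centroids = {i: mean(g) if g else centroids[i] for i, g in groups.items()}
    return centroids


class_centroids = fit_supervised(features, labels)
cluster_centroids = fit_unsupervised(features)

print(predict(class_centroids, 7.5))    # spam (a named class)
print(predict(cluster_centroids, 7.5))  # 1 (an anonymous cluster id)
```

Both learners end up with the same two centroids on this toy data, but only the supervised one can attach the names "spam" and "ham" to them; the unsupervised one finds groups without knowing what they mean.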

Unsupervised Learning
Here are some of the most important unsupervised learning algorithms:
Clustering
◦ K-Means
◦ DBSCAN
◦ Hierarchical Cluster Analysis (HCA)
Anomaly detection and novelty detection
◦ One-class SVM
◦ Isolation Forest

Unsupervised Learning: Clustering
Unsupervised Learning: Anomaly Detection

Semi-supervised learning
Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data. This is called semi-supervised learning.

Reinforcement learning
Reinforcement Learning is a very different beast. The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.

Batch and Online Learning
Another criterion used to classify Machine Learning systems is whether or not the system can learn incrementally from a stream of incoming data.

Batch Learning
In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data. This will generally take a lot of time and computing resources, so it is typically done offline.
First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.

Online Learning
In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.

Instance-Based vs Model-Based Learning
One more way to categorize Machine Learning systems is by how they generalize. Most Machine Learning tasks are about making predictions. This means that given a number of training examples, the system needs to be able to generalize to examples it has never seen before. Having a good performance measure on the training data is good, but insufficient; the true goal is to perform well on new instances. There are two main approaches to generalization: instance-based learning and model-based learning.

Instance-Based learning
The system learns the examples by heart, then generalizes to new cases by comparing them to the learned examples (or a subset of them), using a similarity measure.

Model-Based learning
Another way to generalize from a set of examples is to build a model of these examples, then use that model to make predictions.
This is called model-based learning.

Main Challenges of Machine Learning
In short, since your main task is to select a learning algorithm and train it on some data, the two things that can go wrong are “bad algorithm” and “bad data”. Let’s start with examples of bad data.
1. Insufficient Quantity of Training Data
2. Nonrepresentative Training Data
3. Poor Quality Data
4. Irrelevant Features
5. Overfitting the Training Data
6. Underfitting the Training Data
7. Stepping Back
8. Testing and Validating
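
The two approaches to generalization described earlier, instance-based and model-based learning, can be compared on instances held out from training, which is the idea behind the "Testing and Validating" item. The sketch below is a hypothetical illustration rather than code from the slides: the car-age and price figures are invented, the function names are made up for this example, k-nearest-neighbors averaging stands in for instance-based learning, and a least-squares line stands in for model-based learning. The toy data is exactly linear, which deliberately favors the linear model.

```python
# Hypothetical toy data: (car age in years, price in thousands).
train = [(1.0, 19.0), (2.0, 17.0), (3.0, 15.0), (4.0, 13.0), (5.0, 11.0)]
held_out = [(2.5, 16.0), (6.0, 9.0)]  # new instances, never seen during training


def knn_predict(x, examples, k=2):
    """Instance-based: average the targets of the k most similar stored examples."""
    # Similarity measure: absolute difference between feature values.
    nearest = sorted(examples, key=lambda ex: abs(ex[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def fit_line(examples):
    """Model-based: fit y = a * x + b by ordinary least squares."""
    n = len(examples)
    mx = sum(x for x, _ in examples) / n
    my = sum(y for _, y in examples) / n
    sxy = sum((x - mx) * (y - my) for x, y in examples)
    sxx = sum((x - mx) ** 2 for x, _ in examples)
    a = sxy / sxx
    return a, my - a * mx


def mean_abs_error(predict, data):
    """Average absolute prediction error over a set of (x, y) pairs."""
    return sum(abs(predict(x) - y) for x, y in data) / len(data)


a, b = fit_line(train)
knn_err = mean_abs_error(lambda x: knn_predict(x, train), held_out)
line_err = mean_abs_error(lambda x: a * x + b, held_out)

print(f"k-NN error on new instances: {knn_err:.2f}")           # 1.50
print(f"linear model error on new instances: {line_err:.2f}")  # 0.00
```

The nearest-neighbor predictor does well on the held-out car whose age lies between stored examples, but it cannot extrapolate to the six-year-old car, while the fitted line can. On data that is not close to linear, the comparison could easily go the other way.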