Machine Learning Overview

Machine learning, which includes deep learning as one of its branches, is the study of learning algorithms. A widely cited definition comes from Tom Mitchell:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
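T, E, and P can be made concrete with a toy Python sketch. Every setting here is an illustrative assumption:

- The task T is labeling a number in [0, 1) as positive when it is at least a hidden threshold (0.6, chosen arbitrarily).
- The performance measure P is accuracy on a fixed grid of points.
- The experience E is the number of labeled training samples.

`learn_threshold` and `accuracy` are hypothetical helper names.

```python
import random

HIDDEN_THRESHOLD = 0.6  # assumed ground truth; the learner never reads it directly


def label(x: float) -> bool:
    """Ground-truth labeling rule for task T (used only to generate data)."""
    return x >= HIDDEN_THRESHOLD


def learn_threshold(samples: list[tuple[float, bool]]) -> float:
    """Learn from experience E: put the decision threshold halfway between
    the largest negative and the smallest positive sample seen so far."""
    negatives = [x for x, y in samples if not y]
    positives = [x for x, y in samples if y]
    low = max(negatives, default=0.0)
    high = min(positives, default=1.0)
    return (low + high) / 2


def accuracy(threshold: float) -> float:
    """Performance measure P: accuracy on a fixed grid of 1,000 points."""
    grid = [i / 1000 for i in range(1000)]
    correct = sum((x >= threshold) == label(x) for x in grid)
    return correct / len(grid)


rng = random.Random(0)
for n in (2, 10, 100, 1000):
    xs = [rng.random() for _ in range(n)]
    model = learn_threshold([(x, label(x)) for x in xs])
    print(f"E = {n:4d} samples -> P = {accuracy(model):.3f}")
```

With more samples, the learned threshold tends to move closer to the hidden one, so P typically rises as E grows. The exact numbers depend on the random seed.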

Machine Learning Algorithms

Machine learning algorithms are used to solve complex problems, or those involving a large amount of data whose distribution function cannot be determined.

Differences Between Machine Learning Algorithms and Traditional Rule-based Methods

Rule-based methods use explicit programming to solve problems, whereas machine learning algorithms automatically learn rules from data.
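The contrast can be sketched with a toy spam filter in Python. The keyword list, the training messages, and the learning rule are hypothetical assumptions chosen for illustration, not a production technique. The learning rule keeps words that appear in more spam messages than non-spam messages.

```python
from collections import Counter

# Rule-based approach: a programmer writes the rule explicitly.
BLOCKLIST = {"free", "winner"}


def rule_based_is_spam(message: str) -> bool:
    return any(word in BLOCKLIST for word in message.lower().split())


# Machine learning approach: the rule (which words signal spam) is learned from labeled samples.
def learn_spam_words(samples: list[tuple[str, bool]]) -> set[str]:
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in samples:
        words = set(text.lower().split())  # count each word once per message
        (spam_counts if is_spam else ham_counts).update(words)
    # Keep words that occur in more spam messages than non-spam messages.
    return {w for w, c in spam_counts.items() if c > ham_counts[w]}


def learned_is_spam(message: str, spam_words: set[str]) -> bool:
    return any(word in spam_words for word in message.lower().split())


# Hypothetical labeled training data: (message, is_spam)
training = [
    ("claim your free prize now", True),
    ("you are a winner claim now", True),
    ("meeting moved to friday", False),
    ("lunch on friday", False),
    ("free lunch at the meeting", False),
]
spam_words = learn_spam_words(training)

msg = "free coffee at the meeting"
print(rule_based_is_spam(msg))           # True: the hand-written rule fires on "free"
print(learned_is_spam(msg, spam_words))  # False: "free" also appears in normal mail, so it was not learned
```

The hand-written rule must be edited by a person when it misfires. The learned rule changes when the training data changes.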

When to Use Machine Learning

Consider using machine learning when:
- the problem is too complex to solve with hand-written rules, or involves a large amount of data whose distribution function cannot be determined;
- task rules change over time;
- the data distribution changes over time and the program must constantly adapt to new data.
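Adapting to changing data can be illustrated with a model that is updated incrementally. The sketch below assumes the simplest such model, an exponentially weighted moving average. The class name `OnlineMeanPredictor` and its `alpha` parameter are hypothetical. Each new observation updates the model without retraining from scratch, so the prediction follows the data when its distribution shifts.

```python
from typing import Optional


class OnlineMeanPredictor:
    """Predict the next value as an exponentially weighted moving average (EWMA).

    Each update moves the estimate a fraction `alpha` of the way toward the newest
    observation, so the influence of old data fades and the model tracks drift.
    """

    def __init__(self, alpha: float = 0.1) -> None:
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.estimate: Optional[float] = None

    def update(self, value: float) -> None:
        """Incorporate one new observation without revisiting past data."""
        if self.estimate is None:
            self.estimate = value
        else:
            self.estimate += self.alpha * (value - self.estimate)

    def predict(self) -> float:
        if self.estimate is None:
            raise RuntimeError("no data seen yet")
        return self.estimate


model = OnlineMeanPredictor(alpha=0.1)
for value in [10.0] * 100 + [20.0] * 100:  # the distribution shifts halfway through
    model.update(value)
print(round(model.predict(), 3))  # 20.0
```

A larger `alpha` adapts faster but reacts more strongly to noise. This trade-off applies to any continuously updated model.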

Rationale of Machine Learning Algorithms

The objective (target) function f that maps inputs to outputs is unknown, so the learning algorithm cannot obtain f exactly.
Instead, it learns a hypothesis function g that approximates f but may differ from it.
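A small Python sketch shows why g may differ from f. It makes three assumptions:

- f is taken to be sin(x) on [0, π], an arbitrary stand-in for the unknown function.
- The learner sees only sampled (x, f(x)) pairs.
- The hypothesis g is restricted to straight lines fitted by least squares.

No line can represent a sine curve, so some error remains even after fitting.

```python
import math
import random


def f(x: float) -> float:
    """Stand-in for the unknown objective function (assumed: sin on [0, pi])."""
    return math.sin(x)


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares fit of the hypothesis g(x) = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# The learner only sees samples of f, never f itself.
rng = random.Random(0)
train_x = [rng.uniform(0, math.pi) for _ in range(200)]
slope, intercept = fit_line(train_x, [f(x) for x in train_x])


def g(x: float) -> float:
    return slope * x + intercept


# Compare g with f on a dense grid: the error stays above zero.
grid = [i * math.pi / 999 for i in range(1000)]
mse = sum((g(x) - f(x)) ** 2 for x in grid) / len(grid)
print(f"g(x) = {slope:.3f} * x + {intercept:.3f}, mean squared error vs f = {mse:.3f}")
```

A richer hypothesis space, such as polynomials, would shrink the gap. The learner still only ever approximates f from samples.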

Main Problems Solved by Machine Learning

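Machine learning can solve many types of tasks. Three common ones are:
- Classification: assign each input to one of a set of discrete categories (for example, "yes" or "no").
- Regression: predict a continuous numeric value by learning a function that maps input features to the output.
- Clustering: group unlabeled samples so that samples in the same group are more similar to each other than to samples in other groups.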

Types of Machine Learning

Supervised learning: the program trains a model on samples whose outputs (labels) are known, then uses the model to predict outputs for new inputs.
Unsupervised learning: the program builds a model based on unlabeled input data.
Semi-supervised learning: the program trains a model through a combination of a small amount of labeled data and a large amount of unlabeled data.
Reinforcement learning: the learning system learns how to act by interacting with its environment. It chooses behaviors that maximize the cumulative reward (reinforcement) signal it receives.
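Clustering, a typical unsupervised task, can be sketched with k-means. k-means is a standard clustering algorithm, though these notes do not name a specific one. The minimal one-dimensional version below receives only unlabeled values and discovers the groups itself. The function `kmeans_1d` and the sample readings are hypothetical.

```python
def kmeans_1d(values: list[float], k: int = 2, iterations: int = 100) -> tuple[list[float], list[int]]:
    """Minimal 1-D k-means: group unlabeled values into k clusters."""
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    ordered = sorted(values)
    # Deterministic initialization: spread the starting centroids across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    assignments = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        assignments = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, assignments


readings = [1.0, 9.5, 1.2, 10.0, 0.8, 9.0]  # hypothetical unlabeled data
centroids, labels = kmeans_1d(readings, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The cluster numbers carry no meaning of their own. Interpreting each cluster is left to a person, which is what separates this from supervised classification.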

Machine Learning Process

The machine learning process involves data preparation, model training, model evaluation, and model deployment.
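The four stages can be sketched as plain Python functions. This toy illustration rests on several assumptions:

- The records are made up.
- Preparation only drops samples with a missing value.
- The model is a nearest-centroid classifier on a single feature.
- "Deployment" is reduced to serializing the model as JSON.

All function names are hypothetical.

```python
import json
from statistics import mean

# Hypothetical raw records: (feature value, or None if missing; label)
RAW = [(1.0, "a"), (1.4, "a"), (None, "a"), (0.9, "a"),
       (3.1, "b"), (2.8, "b"), (3.3, "b"), (2.9, "b")]


def prepare(raw):
    """Data preparation: drop samples whose feature value is missing."""
    return [(x, y) for x, y in raw if x is not None]


def train(samples):
    """Model training: store the mean feature value (centroid) of each label."""
    labels = {y for _, y in samples}
    return {y: mean(x for x, lab in samples if lab == y) for y in labels}


def predict(model, x):
    """Predict the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def evaluate(model, samples):
    """Model evaluation: accuracy on samples the model did not see during training."""
    return sum(predict(model, x) == y for x, y in samples) / len(samples)


def deploy(model):
    """Model deployment (simplified): serialize the model so a serving system can load it."""
    return json.dumps(model, sort_keys=True)


data = prepare(RAW)
train_set, test_set = data[::2], data[1::2]  # simple deterministic split for this demo
model = train(train_set)
print("test accuracy:", evaluate(model, test_set))
artifact = deploy(model)
```

In practice, each stage is usually far more involved. Evaluation results also often send the process back to data preparation or training before anything is deployed.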

Important Machine Learning Concepts

Dataset: a collection of data used in machine learning tasks, where each piece of data is called a sample.
Training set: dataset used in the training process, where each sample is called a training sample.
Test set: dataset used in the testing process, where each sample is called a test sample.
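Splitting a dataset into a training set and a test set can be sketched as follows. The helper `split_dataset`, the 25% test ratio, and the toy samples are assumptions. Libraries such as scikit-learn provide their own splitting utilities with different signatures. Shuffling first prevents the split from mirroring the original order of the data, for example all samples of one class landing in the test set.

```python
import random


def split_dataset(samples, test_ratio=0.25, seed=0):
    """Shuffle a copy of the dataset and split it into (training set, test set)."""
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical samples: (feature, label) pairs
dataset = [(age, age >= 30) for age in range(20, 40)]
train_set, test_set = split_dataset(dataset)
print(len(train_set), len(test_set))  # 15 5
```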

Data Overview

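A typical dataset consists of features and labels. Each sample is described by feature values (the model's inputs) and, in supervised learning, a label (the expected output).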

Importance of Data Processing

Data is crucial to models and determines the scope of model capabilities.
Data preprocessing involves data filtering, handling missing data, handling possible errors or abnormal values, merging data from multiple sources, and data consolidation.
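Two of these steps, merging data from multiple sources and filtering, can be sketched in Python. The sources, the field names, and the choice to drop incomplete records are hypothetical. In practice, missing values are often filled in rather than dropped.

```python
# Two hypothetical data sources keyed by customer ID.
crm = {101: {"name": "Ana", "city": "Lima"}, 102: {"name": "Bo", "city": "Oslo"}}
billing = {101: {"spend": 250.0}, 103: {"spend": 90.0}}


def merge_sources(*sources):
    """Merge and consolidate: combine fields from every source into one record per ID."""
    merged = {}
    for source in sources:
        for key, fields in source.items():
            merged.setdefault(key, {}).update(fields)
    return merged


def filter_complete(records, required):
    """Data filtering: keep only records that have every required field."""
    return {k: r for k, r in records.items() if all(f in r for f in required)}


merged = merge_sources(crm, billing)
complete = filter_complete(merged, required=("name", "city", "spend"))
print(complete)  # {101: {'name': 'Ana', 'city': 'Lima', 'spend': 250.0}}
```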

Data Cleansing

Data cleansing covers the error-handling part of preprocessing. It fills in or removes missing values, and it detects and corrects (or removes) erroneous or abnormal values.

Dirty Data

Raw data usually contains data quality problems, including incompleteness, noise, and inconsistency.
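A minimal cleaning pass that addresses all three problems might look like the Python sketch below. The plausible age range (0 to 120), median imputation, and title-case normalization are assumed choices for illustration. Other strategies are equally common, such as dropping rows, mean imputation, or lookup tables for spellings.

```python
from statistics import median

# Hypothetical raw rows showing all three problems.
rows = [
    {"age": 34, "city": "Miami"},
    {"age": None, "city": "miami "},  # incomplete: age is missing
    {"age": 29, "city": "MIAMI"},     # inconsistent spelling of the same city
    {"age": 250, "city": "Boston"},   # noise: implausible age
    {"age": 41, "city": "boston"},
]


def clean(rows, min_age=0, max_age=120):
    """Fill missing or implausible ages with the median valid age; normalize city names."""
    def valid(age):
        return age is not None and min_age <= age <= max_age

    fill = median(r["age"] for r in rows if valid(r["age"]))
    return [
        {"age": r["age"] if valid(r["age"]) else fill,
         "city": r["city"].strip().title()}
        for r in rows
    ]


for row in clean(rows):
    print(row)
```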

Data Conversion

Preprocessed data needs to be converted into a representation suitable for machine learning models.
Data conversion involves encoding categorical data as numeric values, converting numeric data into categories (discretization), and feature engineering.
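The first two conversions can be sketched in Python: one-hot encoding for categorical data, and binning for numeric data. The bin edges and bin names are hypothetical choices. `one_hot` and `discretize` are illustrative helpers, not a library API.

```python
from bisect import bisect_right


def one_hot(values):
    """Encode categorical values as one-hot vectors; return (categories, vectors)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def discretize(values, edges, names):
    """Convert numeric values into categories.

    names[0] covers values below edges[0]; names[i] covers [edges[i-1], edges[i]);
    the last name covers values at or above edges[-1].
    """
    if len(names) != len(edges) + 1:
        raise ValueError("names needs exactly one more entry than edges")
    return [names[bisect_right(edges, v)] for v in values]


categories, vectors = one_hot(["Miami", "Boston", "Miami"])
print(categories, vectors)  # ['Boston', 'Miami'] [[0, 1], [1, 0], [0, 1]]
print(discretize([17, 25, 40, 60], edges=[18, 30, 45], names=["minor", "young", "middle", "senior"]))
# ['minor', 'young', 'middle', 'senior']
```

One-hot encoding avoids implying an order between categories, which plain integer codes (Boston = 0, Miami = 1) would suggest to many models.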

Necessity of Feature Selection

Feature selection is necessary to simplify models, shorten training time, and improve model generalization.

Feature Selection Methods

Filter methods select features independently of any model.
Each feature is scored with a statistical measure, and the features are ranked by score.
Based on the ranking, each feature is kept or eliminated.
Common filter methods include:
- Pearson correlation coefficient
- Chi-square test
- Mutual information
Limitations of filter methods:
- Tend to select redundant variables because they do not consider relationships between features.
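A filter method based on the Pearson correlation coefficient can be sketched in Python. The feature names, the values, and the choice of k are hypothetical. The data deliberately contains two columns that carry the same information in different units. Because each feature is scored on its own, both are selected, which illustrates the redundancy limitation above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0  # constant columns score 0


def filter_select(features, target, k):
    """Score each feature independently by |Pearson r| with the target, rank, keep the top k."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical data
target = [1, 2, 3, 4, 5, 6]
features = {
    "height_m": [1.1, 2.0, 2.9, 4.2, 5.0, 6.1],
    "height_cm": [110, 200, 290, 420, 500, 610],  # same information, different unit
    "noise": [2, 5, 1, 4, 3, 2],
}
selected, scores = filter_select(features, target, k=2)
print(selected)  # both height columns are kept, although one of them is redundant
```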

Wrapper Methods

Wrapper methods use a prediction model to score feature subsets, treating feature selection as a search problem.
Wrapper methods evaluate and compare different combinations of features.
Common wrapper method:
- Recursive feature elimination
Limitations of wrapper methods:
- Train a new model for each feature subset, which can be computationally intensive.
- The selected feature set is tuned to the specific type of model used for scoring and may not perform as well with other models.
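Recursive feature elimination can be sketched in pure Python. The estimator is an assumption: a linear model fitted by gradient descent on standardized features, so that coefficient magnitudes are comparable. The data is hypothetical: y depends on x1 and x2 but not on x3. scikit-learn ships a ready-made `sklearn.feature_selection.RFE`, and this hand-written version only illustrates the loop. Each elimination round trains a fresh model, which is the source of the computational cost listed above.

```python
def standardize(column):
    """Rescale a column to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5 or 1.0  # guard constant columns
    return [(v - mean) / std for v in column]


def fit_linear(columns, target, lr=0.1, epochs=2000):
    """Fit target ~ b + sum_j w_j * column_j by batch gradient descent; return the weights w."""
    n, d = len(target), len(columns)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [b + sum(w[j] * columns[j][i] for j in range(d)) - target[i] for i in range(n)]
        for j in range(d):
            w[j] -= lr * sum(errors[i] * columns[j][i] for i in range(n)) / n
        b -= lr * sum(errors) / n
    return w


def rfe(features, target, n_keep):
    """Recursive feature elimination: train, drop the feature with the smallest |weight|, repeat."""
    remaining = list(features)
    while len(remaining) > n_keep:
        columns = [standardize(features[name]) for name in remaining]
        weights = fit_linear(columns, target)  # a new model for every candidate subset
        weakest = min(range(len(remaining)), key=lambda j: abs(weights[j]))
        remaining.pop(weakest)
    return remaining


# Hypothetical data: y = 2 * x1 + x2, and x3 is unrelated to y.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [5, 3, 8, 1, 7, 2, 6, 4]
x3 = [0.3, -0.1, 0.2, 0.0, -0.2, 0.1, -0.3, 0.05]
y = [2 * a + b for a, b in zip(x1, x2)]
features = {"x1": x1, "x2": x2, "x3": x3}
print(rfe(features, y, n_keep=2))  # ['x1', 'x2']
```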

Embedded Methods

Embedded methods treat feature selection as a part of the modeling process.
Regularization is the most common type of embedded method.
Regularization methods introduce additional constraints into the optimization of a predictive algorithm to bias the model toward lower complexity and reduce the number of features.
Common embedded method:
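- LASSO regression (its L1 penalty can shrink the coefficients of uninformative features to exactly zero, removing those features from the model)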
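A minimal coordinate-descent LASSO in pure Python shows the embedded behavior. Feature selection happens during training, because the L1 penalty sets some weights to exactly zero. The objective minimized is (1/2n)·||y − Xw||² + α·||w||₁, the same form scikit-learn documents for its `Lasso` estimator. The data, `alpha = 0.5`, and the helper names are assumptions. Standardizing the columns and centering y lets the sketch skip the intercept.

```python
def standardize(column):
    """Rescale a column to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5 or 1.0
    return [(v - mean) / std for v in column]


def soft_threshold(z, alpha):
    """Shrink z toward zero by alpha, returning exactly 0.0 inside [-alpha, alpha]."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0


def lasso(columns, target, alpha, sweeps=200):
    """Coordinate descent for min_w (1/2n)*||y - Xw||^2 + alpha*||w||_1.

    Assumes standardized columns and a centered target, so no intercept is fitted.
    """
    n = len(target)
    w = [0.0] * len(columns)
    residual = list(target)  # y - Xw, with w = 0 initially
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Correlation of column j with the residual, adding back feature j's own contribution.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha) / (sum(c * c for c in col) / n)
            for i, c in enumerate(col):
                residual[i] -= c * (new_wj - w[j])  # keep the residual in sync with w
            w[j] = new_wj
    return w


# Hypothetical data: y = 2 * x1 + x2, and x3 is unrelated to y.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [5, 3, 8, 1, 7, 2, 6, 4]
x3 = [0.3, -0.1, 0.2, 0.0, -0.2, 0.1, -0.3, 0.05]
y = [2 * a + b for a, b in zip(x1, x2)]
y_mean = sum(y) / len(y)
columns = [standardize(x) for x in (x1, x2, x3)]
weights = lasso(columns, [v - y_mean for v in y], alpha=0.5)
print([round(v, 3) for v in weights])  # the weight for x3 is exactly 0.0
```

Raising `alpha` removes more features. Setting it to zero turns the sketch into ordinary least squares, which performs no selection.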

Supervised Learning Example

The learning phase involves training a classification model to determine whether a person is a basketball player based on specific features.
Features (attributes), taken from the service data, include:
- Name
- City
- Age
Target (label) is "yes" or "no" indicating whether a person is a basketball player.
The model is trained on a training set and evaluated on a test set.
In the prediction phase, the model is applied to new data to determine whether a person is a basketball player.
The model bases each prediction on individual features or on combinations of features.
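The learning and prediction phases can be sketched with a decision stump, a one-split decision tree. It is used here only because it is the simplest model that learns a rule from features; the notes do not specify a model. The training rows are hypothetical. The name column is ignored because a name identifies a person rather than describing them.

```python
from collections import Counter

# Hypothetical training set from the service data: (name, city, age, label)
TRAIN = [
    ("Mike", "Miami", 42, "yes"),
    ("Jerry", "New York", 32, "no"),
    ("Bryan", "Orlando", 18, "no"),
    ("Patricia", "Miami", 45, "yes"),
    ("Elodie", "Phoenix", 35, "no"),
    ("Remy", "Chicago", 72, "yes"),
    ("John", "New York", 12, "no"),
]


def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def make_test(feature, value):
    """Build the yes/no question for one candidate split of a (name, city, age, ...) row."""
    if feature == "age":
        return lambda row: row[2] <= value
    return lambda row: row[1] == value


def train_stump(rows):
    """Learning phase: pick the single split (on age or city) that classifies the training rows best."""
    overall = majority([r[3] for r in rows], default="no")
    candidates = [("age", a) for a in sorted({r[2] for r in rows})]
    candidates += [("city", c) for c in sorted({r[1] for r in rows})]
    best = None
    for feature, value in candidates:
        test = make_test(feature, value)
        if_true = majority([r[3] for r in rows if test(r)], overall)
        if_false = majority([r[3] for r in rows if not test(r)], overall)
        correct = sum((if_true if test(r) else if_false) == r[3] for r in rows)
        if best is None or correct > best[0]:
            best = (correct, test, if_true, if_false, (feature, value))
    _, test, if_true, if_false, split = best
    return (lambda row: if_true if test(row) else if_false), split


# Prediction phase: apply the learned rule to a new, unlabeled person.
predict, split = train_stump(TRAIN)
print(split)                             # ('age', 35)
print(predict(("Marine", "Miami", 45)))  # yes
```

With so few rows, the learned rule ("older than 35 means yes") reflects the toy data only. A real model would be trained on far more samples and evaluated on a test set before use.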

Questions and Answers

Which of the following are three common types of tasks that machine learning can solve?

What is the main difference between classification and regression tasks in machine learning?

Which type of learning involves building a model based on unlabeled input data?

In supervised learning, the program maps all inputs to outputs without training a model first.

Regression tasks aim to discover the dependency between attributes by expressing the sample mapping relationship using a ____________.

What is the purpose of regularization methods in a predictive algorithm?

What is a common embedded feature selection method?

In supervised learning, what does the training set consist of?

What is the purpose of the test set in supervised learning?

In the prediction phase of supervised learning, what is the label for the data for Marine from Miami with an age of 45?

What is the purpose of reinforcement learning?

What is the dataset used in the training process called?

_______ is crucial to models and determines the scope of model capabilities.

Features in machine learning models are usually numeric representations of input variables.

Match the feature selection method with its description:
