Questions and Answers
What does the kernel function determine in the mean shift algorithm?
Which of the following is the formula for the FLAT KERNEL?
For which type of clustering algorithms is the GAUSSIAN kernel typically used?
What determines the size of the region over which the mean shift algorithm calculates the local density?
What are the two hyperplanes in hard margin classification known as?
Which method is used to predict the correct class label with enough margin in soft margin classification?
What is the hinge loss function associated with?
Which kernel function is defined as $k(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$?
Which kernel has a constant output value within a specified bandwidth?
When dealing with linearly separable data, what is selected to separate the data as much as possible?
Study Notes
Machine Learning
- Machine learning (ML) involves building mathematical models to understand data and make predictions or decisions based on that data.
- "Learning" in ML refers to the ability of a model to adapt its tunable parameters to observed data, so that the fitted model can then be applied to new, unseen data.
Categories of Machine Learning
- Supervised Learning: models learn from labeled data to predict labels or outcomes for new data.
  - Classification: labels are discrete categories (e.g., spam vs. not spam emails).
  - Regression: labels are continuous quantities (e.g., predicting a person's height).
- Unsupervised Learning: models learn from unlabeled data to identify patterns or relationships.
  - Clustering: identifying distinct groups of data.
  - Dimensionality Reduction: finding more concise representations of data.
- Semi-supervised Learning: combines supervised and unsupervised learning, using labeled and unlabeled data.
- Reinforcement Learning: a model learns from interactions with a dynamic environment to achieve a goal.
Scikit-Learn
- Scikit-learn is a Python library for machine learning, providing efficient algorithms for predictive data analysis.
- Features:
- Classification, regression, and clustering algorithms.
- Support for supervised and unsupervised learning.
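- Out of scope: deep learning and reinforcement learning are not supported; deep learning in particular requires GPUs for efficient computing, which scikit-learn does not target.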
Dataset
- A dataset is a collection of data, often organized as tabular data.
- Each row represents a sample, and each column represents a feature or attribute.
- Features can be numerical, categorical, or other types of data.
- A target variable is the quantity to be predicted from the features (a class label in classification, a continuous value in regression).
Scikit-Learn's Estimator API
- Consistent interface for all objects.
- Inspectable parameters.
- Limited object hierarchy.
- Composition: many algorithms are composed of more fundamental algorithms.
- Sensible defaults: default values for parameters.
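The design principles above can be seen on almost any estimator. The following is a minimal sketch, assuming the iris data and a k-nearest-neighbors classifier purely as illustrative choices; any other scikit-learn estimator would follow the same pattern.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Sensible defaults: an estimator can be constructed without arguments.
knn = KNeighborsClassifier()

# Inspection: hyperparameters are readable through get_params().
default_k = knn.get_params()["n_neighbors"]

# Consistency: every estimator is trained with fit(); predictors add predict().
knn.fit(X, y)
predictions = knn.predict(X[:5])

# Composition: a pipeline chains estimators and is itself an estimator.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
pipeline.fit(X, y)
pipeline_accuracy = pipeline.score(X, y)  # accuracy on the training data
```

Because the pipeline exposes the same fit/predict/score methods as a single estimator, it can be dropped in wherever an estimator is expected.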
Dataset Loaders and Generators
- Scikit-learn provides dataset loaders and generators.
- Dataset loaders: load small standard datasets that ship with the library; dataset fetchers download larger datasets from online repositories.
- Dataset generators: generate artificial datasets of controlled size and complexity.
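As a brief illustration of the two kinds of helpers, the sketch below loads a bundled dataset and generates an artificial one. The choice of `load_iris` and `make_blobs`, and the sizes passed to the generator, are assumptions made for the example.

```python
from sklearn.datasets import load_iris, make_blobs

# Loader: returns a bundled dataset (rows are samples, columns are features).
iris = load_iris()
X_iris, y_iris = iris.data, iris.target

# Generator: builds an artificial dataset of controlled size and complexity.
X_blobs, y_blobs = make_blobs(
    n_samples=200, n_features=2, centers=3, random_state=0
)
```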
Supervised Learning
- Supervised learning involves learning from labeled data to predict labels or outcomes.
- Goal: learn a function that maps input features to output labels, then use it to make predictions on new data.
Classification
- Classification: predicting a discrete label or category.
- Classification algorithms: predict a label or category based on input features.
Regression
- Regression: predicting a continuous quantity or value.
- Regression algorithms: predict a continuous value based on input features.
Fitting, Regression, and Least Squares
- Linear regression: finds a linear relationship between input features and output values.
- Least squares: minimizes the sum of squared differences between predicted and actual values.
- Ordinary least squares (OLS): a common method for linear regression.
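To make the least-squares idea concrete, here is a small sketch that fits a line in two ways: with scikit-learn's `LinearRegression` and by solving the least-squares problem directly with NumPy. The synthetic data (a noisy line with slope 2 and intercept 1) is an assumption chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)  # noisy line

# Ordinary least squares through the estimator interface.
model = LinearRegression().fit(x.reshape(-1, 1), y)

# The same problem solved directly: minimize ||A @ [slope, intercept] - y||^2.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
```

Both routes minimize the same sum of squared residuals, so `model.coef_[0]` and `model.intercept_` should agree with `slope` and `intercept` up to floating-point error.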
Nearest Neighbors Regression
- Nearest neighbors regression: predicts a value based on the nearest neighbors in a dataset.
- k-NN regression: predicts a value based on the k nearest neighbors.
- Radius-based regression: predicts a value based on neighbors within a fixed radius.
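The two neighbor-based regressors can be compared on a tiny hand-made dataset. This is a sketch under assumed values for the number of neighbors and the radius; with the default uniform weights, each prediction is the plain average of the selected neighbors' targets.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor, RadiusNeighborsRegressor

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0.0, 1.0, 4.0, 9.0, 16.0])

# k-NN: average the targets of the 2 closest training points.
knn = KNeighborsRegressor(n_neighbors=2).fit(X, y)
knn_pred = knn.predict([[1.4]])[0]  # neighbors x=1 and x=2 -> (1 + 4) / 2

# Radius-based: average the targets of all points within distance 1.5.
radius = RadiusNeighborsRegressor(radius=1.5).fit(X, y)
radius_pred = radius.predict([[1.4]])[0]  # neighbors x=0, 1, 2 -> (0 + 1 + 4) / 3
```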
Regression Metrics
- R² (coefficient of determination): measures the goodness of fit of a regression model; 1.0 is a perfect fit, and a model that always predicts the mean of the targets scores 0.
- score(): the method on scikit-learn regressors that computes R² for the given data.
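The following sketch computes R² three ways on a four-point toy dataset (an assumption for illustration): from its definition, with `r2_score`, and with the regressor's `score()` method.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 3.0, 2.0, 4.0])

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# Definition: 1 - (residual sum of squares) / (total sum of squares).
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

r2_metric = r2_score(y, y_pred)   # metric function
r2_method = model.score(X, y)     # estimator method
```

All three values coincide; for this data they equal 0.64.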
Classification Metrics
- Accuracy score: computes the accuracy of a classification model.
- Confusion matrix: a table that summarizes the performance of a classification model.
- Precision, recall, and F1 score: precision is the fraction of predicted positives that are correct, recall is the fraction of actual positives that are found, and F1 is their harmonic mean.
Distance Metrics
- Euclidean distance metric for continuous variables
- Hamming (overlap) distance metric for discrete variables
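Both metrics are short to write out by hand. The helper names below are hypothetical, and the Hamming version assumes the two sequences have equal length.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between two numeric vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    a = np.asarray(a)
    b = np.asarray(b)
    return int(np.sum(a != b))

d_euclid = euclidean_distance([0, 0], [3, 4])                    # 5.0
d_hamming = hamming_distance(list("karolin"), list("kathrin"))   # 3
```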
K-Nearest Neighbors (KNN)
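- Drawback: With a skewed class distribution, majority voting favors the more frequent classes
- Can use correlation coefficients (Pearson, Spearman) as the metric, and can weight neighbors (commonly by 1/d, where d is the distance to the neighbor) so that closer neighbors count more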
- Value of k: A larger value reduces noise effect, but makes boundaries less distinct
Training Examples
- Store feature vectors in multidimensional space with class labels
- Used in classification to assign labels to unlabeled data
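A minimal sketch of KNN classification, evaluated with the accuracy, confusion-matrix, and F1 metrics from the Classification Metrics notes. The iris data, the 70/30 split, k = 5, and inverse-distance weighting are all assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# "Training" stores the labeled feature vectors; the work happens at predict time.
# weights="distance" weights each neighbor by the inverse of its distance.
clf = KNeighborsClassifier(n_neighbors=5, weights="distance")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)        # rows: true class, columns: predicted
macro_f1 = f1_score(y_test, y_pred, average="macro")
```

Raising `n_neighbors` smooths out the effect of noisy points but blurs the class boundaries, as noted above.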
Multiclass SVM
- An SVM is a 2-class classifier; multiclass problems are handled by combining several of them: One vs. the rest, One vs. one
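The two strategies differ in how many binary classifiers they train. The sketch below assumes a synthetic four-class dataset and a linear-kernel `SVC` as the underlying two-class model.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=4, random_state=0)

# One vs. the rest: one binary classifier per class.
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

# One vs. one: one binary classifier per pair of classes.
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)

n_ovr = len(ovr.estimators_)  # 4 classes -> 4 classifiers
n_ovo = len(ovo.estimators_)  # 4 * 3 / 2 -> 6 classifiers
```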
Clustering
- Unsupervised learning: No target values, no labels, and no prior knowledge of classes
- Clustering algorithms: Different algorithms rest on different notions of what constitutes a cluster
- Cluster models: Centroid, Connectivity, Distribution, Density
K-Means
- Clustering algorithm: Partitioning n observations into k clusters
- K-means with Sklearn: Compute k-means clustering, predict cluster index, and compute cluster centers
- Attributes: cluster_centers_, labels_, inertia_
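The operations and attributes listed above can be sketched as follows. The synthetic blobs and the choice of three clusters are assumptions for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)       # compute clustering and return cluster indices

centers = kmeans.cluster_centers_    # coordinates of the cluster centers
inertia = kmeans.inertia_            # sum of squared distances to the closest center
new_label = kmeans.predict([[0.0, 0.0]])  # cluster index for an unseen point
```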
K-Median
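- Variation of K-means: Calculates the median of each cluster instead of the mean
- More noise-tolerant: Minimizes the sum of distances to the cluster centers rather than the sum of squared distances (K-means) or the maximum distance, so outliers have less influence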
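The noise tolerance comes from the center update. The sketch below shows only that single step for one cluster containing an outlier, not a full K-median implementation; it assumes the coordinate-wise median as the center.

```python
import numpy as np

# Four points near (1, 1) plus one outlier.
cluster = np.array([
    [1.0, 1.0],
    [1.2, 0.9],
    [0.8, 1.1],
    [1.0, 1.0],
    [10.0, 10.0],
])

mean_center = cluster.mean(axis=0)          # K-means update: dragged to (2.8, 2.8)
median_center = np.median(cluster, axis=0)  # K-median update: stays at (1.0, 1.0)
```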
Mean Shift
- Mode-seeking algorithm: Assigns datapoints to clusters based on density
- Non-parametric method: No prior knowledge of number of clusters or shape of clusters
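A minimal sketch with scikit-learn's `MeanShift`, assuming three well-separated synthetic blobs and a hand-picked bandwidth of 2.0. The number of clusters is not passed in; it falls out of the density modes the algorithm finds at that bandwidth.

```python
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [5, 5], [0, 5]],
    cluster_std=0.5,
    random_state=0,
)

# Only the bandwidth (size of the local region) is specified.
ms = MeanShift(bandwidth=2.0).fit(X)

n_clusters = len(ms.cluster_centers_)  # number of modes found
labels = ms.labels_                    # cluster index of each sample
```

The bandwidth still has to be chosen, and the result can change noticeably with it.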
Clustering Metrics
- Homogeneity: Each cluster contains only one class
- Completeness: All members of a class are assigned to the same cluster
- Mutual Information: Measures agreement between two assignments, ignoring permutations
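These metrics are easiest to read on extreme cases. The label lists below are made up for illustration, and the adjusted variant of mutual information is used here as one of several mutual-information scores scikit-learn offers.

```python
from sklearn.metrics import (
    adjusted_mutual_info_score,
    completeness_score,
    homogeneity_score,
)

true_labels = [0, 0, 1, 1]
permuted = [1, 1, 0, 0]   # same grouping, cluster names swapped
split = [0, 1, 2, 3]      # every sample in its own cluster
merged = [0, 0, 0, 0]     # everything in one cluster

# Over-splitting keeps clusters pure but scatters each class.
h_split = homogeneity_score(true_labels, split)
c_split = completeness_score(true_labels, split)

# Merging keeps each class together but mixes classes in one cluster.
h_merged = homogeneity_score(true_labels, merged)
c_merged = completeness_score(true_labels, merged)

# Renaming clusters does not change the agreement.
ami_permuted = adjusted_mutual_info_score(true_labels, permuted)
```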
Resampling
- Resampling to lower frequency: Involves aggregation operation
- Resampling to higher frequency: Involves interpolation or data filling methods
- pandas resample() method: Splits the DatetimeIndex into time bins and groups data by time bin
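Both directions of resampling can be sketched on a six-day toy series; the values and the target frequencies are assumptions for the example.

```python
import pandas as pd

index = pd.date_range("2024-01-01", periods=6, freq="D")
daily = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=index)

# Lower frequency: each 2-day bin is aggregated, here with the mean.
two_day_mean = daily.resample("2D").mean()

# Higher frequency: new 12-hour rows have no data and must be filled.
half_day_ffill = daily.resample("12h").ffill()         # carry last value forward
half_day_interp = daily.resample("12h").interpolate()  # linear interpolation
```

`two_day_mean` holds 1.5, 3.5, and 5.5. At noon on the first day, forward fill gives 1.0 while interpolation gives 1.5.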
Rolling Windows
- Splitting data into time windows: Aggregates data with a function (e.g., mean, median, sum)
- Overlapping windows: 'Roll' along at the same frequency as the original time series
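A short sketch of a rolling mean, assuming a window of three observations. Unlike resampling, the result keeps the original frequency: there is one output value per input row.

```python
import pandas as pd

index = pd.date_range("2024-01-01", periods=5, freq="D")
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=index)

# Each value is the mean of the current row and the two rows before it.
# The first two rows have incomplete windows and come out as NaN.
rolling_mean = series.rolling(window=3).mean()
```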
SVM - Classification
- Supervised learning model: Used for classification, regression, and outlier detection; in its basic form a binary linear classifier
- Maximal margin classifier: Finds optimal separating hyperplane, maximizing the margin
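For a linear SVM with separating hyperplane w·x + b = 0, the margin width is 2/||w||. The sketch below recovers it on four hand-placed points; the very large C is an assumption used to approximate a maximal margin classifier with scikit-learn's soft-margin `SVC`.

```python
import numpy as np
from sklearn.svm import SVC

# Two classes separated along the first coordinate.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)
```

Here the widest gap between the classes runs from x = 0 to x = 2, so `margin_width` should come out close to 2.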
Kernel Trick
- Non-linear classification: Implicitly maps inputs into high-dimensional feature spaces
- Large margin classification: Fits the widest possible margin between two classes
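The effect of the kernel trick shows up on data that no straight line can separate. This sketch assumes two concentric circles from `make_circles` and compares a linear kernel with an RBF kernel, scoring on the training data for simplicity.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Inner circle is one class, outer circle the other.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)
```

The RBF kernel separates the circles without ever computing the high-dimensional mapping explicitly, while the linear kernel cannot.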
Kernel Function
- Determines weight of nearby points: For re-estimation of the mean in mean shift
- Common kernel profiles: the flat kernel and the Gaussian kernel
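A sketch of how the kernel enters one mean shift update. The function names are hypothetical, the Gaussian profile follows the formula quoted in the questions, evaluated at distance divided by bandwidth, and the flat version assumes at least one point lies inside the bandwidth.

```python
import numpy as np

def flat_kernel(distance, bandwidth):
    """Weight 1 for points inside the bandwidth, 0 outside."""
    return (distance <= bandwidth).astype(float)

def gaussian_kernel(distance, bandwidth):
    """Standard normal density evaluated at distance / bandwidth."""
    u = distance / bandwidth
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def mean_shift_step(point, data, bandwidth, kernel):
    """Re-estimate the mean as the kernel-weighted average of the data."""
    distances = np.linalg.norm(data - point, axis=1)
    weights = kernel(distances, bandwidth)
    return weights @ data / weights.sum()

data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
start = np.array([0.5, 0.5])

flat_update = mean_shift_step(start, data, 2.0, flat_kernel)
gauss_update = mean_shift_step(start, data, 2.0, gaussian_kernel)
```

The flat kernel ignores the far point entirely, while the Gaussian kernel gives it a tiny but nonzero weight; both updates land near (1/3, 1/3).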
Hard Margin
- Linearly separable data: Selects two parallel hyperplanes that separate the two classes so that the distance between them (the margin) is as large as possible
Soft Margin
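- Non-separable data: Uses the hinge loss function, $\max(0, 1 - y\,f(x))$, to penalize points that fall inside the margin or on the wrong side, trading margin width against these violations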
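The hinge loss is simple to compute by hand. The sketch below assumes labels in {-1, +1} and raw decision scores f(x); the function name and the sample values are made up for illustration.

```python
import numpy as np

def hinge_loss(y_true, scores):
    """Mean of max(0, 1 - y * f(x)) over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    return float(np.mean(np.maximum(0.0, 1.0 - y_true * scores)))

y_true = [1, -1, 1, -1]
scores = [2.0, -0.5, -1.0, -3.0]

# Per-sample losses: 0 (correct, outside margin), 0.5 (correct, inside margin),
# 2.0 (wrong side), 0 (correct, outside margin).
loss = hinge_loss(y_true, scores)  # 0.625
```

Only points that are misclassified or inside the margin contribute, which is why correctly classified points far from the boundary do not affect the fit.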
Description
Learn about resampling methods in pandas, including aggregation and interpolation techniques, and how to use the DataFrame.resample() method to change frequency.