Machine Learning Concepts and Types

Hypothesis Class

A hypothesis class represents the set of all possible functions that a learning algorithm can output
It defines the space of potential solutions to the learning problem
Examples include linear models, decision trees, neural networks

Types of Regression

Regression is a supervised learning technique for predicting continuous target variables; common variants include:
Linear Regression: Assumes a linear relationship between the independent and dependent variables
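Logistic Regression: Models the probability of a binary outcome (0 or 1); despite its name, it is normally used for classification rather than for predicting continuous targets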
Polynomial Regression: Uses a polynomial function to fit the data, allowing for more complex relationships
Ridge Regression: Utilizes L2 regularization to prevent overfitting
Lasso Regression: Employs L1 regularization, which can shrink some coefficients to exactly zero and so acts as a form of feature selection
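
As a minimal sketch of the linear case, the closed-form least-squares fit for a single feature can be written with the standard library alone. The function name fit_line and the toy data are illustrative assumptions, not part of these notes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

Because the toy data lie exactly on a line, the fit recovers the slope and intercept exactly; with noisy data the same formula gives the line that minimizes the squared error.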

Types of Machine Learning Techniques

Supervised Learning: Trains on labeled data to make predictions on unseen data
- Regression: Predicts continuous values (e.g., price, temperature)
- Classification: Predicts categorical values (e.g., spam or not spam, cat or dog)
Unsupervised Learning: Discovers patterns in unlabeled data
- Clustering: Groups data points into clusters based on similarity (e.g., customer segmentation)
- Dimensionality Reduction: Reduces the complexity of data by extracting essential features
Reinforcement Learning: Trains an agent to learn optimal actions by interacting with an environment
- Trial and Error: Agent learns through rewards and punishments for its actions
- Applications: Robotics, game playing

Roles of Vectors and Matrices in ML

Vectors: Represent data points as ordered lists of numbers
- Facilitates mathematical operations on data
- Enables efficient storage and retrieval of data
Matrices: Store and manipulate multi-dimensional data
- Representing data sets: Rows - instances, Columns - features
- Linear transformations: Matrix multiplication allows for feature scaling and rotation
- Solving linear equations: Used in closed-form solutions such as the normal equations for least squares; iterative methods such as gradient descent rely on matrix-vector products instead

Mean, Mode, and Median

Mean: The average of a set of numbers
Mode: The most frequent value in a set of numbers
Median: The middle value of a sorted dataset
Examples:
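- List of natural numbers: For 1, 2, 2, 3, 7 the mean is 3, the mode is 2, and the median is 2
- List of random numbers: If values rarely repeat, the mode is uninformative; with a skewed distribution or outliers, the mean is pulled toward the extreme values and the median is usually more representative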
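
A short sketch using Python's standard statistics module shows the three measures and how a single outlier affects them. The lists are hypothetical examples chosen for illustration.

```python
import statistics

values = [1, 2, 2, 3, 7]        # small list of natural numbers
with_outlier = values + [100]   # same list plus one extreme value

print(statistics.mean(values))    # 3
print(statistics.median(values))  # 2
print(statistics.mode(values))    # 2

# The outlier drags the mean upward, while the median barely moves.
print(statistics.mean(with_outlier))    # about 19.17
print(statistics.median(with_outlier))  # 2.5
```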

Statistical Measures for Evaluating ML Performance

Accuracy: Proportion of correct predictions
Precision: Proportion of correctly predicted positive instances out of all predicted positive instances
Recall: Proportion of correctly predicted positive instances out of all actual positive instances
F1-score: Harmonic mean of precision and recall
Specificity: Proportion of correctly predicted negative instances out of all actual negative instances
Example: In a spam detection system, high precision means that few legitimate emails are wrongly flagged as spam, while high recall means that few spam emails slip through undetected.
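
The definitions above can be sketched as a small function over the four outcome counts (true/false positives and negatives). The function name and the counts are hypothetical, and the sketch assumes no denominator is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from outcome counts (assumes non-zero denominators)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),  # harmonic mean
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for an imbalanced problem (12 positives, 88 negatives).
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=86)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is 0.94 even though recall is only about 0.67, which is why accuracy alone can be misleading when classes are imbalanced.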

Representations of Input Data Sets

Tabular Data: Data organized in rows and columns (e.g., CSV files)
Text Data: Unstructured data like documents, emails, and social media posts
Image Data: Pixel values representing images
Audio Data: Sound waves converted into numerical data
Time Series Data: Data collected over time, often with a temporal dependency

Null Hypothesis and Alternative Hypothesis

Null hypothesis (H0): A statement that there is no relationship between variables or no difference between groups
Alternative hypothesis (H1): A statement that there is a relationship or difference
Example: In a medical study, H0: The new drug does not improve patient outcomes. H1: The new drug improves patient outcomes.
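
One way to test H0 against H1 is an exact permutation test, sketched below. The notes do not name a specific test, so this choice, the function name, and the outcome scores are all assumptions made for illustration.

```python
from itertools import combinations


def permutation_p_value(treated, control):
    """One-sided exact permutation test on the difference in group means."""
    pooled = treated + control
    k = len(treated)
    observed = sum(treated) / k - sum(control) / len(control)
    extreme = 0
    total = 0
    # Under H0 the group labels are arbitrary, so try every relabeling.
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
        if diff >= observed:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical outcome scores (higher is better).
treated = [5, 6, 7, 8]
control = [1, 2, 3, 4]
p_value = permutation_p_value(treated, control)
print(p_value)  # 1/70, about 0.014
```

Only one of the 70 possible relabelings produces a difference as large as the observed one, so at a 0.05 threshold H0 would be rejected for this toy data.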

Supervised and Unsupervised Learning

Supervised Learning: Trains a model on labeled data, with the goal of making predictions on new data.
- Regression aims to predict continuous values (e.g., house prices), while classification predicts categorical values (e.g., spam or not spam).
Unsupervised Learning: Discovers patterns in unlabeled data, without target values to guide training.
- Clustering groups similar data points together (e.g., customer segmentation) while dimensionality reduction simplifies data by extracting essential features.

Reinforcement Learning

An agent learns which actions to take in an environment in order to maximize its cumulative reward.
The agent learns by trial and error, receiving rewards for desirable actions and punishments for undesirable actions.
Examples include playing games, controlling robots, and optimizing resource allocation.
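
The trial-and-error loop can be sketched with an epsilon-greedy agent on a two-action problem. This is a simplified bandit setting with no states; the deterministic rewards, the function name, and the parameter values are assumptions for illustration.

```python
import random


def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent; rewards[a] is the (fixed) reward of action a."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    counts = [0] * n_actions
    values = [0.0] * n_actions  # running average reward per action

    def take(action):
        reward = rewards[action]
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    for action in range(n_actions):  # try every action once
        take(action)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        take(action)
    return values, counts


values, counts = run_bandit(rewards=[0.0, 1.0])
print(values, counts)
```

The agent's value estimates converge to the true rewards, and it ends up choosing the rewarded action most of the time while still exploring occasionally.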

Working Procedure of a ML System

Data Collection and Preparation: Gather, cleanse, and prepare data for training.
Model Selection and Training: Choose a suitable model (e.g., linear regression, decision tree) and train it on the data.
Model Evaluation: Assess the performance of the trained model using metrics like accuracy, precision, and recall.
Model Deployment: Deploy the trained model to make predictions on new data.
Model Monitoring and Maintenance: Continuously monitor and update the model to maintain its performance.
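
The first four steps can be sketched end to end on toy data. The nearest-centroid classifier, the data, and all names here are hypothetical stand-ins; a real system would use richer data and models.

```python
import random

# 1. Data collection and preparation: hypothetical one-feature data, two classes.
data = [(0.1 * i, 0) for i in range(10)] + [(5.0 + 0.1 * i, 1) for i in range(10)]
rng = random.Random(42)
rng.shuffle(data)
train, test = data[:15], data[15:]  # hold out 5 rows for evaluation


# 2. Model selection and training: a nearest-centroid classifier.
def train_centroids(rows):
    sums, counts = {}, {}
    for x, label in rows:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    return min(centroids, key=lambda label: abs(x - centroids[label]))


centroids = train_centroids(train)

# 3. Model evaluation on the held-out rows.
correct = sum(predict(centroids, x) == label for x, label in test)
accuracy = correct / len(test)
print("accuracy:", accuracy)

# 4. Deployment: apply the trained model to a new, unseen input.
new_prediction = predict(centroids, 4.2)
print("prediction for 4.2:", new_prediction)
```

Monitoring, the fifth step, would repeat the evaluation on fresh data over time and retrain when accuracy degrades.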

Statistical Concepts in ML

Probability: A measure of the likelihood of an event occurring.
Distributions: Mathematical functions describing how likely each value of a random variable is (e.g., normal distribution).
Hypothesis Testing: Used to determine if there is enough evidence to reject the null hypothesis.
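Statistical Significance: A result is statistically significant when its p-value (the probability of observing results at least as extreme, assuming the null hypothesis is true) falls below a chosen threshold such as 0.05.
Confidence Intervals: A range of values computed from a sample so that, over repeated sampling, a stated proportion (e.g., 95%) of such intervals contain the true population parameter.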
Bayesian Statistics: A framework for updating beliefs in the face of new evidence.
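
Bayesian updating can be illustrated with Bayes' rule for a single binary hypothesis. The function name and the probabilities are hypothetical numbers chosen for the example.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(H | E) from P(H), P(E | H), and P(E | not H) via Bayes' rule."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical: 20% of mail is spam; a word appears in 60% of spam
# and in 5% of non-spam messages.
p_spam_given_word = posterior(prior=0.2, likelihood=0.6, likelihood_given_not=0.05)
print(round(p_spam_given_word, 2))  # 0.75
```

Seeing the word raises the belief that a message is spam from the prior of 0.2 to a posterior of 0.75.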

Exploratory Data Analysis

Analyzing and visualizing data to gain insights and understand its characteristics.
Examples:
- Histograms: Show the distribution of a single variable.
- Scatter plots: Investigate the relationship between two variables.
- Box plots: Show the distribution of data in terms of quartiles.
- Correlation matrix: Visualize the relationships between multiple variables.
- Pair plots: Display scatter plots for all pairs of variables in a dataset.
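
Two of these views can be approximated without a plotting library: a text histogram and a Pearson correlation coefficient, which is the quantity a correlation matrix holds for each pair of variables. The helper names and data below are hypothetical.

```python
import math
from collections import Counter


def histogram(values, bin_width):
    """Count values per bin; each key is the left edge of its bin."""
    return Counter((v // bin_width) * bin_width for v in values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


ages = [21, 23, 25, 34, 38, 41]  # hypothetical sample
bins = histogram(ages, bin_width=10)
for left_edge in sorted(bins):
    print(f"{left_edge}-{left_edge + 9}: {'#' * bins[left_edge]}")

hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]
print(pearson(hours, scores))  # close to 1.0: perfectly linear relationship
```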

Mathematical Operations in Vectors and Matrices

Vector addition: Add corresponding elements of two vectors.
Vector subtraction: Subtract corresponding elements of two vectors.
Scalar multiplication: Multiply each element of a vector by a scalar.
Matrix addition: Add corresponding elements of two matrices.
Matrix subtraction: Subtract corresponding elements of two matrices.
Scalar multiplication: Multiply each element of a matrix by a scalar.
Matrix multiplication: Requires the number of columns of the first matrix to equal the number of rows of the second; each entry of the result is the dot product of a row of the first matrix with a column of the second.
Transpose: Switch rows and columns of a matrix.
Inverse: The matrix that, multiplied by the original matrix, gives the identity matrix (exists only for square matrices with a nonzero determinant).
Determinant: A scalar value representing the "scaling factor" of a linear transformation.
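
The operations above can be sketched in plain Python using lists of lists. The helper names are illustrative, and the determinant and inverse are written for the 2x2 case only; in practice a numerical library would normally be used.

```python
def vector_add(u, v):
    return [a + b for a, b in zip(u, v)]


def scale(v, k):
    return [k * a for a in v]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c


def inverse2(m):
    det = det2(m)
    if det == 0:
        raise ValueError("matrix is singular, so it has no inverse")
    (a, b), (c, d) = m
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1], [1, 1]]
A_inv = inverse2(A)
print(det2(A))           # 1
print(A_inv)             # [[1.0, -1.0], [-1.0, 2.0]]
print(matmul(A, A_inv))  # [[1.0, 0.0], [0.0, 1.0]], the identity
```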

Imbalanced Data

Datasets where one class has significantly more samples than other classes.
It can bias model training and lead to poor performance on the minority class.
Strategies to balance datasets:
- Oversampling: Duplicating instances of the minority class.
- Undersampling: Removing instances of the majority class.
- Synthetic Minority Oversampling Technique (SMOTE): Creating synthetic instances of the minority class.
- Cost-sensitive learning: Weighing the cost of misclassifying minority class instances higher.
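
Random oversampling, the simplest of these strategies, can be sketched as follows. The function name and the toy data are hypothetical, and the sketch duplicates existing rows rather than synthesizing new ones as SMOTE does.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical data: six majority rows (class 0) and two minority rows (class 1).
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [2.0], [2.1]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Resampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.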

Multi-class Classification

Classifying instances into more than two classes.
Methods:
- One-vs-rest: Training one binary classifier per class (that class versus all others) and predicting the class whose classifier gives the highest score.
- One-vs-one: Training a binary classifier for each pair of classes.
- Softmax Regression: A generalization of logistic regression for multi-class classification, where the predictions are probabilities for each class.
Example: Identifying different flower species (Iris setosa, Iris versicolor, Iris virginica).
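
The softmax step that turns per-class scores into probabilities can be sketched as below. The scores are hypothetical outputs of some trained model, not values from a real Iris classifier.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["Iris setosa", "Iris versicolor", "Iris virginica"]
scores = [2.0, 1.0, 0.1]  # hypothetical per-class scores
probs = softmax(scores)
predicted = classes[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

The predicted class is the one with the highest probability, which is always the one with the highest raw score.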

Confusion Matrix and Performance Metrics

Confusion Matrix: A table summarizing the classification performance by showing the number of correct and incorrect predictions for each class.
True Positives (TP): Correctly predicted positive instances.
True Negatives (TN): Correctly predicted negative instances.
False Positives (FP): Negative instances incorrectly predicted as positive (Type I error).
False Negatives (FN): Positive instances incorrectly predicted as negative (Type II error).
Example: A spam detection system correctly flags 100 spam emails (TP = 100) and correctly passes 200 non-spam emails (TN = 200). It also flags 10 non-spam emails as spam (FP = 10) and misses 5 spam emails, labeling them non-spam (FN = 5).
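
The spam example can be reproduced by counting outcomes from label lists. The lists below are constructed to match the counts in the example; the function name and the "ham" label for non-spam are assumptions.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


# 100 spam caught, 200 non-spam passed, 10 non-spam flagged, 5 spam missed.
y_true = ["spam"] * 100 + ["ham"] * 200 + ["ham"] * 10 + ["spam"] * 5
y_pred = ["spam"] * 100 + ["ham"] * 200 + ["spam"] * 10 + ["ham"] * 5

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)                            # 100 10 5 200
print(round((tp + tn) / len(y_true), 3))         # accuracy: 0.952
print(round(tp / (tp + fp), 3))                  # precision: 0.909
print(round(tp / (tp + fn), 3))                  # recall: 0.952
```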

Probably Approximately Correct (PAC) Learning

A formal framework for analyzing the learnability of concepts.
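A concept class is PAC learnable if there exists an algorithm that, for any error tolerance ε > 0 and any failure probability δ > 0, outputs with probability at least 1 − δ a hypothesis whose error is at most ε, using a number of training examples polynomial in 1/ε and 1/δ.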
Key factors:
- Sample complexity: The number of training examples required.
- Computational complexity: Time and resources required to learn the concept.
- Error tolerance: The maximum allowed error between the learned hypothesis and the true concept.
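
To make sample complexity concrete, the sketch below evaluates one standard bound, m >= (ln|H| + ln(1/δ)) / ε, where ε is the error tolerance and δ the allowed failure probability. The bound applies under assumptions the notes do not state: a finite hypothesis class H, a learner that outputs a hypothesis consistent with the training data, and a target concept that lies in H. The function name and the numbers are illustrative.

```python
import math


def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Examples sufficient for error <= epsilon with probability >= 1 - delta.

    Assumes a finite hypothesis class, a consistent learner, and a
    realizable target (the true concept is in the class).
    """
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)


m = pac_sample_bound(hypothesis_count=1000, epsilon=0.1, delta=0.05)
print(m)  # 100
```

Halving the error tolerance doubles the bound, while the dependence on the size of the hypothesis class and on δ is only logarithmic.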

Comparison

L1 regularization vs L2 regularization:
- L1 (Lasso): Promotes sparsity by shrinking coefficients of irrelevant features to exactly zero.
- L2 (Ridge): Penalizes large coefficients, shrinking them towards zero but rarely reaching zero.
Binary classifier vs Multi-class classifier:
- Binary classifier: Distinguishes between two classes (e.g., spam or not spam).
- Multi-class classifier: Differentiates between more than two classes (e.g., cat, dog, bird).
Overfitting vs Underfitting:
- Overfitting: A model fits the training data too closely, including its noise, resulting in poor generalization to unseen data.
- Underfitting: A model fails to capture the underlying patterns in the data and performs poorly on both training and testing data.
Feature selection vs Feature extraction:
- Feature selection: Choosing a subset of the original features.
- Feature Extraction: Transforming the original features into a new set of features.
Dependent vs Independent events:
- Dependent events: The outcome of one event influences the outcome of another event.
- Independent events: The outcome of one event does not influence the outcome of another event.
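
The L1 versus L2 contrast from the first comparison above can be seen in a one-coefficient simplification, where each penalized problem has a closed-form minimizer. This setup, the function names, and the coefficient values are assumptions for illustration; full Lasso and Ridge solvers handle many correlated features at once.

```python
def l1_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w): soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small coefficients are set exactly to zero


def l2_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return z / (1 + lam)


coefficients = [3.0, 0.4, -0.2, -2.0]  # hypothetical unpenalized estimates
lam = 0.5
l1 = [l1_shrink(z, lam) for z in coefficients]
l2 = [l2_shrink(z, lam) for z in coefficients]
print(l1)  # [2.5, 0.0, 0.0, -1.5]
print([round(w, 3) for w in l2])
```

The L1 penalty zeroes out the two small coefficients, while the L2 penalty shrinks every coefficient by the same factor and leaves none at zero.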

Definitions

Conditional probability: The probability of an event A given that another event B has occurred, P(A | B) = P(A and B) / P(B), defined when P(B) > 0.
Bias: Refers to the difference between the average prediction of a model and the true value.
Variance: Measures the variability of the model's predictions for different training sets.
Teacher noise: Errors or inconsistencies in the labels of the training data.
L1 norm: The sum of the absolute values of a vector's elements.
L2 norm: The square root of the sum of squares of a vector's elements.
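
Both norms can be computed directly from their definitions. The function names and the example vector are illustrative.

```python
import math


def l1_norm(v):
    """Sum of the absolute values of the elements."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """Square root of the sum of squared elements (Euclidean length)."""
    return math.sqrt(sum(x * x for x in v))


v = [3, -4]
print(l1_norm(v))  # 7
print(l2_norm(v))  # 5.0
```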