# Machine Learning Cheat Sheet

## Supervised Learning

Given labeled training data, predict labels on new data.

### Regression

Predict continuous values.

**Common Algorithms:**

* Linear Regression
* Polynomial Regression
* Decision Tree Regression
* Random Forest Regression
* Support Vector Regression (SVR)

**Evaluation Metrics:**

* Mean Squared Error (MSE): $\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$
* R-squared: $1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$

### Classification

Predict discrete categories.

**Common Algorithms:**

* Logistic Regression
* K-Nearest Neighbors (KNN)
* Support Vector Machine (SVM)
* Decision Tree Classification
* Random Forest Classification
* Naive Bayes

**Evaluation Metrics:**

* Accuracy: $\frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$
* Precision: $\frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$
* Recall: $\frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$
* F1-Score: $2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
* Confusion Matrix
* AUC-ROC Curve

## Unsupervised Learning

Discover patterns in unlabeled data.

### Clustering

Group similar data points.

**Common Algorithms:**

* K-Means
* Hierarchical Clustering
* DBSCAN

**Evaluation Metrics:**

* Silhouette Score
* Davies-Bouldin Index

### Dimensionality Reduction

Reduce the number of features while preserving important information.

**Common Algorithms:**

* Principal Component Analysis (PCA)
* t-distributed Stochastic Neighbor Embedding (t-SNE)

## Reinforcement Learning

An agent learns to make decisions in an environment to maximize a reward.

**Key Concepts:**

* Agent: The learner.
* Environment: The world the agent interacts with.
* State: The current situation of the agent.
* Action: A choice made by the agent.
* Reward: Feedback from the environment.
* Policy: A strategy for choosing actions.
* Value Function: Estimates the expected cumulative future reward from a state (or from a state-action pair).

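
To make these concepts concrete, here is a minimal sketch of tabular Q-learning on a toy problem. The environment (a five-cell corridor with a reward of 1 at the right end), the hyperparameter values, and all function names are assumptions chosen for illustration; none of them come from the cheat sheet itself.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (terminal state)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Policy: epsilon-greedy over the Q-table, breaking ties at random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learn an action-value table Q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(10_000):  # safety cap on episode length
            action = choose_action(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of next state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-terminal states (1 means "move right").
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

Here `step` plays the role of the environment, `choose_action` is the policy, and `q_table` is a learned value function over state-action pairs. In this toy setting the greedy policy should end up moving right in every cell.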

**Common Algorithms:**

* Q-Learning
* SARSA
* Deep Q-Network (DQN)

## Model Selection and Evaluation

### Cross-Validation

* Split data into multiple folds.
* Train on some folds, validate on the remaining fold.
* Repeat, rotating the validation fold.
* Average the validation scores.

### Hyperparameter Tuning

* Grid Search: Test all combinations of hyperparameters.
* Random Search: Randomly sample hyperparameter combinations.

## Important Notes

### Bias-Variance Tradeoff

* **High Bias:** Underfitting (model is too simple).
* **High Variance:** Overfitting (model is too complex).

### Regularization

Techniques to prevent overfitting (e.g., L1, L2 regularization).

### Feature Scaling

Standardize or normalize features to improve algorithm performance.

### Imbalanced Data

Use techniques like oversampling or undersampling to handle imbalanced datasets.
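
The cross-validation steps and grid search described under Model Selection and Evaluation can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not a reference implementation: the model is a hypothetical one-parameter ridge (L2-regularized) regression with no intercept, the data are noise-free toy values, and the function names and the grid of penalty values are made up for the example.

```python
def fit_ridge_slope(xs, ys, lam):
    """Closed-form slope w for y ~ w * x with an L2 penalty lam * w**2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def cross_val_mse(xs, ys, lam, k=5):
    """Average validation MSE over k folds for one hyperparameter value."""
    n = len(xs)
    fold_scores = []
    for fold in range(k):
        val_idx = set(range(fold, n, k))  # every k-th point forms one fold
        train_x = [xs[i] for i in range(n) if i not in val_idx]
        train_y = [ys[i] for i in range(n) if i not in val_idx]
        val_x = [xs[i] for i in sorted(val_idx)]
        val_y = [ys[i] for i in sorted(val_idx)]
        w = fit_ridge_slope(train_x, train_y, lam)  # train on the other folds
        fold_scores.append(mse(val_y, [w * x for x in val_x]))
    return sum(fold_scores) / k


# Toy data (assumption): y = 2x exactly, so no regularization is needed.
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x for x in xs]

# Grid search: score every candidate penalty, keep the lowest CV error.
grid = [0.0, 1.0, 10.0]
scores = {lam: cross_val_mse(xs, ys, lam) for lam in grid}
best_lam = min(scores, key=scores.get)
print(best_lam, scores)
```

Because the toy data are noise-free, the unpenalized fit wins here. On noisy data with many features, a nonzero penalty would often score better, which is the situation regularization is meant for.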