Machine Learning Workflow

24 Questions

The test set is used to select the final model.

False

Hyperparameters are learned from the training data.

False

Accuracy is the only metric used to evaluate a classification model.

False

The confusion matrix is used to evaluate regression models.

False

The validation set is used to evaluate the model's performance on unseen data.

False

True Positives are when you predict negative and it's true.

False

False Negatives are when you predict positive and it's false.

False

Precision is a metric used to evaluate regression models.

False

The accuracy of a model can be calculated from its confusion matrix.

True

The sensitivity of a classifier is the ratio of correctly predicted negative observations to all actual negative observations.

False

The F-score is a measure of precision only.

False

The holdout method is a type of cross-validation.

True

Cross-validation is a method for training a model.

False

The precision of a classifier is the ratio of true positives to all positive predictions.

True

Bootstrap is a method for constructing a training set and testing set from the original dataset.

True

The error rate of a model is the same as its accuracy.

False

The training set is used to evaluate the function approximator's performance.

False

K-Fold Cross Validation is a method that eliminates selection bias.

True

In Leave-one-out Cross Validation, the training set is always larger than the testing set.

True

Bootstrap is a resampling technique without replacement.

False

The average error rate on the test set is used to estimate the true error in K-Fold Cross Validation.

True

Leave-one-out Cross Validation is a computationally efficient method.

False

K-Fold Cross Validation is a method that ensures the training set is always the same size.

False

Bootstrap is a method used to evaluate the performance of a classifier.

True

Study Notes

Data Sets

  • Train, Validation (Dev), and Test Sets are used in the workflow of training and evaluating a model
  • Train set: used to train the model
  • Validation set: used to select the best model from many trained models
  • Test set: used to evaluate the final model on unseen data
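
A minimal sketch of this three-way split, assuming scikit-learn is available; the 60/20/20 proportions and the toy data are illustrative, not prescribed by the notes:

    # Carve off the test set first, then split the remainder into
    # train and validation (dev) sets.
    from sklearn.model_selection import train_test_split

    X = list(range(100))             # toy features (assumption)
    y = [i % 2 for i in range(100)]  # toy labels (assumption)

    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

    print(len(X_train), len(X_val), len(X_test))  # 60 20 20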

Mismatch

  • Dev and Test sets should come from the same distribution

Metrics for Evaluating Classifier Performance

  • Evaluation metrics quantify the performance of a machine learning model
  • Accuracy: percentage of correct classifications
  • Calculation: (correct predictions / total predictions) * 100
  • Example: actual outputs = [0,0,1,1,0,0,0,1,0,1,1,0], predicted outputs = [0,1,1,0,1,0,0,1,0,1,1,1]
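
Working the example through in plain Python, the two sequences agree in 8 of 12 positions:

    actual    = [0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
    predicted = [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

    correct = sum(a == p for a, p in zip(actual, predicted))
    print(f"{correct}/{len(actual)} correct = {correct / len(actual) * 100:.1f}%")
    # -> 8/12 correct = 66.7%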

Confusion Matrix

  • A table that describes the performance of a classification model
  • Elements:
    • True Positive (TP): predicted positive, actually positive
    • True Negative (TN): predicted negative, actually negative
    • False Positive (FP): predicted positive, actually negative
    • False Negative (FN): predicted negative, actually positive
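
The same example can be tallied into confusion-matrix counts, taking 1 as the positive class:

    actual    = [0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
    predicted = [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

    pairs = list(zip(actual, predicted))
    TP = sum(a == 1 and p == 1 for a, p in pairs)  # 4
    TN = sum(a == 0 and p == 0 for a, p in pairs)  # 4
    FP = sum(a == 0 and p == 1 for a, p in pairs)  # 3
    FN = sum(a == 1 and p == 0 for a, p in pairs)  # 1

As a check, (TP + TN) / 12 = 8/12 reproduces the accuracy computed above.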

Sensitivity and Specificity

  • Sensitivity (Recall or True Positive Rate): ratio of correctly predicted positive observations to all actual positive observations
  • Calculation: TP / (TP + FN)
  • Specificity (True Negative Rate): ratio of correctly predicted negative observations to all actual negative observations
  • Calculation: TN / (TN + FP)
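
Plugging in the counts from the confusion-matrix example (TP=4, TN=4, FP=3, FN=1):

    TP, TN, FP, FN = 4, 4, 3, 1
    sensitivity = TP / (TP + FN)  # 4/5 = 0.80
    specificity = TN / (TN + FP)  # 4/7 ≈ 0.57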

Precision and F-score

  • Precision: fraction of relevant examples (true positives) among predicted positives
  • Calculation: TP / (TP + FP)
  • F-score (F1 score): balances precision and recall in one number
  • Calculation: 2 * (precision * recall) / (precision + recall)
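
Continuing with the same counts, where recall is the sensitivity computed above:

    TP, FP, FN = 4, 3, 1
    precision = TP / (TP + FP)  # 4/7 ≈ 0.57
    recall    = TP / (TP + FN)  # 4/5 = 0.80
    f1 = 2 * (precision * recall) / (precision + recall)  # ≈ 0.67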

Model Evaluation Methods

  • Goal: choose a model with the smallest generalization error
  • Methods to construct training and testing sets:
    • Holdout
    • Leave-one-out Cross Validation
    • Cross Validation (K-Fold)
    • Bootstrap

Holdout Method

  • Simplest kind of cross-validation
  • Divide dataset into two sets: training set and testing set
  • Train model on training set and evaluate on testing set
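
A minimal holdout sketch, assuming scikit-learn; the iris dataset and logistic-regression model are illustrative stand-ins, not part of the notes:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)

    # One split: train on one part, evaluate once on the held-out part.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(model.score(X_test, y_test))  # accuracy on the held-out test set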

Cross Validation: K-Fold

  • Divide dataset into k subsets
  • Repeat holdout method k times, using each subset as the test set and the other subsets as the training set
  • Calculate average error across all trials
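
A k-fold sketch under the same assumptions (k = 5 is an illustrative choice):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    X, y = load_iris(return_X_y=True)
    errors = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        errors.append(1 - model.score(X[test_idx], y[test_idx]))  # error on this fold
    print(sum(errors) / len(errors))  # average error across the k trials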

Leave-one-out Cross Validation

  • Use n-1 examples for training and the remaining example for testing
  • Repeat this process n times, calculating the average error rate on the test set
  • Disadvantage: computationally expensive
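
Leave-one-out is the k = n special case; a sketch using scikit-learn's LeaveOneOut, which makes the cost visible (the model is refit n times):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneOut

    X, y = load_iris(return_X_y=True)
    errors = []
    for train_idx, test_idx in LeaveOneOut().split(X):  # n folds of size 1
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        errors.append(1 - model.score(X[test_idx], y[test_idx]))
    print(sum(errors) / len(errors))  # average error over all n trials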

Bootstrap

  • Resampling technique with replacement
  • Randomly select examples from the dataset with replacement
  • Use selected examples for training and the remaining examples for testing
  • Repeat this process for a specified number of iterations (k)
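
A bootstrap sketch in plain Python; the dataset size and seed are arbitrary. Examples never drawn into the training sample (often called "out-of-bag" examples) form the test set:

    import random

    random.seed(0)
    n = 12
    data = list(range(n))  # toy dataset of n examples (assumption)

    # Draw n indices with replacement; duplicates are expected.
    train_idx = [random.randrange(n) for _ in range(n)]
    test_idx = [i for i in data if i not in set(train_idx)]  # out-of-bag
    print(sorted(set(train_idx)), test_idx)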

This quiz covers the workflow of machine learning, including training, evaluation, and model selection. Learn about the different stages involved in building a machine learning model.
