Bias in Predictions

Questions and Answers

Which of the following is NOT considered a primary source of error in forming predictions?

  • Bias
  • Variance
  • Noise
  • Overfitting (correct)

A more complex model invariably leads to better performance on testing data.

False (B)

What term describes the irreducible aspect of data error that one can never eliminate?

noise

___________ refers to the error due to the model's inability to capture the true relationship in the data.

Bias

In the context of machine learning, what does 'variance' refer to?

The extent to which different training datasets lead to different model fits (D)

Bias is independent of the model used and is solely determined by the data.

False (B)

If a model exhibits 'high bias,' what characteristic does it likely possess?

Low complexity

The difference between the average of the estimated function and the true function is known as ________.

Bias

Which scenario exemplifies high bias in a model?

The model consistently underperforms, even on training data. (D)

Models with high complexity generally exhibit low bias.

True (A)

Explain how 'high variance' affects the stability of a model's predictions.

It produces unstable predictions.

Asking "over all possible size-N training sets, what do I expect my fit to be?" is part of the _______ contribution.

Bias

What is the primary characteristic of a model with low variance?

It produces consistent predictions across different datasets. (B)

Models with low complexity generally exhibit low variance.

True (A)

In terms of model complexity, how does high variance typically manifest?

High complexity

The squared deviation of $\hat{f_w}$ from its expected value $\overline{f_w}$ over different realizations of training data is called ________.

Variance

Which issue is more likely to be addressed by increasing the size of the training dataset?

High variance (B)

Regularization techniques are primarily used to combat high bias.

False (B)

What term describes the situation when a model performs well with training data, but poorly with new data?

Overfitting

A class of models that cannot fit the data exhibits ____________, while a class that could fit the data but fails to find the right fit exhibits ________________.

Bias, variance

Match the following scenarios with whether they describe high bias or high variance:

Model performs poorly, even on training data = High bias
Model fits training data very well, but generalizes poorly to unseen data = High variance

In parallel universes where only Niantic knows f, what serves as training data to find f*?

Data from Pokémon caught (B)

In the context of estimators, f* is the true function known to everyone.

False (B)

In the Pokémon case study, asking whether the average of all estimated functions f* is close to the true function f concerns which concept?

Bias

In the bias-variance trade-off, E[f*] - f equals ______.

Bias(f*)

Which method can introduce bias, but may be effective at combating large variance?

Regularization (C)

More data can always solve an issue of large variance.

False (B)

When there is an issue with high bias, is it an issue with overfitting or underfitting, and why?

Underfitting, because the model is too simple even to fit the training data.

Underfitting is diagnosed when the model doesn't even fit the _____________.

Training examples

Match the following diagnoses with their corresponding solution:

Underfitting = A more complex model
Overfitting = More data

In the bias-variance tradeoff, what typically happens to bias as model complexity increases?

Bias decreases (C)

Having both low bias and low variance guarantees perfect model performance in real-world scenarios.

False (B)

How does the amount of training data typically impact a model's variance?

It decreases variance.

A model with high variance is likely to perform well on the ______ dataset but poorly on the ______ dataset.

Training, testing

Which of the following strategies is MOST directly aimed at reducing overfitting?

Simplifying the model (C)

A model with high bias is likely to capture noise in the data.

False (B)

What are the effects of noise, and can it be combatted?

It contributes irreducible error, and no, it cannot be eliminated.

In the equation for Expected Prediction Error: EPE(x) = _________² + bias² + variance

Noise

Match each scenario with the appropriate action to take:

Model cannot fit training examples = Use a more complex model (diagnosis: underfitting)
Model can fit training examples, but large error on testing data = Collect more data or regularize (diagnosis: overfitting)

Which term describes when a model fits training data well, but poorly with new data?

Overfitting (C)

Underfitting occurs when there is high variance in the data.

False (B)

Flashcards

What is Bias?

Error in predictions due to oversimplified assumptions in the learning algorithm.

What is Variance?

Error in predictions due to the model's sensitivity to small fluctuations in the training data.

What is Noise?

Error due to randomness or inherent variability in the data that cannot be reduced by any model.

What is Irreducible Error?

An error that we cannot reduce due to the inherent noise in the data.

What is Underfitting?

Models that are too simple to capture the underlying patterns in the data, leading to high bias and poor performance on both training and test data.

What is Overfitting?

Models that fit the training data too well, capturing noise and outliers, leading to low bias but high variance and poor generalization to new data.

What is Regularization?

The process of preventing a model from overfitting by adding a penalty term to the loss function.

Low complexity vs. Variance

A model with low complexity generally has low variance because it is less sensitive to the specific training data used: different training sets produce similar models.

High complexity vs. Bias

A model with high complexity generally has low bias because it is flexible enough to fit the training data closely.

What is Bias of an Estimator?

The difference between the estimator's expected value and the true value of the parameter being estimated.

What is Variance of an Estimator?

How far, on average, the collection of estimates is from the expected value of the estimates.

Bias of a Function Estimator

The difference between the average prediction (over different realizations of the training data) and the true underlying function.

Variance of a Function Estimator

The squared deviation of $\hat{f_w}$ from its expected value $\overline{f_w}$ over different realizations of training data.

What is the formula for Expected Prediction Error?

Expected Prediction Error = noise² + bias² + variance

What is y (the target)?

The true function plus noise

Study Notes

  • In forming predictions, there are 3 sources of error, which are noise, bias, and variance. Noise is an aspect of the data that can never be eliminated.
  • The average error on testing data can be attributed to "bias" and "variance". A more complex model does not always lead to better performance on testing data.

Data Inherently Noisy

  • Data is inherently noisy, and this is represented by the equation $y_i = f_{w(\text{true})}(x_i) + \epsilon_i$, where $\epsilon_i$ represents the noise term. The error from noise is called irreducible error and is an aspect of the data you can never beat.
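The irreducible-error claim can be checked with a small simulation. This is a minimal sketch under illustrative assumptions: `f_true` is a hypothetical stand-in for $f_{w(\text{true})}$, and the noise level `sigma` is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_true(x):
    # Hypothetical stand-in for the true function f_w(true).
    return 2.0 * x + 1.0

sigma = 0.5                       # noise standard deviation (assumed)
x = rng.uniform(0, 1, size=1000)
y = f_true(x) + rng.normal(0.0, sigma, size=x.shape)   # y_i = f(x_i) + eps_i

# Even a perfect model that predicts f_true exactly is left with a mean
# squared error of about sigma**2 -- the irreducible error from noise.
mse_of_perfect_model = np.mean((y - f_true(x)) ** 2)
```

No model choice can push the test error below roughly `sigma**2` here, which is what "an aspect of the data you can never beat" means.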

Bias Contribution

  • Bias is inherent to the model. The bias of a model can be represented as $\text{Bias}(x) = f_{w(\text{true})}(x) - \bar{f}_w(x)$. In this equation, $f_{w(\text{true})}$ represents the true function and $\bar{f}_w$ represents the average of the model's fitted functions.
  • Bias results in prediction error when a model is not flexible enough to capture the true function $f_{w(\text{true})}$. If you fit a constant function, like a horizontal line, to data with an upward trend, there will be high bias. Low complexity in a model leads to high bias.
  • The average estimated function is denoted $\bar{f}_w(x)$, and the true function is denoted $f_w(x)$. The average estimated function is the fit you expect to get, averaged over all possible size-N training sets.
  • The bias of a function estimator is $\text{bias}(\bar{f}_w(x_t)) = f_w(x_t) - \bar{f}_w(x_t)$: the difference between the true underlying function $f_w$ and the average prediction $\bar{f}_w$ over different realizations of the training data.
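This definition can be estimated by Monte Carlo: fit the same model class on many training sets and compare the average prediction to the true function. The sine `f_true` and the constant ("horizontal line") model below are illustrative assumptions matching the high-bias example in the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def f_true(x):
    # Hypothetical true function with a clear non-constant trend.
    return np.sin(2 * np.pi * x)

sigma, n, n_datasets = 0.1, 20, 500
x_t = 0.25                          # point at which to evaluate the bias

# Fit a constant model (a horizontal line) on many size-n training sets.
preds = []
for _ in range(n_datasets):
    x = rng.uniform(0, 1, size=n)
    y = f_true(x) + rng.normal(0, sigma, size=n)
    preds.append(np.mean(y))        # the least-squares constant fit

f_bar = np.mean(preds)              # average estimated function at x_t
bias = f_true(x_t) - f_bar          # bias(f_bar(x_t)) = f(x_t) - f_bar(x_t)
```

Here the average fit hovers near 0 while `f_true(0.25) = 1`, so the bias stays large no matter how many training sets are averaged: the model class simply cannot represent the trend.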

High Bias

  • High bias occurs when the true function f(x) cannot be accurately modeled by the chosen model.

Variance Contribution

  • Variance shows how much specific fits vary from the expected fit, $\bar{f}_w$. Low complexity leads to low variance.

Variance of High-Complexity Models

  • Applying a high-order polynomial gives a large space of possible fits, and hence high variance.

Variance of Function Estimator

  • The variance of a function estimator is $\text{var}(\hat{f}_w(x_t)) = E_{\text{train}}[(\hat{f}_{w(\text{train})}(x_t) - \bar{f}_w(x_t))^2]$
  • Variance is the squared deviation of $\hat{f}_w$ from its expected value $\bar{f}_w$ over different realizations of training data.
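The variance term can also be estimated by Monte Carlo: refit the model on many training sets and measure how much the prediction at one point scatters. The sine true function and the two polynomial degrees are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def f_true(x):
    return np.sin(2 * np.pi * x)    # hypothetical true function

sigma, n, n_datasets = 0.3, 15, 300
x_t = 0.5                           # point at which to measure variance

def fit_predict(degree):
    """Fit a polynomial of the given degree on a fresh size-n training
    set and return its prediction at x_t."""
    x = rng.uniform(0, 1, size=n)
    y = f_true(x) + rng.normal(0, sigma, size=n)
    return np.polyval(np.polyfit(x, y, degree), x_t)

var_simple = np.var([fit_predict(1) for _ in range(n_datasets)])
var_complex = np.var([fit_predict(9) for _ in range(n_datasets)])
# The degree-9 predictions scatter far more across training sets:
# high complexity -> high variance; low complexity -> low variance.
```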

High Variance

  • High variance occurs when individual fits scatter widely around the average fit $\bar{f}_w$, even if that average tracks the true function f(x) well.

Relationship of Bias and Variance

  • With low bias and variance, the data points are close to each other and to the center of the target. As the bias increases, the distances between the predicted dot location and the targeted dot location increase. As the variance increases, the predicted dot location becomes more dispersed.

Pokemon Case study

  • Niantic knows the true function f, but from training data we can find estimator f*.

Parallel Universes

  • In each universe, 10 Pokémon are collected to find f*.
  • In different universes, the same model is used but different f* is obtained.

f* in 100 Universes

  • It is possible to have different polynomial function orders.

Bias of Estimators

  • Bias asks: if we average all the f*, is the result close to f? The average is $E[f^*] = \bar{f}$.
  • Bias is defined as $\text{Bias}(f^*) = E[f^*] - f$.
  • Comparing large bias with small bias: as bias increases, the average $\bar{f}$ sits further from the target, even if the individual fits are no more dispersed.

Variance of Estimators

  • $\text{Variance}(f^*) = E[(f^* - E[f^*])^2]$
  • Simpler models are less influenced by the sampled data.

Bias and Variance of Estimator

  • Bias is the difference between the estimator's expected value and the true value of the parameter being estimated: $\text{Bias}(f^*) = E[f^*] - f = E[f^* - f]$
  • Variance is how far, on average, the collection of estimates is from the expected value of the estimates: $\text{Variance}(f^*) = E[(f^* - E[f^*])^2]$

Bias v.s. Variance

  • As model complexity decreases, bias increases while variance decreases (and vice versa). A balance between the two yields the proper model.

Bias v.s. Variance Graph

  • The overall error can be broken down into errors from bias and variance.
  • Underfitting occurs when there is a large bias with a small variance.
  • Overfitting occurs when there is a small bias with a large variance.
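This breakdown can be reproduced numerically by estimating bias² and variance at one point for a too-simple and a very flexible model class. The constant and degree-9 polynomial models, and the sine true function, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def f_true(x):
    return np.sin(2 * np.pi * x)    # hypothetical true function

sigma, n, n_datasets, x_t = 0.3, 15, 200, 0.3

def bias2_and_var(degree):
    """Monte Carlo estimates of bias^2 and variance at x_t for one
    polynomial model class."""
    preds = []
    for _ in range(n_datasets):
        x = rng.uniform(0, 1, size=n)
        y = f_true(x) + rng.normal(0, sigma, size=n)
        preds.append(np.polyval(np.polyfit(x, y, degree), x_t))
    preds = np.array(preds)
    return (f_true(x_t) - preds.mean()) ** 2, preds.var()

bias2_simple, var_simple = bias2_and_var(0)  # underfits: large bias, small variance
bias2_flex, var_flex = bias2_and_var(9)      # overfits: small bias, large variance
```

The simple model's error is dominated by bias² (underfitting); the flexible model's error is dominated by variance (overfitting).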

Handling Bias

  • A diagnosis of underfitting means the model cannot even fit the training examples, which indicates large bias.
  • To handle bias, redesign the model: add more features as input, and/or use a more complex model.

Handling Variance

  • With large variance, collecting more data is very effective and does not increase bias, though it is not always practical.
  • Regularization can be used, which may increase bias.
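A minimal sketch of that trade-off, assuming a toy sine true function and ridge (L2) regularization on polynomial features; the degree and λ values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def f_true(x):
    return np.sin(2 * np.pi * x)    # hypothetical true function

def ridge_poly_predict(x_train, y_train, x_t, degree, lam):
    """Ridge-regularized polynomial fit: minimize ||Xw - y||^2 + lam*||w||^2,
    then predict at x_t."""
    X = np.vander(x_train, degree + 1)
    w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y_train)
    return float(np.vander(np.array([x_t]), degree + 1) @ w)

sigma, n, n_datasets, degree, x_t = 0.3, 15, 300, 7, 0.5
preds = {0.0: [], 1.0: []}
for _ in range(n_datasets):
    x = rng.uniform(0, 1, size=n)
    y = f_true(x) + rng.normal(0, sigma, size=n)
    for lam in preds:
        preds[lam].append(ridge_poly_predict(x, y, x_t, degree, lam))

var_unreg = np.var(preds[0.0])   # lam = 0: plain least squares
var_reg = np.var(preds[1.0])     # lam = 1: the penalty shrinks the fit
# The penalty makes fits across training sets much more similar
# (lower variance), at the cost of some added bias.
```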

Summary

  • Bias means the class of models can't fit the data.
  • Variance means the class of models could fit the data, but with limited training data it is hard to find the right fit.

Summary Cat Classification

  • In this cat classification scenario, the proper balance yields a validation error of about 1%.

Expected Prediction Error

  • $E_D[(y - f^*(x))^2]$ represents the expected prediction error and can be expanded as $E_D[(f(x) + \epsilon - f^*(x))^2]$, where D is a size-n sample $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ drawn from the data distribution.

Bias and Variance of Estimator

  • EPE(x) represents the expected prediction error and can be defined as the sum of noise², bias², and variance
  • A Bias-Variance decomposition exists.
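The decomposition EPE(x) = noise² + bias² + variance can be checked numerically by comparing a direct Monte Carlo estimate of the prediction error against the sum of its three components. The degree-1 model and sine true function are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def f_true(x):
    return np.sin(2 * np.pi * x)    # hypothetical true function

sigma, n, n_datasets, x_t = 0.2, 20, 2000, 0.3

# On each of many training sets, fit a degree-1 polynomial, record its
# prediction at x_t, and draw a fresh noisy target y at x_t.
preds, targets = [], []
for _ in range(n_datasets):
    x = rng.uniform(0, 1, size=n)
    y = f_true(x) + rng.normal(0, sigma, size=n)
    preds.append(np.polyval(np.polyfit(x, y, 1), x_t))
    targets.append(f_true(x_t) + rng.normal(0, sigma))
preds, targets = np.array(preds), np.array(targets)

epe = np.mean((targets - preds) ** 2)        # direct estimate of EPE(x_t)
noise2 = sigma ** 2                          # irreducible error
bias2 = (f_true(x_t) - preds.mean()) ** 2    # squared bias
variance = preds.var()                       # estimator variance
# Up to Monte Carlo error, epe ~= noise2 + bias2 + variance.
```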
