Untitled

Questions and Answers

In the augmented linear regression model, what value does $x_0$ typically hold for all samples?

  • A random number
  • 1 (correct)
  • The mean of all other features
  • 0

What is another common term for the intercept term 'b' in a linear regression model?

  • Slope
  • Residual
  • Variance
  • Bias parameter (correct)

In the augmented linear regression model $y = w^T x$, solving the machine learning problem involves determining what?

  • The predicted value y
  • The feature vector x
  • The bias parameter b
  • The weight vector w (correct)

What does the augmented design matrix include to account for the bias in a linear regression model?

The intercept (A)

What is the purpose of finding the stationary point of a function?

To find where the function's derivative is zero (C)

What does $ŷ$ represent in the augmented linear regression model?

The predicted value (D)

In the equation $ŷ = b + w_1x_1 + w_2x_2 + ... + w_nx_n$, what does 'b' represent?

The y-intercept (D)

In the augmented linear regression model, the weight vector w includes $w_0$. What does $w_0$ represent?

The bias parameter (B)

What does 'Tp' stand for in the context of model performance?

True positive (B)

What does a confusion matrix primarily help to summarize?

Classifier performance (A)

Which formula correctly calculates accuracy?

$(Tp + Tn) / (Tp + Fp + Fn + Tn)$ (A)

What is the formula for calculating sensitivity (recall)?

$Tp / (Tp + Fn)$ (D)

What is the formula for calculating precision?

$Tp / (Tp + Fp)$ (C)

What type of task is typically associated with logistic regression?

Classification (C)

What is the formula for calculating specificity?

$Tn / (Tn + Fp)$ (B)

What might a model with low capacity struggle to do?

Fit the training set. (C)

What is the hypothesis space in machine learning?

The set of functions a learning algorithm can select. (D)

What is a binary classifier?

A classifier with exactly two classes. (B)

What does a high bias typically indicate?

High training error. (C)

In the context of machine learning models, a 'high gap' is most likely referring to which of the following?

The difference in performance between the training and testing datasets. (A)

What is the primary goal of gradient descent?

To find the minimum of a function. (B)

What is the effect of high capacity on a model?

The model can memorize training set properties. (B)

What is the negative class typically labeled as in a binary classifier?

Class 0 (C)

What type of classifier is logistic regression when distinguishing between two classes?

Binary classifier (A)

In logistic regression, if $P(x \in Class1) = 0.3$, what is $P(x \in Class0)$?

0.7 (C)

What is the range of the probability output by a logistic regression model?

[0, 1] (D)

What type of machine learning algorithm is logistic regression?

Supervised (C)

What mathematical tool is used to find the optimal value of w that minimizes the Mean Squared Error (MSE) in linear regression?

Vector Calculus (C)

In the context of logistic regression, what does the sigmoid function do?

Transforms the input into a value between 0 and 1 (B)

For logistic regression with one feature, what is the formula for t?

$t = b + w_1x$ (B)

In single-variable calculus, what is the first step in finding the extrema of a function f(x)?

Find the zeroes of f'(x). (A)

If $f''(x) \geq 0$ on the real numbers, what does this indicate about the function $f(x)$?

$f(x)$ is convex, so a stationary point of $f(x)$ is a global minimum. (A)

In logistic regression, what is the purpose of finding the 'best distribution'?

To find the parameters that best predict the labels (B)

What does 'argmin' represent in the equation $\mathbf{w}_{min} = \text{argmin } MSE_{train}(\mathbf{w})$?

The argument $\mathbf{w}$ that minimizes the $MSE_{train}(\mathbf{w})$. (D)

What is the formula for the sigmoid function $\sigma(t)$?

$\sigma(t) = \frac{1}{1 + e^{-t}}$ (B)

In the context of linear regression, what does $MSE_{train}(\mathbf{w})$ represent?

The mean squared error on the training data with weights $\mathbf{w}$. (A)

Which of the following is the formula for $MSE_{train}(\mathbf{w})$?

$MSE_{train}(\mathbf{w}) = \frac{1}{N} \sum{(\mathbf{y}_{train} - \hat{\mathbf{y}}_{train})^2}$ (D)

In the equation $MSE_{train}(\mathbf{w}) = \frac{1}{N} ||\mathbf{Xw} - \mathbf{y}_{train}||^2$, what does $\mathbf{X}$ represent?

The features of the training data. (C)

What is the significance of finding where the gradient of $(\hat{\mathbf{y}}_{train} - \mathbf{y}_{train})^2$ with respect to $\mathbf{w}$ equals zero?

It finds the $\mathbf{w}$ that minimizes the squared difference. (D)

What is the primary purpose of a test set in machine learning?

To measure the performance of the machine learning system. (A)

Which type of machine learning algorithm uses labeled data for training?

Supervised learning (A)

What kind of data is typically used in unsupervised learning?

Unlabeled data (B)

In the context of machine learning datasets, what is a 'feature'?

A characteristic or attribute of an example. (D)

What is a design matrix commonly used for?

Describing the features of a dataset. (A)

In a design matrix, what does each row typically represent?

A sample (A)

What does a label or target provide in supervised learning?

Guidance on what the machine learning system should do (D)

In the Iris dataset example, what do the features $X_{i,1}$ and $X_{i,2}$ represent?

Sepal length and sepal width of plant i. (A)

Flashcards

Training Set

Data used to train a machine learning model.

Test Set

Data used to evaluate the performance of a trained machine learning model.

Unsupervised Learning

Learning from unlabeled data: the algorithm discovers patterns on its own.

Supervised Learning

Learning from labeled data (input-output pairs), where the labels act as an instructor that guides the algorithm.

Design Matrix

A structured way to organize a dataset, with each row representing a sample and each column representing a feature.

Sample

A single data point in a dataset, described by a set of features.

Feature

A measurable property or characteristic of a sample.

Label/Target

The desired output or category associated with a sample in supervised learning.

Augmented Linear Regression Model

A linear regression model that includes an intercept term (bias) to account for a non-zero output when all inputs are zero.

Augmented Feature Vector (𝒙)

A vector containing feature values for a single data point, with an added '1' as the first element (x₀) to represent the bias.

Augmented Weight Vector (𝒘)

A vector containing the weights assigned to each feature, including a weight w₀ = b for the intercept (bias) term.

Bias Parameter (b)

The intercept term (b) in a linear regression model; represents the output value when all input features are zero.

Linear Regression Equation

The equation ŷ = 𝒘ᵀ𝒙, representing the predicted output, where 𝒘 is the weight vector and 𝒙 is the feature vector.

Solving Linear Regression

The machine learning solution that involves finding the optimal values for the weight vector 𝒘 in a linear regression model.

Stationary Point

A point at which the derivative of a function is equal to zero (f'(x) = 0).

Local Minimum

A stationary point at which the second derivative of the function is positive (f'(x) = 0 and f''(x) > 0).

Minimizing MSE with Vector Calculus

Using vector calculus to find the specific value of 'w' that results in the smallest possible Mean Squared Error (MSE) on the training dataset.

Critical/Stationary Points

Points where the derivative of a function equals zero. These points are potential locations of maximum or minimum values of the function.

Global Maximum Condition

If the second derivative is less than/equal to zero across all real numbers, the critical point is a global maximum.

Global Minimum Condition

If the second derivative is greater than/equal to zero across all real numbers, the critical point is a global minimum.

argmin MSEtrain(w)

The value of 'w' that minimizes the MSE. It's found by determining the 'w' that results in the smallest difference between predicted and actual values.

Role of Vector Calculus

Vector calculus is used to find the weights that minimize the difference between predicted and actual values: setting the gradient of the error to zero gives the weights with the smallest training error.

True Positive (Tp)

Indicates when the model correctly identifies the positive class.

True Negative (Tn)

Indicates when the model correctly identifies the negative class.

False Positive (Fp)

Indicates when the model predicts the positive class for a sample that actually belongs to the negative class.

False Negative (Fn)

Indicates when the model predicts the negative class for a sample that actually belongs to the positive class.

Confusion Matrix

A table that summarizes the performance of a classification model by showing the counts of true positive, true negative, false positive, and false negative predictions.

Accuracy

The ratio of correctly classified instances to the total number of instances.

Sensitivity (Recall)

The ability of a classifier to find all the positive samples.

Specificity

The ability of a classifier to avoid labeling negative samples as positive.
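
The confusion-matrix counts and the accuracy, sensitivity, specificity and precision formulas from the quiz can be sketched in a few lines of Python. The function names and the toy labels are illustrative assumptions, not part of the lesson, and class 1 is taken to be the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (recall), specificity and precision from the counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical toy labels, chosen only to exercise the formulas.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
```

This sketch does not guard against empty denominators (for example, a dataset with no positive samples), which a real implementation would need to handle.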

Model Capacity

Models with low capacity may not fit the training data well. Models with high capacity may overfit by memorizing the training data.

Hypothesis Space

The set of functions a learning algorithm is allowed to select as a solution.

Underfitting vs. Overfitting

Underfitting: the model performs poorly on the training data. Overfitting: the model performs very well on the training data but poorly on unseen data.

Bias vs. Variance

Bias measures how much the model's prediction misses the true value on the training data. Variance measures how much the model's prediction varies for different datasets.

High Bias vs. High Variance

High bias: the training error is high. High variance: the gap between training and testing error is high.

Gradient Descent

An optimization algorithm used to find the minimum of a function. It iteratively moves in the direction of the steepest descent as defined by the negative of the gradient.
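
A minimal sketch of this iteration, assuming a fixed learning rate and a fixed number of steps. The helper name `gradient_descent`, the example function and all parameter values are illustrative choices, not taken from the lesson.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [w_j - learning_rate * g_j for w_j, g_j in zip(w, g)]
    return w


# Example function (an assumption for illustration):
# f(w) = (w0 - 3)^2 + (w1 + 1)^2, with gradient (2(w0 - 3), 2(w1 + 1))
# and its minimum at (3, -1).
def grad_f(w):
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


w_min = gradient_descent(grad_f, w0=[0.0, 0.0])
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step, so `w_min` ends up very close to (3, -1). A learning rate that is too large can make the iteration diverge.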

Binary Classifier

A classifier that distinguishes between two classes. Each input belongs to one class or the other, not both.

Positive Class vs. Negative Class

Positive class: the class representing the condition we are trying to detect. Negative class: the class representing everything other than that condition.

Logistic Regression Goal

Given an input (x), estimates the probability of belonging to a certain class.

Performance Measure Goal

Used to evaluate the performance of predicted probabilities; penalizes bad predictions and rewards accuracy.

Sigmoid Curve

S-shaped curve used in logistic regression to model the probability of a data point belonging to a particular class.

Logistic Sigmoid Function

A mathematical function that maps any real value into a value between 0 and 1. Represented as 𝜎(t) = 1 / (1 + e^-t).

t = w^T * x

In logistic regression, this is the formula after features are augmented: t = w^T * x
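
The two formulas, σ(t) = 1 / (1 + e^-t) and t = w^T * x, combine into a class-1 probability as sketched below. The weights and the sample are hypothetical values chosen so that t = 0, and the function names are assumptions made for this example.

```python
import math


def sigmoid(t):
    """Logistic sigmoid: maps a real value t to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-t))


def predict_proba(w, x):
    """P(x in class 1) for augmented weights w and augmented features x (x[0] == 1)."""
    t = sum(w_j * x_j for w_j, x_j in zip(w, x))
    return sigmoid(t)


# Hypothetical weights: w[0] is the bias b, w[1] the weight of the single feature.
w = [-1.0, 2.0]
x = [1.0, 0.5]  # augmented sample: x0 = 1, x1 = 0.5, so t = -1 + 2 * 0.5 = 0
p_class1 = predict_proba(w, x)
p_class0 = 1.0 - p_class1
```

Because the two class probabilities sum to 1, the probability of class 0 follows directly from the probability of class 1. For very large negative t, `math.exp(-t)` can overflow, so a production implementation would need a numerically safer formulation.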

Find the best distribution

The aim is to identify the parameter values that best fit the observed data, enabling accurate predictions.

Study Notes

  • The session aims to teach about Machine Learning problems, linear and logistic regression, design matrix creation, and Gradient Descent.

Learning Algorithms

  • A machine learning algorithm is able to learn from data.
  • To learn, a computer program needs experience (E) for a class of tasks (T) and performance measure (P).
  • If performance at tasks in T, measured by P, improves with experience E, then learning has occurred.
  • T, P, and E need definition for every machine learning algorithm.

Task "T"

  • The process of learning is not the task itself.
  • Machine learning tasks are described by how the system processes an example.
  • An example is a collection of features that have been quantitatively measured from an object or event.
  • An example is represented as a vector $x \in \mathbb{R}^n$, where each entry $x_i$ is a feature.
  • For instance, the features of an image are its pixel values.

Common Machine Learning Tasks include

  • Classification which specifies the category an input belongs to.
    • Object recognition such as pedestrians, cars, buses is an example.
  • Regression which predicts a numerical value from a given input.
    • Predicting the claim amount an insured person will make is an example.
  • Transcription which transcribes unstructured data into discrete, textual form.
    • Optical character and speech recognition are examples.
  • Machine translation.
  • Synthesis and sampling which generates new examples similar to training data.
    • Automatically generating textures for video games is an example.
  • Imputation of missing values, where a machine learning algorithm is given a new example $x \in \mathbb{R}^n$ with some entries $x_i$ of $x$ missing and must predict them.
  • Denoising.

Performance Measure "P"

  • A quantitative measure must be designed to evaluate a machine learning algorithm's abilities.
  • Performance measure P is specific to the task T being carried out.
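  • Accuracy, the proportion of examples for which the model produces the correct output, is often used for classification and transcription; the same information can be reported as the error rate, the proportion of incorrect outputs.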
  • Algorithms should perform well on data they have not seen before.
  • A training set is used to train the machine learning system.
  • A test set measures the performance of the machine learning system.

Experience "E"

  • Machine learning algorithms can be broadly categorized as Unsupervised & Supervised
  • Unsupervised learning algorithms experience a dataset containing many features, then learn useful properties of the structure of this dataset.
    • The data is unlabelled and used in clustering methods.
    • There is no instructor or guide, the algorithm must make sense of the data.
  • Supervised learning algorithms experience a dataset containing features, but each example is also associated with a label or target.
    • Classifying iris plants into three different species based on their measurements is an example.
    • A teacher shows the machine learning system what to do.

Datasets

  • Most machine learning algorithms experience a dataset.
  • A dataset is a collection of examples, which are collections of features.
  • A common way of describing a dataset is with a design matrix.
    • The samples go per row.
    • The features go per column.
  • Some sources use the opposite (transposed) convention, i.e.
    • one feature per row,
    • one sample per column.

Iris Dataset

  • Contains 150 samples and 4 features.
  • The design matrix is $X \in \mathbb{R}^{150 \times 4}$, where $X_{i,1}$ is the sepal length and $X_{i,2}$ the sepal width of plant (sample) $i$, and so on.
  • A dataset can also be described as a set containing $m$ elements $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$.

Supervised Learning Datasets

  • The example in supervised learning contains a label/target and a collection of features.
  • Object recognition from photographs needs specification of the appearing object in each photo.
    • Numeric code can be used whereby 0 means person, 1 means car and 2 means cat etc.
  • Given feature observations X, a vector of labels y is also given, with yᵢ providing the label for example i.
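
A design matrix and its label vector can be sketched as plain Python lists. The three rows below are made-up measurements in the shape of the Iris features, not actual Iris records, and the variable names are assumptions for illustration.

```python
# Three made-up samples (not actual Iris records); columns are
# sepal length, sepal width, petal length, petal width.
X = [
    [5.0, 3.4, 1.5, 0.2],
    [6.1, 2.9, 4.6, 1.4],
    [6.5, 3.0, 5.4, 2.1],
]
y = [0, 1, 2]  # one numeric label per sample (row)

n_samples = len(X)       # number of rows
n_features = len(X[0])   # number of columns
sepal_length_of_sample_2 = X[1][0]  # X_{2,1} in the 1-based notation of the notes

# Transposed convention: one feature per row, one sample per column.
X_T = [list(column) for column in zip(*X)]
```

Python indexes from 0, so the entry written $X_{i,j}$ in the notes is `X[i - 1][j - 1]` here.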

Linear Regression

  • The task of linear regression is to determine the weights $\mathbf{w}$ used to predict the scalar output $y \in \mathbb{R}$ from the input vector $\mathbf{x} \in \mathbb{R}^n$.
  • The output $y$ is assumed to be a linear function of the input $\mathbf{x}$.
  • Let $\hat{y}$ be the value that the model predicts given $\mathbf{x}$; the true value is written $y$.
  • $\hat{y} = \mathbf{w}^T\mathbf{x}$, where $\mathbf{w} \in \mathbb{R}^n$ is a vector of parameters or weights to be determined.
  • In expanded form, $\hat{y} = w_1x_1 + w_2x_2 + \dots + w_nx_n$.
  • This model forces $\hat{y} = 0$ when $\mathbf{x} = \mathbf{0}$, which is a strong assumption.

Augmented LR Model

  • Linear regression often refers to a more general model with an intercept term $b$.
  • $\hat{y} = \mathbf{w}^T\mathbf{x} + b$, where $b \in \mathbb{R}$.
  • The mapping from features to predictions is now an affine function, so the plot of the model's predictions is still a line, but it no longer needs to pass through the origin.
  • Instead of adding the bias parameter $b$ explicitly, the model can augment $\mathbf{x}$ with a new feature that is always set to 1 for every sample.
  • The intercept term $b$ is the bias parameter of the linear regression model.
  • This terminology comes from the fact that the output is biased toward being $b$ in the absence of any input.
  • With the augmented (extended) linear model, the prediction is again $\hat{y} = \mathbf{w}^T\mathbf{x}$.
    • The feature vector is $\mathbf{x} = (x_0, \dots, x_n)^T$ with $x_0 = 1$ for all samples.
    • The weight vector is $\mathbf{w} = (w_0, \dots, w_n)^T$ with $w_0 = b$ (the bias parameter).
    • $\hat{y} = w_0x_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n$, or
    • $\hat{y} = b + w_1x_1 + w_2x_2 + \dots + w_nx_n$.
  • Solving the machine learning problem means determining the weight vector $\mathbf{w}$.
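
The equivalence between the explicit-bias form and the augmented form can be checked with a short sketch. The bias, weights and sample below are hypothetical values, and the helper names `augment` and `predict` are assumptions made for this example.

```python
def augment(x):
    """Prepend the constant feature x0 = 1 to a feature vector."""
    return [1.0] + list(x)


def predict(w, x_aug):
    """y_hat = w^T x for an augmented weight vector w and augmented features x_aug."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x_aug))


b = 0.5                # hypothetical bias
weights = [2.0, -1.0]  # hypothetical w1, w2
x = [3.0, 4.0]

w_aug = [b] + weights  # w0 = b
y_hat_augmented = predict(w_aug, augment(x))
y_hat_explicit = b + sum(w_j * x_j for w_j, x_j in zip(weights, x))
```

Both expressions evaluate 0.5 + 2 * 3 - 1 * 4, so they give the same prediction; the augmented form simply folds the bias into the dot product.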

Recall: Finding the Minimum of a Function

  • To find the minimum of a function, there are a number of options:
    • If the function has a unique stationary point $x_0$, i.e., a single point satisfying $f'(x_0) = 0$, find it and check that it is a minimum.
    • Otherwise, find all stationary points (local maxima, local minima, saddle points),
    • then determine their nature.

Recall: Calculating Norms

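  • For a vector $\mathbf{x} = (x_1, x_2, \dots, x_n)^T \in \mathbb{R}^n$, the 2-norm of $\mathbf{x}$ is given by
    • $||\mathbf{x}||_2 = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$, which is a non-negative real number.
  • One result used later is that $\mathbf{x}^T\mathbf{x} = (x_1, x_2, \dots, x_n)(x_1, x_2, \dots, x_n)^T = x_1^2 + x_2^2 + \dots + x_n^2$,
  • that is, $\mathbf{x}^T\mathbf{x} = ||\mathbf{x}||_2^2$.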

Notation: Euclidean Distance

  • For two points in $\mathbb{R}^n$, say $A = (a_1, a_2, \dots, a_n)^T$ and $B = (b_1, b_2, \dots, b_n)^T$, the Euclidean distance between them is given by
    • $d(A, B) = ||\vec{AB}||_2 = \sqrt{\sum_{i=1}^{n} (b_i - a_i)^2}$, where $\vec{AB}$ is the vector with origin $A$ and head $B$.
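
The 2-norm, the identity between the dot product of a vector with itself and its squared norm, and the distance between two points can be sketched as follows. The function names and the sample vectors are assumptions chosen so the arithmetic is easy to follow.

```python
import math


def norm2(x):
    """Euclidean (2-)norm of a vector."""
    return math.sqrt(sum(x_i * x_i for x_i in x))


def euclidean_distance(a, b):
    """Norm of the vector from a to b, i.e. of the component-wise difference b - a."""
    return norm2([b_i - a_i for a_i, b_i in zip(a, b)])


x = [3.0, 4.0]
dot_xx = sum(x_i * x_i for x_i in x)               # x^T x = 9 + 16 = 25
length = norm2(x)                                  # sqrt(25) = 5
dist = euclidean_distance([1.0, 1.0], [4.0, 5.0])  # norm of (3, 4) = 5
```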

Training the Linear Regression Model

  • The mean squared error (MSE) of the model is computed to measure its performance on a test set.
  • Since training is done on the training data, what must be made small is the error on each training sample:
    • $\varepsilon_i = \hat{y}_{train,i} - y_{train,i}$ for every sample $i$, where $\varepsilon_i$ is the residual or error between the predicted output $\hat{y}_{train,i}$ and its true value $y_{train,i}$.
  • The residuals $\varepsilon_i$ must be minimised jointly over all samples, but each $\varepsilon_i$ can be either positive or negative, so the residuals are squared and averaged:
    • $MSE_{train}(\mathbf{w}) = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_{train,i} - y_{train,i})^2$
    • $N$ = number of samples in the training set
  • In other words, the linear regression model is trained so that the residuals $\varepsilon_i = \hat{y}_{train,i} - y_{train,i}$ are as small as possible, in the squared sense, across all training samples.

What is $\mathbf{X}_{train}\mathbf{w}$ equal to?

  • $\mathbf{X}_{train}\mathbf{w}$ is the prediction of the model, $\hat{\mathbf{y}}_{train}$
  • (the predicted labels for all training samples are calculated at once from the design matrix and the weight vector).
  • The squared error loss is expressed in terms of $\mathbf{X}_{train}$ and $\mathbf{y}_{train}$, which are the fixed training data; only the weight vector $\mathbf{w}$ is adjusted to minimise the mean squared error.
  • One method to find the minimum could be to let $\mathbf{w}$ take different values and evaluate $MSE_{train}(\mathbf{w})$ for each.
  • The smaller $MSE_{train}(\mathbf{w})$, the better, but the challenge is to find the best possible fit rather than just a good one.
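
The trial-and-error idea of evaluating the training MSE for several candidate weight vectors can be sketched as below. The training data are made up (generated from y = 1 + 2x), and the candidate list and the helper name `mse` are assumptions for illustration.

```python
def mse(X, y, w):
    """Mean squared error of the predictions X w against the targets y."""
    predictions = [sum(w_j * x_j for w_j, x_j in zip(w, row)) for row in X]
    return sum((p - t) ** 2 for p, t in zip(predictions, y)) / len(y)


# Made-up training data generated from y = 1 + 2x; the first column is x0 = 1.
X_train = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y_train = [1.0, 3.0, 5.0, 7.0]

# Arbitrary candidate weight vectors (w0, w1) to compare.
candidates = [[0.0, 0.0], [1.0, 1.0], [1.0, 2.0], [0.0, 3.0]]
scores = [mse(X_train, y_train, w) for w in candidates]
best = candidates[scores.index(min(scores))]
```

Such a search only finds the best of the candidates that were tried, which is why an exact method is preferable when one exists.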

Linear Regression Models

  • Vector Calculus determines w's exact value that minimises MSEtrain, as opposed to a random search.
  • For one variable, let $f$ be a function of the variable $x$.
    • Find the zeroes of $f'(x)$ (also known as critical or stationary points).
    • Suppose $x_0$ is the only value such that $f'(x_0) = 0$.
      • If $f''(x) \leq 0$ on $\mathbb{R}$, then $f(x_0)$ is the global maximum, attained at $x_0$.
      • If $f''(x) \geq 0$ on $\mathbb{R}$, then $f(x_0)$ is the global minimum, attained at $x_0$.
  • $f''(x)$ may not always have the same sign, but local extrema can still be found at a stationary point $x_0$.
    • If $f''(x_0) < 0$, then $f(x_0)$ is a local maximum.
    • If $f''(x_0) > 0$, then $f(x_0)$ is a local minimum (if $f''(x_0) = 0$, the test is inconclusive).
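
As a short worked example of this procedure, take $f(x) = x^2 - 4x + 5$ (a function chosen here purely for illustration, not taken from the lesson):

```latex
f'(x) = 2x - 4 = 0 \;\Rightarrow\; x_0 = 2, \qquad
f''(x) = 2 \geq 0 \text{ on } \mathbb{R}
\;\Rightarrow\; f(x_0) = f(2) = 4 - 8 + 5 = 1 \text{ is the global minimum.}
```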

Training Linear Regression using Vector Calculus

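  • $MSE_{train}(\mathbf{w})$ is a function of $n + 1$ variables ($w_0, \dots, w_n$), but the same approach applies.
  • $MSE_{train}(\mathbf{w}) = \frac{1}{N} ||\hat{\mathbf{y}}_{train} - \mathbf{y}_{train}||^2 = \frac{1}{N} ||\mathbf{X}_{train}\mathbf{w} - \mathbf{y}_{train}||^2$
  • The problem is mathematically written as
  • $\mathbf{w}_{min} = \text{argmin}_{\mathbf{w}}\, MSE_{train}(\mathbf{w})$, which means finding the argument $\mathbf{w}$ that minimises $MSE_{train}(\mathbf{w})$.
  • The role of $N$ is not obvious from this formulation: $\frac{1}{N}$ is a positive constant, so it scales the value of the minimum but not its location, and $\mathbf{w}_{min} = \text{argmin}_{\mathbf{w}}\, ||\mathbf{X}_{train}\mathbf{w} - \mathbf{y}_{train}||^2$.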

Closed Form Solution

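  • Vector calculus determines the exact value of $\mathbf{w}$ that minimises $||\hat{\mathbf{y}}_{train} - \mathbf{y}_{train}||^2$.
  • Compute the gradient of $||\hat{\mathbf{y}}_{train} - \mathbf{y}_{train}||^2$ with respect to $\mathbf{w}$.
    • $||\hat{\mathbf{y}}_{train} - \mathbf{y}_{train}||^2 = (\hat{\mathbf{y}}_{train} - \mathbf{y}_{train})^T (\hat{\mathbf{y}}_{train} - \mathbf{y}_{train})$
  • Expanding yields $||\hat{\mathbf{y}}_{train} - \mathbf{y}_{train}||^2 = \hat{\mathbf{y}}_{train}^T \hat{\mathbf{y}}_{train} - 2\,\hat{\mathbf{y}}_{train}^T \mathbf{y}_{train} + \mathbf{y}_{train}^T \mathbf{y}_{train}$.
  • As $\hat{\mathbf{y}}_{train} = \mathbf{X}_{train}\mathbf{w}$ and $(AB)^T = B^T A^T$, this gives:
    • $||\hat{\mathbf{y}}_{train} - \mathbf{y}_{train}||^2 = \mathbf{w}^T \mathbf{X}_{train}^T \mathbf{X}_{train} \mathbf{w} - 2\,\mathbf{w}^T \mathbf{X}_{train}^T \mathbf{y}_{train} + \mathbf{y}_{train}^T \mathbf{y}_{train}$
  • Using the gradient properties $\nabla_{\mathbf{w}}(\mathbf{w}^T A \mathbf{w}) = (A + A^T)\mathbf{w}$ and $\nabla_{\mathbf{w}}(\mathbf{w}^T \mathbf{a}) = \mathbf{a}$, the gradient can be calculated:
  • $\nabla_{\mathbf{w}} ||\hat{\mathbf{y}}_{train} - \mathbf{y}_{train}||^2 = 2\,\mathbf{X}_{train}^T \mathbf{X}_{train} \mathbf{w} - 2\,\mathbf{X}_{train}^T \mathbf{y}_{train}$
  • Setting $\nabla_{\mathbf{w}} ||\hat{\mathbf{y}}_{train} - \mathbf{y}_{train}||^2 = 0$ gives $\mathbf{X}_{train}^T \mathbf{X}_{train} \mathbf{w} = \mathbf{X}_{train}^T \mathbf{y}_{train}$, and solving for $\mathbf{w}$ yields
    • $\mathbf{w}_{min} = (\mathbf{X}_{train}^T \mathbf{X}_{train})^{-1} \mathbf{X}_{train}^T \mathbf{y}_{train}$
  • This is the only solution, provided that $\mathbf{X}_{train}^T \mathbf{X}_{train}$ is invertible.
  • The equality $\mathbf{X}_{train}^T \mathbf{X}_{train} \mathbf{w} = \mathbf{X}_{train}^T \mathbf{y}_{train}$ is known as the normal equations and gives the analytical solution of the linear regression problem.
  • It can be proven that $MSE_{train}$ attains its global minimum at $\mathbf{w} = \mathbf{w}_{min}$.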
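
For a model with a single feature, the normal equations reduce to a 2x2 linear system that can be solved by hand. The sketch below does exactly that in plain Python; the function name `fit_one_feature` and the training data (points lying exactly on y = 1 + 2x) are assumptions made for illustration.

```python
def fit_one_feature(xs, ys):
    """Solve the normal equations X^T X w = X^T y for the model y_hat = w0 + w1 * x.

    X is the augmented design matrix with rows (1, x_i), so X^T X is the 2x2 matrix
    [[N, sum x], [sum x, sum x^2]] and X^T y is the vector (sum y, sum x*y).
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    det = n * sxx - sx * sx  # determinant of X^T X
    if det == 0:
        raise ValueError("X^T X is not invertible (all x values are equal)")

    # Multiply X^T y by the inverse of the 2x2 matrix X^T X.
    w0 = (sxx * sy - sx * sxy) / det
    w1 = (n * sxy - sx * sy) / det
    return w0, w1


# Made-up training data lying exactly on y = 1 + 2x.
w0, w1 = fit_one_feature([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

With more features, the same system is normally handed to a linear algebra routine instead of being inverted by hand; the explicit 2x2 inverse is used here only to keep the example self-contained.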

Evaluating

  • Once trained, the model is evaluated on a test set to see how it performs; the test design matrix is $\mathbf{X}_{test} \in \mathbb{R}^{M \times n}$.
  • Each test example has a regression target with a known exact value; together these form $\mathbf{y}_{test} \in \mathbb{R}^M$.
  • The predictions of the model on the test set are $\hat{\mathbf{y}}_{test} \in \mathbb{R}^M$, and $MSE_{test}$ measures their error:
    • $MSE_{test} = \frac{1}{M} \sum_{i=1}^{M} (\hat{y}_{test,i} - y_{test,i})^2$, where a lower $MSE_{test}$ indicates that the model generalises better to previously unseen inputs.

Convexity

  • Studying the convexity of a function means determining whether the function is convex, concave, or neither.
  • In mathematics, a real-valued function is called convex (concave) if the line segment between any two points on its graph lies above (below) the graph between those two points.
  • A function that is not convex is not necessarily concave.
  • Most functions are neither.
    • Examples of convex functions: $f(x) = x^2$ and $f(x) = \exp(x)$
    • Examples of concave functions: $f(x) = \sqrt{x}$ and $f(x) = \ln(x)$
    • Examples of functions that are neither: $f(x) = x^3$ and $f(x) = \cos x$
  • $MSE_{train}$ is a convex function, but not all ML optimisation problems are convex or concave.
  • The loss functions of deep neural networks are typically neither convex nor concave.
  • The convexity of $MSE_{train}$ therefore cannot be generalised to all ML problems.
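
The chord definition can be probed numerically. The sketch below checks one chord only, so it can show that a function is not convex but can never prove convexity; the helper name, the intervals and the tolerance are assumptions for illustration.

```python
import math


def violates_convexity(f, a, b, steps=100):
    """True if f rises above the chord between (a, f(a)) and (b, f(b)) somewhere."""
    for k in range(1, steps):
        lam = k / steps
        x = lam * a + (1 - lam) * b
        chord = lam * f(a) + (1 - lam) * f(b)
        if f(x) > chord + 1e-12:  # small tolerance for rounding error
            return True
    return False


square_ok = not violates_convexity(lambda x: x * x, -2.0, 3.0)
exp_ok = not violates_convexity(math.exp, -2.0, 3.0)
cube_violates = violates_convexity(lambda x: x ** 3, -2.0, 1.0)
```

On the interval from -2 to 1, the cubic lies above its chord, which is enough to conclude that it is not convex.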

MSEtrain Convex function vs all ML optimisation problems

  • Convexity makes the minimisation problem simpler from the point of view of optimisation (learning).
    • It allows us to conclude whether a local minimum (maximum) is a global minimum (maximum).
    • If the function is convex on $\mathbb{R}$, a local minimum is a global minimum; if it is concave, a local maximum is a global maximum.
  • A candidate point can be found by running gradient descent on $MSE_{train}$ or by finding where its gradient vanishes; without convexity, we would still need to show that this point is a local minimum.
  • In general, that is not easy.

Linear Regression

  • Central challenge in machine learning
    • You are working with data, and only one set of training data is available.
    • Training finds the weights by minimising $MSE_{train}$ on the training dataset.
  • Does a minimal $MSE_{train}$ guarantee a good (low) $MSE_{test}$?
  • On the test set
    • $MSE_{test}$ depends on the statistical properties of the training data.
    • It also depends on how well the data capture the phenomenon of interest.
  • Numerical approximation of the solution
    • The minimum that is found may only be a local minimum.
    • Which local minimum is found can depend on how the algorithm is implemented.

Training-Test and error

  • The ability to perform well on previously unobserved inputs is called generalisation.
  • As the model is trained (through numerical methods), two errors are tracked: the training error, which training minimises, and the test error.
  • The training error is minimised iteratively; it is key not to confuse training with learning.
  • Three factors are involved: a small training error, the data, and the test error.

Determining ML Training Capacity, Overfitting and Underfitting

  • Underfitting occurs when the model is not able to obtain a sufficiently low error value on the training set.
  • Capacity measures a model's ability to fit a wide variety of functions; the tendency to underfit or overfit is controlled by altering the capacity.
  • Models with low capacity may struggle to fit the training set, while models with high capacity can memorise properties of the training set that do not serve them well on the test set.
  • Capacity is determined by the set of functions that the learning algorithm can select as the solution.

Hypothesis space

  • The hypothesis space is the set of functions the learning algorithm is allowed to select as the solution.
  • For linear regression, the hypothesis space is the set of all linear functions of the input.
    • Polynomials generalise linear regression. For example, the linear model is $\hat{y} = b + w_1x$ (in augmented form, $x_0 = 1$ and $w_0 = b$).
    • Adding a term in $x^2$ gives the quadratic model $\hat{y} = b + w_1x + w_2x^2$.
    • The capacity of the quadratic model is larger because its hypothesis space is larger than before.
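
The way polynomial features enlarge the hypothesis space while keeping the model linear in its weights can be sketched as follows. The weights and the input value are hypothetical, and the helper names are assumptions for this example.

```python
def polynomial_features(x, degree):
    """Augmented feature vector (1, x, x^2, ..., x^degree) for a scalar input x."""
    return [x ** k for k in range(degree + 1)]


def predict(w, features):
    """y_hat = w^T x, linear in the weights even when the features are powers of x."""
    return sum(w_j * f_j for w_j, f_j in zip(w, features))


x = 2.0
linear = predict([1.0, 3.0], polynomial_features(x, 1))          # 1 + 3x = 7
quadratic = predict([1.0, 3.0, 0.5], polynomial_features(x, 2))  # 1 + 3x + 0.5x^2 = 9

# Setting w2 = 0 recovers the linear model, so the quadratic hypothesis
# space contains every function of the linear one.
recovered = predict([1.0, 3.0, 0.0], polynomial_features(x, 2))
```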

Bias/Variance

  • Empirically, trained models fall into one of the following cases:
    • a large training error with a small gap between training and testing errors,
    • a small training error with a high testing error,
    • an acceptably low training error with a small gap between training and testing errors.
  • High bias: the training error is high.
  • High variance: the gap between the training error and the testing error is high.
