Machine Learning Basics and Applications

Questions and Answers

What is an example of classification in machine learning?

  • Forecasting stock prices
  • Identifying a dog in an image (correct)
  • Calculating heart disease severity
  • Predicting temperature changes

Which of the following defines regression in machine learning?

  • Classifying documents into genres
  • Identifying skin problems
  • Categorizing emotions expressed in tweets
  • Predicting a continuously varying quantity (correct)

In the context of machine learning, what does the output of classification represent?

  • Continuous numerical values
  • A probability distribution
  • A ranked list of items
  • Distinct, predefined categories (correct)

How does the measurement of error differ between classification and regression?

  • Classification measures error in a discrete setting while regression uses a continuous approach.

What aspect do recommender systems focus on?

  • Suggesting a ranked list of items

What represents the scalar output in the linear regression model?

  • $y$

In the model $y^{(i)} = \beta^T x^{(i)} + \beta_0$, what does the term $\beta_0$ represent?

  • The intercept of the model

What must be defined to measure how wrong a linear regression model is?

  • The loss function

What geometric shape is represented in 2D in linear regression?

  • A line

What happens when one dimension is added to the input feature vector in linear regression?

  • It incorporates the intercept into the model

Which statement is true regarding the parameters $\beta$ in the linear regression model?

  • They are the coefficients affecting the output $y$

What is the key characteristic of the function fitted in linear regression?

  • It is linear

In a higher-dimensional context, what term is used for the shape that models the relationship in linear regression?

  • Hyperplane

What does the parameterization of the model represent?

  • The relationship between output and an input-dependent parameter.

What distribution do the ground truth labels y follow?

  • Gaussian distribution with unit standard deviation.

Which symbol represents the output of the model as a function of the input?

  • f

In the context of maximum likelihood estimation, which expression is minimized?

  • The squared difference between observed and predicted values.

What is the role of ε in the equation for y?

  • It represents noise drawn from a Gaussian distribution.

What is the standard deviation used in the Gaussian distribution described?

  • 1

Linear regression can be understood as maximum likelihood estimation under which condition?

  • If the label contains noise from a Gaussian distribution.

What does 𝜇* represent in the context of maximum likelihood estimation?

  • The parameter that maximizes the likelihood.

What mathematical operation is performed to derive the expression for maximum likelihood estimation?

  • Taking the logarithm of the likelihood.

Which of the following is part of the expression for the Gaussian probability P?

  • The exponential function of the squared differences.

What does the gradient direction indicate in relation to a function's value?

  • It identifies the direction along which the function value changes the fastest.

What happens to the function value along a level set?

  • The function value remains constant.

What is the relationship between the gradient of a differentiable function and the level set at a point?

  • The gradient is either zero or perpendicular to the level set.

What role does the learning rate (𝜂) play in gradient descent optimization?

  • It controls how much we move at each step during optimization.

What is a critical consideration when selecting a learning rate in gradient descent?

  • The learning rate needs to be small to ensure reliability of the gradient approximation.

Which statement accurately describes the loss surface in 2D?

  • It is represented by contour diagrams or level sets.

In the context of gradient descent on convex functions, what happens if the learning rate is too high?

  • The optimization may diverge and fail to reach a solution.

What characteristic of a level set is most emphasized in its definition?

  • It consists of points where the function value remains constant.

What does the formula for $L_{XE}$ represent in terms of probabilities?

  • The cross-entropy loss between two probability distributions.

How is the Monte Carlo estimation of an expectation formulated?

  • Using samples drawn from the probability distribution.

What does the term 'cross-entropy' refer to in the provided context?

  • The difference between the true distribution and estimated distribution.

What does the KL divergence measure in information theory?

  • How similar two probability distributions are.

In the equation $H(P, Q) = -E_{P(x)} \log Q(x)$, what does $H(P, Q)$ represent?

  • The cross-entropy between distributions.

What is the role of the indicator function $\delta$ in the context provided?

  • To indicate whether a condition is true or false.

What implication does minimizing the loss have with respect to the distributions $P$ and $Q$?

  • It minimizes the distance between the ground truth distribution and the estimated distribution.

What is required to approximate the integral in the expectation formula using Monte Carlo methods?

  • Sampling from the distribution $P(Y = y)$.

What does the approximation $E_P(f(y)) \approx -\log Q_Y(x_i, \beta)$ suggest about the relationship of expectations and probabilities?

  • There is a connection between log probabilities and expectations.

Study Notes

Machine Learning Basics

• Machine learning approximates a function $y = f(x)$ that maps an input $x$ to an output $y$.
• Example: Image classification, where pixel values ($x$) are used to identify categories ($y$) such as dog, cat, truck, or airplane.
• Example: Tweet emotion recognition, where the text of a tweet ($x$) determines the associated emotion ($y$), such as fear, anger, joy, or sadness.

Classification vs. Regression

• Classification: The output ($y$) is discrete and represents distinct categories, for example, image or emotion categories.

MNIST Classification

• Handwritten digit classification.

Classification for Skin Problems

• Image classification for identifying skin problems.

Classification vs. Regression

• Regression: The output ($y$) is a continuously varying quantity, typically a real number, for example, a stock price or heart disease severity.
• Key difference between classification and regression: how error is measured. Classification measures error in a discrete setting, while regression measures it on a continuous scale.
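To make the difference in error measurement concrete, here is a minimal sketch. The two error measures used (misclassification rate and mean squared error) and all example values are assumptions chosen for illustration; the notes above do not name a specific measure.

```python
def misclassification_rate(y_true, y_pred):
    """Discrete error: fraction of predictions whose category is wrong."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Continuous error: average squared distance to the true value."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Classification: a prediction is either right or wrong (illustrative labels).
labels_true = ["dog", "cat", "truck", "dog"]
labels_pred = ["dog", "dog", "truck", "dog"]
class_error = misclassification_rate(labels_true, labels_pred)

# Regression: the size of the miss matters (illustrative prices).
prices_true = [10.0, 12.0, 11.0, 13.0]
prices_pred = [10.5, 11.5, 11.0, 14.0]
reg_error = mean_squared_error(prices_true, prices_pred)

print(class_error, reg_error)
```

In the classification case every wrong label counts the same, while in the regression case a prediction that is off by 1.0 contributes four times as much as one that is off by 0.5.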

Ranking

• Recommender systems output a ranked list of recommended items; the order of the items matters.

Linear Regression

• A p-dimensional feature vector $\mathbf{x} = (x_1, x_2, \ldots, x_p)$ and a scalar output $y$ are related by a linear model: $y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \beta_0$.
• In vector form: $y = \boldsymbol{\beta}^\top \mathbf{x} + \beta_0$.
• With $n$ data points $(\mathbf{x}^{(1)}, y^{(1)}), (\mathbf{x}^{(2)}, y^{(2)}), \ldots, (\mathbf{x}^{(n)}, y^{(n)})$, the same model is assumed to hold for every point: $y^{(i)} = \boldsymbol{\beta}^\top \mathbf{x}^{(i)} + \beta_0$.
• The goal of learning is to determine the parameters $\boldsymbol{\beta}$ and $\beta_0$.

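One Small Tweak …

• Appending a constant 1 to $\mathbf{x}$ and appending $\beta_0$ to $\boldsymbol{\beta}$ folds the intercept into the parameter vector, so the model can be written as $y^{(i)} = \boldsymbol{\beta}^\top \mathbf{x}^{(i)}$.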
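The tweak can be checked numerically. The sketch below is only illustrative: the function names and the coefficient values are assumptions, not part of the original notes. It evaluates the model once with an explicit intercept and once with the intercept folded into an augmented parameter vector.

```python
def predict(beta, beta0, x):
    """Linear model with an explicit intercept: beta^T x + beta0."""
    return sum(b * xi for b, xi in zip(beta, x)) + beta0


def predict_augmented(beta_aug, x):
    """Same model after appending a constant 1 to x.

    beta_aug is beta with the intercept beta0 appended as its last entry.
    """
    x_aug = list(x) + [1.0]
    return sum(b * xi for b, xi in zip(beta_aug, x_aug))


beta = [2.0, -1.0]   # illustrative coefficients
beta0 = 0.5          # illustrative intercept
x = [3.0, 4.0]       # illustrative feature vector

y_plain = predict(beta, beta0, x)
y_aug = predict_augmented(beta + [beta0], x)
print(y_plain, y_aug)
```

Both calls give the same prediction, which is why the intercept can be dropped from the notation without changing the model.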

Geometric Intuition

• The goal is to find the hyperplane that is closest to the observed data points.
• Key point: The fitted function is linear. A unit change in $x_i$ always changes $y$ by $\beta_i$, regardless of the values of $\mathbf{x}$ or $y$.
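A small sketch of the key point, with hypothetical coefficients and inputs: increasing one feature by 1 changes the prediction by the matching coefficient, no matter where the starting point is.

```python
def predict(beta, beta0, x):
    """Linear model: beta^T x + beta0."""
    return sum(b * xi for b, xi in zip(beta, x)) + beta0


def effect_of_unit_change(beta, beta0, x, i):
    """Change in the prediction when feature i of x is increased by 1."""
    x_shifted = list(x)
    x_shifted[i] += 1.0
    return predict(beta, beta0, x_shifted) - predict(beta, beta0, x)


beta = [2.0, -1.0]   # illustrative coefficients
beta0 = 0.5          # illustrative intercept

# Two very different starting points, same change in the first feature.
effect_a = effect_of_unit_change(beta, beta0, [3.0, 4.0], 0)
effect_b = effect_of_unit_change(beta, beta0, [10.0, -7.0], 0)
print(effect_a, effect_b)  # both equal beta[0]
```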

The Loss Function

• The loss function defines the error of the model, that is, how wrong its predictions are.

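Probabilistic Perspective

• The model is parameterized by $\boldsymbol{\beta}$ and takes input $\mathbf{x}$, producing an output $f_{\boldsymbol{\beta}}(\mathbf{x})$.
• $f_{\boldsymbol{\beta}}(\mathbf{x})$ becomes the (input-dependent) parameter $\mu$ of a Gaussian distribution with a standard deviation of 1 ($\sigma = 1$).
• The ground truth $y^{(i)}$ is drawn from this distribution: $y^{(i)} \sim \mathcal{N}(f_{\boldsymbol{\beta}}(\mathbf{x}^{(i)}), 1)$.
• Equivalently: $y^{(i)} = f_{\boldsymbol{\beta}}(\mathbf{x}^{(i)}) + \epsilon$, $\epsilon \sim \mathcal{N}(0, 1)$.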
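The generative view can be simulated directly. This is a minimal sketch under assumed values: the linear form of f, the coefficients, the input, and the random seed are all chosen for illustration. Labels are produced as the model output plus unit-variance Gaussian noise, and their sample mean and variance are compared with the assumed distribution.

```python
import random


def f(beta, x):
    """Model output, here assumed to be linear: beta^T x."""
    return sum(b * xi for b, xi in zip(beta, x))


def sample_label(beta, x, rng):
    """Draw y = f_beta(x) + eps with eps ~ N(0, 1)."""
    return f(beta, x) + rng.gauss(0.0, 1.0)


rng = random.Random(0)        # fixed seed so the run is repeatable
beta = [1.5, -0.5]            # illustrative parameters
x = [2.0, 1.0]                # illustrative input, f_beta(x) = 2.5

samples = [sample_label(beta, x, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, variance)  # close to f_beta(x) and to 1, respectively
```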

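Maximum Likelihood for Gaussian

• Given data $y^{(1)}, \ldots, y^{(N)}$.

• The Gaussian probability is: $P(y \mid \mu, \sigma) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \mu)^2}{2\sigma^2}\right)$

• Taking the log and removing unrelated terms: $\mu^* = \arg\max_{\mu} \log \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \mu)^2}{2\sigma^2}\right) = \arg\max_{\mu} \sum_{i=1}^{N} -\frac{(y^{(i)} - \mu)^2}{2\sigma^2}$, and therefore $\mu^* = \arg\min_{\mu} \sum_{i=1}^{N} (y^{(i)} - \mu)^2$.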
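As a quick numerical check of this result, the sketch below uses made-up data and a coarse grid search (both assumptions for illustration) to confirm that the value of the mean parameter minimizing the sum of squared differences coincides with the sample mean.

```python
def sum_squared_error(ys, mu):
    """Sum over the data of (y_i - mu)^2, the quantity minimized by the MLE."""
    return sum((y - mu) ** 2 for y in ys)


ys = [1.0, 2.0, 4.0, 5.0]          # illustrative observations

# Closed-form minimizer of the sum of squares: the sample mean.
mu_star = sum(ys) / len(ys)

# Brute-force check over a grid of candidate values in [0, 6].
grid = [i / 100 for i in range(0, 601)]
mu_grid = min(grid, key=lambda m: sum_squared_error(ys, m))

print(mu_star, mu_grid)
```

The grid search is only there to verify the claim; in practice the sample mean is computed directly.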

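Plugging in …

• $\mu^{(i)} = \hat{y}^{(i)} = f_{\boldsymbol{\beta}}(\mathbf{x}^{(i)})$
• $\boldsymbol{\beta}^* = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{N} \left(y^{(i)} - f_{\boldsymbol{\beta}}(\mathbf{x}^{(i)})\right)^2$
• Linear regression can therefore be understood as maximum likelihood estimation (MLE) under the assumption that the label contains noise from a Gaussian distribution.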
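For a single input feature, the least-squares minimizer has a well-known closed form. The sketch below is a standard textbook formula applied to invented, noise-free data; it is not taken from the original notes, and the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for 1D inputs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Illustrative data generated from y = 2x + 1 without noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With noisy labels the recovered slope and intercept would only approximate the generating values, but they would still minimize the sum of squared differences.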

Cross Entropy

• $L_{XE} = -\sum_{i=1}^{N} \left[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]$
• $H(P, Q) = E_P[-\log Q(X)]$
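A minimal sketch of the binary cross-entropy loss, summed over the data points as in the formula above. The clipping constant and the example predictions are assumptions added for illustration; clipping only guards against taking the logarithm of zero.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Sum of -[y log(p) + (1 - y) log(1 - p)] over all data points."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # keep p strictly inside (0, 1)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


y_true = [1, 0, 1]                 # illustrative binary labels
confident = [0.9, 0.1, 0.8]        # predictions close to the labels
poor = [0.4, 0.6, 0.3]             # predictions far from the labels

loss_good = binary_cross_entropy(y_true, confident)
loss_bad = binary_cross_entropy(y_true, poor)
print(loss_good, loss_bad)
```

Predictions that put high probability on the true label give a smaller loss than predictions that put it on the wrong label.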

Monte Carlo Expectation

• $E_P[f(y)] = \int f(y)\, P(Y = y)\, dy$
• The integral can be approximated by drawing $y^{(1)}, \ldots, y^{(K)}$ from $P(Y)$: $E_P[f(y)] \approx \frac{1}{K} \sum_{i=1}^{K} f(y^{(i)})$
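The approximation can be sketched in a few lines. The choice of distribution (standard normal), the function f(y) = y², the sample count, and the seed are all assumptions made for illustration; for this choice the exact expectation is 1.

```python
import random


def monte_carlo_expectation(f, sampler, k):
    """Approximate E_P[f(y)] by averaging f over k samples drawn from P."""
    return sum(f(sampler()) for _ in range(k)) / k


rng = random.Random(0)   # fixed seed so the run is repeatable

# E[y^2] for y ~ N(0, 1) is the variance, which equals 1.
estimate = monte_carlo_expectation(
    lambda y: y * y,
    lambda: rng.gauss(0.0, 1.0),
    50000,
)
print(estimate)
```

Increasing the number of samples K generally tightens the estimate around the true expectation.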

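Why the Name?

• Cross entropy: $H(P, Q) = E_P[-\log Q(X)]$
• The loss is $-\sum_{i=1}^{N} \left[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]$.
• $y^{(i)}$ is drawn from an unknown distribution $P(Y \mid X)$.
• $\hat{y}^{(i)}$ is the probability $Q(Y = 1 \mid \mathbf{x}^{(i)}, \boldsymbol{\beta})$, so the first term is $-\delta(y^{(i)} = 1) \log Q(Y = 1 \mid \mathbf{x}^{(i)}, \boldsymbol{\beta})$.
• $1 - \hat{y}^{(i)}$ is the probability $Q(Y = 0 \mid \mathbf{x}^{(i)}, \boldsymbol{\beta})$, so the second term is $-\delta(y^{(i)} = 0) \log Q(Y = 0 \mid \mathbf{x}^{(i)}, \boldsymbol{\beta})$.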

Information Theoretical Perspective

• The cross-entropy is related to the KL divergence: $H(P, Q) = -E_{P(x)}[\log Q(x)] = H(P) + KL(P \| Q)$
• Minimizing the loss minimizes the distance between the ground-truth distribution $P(y^{(i)} \mid \mathbf{x}^{(i)})$ and the estimated distribution $Q(y^{(i)} \mid \mathbf{x}^{(i)}, \boldsymbol{\beta})$.
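The identity relating cross-entropy, entropy, and KL divergence can be verified numerically for discrete distributions. The two distributions below are invented for illustration, and natural logarithms are assumed.

```python
import math


def entropy(p):
    """H(P) = -sum_x P(x) log P(x)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) log Q(x)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(P || Q) = sum_x P(x) log(P(x) / Q(x))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.7, 0.2, 0.1]   # illustrative "ground truth" distribution
q = [0.5, 0.3, 0.2]   # illustrative estimated distribution

lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)
print(lhs, rhs)
```

Because H(P) does not depend on Q, lowering the cross-entropy with respect to Q lowers the KL divergence by the same amount.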

2D Functions

• Loss surface in 2D = contour diagrams / level sets: $L_a(f) = \{\mathbf{x} \mid f(\mathbf{x}) = a\}$

• The gradient direction is the direction along which the function value changes the fastest (for a small change of $\mathbf{x}$ in Euclidean norm).

• Along the level set, the function value doesn’t change.

• For a differentiable function $f(\mathbf{x})$, its gradient at any point is either zero or perpendicular to the level set at that point.
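These properties can be illustrated with a simple example function. The choice f(x, y) = x² + y², whose level sets are circles, and the evaluation point (3, 4) are assumptions made for the sketch.

```python
import math


def f(x, y):
    """Example function whose level sets are circles around the origin."""
    return x * x + y * y


def grad_f(x, y):
    """Analytic gradient of f."""
    return (2.0 * x, 2.0 * y)


px, py = 3.0, 4.0                      # point on the level set f = 25
gx, gy = grad_f(px, py)
norm = math.hypot(gx, gy)
ux, uy = gx / norm, gy / norm          # unit vector along the gradient
tx, ty = -uy, ux                       # unit tangent to the circle at (px, py)

# The gradient is perpendicular to the level set: the dot product vanishes.
dot = gx * tx + gy * ty

# A small step along the gradient changes f much more than the same
# step along the level set.
h = 1e-3
change_along_gradient = f(px + h * ux, py + h * uy) - f(px, py)
change_along_level_set = f(px + h * tx, py + h * ty) - f(px, py)
print(dot, change_along_gradient, change_along_level_set)
```

The change along the tangent is not exactly zero because a straight step leaves the curved level set slightly, but it shrinks with the square of the step size.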

Gradient Descent on Convex Functions

• $\boldsymbol{\beta}_t = \boldsymbol{\beta}_{t-1} - \eta \left. \frac{\mathrm{d} f(x, \boldsymbol{\beta})}{\mathrm{d} \boldsymbol{\beta}} \right|_{\boldsymbol{\beta}_{t-1}}$
• The learning rate $\eta$ determines how far we move at each step. We cannot move too far because the gradient is only a local approximation of the function. Thus, the learning rate is usually small.
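The update rule and the effect of the learning rate can be sketched on a one-dimensional convex function. The function f(β) = (β − 3)², the starting point, the step counts, and the two learning rates are assumptions chosen for illustration.

```python
def gradient_descent(grad, beta_init, eta, steps):
    """Repeat beta_t = beta_{t-1} - eta * grad(beta_{t-1})."""
    beta = beta_init
    for _ in range(steps):
        beta = beta - eta * grad(beta)
    return beta


def grad(beta):
    """Gradient of the convex function f(beta) = (beta - 3)^2."""
    return 2.0 * (beta - 3.0)


# A small learning rate converges toward the minimizer beta = 3.
beta_small = gradient_descent(grad, 0.0, 0.1, 100)

# A learning rate that is too large overshoots more on every step and diverges.
beta_large = gradient_descent(grad, 0.0, 1.1, 100)

print(beta_small, beta_large)
```

For this function each step multiplies the distance to the minimizer by |1 − 2η|, so the iteration shrinks the error when that factor is below 1 and blows up when it exceeds 1.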

Related Documents

Lecture 3 ML Foundations PDF

Description

This quiz covers fundamental concepts of machine learning, including the distinction between classification and regression. It explains practical applications such as image classification, tweet emotion recognition, and recommender systems. Test your understanding of these key topics in machine learning.
