Machine Learning Concepts Quiz
47 Questions

Questions and Answers

What is a consequence of using a polynomial with a high degree in regression?

  • It will simplify the model.
  • It will always underfit the data.
  • It reduces the likelihood of overfitting.
  • It can fit the training data exactly, which raises the risk of overfitting. (correct)
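
To illustrate the exact-fit claim, here is a minimal sketch using Lagrange interpolation: with n distinct inputs, a polynomial of degree n − 1 passes through every training point, whatever the labels are. The data and the helper name `lagrange_fit` are made up for this illustration.

```python
from fractions import Fraction

def lagrange_fit(xs, ys):
    """Return a function that evaluates the degree-(n-1) interpolating polynomial."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]

    def p(x):
        x = Fraction(x)
        total = Fraction(0)
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)  # basis polynomial factor
            total += term
        return total

    return p

# Hypothetical training set with arbitrary (noisy-looking) labels.
xs = [0, 1, 2, 3, 4]
ys = [1, -2, 7, 0, 3]
p = lagrange_fit(xs, ys)

# Exact rational arithmetic: every training residual is exactly zero.
train_residuals = [p(x) - y for x, y in zip(xs, ys)]
```

Zero training error here says nothing about behavior between or beyond the training points, which is where a high-degree fit tends to oscillate.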

Under which condition (with n sample points and d features) is the dual form of ridge regression recommended?

  • d > n (correct)
  • d < n
  • d = 0
  • d = n
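
The reason d > n favors the dual can be sketched as follows: the primal solves a d × d linear system, the dual solves an n × n system, and both give the same weights. This is a small pure-Python illustration; the data, λ value, and helper names are assumptions made for the example.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def add_ridge(A, lam):
    """Return A + lam * I for a square matrix A."""
    n = len(A)
    return [[A[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Hypothetical data with n = 2 sample points and d = 4 features (d > n).
X = [[1.0, 2.0, 0.0, -1.0],
     [0.0, 1.0, 3.0, 2.0]]
y = [1.0, -1.0]
lam = 0.5
Xt = transpose(X)

# Primal: solve the d x d system (X^T X + lam I) w = X^T y.
w_primal = solve(add_ridge(matmul(Xt, X), lam), matvec(Xt, y))

# Dual: solve the n x n system (X X^T + lam I) a = y, then w = X^T a.
a = solve(add_ridge(matmul(X, Xt), lam), y)
w_dual = matvec(Xt, a)
```

Here the dual system is 2 × 2 while the primal system is 4 × 4; the gap grows with d, and a kernel can replace the entries of X Xᵀ.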

What is a necessary property of a kernel function k(x, z) = Φ(x) · Φ(z)?

  • It must be a linear function.
  • It must be positive semidefinite. (correct)
  • It must be negative definite.
  • It must be symmetric only.
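
As a rough numerical spot-check of this property (not a proof), the sketch below builds a kernel matrix for a polynomial kernel on random points and verifies that it is symmetric and that vᵀKv is nonnegative for many random vectors v. The kernel degree, the points, and the function names are assumptions for illustration.

```python
import random

def poly_kernel(x, z, p=2):
    """Polynomial kernel (x . z + 1)^p."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** p

def kernel_matrix(points, k):
    return [[k(x, z) for z in points] for x in points]

def quad_form(K, v):
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))

random.seed(0)
points = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
K = kernel_matrix(points, poly_kernel)

# Symmetry: k(x, z) = k(z, x).
is_symmetric = all(K[i][j] == K[j][i] for i in range(6) for j in range(6))

# Positive semidefiniteness spot-check: v^T K v >= 0 for sampled v.
min_quad = min(
    quad_form(K, [random.gauss(0, 1) for _ in range(6)]) for _ in range(200)
)
```

Symmetry alone is not enough; a symmetric matrix with a negative eigenvalue cannot be written as a matrix of inner products Φ(x) · Φ(z).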

    What happens if the primal perceptron algorithm terminates?

    The kernel perceptron algorithm might or might not terminate.

    When using a matrix M for projection in the kernel perceptron, which of the following statements is true?

    Raw data may be linearly separable, but projected data might not be.

    Which type of splits does a decision tree perform?

    Binary splits to maximize information gain.
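
A minimal sketch of how such a split could be chosen for one numeric feature: compute the entropy before the split, the weighted entropy after it, and pick the threshold with the largest difference. The data and function names are hypothetical, and real decision-tree learners search over all features.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from the binary split value < threshold vs. value >= threshold."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - after

def best_split(values, labels):
    """Try midpoints between consecutive distinct values; return the best threshold."""
    pts = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))

# Hypothetical one-feature training set with classes C and D.
values = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = ["C", "C", "C", "D", "D", "C"]
threshold = best_split(values, labels)
gain = information_gain(values, labels, threshold)
```

On this toy data the chosen threshold is 4.5, which isolates a pure group of three C points on the left.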

    What characterizes the kernel matrix K when using the feature map $\Phi(X) = XM^T$?

    It is always symmetric.

    What is the implication of using a dual algorithm with a high-dimensional feature set?

    It increases the risk of overfitting.

    What is the risk of a classification rule r(X) when P(X) is not known?

    The risk cannot be computed without knowing the data distribution.

    If P(X = x) changes but P(Y = y|X = x) remains the same, what can be concluded about r(X)?

    r(X) will minimize the risk regardless of the change in P(X).

    Under what conditions can LDA and QDA classifiers be considered identical?

    When the class covariance estimates Σ̂C and Σ̂D are equal.

    What statement is true regarding the posterior probability P(Y = C|X = x) if LDA and QDA classifiers are identical?

    It is a logistic (sigmoid) function of a linear function of x.

    What can be inferred about the QDA decision function if the covariance matrices are different?

    The decision function becomes quadratic in x, so the decision boundary is generally nonlinear.
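
The LDA/QDA answers above can be checked against the standard Gaussian discriminant function. In the sketch below, the priors $\hat\pi_C$ and $\hat\pi_D$ and the name $Q$ for the discriminant are notation assumed for this illustration; they do not appear in the quiz.

```latex
\begin{align*}
Q_C(x) &= -\tfrac{1}{2}(x-\hat\mu_C)^T \hat\Sigma_C^{-1}(x-\hat\mu_C)
          - \tfrac{1}{2}\ln|\hat\Sigma_C| + \ln\hat\pi_C, \\
Q_C(x) - Q_D(x) &= -\tfrac{1}{2}\, x^T\bigl(\hat\Sigma_C^{-1} - \hat\Sigma_D^{-1}\bigr)x
          + \bigl(\hat\Sigma_C^{-1}\hat\mu_C - \hat\Sigma_D^{-1}\hat\mu_D\bigr)^T x
          + \text{const}, \\
P(Y = C \mid X = x) &= \frac{1}{1 + e^{-(Q_C(x) - Q_D(x))}}.
\end{align*}
```

The quadratic term vanishes exactly when $\hat\Sigma_C^{-1} = \hat\Sigma_D^{-1}$, leaving a linear decision function inside the logistic. With $\hat\Sigma_C = I$ and $\hat\Sigma_D = 5I$ the quadratic term is $-\tfrac{2}{5}\|x\|^2$, so the QDA boundary is not a hyperplane.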

    Which kernel is used in dual ridge regression as described?

    $k(X_i, X_j) = (X_i^T X_j + 1)^p$

    In dual ridge regression with the polynomial kernel, how does regularization (λ > 0) affect the model?

    It reduces the risk of overfitting.

    What can be concluded if Σ̂C = I (the identity matrix) and Σ̂D = 5I?

    The LDA and QDA classifiers will yield different decision boundaries.

    What does a hard-margin SVM require about the data for it to create a decision boundary?

    The data must be linearly separable.

    In a soft-margin SVM, what role does the hyperparameter C play?

    It balances maximizing the margin against minimizing training classification error.

    Which of the following characterizes the decision boundary learned by Linear Discriminant Analysis (LDA)?

    It is always linear in the features (a hyperplane).

    What is the relationship between a kernel function and a feature map in kernel PCA?

    The kernel represents inner products of the feature map's transformations.

    Which statement is true regarding the margin in SVM classification?

    Soft-margin SVM allows for wider margins than hard-margin SVM.

    What is indicated by a nonzero eigenvalue in the context of principal component analysis?

    The direction is significant and reflects variance in the data.

    Kernel Principal Components Analysis primarily differs from standard PCA in what aspect?

    It uses a feature map to enable non-linear transformations.

    In the equation $X^TXw = \lambda w$, what does the symbol $\lambda$ represent?

    A scalar eigenvalue associated with the eigenvector $w$.
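
To make the eigenvalue equation concrete, here is a small sketch that finds the largest eigenvalue λ of XᵀX and its eigenvector w by power iteration, then checks that XᵀXw − λw is essentially zero. The centered data matrix and the helper names are assumptions for illustration; power iteration is only one of several ways to compute this.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def top_eigenpair(A, iters=500):
    """Power iteration: repeatedly apply A and renormalize."""
    w = [1.0] * len(A)
    for _ in range(iters):
        v = matvec(A, w)
        norm = sum(c * c for c in v) ** 0.5
        w = [c / norm for c in v]
    # Rayleigh quotient w^T A w gives the eigenvalue for unit-length w.
    lam = sum(wi * ci for wi, ci in zip(w, matvec(A, w)))
    return lam, w

# Hypothetical centered design matrix (rows are sample points).
X = [[2.0, 1.0], [0.0, 1.0], [-2.0, -1.0], [0.0, -1.0]]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]

lam, w = top_eigenpair(XtX)
residual = max(abs(c - lam * wi) for c, wi in zip(matvec(XtX, w), w))
```

For this X, XᵀX = [[8, 4], [4, 4]], whose largest eigenvalue is 6 + √20; w is the first principal component direction.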

    What is the role of the number k in clustering algorithms?

    It represents the number of clusters to form.

    Which statement about complete linkage in clustering is accurate?

    Complete linkage can work with any distance function.

    How does single linkage clustering differ from complete linkage?

    Single linkage is more sensitive to outliers than complete linkage.
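
The two linkage rules differ only in how they turn pairwise point distances into a cluster distance: single linkage takes the minimum, complete linkage the maximum. A minimal sketch, with made-up clusters and a pluggable distance function (both rules accept any distance):

```python
from math import dist  # Euclidean distance, Python 3.8+

def single_linkage(A, B, d=dist):
    """Distance between the closest pair of points across the two clusters."""
    return min(d(a, b) for a in A for b in B)

def complete_linkage(A, B, d=dist):
    """Distance between the farthest pair of points across the two clusters."""
    return max(d(a, b) for a in A for b in B)

def manhattan(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))

# Hypothetical clusters.
A = [(0.0, 0.0), (1.0, 1.0)]
B = [(4.0, 5.0), (9.0, 1.0)]

single_euclid = single_linkage(A, B)          # closest pair: (1, 1) and (4, 5)
complete_euclid = complete_linkage(A, B)      # farthest pair: (0, 0) and (9, 1)
single_manhattan = single_linkage(A, B, manhattan)
complete_manhattan = complete_linkage(A, B, manhattan)
```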

    What characterizes the Fiedler vector in spectral clustering?

    It corresponds to the second smallest eigenvalue of the Laplacian matrix.

    What does the relaxed optimization problem for partitioning a graph involve?

    Minimizing the Rayleigh quotient of the Laplacian matrix and an indicator vector.

    Which statement is true regarding the Laplacian matrix in spectral clustering?

    The Laplacian matrix is never invertible because the all-ones vector 1 is always in its nullspace.
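
A short sketch of why the all-ones vector lies in the nullspace: each row of an unweighted graph Laplacian holds the vertex degree on the diagonal and −1 for each neighbor, so every row sums to zero. The graph below is a hypothetical example, not the graph from the quiz. The sketch also checks the identity yᵀLy = 4 · (number of cut edges) for a ±1 vector y.

```python
def laplacian(n, edges):
    """Unweighted graph Laplacian: degree on the diagonal, -1 per edge off it."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L

def quad_form(L, y):
    n = len(y)
    return sum(y[i] * L[i][j] * y[j] for i in range(n) for j in range(n))

# Hypothetical 4-vertex graph.
edges = [(0, 1), (1, 2), (1, 3), (2, 3)]
L = laplacian(4, edges)

# L times the all-ones vector is just the vector of row sums.
L_times_ones = [sum(row) for row in L]
is_symmetric = all(L[i][j] == L[j][i] for i in range(4) for j in range(4))

# For y in {+1, -1}^n, y^T L y sums (y_i - y_j)^2 over edges, i.e. 4 per cut edge.
y = [1, 1, -1, -1]
cut = sum(1 for i, j in edges if y[i] != y[j])
quad = quad_form(L, y)
```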

    What is a feature of AdaBoost concerning decision trees?

    The metalearner in AdaBoost gives a classification while estimating posterior probabilities.

    Which statement is false regarding the application of AdaBoost?

    AdaBoost is limited to specific types of classifiers.

    What is the inner product representation of the matrix K?

    $K = \Phi(X)\Phi(X)^T$, where the rows of $\Phi(X)$ are the featurized sample points.

    Which of the following correctly represents the first principal component direction?

    The unit vector $w$ that maximizes $w^T \Phi(X)^T \Phi(X) w$.

    How are the matrices B and C defined in the generalized Rayleigh quotient?

    $B = K^2$, $C = K$

    What effect does the balance constraint have on the indicator vector y in the minimum bisection problem?

    It requires that the sum of the elements equals zero.

    Which of the following matrices represents the Laplacian matrix L for the given graph?

    L = [[2, -1, 0, 0], [-1, 3, -1, -1], [0, -1, 3, -1], [0, 0, -1, 2]]

    What is the strict binary constraint in the context of the minimum bisection problem?

    Each element of y must be either 1 or -1.
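
The binary constraint and the balance constraint together define the (unrelaxed) minimum bisection problem. As a rough sketch, it can be solved by brute force on a tiny graph: enumerate every y with entries ±1, keep those whose entries sum to zero, and pick the one that cuts the fewest edges. The graph and function names are hypothetical, and this enumeration is exponential in n, which is what motivates the relaxation.

```python
from itertools import product

def cut_size(edges, y):
    """Number of edges whose endpoints get different signs."""
    return sum(1 for i, j in edges if y[i] != y[j])

def min_bisection(n, edges):
    best = None
    for y in product((1, -1), repeat=n):   # strict binary constraint
        if sum(y) != 0:                    # balance constraint
            continue
        c = cut_size(edges, y)
        if best is None or c < best[0]:
            best = (c, y)
    return best

# Hypothetical graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
best_cut, best_y = min_bisection(6, edges)
```

On this graph the best balanced partition separates the two triangles and cuts only the connecting edge.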

    Which expression is maximized in the generalized Rayleigh quotient formulation?

    $a^T B a / (a^T C a)$, the ratio whose numerator is $a^T B a$.

    What is the expression for $\nabla_{\mu_1} \ell$ based on the provided content?

    $\frac{1}{n} \times (X_i - \theta) \tau_i$

    Which of the following correctly represents $\nabla_{\mu_2} \ell$?

    $\frac{1}{n} (1 - \tau_i)(X_i - \mu_2)$

    What is the objective of the k-means-like algorithm described?

    To estimate parameters µ1, µ2, and θ using fixed τi values.

    What does the notation τi represent in the algorithm?

    The membership indicator for data points.

    What parameter values must be initialized in the k-means-like algorithm?

    τ1, τ2, ..., τn, which are set to arbitrary values.

    What is the relationship between the parameters µ1, µ2, and θ?

    They can be derived from the τi values.

    Which method is used to update Gaussian cluster parameters in the algorithm?

    By fixing the τi and computing new means.

    What is the purpose of the repeated steps in the k-means-like algorithm?

    To converge the τi values towards a maximum likelihood.
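
The alternation described in the last few answers can be sketched roughly as follows for one-dimensional data and two clusters. Several details here are assumptions, since the quiz does not spell them out: θ is treated as the fraction of points in cluster 1, the means are the τ-weighted averages (the points where the gradients with respect to µ1 and µ2 vanish), and the membership update is a hard nearest-mean assignment in the style of k-means.

```python
def update_parameters(X, tau):
    """With the tau_i fixed, compute mu1, mu2 and theta."""
    n1 = sum(tau)
    n2 = len(X) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("one cluster is empty; re-initialize tau")
    mu1 = sum(t * x for t, x in zip(tau, X)) / n1
    mu2 = sum((1 - t) * x for t, x in zip(tau, X)) / n2
    theta = n1 / len(X)  # assumed meaning: share of points in cluster 1
    return mu1, mu2, theta

def update_memberships(X, mu1, mu2):
    """Assumed hard assignment: tau_i = 1 if X_i is closer to mu1, else 0."""
    return [1 if abs(x - mu1) <= abs(x - mu2) else 0 for x in X]

def fit(X, tau, max_iters=100):
    for _ in range(max_iters):
        mu1, mu2, theta = update_parameters(X, tau)
        new_tau = update_memberships(X, mu1, mu2)
        if new_tau == tau:  # memberships stopped changing
            break
        tau = new_tau
    return mu1, mu2, theta, tau

# Hypothetical data with two well-separated groups and an arbitrary start.
X = [0.9, 1.1, 1.0, 5.0, 5.2, 4.8]
tau0 = [1, 0, 1, 0, 1, 0]
mu1, mu2, theta, tau = fit(X, tau0)
```

Starting from the arbitrary memberships `tau0`, the loop settles after two parameter updates with the first three points in cluster 1 and the last three in cluster 2.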

    Study Notes

    Exam Instructions

    • Do not open the exam before instructed.
    • Electronic devices (phones, iPods, headphones, laptops) are prohibited.
    • Ensure all 12 pages and 6 questions are present.
    • Write initials at top right of each page after the first.
    • Exam is closed-book and closed-notes, except for two cheat sheets.
    • Exam duration: 3 hours.
    • Answer on exam paper only.
    • Total points: 150.
    • Multiple choice questions (26): 3 points each.
    • Written questions (5): 72 points total.
    • Multiple answer questions: all correct choices must be marked.
    • No partial credit for multiple answer questions.

    Exam Details

    • Course: Introduction to Machine Learning.
    • Term: Spring 2019.
    • Exam type: final exam.

    Description

    Test your understanding of critical concepts in machine learning, including regression techniques, kernel functions, and decision trees. This quiz covers theoretical aspects and practical implications of using various algorithms in different scenarios.