Machine Learning Concepts Quiz
47 Questions

Questions and Answers

What is a consequence of using a polynomial with a high degree in regression?

  • It will simplify the model.
  • It will always underfit the data.
  • It reduces the likelihood of overfitting.
  • It will fit the data exactly. (correct)
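
For intuition on the answer above: a degree-(n − 1) polynomial has n coefficients, so it can interpolate n training points exactly, noise and all. A minimal NumPy sketch (the data and degree here are illustrative):

```python
import numpy as np

# 6 noisy sample points; a degree-5 polynomial has 6 coefficients,
# so it can pass through all of them exactly.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 6)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

coeffs = np.polyfit(x, y, deg=5)         # exact interpolation
residuals = y - np.polyval(coeffs, x)
print(np.max(np.abs(residuals)))         # effectively zero: exact fit
```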

Which condition is necessary for the dual method of ridge regression to be recommended?

  • d > n (correct)
  • d < n
  • d = 0
  • d = n
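
Why d > n favors the dual: the primal solves a d × d linear system while the dual solves an n × n system, and the two give identical solutions (a standard matrix identity). A small NumPy check with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 20, 500, 0.1             # d > n: the dual system is much smaller
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Primal: solve a (d x d) system for the weight vector w.
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual: solve an (n x n) system for coefficients a, then w = X^T a.
a = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
w_dual = X.T @ a

print(np.allclose(w, w_dual))        # True: identical solutions
```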

What is a necessary feature of the kernel function k(x, z) = Φ(x) · Φ(z)?

  • It must be a linear function.
  • It must be positive semidefinite. (correct)
  • It must be negative definite.
  • It must be symmetric only.
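
Positive semidefiniteness means every Gram matrix the kernel produces is symmetric with non-negative eigenvalues. A quick numerical check, using a degree-3 polynomial kernel as an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))

# Gram matrix for the polynomial kernel k(x, z) = (x . z + 1)^3
K = (X @ X.T + 1.0) ** 3

eigvals = np.linalg.eigvalsh(K)      # K is symmetric, so eigvalsh applies
print(eigvals.min() >= -1e-8)        # True: eigenvalues are (numerically) >= 0
```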

What happens if the primal perceptron algorithm terminates?

  • The kernel perceptron algorithm might or might not terminate. (correct)

When using a matrix M for projection in the kernel perceptron, which of the following statements is true?

  • Raw data may be linearly separable, but projected data might not be. (correct)

Which type of splits does a decision tree perform?

  • Binary splits to maximize information gain. (correct)

What characterizes the kernel matrix K when using the feature map Φ(X) = XMᵀ?

  • It is always symmetric. (correct)

What is the implication of using a dual algorithm with a high-dimensional feature set?

  • It increases the risk of overfitting. (correct)

What is the risk of a classification rule r(X) when P(X) is not known?

  • The risk cannot be computed without the data distribution. (correct)

If P(X = x) changes but P(Y = y|X = x) remains the same, what can be concluded about r(X)?

  • r(X) will minimize the risk regardless of the change in P(X). (correct)

Under what conditions can LDA and QDA classifiers be considered identical?

  • When class means µ̂C and µ̂D are equal. (correct)
  • When the covariance matrices Σ̂C and Σ̂D are identical. (correct)

What statement is true regarding the posterior probability P(Y = C|X = x) if LDA and QDA classifiers are identical?

  • It is linear in terms of x. (correct)

What can be inferred about the QDA decision function if the covariance matrices are different?

  • The function becomes nonlinear. (correct)

Which kernel is used in dual ridge regression as described?

  • k(Xi, Xj) = (XiᵀXj + 1)^p (correct)

In dual ridge regression with the polynomial kernel, how does regularization (λ > 0) affect the model?

  • It reduces the risk of overfitting. (correct)
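
Combining the two answers above, a minimal dual (kernelized) ridge regression sketch with the polynomial kernel; the degree, λ, and data below are illustrative:

```python
import numpy as np

def poly_kernel(A, B, p=3):
    """Polynomial kernel (A_i . B_j + 1)^p for all pairs of rows."""
    return (A @ B.T + 1.0) ** p

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.1, size=40)

lam = 1e-2                                  # lambda > 0 shrinks the dual
K = poly_kernel(X, X)                       # weights and curbs overfitting
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)

X_test = np.linspace(-1, 1, 5).reshape(-1, 1)
y_pred = poly_kernel(X_test, X) @ alpha     # h(z) = sum_i alpha_i k(z, X_i)
print(y_pred)
```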

What can be concluded if Σ̂C = I (the identity matrix) and Σ̂D = 5I?

  • The classifiers will yield different decision boundaries. (correct)

What does a hard-margin SVM require about the data for it to create a decision boundary?

  • Data must be linearly separable. (correct)

In a soft-margin SVM, what role does the hyperparameter C play?

  • Balances between maximizing margin and minimizing classification error. (correct)
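
The trade-off C controls is explicit in the standard soft-margin objective (written here in the usual textbook form, with slack variables ξi):

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i \quad \text{subject to} \quad y_i\,(w^\top X_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0.$$

A large C penalizes slack heavily (fewer training errors, narrower margin); a small C tolerates slack (wider margin, more misclassifications allowed).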

Which of the following characterizes the decision boundary learned by Linear Discriminant Analysis (LDA)?

  • It is always a linear function of the features. (correct)

What is the relationship between a kernel function and a feature map in kernel PCA?

  • The kernel represents inner products of the feature map's transformations. (correct)

Which statement is true regarding the margin in SVM classification?

  • Soft-margin SVM allows for wider margins than hard-margin SVM. (correct)

What is indicated by a nonzero eigenvalue in the context of principal component analysis?

  • The direction is significant and reflects variance in the data. (correct)

Kernel Principal Components Analysis primarily differs from standard PCA in what aspect?

  • It uses a feature map to enable non-linear transformations. (correct)

In the equation $X^TXw = \lambda w$, what does the symbol $\lambda$ represent?

  • A scalar eigenvalue associated with the eigenvector $w$. (correct)
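
A compact numerical illustration of this eigenvector equation as used in PCA (random data; X is centered first, per the usual convention):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.2])
X = X - X.mean(axis=0)                # center the data

# Solve X^T X w = lambda w; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
w1 = eigvecs[:, -1]                   # eigenvector with the largest eigenvalue
print(eigvals[::-1])                  # spread captured along each direction
print(w1)                             # first principal component direction
```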

What is the role of the number k in clustering algorithms?

  • It represents the number of clusters to form. (correct)

Which statement about complete linkage in clustering is accurate?

  • Complete linkage can work with any distance function. (correct)

How does single linkage clustering differ from complete linkage?

  • Single linkage is more sensitive to outliers than complete linkage. (correct)

What characterizes the Fiedler vector in spectral clustering?

  • It corresponds to the second smallest eigenvalue of the Laplacian matrix. (correct)

What does the relaxed optimization problem for partitioning a graph involve?

  • Minimizing the Rayleigh quotient of the Laplacian matrix and an indicator vector. (correct)

Which statement is true regarding the Laplacian matrix in spectral clustering?

  • The Laplacian matrix is never invertible because the all-ones vector 1 is always in the nullspace. (correct)
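
A small spectral-bisection sketch tying the last few answers together (the six-node graph below is made up for illustration):

```python
import numpy as np

# Adjacency matrix of two triangles joined by a single edge (nodes 2-3).
W = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W        # Laplacian: L = D - W

eigvals, eigvecs = np.linalg.eigh(L)
print(np.isclose(eigvals[0], 0.0))    # True: all-ones vector is in the nullspace
fiedler = eigvecs[:, 1]               # eigenvector of 2nd smallest eigenvalue
print(np.sign(fiedler))               # signs split the two triangles
```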

What is a feature of AdaBoost concerning decision trees?

  • The metalearner in AdaBoost gives a classification while estimating posterior probabilities. (correct)

Which statement is false regarding the application of AdaBoost?

  • AdaBoost is limited to specific types of classifiers. (correct)

What is the inner product representation of the matrix K?

  • K = Φ(X)Φ(X)ᵀ (correct)

Which of the following correctly represents the first principal component direction?

  • The vector w that maximizes wᵀΦ(X)ᵀΦ(X)w (correct)

What are the matrices B and C defined in the context of the generalized Rayleigh quotient?

  • B = K², C = K (correct)

What effect does the balance constraint have on the indicator vector y in the minimum bisection problem?

  • It requires that the sum of the elements equals zero. (correct)

Which of the following matrices represents the Laplacian matrix L for the given graph?

  • L = [[2, -1, 0, 0], [-1, 3, -1, -1], [0, -1, 3, -1], [0, 0, -1, 2]] (correct)

What is the strict binary constraint in the context of the minimum bisection problem?

  • Each element of y must be either 1 or -1. (correct)

Which expression is maximized in the generalized Rayleigh quotient formulation?

  • aᵀBa (correct)

What is the expression for ∇µ1 ℓ based on the provided content?

  • $\frac{1}{n}\tau_i(X_i - \mu_1)$ (correct)

Which of the following correctly represents ∇µ2 ℓ?

  • $\frac{1}{n}(1 - \tau_i)(X_i - \mu_2)$ (correct)

What is the objective of the k-means-like algorithm described?

  • To estimate parameters µ1, µ2, and θ using fixed τi values (correct)

What does the notation τi represent in the algorithm?

  • The membership indicator for data points (correct)

What parameter values must be initialized in the k-means-like algorithm?

  • τ1, τ2, ..., τn to arbitrary values (correct)

What is the relationship between the parameters µ1, µ2, and θ?

  • They can be derived from τi values (correct)

Which method is used to update Gaussian cluster parameters in the algorithm?

  • By fixing the τi and computing new means (correct)

What is the purpose of the repeated steps in the k-means-like algorithm?

  • To converge the τi values towards a maximum likelihood (correct)
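
A sketch of the alternating scheme these questions describe, for a 1-D mixture of two unit-variance Gaussians with mixing weight θ (the initialization, variances, and data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])

tau = rng.uniform(size=X.size)        # step 0: arbitrary initial tau_i
for _ in range(50):
    # Step 1: with tau fixed, update theta, mu1, mu2 (maximum likelihood).
    theta = tau.mean()
    mu1 = (tau * X).sum() / tau.sum()
    mu2 = ((1 - tau) * X).sum() / (1 - tau).sum()
    # Step 2: with the parameters fixed, recompute the posteriors tau_i.
    p1 = theta * np.exp(-0.5 * (X - mu1) ** 2)
    p2 = (1 - theta) * np.exp(-0.5 * (X - mu2) ** 2)
    tau = p1 / (p1 + p2)

print(theta, mu1, mu2)                # converges near 0.5, -2, 3
```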

Flashcards

Dual Ridge Regression Overfitting

When n is very large, the dual algorithm is not more likely to overfit than the primal algorithm with degree-p polynomial features: as the next card notes, both compute the same solution.

Primal vs. Dual Ridge Regression

Both primal and dual ridge regression give the same solution, no matter the size of n.

Kernel Matrix Calculation

The kernel matrix is XMᵀMXᵀ, where XMᵀ = Φ(X) is the product of the design matrix X and the transpose of the matrix M.

Kernel vs. Primal Perceptron Termination

If the primal perceptron algorithm terminates, the kernel perceptron algorithm might or might not terminate: the feature map (for example, Φ(X) = XMᵀ) can destroy linear separability, so termination is not guaranteed.
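
For concreteness, a minimal dual (kernel) perceptron loop; it touches the data only through kernel evaluations, and with a linear kernel it mirrors the primal algorithm (the toy data and kernel below are illustrative):

```python
import numpy as np

def kernel_perceptron(X, y, kernel, max_epochs=100):
    """Dual perceptron: a[i] counts the mistakes made on sample i."""
    n = len(y)
    K = kernel(X, X)
    a = np.zeros(n)
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            # The prediction uses only kernel values, never Phi(x) itself.
            if y[i] * np.sign((a * y) @ K[:, i]) <= 0:
                a[i] += 1
                mistakes += 1
        if mistakes == 0:             # a full clean pass: terminate
            break
    return a

X = np.array([[1.0, 1], [2, 2], [-1, -1], [-2, -1]])
y = np.array([1, 1, -1, -1])
print(kernel_perceptron(X, y, lambda A, B: A @ B.T))
```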

Decision Tree Splitting

The decision tree algorithm chooses binary splits to maximize information gain at each node.

Binary Decision Tree Splits

Decision tree splits are binary, meaning they split the data into two branches.

Decision Tree Feature Types

Decision trees can handle both categorical and numerical features.

Decision Tree Overfitting

Decision trees can be prone to overfitting, especially when the tree is too deep.

∀y ∈ {0, 1, 2}, ∃x : r(x) = y

For every class label y in the set {0, 1, 2}, there exists some input x to which the classifier r assigns label y; that is, r attains every possible class label.

∀x, r(x) is a class y that maximizes the posterior probability P(Y = y|X = x)

For every input x, the classifier r(x) returns a class label y that maximizes the posterior probability P(Y = y|X = x), i.e., the label that is most likely given the input.

If we don’t have access to the underlying data distribution P(X) or P(Y|X), we cannot exactly compute the risk of r(·)

The risk of a classifier cannot be computed exactly without knowledge of the underlying data distributions P(X) and P(Y|X); the risk is the expected error under exactly those distributions.

If P(X = x) changes but P(Y = y|X = x) remains the same for all x, y, r(X) still minimizes the risk

The classifier r(X) still minimizes the risk even if the distribution of inputs P(X) changes, as long as the conditional probability of class labels given inputs (P(Y = y|X = x)) remains the same for all inputs and classes.

If µ̂C = µ̂D and π̂C = π̂D , then the LDA and QDA classifiers are identical

If the means (µ̂C, µ̂D) and prior probabilities (π̂C, π̂D) of two classes (C, D) are equal in Gaussian Discriminant Analysis (GDA), then the Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) classifiers are identical.

If Σ̂C = Σ̂D , π̂C = 1/6, and π̂D = 5/6, then the LDA and QDA classifiers are identical

If the covariance matrices of two classes (C, D) in Gaussian Discriminant Analysis (GDA) are equal Σ̂C = Σ̂D, and the prior probabilities are π̂C = 1/6 and π̂D = 5/6, the LDA and QDA classifiers are again identical.

If Σ̂C = I (the identity matrix) and Σ̂D = 5I, then the LDA and QDA classifiers are identical

If the covariance matrix of class C is the identity matrix (I) and the covariance matrix of class D is 5I, the LDA and QDA classifiers are not identical: with unequal covariances the quadratic terms in QDA do not cancel, so the two yield different decision boundaries.

If the LDA and QDA classifiers are identical, then the posterior probability P(Y = C|X = x) is linear in x

If LDA and QDA classifiers give identical results, the posterior probability P(Y = C|X = x) is a logistic function of the input (x), not a linear function.

Kij

The inner product of row i of Φ(X) and row j of Φ(X), i.e., Kij = Φ(Xi)ᵀΦ(Xj).

First Principal Component Direction

A vector that maximizes the Rayleigh quotient, which is a measure of how much the data is spread along the direction of the vector.

Generalized Rayleigh Quotient

A ratio of quadratic forms, aᵀBa / aᵀCa, whose maximization generalizes the Rayleigh quotient; in kernel PCA, B = K² and C = K.

Kernel Matrix (K)

The matrix that holds the inner products of all pairs of data points in the original dataset, calculated using the kernel function.

Laplacian Matrix (L)

A matrix that represents the connections between nodes in a graph.

Indicator Vector (y)

A vector that assigns each node of the graph to one of two clusters.

Minimum Bisection

The problem of dividing a graph into two equal-sized clusters while cutting as few edges as possible.

Risk Minimization

The minimization of a function that measures the error of a classifier based on the data distribution.

Principal Component Direction

Every direction that maximizes the variance of the projected data is an eigenvector of the covariance matrix, computed as XᵀX for centered data.

Kernel PCA Applicability

Kernel PCA can only be applied to optimization problems where the solution can be expressed as a linear combination of the data points.

Kernel Matrix Definition

The kernel matrix K is an n × n matrix where each element Kij is the dot product of the feature maps of the ith and jth data points: Kij = Φ(Xi)ᵀΦ(Xj).

Kernel Matrix and Feature Matrix

The kernel matrix K can also be expressed as the product of the feature matrix Φ(X) with its transpose: K = Φ(X)Φ(X)ᵀ.

Kernel Function

The kernel function k(x, z) = Φ(x)ᵀΦ(z) allows calculating similarities between data points in the feature space without explicitly computing the feature maps Φ(x) and Φ(z).

Kernel Matrix Symmetry

The kernel matrix K is a symmetric matrix because the kernel function k(x, z) is symmetric: k(x, z) = k(z, x).

Hard vs. Soft Margin SVM

A hard-margin SVM aims to find a decision boundary that perfectly separates the data points without any errors, while a soft-margin SVM allows for some misclassifications to accommodate noisy data.

SVM Margin

The margin of an SVM is the region around the decision boundary bounded by the training points nearest to it; its width is the distance from the boundary to those points. SVMs aim to maximize the margin to improve generalization.

What does the Fiedler vector represent in spectral clustering?

The smallest eigenvalue of the Laplacian matrix corresponds to the constant (all-ones) vector. The second smallest eigenvalue (the Fiedler value) corresponds to the Fiedler vector, which gives the relaxed solution for splitting the graph into two balanced clusters.

What is the Laplacian matrix?

The Laplacian matrix is a matrix that captures the connections between nodes in a graph. It is always singular (0 is an eigenvalue) because the all-ones vector lies in its nullspace.

Complete linkage clustering

Complete linkage uses the maximum distance between any two elements from the respective clusters as the inter-cluster distance. This leads to tighter, more compact clusters.

What is agglomerative clustering?

Agglomerative clustering is a bottom-up approach, starting with individual data points and merging clusters until a desired number of clusters is reached.

Why can single linkage be susceptible to outliers?

Single linkage can be highly sensitive to outliers as it uses the minimum distance between points in different clusters. Outliers can easily create long, skinny clusters that might not reflect the true underlying structure of the data.
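
A short SciPy comparison of the two linkage rules on synthetic blobs joined by a single bridging point (data and parameters are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(6)
blobs = np.vstack([rng.normal(0, 0.5, (20, 2)),
                   rng.normal(5, 0.5, (20, 2)),
                   [[2.5, 2.5]]])       # one point bridging the two blobs

for method in ("single", "complete"):
    Z = linkage(blobs, method=method)   # agglomerative merge tree
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(method, np.bincount(labels)[1:])   # cluster sizes
```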

What is AdaBoost?

AdaBoost is a powerful ensemble learning algorithm that combines multiple weak learners (e.g., decision trees) to build a strong predictor. It's used for binary classification.

How does AdaBoost handle misclassified data?

AdaBoost assigns weights to each training sample, putting more emphasis on misclassified data points. This helps the algorithm focus on challenging examples and improve its accuracy.
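
The reweighting can be made precise; in the standard AdaBoost formulation, after weak learner $h_t$ with weighted error $\varepsilon_t$:

$$\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}, \qquad w_i \leftarrow \frac{w_i\,\exp(-\alpha_t\, y_i\, h_t(X_i))}{Z_t},$$

where $Z_t$ normalizes the weights to sum to 1. Misclassified points ($y_i h_t(X_i) = -1$) have their weights multiplied by $e^{\alpha_t} > 1$, so the next weak learner concentrates on them.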

Can AdaBoost be used with decision trees?

AdaBoost can be used with decision trees, where each tree is a weak learner. The combined decision from these trees then determines the final classification.

What is the expression for the gradient of the log-likelihood of θ?

The gradient of the log-likelihood of the mixture parameter θ (the probability of belonging to the first cluster) is expressed as a simple function of θ. This function is independent of the cluster means (µ1, µ2) and only depends on the posterior probabilities (τi) assigned to each data point.

How is the gradient of µ1 expressed?

The gradient of the log-likelihood of the first cluster mean (µ1) is a weighted sum of differences between data points and the mean, where the weights are the posterior probabilities (τi) of belonging to the first cluster.

How is the gradient of µ2 expressed?

The gradient of the log-likelihood of the second cluster mean (µ2) is similar to the gradient of µ1, but it uses the complementary probabilities (1 - τi) for each data point.

What is the k-means-like algorithm based on?

The algorithm iteratively updates the cluster parameters (means and mixing probability) and the posterior probabilities using the gradients derived from the log-likelihood function. This repetition continues until convergence, resulting in optimal cluster assignments and parameters.

How does the k-means-like algorithm start?

The algorithm starts by arbitrarily assigning initial values for the posterior probabilities (τi) associated with each data point.

What happens in the first step of the k-means like algorithm?

For fixed posterior probabilities (τi), the algorithm updates the cluster parameters (µ1, µ2, and θ) by maximizing the log-likelihood. The update rules are based on the gradients derived for µ1, µ2, and θ.

What happens in the second step of the k-means like algorithm?

Using the updated cluster parameters (µ1, µ2, and θ), the algorithm re-estimates the posterior probabilities (τi) for each data point. This step essentially revisits the cluster assignment for each data point based on the new cluster definitions.

What is the overall purpose of the k-means-like algorithm?

The k-means-like algorithm uses the gradients of the log-likelihood to refine the cluster parameters (µ1, µ2, and θ) and then re-evaluate the posterior probabilities (τi). This iterative process continues until convergence, resulting in the optimal cluster assignments and the corresponding parameters.

Study Notes

Exam Instructions

  • Do not open the exam before instructed.
  • Electronic devices (phones, iPods, headphones, laptops) are prohibited.
  • Ensure all 12 pages and 6 questions are present.
  • Write initials at top right of each page after the first.
  • Exam is closed-book, closed-notes, two cheat sheets allowed.
  • Exam duration: 3 hours.
  • Answer on exam paper only.
  • Total points: 150.
  • Multiple choice questions (26): 3 points each.
  • Written questions (5): 72 points total.
  • Multiple answer questions: all correct choices must be marked.
  • No partial credit for multiple answer questions.

Exam Details

  • Exam: Introduction to Machine Learning, Spring 2019 final.
  • Students should bring two cheat sheets.
  • A total of 150 points is available.

Description

Test your understanding of critical concepts in machine learning, including regression techniques, kernel functions, and decision trees. This quiz covers theoretical aspects and practical implications of using various algorithms in different scenarios.
