Machine Learning Concepts Quiz
34 Questions

Questions and Answers

What is the effect of increasing the sample size M on the accuracy of error estimates?

  • The error estimates become less variable (correct)
  • The error estimates become zero
  • The error estimates become more variable
  • The error estimates remain unchanged
Increasing the number of training samples (N) while keeping the test set unchanged will likely decrease the test error.

True

What is the primary disadvantage of non-parametric models?

They are computationally expensive and may overfit the training dataset.

The expected error does not depend on M, but it does depend on ______.

N

Match the following terms with their definitions:

  • Parametric Model = Dimensionality is constant with respect to dataset size
  • Non-parametric Model = Uses the entire training dataset to make predictions
  • Decision Tree = Axis-aligned boundaries for decision making
  • Regression Tree = Used for predicting continuous variables

What is a significant advantage of using the ReLU activation function in neural networks?

Mitigates the vanishing gradient problem

The use of GPU processing has made training large neural networks impractical.

False

What innovative architectural feature does GoogLeNet utilize to improve training of deep networks?

Bottlenecks and inception modules

The process by which gradients shrink as they travel back through layers is called the __________ problem.

vanishing gradient

Match the following concepts with their descriptions:

  • ReLU Activation Function = Faster convergence in training deep networks
  • ImageNet = Dataset containing millions of labeled images
  • Bottlenecks = Inception modules with multiple filters and pooling operations
  • Multiple Stages of Supervision = Enhances gradient flow to earlier layers during training

What makes Naive Bayes a 'naive' classifier?

It assumes complete independence between features.

Naive Bayes generally provides better generalization performance than more sophisticated learning methods.

False

What is the primary computational method used in Naive Bayes to avoid numerical instability?

Log transformation

The primary goal of the E-step in the Expectation-Maximization method is to estimate the expected value of _____ variables.

hidden

What does high bias often indicate in a model's performance?

The model fails to capture underlying patterns.

Increasing model complexity will always result in a decrease in training error.

True

Match the following terms with their descriptions:

  • Naive Bayes = Probabilistic classifier based on feature independence
  • E-M Algorithm = Iterative method for maximum likelihood estimates
  • Log Transformation = Technique to handle small probability computations
  • Gaussian Mixture Models = Statistical models for clustering data with hidden components

What happens to validation error when a model starts to overfit?

Validation error starts to rise.

In which of the following situations might Naive Bayes perform poorly?

When sufficient labeled data is available.

____ bias occurs when the model consistently misses the target due to being overly simplistic.

High

The M-step is responsible for updating model parameters to maximize the expected log-likelihood function.

True

Match the following types of bias and variance characteristics:

  • Low Bias, Low Variance = High accuracy and consistency
  • Low Bias, High Variance = High sensitivity to training data
  • High Bias, Low Variance = Predictable but consistently off-target
  • High Bias, High Variance = Inaccurate and widely spread predictions

What is one characteristic of the E-M algorithm?

It is widely applicable and iterative.

What is the consequence of a model that has both high bias and high variance?

The model struggles with generalization.

Overfitting leads to a decrease in training error but can increase test error.

True

What effect does model complexity have on accuracy on the test set after initially improving it?

It decreases after reaching an optimal point.

What is the primary purpose of batch normalization?

To normalize feature distributions using their empirical mean and variance

Batch normalization is performed over the entire dataset to ensure consistent feature learning.

False

What does padding do in a convolutional neural network?

Padding allows the convolution to be applied over the image's border.

The ______ in a convolution layer indicates the number of steps to move the filter across the input image.

stride

Match each term with its correct description:

  • Data Augmentation = Changing an image by rotating or flipping it
  • Fine-tuning = Adjusting the learning rate for existing weights
  • Linear Probing = Using an encoder to train over new classes
  • Warm Start = Freezing lower layers for some epochs

Which statement about linear probing is true?

It utilizes an encoder to train a network over new classes.

The 'freeze encoder method' involves updating existing encoder weights during training.

False

What is meant by 'warm start' in the context of training neural networks?

Warm start refers to freezing lower layers temporarily, possibly starting with a small learning rate that is gradually increased.

Study Notes

Error Measurement

• Increasing the number of test samples (M) improves the accuracy of the error measurement, i.e. the error estimate becomes less variable (see the sketch below).
• Increasing the number of training samples (N) decreases the expected test error.
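
To make the first point concrete, here is a minimal NumPy sketch (the true error rate of 0.2 is an assumed value) showing that the spread of the error estimate shrinks as M grows:

```python
import numpy as np

rng = np.random.default_rng(0)
true_error = 0.2  # assumed true misclassification rate of a fixed model

# Estimate the test error from M correct/incorrect indicators, repeated
# 5000 times to measure how variable the estimate itself is.
for M in (10, 100, 1000, 10000):
    estimates = rng.binomial(n=M, p=true_error, size=5000) / M
    print(f"M={M:6d}  mean={estimates.mean():.3f}  std={estimates.std():.4f}")

# The standard deviation shrinks roughly like sqrt(p * (1 - p) / M):
# more test samples give a less variable estimate, not a zero error.
```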

Model Complexity

• Parametric models have fixed complexity: the model's dimensionality is constant with respect to the dataset size.
• Non-parametric models become more complex as the training dataset size increases.
• Decision trees create axis-aligned boundaries for classification (illustrated below).
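
To see the axis-aligned boundaries directly, a small scikit-learn sketch (the toy dataset and tree depth are arbitrary choices):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# A toy two-feature classification problem.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Every printed split thresholds a single feature, i.e. each decision
# boundary is aligned with one coordinate axis.
print(export_text(clf, feature_names=["x0", "x1"]))
```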

Naive Bayes

• A probabilistic classifier that assumes features are independent given the class label.
• Uses class-conditional probabilities and prior probabilities to estimate the posterior probability.
• Works in log space to avoid the numerical instability of multiplying many small probabilities (see the sketch below).
• Highly efficient for learning and prediction, but may generalize poorly compared to more sophisticated methods.
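
A minimal sketch of Bernoulli-style Naive Bayes scoring in log space; all numbers below are made up for illustration:

```python
import numpy as np

log_prior = np.log(np.array([0.6, 0.4]))   # log P(class c)
p = np.array([[0.9, 0.2, 0.5],             # P(feature j = 1 | class 0)
              [0.3, 0.7, 0.5]])            # P(feature j = 1 | class 1)

def log_posterior(x):
    # Summing log-probabilities instead of multiplying raw probabilities
    # avoids the underflow that the log transformation is meant to prevent.
    return log_prior + (x * np.log(p) + (1 - x) * np.log(1 - p)).sum(axis=1)

x = np.array([1, 0, 1])  # an observed binary feature vector
print("predicted class:", int(log_posterior(x).argmax()))
```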

EM Algorithm

• An iterative method for finding maximum likelihood estimates when the data has missing or hidden components.
• Useful for clustering unlabeled data (Gaussian mixture models), the bad-annotators problem, foreground/background segmentation, and topic models.
• Consists of three steps: initialization, the expectation step (E-step), which estimates the expected values of the hidden variables, and the maximization step (M-step), which updates the parameters to maximize the expected log-likelihood (sketched below).
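
A bare-bones EM sketch for a two-component 1-D Gaussian mixture; the synthetic data, initialization, and iteration count are arbitrary choices:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])

# Initialization.
pi = np.array([0.5, 0.5])      # mixture weights
mu = np.array([-1.0, 1.0])     # component means
sigma = np.array([1.0, 1.0])   # component standard deviations

for _ in range(50):
    # E-step: responsibility of each component for each point.
    dens = pi * norm.pdf(x[:, None], mu, sigma)      # shape (n, 2)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update parameters to maximize the expected log-likelihood.
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print("weights:", pi.round(2), "means:", mu.round(2), "stds:", sigma.round(2))
```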

Bias, Variance, and Model Complexity

• Bias: error due to approximating a complex problem with a simpler model. High bias indicates underfitting.
• Variance: the model's sensitivity to the training data. High variance indicates overfitting.
• As model complexity increases, accuracy on the test set improves up to an optimal point and then degrades due to overfitting (see the example below).
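
The trade-off is easy to reproduce; the sketch below uses polynomial degree as a stand-in for model complexity (the target function, noise level, and degrees are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + rng.normal(0, 0.3, n)

x_tr, y_tr = make_data(30)     # small training set
x_te, y_te = make_data(200)    # held-out test set

# Training error keeps falling with degree, while test error turns back
# up once the fit starts chasing the noise.
for degree in (1, 3, 5, 9, 15):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    train_mse = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```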

GoogLeNet

• Key factors behind the breakthrough:
  • ReLU activation function: faster convergence, allowing deeper networks to be trained efficiently.
  • ImageNet dataset: millions of labeled images across thousands of categories.
  • GPU processing: enabled efficient training of large neural networks.
• Addressed the vanishing gradient problem with the following key features (a simplified module is sketched below):
  • Bottlenecks (Inception modules): multiple convolutional filters and pooling operations applied in parallel, with 1x1 convolutions reducing the number of channels before the larger convolutions.
  • Multiple stages of supervision: auxiliary classifiers (small side networks) provide additional supervision to improve gradient flow to earlier layers, enhancing convergence and accuracy.
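
A simplified Inception-style module in PyTorch, showing the 1x1 bottleneck idea; the channel counts are arbitrary and this is not GoogLeNet's exact configuration:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 8, 1),   # 1x1 bottleneck
                                nn.Conv2d(8, 16, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 8, 1),   # 1x1 bottleneck
                                nn.Conv2d(8, 16, 5, padding=2))
        self.pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                  nn.Conv2d(in_ch, 16, 1))

    def forward(self, x):
        # All branches preserve the spatial size, so their outputs can be
        # concatenated along the channel dimension.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)

block = InceptionBlock(32)
print(block(torch.randn(1, 32, 28, 28)).shape)  # torch.Size([1, 64, 28, 28])
```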

Vanishing Gradients and Information Propagation

• Vanishing gradients occur when gradients shrink significantly during backpropagation in deep networks, preventing the optimization of early layers.
• Contributing factors include the long paths gradients must travel to reach early weights, near-zero gradients along those paths, and the resulting inability of early layers to learn (demonstrated below).
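
A toy PyTorch demonstration (the depth, width, and choice of sigmoid are arbitrary): after backpropagating through a deep stack of sigmoid layers, the gradient magnitude in the first layer is orders of magnitude smaller than in the last:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 20 Linear+Sigmoid layers; sigmoid derivatives are at most 0.25, so
# gradients shrink multiplicatively on the way back to early layers.
layers = nn.Sequential(*[nn.Sequential(nn.Linear(32, 32), nn.Sigmoid())
                         for _ in range(20)])
x = torch.randn(8, 32)
layers(x).sum().backward()

first = layers[0][0].weight.grad.abs().mean().item()
last = layers[-1][0].weight.grad.abs().mean().item()
print(f"mean |grad|: first layer {first:.2e}, last layer {last:.2e}")
```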

Batch Normalization

• Normalizes features using their empirical mean and variance, helping to mitigate the internal covariate shift problem (see the sketch below).
• Not applied over the entire dataset, for reasons of computational efficiency.
• Normalizing each mini-batch separately allows for more efficient training with SGD.
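
A NumPy sketch of the batch-norm forward pass in training mode; gamma and beta would normally be learned parameters:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x has shape (batch, features); statistics come from the current
    # mini-batch, not from the whole dataset.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta              # learned scale and shift

x = np.random.default_rng(0).normal(5.0, 3.0, size=(64, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 and ~1
```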

Convolutional Neural Networks (CNNs)

• Padding: allows convolutions to be applied at the borders of the image, capturing information near the edges.
• Stride: the step size with which the filter moves across the input, effectively down-sampling the image (see the example below).
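
The output height follows floor((H + 2·padding − kernel) / stride) + 1, and likewise for the width; a quick PyTorch check:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)
same = nn.Conv2d(3, 8, kernel_size=3, padding=1, stride=1)  # keeps 32x32
down = nn.Conv2d(3, 8, kernel_size=3, padding=1, stride=2)  # halves to 16x16
print(same(x).shape, down(x).shape)
# torch.Size([1, 8, 32, 32]) torch.Size([1, 8, 16, 16])
```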

Data Augmentation

• Does not necessarily generate the same type of features at every layer of the deep network.
• Aims to generate diverse input data so the model learns more robust features (a typical pipeline is sketched below).
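
A typical augmentation pipeline using torchvision transforms; the specific transforms and parameters are illustrative choices:

```python
from torchvision import transforms

# Each time an image is loaded, it is randomly flipped, rotated, and
# cropped, so the model sees a slightly different variant every epoch.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

# Usage: tensor = augment(pil_image)  # applied to a PIL image at load time
```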

Fine-tuning and Linear Probing

• Linear probing: uses a pre-trained feature extractor to perform linear classification over new classes without training the encoder.
• Freeze-encoder method: freezes the weights of the pre-trained encoder and trains only the classifier for the new classes.
• Fine-tuning: adjusts the encoder weights to improve performance on the new classes, typically with a smaller learning rate.
• Warm start: freezes the lower layers for a few epochs so the classification layer can learn, then gradually unfreezes them for fine-tuning (see the sketch below).
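
A PyTorch sketch contrasting linear probing with fine-tuning, using a torchvision ResNet-18 as the pre-trained encoder (assumes a recent torchvision; the class count and learning rates are illustrative):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 10)  # new head for 10 new classes

# Linear probing / freeze-encoder: train only the new classification head.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc")

# Fine-tuning instead: unfreeze everything, but give the pre-trained
# encoder a smaller learning rate than the freshly initialized head.
for p in model.parameters():
    p.requires_grad = True
optimizer = torch.optim.SGD([
    {"params": [p for n, p in model.named_parameters()
                if not n.startswith("fc")], "lr": 1e-4},  # pre-trained encoder
    {"params": model.fc.parameters(), "lr": 1e-2},        # new head
], lr=1e-3, momentum=0.9)  # default lr; per-group values above override it
```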


Description

Test your knowledge of key machine learning concepts such as error measurement, model complexity, Naive Bayes classifiers, and the EM algorithm. This quiz will help reinforce your understanding of these fundamental topics in machine learning.
