ELE888 - Final (Theory)

Questions and Answers

Which of the following is NOT a primary operation within a Convolutional Neural Network (CNN)?

  • Pooling or sub-sampling
  • Backpropagation (correct)
  • Convolution
  • Non-linearity (ReLU)

What is the primary role of convolution in a CNN?

  • To classify the input image into various categories.
  • To introduce non-linearity into the network.
  • To reduce the dimensionality of the input image.
  • To extract features from the input image. (correct)

How does a larger stride affect the size of the feature maps in a convolutional operation?

  • It produces larger feature maps.
  • It produces smaller feature maps. (correct)
  • It does not affect the size of the feature maps.
  • It doubles the size of the feature maps.

What is the function of the ReLU non-linearity in a CNN?

  • To introduce non-linearity, enabling the network to learn complex patterns. (correct)

What is the purpose of spatial pooling in a CNN?

  • To reduce the dimensionality of the feature maps while retaining important information. (correct)

What type of layer is typically used for classification in a CNN?

  • Fully Connected Layer (correct)

Which of the following best describes the role of the Fully Connected layer in a CNN?

  • Classifying the input image based on high-level features (correct)

In the context of CNNs, what does 'zero padding' achieve?

  • It controls the size of the feature map by adding padding around the input. (correct)

What is the significance of preserving the spatial relationship between pixels during convolution?

  • It improves the network's ability to understand object arrangements and structures. (correct)

What do Convolution and Pooling layers act as?

  • Feature Extractors (correct)

What is the primary goal of Principal Component Analysis (PCA)?

  • To reduce the number of features while preserving variance. (correct)

Under what condition is PCA most effective?

  • When the data contains correlations between the features (correct)

What does a larger covariance between two measurements in a dataset indicate?

  • A stronger correlation (correct)

Why might adding more features negatively impact the performance of a machine learning algorithm?

  • It complicates the model and may lead to overfitting. (correct)

In the context of neural networks, what is the purpose of backpropagation?

  • To update the weights based on the error between prediction and actual output. (correct)

What is the role of the learning rate in the backpropagation algorithm?

  • It controls how quickly the weights are adjusted. (correct)

What is a potential disadvantage of backpropagation?

  • It is susceptible to overfitting. (correct)

What is the key assumption made by the Naive Bayes algorithm?

  • Features are independent given the class label. (correct)

For which tasks is Naive Bayes particularly suitable?

  • Text classification tasks (correct)

What should you try if a learning algorithm is suffering from high variance?

  • Both B and C (correct)

What is the primary purpose of convolution in CNNs?

  • To extract features from the input image (correct)

What does a feature map represent in CNNs?

  • The result of the convolution operation (correct)

Which operation in CNN helps to introduce non-linearity?

  • ReLU (correct)

What does pooling do in CNNs?

  • It reduces the dimensionality of feature maps (correct)

What is the primary purpose of the fully connected layer in CNNs?

  • To classify the input image based on extracted features (correct)

Which of the following is true about the convolution operation in CNNs?

  • It works by learning and applying feature detectors to input images (correct)

What is the core concept of backpropagation in neural networks?

  • Minimize the error by adjusting the weights iteratively (correct)

What role does the learning rate play in backpropagation?

  • Determines how fast weights are updated (correct)

Which of the following is an advantage of backpropagation?

  • It is efficient in learning and adaptable (correct)

What is the main disadvantage of backpropagation?

  • It is computationally expensive and can lead to overfitting (correct)

Which Naive Bayes classifier is used for continuous data?

  • Gaussian Naive Bayes (correct)

What assumption does Naive Bayes make about the features?

  • Features are independent given the class label (correct)

In which scenario is Naive Bayes particularly effective?

  • For handling sparse datasets, such as in text classification (correct)

Which of the following is a disadvantage of Naive Bayes?

  • It assumes features are independent, which is often unrealistic (correct)

What is a key advantage of Naive Bayes?

  • It is computationally inexpensive and fast (correct)

What is the primary purpose of PCA?

  • To reduce the number of features while preserving variance (correct)

PCA is most effective when:

  • The features are highly correlated (correct)

What is a disadvantage of PCA?

  • It is sensitive to outliers (correct)

What does the covariance matrix in PCA represent?

  • The relationship between pairs of features (correct)

Which type of data is PCA not well-suited for?

  • Data with significant non-linear relationships (correct)

What is a common strategy for dealing with high variance (overfitting) in a machine learning model?

  • Use more training samples (correct)

If a learning algorithm is suffering from high bias (underfitting), what should you try?

  • Increase the model complexity (correct)

What is one of the diagnostic tests used to improve the performance of a machine learning model?

  • Test different combinations of hyperparameters (correct)

Flashcards

Convolutional Neural Networks (CNNs)

Neural networks effective for image recognition and classification.

Purpose of Convolution in CNNs

Extract features from the input image.

Convolved Feature/Feature Map

Matrix output of a convolution operation.

Depth (Feature Map Size)

Number of filters used in convolution.

Stride (Feature Map Size)

Number of pixels by which the filter slides over the input matrix at each step.

Zero Padding

Zeros added around the input border so the filter can be applied to bordering elements; used to control feature map size.

Non-linearity (ReLU)

Introduces non-linearity to the CNN.

Spatial Pooling

Reduces dimensionality of each feature map.

Purpose of Fully Connected Layer

Uses the features extracted by the convolution and pooling layers to classify the input image into classes learned from the training data.

Backpropagation

Adjust weights using gradient descent.

Gradient Descent

Iterative optimization algorithm for finding the minimum of a function.

Backpropagation's Use of Chain Rule

Calculates gradients of loss with respect to weights.

Importance of Proper Weight Initialization

Avoids vanishing or exploding gradients.

Naive Bayes

Uses Bayes' Theorem to classify data.

Key Assumption of Naive Bayes

Features are independent given the class label.

Principal Component Analysis (PCA)

Reduces features while preserving variance.

Principal Components

Directions of maximum variance in data.

PCA Use for Dimensionality Reduction

Reduces complexity and removes noise.

High Bias

Underfitting.

High Variance

Overfitting.

Study Notes

Convolutional Neural Networks (ConvNets or CNNs)

  • CNNs are neural networks effective for image recognition and classification.
  • CNNs have been successful in identifying faces, objects, and traffic signs, and in powering vision in robots and self-driving cars.
  • CNNs contain four main operations: Convolution, Non-linearity (ReLU), Pooling/Sub Sampling, and Classification (fully connected layer).
  • A channel is a component of an image; a standard digital camera image has three channels (red, green, blue), while a grayscale image has one channel.
  • Convolution in CNNs primarily extracts features from the input image.
  • Convolution preserves the spatial relationship between pixels by learning image features from small squares of input data.
  • Filters act as feature detectors, with the image and filters represented as numeric matrices.
  • A CNN learns the filter values on its own during training, but the number of filters, the filter size, and the network architecture must be specified before training.
  • The output matrix of a convolution operation is called the Convolved Feature or Feature Map.
  • Feature Map size depends on depth (number of filters), stride (pixel shift over the input matrix; larger stride yields smaller feature maps), and zero padding.
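  • Zero padding adds zeros around the input border so the filter can be applied to bordering elements, which controls feature map size. Convolution with zero padding is called wide convolution; convolution without it is called narrow convolution.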
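
The effect of stride and zero padding on feature map size can be sketched in plain Python. This is a minimal illustration, not a production implementation: the 5x5 binary image, the 3x3 filter, and the helper names `output_size` and `convolve2d` are assumptions made up for the example, and the input and filter are assumed to be square.

```python
def output_size(input_size, filter_size, stride=1, padding=0):
    """Feature-map width/height for a square input and a square filter."""
    return (input_size - filter_size + 2 * padding) // stride + 1


def convolve2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image` (lists of lists) and return the feature map.

    As in most CNN libraries, the filter is not flipped (cross-correlation).
    """
    n = len(image)
    k = len(kernel)
    # Zero padding: surround the image with a border of zeros.
    size = n + 2 * padding
    padded = [[0] * size for _ in range(size)]
    for r in range(n):
        for c in range(n):
            padded[r + padding][c + padding] = image[r][c]

    out = output_size(n, k, stride, padding)
    feature_map = []
    for i in range(out):
        row = []
        for j in range(out):
            # Multiply the filter element-wise with one image patch and sum.
            total = sum(
                padded[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative (made-up) 5x5 image and 3x3 filter.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

narrow = convolve2d(image, kernel)             # no padding: 3x3 feature map
strided = convolve2d(image, kernel, stride=2)  # larger stride: 2x2 feature map
wide = convolve2d(image, kernel, padding=1)    # zero padding: 5x5 feature map
```

With a stride of 1 and no padding the 5x5 input shrinks to 3x3; a stride of 2 shrinks it further to 2x2, while one ring of zero padding keeps the output at 5x5.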

Non-Linearity

  • ReLU (Rectified Linear Unit) is applied element-wise; its output is max(0, input).
  • Its purpose is to introduce non-linearity into the CNN, since convolution is a linear operation and most real-world data is non-linear.
  • ReLU replaces all negative values in the feature map with zero.
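
As a small sketch of the rule above, the function below applies max(0, x) to each element of a feature map. The function name `relu` and the 2x2 input values are made up for the example.

```python
def relu(feature_map):
    """Apply max(0, x) to every element of a 2D feature map."""
    return [[max(0, value) for value in row] for row in feature_map]


# Illustrative (made-up) feature map with negative entries.
feature_map = [[3, -2], [-5, 7]]
rectified = relu(feature_map)  # negatives become 0, positives are unchanged
```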

Pooling Step

  • Spatial Pooling (subsampling/downsampling) reduces dimensionality while retaining key information and can be Max, Average, or Sum.
  • Pooling reduces the spatial size of the input representation, making it smaller and more manageable.
  • Fewer parameters and computations in the network also help control overfitting.
  • Pooling makes the network invariant to small transformations, distortions, and translations in the input image.
  • Pooling helps achieve an almost scale-invariant representation of the image (the more exact term is "equivariant").
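
A minimal sketch of Max, Average, and Sum pooling follows. The 2x2 window with stride 2, the helper name `pool2d`, and the 4x4 input values are assumptions chosen for illustration.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Reduce each size x size window of a 2D feature map to a single value."""
    reducers = {
        "max": max,
        "sum": sum,
        "average": lambda window: sum(window) / len(window),
    }
    reduce_window = reducers[mode]
    rows = len(feature_map)
    cols = len(feature_map[0])
    pooled = []
    for i in range(0, rows - size + 1, stride):
        row = []
        for j in range(0, cols - size + 1, stride):
            # Collect the values inside the current window.
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(reduce_window(window))
        pooled.append(row)
    return pooled


# Illustrative (made-up) 4x4 rectified feature map.
feature_map_example = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

max_pooled = pool2d(feature_map_example)                  # 2x2 output
sum_pooled = pool2d(feature_map_example, mode="sum")
avg_pooled = pool2d(feature_map_example, mode="average")
```

Each 2x2 window collapses to one number, so the 4x4 map becomes 2x2 regardless of which reducer is used.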

Fully Connected Layer

  • The Fully Connected layer is a traditional Multi Layer Perceptron that uses a softmax activation function in its output layer.
  • "Fully Connected" means every neuron in the previous layer connects to every neuron in the next layer.
  • Convolutional and pooling layers output high-level features. The Fully Connected layer classifies the input image into classes using these features.
  • Adding a fully connected layer is also a cheap way of learning non-linear combinations of these features. In short, the convolution and pooling layers act as feature extractors, while the fully connected layer acts as the classifier.
  • More convolution steps allow the network to recognize more complicated features.
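
The classification step can be sketched as one fully connected layer followed by softmax. The feature vector, weights, and biases below are made-up numbers; in a trained network they would come from the earlier layers and from training, and the names `fully_connected` and `softmax` are chosen for this example only.

```python
import math


def fully_connected(features, weights, biases):
    """Every input feature connects to every output neuron."""
    return [
        sum(w * x for w, x in zip(neuron_weights, features)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score avoids overflow in exp().
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative (made-up) flattened features and parameters for 3 classes.
features = [0.5, 1.0, 0.0, 2.0]
weights = [
    [0.2, 0.1, 0.0, 0.4],
    [0.0, 0.3, 0.5, -0.2],
    [-0.1, 0.0, 0.2, 0.1],
]
biases = [0.0, 0.1, -0.1]

scores = fully_connected(features, weights, biases)
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
```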

Backpropagation

  • Backpropagation is a technique for training neural networks which updates weights using gradient descent to minimize the error (difference between prediction and actual output).
  • Gradient descent is an iterative optimization algorithm for finding the minimum of a function.
  • Neural network training finds weights that minimize prediction error.
  • Backpropagation computes the gradients of the loss function with respect to the weights, and gradient descent uses those gradients to adjust the weights; the learning rate controls the size of the adjustment in each iteration.
  • A high learning rate may cause instability, while a low rate may slow down training.
  • Backpropagation uses the chain rule to calculate gradients of the loss, enabling efficient calculation of how each weight change affects the network.
  • Non-linear activation functions like ReLU, Sigmoid, or Tanh enable the network to learn complex patterns.
  • Proper weight initialization helps avoid vanishing or exploding gradients.
  • Advantages include efficiency, optimization, flexibility, and adaptability, while disadvantages include high computational cost, susceptibility to overfitting and vanishing/exploding gradients, and the process getting stuck in local minima.
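
The interplay of the chain rule, gradient descent, and the learning rate can be sketched on the smallest possible network: one sigmoid neuron with a squared-error loss. The starting weights, learning rate, step count, and the single training example are assumptions picked for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(x, target, w=0.5, b=0.0, learning_rate=0.5, steps=200):
    """Fit one sigmoid neuron to a single (x, target) pair."""
    losses = []
    for _ in range(steps):
        # Forward pass: compute the prediction and the error.
        z = w * x + b
        prediction = sigmoid(z)
        losses.append((prediction - target) ** 2)

        # Backward pass: chain rule, dL/dw = dL/dpred * dpred/dz * dz/dw.
        dloss_dpred = 2.0 * (prediction - target)
        dpred_dz = prediction * (1.0 - prediction)
        dloss_dw = dloss_dpred * dpred_dz * x
        dloss_db = dloss_dpred * dpred_dz

        # Gradient descent: step against the gradient, scaled by the learning rate.
        w -= learning_rate * dloss_dw
        b -= learning_rate * dloss_db
    return w, b, losses


w, b, losses = train_neuron(x=1.5, target=0.0)
```

Comparing `losses[0]` with `losses[-1]` shows the error shrinking over the iterations; rerunning with a much smaller `learning_rate` slows that decrease.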

Naive Bayes

  • Naive Bayes uses Bayes' Theorem, which applies conditional probability to classify data.
  • Types include Gaussian (normal distribution, continuous data), Multinomial (discrete data), and Bernoulli (binary/boolean features).
  • Naive Bayes suits text classification tasks like spam detection due to its ability to handle high-dimensional, sparse datasets.
  • The key assumption is that features are independent given the class label, which is often unrealistic but the model performs surprisingly well regardless.
  • Bayes' Theorem is the backbone of the algorithm, used to compute the probability of each class given the observed features.
  • The algorithm calculates priors (class probability) and likelihoods (probability of observing data given the class).
  • It is simple, interpretable, and often used as a baseline model before more complex classifiers.
  • It performs well with small datasets but struggles with highly correlated features and complex decision boundaries.
  • Advantages include simplicity, speed, low computational cost, effectiveness with small or high-dimensional datasets, and handling of both continuous and categorical data.
  • Disadvantages include poor performance with correlated features, the need for enough training samples to estimate each likelihood reliably, and the inability to model interactions between features directly.
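
A minimal sketch of a Multinomial Naive Bayes text classifier follows, showing the priors and likelihoods described above. The four training messages and the function names are made up, and add-one (Laplace) smoothing is an assumption added here so that unseen words do not produce a zero probability; it is not mentioned in the notes.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """documents: list of (list_of_words, label) pairs."""
    class_counts = Counter(label for _, label in documents)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in documents:
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(words, model):
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Log prior: log P(class).
        score = math.log(count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            # Log likelihood: log P(word | class), with add-one smoothing
            # (an assumption of this sketch). Words are treated as
            # independent given the class, so their log terms simply add.
            likelihood = (word_counts[label][word] + 1) / (
                total_words + len(vocabulary)
            )
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Illustrative (made-up) training messages.
training = [
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting at noon".split(), "ham"),
    ("lunch meeting tomorrow".split(), "ham"),
]
model = train_naive_bayes(training)
spam_guess = predict("free money".split(), model)
ham_guess = predict("meeting tomorrow".split(), model)
```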

Principal Component Analysis (PCA)

  • PCA is a dimensionality reduction technique that reduces the number of features in a dataset while retaining as much variance as possible.
  • It transforms original features into principal components, which are linear combinations of the original features.
  • Principal components are the directions of maximum variance, with the first few capturing most of the data's variance.
  • PCA is a linear transformation and may not perform well on data with non-linear relationships.
  • PCA assumes variance indicates information and is effective when data contains correlated features.
  • Advantages include reduced dimensionality, improved model performance, removal of correlated features, enhanced visualization, and data compression.
  • Disadvantages include the inability to capture complex non-linear relationships, sensitivity to outliers, and high computational cost for large datasets.
  • It reduces computational complexity and noise by transforming data into orthogonal components.
  • It works best when the relationships between features are approximately linear and the features are correlated; it is sensitive to outliers and less effective when the data has significant non-linear structure.
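  • The covariance matrix describes the relationships between all pairs of measurements.
  • A covariance of larger magnitude indicates a stronger linear relationship between two measurements (for measurements on comparable scales), while zero covariance means the two are linearly uncorrelated.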
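
The steps above (center the data, build the covariance matrix, find the direction of maximum variance, project) can be sketched without any libraries. The five 2D points are made up, and power iteration is used here only as a simple stand-in for a full eigendecomposition; it recovers just the first principal component.

```python
import math


def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centered data."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    return cov, centered


def first_principal_component(cov, iterations=100):
    """Power iteration: converges to the eigenvector with the largest eigenvalue."""
    d = len(cov)
    vector = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[a][b] * vector[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    # Variance of the data along this direction (the eigenvalue).
    eigenvalue = sum(
        vector[a] * sum(cov[a][b] * vector[b] for b in range(d))
        for a in range(d)
    )
    return vector, eigenvalue


# Illustrative (made-up) data with two strongly correlated features.
data = [[2.0, 1.9], [4.0, 4.1], [6.0, 6.2], [8.0, 7.8], [10.0, 10.0]]

cov, centered = covariance_matrix(data)
component, variance_along_component = first_principal_component(cov)

# Project each 2D point onto the first principal component (2 features -> 1).
projected = [sum(x * c for x, c in zip(row, component)) for row in centered]

total_variance = cov[0][0] + cov[1][1]
explained_ratio = variance_along_component / total_variance
```

Because the two features are almost perfectly correlated, `explained_ratio` is close to 1, which is the situation in which dropping the second component loses very little information.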

Applying Machine Learning Algorithms

  • When a learning algorithm yields large prediction errors, try getting more training samples or adding additional features.
  • A diagnostic test can provide insight into what is and is not working in the algorithm and how to improve its performance. Diagnostics can be time-consuming to implement, but they can benefit performance in the long run.
  • If a learning algorithm suffers from high bias, getting more training data will not help much.
  • If a learning algorithm suffers from high variance, getting more training data is likely to help.
  • If a learning algorithm makes large errors, try getting more training samples (fixes high variance), smaller sets of features (fixes high variance), or additional features (fixes high bias).
  • Small neural networks with fewer parameters are more prone to underfitting but are computationally cheaper.
  • Large, more complex neural networks with more parameters are more prone to overfitting and are computationally more expensive.
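
One rough way to turn the rules above into a diagnostic is to compare training error with validation error. The sketch below is only illustrative: the function name `diagnose`, the thresholds `acceptable_error` and `gap_tolerance`, and the sample error values are hypothetical and would need tuning for a real problem.

```python
def diagnose(train_error, validation_error,
             acceptable_error=0.10, gap_tolerance=0.05):
    """Rough bias/variance diagnosis from two error rates.

    The thresholds are illustrative assumptions, not standard values.
    """
    if train_error > acceptable_error:
        # The model cannot even fit the training data: underfitting.
        return "high bias", ["add features", "increase model complexity"]
    if validation_error - train_error > gap_tolerance:
        # The model fits the training data but does not generalize: overfitting.
        return "high variance", [
            "get more training samples",
            "use a smaller set of features",
        ]
    return "ok", []


underfit = diagnose(train_error=0.30, validation_error=0.32)
overfit = diagnose(train_error=0.02, validation_error=0.25)
healthy = diagnose(train_error=0.05, validation_error=0.07)
```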
