## 30 Questions

What is the primary issue with extracting only pixel values as features?

High dimensionality

Which type of dataset is logistic regression and softmax regression suitable for?

Linearly separable data

What is the primary advantage of a single perceptron?

Simplicity and powerful problem-solving capabilities

What is the primary limitation of decision trees and support vector machines for image classification?

Inability to handle unstructured data

What is the name of the neural network architecture described in the lecture?

Fully Connected Neural Network

What is the inspiration for the design of perceptrons?

Biological neurons

What is the limitation of using a single layer of perceptrons?

It is unable to find the optimal boundary that separates classes

What is the main advantage of using a multi-layer perceptron over a single layer perceptron?

It can learn non-linear decision boundaries

What is the characteristic of a fully connected neural network?

Every edge has its own weight value

Why is a single layer neural network called a single layer?

Because it has only one layer of neurons

What is the problem with using a single layer of perceptrons to solve the XOR problem?

It is unable to solve the XOR problem because of its linear nature

What is the condition for a perceptron to converge to an optimal solution?

The problem has a linearly separable solution

What is the one hot encoded representation of the image of a cat?

[1, 0, 0]

Why is the loss function high in the given example?

The model is confused between the image being a cat or a dog.

What is the purpose of computing gradients in a multi-layer perceptron?

To update the weights of the network

What is the role of forward and backward propagation in a multi-layer perceptron?

Forward and backward propagation are used together to update the weights.

What is the example application mentioned in the context of multi-layer perceptron?

Hand Written Digit Recognition

What is the formula for calculating the loss function?

C = −Σᵢ yᵢ log(ŷᵢ)

What is the main purpose of the error function E in a neural network?

To define the error between the desired output and calculated output

What does the subscript k denote in a neural network?

The output layer

What is the main difference between stochastic gradient descent and batch gradient descent?

The amount of data used to compute the gradient

What is the trade-off in using more data to compute the gradient of the objective function?

Increased accuracy but slower convergence

What is the update rule for batch gradient descent?

w = w − α ∇w J(w)

What is the main advantage of using mini-batch gradient descent?

A trade-off between computation speed and accuracy

What is the purpose of annealing in machine learning?

To reduce the learning rate according to a pre-defined schedule

What is the recommended approach when the error stops decreasing during mini-batch learning?

Turn down the learning rate to reduce fluctuations in the final weights

What is the benefit of using a separate validation set to monitor the error during training?

It provides a more accurate estimate of the model's performance on unseen data

What is the approach described in the lecture for adjusting the learning rate during mini-batch gradient descent?

Guess an initial learning rate and adjust it based on the error

What is the purpose of reducing the learning rate towards the end of mini-batch learning?

To reduce the fluctuations in the final weights

What is the recommended strategy when the error is falling fairly consistently but slowly during training?

Increase the learning rate to speed up convergence

## Study Notes

### Challenges in Image Classification

- Challenge 1: Extraction of Features: High-dimensional dataset (e.g., 784 columns for 28x28 images) makes it difficult to extract relevant features.
- Challenge 2: Non-Linear Decision Boundary: Logistic Regression is limited to linearly separable data, and more advanced algorithms like Decision Trees and SVM are not suitable for unstructured datasets like images.

### Architecture of Neural Networks

- Fully Connected Neural Network
- Single Unit of Perceptron (Neuron): inspired by human neurons, with the same representation capacity as logistic regression.

### Limitations of Single Unit of Neuron

- Limited to linearly separable solutions, cannot find optimal boundary for non-linearly separable data
- XOR problem is an example of a limitation
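The XOR limitation can be demonstrated directly. The sketch below trains a single perceptron with the classic perceptron learning rule on the XOR truth table; the epoch count and zero initialization are arbitrary illustrative choices. Because no line separates XOR's classes, the weights never converge and accuracy stays below 100%.

```python
import numpy as np

# XOR dataset: not linearly separable, so a single perceptron cannot fit it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Perceptron learning rule (illustrative sketch; hyperparameters are arbitrary).
w = np.zeros(2)
b = 0.0
for epoch in range(100):
    for xi, yi in zip(X, y):
        pred = int(w @ xi + b > 0)
        # Update weights only on a misclassification.
        w += (yi - pred) * xi
        b += (yi - pred)

preds = (X @ w + b > 0).astype(int)
accuracy = (preds == y).mean()
# Accuracy cannot reach 1.0: no linear separator exists for XOR.
print(accuracy)
```

Adding a hidden layer (a multi-layer perceptron) removes this limitation, since the hidden units can carve out a non-linear decision boundary.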

### Multi-Layer Neural Network

- Putting neurons in layers to create a Multi-Layer Neural Network
- Single Layer Fully Connected Neural Network: every edge has its own weight value

### Neural Architecture: Single Layer Neural Network

- Fully Connected Network: each neuron is connected to every neuron in the adjacent (previous and next) layers
- Single Layer: only one layer of neurons between input and output
- Number of neurons in each layer: depends on the problem and dataset

### Neural Architecture: Multi-Layer Neural Network

- Architecture: Multi-Layer Fully Connected Neural Network

### One Hot Encoding

- Representing categorical output labels as binary vectors (one hot encoding)
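A minimal sketch of one-hot encoding; the class ordering ["cat", "dog", "horse"] is an assumed example chosen to match the quiz answer [1, 0, 0] for a cat image.

```python
import numpy as np

# Assumed class ordering for illustration: index 0 = cat, 1 = dog, 2 = horse.
classes = ["cat", "dog", "horse"]

def one_hot(label):
    # Binary vector with a single 1 at the position of the label's class.
    vec = np.zeros(len(classes), dtype=int)
    vec[classes.index(label)] = 1
    return vec

print(one_hot("cat"))  # [1 0 0]
```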

### Loss Function

- Calculates the difference between predicted output and actual output
- Goal: minimize the loss function
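The cross-entropy loss C = −Σᵢ yᵢ log(ŷᵢ) can be sketched as follows; the two example probability vectors are assumptions, chosen to mirror the quiz's "confused between cat and dog" scenario.

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    # C = -sum_i y_i * log(yhat_i); eps guards against log(0).
    return -np.sum(y * np.log(y_hat + eps))

y = np.array([1, 0, 0])                  # one-hot target: cat
confident = np.array([0.9, 0.05, 0.05])  # model is sure it is a cat
confused = np.array([0.4, 0.45, 0.15])   # model torn between cat and dog

print(cross_entropy(y, confident))  # small loss
print(cross_entropy(y, confused))   # larger loss
```

The confused prediction yields a higher loss, which is exactly why the loss is high in the quiz example above.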

### Computing Gradients

- Forward and Backward Propagations: used to learn weights in Multi-Layer Networks
- Forward Propagation: computes the output of the network
- Backward Propagation: computes the gradients of the loss function with respect to the weights
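The two passes can be sketched for a one-hidden-layer network. The sizes (2 inputs, 3 hidden units, 1 output), sigmoid activations, squared error, and learning rate below are all illustrative assumptions, not the lecture's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))   # input (2) -> hidden (3)
W2 = rng.normal(size=(3, 1))   # hidden (3) -> output (1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])      # example input
t = np.array([1.0])            # desired output

# Forward propagation: compute the network output layer by layer.
h = sigmoid(x @ W1)
y = sigmoid(h @ W2)

# Backward propagation: chain rule gives gradients of the squared error
# with respect to each weight matrix.
delta2 = (y - t) * y * (1 - y)           # output-layer error term
dW2 = np.outer(h, delta2)
delta1 = (delta2 @ W2.T) * h * (1 - h)   # error propagated to hidden layer
dW1 = np.outer(x, delta1)

# Gradient descent update with an assumed learning rate.
lr = 0.1
W1 -= lr * dW1
W2 -= lr * dW2
```

One forward pass produces the prediction; one backward pass produces the gradients; the update step uses both, which is why the two propagations are always used together.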

### Backpropagation Algorithm

- Computes gradients of the loss function with respect to the weights
- Used to update the weights in the network

### Variants of Gradient Descent

- Batch Gradient Descent: uses the entire dataset to compute the gradient
- Stochastic Gradient Descent: uses a single data point to compute the gradient
- Mini-Batch Gradient Descent: uses a subset of the dataset to compute the gradient
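The three variants differ only in how many rows feed each gradient step. A mini-batch sketch on linear regression, where the dataset, learning rate, and batch size are illustrative assumptions: setting the batch to the full dataset gives batch GD, and a batch of one gives stochastic GD.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=100)

def gradient(w, Xb, yb):
    # Gradient of mean squared error over the (mini-)batch.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(3)
lr, batch_size = 0.1, 10
for epoch in range(50):
    idx = rng.permutation(len(X))  # shuffle each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        # batch GD: b = all indices; SGD: b = one index; mini-batch: a subset.
        w -= lr * gradient(w, X[b], y[b])

print(w)  # close to the true weights [1.0, -2.0, 0.5]
```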

### Batch Gradient Descent

- Computes the gradient of the cost function with respect to the parameters for the entire training set
- Update rule: subtracts the product of the learning rate and the gradient from the current weights
- Annealing: reducing the learning rate according to a pre-defined schedule or threshold
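Annealing on a pre-defined schedule can be sketched with simple step decay; the initial rate, decay factor, and step interval below are assumed values for illustration, not ones from the lecture.

```python
def annealed_lr(initial_lr, epoch, decay=0.5, every=10):
    # Halve the learning rate every `every` epochs (step-decay schedule).
    return initial_lr * (decay ** (epoch // every))

# The rate shrinks on schedule, reducing fluctuations in the final weights.
for epoch in (0, 10, 20, 30):
    print(epoch, annealed_lr(0.1, epoch))
```

Other common schedules (exponential decay, reduce-on-plateau when the validation error stops falling) follow the same idea: large steps early, small steps late.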

This quiz covers the challenges that motivate neural networks for image classification (feature extraction from high-dimensional data and non-linear decision boundaries), multi-layer perceptrons and backpropagation, and the variants of gradient descent used to train them.
