Questions and Answers
Given the signature descriptor of an unknown shape, where the distance from the centroid is constant for every angle, what shape is it?
- Circle (correct)
- Ellipse
- Straight Line
- Square
Which transformation is used to extract features for measuring the smoothness, coarseness, and regularity of a region?
- Gabor Transformation
- Both Gabor and Wavelet Transformation (correct)
- Wavelet Transformation
- Fourier Transformation
Suppose a shape's Fourier descriptor has K coefficients. If the last few coefficients are removed, using only the first m (m<K) to reconstruct the shape, what effect will this truncation have?
- The full shape will be reconstructed without information loss
- A smoothed boundary version of the shape (correct)
- Low-frequency components will be removed
- Only fine details of the boundary of the shape
When computing a polygonal descriptor of an arbitrary shape using a splitting technique, what is the typical starting guess?
Consider a two-class Bayes' Minimum Risk Classifier. The probabilities of classes W1 and W2 are P(W1) = 0.3 and P(W2) = 0.7, respectively. P(x) = 0.545, P(x|W1) = 0.65, P(x|W2) = 0.5, and the loss matrix values are given. If the classifier assigns x to class W1, which relationship must be true?
Which of the following provides the correct formula for the Fourier transform $a(u)$ of a complex sequence $s(k)$ for $k = 0, ..., N-1$?
For a given gray co-occurrence matrix C of an image, what does the maximum probability descriptor represent?
Which of the following is considered a region descriptor, rather than a boundary descriptor?
Which type of information is typically extracted using a gray co-occurrence matrix?
If the larger values of a gray co-occurrence matrix are concentrated around the main diagonal, which statement is most likely to be true?
If you are solving an n-class problem using discriminant functions, how many discriminant functions will you need?
If the discriminant function $g_i(x)$ is chosen to be a function $f$ of the posterior probability $p(w_i|x)$, i.e., $g_i(x) = f(p(w_i|x))$, which of the following cannot be the function $f$?
Given that all classes have equal class probabilities, what will be the nature of the decision surface when the covariance matrices of different classes are identical but otherwise arbitrary?
The mean and variance of samples from two normally distributed classes, w1 and w2, are given. What is the expression for the decision boundary between these two classes for an input sample x, if both classes have equal class probability 0.5?
For a two-class problem, the linear discriminant function is given by $g(x) = a^T y$. If y is the augmented feature vector, what is the updating rule for finding the weight vector a?
Which condition must be satisfied for a minimum distance classifier?
Which of the following is the updating rule of the gradient descent algorithm, where ∇ is the gradient operator and η is the learning rate?
In the k-nearest neighbors (k-NN) algorithm, how do we classify an unknown object?
What is the direction of the weight vector with respect to the decision surface for a linear classifier?
What is the distance of the point P = (-3, 1, 3) from the plane defined by 2x + 2y + 5z + 9 = 0?
What is the typical shape of the loss landscape during the optimization of an SVM?
How many local minima can be encountered while solving the optimization problem for maximizing the margin in an SVM?
Which of the following classifiers can be replaced by a linear SVM?
Find the scalar projection of vector b = <-2, 3> onto vector a = <1, 2>.
For a 2-class problem, what is the minimum possible number of support vectors needed for an SVM, assuming there are more than 4 examples from each class?
Which of the following is a valid representation of hinge loss (of margin = 1) for a two-class problem, where y is the class label (+1 or -1) and p is the predicted value?
Suppose we have one feature x ∈ R and binary class y. The dataset consists of 3 points: p1: (x1, y1) = (-1, -1), p2: (x2, y2) = (1, 1), p3: (x3, y3) = (3, 1). Which of the following is true with respect to SVM?
If we employ an SVM to realize two-input logic gates, which of the following will be true?
In a max-margin linear SVM, what will happen to the margin length if one of the non-support-vector training examples is removed?
A given cost function is of the form $J(\theta) = \theta^2 - \theta + 2$. With a learning rate α = 0.01, what is the weight update rule for gradient descent at step t+1?
In which of the following graphs will gradient descent not work correctly?
The following two images are graphs from gradient descent optimization. One is batch gradient descent, the other is stochastic gradient descent. Which correctly identifies the graph types?
For a cost function $J(\theta) = 0.25\theta^2$ as shown in the graph below, where will a weight update be the greatest?
Which logic functions can be performed using a 2-layered neural network?
Let X and Y be two features used to discriminate between two classes (values and classes given). What is the minimum number of neuron layers required to design the neural network classifier?
What is the range of values for a logistic function?
Estimate the number of weights to be learned by a neural network with 3 inputs, 2 output classes, and one hidden layer.
For the XNOR function shown, what will the output be when 1 is the input for $X_1$ and 0 is the input for $X_2$?
Which activation function is more prone to the vanishing gradient problem?
Suppose a fully-connected neural network has a single hidden layer with 30 nodes. The input is represented by a 3D feature vector and we have a binary classification problem. Calculate the number of parameters of the network, assuming there are NO bias nodes in the network.
For a binary classification setting, if the probability of belonging to class +1 is 0.22, what is the probability of belonging to class -1?
What will be the output of the vector [2, 4, 6] when it is run through the SoftMax activation?
A 3-input neuron has weights 1, 0.5, 2. The transfer function is linear, with the constant of proportionality being equal to 2. The inputs are 2, 20, 4, respectively. What will the output be?
Which of the following activation functions is NOT analytically differentiable for all real values of the input?
Flashcards
Signature descriptor shape identification
Distance from the centroid to the boundary is the same for every value of θ; true for a circle.
Texture feature extraction
Gabor and Wavelet transformations extract smoothness, coarseness, and regularity.
Fourier descriptor components
Low-frequency components capture the general shape; high-frequency components capture finer details.
Polygonal descriptor starting guess
Non-boundary descriptor
Texture information extraction
Gray co-occurrence concentration
Discriminant function selection
Identical covariance matrices decision surface
Linear discriminant update rule
Minimum distance classifier
Gradient descent
Unequal prior probabilities effect
k-NN classification
Weight vector direction
SVM objective
SVM: minimum support vectors
Support vector removal
Non-support vector effects
Weight update rule
Gradient descent failure
Greatest weight update
Two-layer neural net
ReLU
Denoising autoencoder use
Undercomplete autoencoder
Not analytically differentiable
Impulse response
Autoencoder parameters
Vectors forming principal components
Preventing underfitting
Batch normalization
Study Notes
Week 1 - Assignment 1
- A signature descriptor where the distance from the centroid to the boundary is the same for every angle indicates a circle.
- The texture content used in region descriptions helps discern image properties such as smoothness, coarseness, and regularity.
- Gabor filters and Wavelet transformations are used to extract texture features.
- When using truncated Fourier descriptors to reconstruct a shape, removing high-frequency components results in a smoothed boundary version of the shape.
- In computing polygonal descriptors using splitting techniques, the starting guess is a vertex joining the two farthest points on the boundary.
- Bayes' Minimum Risk Classifier involves probabilities of classes and conditional probabilities.
- The classifier assigns x to class W1 if (λ21 – λ11) / (λ12 – λ22) > 1.79, which follows from requiring R(α1|x) < R(α2|x), i.e. (λ21 – λ11)P(W1|x) > (λ12 – λ22)P(W2|x); with the given numbers, P(x|W2)P(W2) / (P(x|W1)P(W1)) = (0.5 × 0.7)/(0.65 × 0.3) ≈ 1.79.
- The Fourier transform of a complex sequence of numbers is expressed by $a(u) = \sum_{k=0}^{N-1} s(k)\, e^{-j2\pi uk/N}$ (see the sketch after this list).
- Maximum probability = maxᵢⱼ(cᵢⱼ), where cᵢⱼ are the entries of the normalized co-occurrence matrix.
- A histogram is a region descriptor, not a boundary descriptor.
- To determine the textural content of an image region, features from the gray co-occurrence matrix are used.
- If larger values of a gray co-occurrence matrix are concentrated around the main diagonal, the value of the element difference moment will be low.
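As a concrete illustration of the truncation point above, here is a minimal NumPy sketch (the noisy-circle boundary and the choice of m are illustrative, not taken from the assignment) that keeps only the lowest-frequency Fourier descriptors a(u) and reconstructs a smoothed boundary:

```python
import numpy as np

def smooth_boundary(x, y, m):
    """Keep only the m lowest-frequency Fourier descriptors of the
    boundary s(k) = x(k) + j*y(k) and reconstruct a smoothed shape."""
    s = x + 1j * y                      # complex boundary sequence s(k)
    a = np.fft.fft(s)                   # Fourier descriptors a(u)
    kept = np.zeros_like(a)
    half = m // 2
    kept[:half + 1] = a[:half + 1]      # DC term + low positive frequencies
    if half > 0:
        kept[-half:] = a[-half:]        # matching low negative frequencies
    s_hat = np.fft.ifft(kept)           # inverse transform of the truncated set
    return s_hat.real, s_hat.imag

# Illustrative boundary: a noisy circle sampled at 256 points.
t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
x = np.cos(t) + 0.05 * np.random.randn(256)
y = np.sin(t) + 0.05 * np.random.randn(256)
xs, ys = smooth_boundary(x, y, 16)      # smoothed version of the boundary
```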
Week 2 - Assignment 2
- For an n-class problem, n discriminant functions are needed.
- If gᵢ(x) = f(p(wᵢ|x)), f() must be a monotonically increasing function.
- With identical covariance matrices and equal class probabilities, the decision surface is a hyperplane that passes through the midpoint of the line joining the two means but is generally not orthogonal to that line.
- The general-case discriminant function for normal densities is evaluated with the matrices μ₁ = [1 3], Σ₁ = [1/2 0; 0 1/2] and μ₂ = [3 -2], Σ₂ = [1/2 0; 0 1/2].
- The resulting decision boundary is x₂ = 3.514 – 1.12x₁ + 0.187x₁².
- The updating rule for finding the weight vector a is a(k + 1) = a(k) + ηΣy, where the sum runs over the misclassified augmented feature vectors y (see the sketch after this list).
- For a minimum distance classifier, all classes should have equal class probability.
- Gradient descent minimizes a function iteratively by moving in the direction of steepest descent as defined by a negative gradient. The updating rule for this algorithm is aₙ₊₁ = aₙ - η∇F(aₙ).
- With unequal prior probabilities, the optimal boundary hyperplane is shifted away from the more likely mean.
- In k-NN, an unknown object is classified by assigning the label most frequent among the k nearest training samples.
- The weight vector is normal to the decision surface for a linear classifier.
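A minimal sketch of the batch update rule a(k + 1) = a(k) + ηΣy referenced above, under the assumption that samples are stored one augmented row vector per row and labelled ±1 (the toy data is illustrative):

```python
import numpy as np

def perceptron_batch(Y, labels, eta=1.0, epochs=100):
    """Batch perceptron on augmented feature vectors:
    a(k+1) = a(k) + eta * sum of currently misclassified samples."""
    # Negate the samples of the negative class so a single inequality
    # a^T y > 0 covers both classes.
    Y = np.where(labels[:, None] > 0, Y, -Y)
    a = np.zeros(Y.shape[1])
    for _ in range(epochs):
        mis = Y[Y @ a <= 0]                 # misclassified samples
        if len(mis) == 0:
            break                           # linearly separable: done
        a = a + eta * mis.sum(axis=0)       # summed correction
    return a

# Illustrative 2-D data: augmented vectors [1, x1, x2], labels +1 / -1.
Y = np.array([[1, 2, 2], [1, 3, 3], [1, -1, -2], [1, -2, -1]], dtype=float)
labels = np.array([1, 1, -1, -1])
print(perceptron_batch(Y, labels))
```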
Week 3 - Assignment 3
- For the 3D point P = (-3, 1, 3) and the plane 2x + 2y + 5z + 9 = 0, the distance is |2·(-3) + 2·1 + 5·3 + 9| / √(2² + 2² + 5²) = 20/√33 ≈ 3.5 (see the sketch after this list).
- During SVM optimization, the loss landscape has a paraboloid shape.
- In an SVM, the objective is to find a maximum-margin hyperplane W such that Wᵀx + b ≥ 1 for class +1 and Wᵀx + b ≤ -1 for the other class; maximizing the margin amounts to minimizing ||W||.
- The margin-maximization problem of an SVM is convex, so at most 1 local minimum (the global one) can be encountered while solving the optimization.
- A linear SVM can replace logistic regression, since the logistic regression framework belongs to the family of linear classifiers; its decision boundary can segregate the classes only if they are linearly separable.
- The scalar projection of vector b onto vector a is (b⋅a)/||a||; for b = <-2, 3> and a = <1, 2> this is 4/√5 ≈ 1.79.
- For a 2-class problem, the minimum possible number of support vectors is 2. We need at least one example/support vector from each class to determine a separating hyperplane.
- Hinge loss (with margin 1) is max(0, 1 − y·p); it yields 0 when the predicted output p has the same sign as the class label y and satisfies the margin condition y·p ≥ 1.
- With one feature x ∈ R and binary class y, the dataset contains three points; with an SVM, the maximum margin increases if the point p2 is removed from the training set.
- When employing an SVM to realize two-input logic gates, the margin for the AND and OR gates is the same.
- If one non-support-vector training example is removed, the margin of a max-margin linear SVM is unaltered: the separating hyperplane is determined only by the support vectors among the training examples.
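A short sketch reproducing two of the computations in this list, the point-to-plane distance and the hinge loss; the numeric values are the ones quoted above:

```python
import numpy as np

def point_plane_distance(p, w, b):
    """Distance from point p to the plane w.x + b = 0: |w.p + b| / ||w||."""
    return abs(np.dot(w, p) + b) / np.linalg.norm(w)

def hinge_loss(y, p):
    """Hinge loss with margin 1: zero once y * p >= 1."""
    return max(0.0, 1.0 - y * p)

print(round(point_plane_distance(np.array([-3, 1, 3]),
                                 np.array([2, 2, 5]), 9), 2))   # 3.48 (~3.5)
print(hinge_loss(+1, 2.0), hinge_loss(+1, 0.3))                 # 0.0 0.7
```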
Week 4 - Assignment 4
- For a cost function of the form J(θ) = θ² - θ + 2 with learning rate α = 0.01, the gradient descent update at step t+1 is θ(t+1) = θ(t) - 0.01(2θ(t) - 1), since ∂J/∂θ = 2θ - 1 (see the sketch after this list).
- Gradient descent can get stuck in the saddle point of the second graph.
- The graph of cost vs epochs is quite smooth for batch gradient descent because the algorithm averages over all the gradients of training data for a single step unlike the fluctuations in stochastic gradient descent.
- The red point on the cost graph farthest from the origin indicates the greatest weight update, because the update is directly proportional to the magnitude of the gradient of the cost function; in this case the gradient is ∂J(θ)/∂θ = 0.5θ.
- A two-layered neural network can be used for any type of logic gate (linear or non-linear) implementation.
- With two features X and Y used to discriminate between two classes, the minimum number of neuron layers required is 1, since a single layer can perform the classification task when the feature points are linearly separable.
- The range of a logistic function is between 0 and 1.
- A neural network with 3 inputs, 2 output classes, and a hidden layer of 5 neurons has 32 weights (including biases): (#inputs + 1) × #hidden + (#hidden + 1) × #classes = 4 × 5 + 6 × 2 = 32.
- The output of the XNOR function with X₁ = 1 and X₂ = 0 is 0, after applying the activation function f(x).
- The tanh activation function is more prone to the vanishing gradient problem.
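A small sketch of the gradient descent update quoted above for J(θ) = θ² − θ + 2 (the starting value and iteration count are arbitrary), together with the 32-weight count:

```python
# Gradient descent on J(theta) = theta**2 - theta + 2, whose gradient is 2*theta - 1.
alpha, theta = 0.01, 0.0
for _ in range(1000):
    theta -= alpha * (2 * theta - 1)     # theta(t+1) = theta(t) - alpha * dJ/dtheta
print(round(theta, 3))                   # approaches the minimiser theta = 0.5

# Weight count from the notes (3 inputs, 5 hidden, 2 classes, with biases):
print((3 + 1) * 5 + (5 + 1) * 2)         # 32
```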
Week 5 - Assignment 5
- A fully connected neural network with a single hidden layer of 30 nodes, a 3-dimensional input feature vector, a single output for binary classification, and no bias nodes has 120 parameters: (3 × 30) + (30 × 1) = 120.
- For a binary classification setting, if the probability of belonging to class +1 is 0.22, the probability of belonging to class -1 is 0.78: a single output node can denote the probability p of belonging to class +1, so the probability of belonging to class -1 is 1 - p, since the two classes are mutually exclusive.
- With input [2, 4, 6] to the SoftMax activation, the output is approximately [0.016, 0.117, 0.867] by the definition of softmax (see the sketch after this list).
- A 3-input neuron with weights 1, 0.5, 2, inputs 2, 20, 4, and a linear transfer function with constant of proportionality 2 produces an output of 2 × (1×2 + 0.5×20 + 2×4) = 40.
- The ReLU activation function is NOT analytically differentiable for all real values of the input: ReLU(x) is not differentiable at x = 0, where x is the input to the ReLU layer.
- The given perceptron realizes a NOR function: in the figure, when either i1 or i2 is 1 the output is 0, and when both i1 and i2 are 0 the output is 1, which is NOR logic.
- The size of the weight matrix between any layer 1 and layer 2 is (#nodes in layer 1) × (#nodes in layer 2).
- A fully connected neural network with input, one hidden, and output layers of 40, 2, and 1 nodes respectively (no biases) has (40 × 2) + (2 × 1) = 82 learnable parameters, since every node of one layer is connected to every node of the next.
- For a 10-class neural network classifier, such as classifying a cat image into its breed, cross-entropy loss is well suited.
- A fully-connected neural network with five hidden layers of 10 units each, a 20-dimensional input, and a scalar output (no bias) has (20 × 10) + 4 × (10 × 10) + (10 × 1) = 610 trainable parameters.
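A brief sketch checking the softmax output and the no-bias parameter counts quoted in this list (NumPy only; the rounding is mine):

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())              # subtract the max for numerical stability
    return e / e.sum()

print(np.round(softmax([2, 4, 6]), 3))   # [0.016 0.117 0.867]

# No-bias parameter counts from the notes:
print(3 * 30 + 30 * 1)                   # 120 (3-30-1 network)
print(40 * 2 + 2 * 1)                    # 82  (40-2-1 network)
print(20 * 10 + 4 * 10 * 10 + 10 * 1)    # 610 (20-10-10-10-10-10-1 network)
```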
Week 6 - Assignment 6
- In a neural network where X = a + b and Y = X × c, the gradients are ∂Y/∂a = c, ∂Y/∂b = c, and ∂Y/∂c = a + b; for the given values this evaluates to (-4, -4, 5).
- dy/da = 1 and dy/db = 0 for y = max(a, b) and a > b.
- PCA reduces the dimension by finding a few Orthogonal linear combinations.
- After centering the X matrix and computing the unit-length principal component directions of X, the algorithm would choose option A or D among the plotted choices (see the sketch after this list for how such directions are computed).
- The FALSE statement about PCA and autoencoders is that PCA works well with non-linear data while autoencoders are best suited for linear data; in fact PCA is a linear technique, whereas autoencoders can capture non-linear relationships.
- Concerning the backpropagation rule, the gradients of the final layer weights are calculated first.
- For PCA, the following are true:
- Rotates the axes to lie along the principal components
- Is calculated from the covariance matrix
- Removes some information from the data
- A no-bias autoencoder with 100 input neurons, 10 hidden neurons, and a single hidden layer has (100 × 10) + (10 × 100) = 2000 parameters.
- The vectors {2, 3, 1} and {3, 1, -9} can form the first two principal components because they are orthogonal (their dot product is 0).
- Let vectors a = {2, 4} and b = {n, 1} form the first two principal components after applying PCA; then n = -2 is a possible value, since orthogonality requires 2n + 4 = 0.
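A minimal sketch of how unit-length principal component directions are obtained from the covariance matrix (the function is generic; the two dot products are the orthogonality checks from this list):

```python
import numpy as np

def pca_directions(X):
    """Unit-length principal component directions of X (rows are samples),
    ordered by decreasing explained variance."""
    Xc = X - X.mean(axis=0)               # center the data
    cov = np.cov(Xc, rowvar=False)        # covariance matrix
    vals, vecs = np.linalg.eigh(cov)      # eigenvectors are already unit length
    return vecs[:, np.argsort(vals)[::-1]]

# Orthogonality checks from the notes: both dot products are zero.
print(np.dot([2, 3, 1], [3, 1, -9]))     # 0
print(np.dot([2, 4], [-2, 1]))           # 0, so n = -2 is a valid choice
```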
Week 7 - Assignment 7
- A sparse autoencoder introduces an information bottleneck without reducing the number of nodes; the idea is to encourage the network to activate only a small number of neurons. Of the paired statements in the assignment, statement 1 is false and statement 2 is true.
- A denoising autoencoder works on corrupted versions of the input: the loss is computed between the original input and the reconstruction from the noisy version, and it can be used as a tool for feature extraction, so both statements are true.
- The autoencoder design that uses corrupted versions of the input is the denoising design.
- The autoencoder design that uses a hidden layer with fewer units than the input layer is the undercomplete design.
- Regarding autoencoders, the true statements are:
- possesses generalization capabilities
- to minimize the reconstruction loss so output is similar to input
- compresses the input into a latent space representation and then reconstruct the output from it
- For d(t − 34) * x(t + 56), where d(t) is the delta function and * denotes convolution: convolving with a shifted delta simply shifts the signal, giving x(t + 22) (see the sketch after this list).
- For a linear time-invariant (LTI) system, the impulse response is the output due to an impulse applied at time 0.
- The impulse function equals 1 at t = 0 (and 0 elsewhere).
- The denoising autoencoder is the variant of autoencoder illustrated by the figure with row 1: original input, row 2: noisy input, row 3: reconstructed output.
- Reconstructing the original noise-free data from a noisy input is the task of a denoising autoencoder.
- Contractive autoencoders penalize instances where a small change in the input leads to a large change in the encoding space.
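A quick discrete check of the shift property discussed above, namely that convolving a sequence with a delayed unit impulse only shifts the sequence (the sample values are arbitrary), plus the autoencoder parameter count from this list:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])        # arbitrary signal samples
delta = np.zeros(3)
delta[2] = 1.0                            # unit impulse delayed by 2 samples
print(np.convolve(x, delta))              # [0. 0. 1. 2. 3. 4.] -> x shifted by 2

# Parameter count of the no-bias 100-10-100 autoencoder from the notes:
print(100 * 10 + 10 * 100)                # 2000
```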
Week 8 - Assignment 8
- "There can be only 1 fully connected layer in a CNN" is the statement that is generally false about CNNs.
- Convolution output size = 60 × 60 for P = 0, I = 64, F = 5, and S = 1: the input image is a 64 × 64 matrix and the kernel/filter is 5 × 5 with stride 1 and no padding, giving (64 − 5)/1 + 1 = 60.
- Output size = 2 × 2 when a 3 × 3 filter is convolved with a 4 × 4 matrix (stride 1); use the formula ((n − f + 2P)/S + 1) × ((n − f + 2P)/S + 1) (see the sketch after this list).
- In a CNN with Layer 1 (filter 3 × 3, 10 filters, stride 1, padding 0), Layer 2 (filter 5 × 5, 20 filters, stride 2, padding 0), and Layer 3 (filter 5 × 5, 40 filters, stride 2, padding 0), followed by a fully connected layer, a 3-D image input of size 39 × 39 yields a fully connected input of dimension 7 × 7 × 40 = 1960.
- Given 64 convolutional kernels of size 3 × 3 with stride 1 and no padding in the first layer of a convolutional neural network, and a 1024 × 1024 input, the next layer receives a volume of dimensions W × H × D = 1022 × 1022 × 64.
- The last fully connected layer in a CNN used to classify an image has its output passed through a soft-max.
- For a 3-channel colour image and a convolutional layer with 5 × 5 filters, the number of parameters is 7500: each filter has dimension 5 × 5 × 3 = 75 with no bias, so there are 7500/75 = 100 filters.
- The sigmoid activation function is one of the causes of vanishing gradients. Statement 1: residual networks can be a solution to the vanishing gradient problem. Statement 2: residual networks have direct (skip) connections to earlier layers. Both statements are correct, and statement 2 explains why residual networks mitigate the vanishing gradient problem.
- The softmax function has the form σ(xⱼ) = e^{xⱼ} / Σₗ e^{xₗ}; for the given input it evaluates to 0.28.
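A small helper reproducing the output-size arithmetic used throughout this list (the layer sizes are the ones stated above):

```python
def conv_out(n, f, s=1, p=0):
    """Spatial output size of a convolution: (n - f + 2p) / s + 1."""
    return (n - f + 2 * p) // s + 1

print(conv_out(64, 5))                      # 60  (64x64 input, 5x5 filter)
print(conv_out(4, 3))                       # 2   (4x4 input, 3x3 filter)

# 39x39x3 input through the three layers listed above:
d1 = conv_out(39, 3, 1)                     # 37
d2 = conv_out(d1, 5, 2)                     # 17
d3 = conv_out(d2, 5, 2)                     # 7
print(d3 * d3 * 40)                         # 1960 inputs to the fully connected layer

print(conv_out(1024, 3), "x", conv_out(1024, 3), "x 64")   # 1022 x 1022 x 64
```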
Week 9 - Assignment 9
- A very small learning rate results in slow convergence.
- For the momentum update vector Vₜ = γVₜ₋₁ + η∇J(θ), γ is the fraction that indicates acceleration (the momentum term); see the sketch after this list.
- The momentum optimizer helps accelerate stochastic gradient descent in the relevant direction, dampens oscillations, and helps in anticipating the next step.
- When using gradient descent, the weights are updated as θ = θ − α ∂J(θ)/∂θ, where α is the learning rate.
- For a cost function of the form J(θ) = θ²/25 + 6, the weight is updated as θ_new = θ − α(2θ/25), where α is the learning rate.
- If the learning rate is too large, successive iterations of gradient descent cause θ₁ and θ₂ to increase (diverge); a suitable learning rate makes them decrease toward the minimum.
- If the cost surface has a single global minimum, θ₁ and θ₂ move in the same direction however they are initialized, and the final values remain similar; exploding gradients are updates with very large values that make the network unstable, and calculating the error is the first step in implementing the backward pass.
- As training proceeds the error decreases, which indicates that learning is progressing.
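A minimal sketch of the momentum update Vₜ = γVₜ₋₁ + η·grad referenced above; the cost function J(θ) = θ², the starting point, and the hyperparameters are illustrative:

```python
def momentum_step(theta, v, grad, eta=0.1, gamma=0.9):
    """One momentum update: v_t = gamma * v_{t-1} + eta * grad(theta)."""
    v = gamma * v + eta * grad
    return theta - v, v

# Illustrative run on J(theta) = theta**2, whose gradient is 2*theta:
theta, v = 5.0, 0.0
for _ in range(200):
    theta, v = momentum_step(theta, v, 2 * theta)
print(round(theta, 4))    # approaches the minimiser 0
```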
Week 10 - Assignment 10
- Faster inference time is NOT a result of batch normalization: it increases the computational burden, which means inference takes longer.
- For batch normalization in a hidden layer with 3 units, the batch-norm mean is computed per unit by averaging that unit's activation over all samples in the mini-batch.
- To prevent underfitting, increase the number of features (the model's capacity) so that it can capture the data and fit the values.
- At test time, batch normalization uses the mean and variance estimated from the training data (the running statistics) so that the model generalizes.
- Reducing test time or computation is NOT an advantage of dropout: dropout drops random units during training but not during testing.
- When training a neural network for an image recognition task, plot the training and validation errors and apply early stopping where the validation error is lowest; randomly shuffling the pixels of an image is not a valid way to augment the data.
- The best way to approximate complex functions is to increase the number of layers in the network.
- Batch normalization normalizes a layer's inputs (zero mean, unit variance) before passing them on (see the sketch after this list).
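A minimal sketch (the scale gamma, shift beta, and the sample mini-batch are illustrative) of the normalize-then-scale-and-shift operation described above; at test time the stored training statistics would replace the batch mean and variance:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * x_hat + beta

batch = np.array([[2.0, 3.0, 2.0],
                  [4.0, 3.0, 3.0],
                  [2.0, 5.0, 2.0]])          # 3 samples x 3 hidden units
print(np.round(batch_norm(batch), 2))
```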
Week 11 - Assignment 11
- With one-hot encoding of the class labels, there cannot be two correct answers (classes) for a single sample.
- For the given signal and filter, the values of the full (non-cropped) output signal have to be computed.
- Among the challenges that make recognition problems difficult is the limited size of the dataset.
- An advantage of a fully convolutional network (FCN) is its large receptive field.
- For the given image, the output of the unpooling operation is determined by checking the values.
- A disadvantage of a CNN is its fixed input size.
- No bit = is hat if what are the number if 1 bit values.
- The Dice coefficient lies between 0 and 1 and is computed from the overlap between the predicted and ground-truth masks (see the sketch after this list).
- L2 regularization is used because it constrains the weights to a hypersphere in the high-dimensional parameter space.
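A small sketch of the Dice coefficient mentioned above (the binary masks are illustrative):

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice coefficient between two binary masks; ranges from 0 (no overlap) to 1."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum())

print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))   # 2*1 / (2+1) = 0.67
```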
Week 12 - Assignment 12
- The distribution from which the latent variable is sampled is important in a generative model.
- For a well-trained GAN, the discriminator's output on generated data approaches 50%, since it can no longer distinguish real images from fake ones; the re-parameterization trick is what makes sampling the latent variable trainable.