Questions and Answers
Given the signature descriptor of an unknown shape, where the distance from the centroid is constant for every angle, what shape is it?
- Circle (correct)
- Ellipse
- Straight Line
- Square
Which transformation is used to extract features for measuring the smoothness, coarseness, and regularity of a region?
- Gabor Transformation
- Both Gabor and Wavelet Transformation (correct)
- Wavelet Transformation
- Fourier Transformation
Suppose a shape's Fourier descriptor has K coefficients. If the last few coefficients are removed, using only the first m (m<K) to reconstruct the shape, what effect will this truncation have?
- The full shape will be reconstructed without information loss
- A smoothed boundary version of the shape (correct)
- Low-frequency components will be removed
- Only fine details of the boundary of the shape
When computing a polygonal descriptor of an arbitrary shape using a splitting technique, what is the typical starting guess?
Consider a two-class Bayes' Minimum Risk Classifier. The probabilities of classes W1 and W2 are P(W1) = 0.3 and P(W2) = 0.7, respectively. P(x) = 0.545, P(x|W1) = 0.65, P(x|W2) = 0.5, and the loss matrix values are given. If the classifier assigns x to class W1, which relationship must be true?
Which of the following provides the correct formula for the Fourier transform $a(u)$ of a complex sequence $s(k)$ for $k = 0, ..., N-1$?
For a given gray co-occurrence matrix C of an image, what does the maximum probability descriptor represent?
Which of the following is considered a region descriptor, rather than a boundary descriptor?
Which type of information is typically extracted using a gray co-occurrence matrix?
If the larger values of a gray co-occurrence matrix are concentrated around the main diagonal, which statement is most likely to be true?
If you are solving an n-class problem using discriminant functions, how many discriminant functions will you need?
If the discriminant function $g_i(x)$ is chosen to be a function $f$ of the posterior probability $p(w_i|x)$, i.e., $g_i(x) = f(p(w_i|x))$, which of the following cannot be the function $f$?
Given that all classes have equal class probabilities, what will be the nature of the decision surface when the covariance matrices of different classes are identical but otherwise arbitrary?
The mean and variance of samples from two normally distributed classes, w1 and w2, are given. What is the expression for the decision boundary between these two classes for an input sample x, if both classes have equal class probability 0.5?
For a two-class problem, the linear discriminant function is given by $g(x) = a^T y$. If y is the augmented feature vector, what is the updating rule for finding the weight vector a?
Which condition must be satisfied for a minimum distance classifier?
Which of the following is the updating rule of the gradient descent algorithm, where ∇ is the gradient operator and η is the learning rate?
In the k-nearest neighbors (k-NN) algorithm, how do we classify an unknown object?
What is the direction of the weight vector with respect to the decision surface for a linear classifier?
What is the distance of the point P = (-3, 1, 3) from the plane defined by 2x + 2y + 5z + 9 = 0?
What is the typical shape of the loss landscape during the optimization of an SVM?
How many local minima can be encountered while solving the optimization problem for maximizing the margin in an SVM?
Which of the following classifiers can be replaced by a linear SVM?
Find the scalar projection of vector b = <-2, 3> onto vector a = <1, 2>.
For a 2-class problem, what is the minimum possible number of support vectors needed for an SVM, assuming there are more than 4 examples from each class?
Which of the following is a valid representation of hinge loss (of margin = 1) for a two-class problem, where y is the class label (+1 or -1) and p is the predicted value?
Suppose we have one feature x ∈ R and binary class y. The dataset consists of 3 points: p1: (x1, y1) = (-1, -1), p2: (x2, y2) = (1, 1), p3: (x3, y3) = (3, 1). Which of the following is true with respect to SVM?
If we employ an SVM to realize two-input logic gates, which of the following will be true?
In a max-margin linear SVM, what will happen to the margin length if one of the non-support-vector training examples is removed?
A given cost function is of the form $J(\theta) = \theta^2 - \theta + 2$. With a learning rate α = 0.01, what is the weight update rule for gradient descent at step t+1?
In which of the following graphs will gradient descent not work correctly?
The following two images are graphs from gradient descent optimization. One is batch gradient descent, the other is stochastic gradient descent. Which correctly identifies the graph types?
For a cost function $J(\theta) = 0.25\theta^2$ as shown in the graph below, where will a weight update be the greatest?
Which logic functions can be performed using a 2-layered neural network?
Let X and Y be two features used to discriminate between two classes (values and classes given). What is the minimum number of neuron layers required to design the neural network classifier?
What is the range of values for a logistic function?
Estimate the number of weights to be learned by a neural network with 3 inputs, 2 output classes, and one hidden layer.
For the XNOR function shown, what will the output be when 1 is the input for $X_1$ and 0 is the input for $X_2$?
Which activation function is more prone to the vanishing gradient problem?
Suppose a fully-connected neural network has a single hidden layer with 30 nodes. The input is represented by a 3D feature vector and we have a binary classification problem. Calculate the number of parameters of the network, assuming there are NO bias nodes in the network.
For a binary classification setting, if the probability of belonging to class +1 is 0.22, what is the probability of belonging to class -1?
What will be the output of the vector [2, 4, 6] when it is run through the SoftMax activation?
A 3-input neuron has weights 1, 0.5, 2. The transfer function is linear, with the constant of proportionality being equal to 2. The inputs are 2, 20, 4, respectively. What will the output be?
Which of the following activation functions is NOT analytically differentiable for all real values of the input?
Flashcards
Signature descriptor shape identification
Distance from the centroid to the boundary is the same for every value of θ; true for a circle.
Texture feature extraction
Gabor and Wavelet transformations extract smoothness, coarseness, and regularity.
Fourier descriptor components
Low-frequency components capture the general shape; high-frequency components capture finer details.
Polygonal descriptor starting guess
Non-boundary descriptor
Texture information extraction
Gray co-occurrence concentration
Discriminant function selection
Identical covariance matrices decision surface
Linear discriminant update rule
Minimum distance classifier
Gradient descent
Unequal prior probabilities effect
k-NN classification
Weight vector direction
SVM objective
SVM: minimum support vectors
Support vector removal
Non-support vector effects
Weight update rule
Gradient descent failure
Greatest weight update
Two-layer neural net
ReLU
Denoising autoencoder use
Undercomplete autoencoder
Not analytically differentiable
Impulse response
Autoencoder parameters
Vectors forming principal components
Preventing underfitting
Batch normalization
Study Notes
Week 1 - Assignment 1
- A signature descriptor where the distance from the centroid to the boundary is the same for every angle indicates a circle.
- The texture content used in region descriptions helps discern image properties such as smoothness, coarseness, and regularity.
- Gabor filters and Wavelet transformations are used to extract texture features.
- When using truncated Fourier descriptors to reconstruct a shape, removing high-frequency components results in a smoothed boundary version of the shape.
- In computing polygonal descriptors using splitting techniques, the starting guess is a vertex joining the two farthest points on the boundary.
- Bayes' Minimum Risk Classifier involves probabilities of classes and conditional probabilities.
- The classifier assigns x to class W1 if (λ21 – λ11) / (λ12 – λ22) > 1.79, which follows from requiring R(α1|x) < R(α2|x), i.e. (λ21 – λ11)P(W1|x) > (λ12 – λ22)P(W2|x); with the given numbers, P(x|W2)P(W2) / (P(x|W1)P(W1)) = (0.5 × 0.7)/(0.65 × 0.3) ≈ 1.79.
- The Fourier transform of a complex sequence of numbers is expressed by $a(u) = \sum_{k=0}^{N-1} s(k)\, e^{-j2\pi uk/N}$ (see the sketch after this list).
- Maximum probability = maxᵢⱼ(cᵢⱼ), where cᵢⱼ are the entries of the normalized co-occurrence matrix.
- A histogram is a region descriptor, not a boundary descriptor.
- To determine the textural content of an image region, features from the gray co-occurrence matrix are used.
- If larger values of a gray co-occurrence matrix are concentrated around the main diagonal, the value of the element difference moment will be low.
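As a concrete illustration of the truncation point above, here is a minimal NumPy sketch (the noisy-circle boundary and the choice of m are illustrative, not taken from the assignment) that keeps only the lowest-frequency Fourier descriptors a(u) and reconstructs a smoothed boundary:

```python
import numpy as np

def smooth_boundary(x, y, m):
    """Keep only the m lowest-frequency Fourier descriptors of the
    boundary s(k) = x(k) + j*y(k) and reconstruct a smoothed shape."""
    s = x + 1j * y                      # complex boundary sequence s(k)
    a = np.fft.fft(s)                   # Fourier descriptors a(u)
    kept = np.zeros_like(a)
    half = m // 2
    kept[:half + 1] = a[:half + 1]      # DC term + low positive frequencies
    if half > 0:
        kept[-half:] = a[-half:]        # matching low negative frequencies
    s_hat = np.fft.ifft(kept)           # inverse transform of the truncated set
    return s_hat.real, s_hat.imag

# Illustrative boundary: a noisy circle sampled at 256 points.
t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
x = np.cos(t) + 0.05 * np.random.randn(256)
y = np.sin(t) + 0.05 * np.random.randn(256)
xs, ys = smooth_boundary(x, y, 16)      # smoothed version of the boundary
```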
Week 2 - Assignment 2
- For an n-class problem, n discriminant functions are needed.
- If gᵢ(x) = f(p(wᵢ|x)), f() must be a monotonically increasing function.
- With identical covariance matrices and equal class probabilities, the decision surface is a hyperplane that passes through the midpoint of the line joining the two means but is generally not orthogonal to that line.
- The general-case discriminant function for normal densities is evaluated with the matrices μ₁ = [1 3], Σ₁ = [1/2 0; 0 1/2] and μ₂ = [3 -2], Σ₂ = [1/2 0; 0 1/2].
- The resulting decision boundary is x₂ = 3.514 – 1.12x₁ + 0.187x₁².
- The updating rule for finding the weight vector a is a(k + 1) = a(k) + ηΣy, where the sum runs over the misclassified augmented feature vectors y (see the sketch after this list).
- For a minimum distance classifier, all classes should have equal class probability.
- Gradient descent minimizes a function iteratively by moving in the direction of steepest descent as defined by a negative gradient. The updating rule for this algorithm is aₙ₊₁ = aₙ - η∇F(aₙ).
- With unequal prior probabilities, the optimal boundary hyperplane is shifted away from the more likely mean.
- In k-NN, an unknown object is classified by assigning the label most frequent among the k nearest training samples.
- The weight vector is normal to the decision surface for a linear classifier.
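A minimal sketch of the batch update rule a(k + 1) = a(k) + ηΣy referenced above, under the assumption that samples are stored one augmented row vector per row and labelled ±1 (the toy data is illustrative):

```python
import numpy as np

def perceptron_batch(Y, labels, eta=1.0, epochs=100):
    """Batch perceptron on augmented feature vectors:
    a(k+1) = a(k) + eta * sum of currently misclassified samples."""
    # Negate the samples of the negative class so a single inequality
    # a^T y > 0 covers both classes.
    Y = np.where(labels[:, None] > 0, Y, -Y)
    a = np.zeros(Y.shape[1])
    for _ in range(epochs):
        mis = Y[Y @ a <= 0]                 # misclassified samples
        if len(mis) == 0:
            break                           # linearly separable: done
        a = a + eta * mis.sum(axis=0)       # summed correction
    return a

# Illustrative 2-D data: augmented vectors [1, x1, x2], labels +1 / -1.
Y = np.array([[1, 2, 2], [1, 3, 3], [1, -1, -2], [1, -2, -1]], dtype=float)
labels = np.array([1, 1, -1, -1])
print(perceptron_batch(Y, labels))
```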
Week 3 - Assignment 3
- For the 3D point P = (-3, 1, 3) and the plane 2x + 2y + 5z + 9 = 0, the distance is |2·(-3) + 2·1 + 5·3 + 9| / √(2² + 2² + 5²) = 20/√33 ≈ 3.5 (see the sketch after this list).
- During SVM optimization, the loss landscape has a paraboloid shape.
- In an SVM, the objective is to find a maximum-margin hyperplane W such that Wᵀx + b ≥ 1 for class +1 and Wᵀx + b ≤ -1 for the other class; maximizing the margin amounts to minimizing ||W||.
- The margin-maximization problem of an SVM is convex, so at most 1 local minimum (the global one) can be encountered while solving the optimization.
- A linear SVM can replace logistic regression, since the logistic regression framework belongs to the family of linear classifiers; its decision boundary can segregate the classes only if they are linearly separable.
- The scalar projection of vector b onto vector a is (b⋅a)/||a||; for b = <-2, 3> and a = <1, 2> this is 4/√5 ≈ 1.79.
- For a 2-class problem, the minimum possible number of support vectors is 2. We need at least one example/support vector from each class to determine a separating hyperplane.
- Hinge loss (with margin 1) is max(0, 1 − y·p); it yields 0 when the predicted output p has the same sign as the class label y and satisfies the margin condition y·p ≥ 1.
- With one feature x ∈ R and binary class y, the dataset contains three points; with an SVM, the maximum margin increases if the point p2 is removed from the training set.
- When employing an SVM to realize two-input logic gates, the margin for the AND and OR gates is the same.
- If one non-support-vector training example is removed, the margin of a max-margin linear SVM is unaltered: the separating hyperplane is determined only by the support vectors among the training examples.
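A short sketch reproducing two of the computations in this list, the point-to-plane distance and the hinge loss; the numeric values are the ones quoted above:

```python
import numpy as np

def point_plane_distance(p, w, b):
    """Distance from point p to the plane w.x + b = 0: |w.p + b| / ||w||."""
    return abs(np.dot(w, p) + b) / np.linalg.norm(w)

def hinge_loss(y, p):
    """Hinge loss with margin 1: zero once y * p >= 1."""
    return max(0.0, 1.0 - y * p)

print(round(point_plane_distance(np.array([-3, 1, 3]),
                                 np.array([2, 2, 5]), 9), 2))   # 3.48 (~3.5)
print(hinge_loss(+1, 2.0), hinge_loss(+1, 0.3))                 # 0.0 0.7
```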
Week 4 - Assignment 4
- For a cost function of the form J(θ) = θ² - θ + 2 with learning rate α = 0.01, the gradient descent update at step t+1 is θ(t+1) = θ(t) - 0.01(2θ(t) - 1), since ∂J/∂θ = 2θ - 1 (see the sketch after this list).
- Gradient descent can get stuck in the saddle point of the second graph.
- The graph of cost vs epochs is quite smooth for batch gradient descent because the algorithm averages over all the gradients of training data for a single step unlike the fluctuations in stochastic gradient descent.
- The red point on the cost graph farthest from the origin indicates the greatest weight update, because the update is directly proportional to the magnitude of the gradient of the cost function; in this case the gradient is ∂J(θ)/∂θ = 0.5θ.
- A two-layered neural network can be used for any type of logic gate (linear or non-linear) implementation.
- With two features X and Y used to discriminate between two classes, the minimum number of neuron layers required is 1, since a single layer can perform the classification task when the feature points are linearly separable.
- The range of a logistic function is between 0 and 1.
- A neural network with 3 inputs, 2 output classes, and a hidden layer of 5 neurons has 32 weights (including biases): (#inputs + 1) × #hidden + (#hidden + 1) × #classes = 4 × 5 + 6 × 2 = 32.
- The output of the XNOR function with X₁ = 1 and X₂ = 0 is 0, after applying the activation function f(x).
- The tanh activation function is more prone to the vanishing gradient problem.
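A small sketch of the gradient descent update quoted above for J(θ) = θ² − θ + 2 (the starting value and iteration count are arbitrary), together with the 32-weight count:

```python
# Gradient descent on J(theta) = theta**2 - theta + 2, whose gradient is 2*theta - 1.
alpha, theta = 0.01, 0.0
for _ in range(1000):
    theta -= alpha * (2 * theta - 1)     # theta(t+1) = theta(t) - alpha * dJ/dtheta
print(round(theta, 3))                   # approaches the minimiser theta = 0.5

# Weight count from the notes (3 inputs, 5 hidden, 2 classes, with biases):
print((3 + 1) * 5 + (5 + 1) * 2)         # 32
```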
Week 5 - Assignment 5
- A fully connected neural network with a single hidden layer of 30 nodes, a 3-dimensional input feature vector, a single output for binary classification, and no bias nodes has 120 parameters: (3 × 30) + (30 × 1) = 120.
- For a binary classification setting, if the probability of belonging to class +1 is 0.22, the probability of belonging to class -1 is 0.78: a single output node can denote the probability p of belonging to class +1, so the probability of belonging to class -1 is 1 - p, since the two classes are mutually exclusive.
- With input [2, 4, 6] to the SoftMax activation, the output is approximately [0.016, 0.117, 0.867] by the definition of softmax (see the sketch after this list).
- A 3-input neuron with weights 1, 0.5, 2, inputs 2, 20, 4, and a linear transfer function with constant of proportionality 2 produces an output of 2 × (1×2 + 0.5×20 + 2×4) = 40.
- The ReLU activation function is NOT analytically differentiable for all real values of the input: ReLU(x) is not differentiable at x = 0, where x is the input to the ReLU layer.
- The given perceptron realizes a NOR function: in the figure, when either i1 or i2 is 1 the output is 0, and when both i1 and i2 are 0 the output is 1, which is NOR logic.
- The size of the weight matrix between any layer 1 and layer 2 is (#nodes in layer 1) × (#nodes in layer 2).
- A fully connected neural network with input, one hidden, and output layers of 40, 2, and 1 nodes respectively (no biases) has (40 × 2) + (2 × 1) = 82 learnable parameters, since every node of one layer is connected to every node of the next.
- For a 10-class neural network classifier, such as classifying a cat image into its breed, cross-entropy loss is well suited.
- A fully-connected neural network with five hidden layers of 10 units each, a 20-dimensional input, and a scalar output (no bias) has (20 × 10) + 4 × (10 × 10) + (10 × 1) = 610 trainable parameters.
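A brief sketch checking the softmax output and the no-bias parameter counts quoted in this list (NumPy only; the rounding is mine):

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())              # subtract the max for numerical stability
    return e / e.sum()

print(np.round(softmax([2, 4, 6]), 3))   # [0.016 0.117 0.867]

# No-bias parameter counts from the notes:
print(3 * 30 + 30 * 1)                   # 120 (3-30-1 network)
print(40 * 2 + 2 * 1)                    # 82  (40-2-1 network)
print(20 * 10 + 4 * 10 * 10 + 10 * 1)    # 610 (20-10-10-10-10-10-1 network)
```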
Week 6 - Assignment 6
- In a neural network where X = a + b and Y = X × c, the gradients are ∂Y/∂a = c, ∂Y/∂b = c, and ∂Y/∂c = a + b; for the given values this evaluates to (-4, -4, 5).
- dy/da = 1 and dy/db = 0 for y = max(a, b) and a > b.
- PCA reduces the dimension by finding a few Orthogonal linear combinations.
- After centering the X matrix and computing the unit-length principal component directions of X, the algorithm would choose option A or D among the plotted choices (see the sketch after this list for how such directions are computed).
- The FALSE statement about PCA and autoencoders is that PCA works well with non-linear data while autoencoders are best suited for linear data; in fact PCA is a linear technique, whereas autoencoders can capture non-linear relationships.
- Concerning the backpropagation rule, the gradients of the final layer weights are calculated first.
- For PCA, the following are true:
- Rotates the axes to lie along the principal components
- Is calculated from the covariance matrix
- Removes some information from the data
- A no-bias autoencoder with 100 input neurons, 10 hidden neurons, and a single hidden layer has (100 × 10) + (10 × 100) = 2000 parameters.
- The vectors {2, 3, 1} and {3, 1, -9} can form the first two principal components because they are orthogonal (their dot product is 0).
- Let vectors a = {2, 4} and b = {n, 1} form the first two principal components after applying PCA; then n = -2 is a possible value, since orthogonality requires 2n + 4 = 0.
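A minimal sketch of how unit-length principal component directions are obtained from the covariance matrix (the function is generic; the two dot products are the orthogonality checks from this list):

```python
import numpy as np

def pca_directions(X):
    """Unit-length principal component directions of X (rows are samples),
    ordered by decreasing explained variance."""
    Xc = X - X.mean(axis=0)               # center the data
    cov = np.cov(Xc, rowvar=False)        # covariance matrix
    vals, vecs = np.linalg.eigh(cov)      # eigenvectors are already unit length
    return vecs[:, np.argsort(vals)[::-1]]

# Orthogonality checks from the notes: both dot products are zero.
print(np.dot([2, 3, 1], [3, 1, -9]))     # 0
print(np.dot([2, 4], [-2, 1]))           # 0, so n = -2 is a valid choice
```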
Week 7 - Assignment 7
- A sparse autoencoder introduces an information bottleneck without reducing the number of nodes; the idea is to encourage the network to activate only a small number of neurons. Of the paired statements in the assignment, statement 1 is false and statement 2 is true.
- A denoising autoencoder works on corrupted versions of the input: the loss is computed between the original input and the reconstruction from the noisy version, and it can be used as a tool for feature extraction, so both statements are true.
- The autoencoder design that uses corrupted versions of the input is the denoising design.
- The autoencoder design that uses a hidden layer with fewer units than the input layer is the undercomplete design.
- Regarding autoencoders, the true statements are:
- possesses generalization capabilities
- to minimize the reconstruction loss so output is similar to input
- compresses the input into a latent space representation and then reconstruct the output from it
- For d(t − 34) * x(t + 56), where d(t) is the delta function and * denotes convolution: convolving with a shifted delta simply shifts the signal, giving x(t + 22) (see the sketch after this list).
- For a linear time-invariant (LTI) system, the impulse response is the output due to an impulse applied at time 0.
- The impulse function equals 1 at t = 0 (and 0 elsewhere).
- The denoising autoencoder is the variant of autoencoder illustrated by the figure with row 1: original input, row 2: noisy input, row 3: reconstructed output.
- Reconstructing the original noise-free data from a noisy input is the task of a denoising autoencoder.
- Contractive autoencoders penalize instances where a small change in the input leads to a large change in the encoding space.
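A quick discrete check of the shift property discussed above, namely that convolving a sequence with a delayed unit impulse only shifts the sequence (the sample values are arbitrary), plus the autoencoder parameter count from this list:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])        # arbitrary signal samples
delta = np.zeros(3)
delta[2] = 1.0                            # unit impulse delayed by 2 samples
print(np.convolve(x, delta))              # [0. 0. 1. 2. 3. 4.] -> x shifted by 2

# Parameter count of the no-bias 100-10-100 autoencoder from the notes:
print(100 * 10 + 10 * 100)                # 2000
```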
Week 8 - Assignment 8
- "There can be only 1 fully connected layer in a CNN" is the statement that is generally false about CNNs.
- Convolution output size = 60 × 60 for P = 0, I = 64, F = 5, and S = 1: the input image is a 64 × 64 matrix and the kernel/filter is 5 × 5 with stride 1 and no padding, giving (64 − 5)/1 + 1 = 60.
- Output size = 2 × 2 when a 3 × 3 filter is convolved with a 4 × 4 matrix (stride 1); use the formula ((n − f + 2P)/S + 1) × ((n − f + 2P)/S + 1) (see the sketch after this list).
- In a CNN with Layer 1 (filter 3 × 3, 10 filters, stride 1, padding 0), Layer 2 (filter 5 × 5, 20 filters, stride 2, padding 0), and Layer 3 (filter 5 × 5, 40 filters, stride 2, padding 0), followed by a fully connected layer, a 3-D image input of size 39 × 39 yields a fully connected input of dimension 7 × 7 × 40 = 1960.
- Given 64 convolutional kernels of size 3 × 3 with stride 1 and no padding in the first layer of a convolutional neural network, and a 1024 × 1024 input, the next layer receives a volume of dimensions W × H × D = 1022 × 1022 × 64.
- The last fully connected layer in a CNN used to classify an image has its output passed through a soft-max.
- For a 3-channel colour image and a convolutional layer with 5 × 5 filters, the number of parameters is 7500: each filter has dimension 5 × 5 × 3 = 75 with no bias, so there are 7500/75 = 100 filters.
- The sigmoid activation function is one of the causes of vanishing gradients. Statement 1: residual networks can be a solution to the vanishing gradient problem. Statement 2: residual networks have direct (skip) connections to earlier layers. Both statements are correct, and statement 2 explains why residual networks mitigate the vanishing gradient problem.
- The softmax function has the form σ(xⱼ) = e^{xⱼ} / Σₗ e^{xₗ}; for the given input it evaluates to 0.28.
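A small helper reproducing the output-size arithmetic used throughout this list (the layer sizes are the ones stated above):

```python
def conv_out(n, f, s=1, p=0):
    """Spatial output size of a convolution: (n - f + 2p) / s + 1."""
    return (n - f + 2 * p) // s + 1

print(conv_out(64, 5))                      # 60  (64x64 input, 5x5 filter)
print(conv_out(4, 3))                       # 2   (4x4 input, 3x3 filter)

# 39x39x3 input through the three layers listed above:
d1 = conv_out(39, 3, 1)                     # 37
d2 = conv_out(d1, 5, 2)                     # 17
d3 = conv_out(d2, 5, 2)                     # 7
print(d3 * d3 * 40)                         # 1960 inputs to the fully connected layer

print(conv_out(1024, 3), "x", conv_out(1024, 3), "x 64")   # 1022 x 1022 x 64
```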
Week 9 - Assignment 9
- A very small learning rate results in slow convergence.
- For the momentum update vector Vₜ = γVₜ₋₁ + η∇J(θ), γ is the fraction that indicates acceleration (the momentum term); see the sketch after this list.
- The momentum optimizer helps accelerate stochastic gradient descent in the relevant direction, dampens oscillations, and helps in anticipating the next step.
- When using gradient descent, the weights are updated as θ = θ − α ∂J(θ)/∂θ, where α is the learning rate.
- For a cost function of the form J(θ) = θ²/25 + 6, the weight is updated as θ_new = θ − α(2θ/25), where α is the learning rate.
- If the learning rate is too large, successive iterations of gradient descent cause θ₁ and θ₂ to increase (diverge); a suitable learning rate makes them decrease toward the minimum.
- If the cost surface has a single global minimum, θ₁ and θ₂ move in the same direction however they are initialized, and the final values remain similar; exploding gradients are updates with very large values that make the network unstable, and calculating the error is the first step in implementing the backward pass.
- As training proceeds the error decreases, which indicates that learning is progressing.
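A minimal sketch of the momentum update Vₜ = γVₜ₋₁ + η·grad referenced above; the cost function J(θ) = θ², the starting point, and the hyperparameters are illustrative:

```python
def momentum_step(theta, v, grad, eta=0.1, gamma=0.9):
    """One momentum update: v_t = gamma * v_{t-1} + eta * grad(theta)."""
    v = gamma * v + eta * grad
    return theta - v, v

# Illustrative run on J(theta) = theta**2, whose gradient is 2*theta:
theta, v = 5.0, 0.0
for _ in range(200):
    theta, v = momentum_step(theta, v, 2 * theta)
print(round(theta, 4))    # approaches the minimiser 0
```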
Week 10 - Assignment 10
- Faster inference time is NOT a result of batch normalization: it increases the computational burden, which means inference takes longer.
- For batch normalization in a hidden layer with 3 units, the batch-norm mean is computed per unit by averaging that unit's activation over all samples in the mini-batch.
- To prevent underfitting, increase the number of features (the model's capacity) so that it can capture the data and fit the values.
- At test time, batch normalization uses the mean and variance estimated from the training data (the running statistics) so that the model generalizes.
- Reducing test time or computation is NOT an advantage of dropout: dropout drops random units during training but not during testing.
- When training a neural network for an image recognition task, plot the training and validation errors and apply early stopping where the validation error is lowest; randomly shuffling the pixels of an image is not a valid way to augment the data.
- The best way to approximate complex functions is to increase the number of layers in the network.
- Batch normalization normalizes a layer's inputs (zero mean, unit variance) before passing them on (see the sketch after this list).
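A minimal sketch (the scale gamma, shift beta, and the sample mini-batch are illustrative) of the normalize-then-scale-and-shift operation described above; at test time the stored training statistics would replace the batch mean and variance:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * x_hat + beta

batch = np.array([[2.0, 3.0, 2.0],
                  [4.0, 3.0, 3.0],
                  [2.0, 5.0, 2.0]])          # 3 samples x 3 hidden units
print(np.round(batch_norm(batch), 2))
```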
Week 11 - Assignment 11
- With one-hot encoding of the class labels, there cannot be two correct answers (classes) for a single sample.
- For the given signal and filter, the values of the full (non-cropped) output signal have to be computed.
- Among the challenges that make recognition problems difficult is the limited size of the dataset.
- An advantage of a fully convolutional network (FCN) is its large receptive field.
- For the given image, the output of the unpooling operation is determined by checking the values.
- A disadvantage of a CNN is its fixed input size.
- No bit = is hat if what are the number if 1 bit values.
- The Dice coefficient lies between 0 and 1 and is computed from the overlap between the predicted and ground-truth masks (see the sketch after this list).
- L2 regularization is used because it constrains the weights to a hypersphere in the high-dimensional parameter space.
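A small sketch of the Dice coefficient mentioned above (the binary masks are illustrative):

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice coefficient between two binary masks; ranges from 0 (no overlap) to 1."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum())

print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))   # 2*1 / (2+1) = 0.67
```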
Week 12 - Assignment 12
- The distribution from which the latent variable is sampled is important in a generative model.
- For a well-trained GAN, the discriminator's output on generated data approaches 50%, since it can no longer distinguish real images from fake ones; the re-parameterization trick is what makes sampling the latent variable trainable.