inception v3 pdf

What is the main observation regarding gains in the classification performance in the 2014 ILSVRC classification challenge?

Gains in the classification performance tend to transfer to significant quality gains in a wide variety of application domains.

What are the enabling factors for various use cases such as mobile vision and big-data scenarios?

Computational efficiency and low parameter count.

What are convolutional networks at the core of?

Most state-of-the-art computer vision solutions for a wide variety of tasks.

What are the top-1 and top-5 error rates achieved by the benchmarked methods?

Top-1 error: 21.2%, Top-5 error: 5.6%

What is the computational cost of the benchmarked network, in multiply-adds per inference?

5 billion multiply-adds.

How many parameters are used by the network with a computational cost of 5 billion multiply-adds per inference?

Fewer than 25 million.

How many models are used in the ensemble with multi-crop evaluation?

4

What is the purpose of reducing the dimension of the input representation before spatial aggregation in a convolutional network?

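It makes the aggregation cheaper with little loss of representational power: adjacent units are strongly correlated, so little information is lost during dimension reduction, and the reduction may even promote faster learning.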

What is the effect of increasing both the width and depth of a convolutional network?

It can contribute to higher quality networks.

Why should one avoid representational bottlenecks, especially early in the network?

To prevent extreme compression and allow for a gentle decrease in representation size from inputs to outputs.

Why is dimension reduction often used in a vision network?

Because the outputs of nearby activations are highly correlated, they can be reduced before aggregation while still yielding similarly expressive local representations.
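
As a rough illustration of why reducing before aggregating is attractive, the sketch below counts multiply-adds per output position for a 3 × 3 convolution applied directly, and for the same convolution preceded by a 1 × 1 reduction. The channel counts (256 in, 64 reduced, 256 out) are hypothetical values chosen for the example, not figures from the lesson, and bias terms are ignored.

```python
def conv_mult_adds(kernel_h, kernel_w, in_channels, out_channels):
    """Multiply-adds needed to produce one output position of a convolution."""
    return kernel_h * kernel_w * in_channels * out_channels

# Hypothetical channel counts, chosen only for illustration.
in_channels, reduced_channels, out_channels = 256, 64, 256

# 3 x 3 convolution applied directly to the full-width input.
direct = conv_mult_adds(3, 3, in_channels, out_channels)

# 1 x 1 reduction first, then the 3 x 3 convolution on the narrower tensor.
reduced = (conv_mult_adds(1, 1, in_channels, reduced_channels)
           + conv_mult_adds(3, 3, reduced_channels, out_channels))

print(f"direct: {direct}, with reduction: {reduced}, ratio: {reduced / direct:.2f}")
```

Whether the cheaper version keeps the same accuracy is an empirical question; the count only shows where the computational saving comes from.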

According to the text, what is the main advantage of replacing a 5 × 5 convolution with a two-layer convolutional architecture?

The main advantage is a reduction in parameter count by sharing weights between adjacent tiles.

What is the relative gain in computational cost achieved by replacing a 5 × 5 convolution with two layers of 3 × 3 convolution?

The relative gain in computational cost is 28%.
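
The 28% figure can be checked with a short calculation. This is a minimal sketch that assumes every layer has the same number of input and output filters, so the cost per output position is proportional to the kernel area; the function name is made up for the example.

```python
def kernel_cost(kernel_h, kernel_w):
    """Multiply-adds per output position for one input/output filter pair."""
    return kernel_h * kernel_w

cost_5x5 = kernel_cost(5, 5)          # single 5 x 5 convolution
cost_two_3x3 = 2 * kernel_cost(3, 3)  # two stacked 3 x 3 convolutions

ratio = cost_two_3x3 / cost_5x5
saving = 1 - ratio
print(f"cost ratio: {ratio:.2f}, relative saving: {saving:.0%}")
```

The ratio is 18/25, which gives the 28% saving quoted in the answer.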

Should the first layer of the two-layer convolutional architecture keep a linear activation?

No. Using linear activation in the first layer was found to be inferior to using rectified linear units in all stages of the factorization.

In what grid sizes does the factorization of convolutions into asymmetric convolutions work well?

The factorization works well on medium grid sizes, that is, on m × m feature maps where m ranges between 12 and 20. It does not work well on early layers.

What is the advantage of using asymmetric convolutions, such as n × 1 convolutions?

An n × 1 convolution followed by a 1 × n convolution covers the same receptive field as a single n × n convolution, but with fewer computations.
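
To see how the saving grows with n, the sketch below compares the cost of an n × n kernel with an n × 1 kernel followed by a 1 × n kernel. It assumes equal numbers of input and output filters in every layer and ignores bias terms; `asymmetric_saving` is a name invented for this example.

```python
def asymmetric_saving(n):
    """Fraction of multiply-adds saved by n x 1 followed by 1 x n versus n x n."""
    full_cost = n * n        # one n x n kernel
    factorized_cost = 2 * n  # an n x 1 kernel plus a 1 x n kernel
    return 1 - factorized_cost / full_cost

for n in (3, 5, 7):
    print(f"n = {n}: saving = {asymmetric_saving(n):.0%}")
```

The saving is 1 - 2/n, so it increases as the kernel gets larger.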

What is the disadvantage of factorizing a 3 × 3 convolution into two 2 × 2 convolutions?

Factorizing a 3 × 3 convolution into two 2 × 2 convolutions saves only 11% of the computation.
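
For comparison, the following sketch puts the two ways of splitting a 3 × 3 convolution side by side: two 2 × 2 convolutions versus a 3 × 1 convolution followed by a 1 × 3 convolution. As a simplifying assumption, cost is taken to be proportional to kernel area, with the same filter count in every layer.

```python
def relative_saving(factorized_cost, original_cost):
    """Fraction of the original cost that the factorized version avoids."""
    return 1 - factorized_cost / original_cost

full_3x3 = 3 * 3             # one 3 x 3 kernel
two_2x2 = 2 * (2 * 2)        # two stacked 2 x 2 kernels
asymmetric = 3 * 1 + 1 * 3   # 3 x 1 kernel followed by 1 x 3 kernel

two_2x2_saving = relative_saving(two_2x2, full_3x3)
asymmetric_saving_3 = relative_saving(asymmetric, full_3x3)

print(f"two 2 x 2: {two_2x2_saving:.0%}, 3 x 1 then 1 x 3: {asymmetric_saving_3:.0%}")
```

Under these assumptions the asymmetric split saves about three times as much as the 2 × 2 split.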

What is the purpose of auxiliary classifiers in deep networks?

Auxiliary classifiers were introduced to improve the convergence of very deep networks by pushing useful gradients to the lower layers and combating the vanishing gradient problem.

What is the effect of removing a lower auxiliary branch from a network with multiple side-heads?

The removal of a lower auxiliary branch does not have any adverse effect on the final quality of the network.

What is the purpose of the pooling layers in the proposed network architecture?

The purpose of the pooling layers in the proposed network architecture is to reduce the grid size.

What is the computational cost of the proposed network architecture compared to GoogLeNet and VGGNet?

The computational cost of the proposed network architecture is only about 2.5 times higher than that of GoogLeNet and it is still much more efficient than VGGNet.

How is the traditional 7 × 7 convolution factorized in the proposed network architecture?

The traditional 7 × 7 convolution is factorized into three 3 × 3 convolutions in the proposed network architecture.
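
A quick way to see why three 3 × 3 convolutions can stand in for one 7 × 7 convolution is to compute the receptive field of the stack. The sketch below assumes stride 1 and no dilation in every layer, which is a simplification: it illustrates the general principle and does not model the strides used in the actual network.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied one after another."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each layer widens the field by (k - 1)
    return field

three_3x3_field = stacked_receptive_field([3, 3, 3])
single_7x7_field = stacked_receptive_field([7])

# Cost per output position, assuming the same filter count in every layer.
three_3x3_cost = 3 * (3 * 3)
single_7x7_cost = 7 * 7

print(f"receptive fields: {three_3x3_field} vs {single_7x7_field}")
print(f"cost ratio: {three_3x3_cost / single_7x7_cost:.2f}")
```

Both variants see a 7 × 7 input window, while the stacked version needs 27 multiply-adds per filter pair instead of 49.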

What is the loss function used for training the classifier layer in the proposed network?

The loss function used for training the classifier layer in the proposed network is the cross entropy.
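
As a reminder of what that loss computes, here is a minimal sketch of cross entropy for a single example, assuming the classifier outputs raw logits that are turned into probabilities with a softmax and that the label is a single correct class. The logit values are made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """Negative log-probability assigned to the correct class."""
    probs = softmax(logits)
    return -math.log(probs[true_class])

# Hypothetical logits for a 4-class problem; class 2 is the correct label.
confident = cross_entropy([0.5, 0.1, 4.0, -1.0], true_class=2)
uniform = cross_entropy([0.0, 0.0, 0.0, 0.0], true_class=2)
print(f"confident prediction: {confident:.3f}, uniform prediction: {uniform:.3f}")
```

The loss shrinks as the model puts more probability on the correct class; with uniform logits over four classes it equals log 4.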