Questions and Answers
What is the main observation regarding gains in the classification performance in the 2014 ILSVRC classification challenge?
Gains in the classification performance tend to transfer to significant quality gains in a wide variety of application domains.
What are the enabling factors for various use cases such as mobile vision and big-data scenarios?
Computational efficiency and low parameter count.
What are convolutional networks at the core of?
Most state-of-the-art computer vision solutions for a wide variety of tasks.
What are the top-1 and top-5 error rates achieved by the benchmarked methods?
How many multiply-adds per inference does the network with a computational cost of 5 billion have?
How many parameters are used by the network with a computational cost of 5 billion?
How many models are used in the ensemble with multi-crop evaluation?
What is the purpose of reducing the dimension of the input representation before spatial aggregation in a convolutional network?
What is the effect of increasing both the width and depth of a convolutional network?
Why should one avoid representational bottlenecks, especially early in the network?
Why is dimension reduction often used in a vision network?
According to the text, what is the main advantage of replacing a 5 × 5 convolution with a two-layer convolutional architecture?
What is the relative gain in computational cost achieved by replacing a 5 × 5 convolution with two layers of 3 × 3 convolution?
Does using linear activation in the first layer of the two-layer convolutional architecture result in any loss of expressiveness?
In what grid sizes does the factorization of convolutions into asymmetric convolutions work well?
What is the advantage of using asymmetric convolutions, such as n × 1 convolutions?
What is the disadvantage of factorizing a 3 × 3 convolution into two 2 × 2 convolutions?
What is the purpose of auxiliary classifiers in deep networks?
What is the effect of removing a lower auxiliary branch from a network with multiple side-heads?
What is the purpose of the pooling layers in the proposed network architecture?
What is the computational cost of the proposed network architecture compared to GoogLeNet and VGGNet?
How is the traditional 7 × 7 convolution factorized in the proposed network architecture?
What is the loss function used for training the classifier layer in the proposed network?
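As a worked note on the question about replacing a 5 × 5 convolution with two layers of 3 × 3 convolution, the cost comparison can be sketched with simple arithmetic. This is a minimal sketch, assuming every layer has the same number of input and output filters, so that the cost per output position scales with the kernel area alone. The function name `conv_mult_adds` is hypothetical and used only for illustration.

```python
def conv_mult_adds(kernel_h, kernel_w):
    """Multiply-adds per output position for one input/output channel pair.

    Assumption: channel counts are equal in all layers being compared,
    so they cancel out and only the kernel area matters.
    """
    return kernel_h * kernel_w


# One 5 x 5 convolution.
single_5x5 = conv_mult_adds(5, 5)

# Two stacked 3 x 3 convolutions covering the same 5 x 5 receptive field.
two_3x3 = 2 * conv_mult_adds(3, 3)

# Cost of the factorized form relative to the single 5 x 5 layer.
ratio = two_3x3 / single_5x5
saving = 1 - ratio

print(f"5x5: {single_5x5}, two 3x3: {two_3x3}, "
      f"ratio: {ratio:.2f}, saving: {saving:.0%}")
```

Under that equal-filter assumption the two-layer form needs 18 multiply-adds where the single layer needs 25, a reduction of 28%. With different channel counts per layer the ratio would change.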