Summary

  • Depthwise separable convolutions → Reduced Computational cost dramatically
  • Introduced 2 simple global hyperparameters that trade off latency and accuracy
    • Width Multiplier
    • Resolution Multiplier

1. Motivation

Current advances to improve accuracy don’t consider “efficiency” of the network → size and speed

2. MobileNet Architecture

2-1. Idea

Starts from the effects of standard convolution

  1. filtering features based on conv kernels
  2. combining features to produce new representation

Then, could we split these into two steps?

2-2. Depthwise Separable Convolution

How to do this

  1. Depthwise Conv: Apply single filter per each input channel
  2. Pointwise Conv: 1x1 convolution, creates a linear combination of output of depthwise layer

Standard Convolution Computational Cost

  • D_K: kernel size / M: # of output features / N: # of input features / D_F: input feature map size
\[D_K · D_K · M · N · D_F · D_F\]

Depthwise Separable Convolution Computational Cost

\[D_K · D_K · M · D_F · D_F + M · N · D_F · D_F\]

Reduction in Computation

\[\frac{1}{N} + \frac{1}{D_K^2}\]

Question)

  • Why computational cost are computed with D_F not the output feature map size D_G?
    • Is it because BIG-O notation? which considers worst time complexity? But what if D_G > D_K ?
    • In this paper, the hypothesis was D_G = D_F. (padding = 1, stride = 1.. maybe same padding?)

3. Global Hyperparameters

Consider these hyperparameters that trades off accuracy and speed. Experiment with these hyperparameters and then choose the best model that you can get.

3-1. Width Multiplier (ɑ)

  • Thin a network uniformly at each layer
  • Computational cost reduces quadratically (ɑ^2) \(D_K·D_K·ɑM·D_F·D_F+ɑM·ɑN·D_F·D_F\)

3-2. Resolution Multiplier (𝜌)

  • Apply to input image and to internal representation of every layer by the same multiplier
  • Computational cost reduces by 𝜌^2 \(D_K·D_K·ɑM·𝜌D_F·𝜌D_F+ɑM·ɑN·𝜌D_F·𝜌D_F\)

3-3. Why two above rather than reducing the depth of layers?

Because thinner network has better performance than shallower network