Motivation

  • A conventional CNN filter is a generalized linear model (GLM), which works well only when the latent concepts are “linearly separable”
  • But latent concepts are usually non-linear → need a nonlinear function!
  • Replace the linear model with a non-linear function → a micro network structure (Network in Network)

Network in Network

Radial basis networks and multilayer perceptrons are both capable of capturing latent concepts (they are universal function approximators)

Why use a multilayer perceptron as the non-linear function (rather than a radial basis network)?

  1. Compatible with the structure of CNNs, which are trained with back-propagation
  2. Can itself be a deep model, which is consistent with the spirit of feature re-use

Structure

\[f_{i,j}^{0} = x_{i,j} \\ f_{i,j,k_n}^{n} = \max\left\{ (w_{k_n}^{n})^{T} f_{i,j}^{n-1} + b_{k_n},\ 0 \right\}\]
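
A minimal sketch of this recursion, assuming toy shapes and random weights (illustrative only, not the paper's code):

```python
# f^0 = x; each level n applies a shared (w^n, b^n) + ReLU at every
# spatial location (i, j), i.e. the same tiny MLP slides over the map.
import numpy as np

H, W = 8, 8              # spatial size of the feature map (assumed)
dims = [3, 16, 16]       # channels per level: input, then two MLP layers

rng = np.random.default_rng(0)
f = rng.standard_normal((H, W, dims[0]))     # f^0_{i,j} = x_{i,j}

for n in range(1, len(dims)):
    w = rng.standard_normal((dims[n - 1], dims[n]))  # w^n, shared over (i, j)
    b = rng.standard_normal(dims[n])                 # b_{k_n}
    # f^n_{i,j,k_n} = max{ (w^n_{k_n})^T f^{n-1}_{i,j} + b_{k_n}, 0 }
    f = np.maximum(f @ w + b, 0.0)

print(f.shape)  # (8, 8, 16): same spatial grid, new channel dimension
```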

Features

  • An mlpconv layer is equivalent to cascaded cross-channel parametric pooling on a conventional conv layer
  • Which is in turn equivalent to a convolution with a 1x1 kernel (see the sketch below)
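
Below is a sketch of one mlpconv block written as 1x1 convolutions in PyTorch; the layer widths are illustrative, not necessarily the paper's exact config:

```python
import torch
import torch.nn as nn

mlpconv = nn.Sequential(
    nn.Conv2d(3, 192, kernel_size=5, padding=2),  # ordinary conv over patches
    nn.ReLU(inplace=True),
    nn.Conv2d(192, 160, kernel_size=1),           # 1x1 conv = per-pixel MLP layer
    nn.ReLU(inplace=True),
    nn.Conv2d(160, 96, kernel_size=1),            # second per-pixel MLP layer
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 3, 32, 32)
print(mlpconv(x).shape)  # torch.Size([1, 96, 32, 32])
```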

What is a Universal Function Approximator?

  • Any continuous function can be approximated through multiple hidden layers!
    • Therefore, stacking several hidden layers makes function approximation possible

https://dlaiml.tistory.com/entry/Universal-Approximation-Theorem

Note) Universal Approximation Theorem → in theory, a network with a single hidden layer and a non-linear, continuous activation function can approximate any continuous function
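
As a toy illustration (my example, not from the source): a single hidden layer with a tanh activation fitting sin(x) on a compact interval:

```python
import torch
import torch.nn as nn

x = torch.linspace(-3.0, 3.0, 256).unsqueeze(1)  # inputs on [-3, 3]
y = torch.sin(x)                                  # continuous target function

# One hidden layer + non-linear continuous activation, per the theorem.
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()

print(loss.item())  # should approach 0 with enough units/steps
```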

Comparison to Maxout Layers

  • Maxout (max pooling across multiple affine feature maps) → capable of modeling any “convex” function
    • But latent concepts do not always lie within convex sets! → need a universal function approximator (see the sketch below)
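
A sketch of a single maxout unit (shapes are hypothetical): the max over k affine maps is piecewise-linear and convex in the input, which is why maxout can approximate convex functions only:

```python
import torch

k, d = 4, 8                  # number of affine pieces, input dimension
w = torch.randn(k, d)
b = torch.randn(k)

def maxout(x):
    # max_k (w_k^T x + b_k): a convex, piecewise-linear function of x
    return (x @ w.T + b).max(dim=-1).values

x = torch.randn(5, d)
print(maxout(x).shape)  # torch.Size([5])
```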

Global Avg Pooling

Why global avg pooling?

  1. More native to the convolution structure: it enforces correspondences between feature maps and categories (a sketch follows this list)
  2. No parameters to optimize → avoids overfitting (an FC layer is prone to overfitting)
  3. Sums out the spatial information → more robust to spatial translations of the input
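
A minimal sketch of GAP as the classifier head, assuming the last mlpconv layer emits one feature map per class (num_classes = 10 here):

```python
import torch

feature_maps = torch.randn(1, 10, 8, 8)   # (batch, classes, H, W)
logits = feature_maps.mean(dim=(2, 3))    # average each map to one scalar
probs = torch.softmax(logits, dim=1)      # class confidences, no FC weights
print(probs.shape)  # torch.Size([1, 10])
```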

Visualization of NIN