Paper Review 13: YOLO9000: Better, Faster, Stronger

Summary

YOLOv2 = Adds several techniques to YOLOv1
YOLO9000 = Uses WordTree for semi zero-shot detection

YOLOV2

Batch Normalization
High Resolution Classifier
1. First fine-tune classification network with ImageNet
2. Fine-tune resulting network on detection
Conv with Anchor boxes

Removed FC layers from v1 and use anchor boxes
Dimension Clusters

Instead of choosing anchor boxes by hand, use k-means clustering on the training set bounding boxes.

K-means clustering? = grouping into k similar clusters by distance

How does this paper define distance metric? → want to make good IOU scores which is independent of the box size
\[d(\text{box},\ \text{centroid})=1-\text{IOU}(\text{box},\ \text{centroid})\]
Why does this lead to good results? → Because it starts the model off with a better representation, it makes task easier to learn
Direct location prediction

When using anchor boxes, model became unstable because any anchor box can end up at any point in the image, regardless of what location predicted the box
\[x = (t_x*w_a)-x_a\\ y = (t_y*h_a) - y_a\]
Solution → Predict location coordinates relative to location of grid cell

Network predicts bounding box (5 t’s) grid cell’s location offset (cx, cy) Bounding box prior width, height (pw, ph)
\[b_x=σ(t_x)+c_x\\ b_y=σ(t_y)+c_y\\ b_w=p_we^{t_w}\\ b_h=p_he^{t_h}\\ Pr(\text{object})*\text{IOU}(b,\text{object})=σ(t_o)\]
Fine-Grained Features (Passthrough Layer)

Implemented passthrough layer that concatenates higher resolution features with low resolution features.
Multi-Scale Training

Non-fixed input image size
Darknet-19

Changed VGG models to learn faster. 3x3 filters / doubled # of channels / 1x1 filters for compressing / batch normalization …

YOLO9000

Motivation

Data is scarce so that it is hard to scale detection model → Harness existing classification data and use it to expand the scope of detection systems

Idea

Joint training on both detection and classification data

Detection data → Full backpropagation
Classification data → Partial backpropagation only for classification specific parts

How to jointly train? → Hierarchical classification

Make WordTree with WordNet
Dataset combination with WordTree
Perform classification with WordTree → Predict conditional probabilities at every node for probability of each hyponym of that synset given that synset