Residual connections
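Introduce skip connections: a block outputs x + F(x) instead of F(x)
- Helps avoid vanishing gradients: the identity path carries gradients directly to earlier layers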
- Allows training networks with 1000+ layers
- Skips that span more than one layer, with each layer connected to all earlier layers in its block: DenseNets
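
To make the vanishing-gradient point concrete, here is a minimal sketch using a toy scalar network. The function names, the tanh nonlinearity, the depth of 50, and the weight value 0.5 are all assumptions chosen for illustration, not taken from any particular architecture. It compares the derivative of the output with respect to the input for a plain stack and for a stack with skip connections.

```python
import math

def grad_plain(x, weights):
    """d(output)/d(input) for a plain stack: x <- tanh(w * x)."""
    g = 1.0
    for w in weights:
        h = math.tanh(w * x)
        g *= w * (1.0 - h * h)        # chain rule: derivative of tanh(w * x)
        x = h
    return g

def grad_residual(x, weights):
    """d(output)/d(input) for a residual stack: x <- x + tanh(w * x)."""
    g = 1.0
    for w in weights:
        h = math.tanh(w * x)
        g *= 1.0 + w * (1.0 - h * h)  # the skip adds the identity term 1.0
        x = x + h
    return g

# Illustrative settings (assumed): 50 layers, every weight equal to 0.5.
weights = [0.5] * 50
plain_gradient = grad_plain(1.0, weights)
residual_gradient = grad_residual(1.0, weights)
print("plain:   ", plain_gradient)
print("residual:", residual_gradient)
```

In the plain stack every factor is at most 0.5, so the product shrinks geometrically with depth. In the residual stack every factor is at least 1, so the gradient cannot vanish in this toy setting.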

Very common paradigm: train on a large dataset to learn features, then fine-tune a linear classifier on top of those features for a different task with less available data
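
A minimal sketch of this paradigm, under loud assumptions: `pretrained_features` is a hypothetical stand-in for a feature extractor already trained on a large dataset (here just a fixed function), and the four-point dataset is made up for illustration. Only the linear classifier on top of the frozen features is trained, using plain gradient descent on the logistic loss.

```python
import math

def pretrained_features(x):
    # Hypothetical frozen extractor: stands in for a network trained on a
    # large dataset. It is never updated below.
    return [x[0] * x[1]]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(f, w, b):
    return sum(wi * fi for wi, fi in zip(w, f)) + b

def train_linear_head(data, steps=200, lr=0.5):
    """Fit only the linear classifier (w, b) on top of the frozen features."""
    feats = [pretrained_features(x) for x, _ in data]
    labels = [y for _, y in data]
    w, b = [0.0] * len(feats[0]), 0.0
    n = len(data)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(score(f, w, b)) - y   # gradient of the logistic loss
            for i, fi in enumerate(f):
                grad_w[i] += err * fi / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    return 1 if sigmoid(score(pretrained_features(x), w, b)) > 0.5 else 0

# Small made-up target task: not linearly separable in the raw inputs,
# but separable in the frozen feature space.
small_dataset = [((1.0, 1.0), 1), ((-1.0, -1.0), 1),
                 ((1.0, -1.0), 0), ((-1.0, 1.0), 0)]

w, b = train_linear_head(small_dataset)
accuracy = sum(predict(x, w, b) == y for x, y in small_dataset) / len(small_dataset)
print("accuracy on the small task:", accuracy)
```

The design point is that the small dataset only has to determine a handful of classifier parameters, because the hard part (finding useful features) was done elsewhere.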