### Deep Residual Learning for Image Recognition: Deep Learning Gang

- Start with a network that performs well;
- Add layers that are forced to be the identity function, that is, they simply pass along whatever information arrives at them without change;
- This network is deeper, but must have the same performance as the original network by construction since the new layers do not do anything;
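- Layers in a network can, in principle, learn the identity function, so a deeper network trained from scratch should be able to at least match the performance of the shallower one: if the shallower network is already optimal, the extra layers only need to learn the identity.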
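
The construction above can be checked numerically. The following is a minimal sketch in plain Python, not code from the paper: the tiny fully connected network, the helper names (`forward`, `identity_layer`, `random_layer`), and the layer sizes are all assumptions chosen for illustration. It appends identity layers to a small network and confirms that the output does not change.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def linear(weights, bias, v):
    # weights is a list of rows; computes W @ v + b
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def forward(layers, v):
    # Each layer is a (weights, bias) pair followed by a ReLU.
    for weights, bias in layers:
        v = relu(linear(weights, bias, v))
    return v

def identity_layer(n):
    # Identity weight matrix and zero bias (hypothetical helper).
    weights = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return weights, [0.0] * n

def random_layer(n_out, n_in, rng):
    # Stand-in for a trained layer; the weights here are just random.
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, bias

rng = random.Random(0)

# "A network that performs well" (here: any fixed 2-layer network).
shallow = [random_layer(4, 3, rng), random_layer(4, 4, rng)]

# The deeper network: the same layers plus three identity layers.
deeper = shallow + [identity_layer(4) for _ in range(3)]

x = [0.5, -1.2, 2.0]
shallow_out = forward(shallow, x)
deeper_out = forward(deeper, x)

print(shallow_out == deeper_out)
```

The identity layers pass through the ReLU unchanged only because their inputs are already non-negative (they come out of an earlier ReLU). In this sketch the identity is set by hand; the argument in the list is about whether training can find such weights on its own.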

