Andrea Montanari (Stanford University)
Despite their empirical success, the principles underlying modern deep learning models remain mysterious. These models are trained by optimizing highly non-convex objectives, using a variety of gradient-based algorithms that, at best, are only guaranteed to converge to local optima. The number of model parameters is often comparable to, or larger than, the sample size, and hence many choices of the parameters exist that perform equally well on the training data, but not all of them generalize to unseen data. Over the last few years, an informal scenario has emerged that captures these phenomena. I will describe its elements and discuss a few examples in which this scenario can be made more precise.
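As a small and heavily simplified illustration of the interpolation phenomenon mentioned above (a sketch I am adding here, not material from the lectures), the following Python snippet fits an overparametrized linear model with many more parameters than samples. Gradient descent drives the training error to zero, and among the many parameter vectors that interpolate the training data, the one reached by gradient descent from zero initialization generalizes noticeably better than a generic interpolant. All numerical choices and variable names below are illustrative.

# Minimal sketch (illustrative, not from the lectures): overparametrized
# linear regression y ~ <theta, x> with d parameters and n << d samples.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                                  # sample size n << parameters d
theta_star = rng.normal(size=d) / np.sqrt(d)    # ground-truth parameters
X_train = rng.normal(size=(n, d))
y_train = X_train @ theta_star + 0.1 * rng.normal(size=n)
X_test = rng.normal(size=(2000, d))
y_test = X_test @ theta_star

def mse(theta, X, y):
    return np.mean((X @ theta - y) ** 2)

# Gradient descent on the squared training loss, starting from zero.
theta = np.zeros(d)
lr = 1.0 / np.linalg.norm(X_train, 2) ** 2      # step size 1 / sigma_max(X)^2
for _ in range(2000):
    theta -= lr * X_train.T @ (X_train @ theta - y_train)

# Another interpolant: add to the GD solution a direction lying in the null
# space of X_train, so the training predictions are unchanged.
v = rng.normal(size=d)
v -= X_train.T @ np.linalg.lstsq(X_train @ X_train.T, X_train @ v, rcond=None)[0]
theta_other = theta + 3.0 * v / np.linalg.norm(v)

print("train MSE, GD solution:       %.2e" % mse(theta, X_train, y_train))
print("train MSE, other interpolant: %.2e" % mse(theta_other, X_train, y_train))
print("test MSE,  GD solution:       %.2e" % mse(theta, X_test, y_test))
print("test MSE,  other interpolant: %.2e" % mse(theta_other, X_test, y_test))

Both parameter vectors fit the training data essentially perfectly, yet their test errors differ by roughly an order of magnitude, which is one toy instance of the statement that interpolating solutions are plentiful but not equally good.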
A rough outline of the lectures:
- Why does overparametrization help optimization?
- Why doesn't overparametrization hurt generalization?
- The puzzle of architecture.
- Generative modeling.