Abstract

One classical canon of statistics is that large models are prone to overfitting, and that model selection procedures are necessary for high-dimensional data. However, many overparameterized models, such as neural networks, perform very well in practice, although they are often trained with simple online methods and regularization. The empirical success of overparameterized models, often referred to as benign overfitting, motivates us to take a new look at the statistical generalization theory for online optimization. In particular, we present a general theory on the excess risk of stochastic gradient descent (SGD) solutions for both convex and locally non-convex loss functions. We further discuss data and model conditions that lead to a "low effective dimension". Under these conditions, we show that the excess risk either does not depend on the ambient dimension p or depends on p via a poly-logarithmic factor. We also demonstrate that in several widely used statistical models, the "low effective dimension" arises naturally in overparameterized settings. The statistical applications studied include both convex models, such as linear regression and logistic regression, and non-convex models, such as M-estimators and two-layer neural networks.

Full text