Disciplines
Computer Sciences (10%); Mathematics (90%)
Keywords
Acceleration,
Supervised Learning,
First-Order Methods,
Implicit Bias,
Adaptive Step Sizes
Abstract
When training modern machine learning models, the number of parameters frequently exceeds the
number of samples or observations. In such scenarios, classical statistical theory suggests that the
resulting model might perform poorly due to overfitting: many solutions to the
optimization problem may exist, and not all of them generalize well to unseen data. In
practice, however, we observe that when these models are trained with optimization methods based on
gradient information, such overfitting often does not occur, and the models tend to perform well. This
phenomenon is commonly referred to as implicit regularization. We intend to study the effects of
two commonly used optimization techniques on the obtained solution. The two techniques in
question are momentum, referring to the use of not just the current but also past gradients, and
adaptive step sizes, meaning that the step size (or learning rate) is not supplied externally but is
computed by the algorithm itself.
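As a minimal illustration of the two techniques (a sketch, not the algorithms studied in this proposal), heavy-ball momentum blends the current gradient with an exponentially weighted sum of past gradients, while an AdaGrad-style rule computes per-coordinate step sizes from the history of squared gradients. The toy quadratic objective below is purely illustrative.

```python
import numpy as np

def gd_momentum(grad, x0, lr=0.05, beta=0.9, steps=100):
    """Gradient descent with heavy-ball momentum: the update direction
    accumulates past gradients, not just the current one."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v + grad(x)   # exponentially weighted gradient history
        x = x - lr * v
    return x

def adagrad(grad, x0, lr=0.5, eps=1e-8, steps=200):
    """AdaGrad-style adaptive step sizes: each coordinate's step size is
    computed from the running sum of its squared gradients, rather than
    being supplied externally."""
    x = np.asarray(x0, dtype=float)
    g2 = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)
        g2 += g ** 2             # accumulate squared gradients
        x = x - lr * g / (np.sqrt(g2) + eps)
    return x

# Toy objective f(x) = 0.5 * ||A x||^2 with gradient A^T A x; both
# methods should drive the iterates toward the minimizer x = 0.
A = np.array([[3.0, 0.0], [0.0, 0.5]])
grad = lambda x: A.T @ A @ x
x_mom = gd_momentum(grad, [1.0, 1.0])
x_ada = adagrad(grad, [1.0, 1.0])
```

Note the difference highlighted by the ill-conditioned `A`: momentum uses one global learning rate, whereas the adaptive rule effectively rescales each coordinate by its own gradient history.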