Economical use of second-order information in training machine learning models

Dec 13, 2019



Stochastic gradient descent (SGD) and variants such as Adagrad and Adam are extensively used today to train modern machine learning models. In this talk we will discuss ways to economically use second-order information to modify both the step size (learning rate) used in SGD and the direction taken by SGD. Our methods adaptively control the batch sizes used to compute gradient and Hessian approximations, and ensure that the steps taken decrease the loss function with high probability, assuming that the loss is self-concordant, as is true for many problems in empirical risk minimization. For such cases we prove that our basic algorithm is globally linearly convergent. A slightly modified version of our method is presented for training deep learning models. Numerical results will be presented showing that it performs well without the need for learning rate tuning. If there is time, additional ways to make efficient use of second-order information will be presented.
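
To make the idea of a curvature-based step size concrete, here is a minimal sketch under stated assumptions. It is not the algorithm from the talk: it is deterministic and full-batch (no sampled gradients or Hessians, no adaptive batch sizes), and it uses a toy objective, f(x) = c^T x - sum(log x_i), chosen only because it is a standard self-concordant function with known minimizer x_i = 1/c_i. All function names are hypothetical. For a descent direction d, with rho = -g^T d and delta = sqrt(d^T H d), the step size t = rho / (delta * (rho + delta)) minimizes the usual self-concordant upper bound on f(x + t d), so each step decreases f without a tuned learning rate, and only one Hessian-vector product is needed per step.

```python
import math

def f(x, c):
    # Toy objective (an assumption for illustration): c^T x - sum(log x_i), defined for x > 0.
    return sum(ci * xi - math.log(xi) for ci, xi in zip(c, x))

def grad(x, c):
    # Gradient of the toy objective: c_i - 1 / x_i.
    return [ci - 1.0 / xi for ci, xi in zip(c, x)]

def hess_vec(x, v):
    # The Hessian is diag(1 / x_i^2), so a Hessian-vector product is elementwise.
    return [vi / (xi * xi) for xi, vi in zip(x, v)]

def self_concordant_step(x, c):
    # One gradient step whose length is set from curvature along the direction.
    g = grad(x, c)
    d = [-gi for gi in g]                      # steepest-descent direction
    rho = sum(gi * gi for gi in g)             # rho = -g^T d
    Hd = hess_vec(x, d)                        # single Hessian-vector product
    delta = math.sqrt(sum(di * hi for di, hi in zip(d, Hd)))
    if rho == 0.0 or delta == 0.0:
        return x, 0.0                          # already stationary
    # Minimizer of the self-concordant upper bound on f(x + t d).
    # Since t * delta = rho / (rho + delta) < 1, the iterate stays in the domain.
    t = rho / (delta * (rho + delta))
    return [xi + t * di for xi, di in zip(x, d)], t

def minimize(x0, c, iters=200):
    # Repeats the step above and records the loss after each iteration.
    x = list(x0)
    losses = [f(x, c)]
    for _ in range(iters):
        x, _ = self_concordant_step(x, c)
        losses.append(f(x, c))
    return x, losses

if __name__ == "__main__":
    c = [1.0, 2.0, 4.0]
    x, losses = minimize([1.0, 1.0, 1.0], c)
    print("final x:", x)
    print("first and last loss:", losses[0], losses[-1])
```

In a stochastic setting, g and the product H d would be replaced by mini-batch estimates, which is where batch-size control of the kind mentioned in the abstract would come in; that part is not shown here.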



About NeurIPS 2019

Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers. Following the conference, there are workshops that provide a less formal setting.
