Economical use of second-order information in training machine learning models

December 13, 2019

Speakers

About the presentation

Stochastic gradient descent (SGD) and variants such as Adagrad and Adam are extensively used today to train modern machine learning models. In this talk we will discuss ways to use second-order information economically to modify both the step size (learning rate) used in SGD and the direction taken by SGD. Our methods adaptively control the batch sizes used to compute gradient and Hessian approximations and ensure that the steps taken decrease the loss function with high probability, assuming the loss is self-concordant, as is true for many problems in empirical risk minimization. For such cases we prove that our basic algorithm is globally linearly convergent. A slightly modified version of the method is presented for training deep learning models, and numerical results show that it exhibits excellent performance without the need for learning-rate tuning. If time permits, additional ways to make efficient use of second-order information will be presented.
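The abstract describes the recipe only at a high level. As a rough illustration (not the speakers' actual algorithm), the hypothetical sketch below implements a sub-sampled Newton-type step for regularized logistic regression: the gradient and Hessian-vector products are estimated on a minibatch, the step is computed with conjugate gradient, and the batch size is doubled whenever the proposed step fails to decrease the loss. The function names, the doubling rule, and the acceptance test are all illustrative assumptions.

```python
# Hypothetical sketch only: a sub-sampled Newton-type step with an adaptive
# batch size, in the spirit of the abstract (not the presented method).
import numpy as np

def loss_grad(w, X, y, lam):
    """Regularized logistic loss and gradient on (X, y) with labels in {-1, +1}."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-np.clip(y * z, -30, 30)))
    loss = np.mean(np.logaddexp(0.0, -y * z)) + 0.5 * lam * (w @ w)
    grad = -(X.T @ (y * (1.0 - p))) / len(y) + lam * w
    return loss, grad

def hess_vec(w, X, y, lam, v):
    """Hessian-vector product of the same loss, computed on a (mini)batch."""
    s = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))  # sigmoid(Xw)
    d = s * (1.0 - s)                                    # diagonal curvature weights
    return (X.T @ (d * (X @ v))) / len(y) + lam * v

def cg(hvp, b, iters=20, tol=1e-8):
    """Plain conjugate gradient for H x = b, using only Hessian-vector products."""
    x = np.zeros_like(b); r = b.copy(); p = r.copy(); rs = r @ r
    for _ in range(iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp + 1e-12)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def train(X, y, lam=1e-3, batch=256, epochs=10, seed=0):
    """Newton-type updates with a batch that grows when a step is rejected."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        Xb, yb = X[idx], y[idx]
        _, g = loss_grad(w, Xb, yb, lam)
        step = cg(lambda v: hess_vec(w, Xb, yb, lam, v), g)
        loss_old, _ = loss_grad(w, X, y, lam)
        loss_new, _ = loss_grad(w - step, X, y, lam)
        if loss_new < loss_old:          # accept the Newton-type step
            w = w - step
        else:                            # otherwise grow the batch (assumed doubling rule)
            batch = min(2 * batch, n)
    return w
```

In this toy version the acceptance test evaluates the full loss directly; the approach described in the abstract instead controls the batch sizes so that the decrease holds with high probability, presumably without requiring full-loss evaluations at every step.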

Organizer

Category

About the organizer (NIPS 2019)

Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers. Following the conference, workshops provide a less formal setting.
