Dec 2, 2022
Both gradient descent and dual averaging for convex Lipschitz functions have convergence rates that are highly dependent on the choice of learning rate. Even when the Lipschitz constant is known, setting the learning rate to achieve the optimal convergence rate requires knowing a bound D on the distance from the initial point to the solution set. A number of approaches are known that relax this requirement, but they either require line searches, restarting (hyper-parameter grid search), or do not derive from the gradient descent or dual averaging frameworks (coin-betting). In this work we describe a single-pass method, with no back-tracking or line searches, derived from dual averaging, which does not require knowledge of D yet asymptotically achieves the optimal rate of convergence for the complexity class of convex Lipschitz functions.
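To make the dependence on D concrete, here is a minimal sketch of *classical* dual averaging for a convex G-Lipschitz function, using the textbook step schedule D / (G √(k+1)). This is not the talk's method: it deliberately takes D as an input, which is exactly the knowledge the presented approach removes. The function names and the test problem (f(x) = |x − 3|) are illustrative choices, not from the talk.

```python
import math

def dual_averaging(grad, x0, D, G, steps):
    """Classical dual averaging for a convex G-Lipschitz objective.

    Note the step size D / (G * sqrt(k+1)) requires D, a bound on
    the distance from x0 to the solution set; choosing D badly
    degrades the convergence rate.
    """
    s = 0.0        # running sum of (sub)gradients
    x = x0
    avg = 0.0      # running sum of iterates, for averaging
    for k in range(steps):
        s += grad(x)
        gamma = D / (G * math.sqrt(k + 1))
        x = x0 - gamma * s          # dual-averaging update
        avg += x
    return avg / steps              # averaged iterate

# Illustrative problem: f(x) = |x - 3| is 1-Lipschitz (G = 1),
# with subgradient sign(x - 3) and minimizer x* = 3.
grad = lambda x: 1.0 if x > 3 else (-1.0 if x < 3 else 0.0)
x_hat = dual_averaging(grad, x0=0.0, D=5.0, G=1.0, steps=2000)
```

The averaged iterate satisfies the standard O(DG/√T) suboptimality bound, so with T = 2000 the returned point lies close to the minimizer; the point of the talk is to recover this rate asymptotically without supplying D at all.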