Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion · SGD with large step sizes learns sparse features · SlidesLive


SGD with large step sizes learns sparse features

Jul 24, 2023

Speakers

Maksym Andriushchenko

Aditya Varre

Loucas Pillaud-Vivien

About

We showcase important features of the dynamics of stochastic gradient descent (SGD) in the training of neural networks. We present empirical observations that commonly used large step sizes (i) may lead the iterates to jump from one side of a valley to the other, causing loss stabilization, and (ii) through this stabilization induce hidden stochastic dynamics that implicitly bias the iterates toward simple predictors. Furthermore, we show empirically that the longer large step sizes keep SGD high in the l…
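
The "jump from one side of a valley to the other" behaviour can be pictured with a deliberately simple toy that is not the authors' experimental setup: plain, noise-free gradient descent on a one-dimensional quadratic valley f(x) = 0.5 * curvature * x**2. The function names, the curvature value, and the two step sizes below are illustrative assumptions. With a small step size the iterate slides down one side of the valley; with a step size of 2 / curvature it lands on the opposite side at every step and the loss stays flat. The stochastic part of SGD and the bias toward simple predictors described in the abstract are not modelled here.

```python
def gradient_descent(curvature, step_size, x0, num_steps):
    """Run plain gradient descent on f(x) = 0.5 * curvature * x**2.

    Hypothetical helper for illustration; returns the list of iterates.
    """
    xs = [x0]
    for _ in range(num_steps):
        grad = curvature * xs[-1]          # f'(x) = curvature * x
        xs.append(xs[-1] - step_size * grad)
    return xs


def loss(curvature, x):
    """Value of the quadratic valley at x."""
    return 0.5 * curvature * x * x


# Small step size: every iterate stays on the same side of the valley
# and the loss decreases monotonically.
small = gradient_descent(curvature=1.0, step_size=0.5, x0=1.0, num_steps=6)

# Large step size (2 / curvature): the iterate flips sign at every step,
# so it bounces between the two walls and the loss does not decrease.
large = gradient_descent(curvature=1.0, step_size=2.0, x0=1.0, num_steps=6)

small_losses = [loss(1.0, x) for x in small]
large_losses = [loss(1.0, x) for x in large]

print("small step iterates:", small)
print("large step iterates:", large)
print("large step losses:  ", large_losses)
```

In this toy the flat loss comes only from the step size sitting exactly at the stability threshold of a quadratic; it is meant as a picture of the bouncing, not as evidence for the empirical claims in the talk.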

Organizer

ICML 2023
