Yan Pan, Yuanzhi Li · Toward Understanding Why Adam Converges Faster Than SGD for Transformers · SlidesLive

Toward Understanding Why Adam Converges Faster Than SGD for Transformers

Dec 2, 2022

Speakers

Yan Pan

Yuanzhi Li

About

While stochastic gradient descent (SGD) is still the most popular optimization algorithm in deep learning, adaptive algorithms such as Adam have established empirical advantages over SGD in some deep learning applications, such as training transformers. However, it remains an open question why Adam converges significantly faster than SGD in these scenarios. In this paper, we explore one explanation of why Adam converges faster than SGD using a new concept, directional sharpness. We argue that the perfor…

Organizer

NeurIPS 2022
