Toward Understanding Why Adam Converges Faster Than SGD for Transformers

            Dec 2, 2022

            Speakers

            Yan Pan

            Yuanzhi Li

            About

            While stochastic gradient descent (SGD) is still the most popular optimization algorithm in deep learning, adaptive algorithms such as Adam have established empirical advantages over SGD in some deep learning applications, such as training transformers. However, it remains an open question why Adam converges significantly faster than SGD in these scenarios. In this paper, we explore one explanation of why Adam converges faster than SGD using a new concept, directional sharpness. We argue that the perfor…

            Organizer

            NeurIPS 2022
