            Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery

            Apr 4, 2021

            Speakers

            Kiwan Maeng

            Shivam Bharuka

            Isabel Gao

            About

            The paper proposes and optimizes CPR, a partial recovery training system for recommendation models. When a node fails during training, CPR relaxes the consistency requirement by letting the non-failed nodes proceed without loading checkpoints, which reduces failure-related overheads. To the best of our knowledge, the paper is the first to perform a data-driven, in-depth analysis of applying partial recovery to recommendation models, and it identifies a trade-off between accuracy and performance. Motiva…
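
As a rough illustration of the distinction the abstract draws, the toy sketch below contrasts full recovery (every node reloads the checkpoint) with partial recovery (only the failed node reloads, and the others keep their progress). It is not the CPR implementation: the `recover` function, the node names, and the single-number "shards" are all hypothetical and exist only for illustration.

```python
import copy


def recover(shards, checkpoint, failed, partial):
    """Return the per-node state after node `failed` has failed.

    partial=False: every node reloads its checkpointed shard (full recovery).
    partial=True:  only the failed node reloads; the rest keep their state.
    """
    restored = {}
    for node, state in shards.items():
        if node == failed or not partial:
            # Roll this shard back to the last checkpoint.
            restored[node] = copy.deepcopy(checkpoint[node])
        else:
            # Non-failed node proceeds without loading the checkpoint.
            restored[node] = state
    return restored


# Hypothetical toy state: one shard per node, value = updates applied so far.
checkpoint = {"node0": [100], "node1": [100], "node2": [100]}
current = {"node0": [130], "node1": [130], "node2": [130]}

full = recover(current, checkpoint, failed="node1", partial=False)
part = recover(current, checkpoint, failed="node1", partial=True)
```

In this sketch, full recovery discards the 30 updates on every node, while partial recovery discards them only on `node1`. The surviving shards are then ahead of the restored one, which is the relaxed consistency the abstract refers to and the source of the accuracy versus performance trade-off.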

            Organizer

            MLSys 2021

            About MLSys 2021

            The Conference on Machine Learning and Systems targets research at the intersection of machine learning and systems. The conference aims to elicit new connections amongst these fields, including identifying best practices and design principles for learning systems, as well as developing novel learning methods and theory tailored to practical machine learning workflows.
