            Supervised Q-Learning for Continuous Control

            Dec 2, 2022

            Speakers

            Hao Sun

            Ziping Xu

            Taiyi Wang

            About

            Policy gradient (PG) algorithms have been widely used in reinforcement learning (RL). However, PG algorithms rely on exploiting the value function being learned with the first-order update locally, which results in limited sample efficiency. In this work, we propose an alternative method called Zeroth-Order Supervised Policy Improvement (ZOSPI). ZOSPI exploits the estimated value function Q globally while preserving the local exploitation of the PG methods based on zeroth-order policy optimizati…
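            The abstract above is truncated, so the following is only a minimal sketch of the general idea it describes, not the authors' implementation: candidate actions are sampled both locally (around the current policy output) and globally (across the action range), scored with an estimated Q, and the policy is then regressed toward the best candidate in a supervised step. The toy `q_value`, the linear `policy`, the Gaussian and uniform sampling scheme, and all parameter names are assumptions made for illustration.

```python
import numpy as np

ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # assumed action bounds


def q_value(state, action):
    # Hypothetical stand-in for a learned critic Q(s, a).
    # It peaks when the action equals tanh of the first two state features.
    optimum = np.tanh(state[:2])
    return -np.sum((action - optimum) ** 2, axis=-1)


def policy(state, weights):
    # Hypothetical deterministic linear policy, clipped to the action bounds.
    return np.clip(state @ weights, ACTION_LOW, ACTION_HIGH)


def zeroth_order_target(state, weights, rng, n_local=8, n_global=8, sigma=0.1):
    """Pick the best-scoring action among local and global samples."""
    base = policy(state, weights)
    dim = base.shape[-1]
    # Local candidates: Gaussian perturbations of the current policy action.
    local = np.clip(base + sigma * rng.standard_normal((n_local, dim)),
                    ACTION_LOW, ACTION_HIGH)
    # Global candidates: uniform samples over the whole action range.
    global_samples = rng.uniform(ACTION_LOW, ACTION_HIGH, size=(n_global, dim))
    # Including the current action means the target is never worse under Q.
    candidates = np.vstack([base[None, :], local, global_samples])
    scores = q_value(state, candidates)
    return candidates[np.argmax(scores)]


def supervised_step(state, weights, target, lr=0.5):
    """One gradient step on 0.5 * ||state @ weights - target||^2."""
    error = state @ weights - target
    return weights - lr * np.outer(state, error)


rng = np.random.default_rng(0)
state = np.array([0.5, -0.3, 0.8])
weights = np.zeros((3, 2))

for _ in range(20):
    target = zeroth_order_target(state, weights, rng)
    weights = supervised_step(state, weights, target)

print("policy action:", policy(state, weights))
print("Q at policy action:", q_value(state, policy(state, weights)))
```

            No gradient of Q with respect to the action is used anywhere in this sketch; only Q evaluations at sampled actions are needed, which is what makes the improvement step zeroth-order.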

            Organizer

            NeurIPS 2022
