Hao Sun, Ziping Xu, Taiyi Wang, Meng Fang, Bolei Zhou · Supervised Q-Learning for Continuous Control · SlidesLive

Supervised Q-Learning for Continuous Control

Dec 2, 2022

Speakers

Hao Sun

Ziping Xu

Taiyi Wang

About

Policy gradient (PG) algorithms have been widely used in reinforcement learning (RL). However, PG algorithms rely on exploiting the value function being learned with the first-order update locally, which results in limited sample efficiency. In this work, we propose an alternative method called Zeroth-Order Supervised Policy Improvement (ZOSPI). ZOSPI exploits the estimated value function Q globally while preserving the local exploitation of the PG methods based on zeroth-order policy optimizati…
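
The abstract above is cut off, so the following is only a minimal sketch of the general idea it describes: sample candidate actions both globally and locally, pick the one with the highest estimated Q (a zeroth-order search that needs no gradient of Q), and then fit the policy to that action with a supervised regression step. The function names, the linear policy, the sampling counts, and the noise scale are all hypothetical choices made for illustration; this is not the authors' implementation.

```python
import numpy as np

def select_target_action(q_fn, state, policy_action, low, high,
                         n_global=16, n_local=16, sigma=0.1, rng=None):
    """Zeroth-order search: return the candidate action with the highest Q estimate.

    q_fn, the sample counts, and sigma are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    policy_action = np.asarray(policy_action, dtype=float)
    dim = policy_action.shape[0]
    # Global candidates: uniform samples over the whole action box.
    global_c = rng.uniform(low, high, size=(n_global, dim))
    # Local candidates: Gaussian perturbations around the current policy action.
    noise = sigma * rng.standard_normal((n_local, dim))
    local_c = np.clip(policy_action + noise, low, high)
    # Include the current action so the selected target is never worse under Q.
    candidates = np.vstack([policy_action[None, :], global_c, local_c])
    q_values = np.array([q_fn(state, a) for a in candidates])
    return candidates[np.argmax(q_values)]

def supervised_policy_step(weights, state, target_action, lr=0.1):
    """One gradient step on ||W s - a*||^2 for a hypothetical linear policy a = W s."""
    pred = weights @ state
    grad = 2.0 * np.outer(pred - target_action, state)
    return weights - lr * grad
```

In this sketch the Q estimate is only ever evaluated, never differentiated, which is what makes the search zeroth-order; the policy update itself is an ordinary regression toward the selected action.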

Organizer

NeurIPS 2022

Recommended Videos

Presentations on similar topics, categories, or speakers

Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation · 00:59 · Shiqi Yang, … · NeurIPS 2022

Cooperation or Competition: Avoiding Player Domination for Multi-target Robustness by Adaptive Budgets · 10:01 · NeurIPS 2022

The Dollar Street Dataset: Images Representing the Geographic and Socioeconomic Diversity of the World · 04:28 · William Gaviria Rojas, … · NeurIPS 2022

Generative Collage and its Sticky Questions on Human-AI Co-Creativity · 21:25 · NeurIPS 2022

MVP: Practical Adversarial Multivalid Conformal Prediction · 04:56 · Georgy Noarov, … · NeurIPS 2022

Dual-Generator Offline Reinforcement Learning · 06:33 · Quan Vuong, … · NeurIPS 2022