Jiachen Li, Shuo Cheng, Zhenyu Liao, Huayan Wang, William Yang Wang, Qinxun Bai · Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction · SlidesLive

Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction

Dec 2, 2022

Speakers

Jiachen Li

Shuo Cheng

Zhenyu Liao

About

Improving the sample efficiency of reinforcement learning algorithms requires effective exploration. Following the principle of optimism in the face of uncertainty (OFU), we train a separate exploration policy to maximize the approximate upper confidence bound of the critics in an off-policy actor-critic framework. However, this introduces extra differences between the replay buffer and the target policy regarding their stationary state-action distributions. To mitigate the off-policy-ness, we a…
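
As a rough illustration of the optimism principle described in the abstract, the sketch below scores an action by an approximate upper confidence bound over several critic estimates. The abstract does not give a formula, so the mean-plus-scaled-spread form, the `beta` weight, and the function names `ucb_q` and `pick_exploration_action` are all assumptions made for illustration. The talk describes training a separate exploration policy, whereas this sketch only ranks a fixed set of candidate actions.

```python
from statistics import mean, pstdev


def ucb_q(critic_values, beta=1.0):
    """Approximate upper confidence bound over critic estimates.

    Assumed form (not taken from the abstract): the mean of the critic
    values plus beta times their population standard deviation.
    """
    return mean(critic_values) + beta * pstdev(critic_values)


def pick_exploration_action(candidates, critics, state, beta=1.0):
    """Return the candidate action with the highest approximate UCB.

    `critics` is a list of callables q(state, action) -> float.
    This is a simplification: it ranks candidates instead of training
    an exploration policy by gradient ascent.
    """
    return max(
        candidates,
        key=lambda action: ucb_q([q(state, action) for q in critics], beta),
    )


# Toy critics (hypothetical): they agree on action 0 and disagree on action 1.
q1 = lambda state, action: {0: 1.0, 1: 0.0}[action]
q2 = lambda state, action: {0: 1.0, 1: 1.6}[action]

greedy_choice = pick_exploration_action([0, 1], [q1, q2], state=None, beta=0.0)
optimistic_choice = pick_exploration_action([0, 1], [q1, q2], state=None, beta=1.0)
```

With `beta=0.0` the score reduces to the mean critic value, so the action the critics agree on is preferred. A positive `beta` rewards disagreement between critics, which steers selection toward actions whose value is still uncertain.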

Organizer

NeurIPS 2022
