Jing Dong, Jingwei Liang, Baoxiang Wang, Jingzhao Zhang · Online Policy Optimization for Robust MDP · SlidesLive

Online Policy Optimization for Robust MDP

Dec 2, 2022

Speakers

Jing Dong

Jingwei Liang

Baoxiang Wang

About

Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight perturbations of the environment. The robust Markov decision process (MDP) framework, in which the transition probabilities belong to an uncertainty set around a nominal model, provides one way to develop robust models. While previous analysis shows RL algorithms are effec…
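
To make the robust MDP idea in the abstract concrete, here is a minimal sketch of worst-case value iteration. It is a standard textbook-style illustration, not the online policy optimization method presented in the talk. It assumes a finite uncertainty set (a nominal kernel plus a few perturbed candidates) in which the adversary picks the worst candidate independently for each state-action pair. The function name `robust_value_iteration`, the array layout, and the toy numbers are all hypothetical.

```python
import numpy as np


def robust_value_iteration(kernels, rewards, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Worst-case value iteration over a finite uncertainty set (illustrative only).

    kernels: shape (K, S, A, S); kernels[k, s, a] is the k-th candidate
             next-state distribution for state s and action a.
    rewards: shape (S, A).
    Returns the robust value function and a greedy policy with respect to it.
    """
    kernels = np.asarray(kernels, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    v = np.zeros(rewards.shape[0])
    for _ in range(max_iter):
        # Expected next-state value under every candidate kernel: shape (K, S, A).
        expected = kernels @ v
        # The adversary minimizes over candidates; the agent maximizes over actions.
        v_new = (rewards + gamma * expected.min(axis=0)).max(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    q = rewards + gamma * (kernels @ v).min(axis=0)
    return v, q.argmax(axis=1)


# Toy example: 2 states, 2 actions, a nominal kernel and one perturbed kernel.
nominal = np.array([[[0.9, 0.1], [0.2, 0.8]],
                    [[0.5, 0.5], [0.1, 0.9]]])
perturbed = np.array([[[0.7, 0.3], [0.4, 0.6]],
                      [[0.6, 0.4], [0.3, 0.7]]])
rewards = np.array([[1.0, 0.0],
                    [0.0, 2.0]])

v_nominal, _ = robust_value_iteration(nominal[None], rewards)
v_robust, policy = robust_value_iteration(np.stack([nominal, perturbed]), rewards)
print("nominal values:", v_nominal)
print("robust values: ", v_robust)
print("robust greedy policy:", policy)
```

Because the adversary can always fall back on the nominal kernel, the robust values are never larger than the nominal ones. The gap between the two gives a rough sense of how sensitive the toy problem is to the perturbation.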

Organizer

NeurIPS 2022
