Max Sobol Mark, Ali Ghadirzadeh, Xi Chen, Chelsea Finn · Fine-tuning Offline Policies with Optimistic Action Selection · SlidesLive

Fine-tuning Offline Policies with Optimistic Action Selection

December 2, 2022

Speakers

Max Sobol Mark

Ali Ghadirzadeh

Xi Chen

About the presentation

Offline reinforcement learning algorithms can train performant policies for hard tasks using previously-collected datasets. However, the quality of the offline dataset often limits the levels of performance possible. We consider the problem of improving offline policies through online fine-tuning. Offline RL requires a pessimistic training objective to mitigate distributional shift between the trained policy and the offline behavior policy, which will make the trained policy averse to picking no…

Organizer

NeurIPS 2022
