Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Remi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvari, Wataru Kumagai, Yutaka Matsuo · Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice


July 24, 2023

Speakers

Toshinori Kitamura

Tadashi Kozuno

Yunhao Tang

About the presentation

Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms. However, despite the use of function approximation in practice, the theoretical understanding of MDVI has been limited to tabular Markov decision processes (MDPs). We study MDVI with linear function approximation through its sample complexity required to identify an ε-optimal policy with pro…

Organizer

ICML 2023
