
            Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

            Jul 24, 2023

            Speakers

            Toshinori Kitamura
            Speaker · 0 followers

            Tadashi Kozuno
            Speaker · 0 followers

            Yunhao Tang
            Speaker · 0 followers

            About

            Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms. However, despite the use of function approximation in practice, the theoretical understanding of MDVI has been limited to tabular Markov decision processes (MDPs). We study MDVI with linear function approximation through its sample complexity required to identify an ε-optimal policy with pro…
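The abstract's object of study, MDVI, reduces in the tabular case to value iteration with a KL penalty toward the previous policy plus an entropy bonus. A minimal tabular sketch of that idea, assuming standard softmax/mirror-descent form; this is not the paper's linear-function-approximation algorithm, and the names `mdvi_tabular`, `tau`, and `kappa` are illustrative, not taken from the paper:

```python
import numpy as np

def mdvi_tabular(P, r, gamma=0.9, tau=0.1, kappa=0.05, iters=300):
    """KL/entropy-regularized value iteration on a tabular MDP.

    P: transition tensor, shape (S, A, S); r: rewards, shape (S, A).
    tau weights the KL penalty to the previous policy; kappa weights
    the entropy bonus. Both coefficient names are illustrative.
    """
    S, A = r.shape
    q = np.zeros((S, A))
    pi = np.full((S, A), 1.0 / A)  # start from the uniform policy
    for _ in range(iters):
        pi_prev = pi
        # Mirror-descent step: softmax of q tempered by the previous policy.
        logits = (tau * np.log(pi_prev) + q) / (tau + kappa)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)
        # Regularized state value: expected q, minus KL, plus entropy.
        kl = np.sum(pi * (np.log(pi) - np.log(pi_prev)), axis=1)
        entropy = -np.sum(pi * np.log(pi), axis=1)
        v = np.sum(pi * q, axis=1) - tau * kl + kappa * entropy
        q = r + gamma * (P @ v)  # Bellman backup, result has shape (S, A)
    return q, pi

# Tiny random MDP to exercise the sketch.
rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize into transition probabilities
r = rng.random((S, A))
q, pi = mdvi_tabular(P, r)
```

As `tau, kappa -> 0` the update collapses to plain value iteration; the talk's contribution concerns the linear-MDP setting with variance-weighted regression, which this tabular sketch does not cover.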

            Organizer

            ICML 2023
            Account · 657 followers

            Like the format? Trust SlidesLive to capture your next event!

            Professional recording and livestreaming – worldwide.


            Recommended Videos

            Presentations with a similar topic, category, or speaker

            How Jellyfish Characterise Alternating Group Equivariant Neural Networks
            04:49
            Edward Pearce-Crump
            ICML 2023 · 2 years ago

            Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes
            05:16
            Liam Hodgkinson, …
            ICML 2023 · 2 years ago

            Delayed Feedback in Kernel Bandits
            05:20
            Sattar Vakili, …
            ICML 2023 · 2 years ago

            Faster Rates of Convergence to Stationary Points in Differentially Private Optimization
            05:25
            Raman Arora, …
            ICML 2023 · 2 years ago

            TIDE: Time Derivative Diffusion for Deep Learning on Graphs
            05:33
            Maysam Behmanesh, …
            ICML 2023 · 2 years ago

            Temporal Label Smoothing for Early Event Prediction
            05:02
            Hugo Yèche, …
            ICML 2023 · 2 years ago

            Interested in talks like this? Follow ICML 2023.