            Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

            Jul 24, 2023

Speakers

Toshinori Kitamura

Tadashi Kozuno

Yunhao Tang

            About

            Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms. However, despite the use of function approximation in practice, the theoretical understanding of MDVI has been limited to tabular Markov decision processes (MDPs). We study MDVI with linear function approximation through its sample complexity required to identify an ε-optimal policy with pro…
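To make the variance-weighted regression idea concrete, the following is a minimal sketch, not the authors' implementation: a single inverse-variance-weighted ridge regression step for a linear MDP, assuming a feature matrix Phi with rows phi(s, a), regression targets y (e.g., reward plus an estimated next-state value), and per-sample variance estimates sigma2. All names and parameters here are illustrative assumptions.

import numpy as np

def variance_weighted_ridge(Phi, y, sigma2, lam=1.0):
    """One variance-weighted ridge regression step (illustrative sketch).

    Estimates w so that phi(s, a)^T w approximates the target y,
    down-weighting high-variance samples via inverse-variance weights.

    Phi    : (n, d) feature matrix, rows phi(s_i, a_i)  -- assumed given
    y      : (n,)   regression targets                  -- assumed given
    sigma2 : (n,)   per-sample variance estimates
    lam    : ridge regularization coefficient
    """
    weights = 1.0 / np.maximum(sigma2, 1e-8)  # inverse-variance weights, floored
    # Weighted Gram matrix: Lambda = sum_i phi_i phi_i^T / sigma2_i + lam * I
    Lambda = Phi.T @ (weights[:, None] * Phi) + lam * np.eye(Phi.shape[1])
    # Weighted moment vector: b = sum_i phi_i y_i / sigma2_i
    b = Phi.T @ (weights * y)
    return np.linalg.solve(Lambda, b)

# Toy usage on synthetic data (illustration only)
rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
sigma2 = rng.uniform(0.1, 2.0, size=100)
y = Phi @ w_true + rng.normal(scale=np.sqrt(sigma2))
w_hat = variance_weighted_ridge(Phi, y, sigma2)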

            Organizer

ICML 2023


            Recommended Videos

Presentations on a similar topic, category, or speaker

On the stepwise nature of SSL
05:11
Jamie Simon, …
ICML 2023 · 2 years ago

Towards Deep Attention in GNNs: Problems and Remedies
05:11
Soo Yong Lee, …
ICML 2023 · 2 years ago

Regions of Reliability In the Evaluation of Multivariate Probabilistic Forecasts
03:36
Étienne Marcotte, …
ICML 2023 · 2 years ago

Adversarial Training Should Be Cast as a Non-Zero-Sum Game
10:50
Alex Robey, …
ICML 2023 · 2 years ago

Last Switch Dependent Bandits with Monotone Payoff Functions
05:06
Ayoub Foussoul, …
ICML 2023 · 2 years ago

Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback
05:11
Tal Lancewicki, …
ICML 2023 · 2 years ago

            Interested in talks like this? Follow ICML 2023