Eloïse Berthier, Ziad Kobeissi, Francis Bach · A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning · SlidesLive

A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning

Dec 6, 2022

About

Temporal-difference learning is a popular algorithm for policy evaluation. In this paper, we study the convergence of the regularized non-parametric TD(0) algorithm, in both the independent and Markovian observation settings. In particular, when TD is performed in a universal reproducing kernel Hilbert space (RKHS), we prove convergence of the averaged iterates to the optimal value function, even when it does not belong to the RKHS. We provide explicit convergence rates that depend on a source condition relating the regularity of the optimal value function to the RKHS. We illustrate this convergence numerically on a simple continuous-state Markov reward process.
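To make the abstract concrete, here is a minimal sketch of one common form of regularized kernel TD(0) with iterate averaging. It is not taken from the paper, and the exact algorithm analyzed there may differ in details such as the step-size schedule or the averaging convention. The sketch assumes the update V_n = (1 - step * reg) V_{n-1} + step * delta_n K(x_n, .), where delta_n = r_n + gamma V_{n-1}(x'_n) - V_{n-1}(x_n) is the TD error, and it stores each iterate as a list of kernel coefficients. The Gaussian kernel, the toy Markov reward process on [0, 1), and all names and parameter values (`kernel_td0`, `sample_transition`, `bandwidth`, `step`, `reg`) are illustrative assumptions.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=0.2):
    """Gaussian (RBF) kernel; the bandwidth value is an arbitrary choice."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def evaluate(centers, coeffs, x, kernel):
    """Evaluate V(x) = sum_i coeffs[i] * K(centers[i], x)."""
    return sum(a * kernel(c, x) for c, a in zip(centers, coeffs))


def kernel_td0(samples, gamma=0.5, step=0.5, reg=1e-3, kernel=gaussian_kernel):
    """Regularized kernel TD(0) over (x, r, x_next) samples.

    Returns the kernel centers and the coefficients of the averaged
    iterate (V_1 + ... + V_n) / n. Cost is O(n^2) kernel evaluations.
    """
    centers = []
    coeffs = []      # coefficients of the current iterate V_n
    avg_coeffs = []  # coefficients of the running average of V_1..V_n
    for n, (x, r, x_next) in enumerate(samples, start=1):
        # TD error computed with the previous iterate V_{n-1}.
        td_error = (
            r
            + gamma * evaluate(centers, coeffs, x_next, kernel)
            - evaluate(centers, coeffs, x, kernel)
        )
        # Regularization shrinks the old coefficients ...
        coeffs = [(1.0 - step * reg) * a for a in coeffs]
        # ... and the new sample adds one kernel function centered at x.
        centers.append(x)
        coeffs.append(step * td_error)
        avg_coeffs.append(0.0)
        # Running mean: avg_n = avg_{n-1} + (V_n - avg_{n-1}) / n.
        avg_coeffs = [b + (a - b) / n for a, b in zip(coeffs, avg_coeffs)]
    return centers, avg_coeffs


def sample_transition(rng):
    """Hypothetical toy Markov reward process on [0, 1), i.i.d. setting."""
    x = rng.random()                         # state drawn uniformly
    x_next = (x + rng.uniform(0.0, 0.2)) % 1.0  # noisy drift on the circle
    reward = math.cos(2.0 * math.pi * x)
    return x, reward, x_next


if __name__ == "__main__":
    rng = random.Random(0)
    data = [sample_transition(rng) for _ in range(400)]
    centers, avg = kernel_td0(data)
    for x in (0.0, 0.25, 0.5, 0.75):
        print(x, evaluate(centers, avg, x, gaussian_kernel))
```

Because every sample adds a kernel center, the representation grows linearly with the number of observations. That keeps the sketch short, but it is only practical for small sample sizes.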
