Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson · Beyond No Regret: Instance-Dependent PAC Reinforcement Learning · SlidesLive


Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

Jul 2, 2022


About

The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying ϵ-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an ϵ-optimal policy and achieve the worst-case optimal rate, it is unknown whether low-regret algorithms can obtain the instance-optimal rate for policy identification. We show this is not possible: there exists a fundamental tradeoff between achieving low regret and identifying an ϵ-optimal policy at the instance-optimal rate. Motivated by our negative finding, we propose a new measure of instance-dependent sample complexity for PAC tabular reinforcement learning which explicitly accounts for the attainable state visitation distributions in the underlying MDP. We then propose and analyze a novel, planning-based algorithm which attains this sample complexity, yielding a complexity which scales with the suboptimality gaps and the “reachability” of a state. We show that our algorithm is nearly minimax optimal, and that on several examples our instance-dependent sample complexity offers significant improvements over worst-case bounds.

Organizer

About COLT

The conference has been held annually since 1988 and has become the leading conference on learning theory by maintaining a highly selective process for submissions. It is committed to high-quality articles in all theoretical aspects of machine learning and related topics.
