Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

Jul 2, 2022

About

The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying ϵ-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an ϵ-optimal policy and achieve the worst-case optimal rate, it is unknown whether low-regret algorithms can obtain the instance-optimal rate for policy identification. We show this is not possible: there exists a fundamental tradeoff between achieving low regret and identifying an ϵ-optimal policy at the instance-optimal rate. Motivated by our negative finding, we propose a new measure of instance-dependent sample complexity for PAC tabular reinforcement learning which explicitly accounts for the attainable state visitation distributions in the underlying MDP. We then propose and analyze a novel, planning-based algorithm which attains this sample complexity, yielding a complexity which scales with the suboptimality gaps and the “reachability” of a state. We show that our algorithm is nearly minimax optimal and, on several examples, that our instance-dependent sample complexity offers significant improvements over worst-case bounds.
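
The "simple reduction" from low regret to policy identification mentioned in the abstract is usually the standard online-to-batch argument: run the low-regret algorithm for K episodes and output one of the played policies uniformly at random, so that its expected suboptimality equals Regret(K)/K. The sketch below illustrates that arithmetic only. It assumes a regret bound of the form Regret(K) ≤ C·√K, with the constant `regret_coeff` standing in for all problem-dependent and logarithmic factors; this form and all function names are illustrative assumptions, not taken from the talk.

```python
import math
import random


def average_regret_bound(regret_coeff: float, num_episodes: int) -> float:
    """Upper bound on Regret(K)/K, ASSUMING Regret(K) <= regret_coeff * sqrt(K)."""
    return regret_coeff * math.sqrt(num_episodes) / num_episodes


def episodes_for_epsilon(regret_coeff: float, epsilon: float) -> int:
    """Smallest K such that regret_coeff / sqrt(K) <= epsilon.

    Solving C / sqrt(K) <= epsilon for K gives K >= (C / epsilon)^2.
    """
    return math.ceil((regret_coeff / epsilon) ** 2)


def online_to_batch(played_policies, rng=random):
    """Return one of the policies played by the low-regret algorithm,
    chosen uniformly at random. Its expected suboptimality is the
    average suboptimality of the played policies, i.e. Regret(K)/K.
    """
    return rng.choice(played_policies)


# Hypothetical numbers: C = 8 and target accuracy epsilon = 0.5.
num_episodes = episodes_for_epsilon(8.0, 0.5)
print(num_episodes)                             # 256
print(average_regret_bound(8.0, num_episodes))  # 0.5
```

The resulting 1/ϵ² episode count is a worst-case style guarantee: it depends only on the regret constant, not on the suboptimality gaps or on how reachable each state is, which is the limitation the abstract's instance-dependent analysis addresses.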

Organizer

About COLT

The conference has been held annually since 1988 and has become the leading conference on learning theory by maintaining a highly selective submission process. It is committed to high-quality articles on all theoretical aspects of machine learning and related topics.
