Sayak Ray Chowdhury, Aditya Gopalan, Odalric-Ambrym Maillard · Reinforcement Learning in Parametric MDPs with Exponential Families · SlidesLive

Categories

EN

Log in Get an estimate

Reinforcement Learning in Parametric MDPs with Exponential Families

Apr 14, 2021

Speakers

About

Extending model-based regret minimization strategies for Markov decision processes (MDPs) beyond discrete state-action spaces requires structural assumptions on the reward and transition models. Existing parametric approaches establish regret guarantees by making strong assumptions about either the state transition distribution or the value function as a function of state-action features, and often do not satisfactorily capture classical problems like linear dynamical systems or factored MDPs. This paper introduces a new MDP transition model defined by a collection of linearly parameterized exponential families with $d$ unknown parameters. For finite-horizon episodic RL with horizon $H$ in this MDP model, we propose a model-based upper confidence RL algorithm (Exp-UCRL) that solves a penalized maximum likelihood estimation problem to learn the $d$-dimensional representation of the transition distribution, balancing the exploitation-exploration tradeoff using confidence sets in the exponential family space. We demonstrate the efficiency of our algorithm by proving a frequentist (worst-case) regret bound that is of order $\tilde O(d\sqrt{H^3 N})$, sub-linear in total time $N$, linear in dimension $d$, and polynomial in the planning horizon $H$. This is achieved by deriving a novel concentration inequality for conditional exponential families that might be of independent interest. The exponential family MDP model also admits an efficient posterior sampling-style algorithm for which a similar guarantee on the Bayesian regret is shown.

Organizer

Categories

About AISTATS 2021

The 24th International Conference on Artificial Intelligence and Statistics was held virtually from Tuesday, 13 April 2021 to Thursday, 15 April 2021.

Like the format? Trust SlidesLive to capture your next event!

Professional recording and live streaming, delivered globally.

Sharing

Recommended Videos

Presentations on similar topic, category or speaker