Refined Regret for Adversarial MDPs with Linear Function Approximation

Jul 24, 2023

Speakers

About

We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over K episodes and the state space can be arbitrarily large. We assume that the Q-function of any policy is linear in some known features, that is, a linear function approximation exists. The best existing regret upper bound for this setting (Luo et al., 2021) is of order π’ͺΜƒ(K^2/3) (omitting all other dependencies), given access to a simulator. This paper provides two algorithms that improve the regret to π’ͺΜƒ(√(K)) in the same setting. Our first algorithm makes use of a refined analysis of the Follow-the-Regularized-Leader (FTRL) algorithm with the log-barrier regularizer. This analysis allows the loss estimators to be arbitrarily negative and might be of independent interest. Our second algorithm develops a magnitude-reduced loss estimator, further removing the polynomial dependency on the number of actions in the first algorithm and leading to the optimal regret bound (up to logarithmic terms and dependency on the horizon). Moreover, we also extend the first algorithm to simulator-free linear MDPs, which achieves π’ͺΜƒ(K^8/9) regret and greatly improves over the best existing bound π’ͺΜƒ(K^14/15). This algorithm relies on a better alternative to the Matrix Geometric Resampling procedure by Neu Olkhovskaya (2020), which could again be of independent interest.

Organizer

Like the format? Trust SlidesLive to capture your next event!

Professional recording and live streaming, delivered globally.

Sharing

Recommended Videos

Presentations on similar topic, category or speaker

Interested in talks like this? Follow ICML 2023