A Cramér Distance perspective on Quantile Regression based Distributional Reinforcement Learning

Mär 28, 2022

Sprecher:innen

Über

Distributional reinforcement learning (DRL) extends the value-based approach by approximating the full distribution over future returns instead of the mean only, providing a richer signal that leads to improved performances. Quantile Regression (QR)-based methods like QR-DQN project arbitrary distributions into a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance. However, due to biases in the gradients, the quantile regression loss is used instead for training, guaranteeing the same minimizer and enjoying unbiased gradients. Non-crossing constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. The contribution of this work is in the setting of fixed quantile levels and is twofold. First, we prove that the Cramér distance yields a projection that coincides with the 1-Wasserstein one and that, under non-crossing constraints, the squared Cramér and the quantile regression losses yield collinear gradients, shedding light on the connection between these important elements of DRL. Second, we propose a low complexity algorithm to compute the Cramér distance.

Organisator

Über AISTATS 2022

AISTATS is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas. Since its inception in 1985, the primary goal of AISTATS has been to broaden research in these fields by promoting the exchange of ideas among them. We encourage the submission of all papers which are in keeping with this objective at AISTATS.

Präsentation speichern

Soll diese Präsentation für 1000 Jahre gespeichert werden?

Wie speichern wir Präsentationen?

Ewigspeicher-Fortschrittswert: 0 = 0.0%

Freigeben

Empfohlene Videos

Präsentationen, deren Thema, Kategorie oder Sprecher:in ähnlich sind

Interessiert an Vorträgen wie diesem? AISTATS 2022 folgen