Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning

December 10, 2023

About the presentation

Many problems in Reinforcement Learning (RL) have an optimal policy that is stochastic; these include problems in randomized allocation of resources, such as the placement of security resources, emergency response units, etc. A challenge in this setting is that the underlying action space is categorical (discrete and unordered) and large. Existing RL methods do not perform well in such large categorical action spaces. Moreover, these problems require validity of the realized action (allocation), and this validity constraint is often difficult to express compactly in closed mathematical form. In this work, we address these issues by (1) using a (state-)conditional normalizing flow to compactly represent the stochastic policy; the compactness arises because the network produces only one sampled action and the log probability of that action, which is then used by an actor-critic method; and (2) using an invalid-action rejection method (via a valid-action oracle) to modify the base policy. The action rejection is enabled by a modified policy gradient that we derive. Our experiments show the scalability of our approach compared to prior methods, and its ability to enforce arbitrary state-conditional constraints on the support of the action distribution in any state.
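The invalid-action rejection step can be sketched as follows. This is a minimal illustration only: it assumes a plain categorical base policy given as a probability vector (the talk's base policy is a conditional normalizing flow), and the `is_valid` oracle and all names here are hypothetical. The key point is that rejection sampling induces a modified policy equal to the base policy restricted to the valid set and renormalized, which is the log probability the actor-critic update needs.

```python
import numpy as np

def rejection_sample(base_probs, is_valid, rng):
    """Sample an action from a base policy, rejecting invalid actions
    via a validity oracle. The effective (modified) policy is the base
    policy restricted to the valid-action set and renormalized.

    base_probs: 1-D array of base-policy probabilities over actions.
    is_valid:   oracle mapping an action index to True/False.
    Returns (action, log-probability under the modified policy).
    """
    while True:
        a = rng.choice(len(base_probs), p=base_probs)
        if is_valid(a):
            break
    # Probability mass of the valid set, used to renormalize.
    valid_mass = sum(p for i, p in enumerate(base_probs) if is_valid(i))
    return a, np.log(base_probs[a] / valid_mass)

# Hypothetical example: 4 actions, of which only {1, 3} are valid.
rng = np.random.default_rng(0)
probs = np.array([0.1, 0.4, 0.2, 0.3])
a, lp = rejection_sample(probs, lambda i: i in {1, 3}, rng)
```

In this sketch the returned action is always valid, and `exp(lp)` equals `probs[a]` divided by the total valid mass (0.7 here), matching the renormalized distribution that the derived modified policy gradient is taken with respect to.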


NeurIPS 2023