Dec 10, 2023
Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation-related decision-making problems. However, one of the major challenges in ACRL is finding valid actions that satisfy the constraints at each RL step. Adding a projection layer on top of the original policy network is a commonly used approach, but it involves solving a mathematical program during training, during action execution, or both, which can result in longer training times and slower convergence. To address this issue, first, we leverage Hamiltonian Monte Carlo simulation to generate uniformly distributed valid actions that satisfy the constraints. Second, we approximate the distribution of these valid actions using a normalizing flow model, which can transform a simple distribution, such as a Gaussian or uniform distribution, into a complex one. Third, we integrate the learned normalizing flow model with the DDPG algorithm: the policy network outputs an element of the simple distribution used in the normalizing flow, and the normalizing flow transforms that element into a valid action. By design, a perfectly trained normalizing flow model always transforms an element into a valid action, so solving a mathematical program is required only when the transformed action turns out to be invalid. Finally, we demonstrate that our framework significantly outperforms previous ACRL algorithms in terms of sample efficiency and convergence speed on a variety of continuous control tasks.
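The action-selection step described in the abstract (transform a latent element into an action, and fall back to a projection only when the result is invalid) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the constraint set is a hypothetical L2 ball, `toy_flow` is a hand-written stand-in for a trained normalizing flow (made deliberately imperfect so the fallback is exercised), and the closed-form ball projection replaces the general mathematical program mentioned in the abstract. All function names are illustrative.

```python
import numpy as np

RADIUS = 1.0  # hypothetical constraint set: {a : ||a||_2 <= RADIUS}


def is_valid(action, radius=RADIUS, tol=1e-9):
    """Check the (assumed) L2-ball action constraint."""
    return bool(np.linalg.norm(action) <= radius + tol)


def project_to_ball(action, radius=RADIUS):
    """Closed-form Euclidean projection onto the L2 ball.

    Stands in for the mathematical program that a general constraint
    set would require.
    """
    norm = np.linalg.norm(action)
    if norm <= radius:
        return action
    return action * (radius / norm)


def toy_flow(z, scale=1.2):
    """Stand-in for a trained normalizing flow (assumption, not a real flow).

    Maps a latent vector to a bounded action; the scale is chosen so that
    some outputs fall outside the ball, mimicking an imperfect model.
    """
    return scale * np.tanh(z)


def select_action(z):
    """Map the policy's latent output to an action; project only if invalid."""
    action = toy_flow(z)
    used_projection = False
    if not is_valid(action):
        action = project_to_ball(action)
        used_projection = True
    return action, used_projection


# A small latent maps to a valid action; a large one triggers the fallback.
a_small, proj_small = select_action(np.array([0.1, -0.2]))
a_large, proj_large = select_action(np.array([3.0, 3.0]))
```

In this sketch the projection runs only for the second latent, which mirrors the claim that the costly step is needed only when the flow output violates the constraints.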