MOPA: a Minimalist Off-Policy Approach to Safe-RL

Dec 2, 2022

Speakers

About

Safety is one of the crucial concerns for the real-world application of reinforcement learning (RL). Previous works consider the safe exploration problem as Constrained Markov Decision Process (CMDP), where the policies are being optimized under constraints. However, when encountering any potential danger, human tends to stop immediately and rarely learns to behave safely in danger. Moreover, the off-policy learning nature of humans guarantees high learning efficiency in risky tasks. Motivated by human learning, we introduce a Minimalist Off-Policy Approach (MOPA) to address Safe-RL problem. We first define the Early Terminated MDP (ET-MDP) as a special type of MDPs that has the same optimal value function as its CMDP counterpart. An off-policy learning algorithm MOPA based on recurrent models is then proposed to solve the ET-MDP, which thereby solves the corresponding CMDP. Experiments on various Safe-RL tasks show a substantial improvement over previous methods that directly solve CMDP, in terms of higher asymptotic performance and better learning efficiency.

Organizer

Store presentation

Should this presentation be stored for 1000 years?

How do we store presentations

Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%

Sharing

Recommended Videos

Presentations on similar topic, category or speaker

Interested in talks like this? Follow NeurIPS 2022