Standing Still Is Not An Option: Alternative Baselines for Attainable Utility Preservation

Jun 3, 2023

About

The rapid development of machine learning and artificial intelligence has led to growing concerns about the potential impact of AI on society. Ensuring that AI systems behave safely and beneficially is a major challenge, particularly in Reinforcement Learning, where an agent learns by interacting with an environment and receiving feedback in the form of rewards. One of these challenges is avoiding negative side-effects: the agent should not cause unintended harm while pursuing its primary objective. A promising way to accomplish this implicitly, without telling the agent what not to do, is Attainable Utility Preservation (AUP). AUP is a safe Reinforcement Learning approach that minimizes side-effects by optimizing a primary reward function while preserving the ability to optimize auxiliary reward functions. However, AUP is only applicable to tasks where a no-op action (e.g., standing still) is available in the agent's action space, which cannot be guaranteed in every environment.

To overcome this limitation, we introduce new baselines for AUP that are applicable to environments with or without a no-op action in the agent's action space. We achieve this by regularizing the primary reward function with respect to the auxiliary goals in different ways, depending on the chosen variant. This enables environment designers to define simple reward functions, which are then extended by our baselines to induce safer behavior.

We evaluate all introduced variants on multiple AI safety gridworlds that were specifically designed to test an agent's ability to solve a primary objective while avoiding negative side-effects. These scenarios include, for example, confronting the agent with several options of which only one avoids a side-effect, refraining from causing damage or interfering with the environment's dynamics, rescuing items without destroying them, mitigating delayed effects to some extent, and deliberately not completing the primary objective when doing so would be harmful. We show that our approach induces safe, conservative, and effective behavior even when no no-op action is available to the agent. A further benefit of the variant-based approach is that the most suitable variant can be chosen for the task at hand. In conclusion, our work addresses critical challenges in AI safety related to Reinforcement Learning and proposes an updated approach that achieves safe behavior implicitly by avoiding negative side-effects, contributing to the broader effort of designing safe and beneficial AI systems.
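
To make the mechanism concrete, below is a minimal sketch of the attainable-utility penalty in its standard no-op form, as introduced in the original AUP work. The names q_aux, baseline_action, and the penalty weight lam are illustrative and not taken from the talk; the talk's contribution is to replace the no-op comparison point with alternative baselines when no such action exists.

    import numpy as np

    def aup_reward(primary_reward, q_aux, state, action, baseline_action, lam=0.1):
        """AUP-style penalized reward (illustrative sketch, not the talk's exact variants).

        q_aux           -- list of auxiliary Q-tables, each indexed as q[state][action]
        baseline_action -- comparison action; classically a no-op, which the talk's
                           alternative baselines replace when no no-op is available
        lam             -- penalty weight trading off task reward against side-effects
        """
        # Penalty: mean absolute change in attainable auxiliary value caused by
        # taking `action` instead of the baseline action.
        penalty = np.mean([abs(q[state][action] - q[state][baseline_action])
                           for q in q_aux])
        return primary_reward - lam * penalty

During learning, this shaped reward would replace the raw primary reward, so actions that noticeably change how well the auxiliary goals could still be achieved are discouraged.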

About Machine Learning Prague

Machines can learn. Incredibly fast. Faster than you. They are getting smarter and smarter every day. They are already changing your world, your business, and your life. The artificial intelligence revolution is here. Come and learn how to turn this threat into your biggest opportunity. This is not another academic conference. Our goal is to foster discussion between machine learning practitioners and everyone interested in applications of modern trends in artificial intelligence. You can look forward to inspiring people, algorithms, data, applications, workshops, and a lot of fun over three days, as well as two great parties.
