Identifying the Reward Function using Anchor Actions

Jul 12, 2020



We propose a reward function estimation framework for inverse reinforcement learning with deep energy-based policies. Our method sequentially estimates the policy, the Q-function, and the reward; we refer to it as the PQR method. The method does not assume that the reward depends only on the state; it also allows the reward to depend on the chosen action. Moreover, it allows the state transitions to be stochastic. To accomplish this, we assume the existence of one anchor action whose reward is known, typically the action of doing nothing, which yields no reward. We present both estimators and algorithms for the PQR method. When the environment transitions are known, we prove that the PQR reward estimator uniquely recovers the true reward. When the transitions are unknown, we present a convergence analysis for the PQR method. Finally, we apply PQR to both synthetic and real-world datasets, demonstrating superior reward estimation performance compared to competing methods.
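The policy, Q-function, reward sequence described above can be illustrated with a small tabular sketch. This is not the authors' deep-network implementation. It assumes a finite state and action space, known transitions, an exact (rather than estimated) policy, and the standard entropy-regularized soft Bellman equations with temperature alpha. The function names (soft_policy, recover_reward), the array layout P[s, a, s'], and the choice of action 0 as the anchor are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_policy(reward, P, gamma=0.9, alpha=1.0, iters=2000):
    """Soft value iteration; returns the energy-based policy pi[s, a].

    Assumed layout: reward[s, a], P[s, a, s'] (rows sum to 1 over s').
    """
    def soft_value(Q):
        # V(s) = alpha * log sum_a exp(Q(s, a) / alpha), computed stably.
        m = Q.max(axis=1)
        return m + alpha * np.log(np.exp((Q - m[:, None]) / alpha).sum(axis=1))

    V = np.zeros(reward.shape[0])
    for _ in range(iters):
        V = soft_value(reward + gamma * P @ V)
    Q = reward + gamma * P @ V
    V = soft_value(Q)
    return np.exp((Q - V[:, None]) / alpha)


def recover_reward(pi, P, anchor_reward, anchor=0, gamma=0.9, alpha=1.0):
    """Recover reward[s, a] from a policy, given the anchor action's reward."""
    n_states = pi.shape[0]
    log_pi = np.log(pi)
    P_anchor = P[:, anchor, :]
    # Since V(s) = Q(s, anchor) - alpha * log pi(anchor | s), the anchor
    # Q-values solve a linear system:
    #   Q_A = g + gamma * P_A (Q_A - alpha * log pi_A)
    rhs = anchor_reward - gamma * alpha * P_anchor @ log_pi[:, anchor]
    q_anchor = np.linalg.solve(np.eye(n_states) - gamma * P_anchor, rhs)
    V = q_anchor - alpha * log_pi[:, anchor]
    # Remaining Q-values follow from the policy; then invert the Bellman equation.
    Q = V[:, None] + alpha * log_pi
    return Q - gamma * P @ V


rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
true_reward = rng.normal(size=(n_states, n_actions))
true_reward[:, 0] = 0.0  # anchor action ("do nothing") yields no reward

pi = soft_policy(true_reward, P)
estimated = recover_reward(pi, P, anchor_reward=np.zeros(n_states))
print(np.max(np.abs(estimated - true_reward)))
```

In this toy setting the recovered reward matches the true one up to numerical error, which mirrors the known-transition identification claim. With an estimated policy or unknown transitions, the recovery would only be approximate.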



About ICML 2020

The International Conference on Machine Learning (ICML) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence known as machine learning. ICML is globally renowned for presenting and publishing cutting-edge research on all aspects of machine learning used in closely related areas like artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, and robotics. ICML is one of the fastest growing artificial intelligence conferences in the world. Participants at ICML span a wide range of backgrounds, from academic and industrial researchers, to entrepreneurs and engineers, to graduate students and postdocs.
