Jul 12, 2020
We propose a reward function estimation framework for inverse reinforcement learning with deep energy-based policies. Our method sequentially estimates the policy, the Q-function, and the reward. We refer to it as the PQR method. This method does not require the assumption that the reward depends on the state only, but instead allows also for dependency on the choice of action. Moreover, the method allows for the state transitions to be stochastic. To accomplish this, we assume the existence of one anchor action whose reward is known, typically the action of doing nothing, yielding no reward. We present both estimators and algorithms for the PQR method. When the environment transition is known, we prove that the reward estimator of PQR uniquely recovers the true reward. With unknown transitions, convergence analysis is presented for the PQR method. Finally, we apply PQR to both synthetic and real-world datasets, demonstrating superior performance in terms of reward estimation compared to competing methods.
The International Conference on Machine Learning (ICML) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence known as machine learning. ICML is globally renowned for presenting and publishing cutting-edge research on all aspects of machine learning used in closely related areas like artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, and robotics. ICML is one of the fastest growing artificial intelligence conferences in the world. Participants at ICML span a wide range of backgrounds, from academic and industrial researchers, to entrepreneurs and engineers, to graduate students and postdocs.
Total of 0 viewers voted for saving the presentation to eternal vault which is 0.0%
Presentations on similar topic, category or speaker