Constrained Imitation Q-learning with Earth Mover’s Distance reward

Dec 2, 2022 (NeurIPS 2022)

We propose Constrained Earth Mover's Distance (CEMD) Imitation Q-learning, which combines the exploration power of Reinforcement Learning (RL) with the sample efficiency of Imitation Learning (IL). Sample efficiency makes Imitation Q-learning a suitable approach for robot learning. For Q-learning, immediate rewards can be efficiently computed by a greedy variant of Earth Mover's Distance (EMD) between observed state-action pairs and the state-action pairs in stored expert demonstrations. In CEMD, we constrain the otherwise non-stationary greedy EMD reward by proposing a greedy EMD upper bound estimate and a generic Q-learning lower bound. In PyBullet continuous control benchmarks, CEMD is more sample efficient, achieves higher performance, and yields less variance than its competitors.
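To make the reward computation concrete, the following is a minimal NumPy sketch of a greedy per-step EMD reward in the spirit of the abstract; it is an illustration under stated assumptions, not the authors' implementation. The function name greedy_emd_reward, the Euclidean ground metric over concatenated state-action vectors, and the mass bookkeeping are all hypothetical choices.

import numpy as np

def greedy_emd_reward(sa, expert_sa, expert_mass, step_mass):
    # One greedy transport step toward an EMD upper bound: move the
    # current step's probability mass onto the nearest expert state-action
    # pairs that still have mass left, accumulating the transport cost.
    # sa: 1-D array, the observed (state, action) pair, concatenated.
    # expert_sa: 2-D array, one stored expert (state, action) pair per row.
    # expert_mass: 1-D array of remaining mass per expert row (mutated in place).
    # step_mass: mass carried by this step, e.g. 1/T for an episode of length T.
    cost = 0.0
    remaining = step_mass
    dists = np.linalg.norm(expert_sa - sa, axis=1)  # Euclidean ground metric (an assumption)
    for j in np.argsort(dists):  # visit expert pairs from nearest to farthest
        if remaining <= 0.0:
            break
        moved = min(remaining, expert_mass[j])
        expert_mass[j] -= moved
        remaining -= moved
        cost += moved * dists[j]
    return -cost  # immediate reward: negative greedy transport cost

# Toy usage (hypothetical numbers): three expert pairs with uniform mass,
# episode length T = 10, so each agent step carries mass 1/10.
expert_sa = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
expert_mass = np.full(3, 1.0 / 3.0)
reward = greedy_emd_reward(np.array([0.4, 0.6]), expert_sa, expert_mass, 0.1)

Because expert mass is consumed in place across a trajectory, the same state-action pair can earn different rewards at different timesteps; this is the non-stationarity that CEMD's greedy EMD upper bound and Q-learning lower bound are meant to constrain.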
