Policy Aware Model Learning via Transition Occupancy Matching

Dec 2, 2022

About

Model-based reinforcement learning (MBRL) is an effective paradigm for sample-efficient policy learning. The predominant MBRL strategy iteratively learns the dynamics model by performing maximum likelihood estimation (MLE) on the entire replay buffer and trains the policy using fictitious transitions from the learned model. Given that not all transitions in the replay buffer are equally informative about the task or the policy's current progress, this MLE strategy cannot be optimal and bears no clear relation to the standard RL objective. In this work, we propose Transition Occupancy Matching (TOM), a policy-aware model learning algorithm that maximizes a lower bound on the standard RL objective. TOM learns a policy-aware dynamics model by minimizing an f-divergence between the distribution of transitions that the current policy visits in the real environment and in the learned model; then, the policy can be updated using any pre-existing RL algorithm with log-transformed reward. TOM's practical implementation builds on tools from dual reinforcement learning and learns the optimal transition occupancy ratio between the current policy and the replay buffer; leveraging this ratio as importance weights, TOM amounts to performing MLE model learning on the correct, policy-aware transition distribution. Crucially, TOM is a model learning sub-routine and is compatible with any backbone MBRL algorithm that implements MLE-based model learning. On the standard set of MuJoCo locomotion tasks, we find TOM improves the learning speed of a standard MBRL algorithm and can reach the same asymptotic performance with as much as 50
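To make the idea of importance-weighted MLE model learning concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes the per-transition weights are already available (in TOM they would come from the learned transition occupancy ratio, which is not computed here), and it swaps in a linear-Gaussian dynamics model with fixed variance, for which weighted MLE reduces to weighted least squares. The function name `weighted_mle_linear_dynamics` and all of its parameters are hypothetical.

```python
import numpy as np


def weighted_mle_linear_dynamics(states, actions, next_states, weights):
    """Fit s' ~= [s, a, 1] @ W by importance-weighted maximum likelihood.

    Assumption: a Gaussian dynamics model with fixed isotropic variance, so
    maximizing the weighted log-likelihood equals weighted least squares.
    `weights` stands in for a learned occupancy ratio between the current
    policy and the replay buffer; here it is simply an input array.
    """
    n = len(states)
    # Design matrix with a bias column: one row per replay-buffer transition.
    X = np.hstack([states, actions, np.ones((n, 1))])

    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # self-normalize the importance weights

    # Scaling rows by sqrt(w) turns weighted least squares into ordinary
    # least squares on the rescaled problem.
    sqrt_w = np.sqrt(w)[:, None]
    W, *_ = np.linalg.lstsq(sqrt_w * X, sqrt_w * next_states, rcond=None)
    return W
```

With uniform weights this recovers plain MLE on the whole buffer; non-uniform weights shift the fit toward the transitions that the weights mark as relevant to the current policy.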

Interested in talks like this? Follow NeurIPS 2022