Analyzing the Sensitivity to Policy-Value Decoupling in Deep Reinforcement Learning Generalization

Dec 2, 2022


About

The existence of policy-value representation asymmetry negatively affects the generalization capability of the traditional actor-critic architecture, which uses a shared representation for policy and value. Fully separated policy and value networks avoid overfitting by addressing this representation asymmetry, but maintaining two separate networks introduces high computational overhead. Previous work has also shown that partial separation can achieve the same level of generalization in most tasks while reducing this overhead. Thus, the questions arise: do we really need two fully separate networks? Is there any particular scenario where only full separation works? In this work, we analyze generalization performance as a function of the extent of decoupling. We compare four degrees of subnetwork separation, namely fully shared, early separated, late separated, and fully separated, on the RL generalization benchmark Procgen, a suite of 16 procedurally generated environments. We show that unless the environment has a distinct or explicit source of value estimation, partial separation can readily capture the necessary policy-value representation asymmetry and achieve better generalization performance in unseen scenarios.
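To make the two extremes of the separation spectrum concrete, the sketch below contrasts a fully shared actor-critic (one trunk, two heads) with a fully separated one (independent policy and value networks) in PyTorch. It is a minimal illustration, not the authors' implementation; all module names, layer sizes, and dimensions are assumptions chosen for clarity, and the early/late partial-separation variants would split the trunk at intermediate layers.

```python
# Minimal sketch of the two extremes of policy-value separation.
# Not the authors' code; architecture sizes and names are illustrative only.
import torch
import torch.nn as nn


class SharedActorCritic(nn.Module):
    """Fully shared: policy and value heads read the same representation."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, n_actions)  # actor head
        self.value_head = nn.Linear(hidden, 1)           # critic head

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)                 # shared representation
        return self.policy_head(h), self.value_head(h)


class SeparatedActorCritic(nn.Module):
    """Fully separated: policy and value learn independent representations."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        def mlp(out_dim: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim),
            )
        self.policy_net = mlp(n_actions)    # actor network
        self.value_net = mlp(1)             # critic network

    def forward(self, obs: torch.Tensor):
        # No weight sharing: each network builds its own representation.
        return self.policy_net(obs), self.value_net(obs)


if __name__ == "__main__":
    obs = torch.randn(8, 64)                      # batch of dummy observations
    logits, value = SharedActorCritic(64, 15)(obs)
    print(logits.shape, value.shape)              # [8, 15] and [8, 1]
```

The trade-off the abstract describes follows directly from this structure: the separated variant roughly doubles the parameters and forward-pass cost, while the shared variant forces the actor and critic to compete for one representation.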

