Nov 28, 2022
While representation learning has become a powerful technique for reducing sample complexity in reinforcement learning (RL) in practice, theoretical understanding of its advantage remains limited. In this paper, we theoretically characterize the benefit of representation learning under the low-rank Markov decision process (MDP) model. We first study multitask low-rank RL (as upstream training), where all tasks share a common representation, and propose a new multitask reward-free algorithm called REFUEL. REFUEL learns both the transition kernel and a near-optimal policy for each task, and outputs a well-learned representation for downstream tasks. Our result demonstrates that multitask representation learning is provably more sample-efficient than learning each task individually, as long as the total number of tasks is above a certain threshold. We then study downstream offline RL, where the agent is given a new task sharing the same representation as the upstream tasks together with an offline dataset, and aims to find a near-optimal policy. We develop a sample-efficient algorithm whose suboptimality gap is bounded by the estimation error of the representation learned upstream plus a vanishing term that decreases as the number of offline samples grows. Our result further captures the benefit of employing the learned representation from upstream training as opposed to learning the representation of the low-rank model directly. To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask RL.
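For readers unfamiliar with the low-rank MDP model the abstract refers to: it assumes the transition kernel factorizes as P(s' | s, a) = ⟨φ(s, a), μ(s')⟩ for a d-dimensional feature map φ and measure μ, and it is φ that the shared representation targets. Below is a minimal NumPy sketch of such a factorization on a toy tabular MDP; the sizes and the mixture-style construction are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 6, 3, 2  # toy state/action/rank sizes (assumptions for illustration)

# mu: each of the d latent factors is a distribution over next states
mu = rng.random((d, S))
mu /= mu.sum(axis=1, keepdims=True)

# phi: each (s, a) pair mixes the d factors with simplex weights,
# so the resulting kernel is automatically a valid distribution
phi = rng.random((S * A, d))
phi /= phi.sum(axis=1, keepdims=True)

# low-rank transition kernel: P[(s, a), s'] = <phi(s, a), mu(s')>
P = phi @ mu
assert np.allclose(P.sum(axis=1), 1.0)  # each row sums to one
assert np.linalg.matrix_rank(P) <= d    # kernel has rank at most d
```

In the multitask setting studied in the paper, each task would have its own μ-like component while φ is shared across tasks, which is what makes learning φ upstream useful downstream.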