July 24, 2023
Aiming to produce reinforcement learning (RL) policies that are human-interpretable and can generalize better to novel scenarios, Trivedi et al. (2021) present a method (LEAPS) that first learns a program embedding space to continuously parameterize diverse programs from a pre-generated program dataset, and then searches this learned embedding space for a task-solving program when given a task. Despite encouraging results, the program policies that LEAPS can produce are limited by the distribution of the pre-generated program dataset. Furthermore, during the search, LEAPS evaluates each candidate program solely on its return, which fails to precisely reward the correct parts of a program and penalize the incorrect parts. To address these issues, we propose to learn a meta-policy that composes programs sampled from the learned program embedding space. By composing programs, our proposed method can produce program policies that describe out-of-distributionally complex behaviors and directly assign credit to the programs that induce desired behaviors. Experimental results in the Karel domain show that our proposed framework outperforms baselines. Ablation studies justify our design choices, including the RL algorithm used to learn the meta-policy and the dimensionality of the program embedding space.
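The key idea above can be illustrated with a minimal sketch: instead of searching for a single latent vector whose decoded program must solve the whole task, a meta-policy emits a sequence of latent vectors, each decoded into a sub-program that is executed and credited individually. Everything here (the toy environment, `decode_program`, `MetaPolicy`) is a hypothetical illustration, not the paper's actual implementation or API.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode_program(z):
    """Toy stand-in for a learned decoder: map a latent vector to a program
    (here, a constant-action program determined by the latent's sign)."""
    return lambda state: float(np.sign(z.sum()))

def run_program(program, state, horizon=5):
    """Execute a decoded program in a toy 1-D environment; reward is higher
    the closer the final state is to the origin."""
    for _ in range(horizon):
        state += program(state)
    return state, -abs(state)

class MetaPolicy:
    """Hypothetical meta-policy: samples latent vectors from a Gaussian.
    In practice this would be a trained RL policy over the embedding space."""
    def __init__(self, dim=4):
        self.mean = np.zeros(dim)

    def act(self):
        return self.mean + rng.normal(size=self.mean.shape)

meta = MetaPolicy()
state = 3.0
credits = []  # one credit signal per composed sub-program
for step in range(3):
    z = meta.act()                    # sample a latent vector
    prog = decode_program(z)          # decode it into a sub-program
    state, r = run_program(prog, state)
    credits.append(r)                 # credit this sub-program directly,
                                      # rather than one return for the whole rollout
```

The contrast with latent-space search is in the `credits` list: each decoded sub-program receives its own reward, so a learning signal can favor useful sub-programs even when the overall episode fails.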