Dec 2, 2022
Guided exploration with expert demonstrations improves data efficiency for reinforcement learning, but current algorithms often overuse expert information. We propose a novel algorithm to speed up Q-learning with the help of a limited amount of imperfect expert demonstrations. The algorithm is based on a Bayesian framework that models suboptimal expert actions and derives update rules for the Q-values by maximizing the posterior probability. It weighs expert information by the uncertainty of the learnt Q-values, which avoids excessive reliance on expert data and gradually reduces the use of uninformative expert data. Experimentally, we evaluate our approach on a sparse-reward chain environment and six more complicated Atari games with delayed rewards. With the proposed method, we achieve better results than Deep Q-learning from Demonstrations (Hester et al., 2017) in most environments.
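To make the idea of uncertainty-weighted expert guidance concrete, here is a minimal tabular sketch on a sparse-reward chain. It is not the Bayesian update derived in the paper: as a stand-in, it assumes that uncertainty can be approximated by state visit counts and that expert advice enters only as an action-selection bonus that decays with those counts. The names `expert_weight`, `step`, `train`, and the parameters `beta`, `eps`, and `N_STATES` are all hypothetical and chosen for illustration.

```python
import random

N_STATES = 8          # chain length (assumed for illustration)
LEFT, RIGHT = 0, 1


def expert_weight(visits):
    """Heuristic trust in the expert: shrinks as a state is visited more often."""
    return 1.0 / (1.0 + visits)


def step(state, action):
    """Sparse-reward chain: reward 1 only when the last state is reached."""
    if action == RIGHT:
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(demo, episodes=300, alpha=0.5, gamma=0.9, eps=0.1, beta=1.0, seed=0):
    """Q-learning where demonstrated actions get a bonus that decays with visits.

    `demo` maps a state to the (possibly wrong) expert action for that state.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    visits = [0] * N_STATES
    for _ in range(episodes):
        s = 0
        for _ in range(4 * N_STATES):  # cap on steps per episode
            if rng.random() < eps:
                a = rng.choice((LEFT, RIGHT))  # plain epsilon exploration
            else:
                w = expert_weight(visits[s])
                scores = [
                    q[s][b] + (beta * w if demo.get(s) == b else 0.0)
                    for b in (LEFT, RIGHT)
                ]
                if scores[LEFT] == scores[RIGHT]:
                    a = rng.choice((LEFT, RIGHT))  # break ties at random
                else:
                    a = LEFT if scores[LEFT] > scores[RIGHT] else RIGHT
            visits[s] += 1
            s2, r, done = step(s, a)
            # Standard Q-learning target; no bootstrapping from the terminal state.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


if __name__ == "__main__":
    # A limited, imperfect demonstration: five states covered, one wrong action.
    demo = {0: RIGHT, 1: RIGHT, 2: RIGHT, 3: LEFT, 4: RIGHT}
    for state, row in enumerate(train(demo)):
        print(state, [round(v, 3) for v in row])
```

In this sketch the wrong demonstration at state 3 dominates early action choices there, but its bonus shrinks with every visit, so learnt Q-values can eventually override it. That mirrors, in a much cruder form, the abstract's point about gradually reducing reliance on uninformative expert data.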