Dec 6, 2021
We study reinforcement learning (RL) with linear function approximation under the adaptivity constraint. We consider two popular limited adaptivity models, the batch learning model and the rare policy switch model, and propose two efficient online RL algorithms for episodic linear Markov decision processes, where the transition probability and the reward function can be represented as a linear function of some known feature mapping. Specifically, for the batch learning model, our proposed LSVI-UCB-Batch algorithm achieves an Õ(√(d^3H^3T) + dHT/B) regret, where d is the dimension of the feature mapping, H is the episode length, T is the number of interactions and B is the number of batches. Our result suggests that it suffices to use only √(T/(dH)) batches to obtain Õ(√(d^3H^3T)) regret. For the rare policy switch model, our proposed LSVI-UCB-RareSwitch algorithm enjoys an Õ(√(d^3H^3T[1+T/(dH)]^(dH/B))) regret, where B now denotes the number of policy switches; this implies that dH log T policy switches suffice to obtain the Õ(√(d^3H^3T)) regret. Our algorithms achieve the same regret as the LSVI-UCB algorithm <cit.>, yet with a substantially smaller amount of adaptivity. We also establish a lower bound for the batch learning model, which suggests that the dependency on B in our regret bound is tight.
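The two corollaries in the abstract follow from simple arithmetic on the stated bounds. The Python sketch below is only an illustration under explicit assumptions: it drops the constants and logarithmic factors hidden in Õ, uses the natural logarithm for "log T", and picks the example values d = 10, H = 20, T = 10^6 arbitrarily. It checks that √(T/(dH)) batches make the dHT/B term equal to √(d^3H^3T), and that dH log T policy switches keep the factor [1+T/(dH)]^(dH/B) at a small constant (about 1.85 for these values).

```python
import math

# Illustrative only: constants and log factors hidden in O-tilde are dropped.

def batch_regret_terms(d, H, T, B):
    """Return the two leading terms of the batch-model bound sqrt(d^3 H^3 T) + dHT/B."""
    statistical = math.sqrt(d**3 * H**3 * T)
    batching = d * H * T / B
    return statistical, batching

def rare_switch_regret(d, H, T, B):
    """Return sqrt(d^3 H^3 T [1 + T/(dH)]^(dH/B)) and the inflation factor inside it."""
    inflation = (1 + T / (d * H)) ** (d * H / B)
    return math.sqrt(d**3 * H**3 * T * inflation), inflation

# Example values (assumed for illustration, not taken from the paper).
d, H, T = 10, 20, 10**6

# sqrt(T/(dH)) batches; a real schedule would round this up to an integer.
B_batch = math.sqrt(T / (d * H))
stat, batch = batch_regret_terms(d, H, T, B_batch)
print(stat, batch)  # the two terms coincide, so the bound stays O(sqrt(d^3 H^3 T))

# dH log T policy switches, with log taken as the natural logarithm (assumption).
B_switch = d * H * math.log(T)
_, inflation = rare_switch_regret(d, H, T, B_switch)
print(inflation)  # a small constant, so the regret is O(sqrt(d^3 H^3 T)) up to constants
```

Substituting B = √(T/(dH)) gives dHT/B = (dH)^(3/2)√T = √(d^3H^3T) exactly, which the first printed pair reflects up to floating-point rounding.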
Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia and oral and poster presentations of refereed papers. Following the conference, there are workshops which provide a less formal setting.
Presentations with a similar topic, category, or speaker
Trent Kyono, …
Dan Jarrett, …
Tanya Roosta, …
Yiheng Lin, …