July 24, 2023
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel, where an agent repeatedly interacts with a stochastic environment and seeks to reach a certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy. In this paper, we propose a new algorithm that eliminates these restrictive assumptions. Our algorithm is based on extended value iteration with a fine-grained variance-aware confidence set, where the variance is estimated recursively from high-order moments. Our algorithm achieves an 𝒪̃(dB_*√K) regret bound, where d is the dimension of the feature mapping in the linear transition kernel, B_* is the upper bound of the total cumulative cost for the optimal policy, and K is the number of episodes. Our regret upper bound matches the Ω(dB_*√K) lower bound for linear mixture SSPs up to logarithmic factors, which suggests that our algorithm is nearly minimax optimal.
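To make the setting concrete, here is a minimal sketch of a toy SSP whose transition kernel is a linear mixture of basis kernels, P(s'|s,a) = Σ_i θ_i · basis_i(s'|s,a). It is not the algorithm from the talk: it assumes the mixture weights θ are known and runs plain value iteration, whereas the proposed method works with a confidence set over an unknown θ. The function name `ssp_value_iteration`, the two-state instance, and all numbers are illustrative assumptions.

```python
import numpy as np

def ssp_value_iteration(theta, basis, cost, goal, tol=1e-10, max_iter=100_000):
    """Plain value iteration for an SSP with a KNOWN linear mixture kernel.

    theta: (d,) mixture weights; basis: (d, S, A, S) basis kernels;
    cost: (S, A) nonnegative costs; goal: index of the absorbing goal state.
    """
    # Linear mixture kernel: P[s, a, s'] = sum_i theta[i] * basis[i, s, a, s'].
    P = np.tensordot(theta, basis, axes=1)
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = cost + P @ V        # expected cost-to-go for each (state, action)
        Q[goal, :] = 0.0        # the goal is absorbing and cost-free
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new
    raise RuntimeError("value iteration did not converge")

# Hypothetical instance: state 0 is the start, state 1 is the goal, d = 2.
basis = np.zeros((2, 2, 2, 2))
basis[0, 0, 0] = [1.0, 0.0]   # basis kernel 0, action 0: stay in state 0
basis[1, 0, 0] = [0.0, 1.0]   # basis kernel 1, action 0: jump to the goal
basis[:, 0, 1] = [0.0, 1.0]   # action 1 always reaches the goal
basis[:, 1, :] = [0.0, 1.0]   # the goal state loops back to itself
theta = np.array([0.5, 0.5])
cost = np.array([[1.0, 3.0],  # state 0: cheap risky action vs. costly sure action
                 [0.0, 0.0]]) # goal: no cost

V, policy = ssp_value_iteration(theta, basis, cost, goal=1)
print(V, policy)
```

In this toy instance action 0 reaches the goal with probability 0.5 at cost 1 per step, so its expected total cost solves V = 1 + 0.5·V, giving 2, which beats the sure action's cost of 3. The largest optimal cost-to-go here plays the role that B_* plays in the regret bound.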