Dec 6, 2021
We consider the best-of-both-worlds problem for learning an episodic Markov Decision Process through T episodes, with the goal of achieving 𝒪(√T) regret when the losses are adversarial and simultaneously 𝒪(log T) regret when the losses are (almost) stochastic. Recent work by [Jin and Luo, 2020] achieves this goal when the fixed transition is known, and leaves the case of unknown transition as a major open question. In this work, we resolve this open problem by using the same Follow-the-Regularized-Leader (FTRL) framework together with a set of new techniques. Specifically, we first propose a loss-shifting trick in the FTRL analysis, which greatly simplifies the approach of [Jin and Luo, 2020] and already improves their results for the known transition case. Then, we extend this idea to the unknown transition case and develop a novel analysis which upper bounds the transition estimation error by the regret itself in the stochastic setting, a key property to ensure 𝒪(log T) regret.
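The loss-shifting idea mentioned in the abstract can be illustrated in a much simpler setting than the paper's. The sketch below is a toy under stated assumptions: it runs FTRL on a probability simplex (not on MDP occupancy measures) with a negative-entropy regularizer, chosen here only because it has a closed form, and checks that adding the same constant to every action's cumulative loss leaves the FTRL iterate unchanged. The function name `ftrl_entropy`, the learning rate `eta`, and all numbers are hypothetical and are not taken from the paper.

```python
import math


def ftrl_entropy(cum_loss, eta):
    """FTRL on the probability simplex with a negative-entropy regularizer.

    Solves argmin_p <p, cum_loss> + (1/eta) * sum_i p_i * log(p_i),
    whose closed form is the softmax of -eta * cum_loss.
    (Illustrative assumption: the paper's regularizer and domain differ.)
    """
    m = min(cum_loss)  # subtracting the minimum is itself a harmless shift
    weights = [math.exp(-eta * (loss - m)) for loss in cum_loss]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical cumulative losses for three actions.
cum_loss = [3.0, 1.5, 2.0]
shift = 7.0  # hypothetical constant added to every action's loss

p = ftrl_entropy(cum_loss, eta=0.5)
p_shifted = ftrl_entropy([loss + shift for loss in cum_loss], eta=0.5)

# The two distributions agree up to floating-point error.
max_gap = max(abs(a - b) for a, b in zip(p, p_shifted))
print(p)
print(max_gap)
```

The invariance holds because the shift adds the same constant to the objective at every point of the simplex, so the minimizer does not move. The analysis in the paper presumably relies on a more general shift suited to MDPs, which this toy does not reproduce.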
Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia and oral and poster presentations of refereed papers. Following the conference, there are workshops which provide a less formal setting.