Nov 28, 2022
A central problem in online learning and decision making, from bandits to reinforcement learning, is to understand what modeling assumptions lead to sample-efficient learning guarantees. With a focus on stochastic environments, a recent line of research provides general structural conditions under which sample-efficient learning is possible, but robust learning guarantees for agnostic or adversarial settings have remained elusive. We consider a general adversarial decision-making framework that encompasses (structured) bandit problems with adversarial rewards and reinforcement learning problems with adversarial dynamics. Our main result is to show, via new upper and lower bounds, that the Decision-Estimation Coefficient, a complexity measure introduced by Foster et al. (2021) in the stochastic counterpart to our setting, is both necessary and sufficient for low regret in the adversarial setting. However, compared to the stochastic setting, one must apply the Decision-Estimation Coefficient to the convex hull of the class of models (or hypotheses) under consideration. This establishes that the price of accommodating adversarial rewards or dynamics is governed by the behavior of the model class under convexification, and recovers a number of existing results, both positive and negative. En route to obtaining these guarantees, we provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures, including the Information Ratio of Russo and Van Roy and the Exploration-by-Optimization objective of Lattimore and György.
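For readers unfamiliar with the Decision-Estimation Coefficient, the following recalls its usual form from Foster et al. (2021). The notation is an assumption brought in from that paper and does not appear on this page: $\Pi$ is the decision space, $\mathcal{M}$ the model class, $\bar{M}$ a reference model, $f^{M}(\pi)$ the expected reward of decision $\pi$ under model $M$, $\pi_{M}$ an optimal decision for $M$, $D_{\mathrm{H}}$ the Hellinger distance, and $\gamma > 0$ a scale parameter.

```latex
\[
  \mathrm{dec}_{\gamma}(\mathcal{M}, \bar{M})
  = \inf_{p \in \Delta(\Pi)} \, \sup_{M \in \mathcal{M}} \,
    \mathbb{E}_{\pi \sim p}\!\left[
      f^{M}(\pi_{M}) - f^{M}(\pi)
      - \gamma \cdot D_{\mathrm{H}}^{2}\!\big(M(\pi), \bar{M}(\pi)\big)
    \right]
\]
```

In the adversarial setting described in the abstract, the same quantity would be evaluated with the convex hull $\mathrm{co}(\mathcal{M})$ in place of $\mathcal{M}$; the precise statement and constants should be taken from the paper itself.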