Nov 28, 2022
When an Agent visits a platform recommending a menu of content to select from, their choice of item depends not only on immutable preferences, but also on their prior engagements with the platform. The Recommender's primary objective is typically to encourage content consumption which optimizes some reward, such as ad revenue, but they often additionally aim to ensure that a sufficiently wide variety of content is consumed by the Agent over time. We formalize this problem as an adversarial bandit task. At each step, the Recommender presents a menu of k (out of n) items to the Agent, who selects one item in the menu according to their unknown preference model, which maps their history of past items to relative selection probabilities. The Recommender then observes the Agent's selected item and receives bandit feedback of the item's (adversarial) reward. In addition to optimizing reward from the selected items at each step, the Recommender must also ensure that the total distribution of chosen items has sufficiently high entropy. We define a class of preference models which are locally learnable, i.e. behavior over the entire domain can be estimated by only observing behavior in a small region; this includes models representable by bounded-degree polynomials as well as functions with a sparse Fourier basis. For this class, we give an algorithm for the Recommender which obtains O(T^{3/4}) regret against all item distributions satisfying two conditions: they are sufficiently diversified, and they are instantaneously realizable at any history by some distribution over menus. We show that these conditions are closely connected: all sufficiently high-entropy distributions are instantaneously realizable at any history of selected items. We also give a set of negative results justifying our assumptions, in the form of a runtime lower bound for non-local learning and linear regret lower bounds for alternate benchmarks.
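To make the interaction protocol above concrete, here is a minimal simulation sketch. It is not the algorithm from the talk; the toy preference model, the uniformly random menu choice, and names like `agent_choice` are hypothetical placeholders. It only illustrates the loop: the Recommender shows k of n items, the Agent samples one item from a history-dependent preference model, and the Recommender observes bandit feedback (the chosen item's reward) while tracking the entropy of the chosen-item distribution.

```python
# Illustrative sketch of the menu-based interaction loop (hypothetical names,
# not the paper's algorithm): the Recommender shows k of n items, the Agent
# picks one according to history-dependent preferences, and only the chosen
# item's (adversarial) reward is observed.
import numpy as np

rng = np.random.default_rng(0)
n, k, T = 10, 3, 1000          # number of items, menu size, horizon

def agent_choice(menu, history):
    """Toy preference model: an item's selection weight grows with how often
    it was chosen before (history-dependent, unknown to the Recommender)."""
    counts = np.bincount(history, minlength=n) if history else np.zeros(n)
    weights = 1.0 + counts[menu]
    return rng.choice(menu, p=weights / weights.sum())

history, rewards = [], []
chosen_counts = np.zeros(n)

for t in range(T):
    adversarial_reward = rng.random(n)           # stand-in for adversarial rewards
    menu = rng.choice(n, size=k, replace=False)  # here: a uniformly random menu
    item = agent_choice(menu, history)
    rewards.append(adversarial_reward[item])     # bandit feedback: chosen item only
    history.append(item)
    chosen_counts[item] += 1

# Diversity objective: entropy of the empirical distribution of chosen items.
p = chosen_counts / T
entropy = -(p[p > 0] * np.log(p[p > 0])).sum()
print(f"total reward: {sum(rewards):.1f}, choice entropy: {entropy:.3f} (max {np.log(n):.3f})")
```

In the setting of the talk, the Recommender would choose menus adaptively, after locally learning the preference model, to trade off cumulative reward against the entropy constraint, rather than drawing menus uniformly at random as in this sketch.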
Presentations on a similar topic, category or speaker
Linfeng Zhao, …
Ofer Yehuda, …
Alex Hanna, …
Ben Dixon, …