Dec 6, 2021
In this paper we consider running N instances of the adversarial bandit problem in parallel. Specifically, the learning process happens over a sequence of T trials: on each trial t Learner is given a user u_t, out of a set of N users, and has to select an action a_t, out of a set of K actions. After the selection of a_t its loss l_{a_t,t} is revealed to Learner. The goal of Learner is to minimise the sum of the losses l_{a_t,t} across all trials t. The bandit model is adversarial in the sense that Nature (i.e. Learner's adversary) is free to choose, at the start of the learning process, the user u_t of each trial t and the loss l_{a,t} ∈ [0,1] of each action a at each trial t. In our model the users are in a social network and Learner is aided by a priori knowledge of the strengths of the social links between all pairs of users. It is assumed that if the social link between two users is strong then an action that is particularly good (that is: has low losses) for one of the users is likely to be good for the other. However, this need not be the case, as Nature has complete freedom over its choice of losses. We measure the performance of Learner by the notion of regret. In this multi-user setting the regret is measured relative to an arbitrary function y which maps users to actions: the regret is the difference between the total loss of Learner and the total loss it would incur by choosing action y(u_t) on each trial t. The naturalness of such a function y is measured by its robustified resistance-weighted cutsize Ψ, which is essentially the resistance-weighted cutsize of y in the social network, made tolerant to noise. Our learning strategies are biased towards more natural functions in that lower values of Ψ lead to lower regret bounds.
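The interaction protocol and the regret notion described above can be sketched in a few lines of Python. The names `ToyNature`, `run_protocol`, and `regret` are hypothetical, used only for illustration; the sketch assumes an oblivious adversary that fixes all users and losses in advance, as in the abstract:

```python
class ToyNature:
    """Oblivious adversary: users u_t and losses l_{a,t} in [0,1] are
    fixed before learning starts (hypothetical illustration)."""
    def __init__(self, users, losses):
        self.users = users      # users[t] = u_t for each trial t
        self.losses = losses    # losses[t][a] = l_{a,t}

    def user(self, t):
        return self.users[t]

    def loss(self, a, t):
        return self.losses[t][a]


def run_protocol(nature, policy, T):
    """One pass of the multi-user adversarial bandit protocol: on trial t
    Nature reveals u_t, Learner picks a_t, and only l_{a_t,t} is observed."""
    history, total = [], 0.0
    for t in range(T):
        u_t = nature.user(t)
        a_t = policy(u_t, history)      # Learner sees only past bandit feedback
        loss = nature.loss(a_t, t)
        total += loss
        history.append((u_t, a_t, loss))
    return total, history


def regret(nature, learner_loss, history, y):
    """Regret of Learner against a fixed user->action mapping y."""
    comparator = sum(nature.loss(y[u], t) for t, (u, _, _) in enumerate(history))
    return learner_loss - comparator
```

For example, with 2 users and 2 actions, a Learner that always plays action 0 is compared against the best fixed mapping y; a mapping with small robustified resistance-weighted cutsize Ψ would assign similar actions to strongly linked users.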
We give two very efficient randomized algorithms, GABA1 and GABA2, for Learner: GABA1 has an expected regret bound of 𝒪(√(ln(NK/Ψ)Ψ KT)) and a per-trial time complexity of 𝒪(K ln(N)), whilst GABA2 has a weaker expected regret bound of 𝒪(√(ln(N/Ψ)ln(NK/Ψ)Ψ KT)) but is faster, with a per-trial time complexity of 𝒪(ln(K) ln(N)).
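The abstract does not spell out GABA1 or GABA2, so they are not reproduced here. As a hedged point of comparison, the naive baseline for this setting runs one independent copy of the standard Exp3 algorithm per user, ignoring the social network entirely; it costs 𝒪(K) time per trial and its regret carries no dependence on the cutsize Ψ (roughly 𝒪(√(NKT ln K)) overall, since each user's instance learns from scratch):

```python
import math
import random


class Exp3:
    """Standard Exp3 with losses in [0, 1] and importance-weighted loss
    estimates. This is NOT the GABA1/GABA2 method of the paper; it is the
    classical single-user baseline, shown only for contrast."""
    def __init__(self, K, eta, seed=0):
        self.K, self.eta = K, eta
        self.w = [1.0] * K              # one exponential weight per action
        self.rng = random.Random(seed)

    def probs(self):
        W = sum(self.w)
        return [w / W for w in self.w]

    def act(self):
        return self.rng.choices(range(self.K), weights=self.probs())[0]

    def update(self, a, loss):
        # importance weighting makes the estimate unbiased: E[est] = l_{a,t}
        est = loss / self.probs()[a]
        self.w[a] *= math.exp(-self.eta * est)


def per_user_exp3(users, losses, K, eta):
    """Naive multi-user learner: an independent Exp3 instance per user,
    created lazily, with no sharing of information across social links."""
    banks, total = {}, 0.0
    for t, u in enumerate(users):
        bank = banks.setdefault(u, Exp3(K, eta, seed=u))
        a = bank.act()
        bank.update(a, losses[t][a])
        total += losses[t][a]
    return total, banks
```

The point of GABA1 and GABA2, by contrast, is to exploit the link strengths so that the regret scales with the cutsize Ψ of the comparator rather than with the number of users N alone.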
Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia and oral and poster presentations of refereed papers. Following the conference, there are workshops which provide a less formal setting.