Regret Circuits: Composability of Regret Minimizers Regret minimization is a powerful tool for solving large-scale problems; it was recently used in breakthrough results for large-scale extensive-form game solving. This was achieved by composing simplex regret minimizers into an overall regret-minimization framework for extensive-form game strategy spaces. In this paper we study the general composability of regret minimizers. We derive a calculus for constructing regret minimizers for composite convex sets that are obtained from convexity-preserving operations on simpler convex sets. We show that local regret minimizers for the simpler sets can be combined with additional regret minimizers into an aggregate regret minimizer for the composite set. As one application, we show that the CFR framework can be constructed easily from our framework. We also show ways to include curtailing (constraining) operations into our framework. For one, they enables the construction of CFR generalization for extensive-form games with general convex strategy constraints that can cut across decision points. Game Theoretic Optimization via Gradient-based Nikaido-Isoda Function Computing the Nash equilibrium (NE) of multi-player games has witnessed renewed interest due to the recent advances in generative adversarial networks. For non-convex optimization formulations involving such games, a technique to analyze the quality of a solution is to resort to merit functions, among which the Nikaido-Isoda (NI) function is a suitable choice -- this function is positive and goes to zero only when each player is at their optima. However, computing equilibrium conditions efficiently is challenging. To this end, we introduce the Gradient-based Nikaido-Isoda (GNI) function which serves: (i) as a merit function, vanishing only at the first order stationary points of each player's optimization problem, and (ii) provides error bounds to a first-order NE. Specifically, for bilinear min-max games and multi-player quadratic games, a reformulation using the GNI function results in convex optimization objectives -- the application of gradient descent on such reformulations yield linear convergence to an NE. For the general case, we provide conditions under which the gradient descent provides sub-linear convergence. We show experiments on different multi-player game settings, demonstrating the effectiveness of our scheme against other popular game-theoretic optimization methods. Stable-Predictive Optimistic Counterfactual Regret Minimization The CFR framework has been a powerful tool for solving large-scale extensive-form games in practice. However, the theoretical rate at which past CFR-based algorithms converge to the Nash equilibrium is on the order of O(T−1/2), where T is the number of iterations. In contrast, first-order methods can be used to achieve a O(T−1) dependence on iterations, yet these methods have been less successful in practice. In this work we present the first CFR variant that breaks the square-root dependence on iterations. By combining and extending recent advances on predictive and stable regret minimizers for the matrix-game setting we show that it is possible to leverage optimistic'' regret minimizers to achieve a O(T−3/4) convergence rate within the CFR framework. When Samples Are Strategically Selected In standard classification problems, the assumption is that the entity making the decision (the {\em principal}) has access to {\em all} the samples. However, in many contexts, she either does not have direct access to the samples, or can inspect only a limited set of samples and does not know which are the most relevant ones. In such cases, she must rely on another party (the {\em agent}) to either provide the samples or point out the most relevant ones. If the agent has a different objective, then the principal cannot trust the submitted samples to be representative. She must set a {\em policy} for how she makes decisions, keeping in mind the agent's incentives. In this paper, we introduce a theoretical framework for this problem and provide key structural and computational results. Statistical Foundations of Virtual Democracy Virtual democracy is an approach to automating decisions, by learning models of the preferences of individual people, and, at runtime, aggregating the predicted preferences of those people on the dilemma at hand. One of the key questions is which aggregation method -- or voting rule -- to use; we offer a novel statistical viewpoint that provides guidance. Specifically, we seek voting rules that are robust to prediction errors, in that their output on people's true preferences is likely to coincide with their output on noisy estimates thereof. We prove that the classic Borda count rule is robust in this sense, whereas any voting rule belonging to the wide family of pairwise-majority consistent rules is not. Our empirical results further support, and more precisely measure, the robustness of Borda count. Optimal Auctions through Deep Learning Designing an incentive compatible auction that maximizes expected revenue is an intricate task. The single-item case was resolved in a seminal piece of work by Myerson in 1981. Even after 30-40 years of intense research the problem remains unsolved for seemingly simple multi-bidder, multi-item settings. In this work, we initiate the exploration of the use of tools from deep learning for the automated design of optimal auctions. We model an auction as a multi-layer neural network, frame optimal auction design as a constrained learning problem, and show how it can be solved using standard pipelines. We prove generalization bounds and present extensive experiments, recovering essentially all known analytical solutions for multi-item settings, and obtaining novel mechanisms for settings in which the optimal mechanism is unknown. Learning to Clear the Market The problem of market clearing is to set a price for an item such that quantity demanded equals quantity supplied. In this work, we cast the problem of predicting clearing prices into a learning framework and apply the resulting models to the problem of revenue optimization in auctions and markets with contextual information. The economic intuition behind market clearing allows us to obtain fine-grained control over the aggressiveness of the resulting pricing policy, grounded in theory. To evaluate our approach, we fit a model of clearing prices over a massive dataset of bids in display ad auctions from a major ad exchange. The learned prices outperform other techniques in the literature in terms of revenue and efficiency trade-offs. Because of the convex nature of the clearing loss function, our method has very fast convergence, as fast as linear regression over the same dataset. Learning to bid in revenue-maximizing auctions We consider the problem of the optimization of bidding strategies in prior-dependent revenue-maximizing auctions, when the seller fixes the reserve prices based on the bid distributions. Our study is done in the setting where one bidder is strategic. Using a variational approach, we study the complexity of the original objective and we introduce a relaxation of the objective functional in order to use gradient descent methods. Our approach is simple, general and can be applied to various value distributions and revenue-maximizing mechanisms. The new strategies we derive yield massive uplifts compared to the traditional truthfully bidding strategy. Open-ended learning in symmetric zero-sum games Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them winner' andloser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective -- we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSROrN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSROrN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives. Deep Counterfactual Regret Minimization Counterfactual Regret Minimization (CFR) is the leading algorithm for solving large imperfect-information games. It iteratively traverses the game tree in order to converge to a Nash equilibrium. In order to deal with extremely large games, CFR typically uses domain-specific heuristics to simplify the target game in a process known as abstraction. This simplified game is solved with tabular CFR, and its solution is mapped back to the full game. This paper introduces DeepRegret, a form of CFR that obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game. We show that DeepRegret is principled and achieves strong performance in large poker games. This is the first non-tabular variant of CFR to be successful in large games.

Watch SlidesLive on mobile devices

© SlidesLive Inc.