# COLT 2016

Welcome to the 2016 edition of the Conference on Learning Theory:

Columbia University

New York City, USA

June 23-26, 2016

COLT is conveniently co-located with ICML, which will take place in New York immediately before, with two days of overlap between the two conferences.

1. ## 1.3.1 Simple Bayesian Algorithms for Best Arm Identification

Daniel Russo

ABSTRACT. This paper considers the optimal adaptive allocation of measurement effort in order to identify the best among a finite set of options or designs. An experimenter sequentially chooses designs to measure and observes noisy signals of their quality, with the goal of confidently identifying the best design after a small number of measurements.

We propose two simple Bayesian algorithms for adaptively allocating measurement effort. One, which we call Top-Two sampling, computes the two designs with the highest posterior probability of being optimal, and then randomizes to select among these two. The other algorithm we propose is a modified version of Thompson sampling that is tailored for identifying the best design.

We prove that these simple algorithms satisfy a strong optimality property. In a frequentist setting where the true quality of the designs is fixed, the posterior is said to be consistent if it correctly identifies the optimal design, in the sense that the posterior probability assigned to the event that some other design is optimal converges to zero as measurements are collected. We show that under the proposed algorithms this convergence occurs at an *exponential rate*, and the corresponding exponent is the best possible among all allocation rules.
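The top-two rule described above can be sketched in a few lines. The following is a minimal illustration, not the paper's exact algorithm: it assumes Bernoulli-reward designs with Beta(1,1) priors, and uses posterior sampling to identify the current leader and a "challenger" before randomizing between them (the tuning parameter `beta` is an assumption of this sketch).

```python
import random

def top_two_thompson_step(successes, failures, beta=0.5):
    """One allocation step of a top-two sampling sketch.

    Assumes Bernoulli designs with Beta(1,1) priors, so the posterior
    of design i is Beta(successes[i]+1, failures[i]+1).
    """
    n = len(successes)
    # Draw one sample per design from its posterior; the argmax is the leader.
    theta = [random.betavariate(successes[i] + 1, failures[i] + 1)
             for i in range(n)]
    leader = max(range(n), key=lambda i: theta[i])
    # With probability beta, measure the leader itself.
    if random.random() < beta:
        return leader
    # Otherwise resample until a different design looks best: the challenger.
    while True:
        theta = [random.betavariate(successes[i] + 1, failures[i] + 1)
                 for i in range(n)]
        challenger = max(range(n), key=lambda i: theta[i])
        if challenger != leader:
            return challenger
```

Resampling for the challenger is what forces measurement effort onto the two designs that are hardest to distinguish, which drives the exponential posterior convergence the abstract refers to.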

2. ## 1.3.2 Optimal Best Arm Identification with Fixed Confidence

Aurélien Garivier and Emilie Kaufmann

ABSTRACT. We provide a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the 'Track-and-Stop' strategy, which is proved to be asymptotically optimal. It consists of a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and a stopping rule named after Chernoff, for which we give a new analysis.
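The tracking idea in the sampling rule can be illustrated compactly. This is a simplified sketch only, assuming the target proportions are given (in the actual strategy they are re-estimated from data at each step): it draws the arm whose empirical allocation lags its target the most.

```python
def tracking_draw(counts, target_weights):
    """Sketch of proportion tracking: pull the arm whose draw count
    falls furthest below its target share of the (t+1) total draws.

    counts         -- number of times each arm has been pulled so far
    target_weights -- target sampling proportions (sum to 1), assumed known here
    """
    t = sum(counts) + 1
    deficits = [t * w - c for w, c in zip(target_weights, counts)]
    return max(range(len(counts)), key=lambda i: deficits[i])
```

For example, with targets (0.7, 0.3), the rule keeps pulling arm 0 until its share of draws catches up, then switches to arm 1.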

3. ## 1.3.3 Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem

Alexandra Carpentier and Andrea Locatelli

ABSTRACT. We consider the problem of *best arm identification* with a *fixed budget $T$*, in the $K$-armed stochastic bandit setting, with arm distributions supported on $[0,1]$. We prove that any bandit strategy, for at least one bandit problem characterized by a complexity $H$, will misidentify the best arm with probability lower bounded as $$\exp\Big(-\frac{T}{\log(K)H}\Big),$$ where $H$ is the sum over all sub-optimal arms of the inverse of the squared gaps. Our result formally disproves the general belief, coming from results in the fixed-confidence setting, that there must exist an algorithm for this problem whose probability of error is upper bounded as $\exp(-T/H)$. This also proves that some existing strategies based on the successive rejection of arms are optimal, thereby closing the current gap between upper and lower bounds for the fixed-budget best arm identification problem.

4. ## 1.3.4 Best-of-K Bandits

Max Simchowitz, Kevin Jamieson and Ben Recht

ABSTRACT. This paper studies the Best-of-K Bandit game: At each time the player chooses a subset S among all N-choose-K possible options and observes reward max(X(i) : i in S) where X is a random vector drawn from a joint distribution. The objective is to identify the subset that achieves the highest expected reward with high probability using as few queries as possible. We present distribution-dependent lower bounds which force a learner to consider all N-choose-K subsets and match naive extensions of known upper bounds in the bandit setting obtained by treating each subset as a separate arm. Nevertheless, we present evidence that exhaustive search may be avoided for certain favorable distributions, because the influence of higher-order correlations may be dominated by lower-order statistics. Finally, we present an algorithm and analysis for independent arms, which mitigates the surprising non-trivial information occlusion that occurs due to only observing the max in the subset. This may inform strategies for more general dependent measures, and we complement these results with independent-arm lower bounds.
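To make the objective concrete, here is the naive exhaustive baseline that the lower bounds are stated against, in the special case of independent Bernoulli arms where $\mathbb{E}[\max_{i \in S} X_i] = 1 - \prod_{i \in S}(1 - p_i)$ has a closed form. This is an assumption-laden illustration of the objective function, not the paper's algorithm.

```python
from itertools import combinations

def best_subset_expected_max(probs, K):
    """Exhaustive Best-of-K baseline for independent Bernoulli arms.

    probs -- success probability of each arm (independence assumed)
    K     -- subset size
    Returns the size-K subset maximizing E[max] = 1 - prod(1 - p_i).
    """
    def expected_max(S):
        q = 1.0
        for i in S:
            q *= 1.0 - probs[i]
        return 1.0 - q
    return max(combinations(range(len(probs)), K), key=expected_max)
```

Note that with dependence the expected max no longer factorizes, which is precisely why the general lower bounds force the learner to consider all N-choose-K subsets.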

5. ## 1.3.5 Pure Exploration of Multi-armed Bandit Under Matroid Constraints

Lijie Chen, Anupam Gupta and Jian Li

ABSTRACT. We study the pure exploration problem subject to a matroid constraint (Best-Basis) in a stochastic multi-armed bandit game. In a Best-Basis instance, we are given $n$ stochastic arms with unknown reward distributions, as well as a matroid $M$ over the arms. Let the weight of an arm be the mean of its reward distribution. Our goal is to identify a basis of $M$ with the maximum total weight, using as few samples as possible. The problem is a significant generalization of the best arm identification problem and the top-$k$ arm identification problem, which have attracted significant attention in recent years. We study both the exact and PAC versions of Best-Basis, and provide algorithms with nearly-optimal sample complexities for these versions. Our results generalize and/or improve on several previous results for the top-$k$ arm identification problem and the combinatorial pure exploration problem (when the combinatorial constraint is a matroid).
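The offline core of Best-Basis, assuming the arm weights were known exactly, is the classical greedy matroid algorithm: sort arms by weight and add each one that keeps the set independent. The sketch below shows that subroutine with a user-supplied independence oracle (the function names are ours); the sampling problem studied in the paper is about certifying this basis when the weights must be estimated.

```python
def greedy_max_weight_basis(weights, is_independent):
    """Greedy max-weight basis of a matroid (weights assumed known).

    weights        -- (empirical) mean reward of each arm
    is_independent -- oracle: is_independent(list_of_arms) -> bool
    """
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    basis = []
    for i in order:
        # Matroid exchange property guarantees greedy is optimal here.
        if is_independent(basis + [i]):
            basis.append(i)
    return basis
```

With the uniform matroid of rank $k$ (any set of at most $k$ arms is independent), this reduces exactly to top-$k$ arm selection, matching the generalization claimed in the abstract.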

6. ## 1.4.1 Maximin Action Identification: A New Bandit Framework for Games

Aurélien Garivier, Emilie Kaufmann and Wouter M. Koolen

ABSTRACT. We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists of identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower and upper confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.
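The quantity both strategies try to identify can be stated in one line: given a matrix of (estimated) payoffs for the player's actions against the opponent's responses, the maximin action maximizes the worst-case row value. A minimal sketch, assuming the estimates `mu_hat[i][j]` are already in hand:

```python
def maximin_action(mu_hat):
    """Index of the maximin action: argmax over rows i of min over
    columns j of mu_hat[i][j] (player picks rows, opponent columns)."""
    return max(range(len(mu_hat)), key=lambda i: min(mu_hat[i]))
```

The bandit difficulty is that each entry `mu_hat[i][j]` is only observed through noisy samples of the chosen action pair, which is what the confidence-bound (Maximin-LUCB) and elimination (Maximin-Racing) strategies address.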