
1.3.1 Simple Bayesian Algorithms for Best Arm Identification
Daniel Russo
ABSTRACT. This paper considers the optimal adaptive allocation of measurement effort in order to identify the best among a finite set of options or designs. An experimenter sequentially chooses designs to measure and observes noisy signals of their quality, with the goal of confidently identifying the best design after a small number of measurements.
We propose two simple Bayesian algorithms for adaptively allocating measurement effort. One, which we call Top-Two sampling, computes the two designs with the highest posterior probability of being optimal, and then randomizes to select between these two. The other algorithm we propose is a modified version of Thompson sampling that is tailored to identifying the best design.
We prove that these simple algorithms satisfy a strong optimality property. In a frequentist setting where the true quality of the designs is fixed, the posterior is said to be consistent if it correctly identifies the optimal design, in the sense that the posterior probability assigned to the event that some other design is optimal converges to zero as measurements are collected. We show that under the proposed algorithms this convergence occurs at an \emph{exponential rate}, and the corresponding exponent is the best possible among all allocation rules.
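The modified Thompson sampling idea above can be sketched for Bernoulli designs with Beta posteriors. This is an illustrative assumption on my part (function name, the uniform Beta(1,1) prior, and the parameter name `beta` are not from the abstract): sample a posterior-best design, then with some probability resample until a different design, the "challenger", looks best.

```python
import random

def top_two_thompson_sample(successes, failures, beta=0.5):
    """One round of a top-two Thompson sampling sketch.

    successes/failures: per-design Beta posterior counts
    (Beta(1,1) prior assumed).
    beta: probability of measuring the sampled-best design
    rather than the challenger.
    Returns the index of the design to measure next.
    """
    k = len(successes)
    # Draw a plausible mean for each design from its posterior.
    theta = [random.betavariate(successes[i] + 1, failures[i] + 1)
             for i in range(k)]
    best = max(range(k), key=lambda i: theta[i])
    if random.random() < beta:
        return best
    # Otherwise resample until a *different* design looks best:
    # this challenger is the second candidate the algorithm probes.
    while True:
        theta = [random.betavariate(successes[i] + 1, failures[i] + 1)
                 for i in range(k)]
        challenger = max(range(k), key=lambda i: theta[i])
        if challenger != best:
            return challenger
```

Randomizing between the leader and a challenger is what forces continued measurement of close competitors instead of locking onto one design too early.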

1.3.2 Optimal Best Arm Identification with Fixed Confidence
Aurélien Garivier and Emilie Kaufmann
ABSTRACT. We provide a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the `Track-and-Stop' strategy, which is proved to be asymptotically optimal. It consists of a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and a stopping rule named after Chernoff, for which we give a new analysis.
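The tracking part of the sampling rule can be sketched as follows. Here `w_star` stands for the optimal draw proportions from the lower bound, which in the paper come from solving an optimization problem and are assumed given; the square-root forced-exploration threshold is a common choice but the exact form here is my assumption, not a quote from the paper.

```python
def tracking_rule(counts, w_star):
    """Pick the next arm to draw (sketch of a tracking sampling rule).

    counts: number of draws of each arm so far.
    w_star: target proportions of arm draws (assumed given).
    Forced exploration keeps every count above a slowly growing
    threshold; otherwise draw the arm furthest below its target share.
    """
    t = sum(counts)
    k = len(counts)
    threshold = max(t, 1) ** 0.5 - k / 2  # forced-exploration level
    under = [i for i in range(k) if counts[i] < threshold]
    if under:
        # Undersampled arms are drawn first, least-sampled one first.
        return min(under, key=lambda i: counts[i])
    # Track the target: arm whose draw count lags its share the most.
    return max(range(k), key=lambda i: (t + 1) * w_star[i] - counts[i])
```

Forced exploration guarantees every arm keeps being sampled, so the empirical means (and hence the estimated optimal proportions) stay consistent.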

1.3.3 Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem
Alexandra Carpentier and Andrea Locatelli
ABSTRACT. We consider the problem of \textit{best arm identification} with a \textit{fixed budget $T$}, in the $K$-armed stochastic bandit setting, with arm distributions defined on $[0,1]$. We prove that any bandit strategy, for at least one bandit problem characterized by a complexity $H$, will misidentify the best arm with probability lower bounded as $$\exp\Big(-\frac{T}{\log(K)H}\Big),$$ where $H$ is the sum over all suboptimal arms of the inverse of the squared gaps. Our result formally disproves the general belief (coming from results in the fixed-confidence setting) that there must exist an algorithm for this problem whose probability of error is upper bounded as $\exp(-T/H)$. This also proves that some existing strategies based on the Successive Rejection of the arms are optimal, therefore closing the current gap between upper and lower bounds for the fixed-budget best arm identification problem.
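The complexity $H$ appearing in the bound is directly computable from the arm means; a minimal helper (the function name is mine) makes the definition concrete:

```python
def complexity_H(means):
    """Hardness measure H from the abstract: the sum, over all
    suboptimal arms, of the inverse squared gap to the best mean."""
    best = max(means)
    return sum(1.0 / (best - m) ** 2 for m in means if m < best)
```

For example, arms with means 0.9, 0.8, 0.4 have gaps 0.1 and 0.5, giving $H = 1/0.01 + 1/0.25 = 104$: the near-optimal arm dominates the hardness.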

1.3.4 Best-of-K Bandits
Max Simchowitz, Kevin Jamieson and Ben Recht
ABSTRACT. This paper studies the Best-of-K Bandit game: at each time the player chooses a subset S among all N-choose-K possible options and observes reward max(X(i) : i in S), where X is a random vector drawn from a joint distribution. The objective is to identify the subset that achieves the highest expected reward with high probability using as few queries as possible. We present distribution-dependent lower bounds which force a learner to consider all N-choose-K subsets and match naive extensions of known upper bounds in the bandit setting obtained by treating each subset as a separate arm. Nevertheless, we present evidence that exhaustive search may be avoided for certain, favorable distributions because the influence of higher-order correlations may be dominated by lower order statistics. Finally, we present an algorithm and analysis for independent arms, which mitigates the surprising nontrivial information occlusion that occurs due to only observing the max in the subset. This may inform strategies for more general dependent measures, and we complement these results with independent-arm lower bounds.
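In the independent-Bernoulli special case the objective has a closed form, which illustrates why the problem simplifies there: the expected max-reward of a subset S is 1 - prod(1 - p_i), which is maximized by just taking the K largest means. The sketch below (function names are mine) enumerates subsets to make the objective explicit, even though the argmax is the greedy choice in this case.

```python
from itertools import combinations

def _prod(xs):
    out = 1.0
    for x in xs:
        out *= x
    return out

def best_subset_independent(p, K):
    """Best size-K subset for *independent* Bernoulli arms with known
    means p. Expected max-reward of S is 1 - prod_{i in S}(1 - p[i]),
    so brute-force enumeration and picking the K largest means agree."""
    best = max(combinations(range(len(p)), K),
               key=lambda S: 1.0 - _prod(1.0 - p[i] for i in S))
    return tuple(sorted(best))
```

With correlated arms no such decomposition holds, which is exactly why the lower bounds can force consideration of all N-choose-K subsets.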

1.3.5 Pure Exploration of Multi-armed Bandit Under Matroid Constraints
Lijie Chen, Anupam Gupta and Jian Li
ABSTRACT. We study the pure exploration problem subject to a matroid constraint (Best-Basis) in a stochastic multi-armed bandit game. In a Best-Basis instance, we are given $n$ stochastic arms with unknown reward distributions, as well as a matroid $M$ over the arms. Let the weight of an arm be the mean of its reward distribution. Our goal is to identify a basis of $M$ with the maximum total weight, using as few samples as possible. The problem is a significant generalization of the best arm identification problem and the top-$k$ arm identification problem, which have attracted significant attention in recent years. We study both the exact and PAC versions of Best-Basis, and provide algorithms with nearly-optimal sample complexities for these versions. Our results generalize and/or improve on several previous results for the top-$k$ arm identification problem and the combinatorial pure exploration problem (when the combinatorial constraint is a matroid).

1.4.1 Maximin Action Identification: A New Bandit Framework for Games
Aurélien Garivier, Emilie Kaufmann and Wouter M. Koolen
ABSTRACT. We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists of identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower and upper confidence bounds; and Maximin-Racing, which operates by successively eliminating the suboptimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.
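The quantity being identified can be stated in one line: with mean payoffs mu[a][b] for the player's action a against the opponent's reply b, the target is the action whose worst-case mean is largest. A minimal sketch with known means (the bandit algorithms estimate this from noisy samples of chosen pairs):

```python
def maximin_action(mu):
    """Maximin action for a known payoff matrix mu[a][b]: the row
    whose minimum over the opponent's columns is largest.
    This is the quantity Maximin-LUCB / Maximin-Racing identify
    from samples."""
    return max(range(len(mu)), key=lambda a: min(mu[a]))
```

For instance, with rows [0.2, 0.8] and [0.5, 0.6], the second action is maximin-optimal: its worst case 0.5 beats the first action's worst case 0.2, even though the first action has the higher best case.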