Optimal algorithm for solving n-armed bandit problems?

Question

I've read about a number of algorithms for solving n-armed bandit problems like $\epsilon$-greedy, softmax, and UCB1, but I'm having some trouble sorting through what approach is best for minimizing regret.

Is there a known optimal algorithm for solving the n-armed bandit problem? Is there a choice of algorithm that seems to perform best in practice?

Presumably there is not a recognised optimal solution, as otherwise the [Wikipedia page](http://en.wikipedia.org/wiki/Multi-armed_bandit) would say so and there would not be an experimental [Sourceforge page](http://sourceforge.net/projects/bandit/) — Henry, May 11 '11 at 20:17
@mbq since reinforcement learning is a branch of machine learning, I don't think so ;) — mlwida, May 12 '11 at 14:18

mlwida · Accepted Answer · 2011-05-13T08:51:01.957

Here are two survey papers I have found recently. I have not read them yet, but the abstracts sound promising.

Joann`s Vermorel and Mehryar Mohri: Multi-Armed Bandit Algorithms and Empirical Evaluation (2005)

From the abstract:

The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms.

Volodymyr Kuleshov and Doina Precup: Algorithms for the multi-armed bandit problem (2000) From the abstract:

Secondly, the performance of most algorithms varies dramatically with the parameters of the bandit problem. Our study identiﬁes for each algorithm the settings where it performs well, and the settings where it performs poorly.

Optimal algorithm for solving n-armed bandit problems?

1 Answers1