13

I've read about a number of algorithms for solving n-armed bandit problems like $\epsilon$-greedy, softmax, and UCB1, but I'm having some trouble sorting through what approach is best for minimizing regret.

Is there a known optimal algorithm for solving the n-armed bandit problem? Is there a choice of algorithm that seems to perform best in practice?

Artem Kaznatcheev
  • 2,569
  • 1
  • 20
  • 38
JS01
  • 168
  • 6
  • Presumably there is not a recognised optimal solution, as otherwise the [Wikipedia page](http://en.wikipedia.org/wiki/Multi-armed_bandit) would say so and there would not be an experimental [Sourceforge page](http://sourceforge.net/projects/bandit/) – Henry May 11 '11 at 20:17
  • Shouldn't this be on Theoretical Computer Science SE? –  May 12 '11 at 06:31
  • 1
    @mbq since reinforcement learning is a branch of machine learning, I don't think so ;) – mlwida May 12 '11 at 14:18
  • @steffen Sure, the name seemed "tcsy". –  May 12 '11 at 14:48
  • @mbq I don't get it. What does "tscy" mean ? – mlwida May 13 '11 at 08:28
  • @steffen: TCS-y (Theoretical Computer Science)-y. – sophros Sep 02 '19 at 15:40

1 Answers1

9

Here are two survey papers I have found recently. I have not read them yet, but the abstracts sound promising.

Joann`s Vermorel and Mehryar Mohri: Multi-Armed Bandit Algorithms and Empirical Evaluation (2005)

From the abstract:

The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms.

Volodymyr Kuleshov and Doina Precup: Algorithms for the multi-armed bandit problem (2000) From the abstract:

Secondly, the performance of most algorithms varies dramatically with the parameters of the bandit problem. Our study identifies for each algorithm the settings where it performs well, and the settings where it performs poorly.

mlwida
  • 9,922
  • 2
  • 45
  • 74