Questions tagged [multiarmed-bandit]

A problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation.

See https://en.wikipedia.org/wiki/Multi-armed_bandit.

An excerpt from there: this is a classic reinforcement learning problem that exemplifies the exploration-exploitation trade-off dilemma, since each choice's properties may become better understood as time passes or by allocating resources to it. The name comes from imagining a gambler at a row of slot machines (sometimes known as "one-armed bandits"), who has to decide which machines to play, how many times to play each machine and in which order to play them, and whether to continue with the current machine or try a different machine. The multi-armed bandit problem also falls into the broad category of stochastic scheduling.
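For orientation, here is a minimal sketch of that exploration-exploitation trade-off in code: an ε-greedy agent playing a Bernoulli bandit, pulling a random arm with probability ε and the best-looking arm otherwise. The arm payout probabilities, ε value, and round count are illustrative assumptions, not taken from any question below.

```python
import random

def epsilon_greedy_bandit(true_probs, epsilon=0.1, n_rounds=10_000, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy policy.

    true_probs : success probability per arm (unknown to the agent).
    epsilon    : probability of exploring (pulling a random arm) each round.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k        # pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total_reward = 0.0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                                 # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])        # exploit
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return estimates, counts, total_reward

# Example: three "slot machines" with different hidden payout rates.
est, counts, reward = epsilon_greedy_bandit([0.3, 0.5, 0.7])
print(est, counts, reward)
```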

164 questions
31 votes • 3 answers

Best bandit algorithm?

The most well-known bandit algorithm is the upper confidence bound (UCB) algorithm, which popularized this class of algorithms. Since then I presume better algorithms have been developed. What is the current best algorithm (in terms of either empirical performance or…
17 votes • 4 answers

In what kind of real-life situations can we use a multi-arm bandit algorithm?

Multi-armed bandits work well in situations where you have choices and you are not sure which one will maximize your well-being. You can use the algorithm for some real-life situations. As an example, learning can be a good field: if a kid is…
Andy K • 173
17 votes • 2 answers

What is Thompson Sampling in layman's terms?

I am unable to understand Thompson Sampling and how it works. I was reading about the multi-armed bandit problem, and after reading about the Upper Confidence Bound algorithm, many texts suggested that Thompson Sampling performs better than UCB. What is Thompson Sampling, in…
dejavu • 271
14 votes • 1 answer

Cost functions for contextual bandits

I'm using vowpal wabbit to solve a contextual-bandit problem. I'm showing ads to users, and I have a fair bit of information about the context in which the ad is shown (e.g. who the user is, what site they're on, etc.). This seems to be a pretty…
13 votes • 1 answer

Optimal algorithm for solving n-armed bandit problems?

I've read about a number of algorithms for solving n-armed bandit problems like $\epsilon$-greedy, softmax, and UCB1, but I'm having some trouble sorting through what approach is best for minimizing regret. Is there a known optimal algorithm for…
12 votes • 1 answer

Multi armed bandit for general reward distribution

I'm working on a multi-armed bandit problem where we do not have any information about the reward distribution. I have found many papers that guarantee regret bounds for distributions with a known bound, and for general distributions with support…
guest • 141
12 votes • 2 answers

Etymology of multi-armed bandit

I'm studying Reinforcement Learning, and have come across multi-armed bandits. Why are these called bandits? And why are they armed?
Tom Hale • 2,231
11 votes • 3 answers

Multi-armed bandit algorithms vs Uplift modeling

Multi-Armed Bandit: http://en.wikipedia.org/wiki/Multi-armed_bandit Uplift Modeling: http://en.wikipedia.org/wiki/Uplift_modelling How are these two approaches different? How are they similar? Is one better than the other? Edit: If an example…
AMathew • 1,000
10 votes • 2 answers

How is the Upper Confidence Bound derived?

I came across the formula for obtaining the upper confidence bounds on the k-armed bandit problem: $$c\sqrt{\frac{\ln N_i}{n_i}}$$ where $n_i$ is the number of samples we have for this particular bandit and $N_i$ is the total number of samples…
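The formula in this question is (up to the constant $c$) the UCB1 exploration bonus. Below is a minimal sketch of how such a bonus is typically used to select an arm, assuming every arm has already been pulled at least once and taking the total pull count to be the sum of the per-arm counts; the numbers in the example are illustrative.

```python
import math

def ucb1_select(estimates, counts, c=math.sqrt(2)):
    """Pick the arm maximizing estimate + c * sqrt(ln N / n_i).

    estimates : current mean reward per arm
    counts    : number of pulls per arm (assumed > 0 for every arm)
    """
    total = sum(counts)  # N, total number of pulls so far
    scores = [
        est + c * math.sqrt(math.log(total) / n)
        for est, n in zip(estimates, counts)
    ]
    return max(range(len(scores)), key=lambda i: scores[i])

# Example: arm 1 has the highest estimate, but arm 2 is badly under-explored,
# so its exploration bonus can still win the comparison.
print(ucb1_select([0.40, 0.55, 0.50], [120, 110, 5]))
```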
7 votes • 1 answer

Gradient Bandit Algorithm baseline

I am reading Sutton's latest draft of "Reinforcement Learning: An Introduction" and I came to the Gradient Bandit Algorithm (page 29). I am having a bit of trouble understanding how the baseline should be computed. The book says: "$\bar{R}_t \in…
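For context, here is a minimal sketch of a gradient bandit update with softmax action preferences and a baseline $\bar{R}_t$ maintained as the incremental average of rewards received so far; whether the current reward is folded into the baseline before or after the preference update is exactly the ambiguity the question is about, and this sketch picks one common choice. The step size and arm means are illustrative assumptions.

```python
import math, random

def gradient_bandit(arm_means, alpha=0.1, n_rounds=5000, seed=0):
    """Gradient bandit with softmax preferences and an average-reward baseline."""
    rng = random.Random(seed)
    k = len(arm_means)
    prefs = [0.0] * k   # H_t(a), action preferences
    baseline = 0.0      # R̄_t, running average of rewards
    for t in range(1, n_rounds + 1):
        # Softmax policy pi_t(a) over the preferences.
        exps = [math.exp(h) for h in prefs]
        z = sum(exps)
        probs = [e / z for e in exps]
        # Sample an action and a Gaussian reward.
        a = rng.choices(range(k), weights=probs)[0]
        reward = rng.gauss(arm_means[a], 1.0)
        # Preference update: push the chosen arm up/down relative to the baseline.
        # Here the baseline does NOT yet include the current reward.
        for b in range(k):
            if b == a:
                prefs[b] += alpha * (reward - baseline) * (1 - probs[b])
            else:
                prefs[b] -= alpha * (reward - baseline) * probs[b]
        # Fold the new reward into the running average.
        baseline += (reward - baseline) / t
    return prefs

print(gradient_bandit([0.1, 0.8, 0.5]))
```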
6 votes • 0 answers

Confidence Interval for least squares estimator

There was a paper by Yasin Abbasi-Yadkori (https://arxiv.org/pdf/1102.2670.pdf) titled Online Least Squares Estimation with Self-Normalized Processes. I am trying to give a brief context before asking my question. Let $\{x_t, y_t\}$ be data not…
6 votes • 1 answer

Thompson Sampling

I read on Wikipedia that Thompson sampling consists of playing the action $a \in \mathcal{A}$ according to the probability that this action maximizes the expected reward. This probability seems to be: $\int {\mathbb …
Josh • 3,408
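For a Bernoulli bandit with Beta priors, "playing the action according to the probability that it maximizes the expected reward" reduces to drawing one sample from each arm's posterior and pulling the arm whose draw is largest. A minimal sketch under that Beta-Bernoulli assumption (the arm probabilities and round count are illustrative):

```python
import random

def thompson_bernoulli(true_probs, n_rounds=10_000, seed=0):
    """Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [1] * k   # alpha parameters (uniform prior)
    failures = [1] * k    # beta parameters
    for _ in range(n_rounds):
        # One posterior draw per arm; play the arm with the largest draw.
        draws = [rng.betavariate(successes[a], failures[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: draws[a])
        # Observe a Bernoulli reward and update that arm's posterior.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

print(thompson_bernoulli([0.3, 0.5, 0.7]))
```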
6 votes • 2 answers

Linear Regret for epsilon-greedy algorithm in Multi-Armed Bandit problem

I am reading about the $\epsilon$-greedy algorithm in the Multi-Armed Bandit (or $K$-armed bandit) problem, as can be seen here: https://en.wikipedia.org/wiki/Multi-armed_bandit#Semi-uniform_strategies. For the sake of completeness, I am stating the …
Sayan Pal • 171
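The linear regret referred to in this question follows from a short argument: with a constant $\epsilon$, a constant fraction of rounds are spent pulling uniformly random arms. Writing $K$ for the number of arms, $\mu^*$ for the best mean, and $\Delta_a = \mu^* - \mu_a$ for the suboptimality gaps,

$$\mathbb{E}[R_T] = \sum_{t=1}^{T}\mathbb{E}\left[\mu^* - \mu_{A_t}\right] \;\ge\; \sum_{t=1}^{T}\epsilon\cdot\frac{1}{K}\sum_{a=1}^{K}\Delta_a \;=\; \epsilon\,T\cdot\frac{1}{K}\sum_{a=1}^{K}\Delta_a \;=\; \Theta(T),$$

provided at least one gap is positive: the exploration rounds alone contribute a constant expected regret per round, no matter how accurate the value estimates become.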
6 votes • 2 answers

A continuous generalization of the binary bandit

There is plenty of reading out there about Bayesian (beta-binomial) multi-armed bandits for 0/1 data, but I would like to extend this slightly. To give some context, suppose I have two webpages, A and B. Now I want to test which webpage gets more…
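One common way to generalize the beta-binomial scheme from 0/1 data to continuous rewards (e.g. time on page rather than click/no-click) is to swap in another conjugate model. The sketch below uses Thompson sampling with a Normal likelihood of known variance and a Normal prior on each page's mean reward; the priors, observation variance, and page means are illustrative assumptions, not taken from the question.

```python
import random

def thompson_gaussian(true_means, obs_sd=1.0, n_rounds=5000, seed=0):
    """Thompson sampling with a Normal-Normal conjugate model per arm.

    Assumes continuous rewards with known observation sd `obs_sd`;
    each arm's unknown mean gets a N(0, 1) prior.
    """
    rng = random.Random(seed)
    k = len(true_means)
    obs_prec = 1.0 / obs_sd ** 2
    prior_mean, prior_prec = 0.0, 1.0
    n = [0] * k        # observations per arm
    total = [0.0] * k  # sum of rewards per arm
    for _ in range(n_rounds):
        draws = []
        for a in range(k):
            # Conjugate posterior for the arm's mean given its data so far.
            post_prec = prior_prec + n[a] * obs_prec
            post_mean = (prior_prec * prior_mean + obs_prec * total[a]) / post_prec
            draws.append(rng.gauss(post_mean, post_prec ** -0.5))
        arm = max(range(k), key=lambda a: draws[a])
        reward = rng.gauss(true_means[arm], obs_sd)  # e.g. time-on-page for that page
        n[arm] += 1
        total[arm] += reward
    return n

# Two "webpages" with different mean continuous rewards.
print(thompson_gaussian([1.0, 1.3]))
```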
6 votes • 1 answer

Bandits with mixed reward processes?

I am trying to model a sequential exploration-exploitation problem with learning as a multi-armed bandit, where the reward process mixes a Markovian and a stochastic reward. I understand how to model a bandit problem with Markovian rewards. In a Markovian…
dr___mario • 61