Questions tagged [contextual-bandit]
34 questions
14
votes
1 answer
Cost functions for contextual bandits
I'm using vowpal wabbit to solve a contextual-bandit problem. I'm showing ads to users, and I have a fair bit of information about the context in which the ad is shown (e.g. who the user is, what site they're on, etc.). This seems to be a pretty…

Zach
- 22,308
- 18
- 114
- 158
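
For reference, a minimal sketch of feeding such logged ad data to vowpal wabbit's contextual-bandit mode: VW's documented --cb label format is action:cost:probability (lower cost is better). The log field names (ad_id, clicked, prob, user, site) below are hypothetical stand-ins, not from the question.

    # Sketch: turn logged ad impressions into vowpal wabbit --cb format.
    # Label is "action:cost:probability"; a click (reward 1) becomes cost 0,
    # no click becomes cost 1. Field names here are hypothetical.
    logged = [
        {"ad_id": 1, "clicked": True,  "prob": 0.5,  "user": "u1", "site": "news"},
        {"ad_id": 3, "clicked": False, "prob": 0.25, "user": "u2", "site": "sports"},
    ]

    with open("train.dat", "w") as f:
        for row in logged:
            cost = 0.0 if row["clicked"] else 1.0
            label = f'{row["ad_id"]}:{cost}:{row["prob"]}'
            features = f'user={row["user"]} site={row["site"]}'
            f.write(f"{label} | {features}\n")

    # Train with, e.g.:  vw -d train.dat --cb 4   (4 = number of ads/actions)
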
5
votes
1 answer
Equivalence of Contextual Bandit formulations
I find two different types of Contextual Bandit problem formulations in the literature:
Definition 1: (https://hunch.net/~jl/projects/interactive/sidebandits/bandit.pdf) In a contextual bandits problem, there is a distribution $P$ over…

Apprentice
- 642
- 1
- 24
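
For orientation, the first formulation (the linked Langford note) is the standard i.i.d. protocol; a sketch of that standard setup, not a verbatim quote from the note:

$$
(x_t, r_{t,1}, \ldots, r_{t,K}) \sim P \ \text{i.i.d.}, \quad \text{the learner observes } x_t, \text{ chooses } a_t, \text{ and sees only } r_{t,a_t}.
$$
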
4
votes
2 answers
Exploiting features in a multi-armed bandit scenario
I am facing a challenging problem:
Say I have shirts of three different colors (same price). And say I am running
a strange kind of store into which people come one by one,
and I can show them only one shirt, and they decide whether to buy…

user46386
- 41
- 2
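
A minimal sketch of one way to exploit a context feature in this scenario: keep an independent Beta–Bernoulli posterior per (customer feature, shirt color) pair and Thompson-sample. The "segment" feature is a hypothetical example, not from the question.

    import random
    from collections import defaultdict

    # Sketch: Thompson sampling with one Beta(1,1) posterior per
    # (customer segment, shirt color) pair; buy/no-buy is the reward.
    colors = ["red", "green", "blue"]
    posterior = defaultdict(lambda: [1, 1])  # (segment, color) -> [alpha, beta]

    def choose_shirt(segment):
        samples = {c: random.betavariate(*posterior[(segment, c)]) for c in colors}
        return max(samples, key=samples.get)

    def update(segment, color, bought):
        a, b = posterior[(segment, color)]
        posterior[(segment, color)] = [a + bought, b + (1 - bought)]
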
3
votes
1 answer
How to Deploy Contextual Bandits in an Online Experimentation Platform?
This question is about how to deploy contextual bandits (CMAB) in the context of web site optimization and online experimentation. I have implemented a context-free MAB (MAB). When I run a MAB experiment, I just run it standalone in the online…

etang
- 623
- 3
- 9
3
votes
2 answers
Contextual bandits: Number of models to estimate
I have recently read several papers on contextual bandits, especially for the case of binary rewards. However, one very basic aspect is not entirely clear to me:
In some papers (e.g. here https://arxiv.org/pdf/1812.06227.pdf), it is explicitly…

hoeftn
- 31
- 2
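
The two modeling choices the question contrasts can be sketched as follows, with scikit-learn logistic regression as a stand-in reward model; the data shapes and random data are assumptions for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Sketch: two ways to estimate P(reward=1 | context, arm) with K arms.
    # X: (n, d) contexts, a: (n,) chosen arms, y: (n,) binary rewards.
    rng = np.random.default_rng(0)
    K, n, d = 3, 500, 4
    X = rng.normal(size=(n, d))
    a = rng.integers(0, K, size=n)
    y = (rng.random(n) < 0.5).astype(int)

    # (a) Disjoint: one model per arm, fit only on rounds where that arm was played.
    per_arm = [LogisticRegression().fit(X[a == k], y[a == k]) for k in range(K)]

    # (b) Shared: a single model over [context, one-hot(arm)] features.
    A = np.eye(K)[a]  # one-hot encoding of the played arm
    shared = LogisticRegression().fit(np.hstack([X, A]), y)
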
3
votes
1 answer
Using IPS (inverse probability weighting) with a deterministic policy as the logging policy
In a contextual bandit problem, why can't we use inverse probability weighting (inverse propensity score) with a deterministic policy as the logging policy? Could you give me a concrete example?

Hunnam
- 155
- 5
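
A small sketch of why this fails: with a deterministic logger the propensities are 0 or 1, so any target policy that disagrees with the logger gets zero weight on every sample, and actions the logger never takes have no data at all (the 1/p weight would be undefined).

    # Sketch: IPS off-policy estimate from logged (context, action, reward, propensity).
    # With a stochastic logger every action has p > 0 and the estimate is unbiased.
    # With a deterministic logger, p = 1 for the logged action and 0 for all others,
    # so a policy that picks a different action is estimated at value 0 regardless
    # of how good it actually is.
    def ips_value(log, target_policy):
        # log: list of (x, a, r, p); target_policy: x -> action
        return sum(r * (target_policy(x) == a) / p for (x, a, r, p) in log) / len(log)

    deterministic_log = [("x1", 0, 1.0, 1.0), ("x2", 0, 0.0, 1.0)]  # logger always plays 0
    print(ips_value(deterministic_log, lambda x: 1))  # 0.0 — arm 1 is never observed
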
3
votes
1 answer
Multi-armed bandit in the face of full reward information
I am new to this area of machine learning. I am just walking myself through the UCB1 algorithm, which seems to assume that the payoff can be learnt only for the action that is taken. What I am curious about is if there is already analysis of the multi-armed…

Sal
- 131
- 2
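
The full-information variant is the classical "prediction with expert advice" setting, usually analyzed via exponential weights (Hedge) rather than UCB. A minimal sketch:

    import math, random

    # Sketch: Hedge / exponential weights, which assumes the payoff of *every*
    # arm is revealed each round (full information), unlike UCB1's bandit feedback.
    def hedge(reward_rounds, eta=0.1):
        # reward_rounds: list of full reward vectors, one entry per arm, per round
        K = len(reward_rounds[0])
        w = [1.0] * K
        for rewards in reward_rounds:
            probs = [wi / sum(w) for wi in w]
            arm = random.choices(range(K), weights=probs)[0]  # arm actually played
            w = [wi * math.exp(eta * r) for wi, r in zip(w, rewards)]  # update ALL arms
        return [wi / sum(w) for wi in w]  # final distribution over arms
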
2
votes
1 answer
Are Bandit Algorithms Considered as Online Algorithms?
I think bandit algorithms (such as multi-armed bandit algorithms) can be considered as online algorithms because they make decisions and update their parameters as data arrives. However, I can't find any articles/posts that confirm this statement.

etang
- 623
- 3
- 9
2
votes
1 answer
Mechanism of Adversarial Multi-Armed Bandit Problem?
I am studying the Bandit Algorithms book by Tor Lattimore and Csaba Szepesvári and I have studied the adversarial bandit problem. However, I don't understand the mechanism of the adversarial bandit problem. The book says that the regret of…

Katatonia
- 481
- 1
- 4
- 10
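
For orientation, the mechanism is a sequential game between learner and adversary; sketched loosely in the book's spirit:

$$
\text{For } t = 1, \ldots, n: \ \text{the adversary fixes losses } (y_{t,1}, \ldots, y_{t,K}) \in [0,1]^K; \ \text{the learner samples } A_t \sim P_t \ \text{and observes only } y_{t,A_t}.
$$

The regret then compares the learner's cumulative loss with that of the best fixed arm in hindsight.
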
2
votes
0 answers
How can one optimize black-box functions given context?
Libraries like hyperopt or scikit-optimize allow one to optimize a black-box function. However, they do not allow specifying contextual information outside of the parameters to be chosen by the acquisition function. Is there a similar (to skopt)…

Brian Bien
- 592
- 3
- 19
2
votes
0 answers
Why is pseudo-regret, and not regret, used in adversarial bandits?
In adversarial settings, pseudo-regret and not the actual regret is used. The explanation I have been given is that with actual regret the problem is no longer learnable (that is, the adversary can generate losses so that the regret is no longer…

Inspired_Blue
- 121
- 3
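
For concreteness (reward form, horizon $n$, rewards $x_{t,a}$, played arm $A_t$), the two quantities are usually written as:

$$
R_n = \mathbb{E}\Big[\max_a \sum_{t=1}^n x_{t,a} - \sum_{t=1}^n x_{t,A_t}\Big], \qquad \bar{R}_n = \max_a \, \mathbb{E}\Big[\sum_{t=1}^n x_{t,a} - \sum_{t=1}^n x_{t,A_t}\Big],
$$

so $\bar{R}_n \le R_n$, since in $R_n$ the maximum is taken inside the expectation.
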
2
votes
1 answer
In a Multi-armed Bandit set-up, does the Signal-to-Noise Ratio have any meaning?
Suppose that we have a Multi-armed Bandit set-up with $K = 5$ arms. Each arm has a reward distribution of:
$$
X_i \sim \mathrm{Bern}(p_i), \ \ \ i \in \{1, \ldots, 5\}
$$
In the literature on MABs, authors speak of a signal-to-noise ratio. I was…

user321627
- 2,511
- 3
- 13
- 49
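
There is no single fixed definition here; one plausible reading (an assumption on my part, not a standard convention) is the suboptimality gap over the reward noise scale:

$$
\Delta_i = \max_j p_j - p_i, \qquad \operatorname{Var}(X_i) = p_i(1 - p_i), \qquad \mathrm{SNR}_i \approx \frac{\Delta_i}{\sqrt{p_i(1 - p_i)}}.
$$
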
2
votes
1 answer
EXP4 algorithm for contextual bandits: where do experts come from?
I am working on an implementation of the EXP4 algorithm in the context of a pricing decision (e.g. given a context, the user should be offered a price from a few pre-determined options).
EXP4 uses "experts" as a means to handle various aspects of…

amit
- 541
- 3
- 10
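
The expert advice is an input to EXP4, which is exactly why the question of where it comes from arises. A minimal sketch of one round (reward formulation, uniform exploration mixing omitted for brevity; an illustration, not the canonical implementation):

    import numpy as np

    # Sketch of EXP4: N experts each recommend a distribution over K arms;
    # the learner mixes expert advice by weight, plays an arm, and updates
    # expert weights with an importance-weighted reward estimate.
    def exp4_round(Q, advice, get_reward, eta=0.1):
        # Q: (N,) expert weights summing to 1; advice: (N, K), rows are distributions.
        P = Q @ advice                       # induced distribution over arms
        arm = np.random.choice(len(P), p=P)
        r = get_reward(arm)                  # bandit feedback: only the played arm
        r_hat = np.zeros(len(P))
        r_hat[arm] = r / P[arm]              # importance-weighted reward estimate
        x_tilde = advice @ r_hat             # each expert's estimated reward
        Q = Q * np.exp(eta * x_tilde)
        return Q / Q.sum(), arm, r
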
2
votes
1 answer
Posterior sampling for bandits
I am looking at this paper on posterior sampling.
The algorithm is on page 8 (image below):
Let’s say I have 3 arms and on line 22 arm 3 is the best, followed by arm 2, then arm 1.
Line 24 calculates the number of candidate arm samples to draw. There is…

user3022875
- 726
- 1
- 6
- 17
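
Not the paper's specific candidate-sampling procedure, but the generic Beta–Bernoulli posterior-sampling loop such algorithms build on looks like this (reward probabilities are hypothetical):

    import random

    # Sketch: generic Thompson / posterior sampling for 3 Bernoulli arms.
    # Draw one sample per arm from its Beta posterior, play the argmax,
    # then update that arm's posterior with the observed 0/1 reward.
    posteriors = [[1, 1], [1, 1], [1, 1]]   # [alpha, beta] per arm
    true_p = [0.2, 0.5, 0.7]                # hypothetical reward probabilities

    for t in range(1000):
        samples = [random.betavariate(a, b) for a, b in posteriors]
        arm = samples.index(max(samples))
        reward = 1 if random.random() < true_p[arm] else 0
        posteriors[arm][0] += reward
        posteriors[arm][1] += 1 - reward

    print(posteriors)  # the best arm's posterior should dominate after enough rounds
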
1
vote
1 answer
Real-World, Operationalized Applications of Multi-Armed Bandits
Multi-armed bandits are wonderful and have lots of potential applications. However, I don't know many companies or real-world practitioners who have implemented bandit algorithms.
What are some examples of multi-armed bandits that are up and running? By…

ABC
- 409
- 3
- 7