Questions tagged [contextual-bandit]

34 questions
14
votes
1 answer

Cost functions for contextual bandits

I'm using Vowpal Wabbit to solve a contextual-bandit problem. I'm showing ads to users, and I have a fair bit of information about the context in which the ad is shown (e.g. who the user is, what site they're on, etc.). This seems to be a pretty…
5
votes
1 answer

Equivalence of Contextual Bandit formulations

I find two different types of Contextual Bandit problem formulations in the literature: Definition 1: (https://hunch.net/~jl/projects/interactive/sidebandits/bandit.pdf) In a contextual bandits problem, there is a distribution $P$ over…
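(For orientation, the formulation in Definition 1 can be sketched as a protocol; notation mine, following the linked notes. There is a distribution $P$ over pairs $(x, \vec{r})$ with $\vec{r} \in [0,1]^K$. On each round $t$, nature draws $(x_t, \vec{r}_t) \sim P$ and reveals only $x_t$; the learner chooses $a_t \in \{1, \dots, K\}$; only the chosen coordinate $r_t(a_t)$ is observed. The question is whether this is equivalent to the alternative formulation found elsewhere in the literature.)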
4
votes
2 answers

Exploiting features in a multiarmed bandit scenario

I am facing a challenging problem: Say I have shirts of three different colors (same price). And say I am running a strange kind of store into which people come in one by one, and I can show them only one shirt, and they decide whether to buy…
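(A minimal Thompson-sampling sketch for this shirt setting, assuming Bernoulli buy/no-buy rewards and, for brevity, ignoring per-customer features:)

```python
# Thompson sampling for 3 shirt colors with buy/no-buy (Bernoulli) rewards:
# one Beta posterior per color, sample from each, show the argmax.
import random

n_colors = 3
alpha = [1.0] * n_colors  # successes + 1 (uniform Beta(1,1) priors)
beta = [1.0] * n_colors   # failures + 1

def choose_color():
    samples = [random.betavariate(alpha[i], beta[i]) for i in range(n_colors)]
    return max(range(n_colors), key=lambda i: samples[i])

def update(color, bought):
    if bought:
        alpha[color] += 1
    else:
        beta[color] += 1

# Simulated customers with unknown true buy rates per color.
true_p = [0.1, 0.3, 0.2]
for _ in range(1000):
    c = choose_color()
    update(c, random.random() < true_p[c])
print(alpha, beta)  # posterior mass should concentrate on color 1
```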
3
votes
1 answer

How to Deploy Contextual Bandits in Online Experimentation Platform?

This question is about how to deploy contextual bandits (CMAB) in the context of website optimization and online experimentation. I implemented a context-free MAB. When I run a MAB experiment, I just run it standalone in the online…
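(A rough sketch of one CMAB decision inside an experimentation platform; assumptions mine, not a specific platform. The key difference from context-free MAB is that the log must carry the context and the propensity so the policy can be retrained and evaluated off-policy later:)

```python
# One epsilon-greedy CMAB decision; the decision log keeps context + propensity.
import random

N_ARMS = 4
EPSILON = 0.1

def policy_best_arm(context):
    return 0  # stand-in for the trained model's choice

def serve(context):
    greedy = policy_best_arm(context)
    arm = random.randrange(N_ARMS) if random.random() < EPSILON else greedy
    # Epsilon-greedy gives every arm nonzero propensity, which off-policy
    # retraining and evaluation require.
    prob = EPSILON / N_ARMS + (1 - EPSILON) * (arm == greedy)
    return arm, prob

decision_log = []
arm, prob = serve({"site": "news"})
decision_log.append({"context": {"site": "news"}, "arm": arm,
                     "prob": prob, "reward": None})  # reward joined in later
```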
3
votes
2 answers

Contextual bandits: Number of models to estimate

I have recently read several papers on contextual bandits, especially for the case of binary rewards. However, one very basic aspect is not entirely clear to me: in some papers (e.g. https://arxiv.org/pdf/1812.06227.pdf), it is explicitly…
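(For concreteness, the two modeling choices papers differ on can be sketched like this, using scikit-learn and placeholder data: either $K$ separate reward models, one per arm, or a single model over concatenated (context, action) features:)

```python
# (a) one binary-reward model per arm vs. (b) one shared model over
# (context, one-hot action) features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))          # contexts
a = rng.integers(0, 2, size=500)       # which of 2 arms was played
y = rng.integers(0, 2, size=500)       # binary reward (placeholder data)

# (a) K separate models: K coefficient vectors to estimate.
per_arm = [LogisticRegression().fit(X[a == k], y[a == k]) for k in range(2)]

# (b) a single model on concatenated (context, one-hot action) features.
A = np.eye(2)[a]
shared = LogisticRegression().fit(np.hstack([X, A]), y)
```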
3
votes
1 answer

Using IPS (inverse probability weighting) with a deterministic policy as the logging policy

In a contextual bandit problem, why can't we use inverse probability weighting (inverse propensity score) with a deterministic policy as the logging policy? Could you give me a concrete example?
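(For reference, the IPS estimate of a target policy $\pi$ from logs $(x_i, a_i, r_i)$ collected under logging policy $\mu$ is

$$\hat V_{\mathrm{IPS}}(\pi) = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathbf{1}\{\pi(x_i) = a_i\}\, r_i}{\mu(a_i \mid x_i)},$$

which is unbiased only if $\mu(a \mid x) > 0$ for every action $\pi$ can take. A deterministic logger has $\mu(a \mid x) \in \{0, 1\}$, so any action it never takes in a given context has zero propensity, its reward is never observed, and no reweighting can recover it.)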
3
votes
1 answer

Multi-armed bandit in face of full reward information

I am new to this area of machine learning. I am just walking myself through the UCB1 algorithm, which seems to assume that the payoff can be learnt only for the action that is taken. What I am curious about is whether there is already analysis of the multi-armed…
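(The full-information counterpart the question is reaching for is typically handled by exponential weights (Hedge) rather than UCB1; a minimal sketch, assuming every arm's loss in $[0,1]$ is revealed each round:)

```python
# Minimal Hedge (exponential weights): unlike UCB1, the learner sees the
# loss of *every* arm each round and re-weights all of them.
import math

def hedge(loss_rounds, eta=0.5):
    n_arms = len(loss_rounds[0])
    w = [1.0] * n_arms
    for losses in loss_rounds:  # losses[i] in [0, 1] for every arm
        total = sum(w)
        probs = [wi / total for wi in w]
        # (Sample an action from probs here if a decision must be played.)
        w = [wi * math.exp(-eta * l) for wi, l in zip(w, losses)]
    return [wi / sum(w) for wi in w]

print(hedge([[0.9, 0.1, 0.5]] * 50))  # mass concentrates on arm 1
```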
2
votes
1 answer

Are Bandit Algorithms Considered Online Algorithms?

I think bandit algorithms (such as multi-armed bandit algorithms) can be considered online algorithms because they make decisions and update their parameters as data arrives. However, I can't find any articles/posts that confirm this statement.
2
votes
1 answer

Mechanism of the Adversarial Multi-Armed Bandit Problem?

I am studying the Bandit Algorithms book by Tor Lattimore and Csaba Szepesvári and I have studied the adversarial bandit problem. However, I don't understand the mechanism of the adversarial bandit problem. The book says that the regret of…
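(A minimal EXP3 sketch of that mechanism: the adversary commits to a loss table up front; the learner sees only the played arm's loss and updates it via an importance-weighted estimate:)

```python
# EXP3 against a fixed adversarial loss table; only the played arm's
# weight is updated, using the importance-weighted loss estimate.
import math, random

def exp3(loss_table, eta=0.1):
    n_arms = len(loss_table[0])
    w = [1.0] * n_arms
    total_loss = 0.0
    for losses in loss_table:            # adversary's losses this round
        s = sum(w)
        p = [wi / s for wi in w]
        a = random.choices(range(n_arms), weights=p)[0]
        total_loss += losses[a]
        est = losses[a] / p[a]           # importance-weighted loss estimate
        w[a] *= math.exp(-eta * est)     # only the played arm is updated
    return total_loss

adversary = [[1.0, 0.0] if t % 3 else [0.0, 1.0] for t in range(300)]
print(exp3(adversary))
```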
2
votes
0 answers

How can one optimize black-box functions given context?

Libraries like hyperopt or scikit-optimize allow one to optimize a black-box function. However, they do not allow specifying contextual information outside of the parameters to be chosen by the acquisition function. Is there a similar (to skopt)…
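(One crude workaround, an assumption and not a hyperopt/skopt feature: maintain your own surrogate over the joint (context, parameters) space and, at query time, optimize only the free parameters for the observed context, e.g. with scikit-learn's GP:)

```python
# Contextual black-box optimization via a GP surrogate on (context, params):
# for a given context, score random candidates and pick the best by UCB.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
ctx = rng.uniform(size=(40, 1))               # observed contexts
par = rng.uniform(size=(40, 2))               # parameters tried
y = -((par - ctx) ** 2).sum(axis=1)           # toy objective values

gp = GaussianProcessRegressor().fit(np.hstack([ctx, par]), y)

def suggest(context, n_candidates=500):
    cand = rng.uniform(size=(n_candidates, 2))
    Xq = np.hstack([np.full((n_candidates, 1), context), cand])
    mu, sd = gp.predict(Xq, return_std=True)
    return cand[np.argmax(mu + sd)]           # simple UCB acquisition

print(suggest(0.3))
```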
2
votes
0 answers

Why is pseudo-regret, and not regret, used in adversarial bandits?

In adversarial settings, pseudo-regret and not the actual regret is used. The explanation I have been given is that with actual regret the problem is no longer learnable (that is, the adversary can generate losses so that the regret is no longer…
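(For reference, with rewards $x_{t,i}$ and played arm $A_t$, the two notions differ only in where the maximum sits relative to the expectation:

$$R_n = \max_{i} \sum_{t=1}^{n} x_{t,i} - \sum_{t=1}^{n} x_{t,A_t}, \qquad \bar R_n = \max_{i} \mathbb{E}\left[\sum_{t=1}^{n} x_{t,i} - \sum_{t=1}^{n} x_{t,A_t}\right],$$

so $\bar R_n \le \mathbb{E}[R_n]$: pseudo-regret compares to the single best arm in expectation rather than to the realized best arm.)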
2
votes
1 answer

For a multi-armed bandit set-up, does the signal-to-noise ratio have any meaning?

Suppose that we have a multi-armed bandit set-up with $K = 5$ bandits. Each bandit has a reward distribution of: $$X_i \sim \mathrm{Bern}(p_i), \quad i \in \{1, \ldots, 5\}$$ In the literature about MABs, authors speak of a signal-to-noise ratio. I was…
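(One plausible reading, an assumption about the authors' intent: treat the suboptimality gap as signal and an arm's standard deviation as noise, which for Bernoulli arms gives

$$\Delta_i = p^{*} - p_i, \qquad \mathrm{SNR}_i = \frac{\Delta_i}{\sqrt{p_i(1 - p_i)}}, \qquad p^{*} = \max_j p_j.)$$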
2
votes
1 answer

EXP4 algorithm for contextual bandits: where do experts come from?

I am working on an implementation of the EXP4 algorithm in the context of a pricing decision (e.g. given a context, the user should be given a price from a few pre-determined options). EXP4 uses "experts" as a means to handle various aspects of…
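(For context: in EXP4 an "expert" is any fixed mapping from context to a distribution over actions, and the algorithm only mixes their advice; a sketch with hypothetical pricing experts, names mine:)

```python
# Experts for EXP4: each maps a context to a distribution over the actions.
PRICES = [9.99, 14.99, 19.99]

def always_cheap(context):
    return [1.0, 0.0, 0.0]

def price_by_segment(context):
    # e.g. premium users get nudged toward the high price
    return [0.1, 0.2, 0.7] if context.get("premium") else [0.6, 0.3, 0.1]

def uniform(context):
    return [1/3, 1/3, 1/3]

experts = [always_cheap, price_by_segment, uniform]
advice = [e({"premium": True}) for e in experts]  # the advice matrix EXP4 consumes
```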
2
votes
1 answer

Posterior sampling for bandit

I am looking at this paper on posterior sampling. The algorithm is on page 8 (image below): Let’s say I have 3 arms, and on line 22 arm 3 is the best, followed by arm 2, then arm 1. Line 24 calculates the number of candidate arm samples to draw. There is…
1
vote
1 answer

Real-World, Operationalized Applications of Multi-Arm Bandits

Multi-armed bandits are wonderful and have lots of potential applications. However, I don't know many companies or real-world practitioners who have implemented bandit algorithms. What are some examples of multi-armed bandits that are up and running? By…