Questions tagged [contextual-bandit]
34 questions
14
votes
1 answer
Cost functions for contextual bandits
I'm using vowpal wabbit to solve a contextual-bandit problem. I'm showing ads to users, and I have a fair bit of information about the context in which the ad is shown (e.g. who the user is, what site they're on, etc.). This seems to be a pretty…

Zach
- 22,308
- 18
- 114
- 158
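
For reference, a minimal sketch of feeding such logged ad data to vowpal wabbit's contextual-bandit mode: VW's documented --cb label format is action:cost:probability (lower cost is better). The log field names (ad_id, clicked, prob, user, site) below are hypothetical stand-ins, not from the question.

    # Sketch: turn logged ad impressions into vowpal wabbit --cb format.
    # Label is "action:cost:probability"; a click (reward 1) becomes cost 0,
    # no click becomes cost 1. Field names here are hypothetical.
    logged = [
        {"ad_id": 1, "clicked": True,  "prob": 0.5,  "user": "u1", "site": "news"},
        {"ad_id": 3, "clicked": False, "prob": 0.25, "user": "u2", "site": "sports"},
    ]

    with open("train.dat", "w") as f:
        for row in logged:
            cost = 0.0 if row["clicked"] else 1.0
            label = f'{row["ad_id"]}:{cost}:{row["prob"]}'
            features = f'user={row["user"]} site={row["site"]}'
            f.write(f"{label} | {features}\n")

    # Train with, e.g.:  vw -d train.dat --cb 4   (4 = number of ads/actions)
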
5
votes
1 answer
Equivalence of Contextual Bandit formulations
I find two different types of Contextual Bandit problem formulations in the literature:
Definition 1: (https://hunch.net/~jl/projects/interactive/sidebandits/bandit.pdf) In a contextual bandits problem, there is a distribution $P$ over…

Apprentice
- 642
- 1
- 24
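
For orientation, the first formulation (the linked Langford note) is the standard i.i.d. protocol; a sketch of that standard setup, not a verbatim quote from the note:

$$
(x_t, r_{t,1}, \ldots, r_{t,K}) \sim P \ \text{i.i.d.}, \quad \text{the learner observes } x_t, \text{ chooses } a_t, \text{ and sees only } r_{t,a_t}.
$$
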
4
votes
2 answers
Exploiting features in a multi-armed bandit scenario
I am facing a challenging problem:
Say I have shirts of three different colors (same price). And say I am running
a strange kind of store into which people come one by one,
and I can show them only one shirt, and they decide whether to buy…

user46386
- 41
- 2
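
A minimal sketch of one way to exploit a context feature in this scenario: keep an independent Beta–Bernoulli posterior per (customer feature, shirt color) pair and Thompson-sample. The "segment" feature is a hypothetical example, not from the question.

    import random
    from collections import defaultdict

    # Sketch: Thompson sampling with one Beta(1,1) posterior per
    # (customer segment, shirt color) pair; buy/no-buy is the reward.
    colors = ["red", "green", "blue"]
    posterior = defaultdict(lambda: [1, 1])  # (segment, color) -> [alpha, beta]

    def choose_shirt(segment):
        samples = {c: random.betavariate(*posterior[(segment, c)]) for c in colors}
        return max(samples, key=samples.get)

    def update(segment, color, bought):
        a, b = posterior[(segment, color)]
        posterior[(segment, color)] = [a + bought, b + (1 - bought)]
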
3
votes
1 answer
How to Deploy Contextual Bandits in an Online Experimentation Platform?
This question is about how to deploy contextual bandits (CMAB) in the context of web site optimization and online experimentation. I have implemented a context-free MAB (MAB). When I run a MAB experiment, I just run it standalone in the online…

etang
- 623
- 3
- 9
3
votes
2 answers
Contextual bandits: Number of models to estimate
I have recently read several papers on contextual bandits, especially for the case of binary rewards. However, one very basic aspect is not entirely clear to me:
In some papers (e.g. here https://arxiv.org/pdf/1812.06227.pdf), it is explicitly…

hoeftn
- 31
- 2
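
The two modeling choices the question contrasts can be sketched as follows, with scikit-learn logistic regression as a stand-in reward model; the data shapes and random data are assumptions for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Sketch: two ways to estimate P(reward=1 | context, arm) with K arms.
    # X: (n, d) contexts, a: (n,) chosen arms, y: (n,) binary rewards.
    rng = np.random.default_rng(0)
    K, n, d = 3, 500, 4
    X = rng.normal(size=(n, d))
    a = rng.integers(0, K, size=n)
    y = (rng.random(n) < 0.5).astype(int)

    # (a) Disjoint: one model per arm, fit only on rounds where that arm was played.
    per_arm = [LogisticRegression().fit(X[a == k], y[a == k]) for k in range(K)]

    # (b) Shared: a single model over [context, one-hot(arm)] features.
    A = np.eye(K)[a]  # one-hot encoding of the played arm
    shared = LogisticRegression().fit(np.hstack([X, A]), y)
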
3
votes
1 answer
Using IPS (inverse probability weighting) with a deterministic policy as the logging policy
In a contextual bandit problem, why can't we use inverse probability weighting (inverse propensity score) with a deterministic policy as the logging policy? Could you give me a concrete example?

Hunnam
- 155
- 5
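
A small sketch of why this fails: with a deterministic logger the propensities are 0 or 1, so any target policy that disagrees with the logger gets zero weight on every sample, and actions the logger never takes have no data at all (the 1/p weight would be undefined).

    # Sketch: IPS off-policy estimate from logged (context, action, reward, propensity).
    # With a stochastic logger every action has p > 0 and the estimate is unbiased.
    # With a deterministic logger, p = 1 for the logged action and 0 for all others,
    # so a policy that picks a different action is estimated at value 0 regardless
    # of how good it actually is.
    def ips_value(log, target_policy):
        # log: list of (x, a, r, p); target_policy: x -> action
        return sum(r * (target_policy(x) == a) / p for (x, a, r, p) in log) / len(log)

    deterministic_log = [("x1", 0, 1.0, 1.0), ("x2", 0, 0.0, 1.0)]  # logger always plays 0
    print(ips_value(deterministic_log, lambda x: 1))  # 0.0 — arm 1 is never observed
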
3
votes
1 answer
Multi-armed bandit in the face of full reward information
I am new to this area of machine learning. I am just walking myself through the UCB1 algorithm, which seems to assume that the payoff can be learnt only for the action that is taken. What I am curious about is if there is already analysis of the multi-armed…

Sal
- 131
- 2
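
The full-information variant is the classical "prediction with expert advice" setting, usually analyzed via exponential weights (Hedge) rather than UCB. A minimal sketch:

    import math, random

    # Sketch: Hedge / exponential weights, which assumes the payoff of *every*
    # arm is revealed each round (full information), unlike UCB1's bandit feedback.
    def hedge(reward_rounds, eta=0.1):
        # reward_rounds: list of full reward vectors, one entry per arm, per round
        K = len(reward_rounds[0])
        w = [1.0] * K
        for rewards in reward_rounds:
            probs = [wi / sum(w) for wi in w]
            arm = random.choices(range(K), weights=probs)[0]  # arm actually played
            w = [wi * math.exp(eta * r) for wi, r in zip(w, rewards)]  # update ALL arms
        return [wi / sum(w) for wi in w]  # final distribution over arms
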
2
votes
1 answer
Are Bandit Algorithms Considered as Online Algorithms?
I think bandit algorithms (such as multi-armed bandit algorithms) can be considered as online algorithms because they make decisions and update their parameters as data arrives. However, I can't find any articles/posts that confirm this statement.

etang
- 623
- 3
- 9
2
votes
1 answer
Mechanism of Adversarial Multi-Armed Bandit Problem?
I am studying the Bandit Algorithms book by Tor Lattimore and Csaba Szepesvári and I have studied the adversarial bandit problem. However, I don't understand the mechanism of the adversarial bandit problem. The book says that the regret of…

Katatonia
- 481
- 1
- 4
- 10
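
For orientation, the mechanism is a sequential game between learner and adversary; sketched loosely in the book's spirit:

$$
\text{For } t = 1, \ldots, n: \ \text{the adversary fixes losses } (y_{t,1}, \ldots, y_{t,K}) \in [0,1]^K; \ \text{the learner samples } A_t \sim P_t \ \text{and observes only } y_{t,A_t}.
$$

The regret then compares the learner's cumulative loss with that of the best fixed arm in hindsight.
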
2
votes
0 answers
How can one optimize black-box functions given context?
Libraries like hyperopt or scikit-optimize allow one to optimize a black-box function. However, they do not allow specifying contextual information outside of the parameters to be chosen by the acquisition function. Is there a similar (to skopt)…

Brian Bien
- 592
- 3
- 19
2
votes
0 answers
Why is pseudo-regret, and not regret, used in adversarial bandits?
In adversarial settings, pseudo-regret and not the actual regret is used. The explanation I have been given is that with actual regret the problem is no longer learnable (that is, the adversary can generate losses so that the regret is no longer…

Inspired_Blue
- 121
- 3
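
For concreteness (reward form, horizon $n$, rewards $x_{t,a}$, played arm $A_t$), the two quantities are usually written as:

$$
R_n = \mathbb{E}\Big[\max_a \sum_{t=1}^n x_{t,a} - \sum_{t=1}^n x_{t,A_t}\Big], \qquad \bar{R}_n = \max_a \, \mathbb{E}\Big[\sum_{t=1}^n x_{t,a} - \sum_{t=1}^n x_{t,A_t}\Big],
$$

so $\bar{R}_n \le R_n$, since in $R_n$ the maximum is taken inside the expectation.
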
2
votes
1 answer
In a Multi-armed Bandit set-up, does the Signal-to-Noise Ratio have any meaning?
Suppose that we have a Multi-armed Bandit set-up with $K = 5$ arms. Each arm has a reward distribution of:
$$
X_i \sim \mathrm{Bern}(p_i), \ \ \ i \in \{1, \ldots, 5\}
$$
In the literature on MABs, authors speak of a signal-to-noise ratio. I was…

user321627
- 2,511
- 3
- 13
- 49
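
There is no single fixed definition here; one plausible reading (an assumption on my part, not a standard convention) is the suboptimality gap over the reward noise scale:

$$
\Delta_i = \max_j p_j - p_i, \qquad \operatorname{Var}(X_i) = p_i(1 - p_i), \qquad \mathrm{SNR}_i \approx \frac{\Delta_i}{\sqrt{p_i(1 - p_i)}}.
$$
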
2
votes
1 answer
EXP4 algorithm for contextual bandits: where do experts come from?
I am working on an implementation of the EXP4 algorithm in the context of a pricing decision (e.g. given a context, the user should be offered a price from a few pre-determined options).
EXP4 uses "experts" as a means to handle various aspects of…

amit
- 541
- 3
- 10
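
The expert advice is an input to EXP4, which is exactly why the question of where it comes from arises. A minimal sketch of one round (reward formulation, uniform exploration mixing omitted for brevity; an illustration, not the canonical implementation):

    import numpy as np

    # Sketch of EXP4: N experts each recommend a distribution over K arms;
    # the learner mixes expert advice by weight, plays an arm, and updates
    # expert weights with an importance-weighted reward estimate.
    def exp4_round(Q, advice, get_reward, eta=0.1):
        # Q: (N,) expert weights summing to 1; advice: (N, K), rows are distributions.
        P = Q @ advice                       # induced distribution over arms
        arm = np.random.choice(len(P), p=P)
        r = get_reward(arm)                  # bandit feedback: only the played arm
        r_hat = np.zeros(len(P))
        r_hat[arm] = r / P[arm]              # importance-weighted reward estimate
        x_tilde = advice @ r_hat             # each expert's estimated reward
        Q = Q * np.exp(eta * x_tilde)
        return Q / Q.sum(), arm, r
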
2
votes
1 answer
Posterior sampling for bandits
I am looking at this paper on posterior sampling.
The algorithm is on page 8 (image below):
Let’s say I have 3 arms and on line 22 arm 3 is the best, followed by arm 2, then arm 1.
Line 24 calculates the number of candidate arm samples to draw. There is…

user3022875
- 726
- 1
- 6
- 17
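
Not the paper's specific candidate-sampling procedure, but the generic Beta–Bernoulli posterior-sampling loop such algorithms build on looks like this (reward probabilities are hypothetical):

    import random

    # Sketch: generic Thompson / posterior sampling for 3 Bernoulli arms.
    # Draw one sample per arm from its Beta posterior, play the argmax,
    # then update that arm's posterior with the observed 0/1 reward.
    posteriors = [[1, 1], [1, 1], [1, 1]]   # [alpha, beta] per arm
    true_p = [0.2, 0.5, 0.7]                # hypothetical reward probabilities

    for t in range(1000):
        samples = [random.betavariate(a, b) for a, b in posteriors]
        arm = samples.index(max(samples))
        reward = 1 if random.random() < true_p[arm] else 0
        posteriors[arm][0] += reward
        posteriors[arm][1] += 1 - reward

    print(posteriors)  # the best arm's posterior should dominate after enough rounds
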
1
vote
1 answer
Real-World, Operationalized Applications of Multi-Armed Bandits
Multi-armed bandits are wonderful and have lots of potential applications. However, I don't know many companies or real-world practitioners who have implemented bandit algorithms.
What are some examples of multi-armed bandits that are up and running? By…

ABC
- 409
- 3
- 7