
I'm very new to bandit problems (apologies if I've formatted my question incorrectly), but I have to solve the optimal stopping of what I think is a very simple case. Suppose I have two arms, $k \in \{1, 2\}$. At each step, arm one gives a known, fixed payout $\tau > 0$ plus a Bernoulli r.v. (which I will define in a moment), and arm two pays $E > 0$ with probability $p_i$ and $0$ otherwise, where the arm is one of two possible types $i \in \{L, H\}$ with $p_L < p_H$.

The payouts are determined as follows. If the first arm is pulled, I receive $\tau$ plus a payout $X_L$ that equals $E$ with probability $p_L$ and $0$ otherwise, so $r_1 = \tau + X_L$. If the second arm is pulled, I receive $X_i$, which equals $E$ with probability $p_i$ and $0$ otherwise, so $r_2 = X_i$, where the type $i$ is fixed by nature and unknown to me. I have a prior probability that $i = H$, denoted by $\alpha$.
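If I am thinking about this correctly, pulling arm 2 is the only way to learn about $i$ (arm 1's Bernoulli component follows the known $p_L$, so it is uninformative): after observing a payout of $E$ from arm 2 my belief updates by Bayes' rule to $\alpha^{+} = \frac{\alpha p_H}{\alpha p_H + (1-\alpha) p_L}$, and after a payout of $0$ it updates to $\alpha^{-} = \frac{\alpha (1-p_H)}{\alpha (1-p_H) + (1-\alpha)(1-p_L)}$.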

I have a solution to the myopic case where $A = (1, 0, 0, \dots)$, but I want a strategy that maximises the expected sum of my payouts with geometric discounting $A = (1, \beta, \beta^2, \dots)$, i.e. that maximises $P = \sum_{t=1}^{\infty} \beta^{t-1}\,\mathbb{E}[r_{k_t}]$. From what I've read, I should be solving an optimal stopping problem here. I've seen closed-form stopping thresholds for some examples of Bernoulli one-armed bandit problems with geometric discounting (e.g. Berry and Fristedt (1979)), but none had a fixed payout like mine and all payouts were $0$ or $1$. Any help finding a solution that corresponds to my problem would be greatly appreciated.
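In case it clarifies what I'm after: writing $\bar p(\alpha) = \alpha p_H + (1-\alpha) p_L$, I believe the value function over my belief should satisfy
$$V(\alpha) = \max\left\{ \frac{\tau + p_L E}{1-\beta},\;\; \bar p(\alpha)\big(E + \beta V(\alpha^{+})\big) + \big(1-\bar p(\alpha)\big)\,\beta V(\alpha^{-}) \right\},$$
where the first term uses the fact that arm 1 is uninformative, so once I stop experimenting with arm 2 I should pull arm 1 forever. Below is a rough value-iteration sketch of this recursion that I put together (Python, with placeholder values for $\tau$, $E$, $p_L$, $p_H$, $\beta$, since my actual parameters are symbolic); it is only my own numerical attempt, not taken from a reference, but it suggests the policy is a cutoff in $\alpha$.

```python
import numpy as np

# Placeholder parameter values -- in my problem these are symbolic.
# Assumes 0 < p_L < p_H < 1 so the Bayes updates below are well defined.
tau, E_prize = 0.3, 1.0   # fixed payout of arm 1 and the prize size E
p_L, p_H = 0.3, 0.7       # success probabilities of the two types
beta = 0.9                # discount factor

# Arm 1 is uninformative, so stopping means pulling it forever.
V_safe = (tau + p_L * E_prize) / (1 - beta)

# Value iteration on a grid of beliefs alpha = P(i = H), with linear interpolation.
grid = np.linspace(0.0, 1.0, 1001)
V = np.full_like(grid, V_safe)

def risky_value(V):
    """Expected discounted value of pulling arm 2 once and continuing optimally."""
    p_bar = grid * p_H + (1 - grid) * p_L              # P(arm 2 pays E | alpha)
    a_up = grid * p_H / p_bar                          # posterior after a success
    a_dn = grid * (1 - p_H) / (1 - p_bar)              # posterior after a failure
    return (p_bar * (E_prize + beta * np.interp(a_up, grid, V))
            + (1 - p_bar) * beta * np.interp(a_dn, grid, V))

for _ in range(5000):
    V_new = np.maximum(V_safe, risky_value(V))
    done = np.max(np.abs(V_new - V)) < 1e-12
    V = V_new
    if done:
        break

# Cutoff belief: experiment with arm 2 iff alpha >= alpha_star.
keep_pulling = risky_value(V) >= V_safe
alpha_star = grid[keep_pulling.argmax()] if keep_pulling.any() else None
print("approximate cutoff belief alpha* =", alpha_star)
```

What I would like is a closed-form (or at least characterised) expression for that cutoff rather than a numerical approximation.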

Matt