Bellman equation / dynamic programming for darts

Question

When you play darts, you can throw at 62 regions, z on the dartboard. Namely, the singles regions S1, ..., S20, the double regions D1, ..., D20, the treble regions T2, ..., T20 and the single- and double bullseye, SB and DB. Every region has a numerical score, h. Lets say, that mu represents the location on the dartboard where the player aims at. In addition, we know the probability of hitting any z when aiming for mu.

Ignoring turns, a player wants to minimize the expected number of throws to get to zero. This can be formulated as a stochastic shortest path DP. The state space for this DP is s {1, 2, ..., 501} and represents the players score. i denotes the number of darts thrown. The goal is to determine the aiming location that minimizes the expected number of throws.

When a player throws, there are three possible outcomes:

new score
checkout
bust

This can be formulated as:

In addition, we have a Bellman equation that satisfies

The terminal condition is equal to

V(0)=0

We can solve the DP successively for s = 2, ..., 501. All of the above can be found on page 13 of the following paper.

Just for my understanding, please let me know if I understand the methodology correctly:
If we have s=2, there is only one legal action, D1. This the only p(z; mu) we consider, correct? Because other possibilities of z results in busts. And then you have basically a circular reference where the value of V(2) is

V(f(2,0) = V(2)

In addition, throwing D1 when aiming for mu gives the terminal condition

V(f(2, D1)) = 0

So, the Bellman equation will always be equal 1 correct? Although, the probability of throwing D1 is larger when the player aims for D1 than for DB, the product of the probabilities and the terminal condition is always 0. So, it doesn't matter where you aim on the dartboard, but this feels incorrect.

Please advice!

No, because at each stage you are *adding* 1 to the summation term, and this carries backwards; after going backwards $N$ stages, you would have added 1 $N$ times. — jbowman, Sep 09 '21 at 16:59
The averaging over $p(z;u)$ converts the overall calculation to be, in effect, 1 + the expected number of stages remaining until termination. — jbowman, Sep 09 '21 at 19:00
Sorry, I don't understand your point. In the first stage, where S=2, where should the player aim at? Could be anywhere right? Because every `p(2;u)` is multiplied by `V(0)=0`. Or am I seeing this wrong? — HJA24, Sep 10 '21 at 08:19

Bellman equation / dynamic programming for darts

0 Answers0