Questions tagged [dynamic-programming]
13 questions
8 votes · 0 answers
Why are value iteration and policy iteration considered dynamic programming algorithms?
Algorithms like policy iteration and value iteration are often classified as dynamic programming methods that try to solve the Bellman optimality equations.
My current understanding of dynamic programming is this:
It is a method applied to…

Karthik Thiagarajan · 525
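For context on the question above, a sketch in my own words (using the $p(s', r \mid s, a)$ and discount-factor $\gamma$ notation of Sutton and Barto, which the excerpt does not spell out): the Bellman optimality equation defines each state's value in terms of the values of its successor states, and value iteration solves that recursion bottom-up, reusing the previous sweep's estimates $V_k$ rather than recomputing them, which is the usual reason it is filed under dynamic programming:

$$V_*(s) = \max_{a} \sum_{s',\,r} p(s', r \mid s, a)\,\bigl[\, r + \gamma V_*(s') \,\bigr], \qquad V_{k+1}(s) = \max_{a} \sum_{s',\,r} p(s', r \mid s, a)\,\bigl[\, r + \gamma V_k(s') \,\bigr].$$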
2 votes · 1 answer
Value of the absorbing state in an MDP and the greedy policy: why choose to go to the absorbing state if its value is 0?
I was working through an example of a Markov decision process (MDP) and obtained the optimal value function with the value iteration algorithm described in Sutton and Barto.
In this algorithm I chose to initialise the value function to zero for all states.…

Peter Keller · 123
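As an illustration of the point raised above, a minimal value-iteration sketch on a hypothetical 3-state chain of my own (not the poster's MDP): the absorbing state keeps value 0, yet the greedy policy still moves towards it, because every extra step costs reward.

```python
# Toy 3-state chain: states 0 and 1 are non-terminal, state 2 is absorbing.
# Each move costs -1, so heading for the absorbing state is optimal even
# though its value stays 0: stopping avoids accumulating more negative reward.
GAMMA = 1.0
STATES = [0, 1, 2]                       # 2 is the absorbing/terminal state
ACTIONS = {"left": -1, "right": +1}

def step(s, a):
    """Deterministic transition; reward -1 per move, 0 once absorbed."""
    if s == 2:
        return s, 0.0                    # absorbing: stays put, no more reward
    return min(max(s + ACTIONS[a], 0), 2), -1.0

V = {s: 0.0 for s in STATES}             # initialise all state values to zero
for _ in range(100):                     # value-iteration sweeps
    for s in [0, 1]:                     # the absorbing state's value stays 0
        V[s] = max(r + GAMMA * V[s2]
                   for s2, r in (step(s, a) for a in ACTIONS))

greedy = {s: max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
          for s in [0, 1]}
print(V)       # {0: -2.0, 1: -1.0, 2: 0.0}
print(greedy)  # {0: 'right', 1: 'right'}
```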
2 votes · 1 answer
Dyna-Q Algorithm Reinforcement Learning
In step (f) of the Dyna-Q algorithm, we plan by taking random samples from the experience/model for some number of steps.
Wouldn't it be more efficient to construct an MDP from experience by computing the state transition probabilities and reward…

gnikol · 657
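For reference, a sketch of the tabular Dyna-Q update and planning loop as described in Sutton and Barto (the function and constant names here are my own placeholders, not from the question):

```python
import random
from collections import defaultdict

ALPHA, GAMMA, N_PLANNING = 0.1, 0.95, 20   # assumed hyperparameters
ACTIONS = [0, 1, 2, 3]                     # placeholder action set

Q = defaultdict(float)     # Q[(state, action)]
model = {}                 # model[(state, action)] = (reward, next_state)

def q_update(s, a, r, s2):
    """One-step Q-learning update, used for both real and simulated experience."""
    best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def dyna_q_step(s, a, r, s2):
    q_update(s, a, r, s2)         # (d) direct RL update from the real transition
    model[(s, a)] = (r, s2)       # (e) update the (deterministic) model
    for _ in range(N_PLANNING):   # (f) planning: replay random stored transitions
        ps, pa = random.choice(list(model.keys()))
        pr, ps2 = model[(ps, pa)]
        q_update(ps, pa, pr, ps2)
```

One reading of the question's alternative: estimating transition probabilities and expected rewards from counts gives a certainty-equivalent model on which full expected backups are possible; Dyna-Q's random sample backups are simply the cheaper tabular shortcut.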
2 votes · 0 answers
Understanding Approximate Dynamic Programming
I am trying to write a paper for my optimization class about Approximate Dynamic Programming. I found a few good papers, but they all seem to dive straight into the material without covering the basics, so I am really lost about what it is.
In…

optimuscontrol · 21
1 vote · 1 answer
Add maximum time step to value iteration algorithm
What would a value iteration algorithm look like if I specify a maximum time step?
For example, from a given state the environment never reaches a terminal state, but should instead terminate because it has exceeded the maximum number of steps…
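One possible formulation (a sketch under my own assumptions, not an established variant named in the question): run exactly $H$ backward-induction backups starting from $V_H \equiv 0$, which is equivalent to augmenting the state with the time step and treating every state at $t = H$ as terminal.

```python
# Finite-horizon value iteration (backward induction).  `transitions(s, a)` is
# a hypothetical model returning a list of (prob, next_state, reward) triples.
# V[t][s] is the best value achievable with H - t steps remaining, so the
# recursion is forced to stop after the maximum number of steps H.
def finite_horizon_value_iteration(states, actions, transitions, gamma, H):
    V = [{s: 0.0 for s in states} for _ in range(H + 1)]   # V[H] = 0: time is up
    for t in range(H - 1, -1, -1):
        for s in states:
            V[t][s] = max(
                sum(prob * (r + gamma * V[t + 1][s2])
                    for prob, s2, r in transitions(s, a))
                for a in actions
            )
    return V
```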
1 vote · 0 answers
Bellman Optimality Operator fixed point
I'm reading Szepesvári's book on RL. My question concerns the proof of Theorem A.10 (p. 71).
Theorem
Let $V$ be the fixed point of $T^*$ and assume that there is a policy $\pi$ which is greedy w.r.t. $V$: $T^\pi V = T^* V$. Then $V = V^*$ and $\pi$ is an…

Nick Halden · 23
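For context, the standard argument (my paraphrase, not a quote from the book's proof): since $T^\pi V = T^* V = V$, $V$ is a fixed point of $T^\pi$, and because $T^\pi$ is a contraction with unique fixed point $V^\pi$, we get $V = V^\pi$. For any other policy $\pi'$, $T^{\pi'} V \le T^* V = V$, and by monotonicity of $T^{\pi'}$, $V^{\pi'} \le V$. Hence

$$V^* = \sup_{\pi'} V^{\pi'} \;\le\; V \;=\; V^\pi \;\le\; V^*,$$

so $V = V^*$ and $\pi$ is an optimal policy.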
1 vote · 2 answers
Choosing a changepoint detection algorithm
I've been reading up on changepoint algorithms (dynamic programming, Bayesian Online Changepoint Detection, hidden Markov models, etc.) and am looking to implement an algorithm that has a certain set of attributes, ideally in Python. I don't have a…

SuperCodeBrah · 133
1 vote · 1 answer
Efficiency of Viterbi Algorithm
Suppose we have a sequence of observed states $\{y_1, y_2, \dots, y_T\}$ of length $T$.
We want to generate a path $\{x_1, x_2, \dots, x_T\}$, which is a sequence of states $x_n \in S = \{s_1, s_2, \dots, s_K\}$, that is, each $x_i$ can…

Gustavo Rangel · 21
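For reference, a minimal Viterbi sketch (my own illustration, reusing the question's notation): with $K$ hidden states and $T$ observations, the two nested maximisations over states give the $O(TK^2)$ running time.

```python
import numpy as np

def viterbi(y, A, B, pi):
    """y: observation indices; A[i, j] = P(x_{t+1}=j | x_t=i);
    B[i, o] = P(o | x=i); pi: initial state distribution."""
    K, T = A.shape[0], len(y)
    delta = np.zeros((T, K))            # best log-prob of a path ending in state k at time t
    psi = np.zeros((T, K), dtype=int)   # back-pointers (argmax predecessors)
    delta[0] = np.log(pi) + np.log(B[:, y[0]])
    for t in range(1, T):
        for k in range(K):              # O(K^2) work per time step
            scores = delta[t - 1] + np.log(A[:, k])
            psi[t, k] = np.argmax(scores)
            delta[t, k] = scores[psi[t, k]] + np.log(B[k, y[t]])
    x = np.zeros(T, dtype=int)          # backtrack the most probable path
    x[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        x[t] = psi[t + 1, x[t + 1]]
    return x
```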
0 votes · 0 answers
Circular reference in states of the Bellman equation
I want to formulate a game of darts as a dynamic program. The goal is to minimize the number of darts thrown while reaching checkout. A dart player has a score $s$. If his score is $s = 2$, there is only a single legal/possible move.
He can check out…

HJA24 · 11
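For context, one generic way to write the recursion (my notation and simplification, not the poster's formulation): let $V(s)$ be the minimum expected number of darts needed to check out from score $s$, with $V(0) = 0$, and let $p(s' \mid s, z)$ be the probability of moving from score $s$ to score $s'$ when aiming at region $z$. Then

$$V(s) \;=\; \min_{z} \Bigl(\, 1 + \sum_{s'} p(s' \mid s, z)\, V(s') \,\Bigr).$$

The circular reference arises because $s' = s$ is possible (a miss or a bust can leave the score unchanged), so $V(s)$ appears on both sides; the equation is a fixed point to be solved, e.g. by value iteration, or by solving the resulting linear system for a fixed aiming policy, rather than a one-way recursion.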
0 votes · 0 answers
Bellman equation / dynamic programming for darts
When you play darts, you can throw at 62 regions $z$ on the dartboard: namely, the single regions S1, ..., S20, the double regions D1, ..., D20, the treble regions T1, ..., T20, and the single and double bullseye, SB and DB.
Every region has a…

HJA24 · 11
0 votes · 0 answers
How does AlphaZero guarantee consistent improvement?
I know the details of AlphaZero, and in particular I know that it improves through a "policy iteration" mechanism.
I found an answer that proves it eventually converges to the optimum.
But...
Is this still correct when policy iteration is used with a neural network like…

Junwei Dong · 51
0 votes · 0 answers
Bellman equations and reinforcement learning
We need to define just a few things first:
- $(a_t, s_t) \in \mathcal{A} \times \mathcal{S}$ are the action and state at time $t$;
- $r(s_t, a_t)$ is the reward for taking action $a_t$ in state $s_t$.
Regarding timing, our agent observes $s_t$, then…

Stéphane · 213
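For reference, with the notation defined above, plus a discount factor $\gamma \in [0, 1)$ and a transition kernel $p(s_{t+1} \mid s_t, a_t)$ that the excerpt does not define (so they are my additions), the Bellman expectation equation this setup typically leads to is

$$V^\pi(s_t) \;=\; \mathbb{E}_{a_t \sim \pi(\cdot \mid s_t)}\Bigl[\, r(s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1} \sim p(\cdot \mid s_t, a_t)}\bigl[ V^\pi(s_{t+1}) \bigr] \Bigr].$$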
0 votes · 1 answer
Unique game problem (ML, DP, PP etc)
I am looking for a solution to the game problem below. I believe it to be some sort of dynamic programming, machine learning, or probabilistic programming challenge, but am unsure... This is my original problem, and it is part of an initiative to create…

rambossa · 121