Questions tagged [dynamic-programming]
13 questions
8 votes · 0 answers
Why are value iteration and policy iteration considered dynamic programming algorithms?
Algorithms like policy iteration and value iteration are often classified as dynamic programming methods that try to solve the Bellman optimality equations.
My current understanding of dynamic programming is this:
It is a method applied to…

Karthik Thiagarajan · 525
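For context on the question above, a sketch in my own words (using the $p(s', r \mid s, a)$ and discount-factor $\gamma$ notation of Sutton and Barto, which the excerpt does not spell out): the Bellman optimality equation defines each state's value in terms of the values of its successor states, and value iteration solves that recursion bottom-up, reusing the previous sweep's estimates $V_k$ rather than recomputing them, which is the usual reason it is filed under dynamic programming:

$$V_*(s) = \max_{a} \sum_{s',\,r} p(s', r \mid s, a)\,\bigl[\, r + \gamma V_*(s') \,\bigr], \qquad V_{k+1}(s) = \max_{a} \sum_{s',\,r} p(s', r \mid s, a)\,\bigl[\, r + \gamma V_k(s') \,\bigr].$$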
2 votes · 1 answer
Value of the absorbing state in an MDP and the greedy policy: why choose to go to the absorbing state if its value is 0?
I was working through an example of a Markov decision process (MDP) and obtained the optimal value function with the value iteration algorithm described in Sutton and Barto.
In this algorithm I chose to initialise the value function to zero for all states.…

Peter Keller · 123
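As an illustration of the point raised above, a minimal value-iteration sketch on a hypothetical 3-state chain of my own (not the poster's MDP): the absorbing state keeps value 0, yet the greedy policy still moves towards it, because every extra step costs reward.

```python
# Toy 3-state chain: states 0 and 1 are non-terminal, state 2 is absorbing.
# Each move costs -1, so heading for the absorbing state is optimal even
# though its value stays 0: stopping avoids accumulating more negative reward.
GAMMA = 1.0
STATES = [0, 1, 2]                       # 2 is the absorbing/terminal state
ACTIONS = {"left": -1, "right": +1}

def step(s, a):
    """Deterministic transition; reward -1 per move, 0 once absorbed."""
    if s == 2:
        return s, 0.0                    # absorbing: stays put, no more reward
    return min(max(s + ACTIONS[a], 0), 2), -1.0

V = {s: 0.0 for s in STATES}             # initialise all state values to zero
for _ in range(100):                     # value-iteration sweeps
    for s in [0, 1]:                     # the absorbing state's value stays 0
        V[s] = max(r + GAMMA * V[s2]
                   for s2, r in (step(s, a) for a in ACTIONS))

greedy = {s: max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
          for s in [0, 1]}
print(V)       # {0: -2.0, 1: -1.0, 2: 0.0}
print(greedy)  # {0: 'right', 1: 'right'}
```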
2 votes · 1 answer
Dyna-Q Algorithm Reinforcement Learning
In step (f) of the Dyna-Q algorithm, we plan by taking random samples from the experience/model for some number of steps.
Wouldn't it be more efficient to construct an MDP from experience by computing the state transition probabilities and reward…

gnikol · 657
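For reference, a sketch of the tabular Dyna-Q update and planning loop as described in Sutton and Barto (the function and constant names here are my own placeholders, not from the question):

```python
import random
from collections import defaultdict

ALPHA, GAMMA, N_PLANNING = 0.1, 0.95, 20   # assumed hyperparameters
ACTIONS = [0, 1, 2, 3]                     # placeholder action set

Q = defaultdict(float)     # Q[(state, action)]
model = {}                 # model[(state, action)] = (reward, next_state)

def q_update(s, a, r, s2):
    """One-step Q-learning update, used for both real and simulated experience."""
    best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def dyna_q_step(s, a, r, s2):
    q_update(s, a, r, s2)         # (d) direct RL update from the real transition
    model[(s, a)] = (r, s2)       # (e) update the (deterministic) model
    for _ in range(N_PLANNING):   # (f) planning: replay random stored transitions
        ps, pa = random.choice(list(model.keys()))
        pr, ps2 = model[(ps, pa)]
        q_update(ps, pa, pr, ps2)
```

One reading of the question's alternative: estimating transition probabilities and expected rewards from counts gives a certainty-equivalent model on which full expected backups are possible; Dyna-Q's random sample backups are simply the cheaper tabular shortcut.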
2 votes · 0 answers
Understanding Approximate Dynamic Programming
I am trying to write a paper for my optimization class about Approximate Dynamic Programming. I found a few good papers, but they all seem to dive straight into the material without covering the basics, so I am really lost about what it is.
In…

optimuscontrol · 21
1 vote · 1 answer
Add maximum time step to value iteration algorithm
What would a value iteration algorithm look like if I specify a maximum time step?
For example, from a given state the environment never reaches a terminal state, but should instead terminate because it has exceeded the maximum number of steps…
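One possible formulation (a sketch under my own assumptions, not an established variant named in the question): run exactly $H$ backward-induction backups starting from $V_H \equiv 0$, which is equivalent to augmenting the state with the time step and treating every state at $t = H$ as terminal.

```python
# Finite-horizon value iteration (backward induction).  `transitions(s, a)` is
# a hypothetical model returning a list of (prob, next_state, reward) triples.
# V[t][s] is the best value achievable with H - t steps remaining, so the
# recursion is forced to stop after the maximum number of steps H.
def finite_horizon_value_iteration(states, actions, transitions, gamma, H):
    V = [{s: 0.0 for s in states} for _ in range(H + 1)]   # V[H] = 0: time is up
    for t in range(H - 1, -1, -1):
        for s in states:
            V[t][s] = max(
                sum(prob * (r + gamma * V[t + 1][s2])
                    for prob, s2, r in transitions(s, a))
                for a in actions
            )
    return V
```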
1 vote · 0 answers
Bellman Optimality Operator fixed point
I'm reading Szepesvári's book on RL. My question concerns the proof of Theorem A.10 (p. 71).
Theorem
Let $V$ be the fixed point of $T^*$ and assume that there is a policy $\pi$ which is greedy w.r.t. $V$: $T^\pi V = T^* V$. Then $V = V^*$ and $\pi$ is an…

Nick Halden · 23
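For context, the standard argument (my paraphrase, not a quote from the book's proof): since $T^\pi V = T^* V = V$, $V$ is a fixed point of $T^\pi$, and because $T^\pi$ is a contraction with unique fixed point $V^\pi$, we get $V = V^\pi$. For any other policy $\pi'$, $T^{\pi'} V \le T^* V = V$, and by monotonicity of $T^{\pi'}$, $V^{\pi'} \le V$. Hence

$$V^* = \sup_{\pi'} V^{\pi'} \;\le\; V \;=\; V^\pi \;\le\; V^*,$$

so $V = V^*$ and $\pi$ is an optimal policy.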
1 vote · 2 answers
Choosing a changepoint detection algorithm
I've been reading up on changepoint algorithms (dynamic programming, Bayesian Online Changepoint Detection, hidden Markov models, etc.) and am looking to implement an algorithm that has a certain set of attributes, ideally in Python. I don't have a…

SuperCodeBrah · 133
1 vote · 1 answer
Efficiency of Viterbi Algorithm
Suppose we have a sequence of observed states $\{y_1, y_2, \dots, y_T\}$ of length $T$.
We want to generate a path $\{x_1, x_2, \dots, x_T\}$, which is a sequence of states $x_n \in S = \{s_1, s_2, \dots, s_K\}$, that is, each $x_i$ can…

Gustavo Rangel · 21
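For reference, a minimal Viterbi sketch (my own illustration, reusing the question's notation): with $K$ hidden states and $T$ observations, the two nested maximisations over states give the $O(TK^2)$ running time.

```python
import numpy as np

def viterbi(y, A, B, pi):
    """y: observation indices; A[i, j] = P(x_{t+1}=j | x_t=i);
    B[i, o] = P(o | x=i); pi: initial state distribution."""
    K, T = A.shape[0], len(y)
    delta = np.zeros((T, K))            # best log-prob of a path ending in state k at time t
    psi = np.zeros((T, K), dtype=int)   # back-pointers (argmax predecessors)
    delta[0] = np.log(pi) + np.log(B[:, y[0]])
    for t in range(1, T):
        for k in range(K):              # O(K^2) work per time step
            scores = delta[t - 1] + np.log(A[:, k])
            psi[t, k] = np.argmax(scores)
            delta[t, k] = scores[psi[t, k]] + np.log(B[k, y[t]])
    x = np.zeros(T, dtype=int)          # backtrack the most probable path
    x[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        x[t] = psi[t + 1, x[t + 1]]
    return x
```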
0 votes · 0 answers
Circular reference in states of the Bellman equation
I want to formulate a game of darts as a dynamic program. The goal is to minimize the number of darts thrown while reaching checkout. A dart player has a score $s$. If his score is $s = 2$, there is only a single legal/possible move.
He can check out…

HJA24 · 11
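For context, one generic way to write the recursion (my notation and simplification, not the poster's formulation): let $V(s)$ be the minimum expected number of darts needed to check out from score $s$, with $V(0) = 0$, and let $p(s' \mid s, z)$ be the probability of moving from score $s$ to score $s'$ when aiming at region $z$. Then

$$V(s) \;=\; \min_{z} \Bigl(\, 1 + \sum_{s'} p(s' \mid s, z)\, V(s') \,\Bigr).$$

The circular reference arises because $s' = s$ is possible (a miss or a bust can leave the score unchanged), so $V(s)$ appears on both sides; the equation is a fixed point to be solved, e.g. by value iteration, or by solving the resulting linear system for a fixed aiming policy, rather than a one-way recursion.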
0 votes · 0 answers
Bellman equation / dynamic programming for darts
When you play darts, you can throw at 62 regions $z$ on the dartboard: namely, the single regions S1, ..., S20, the double regions D1, ..., D20, the treble regions T1, ..., T20, and the single and double bullseye, SB and DB.
Every region has a…

HJA24 · 11
0 votes · 0 answers
How does AlphaZero guarantee consistent improvement?
I know the details of AlphaZero, and in particular I know that it improves through a "policy iteration" mechanism.
I found an answer that proves it eventually converges to the optimum.
But...
Is this still correct when policy iteration is used with a neural network like…

Junwei Dong · 51
0 votes · 0 answers
Bellman equations and reinforcement learning
We need to define just a few things first:
- $(a_t, s_t) \in \mathcal{A} \times \mathcal{S}$ are the action and state at time $t$;
- $r(s_t, a_t)$ is the reward for taking action $a_t$ in state $s_t$.
Regarding timing, our agent observes $s_t$, then…

Stéphane · 213
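For reference, with the notation defined above, plus a discount factor $\gamma \in [0, 1)$ and a transition kernel $p(s_{t+1} \mid s_t, a_t)$ that the excerpt does not define (so they are my additions), the Bellman expectation equation this setup typically leads to is

$$V^\pi(s_t) \;=\; \mathbb{E}_{a_t \sim \pi(\cdot \mid s_t)}\Bigl[\, r(s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1} \sim p(\cdot \mid s_t, a_t)}\bigl[ V^\pi(s_{t+1}) \bigr] \Bigr].$$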
0 votes · 1 answer
Unique game problem (ML, DP, PP etc)
I am looking for a solution to the game problem below. I believe it to be some sort of dynamic programming, machine learning, or probabilistic programming challenge, but am unsure... This is my original problem, and it is part of an initiative to create…

rambossa · 121