Questions tagged [value-iteration]
6 questions
8
votes
0 answers
Why are value and policy iteration dynamic programming algorithms?
Algorithms like policy iteration and value iteration are often classified as dynamic programming methods that try to solve the Bellman optimality equations.
My current understanding of dynamic programming is this:
It is a method applied to…
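The dynamic-programming character the question asks about is visible in a minimal sketch (toy two-state MDP of my own making, names hypothetical): each sweep solves every state's Bellman backup by reusing the previous sweep's value estimates as already-solved subproblems.

```python
# Value iteration on a tiny hypothetical MDP. The DP structure: each sweep
# computes V_{k+1}(s) = max_a sum_{s'} P[s][a][s'] * (R[s][a][s'] + gamma * V_k(s')),
# i.e. state s is solved using the previously computed subproblem values V_k.

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[s][a][s2] are transition probabilities, R[s][a][s2] rewards."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    while True:
        V_new = [
            max(
                sum(P[s][a][s2] * (R[s][a][s2] + gamma * V[s2])
                    for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new

# Two states, two actions: from state 0, action 1 moves to absorbing state 1
# and pays reward 1; everything else pays 0.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves
     [[0.0, 1.0], [0.0, 1.0]]]   # state 1: absorbing under both actions
R = [[[0.0, 0.0], [0.0, 1.0]],
     [[0.0, 0.0], [0.0, 0.0]]]
V = value_iteration(P, R)        # converges to [1.0, 0.0]
```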

Karthik Thiagarajan
- 525
- 5
- 11
1
vote
1 answer
Q-value Iteration Convergence in Reinforcement Learning
I just started learning value iteration in reinforcement learning, and I am confused by the theorem stating that the number of iterations needed to guarantee an error of at most $\epsilon$ grows with $\lambda$ as:
$$N = \frac{\log…
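The bound in the excerpt is cut off, but the mechanism behind any such bound is that the Bellman backup is a $\gamma$-contraction in the sup norm, so the distance to the fixed point shrinks by a factor $\gamma$ per iteration. A quick numerical check on a one-state MDP of my own choosing (not the asker's setup):

```python
# One state, a self-loop, reward 1, discount gamma: the backup is
# V <- 1 + gamma * V, with fixed point V* = 1 / (1 - gamma). The error
# |V_k - V*| shrinks by exactly gamma each iteration, which is why the
# number of iterations to reach error eps scales like log(eps)/log(gamma).

gamma = 0.9
v_star = 1.0 / (1.0 - gamma)       # closed-form fixed point, V* = 10
v = 0.0
errors = [abs(v - v_star)]
for _ in range(50):
    v = 1.0 + gamma * v            # one Bellman backup
    errors.append(abs(v - v_star))

# Consecutive error ratios are all gamma, confirming geometric convergence.
ratios = [errors[k + 1] / errors[k] for k in range(10)]
```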

Williamwyn
- 13
- 2
1
vote
1 answer
Add maximum time step to value iteration algorithm
What would a value iteration algorithm look like if I specify a maximum time step?
For example, from a given state the environment never reaches a terminating state, but the episode should still terminate because it has exceeded the maximum number of steps…
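One common reading of this question: capping the episode at $H$ steps turns the problem into finite-horizon value iteration, where "iterate to a fixed point" is replaced by exactly $H$ backward-induction sweeps starting from zero value at the horizon. A sketch under that assumption (toy MDP and function names mine):

```python
# Finite-horizon value iteration: V starts at 0 ("no steps remaining") and
# after the h-th sweep V[s] is the optimal value of s with h steps left.

def finite_horizon_vi(P, R, H, gamma=1.0):
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states                      # value at the horizon is 0
    for _ in range(H):                        # exactly H backup sweeps
        V = [
            max(
                sum(P[s][a][s2] * (R[s][a][s2] + gamma * V[s2])
                    for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return V

# One absorbing state with a self-loop paying reward 1 per step:
# with H = 5 steps allowed and no discounting, the value is exactly 5.
P = [[[1.0]]]
R = [[[1.0]]]
V5 = finite_horizon_vi(P, R, H=5)
```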
1
vote
1 answer
Q-learning shows worse results than value iteration
I'm trying to solve the same problem with different algorithms (travel the maximum possible distance with a car). With value iteration and policy iteration I was able to get the best possible results, but Q-learning doesn't seem to go well.
My…
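A common source of the gap the asker describes: value iteration uses the exact model, while Q-learning only matches it when exploration and the learning-rate schedule are adequate. On a deterministic toy chain (my own example, not the car problem), a learning rate of 1 makes tabular Q-learning coincide with asynchronous value iteration and recover the exact optimal Q-values; stochastic environments additionally need a decaying learning rate, and skimping on either is a frequent cause of worse results.

```python
import random

# Tabular Q-learning on a deterministic two-state chain. With alpha = 1 each
# update Q[s][a] <- r + gamma * max Q[s2] is one asynchronous value-iteration
# backup, so with enough random exploration Q converges to the exact optimum.

random.seed(0)
gamma = 0.9

def step(s, a):
    """Action 0 stays in s, action 1 jumps to the other state.
    Reward 1 for landing in state 1."""
    s2 = s if a == 0 else 1 - s
    return s2, (1.0 if s2 == 1 else 0.0)

Q = [[0.0, 0.0], [0.0, 0.0]]
s = 0
for _ in range(20000):
    a = random.randrange(2)                              # uniform exploration
    s2, r = step(s, a)
    Q[s][a] += 1.0 * (r + gamma * max(Q[s2]) - Q[s][a])  # alpha = 1
    s = s2

# The optimal Q-values solve the Bellman equations for this chain:
# Q* = [[9, 10], [10, 9]]  (e.g. Q*(1,0) = 1 + 0.9 * 10 = 10).
```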

Most Wanted
- 255
- 1
- 13
0
votes
0 answers
Bellman equation / dynamic programming for darts
When you play darts, you can throw at 62 regions $z$ on the dartboard: the single regions S1, ..., S20, the double regions D1, ..., D20, the treble regions T1, ..., T20, and the single and double bullseye, SB and DB.
Every region has a…

HJA24
- 11
- 4
0
votes
1 answer
Small difference of q-function between different actions for the same state
I am trying out reinforcement learning using Q-learning. The data come from some made-up equations, so I have an infinite amount of data.
One thing that troubles me: after I learn the Q-function, I use
$$\arg\max_a Q(s, a)$$
to pick the action for a state…

DiveIntoML
- 1,583
- 1
- 11
- 21