Questions tagged [value-iteration]
6 questions
8
votes
0 answers
Why are value and policy iteration dynamic programming algorithms?
Algorithms like policy iteration and value iteration are often classified as dynamic programming methods that try to solve the Bellman optimality equations.
My current understanding of dynamic programming is this:
It is a method applied to…
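The dynamic-programming character the question asks about is visible in a minimal sketch (toy two-state MDP of my own making, names hypothetical): each sweep solves every state's Bellman backup by reusing the previous sweep's value estimates as already-solved subproblems.

```python
# Value iteration on a tiny hypothetical MDP. The DP structure: each sweep
# computes V_{k+1}(s) = max_a sum_{s'} P[s][a][s'] * (R[s][a][s'] + gamma * V_k(s')),
# i.e. state s is solved using the previously computed subproblem values V_k.

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[s][a][s2] are transition probabilities, R[s][a][s2] rewards."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    while True:
        V_new = [
            max(
                sum(P[s][a][s2] * (R[s][a][s2] + gamma * V[s2])
                    for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new

# Two states, two actions: from state 0, action 1 moves to absorbing state 1
# and pays reward 1; everything else pays 0.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves
     [[0.0, 1.0], [0.0, 1.0]]]   # state 1: absorbing under both actions
R = [[[0.0, 0.0], [0.0, 1.0]],
     [[0.0, 0.0], [0.0, 0.0]]]
V = value_iteration(P, R)        # converges to [1.0, 0.0]
```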

Karthik Thiagarajan
- 525
- 5
- 11
1
vote
1 answer
Q-value Iteration Convergence in Reinforcement Learning
I just started learning value iteration in reinforcement learning, and I am confused by the theorem stating that the number of iterations needed to guarantee an error of at most $\epsilon$ grows with $\lambda$ as:
$$N = \frac{\log…
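The bound in the excerpt is cut off, but the mechanism behind any such bound is that the Bellman backup is a $\gamma$-contraction in the sup norm, so the distance to the fixed point shrinks by a factor $\gamma$ per iteration. A quick numerical check on a one-state MDP of my own choosing (not the asker's setup):

```python
# One state, a self-loop, reward 1, discount gamma: the backup is
# V <- 1 + gamma * V, with fixed point V* = 1 / (1 - gamma). The error
# |V_k - V*| shrinks by exactly gamma each iteration, which is why the
# number of iterations to reach error eps scales like log(eps)/log(gamma).

gamma = 0.9
v_star = 1.0 / (1.0 - gamma)       # closed-form fixed point, V* = 10
v = 0.0
errors = [abs(v - v_star)]
for _ in range(50):
    v = 1.0 + gamma * v            # one Bellman backup
    errors.append(abs(v - v_star))

# Consecutive error ratios are all gamma, confirming geometric convergence.
ratios = [errors[k + 1] / errors[k] for k in range(10)]
```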

Williamwyn
- 13
- 2
1
vote
1 answer
Add maximum time step to value iteration algorithm
What would a value iteration algorithm look like if I specify a maximum time step?
For example, from a given state the environment never reaches a terminating state, but the episode should still terminate because it has exceeded the maximum number of steps…
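One common reading of this question: capping the episode at $H$ steps turns the problem into finite-horizon value iteration, where "iterate to a fixed point" is replaced by exactly $H$ backward-induction sweeps starting from zero value at the horizon. A sketch under that assumption (toy MDP and function names mine):

```python
# Finite-horizon value iteration: V starts at 0 ("no steps remaining") and
# after the h-th sweep V[s] is the optimal value of s with h steps left.

def finite_horizon_vi(P, R, H, gamma=1.0):
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states                      # value at the horizon is 0
    for _ in range(H):                        # exactly H backup sweeps
        V = [
            max(
                sum(P[s][a][s2] * (R[s][a][s2] + gamma * V[s2])
                    for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return V

# One absorbing state with a self-loop paying reward 1 per step:
# with H = 5 steps allowed and no discounting, the value is exactly 5.
P = [[[1.0]]]
R = [[[1.0]]]
V5 = finite_horizon_vi(P, R, H=5)
```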
1
vote
1 answer
Q-learning shows worse results than value iteration
I'm trying to solve the same problem with different algorithms (travel the maximum possible distance with a car). With value iteration and policy iteration I was able to get the best possible results, but Q-learning doesn't seem to go well.
My…
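A common source of the gap the asker describes: value iteration uses the exact model, while Q-learning only matches it when exploration and the learning-rate schedule are adequate. On a deterministic toy chain (my own example, not the car problem), a learning rate of 1 makes tabular Q-learning coincide with asynchronous value iteration and recover the exact optimal Q-values; stochastic environments additionally need a decaying learning rate, and skimping on either is a frequent cause of worse results.

```python
import random

# Tabular Q-learning on a deterministic two-state chain. With alpha = 1 each
# update Q[s][a] <- r + gamma * max Q[s2] is one asynchronous value-iteration
# backup, so with enough random exploration Q converges to the exact optimum.

random.seed(0)
gamma = 0.9

def step(s, a):
    """Action 0 stays in s, action 1 jumps to the other state.
    Reward 1 for landing in state 1."""
    s2 = s if a == 0 else 1 - s
    return s2, (1.0 if s2 == 1 else 0.0)

Q = [[0.0, 0.0], [0.0, 0.0]]
s = 0
for _ in range(20000):
    a = random.randrange(2)                              # uniform exploration
    s2, r = step(s, a)
    Q[s][a] += 1.0 * (r + gamma * max(Q[s2]) - Q[s][a])  # alpha = 1
    s = s2

# The optimal Q-values solve the Bellman equations for this chain:
# Q* = [[9, 10], [10, 9]]  (e.g. Q*(1,0) = 1 + 0.9 * 10 = 10).
```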

Most Wanted
- 255
- 1
- 13
0
votes
0 answers
Bellman equation / dynamic programming for darts
When you play darts, you can throw at 62 regions $z$ on the dartboard: the single regions S1, ..., S20, the double regions D1, ..., D20, the treble regions T1, ..., T20, and the single and double bullseye, SB and DB.
Every region has a…

HJA24
- 11
- 4
0
votes
1 answer
Small difference of q-function between different actions for the same state
I am trying out reinforcement learning using Q-learning. The data come from some made-up equations, so I have an infinite amount of data.
One thing that troubles me: after I learn the Q-function, I use
$$\arg\max_a Q(s, a)$$
to pick the action for a state…

DiveIntoML
- 1,583
- 1
- 11
- 21