Questions tagged [policy-iteration]
8 questions
12 votes · 2 answers
Why does the policy iteration algorithm converge to the optimal policy and value function?
I was reading Andrew Ng's lecture notes on reinforcement learning, and I was trying to understand why policy iteration converges to the optimal value function $V^*$ and the optimal policy $\pi^*$.
Recall policy iteration is:
$\text{Initialize } \pi$ …

Charlie Parker · 5,836
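The convergence the question asks about is easiest to see by running the algorithm on a small example. Below is a minimal sketch of tabular policy iteration — exact policy evaluation via a linear solve, then greedy improvement — on an invented two-state MDP (the MDP and function names are illustrative, not from Ng's notes):

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Tabular policy iteration on a finite MDP.

    P[s][a] -> list of (prob, next_state) transitions
    R[s][a] -> expected immediate reward
    """
    n_states = len(P)
    pi = [0] * n_states                      # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        A = np.eye(n_states)
        b = np.zeros(n_states)
        for s in range(n_states):
            b[s] = R[s][pi[s]]
            for p, s2 in P[s][pi[s]]:
                A[s, s2] -= gamma * p
        V = np.linalg.solve(A, b)
        # Policy improvement: act greedily w.r.t. V.
        stable = True
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(len(P[s]))]
            best = int(np.argmax(q))
            if best != pi[s]:
                pi[s] = best
                stable = False
        if stable:                           # greedy policy unchanged => optimal
            return pi, V

# Invented toy MDP: s0 can stay (reward 0) or move to s1 (reward 1);
# s1 self-loops with reward 2.
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
R = [[0.0, 1.0], [2.0]]
pi, V = policy_iteration(P, R)   # pi == [1, 0]; V ≈ [19, 20]
```

Each improvement step produces a policy at least as good as the last (the policy improvement theorem), and with finitely many deterministic policies the loop must terminate at a fixed point of the Bellman optimality equation.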
8 votes · 0 answers
Why are value iteration and policy iteration dynamic programming algorithms?
Algorithms like policy iteration and value iteration are often classified as dynamic programming methods that try to solve the Bellman optimality equations.
My current understanding of dynamic programming is this:
It is a method applied to…

Karthik Thiagarajan · 525
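The dynamic-programming flavor the question asks about is visible in value iteration: each sweep of Bellman optimality backups reuses the previous sweep's values, exactly the reuse-of-subproblem-solutions pattern of DP. A minimal sketch on an invented two-state MDP (same conventions as a tabular textbook setup; names are mine):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Value iteration: repeated Bellman optimality backups.

    Each sweep bootstraps on the previous sweep's value estimates --
    the 'overlapping subproblems' reuse that makes this DP.
    """
    n = len(P)
    V = np.zeros(n)
    while True:
        V_new = np.array([
            max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(len(P[s])))
            for s in range(n)
        ])
        if np.max(np.abs(V_new - V)) < tol:   # geometric convergence
            return V_new
        V = V_new

# Invented toy MDP: s0 can stay (reward 0) or move to s1 (reward 1);
# s1 self-loops with reward 2.
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
R = [[0.0, 1.0], [2.0]]
V = value_iteration(P, R)   # V ≈ [19, 20]
```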
4 votes · 0 answers
Convergence Proof of First Visit Monte Carlo Control
I am currently trying to find a formal proof of convergence for the Monte Carlo reinforcement learning methods described in Sutton and Barto's book "Reinforcement Learning: An Introduction", Section 5.
They explain that along the ideas of generalized…

GreenLogic · 193
2 votes · 1 answer
Why is $\gamma^t$ needed in REINFORCE: Monte-Carlo Policy-Gradient Control (episodic) for $\pi_{*}$?
While rereading the policy-gradient (PG) chapter in Prof. Sutton's RL book, I noticed the $\gamma^t$ factor in the last line of the pseudocode (as shown below). The book says
The second difference between the pseudocode update and the REINFORCE update
equation (13.8) is that…

GoingMyWay · 1,111
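For reference, that last pseudocode line can be written out directly. Here is a hedged sketch of the episodic REINFORCE update for a tabular softmax policy — the state/episode encoding and names are my own, not the book's; the point is the $\gamma^t$ factor, which weights the update at time $t$ as the discounted objective requires:

```python
import numpy as np

def softmax(prefs):
    e = np.exp(prefs - prefs.max())
    return e / e.sum()

def reinforce_update(theta, episode, alpha=0.1, gamma=0.9):
    """One episode of the Sutton & Barto REINFORCE update (episodic).

    episode: list of (S_t, A_t, R_{t+1}) tuples for t = 0..T-1.
    theta[s]: softmax action preferences for state s.
    """
    T = len(episode)
    for t, (s, a, _) in enumerate(episode):
        # G_t = sum_{k=t+1}^{T} gamma^{k-t-1} R_k (reward at index k is R_{k+1})
        G = sum(gamma ** (k - t) * episode[k][2] for k in range(t, T))
        pi = softmax(theta[s])
        grad_log = -pi                 # d/d theta_b of ln pi(a|s) = 1{a=b} - pi_b
        grad_log[a] += 1.0
        # The gamma**t factor is the term the question asks about.
        theta[s] += alpha * (gamma ** t) * G * grad_log
    return theta

theta = {0: np.zeros(2)}
reinforce_update(theta, [(0, 1, 1.0)])   # theta[0] becomes [-0.05, 0.05]
```

Without the $\gamma^t$, the update would correspond to an undiscounted (average-over-states) objective rather than the discounted return from the start state.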
2 votes · 1 answer
Policy improvement in SARSA and Q learning
I have a rather trivial question about SARSA and Q-learning. Looking at the pseudocode of the two algorithms in the Sutton & Barto book, I see that the policy improvement step is missing.
How will I get the optimal policy by the two algorithms? Are they used to find…

Jor_El · 391
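One way to see the answer: in SARSA and Q-learning the improvement step is folded into action selection — acting ($\epsilon$-)greedily with respect to the current Q *is* the improvement — and the final policy is read off greedily from Q after learning. A minimal sketch (function names are mine, not from the book):

```python
import numpy as np

def epsilon_greedy(Q, s, n_actions, eps, rng):
    """Behaviour policy used inside SARSA / Q-learning.

    Selecting (epsilon-)greedy actions w.r.t. the current Q is the
    implicit policy improvement step, so no separate improvement
    line appears in the pseudocode.
    """
    if rng.random() < eps:
        return int(rng.integers(n_actions))   # explore
    return int(np.argmax(Q[s]))               # exploit = improve

def greedy_policy(Q):
    """Deterministic policy extracted from Q after learning."""
    return {s: int(np.argmax(q)) for s, q in Q.items()}

Q = {0: np.array([0.1, 0.5]), 1: np.array([1.0, -1.0])}
pi = greedy_policy(Q)   # {0: 1, 1: 0}
```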
1 vote · 1 answer
A small confusion about $\epsilon$-greedy policy improvement based on Monte Carlo
I'm working through the RL book by Sutton and Barto. The authors provide a proof based on the policy improvement theorem. I can fully understand the inequality, but the first equality really confuses me: why does $q_{\pi}(s,\pi'(s)) =$ …

FantasticAI · 417
1 vote · 0 answers
How can I increase the total number of iterations it takes policy iteration to converge on an MDP?
I was reading about policy iteration. What factors influence the total number of iterations the algorithm takes to converge?
For a given MDP that converges in 3 iterations, which setting would need to change so that the…

Amanda · 111
1 vote · 1 answer
Q-learning shows worse results than value iteration
I'm trying to solve the same problem (travel the maximum possible distance with a car) with different algorithms. With value iteration and policy iteration I was able to get the best possible results, but Q-learning doesn't seem to go as well.
My…

Most Wanted · 255
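A common cause of this gap is simply insufficient exploration or too few updates: value iteration uses the full model, while Q-learning only sees sampled transitions. On a small deterministic MDP, tabular Q-learning with enough exploratory updates does reach the value-iteration solution. A toy sketch (the two-state MDP is invented, not the questioner's car problem):

```python
import numpy as np

# Invented deterministic two-state MDP: step(s, a) -> (reward, next_state).
# s0 can stay (reward 0) or move to s1 (reward 1); s1 self-loops with reward 2.
def step(s, a):
    if s == 0:
        return (0.0, 0) if a == 0 else (1.0, 1)
    return (2.0, 1)            # state 1: single self-loop action

def q_learning(gamma=0.9, alpha=0.5, n_updates=5000, seed=0):
    rng = np.random.default_rng(seed)
    n_actions = [2, 1]
    Q = [np.zeros(n) for n in n_actions]
    for _ in range(n_updates):
        s = int(rng.integers(2))              # uniform exploring updates
        a = int(rng.integers(n_actions[s]))
        r, s2 = step(s, a)
        # Off-policy TD target: bootstrap on the max over next actions.
        Q[s][a] += alpha * (r + gamma * Q[s2].max() - Q[s][a])
    return Q

Q = q_learning()
# With gamma = 0.9 the optimal values are Q*(0,0) = 17.1, Q*(0,1) = 19,
# Q*(1,0) = 20 -- the same values value iteration produces.
```

With too few updates, too little exploration, or a fixed learning rate in a stochastic environment, the estimates would stop well short of these targets, which is the usual reason Q-learning appears to underperform value iteration.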