How to prove that stochastic policy iteration converges?

Asked May 22 '21 at 09:27

I was reading Sutton and Barto's book Reinforcement Learning: An Introduction, in particular the part on policy iteration.

It gives a proof of convergence for policy iteration with deterministic policies.

Out of curiosity, I tried to find a proof for the case of stochastic policies.

But I couldn't find any clear explanation that deals with it.

Can someone give a clear proof of convergence for policy iteration with stochastic policies?
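
To make the question concrete, here is the statement I think is needed. This is my own sketch, not something quoted from the book: I am assuming a finite MDP with discount factor $\gamma < 1$, and the notation $\pi(a \mid s)$, $v_\pi$, $q_\pi$ is meant to follow the book's conventions.

```latex
% Deterministic case (policy improvement theorem):
%   if q_\pi(s, \pi'(s)) \ge v_\pi(s) for all s,
%   then v_{\pi'}(s) \ge v_\pi(s) for all s.
%
% Presumed stochastic analogue (assumption: average q_\pi over the actions of \pi'):
\sum_{a} \pi'(a \mid s)\, q_\pi(s, a) \;\ge\; v_\pi(s) \quad \text{for all } s
\quad \Longrightarrow \quad
v_{\pi'}(s) \;\ge\; v_\pi(s) \quad \text{for all } s,
% where the action value is
q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, v_\pi(s') \,\bigr].
```

What I don't see is how the termination part of the argument carries over, since there are infinitely many stochastic policies rather than finitely many deterministic ones.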

nawab

0 Answers