I was reading Sutton and Barto's book Reinforcement Learning: An Introduction, in particular the part on policy iteration.

It gives a proof of convergence of policy iteration for deterministic policies.

Out of curiosity, I tried to find the proof for the case of stochastic policies.

But I couldn't find any clear explanation that deals with it.

Can someone give a clear proof of convergence of policy iteration with stochastic policies?

nawab