
There is a question here from 2014 about the convergence of the policy iteration algorithm, with two answers. However, it is not clear to me how we can change the value function after a policy improvement step without this affecting convergence. The user in the linked question asks exactly the same thing in the last paragraph, but I find the answers unsatisfying. I have attached a screenshot of that paragraph as well.
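To make the loop I am asking about concrete, here is a minimal tabular policy iteration sketch (plain Python/NumPy; the MDP arrays, sizes, and names are illustrative assumptions of mine, not from the linked question):

```python
import numpy as np

# Illustrative tabular MDP (shapes and values are assumptions, not from
# the linked question): P[s, a, s'] is a transition probability and
# R[s, a] is an expected immediate reward.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)           # rows sum to 1
R = rng.random((n_states, n_actions))

policy = np.zeros(n_states, dtype=int)      # arbitrary initial policy

while True:
    # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
    P_pi = P[np.arange(n_states), policy]
    r_pi = R[np.arange(n_states), policy]
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

    # Policy improvement: act greedily with respect to v.
    q = R + gamma * P @ v                   # q[s, a] for every pair
    new_policy = q.argmax(axis=1)

    if np.array_equal(new_policy, policy):  # greedy policy stable: done
        break
    policy = new_policy                     # this is the "jump" I mean

print(policy, v)
```

My confusion is about the last lines of the loop: each switch to `new_policy` changes which value function the next evaluation step computes, yet the alternating pair is supposed to converge.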

Ali Ghghgh
  • How do the two questions differ? – Jesper for President Dec 09 '18 at 12:18
  • They are the same, but that question has more than one part, and I do not see any answer addressing its last paragraph. For example, one of the answers covers getting stuck in local optima. – Ali Ghghgh Dec 09 '18 at 16:11
  • If I need to explain anything further, please ask; I really want to know the answer. Jumping from one value function to another, and also from one policy to another, somehow produces convergence. The GPI figure shows this too, but how? – Ali Ghghgh Dec 18 '18 at 07:34
  • I think your question is very unclear. After reading it I still do not know what you are asking (and I actually spent over an hour trying to). I think you are asking for a proof of the monotonic improvement of the value functions under policy iteration. But as I see it, the question you refer to explicitly addresses this, and you do not say why you find the answers unsatisfying. If it is due to a lack of mathematical rigor, I suggest you consult the literature; there are articles written on the topic, and establishing monotone convergence is not trivial (a basic statement is sketched after these comments). – Jesper for President Dec 18 '18 at 13:29
  • You could look at the article "Convergence Properties of Policy Iteration" by Manuel S. Santos and John Rust; Lemmas 4.1 and 4.2 establish monotone convergence. But of course they are considering a specific class of Markov decision problems. – Jesper for President Dec 18 '18 at 13:36
  • So it really depends on the MDP, because there is another proof, in the book "Dynamic Programming and Markov Processes" by Howard, for completely ergodic systems over a finite horizon. And the answer at the mentioned link did not cover my problem! – Ali Ghghgh Dec 18 '18 at 18:57
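For reference, the monotone improvement the last few comments discuss is the standard policy improvement theorem (as stated in, e.g., Sutton and Barto; this sketch is not taken from the Santos–Rust paper):

```latex
\[
\pi'(s) \in \arg\max_a \sum_{s'} p(s' \mid s, a)\,\bigl[ r(s, a, s') + \gamma v_\pi(s') \bigr]
\;\;\Longrightarrow\;\;
v_{\pi'}(s) \ge v_\pi(s) \quad \text{for all } s .
\]
```

Since a finite MDP admits only finitely many deterministic policies, and each improvement step is strict unless the current policy is already greedy with respect to its own value function, the alternation must terminate at a fixed point, which is an optimal policy.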

0 Answers