
I have read "Confusion around Bellman (update) operator", and yet I am still not clear about the difference between two equations. The first is $V^{\pi}(s)=R(s,\pi(s))+\gamma \sum_{s'}P(s'|s,\pi(s))V^{\pi}(s')$ from https://en.wikipedia.org/wiki/Bellman_equation, and the second is

[image: second equation, not reproduced here]

which I read at https://www.cs.cmu.edu/~mgormley/courses/10601-s17/slides/lecture26-ri.pdf and in many other places. Any help or clarification is welcome and would be very helpful. Thank you.

yk3003018
  • Can you give more information about what your problem is? You show two different equations, but don't really ask a question, so it is not clear where you are stuck. At the very least it would help if you explained which of the terms you do or don't understand in each equation. Otherwise someone answering has to explain all the terms, which is unnecessary extra work if all you need to understand is the difference between $\pi(s)$ and $\pi(a|s)$, for instance. – Neil Slater Sep 04 '19 at 06:34
  • I think the problem is that both equations refer to the same term on the left hand side but it is not clear why the right hand side is equal as well... – Fabian Werner Sep 04 '19 at 08:06
  • For the answer: the reason there is a sum over the $R$ term in the second equation is that people sometimes use different setups: is $R$ a random variable or a deterministic reward? The second equation treats it as a random variable, so summing over it gives $E[R_t|S_t=s]$, which the people using the first equation simply define (or take for granted) as $R(s, \pi(s))$. That is, the second equation is more general, and in the setup in which the first one is used, it reduces to the first one. – Fabian Werner Sep 04 '19 at 10:59
  • You might also be interested in (shamelessly pointing to my own answer) this great answer ;-) https://stats.stackexchange.com/questions/243384/deriving-bellmans-equation-in-reinforcement-learning/370199#370199 – Fabian Werner Sep 04 '19 at 11:00
  • @Neil Slater the two equations are supposed to mean the same thing, but I don't understand why they are different. The second equation looks to me like the sum or expectation of the first one. – yk3003018 Sep 04 '19 at 14:11
  • @Fabian Werner Thanks for the answer. – yk3003018 Sep 04 '19 at 14:13
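
A short derivation may make the point from the comments concrete. The image of the second equation is missing, so this is only a sketch under an assumption: that the second equation is the common stochastic-policy form with a joint distribution $p(s',r|s,a)$ over next state and reward. The symbols $p(s',r|s,a)$ and $\pi(a|s)$, and the definitions of $R(s,a)$ and $P(s'|s,a)$ in terms of them, are assumptions of this sketch and are not taken from the original post.

$$
\begin{aligned}
V^{\pi}(s) &= \sum_{a}\pi(a|s)\sum_{s',r}p(s',r|s,a)\,\bigl[r+\gamma V^{\pi}(s')\bigr] \\
&= \sum_{a}\pi(a|s)\Bigl[\underbrace{\sum_{s',r}p(s',r|s,a)\,r}_{=:\,R(s,a)}+\gamma\sum_{s'}\underbrace{\sum_{r}p(s',r|s,a)}_{=:\,P(s'|s,a)}V^{\pi}(s')\Bigr] \\
&= \sum_{a}\pi(a|s)\Bigl[R(s,a)+\gamma\sum_{s'}P(s'|s,a)\,V^{\pi}(s')\Bigr] \\
&= R(s,\pi(s))+\gamma\sum_{s'}P(s'|s,\pi(s))\,V^{\pi}(s') \quad\text{if } \pi(a|s)=1 \text{ for } a=\pi(s) \text{ and } 0 \text{ otherwise.}
\end{aligned}
$$

Under these assumptions, $R(s,a)$ is the expected immediate reward and the last line is the Wikipedia form, so the first equation is the special case of a deterministic policy with the reward already averaged out.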

0 Answers