I am trying to derive the Bellman equation using the value function and the Q-function. This is as far as I got with understanding it; I wrote up my attempt in LaTeX: [image of the attempted equations, not available]

Why is the $V^*$ suddenly in $Q^\pi$ function? Why not $Q^\pi = r + \gamma Q^\pi(s_{t+1}, a)$?

The equation for $Q^*$ doesn't make much sense to me either.

1 Answer

Those equations are simply wrong. The value functions under a policy $\pi$ are defined as the expected total (discounted) return obtained by following $\pi$ from state $s$ onwards. Since the successor states and rewards are random, one can't simply omit the expectation.
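
For reference, here is a sketch of how the standard definitions and the resulting Bellman equations are usually written. The notation is an assumption on my part (reward $r_{t+1}$ received after taking $a_t$ in $s_t$, discount $\gamma \in [0,1)$, discrete actions), so it may differ from the indexing in the question:

```latex
\begin{align*}
% Definitions: expected discounted return under policy \pi
V^\pi(s)   &= \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\middle|\, s_t = s\right] \\
Q^\pi(s,a) &= \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a\right] \\[4pt]
% Link between the two: average Q over the policy's action choice
V^\pi(s)   &= \sum_{a} \pi(a \mid s)\, Q^\pi(s,a) \\[4pt]
% Bellman expectation equation: the successor value is V^\pi, not V^*
Q^\pi(s,a) &= \mathbb{E}\!\left[r_{t+1} + \gamma V^\pi(s_{t+1}) \,\middle|\, s_t = s,\ a_t = a\right] \\
           &= \mathbb{E}_\pi\!\left[r_{t+1} + \gamma Q^\pi(s_{t+1}, a_{t+1}) \,\middle|\, s_t = s,\ a_t = a\right],
              \qquad a_{t+1} \sim \pi(\cdot \mid s_{t+1}) \\[4pt]
% Bellman optimality equations: the max replaces the average over \pi
V^*(s)     &= \max_{a} Q^*(s,a) \\
Q^*(s,a)   &= \mathbb{E}\!\left[r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \,\middle|\, s_t = s,\ a_t = a\right]
\end{align*}
```

Two things follow from this. First, under these definitions $V^*$ only appears together with $Q^*$; for a fixed policy, $Q^\pi$ is expressed through $V^\pi$. Second, the recursion in $Q^\pi$ uses the next action $a_{t+1}$ drawn from $\pi$ in the successor state, not the same action $a$ again, and the whole right-hand side sits inside an expectation over the random reward and successor state.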