Deduce the Bellman equation from the Value and Q functions

Question

I am trying to derive the Bellman equation from the value and Q-functions. This is as far as I got with understanding it, and I tried writing it out myself in LaTeX:

Why does $V^*$ suddenly appear in the $Q^\pi$ function? Why not $Q^\pi = r + \gamma Q^\pi(s_{t+1}, a)$?

The $Q^*$ equation doesn't make much sense to me either.

score 1 · Answer 1 · answered May 29 '21 at 16:12

Those equations are simply wrong. The value functions under a policy $\pi$ are defined as the expected total (discounted) return obtained by following policy $\pi$ from state $s$ onwards. Since the successor states and rewards are random, one can't simply omit the expectation.
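
As a sketch of how the expectation carries through, here is the standard derivation. The notation is an assumption on my part (return $G_t$, transition model $p(s', r \mid s, a)$, stochastic policy $\pi(a \mid s)$, roughly the Sutton and Barto convention) and is not taken from the question.

$$
\begin{aligned}
G_t &= \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} = R_{t+1} + \gamma G_{t+1} \\
V^\pi(s) &= \mathbb{E}_\pi\left[ G_t \mid S_t = s \right] \\
Q^\pi(s, a) &= \mathbb{E}_\pi\left[ G_t \mid S_t = s, A_t = a \right] \\
&= \mathbb{E}\left[ R_{t+1} + \gamma V^\pi(S_{t+1}) \mid S_t = s, A_t = a \right] \\
&= \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma V^\pi(s') \right] \\
V^\pi(s) &= \sum_{a} \pi(a \mid s)\, Q^\pi(s, a)
\end{aligned}
$$

So a value function shows up inside $Q^\pi$ because only the first action $a$ is fixed; from $s_{t+1}$ onwards the policy picks the actions, and the expected return from there is by definition $V^\pi(s_{t+1})$. Note that it is $V^\pi$, not $V^*$, that belongs in $Q^\pi$. Substituting the last line into the one above it gives the same recursion in terms of $Q^\pi$ alone, with the next action averaged over $\pi$ rather than reusing $a$:

$$
Q^\pi(s, a) = \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \sum_{a'} \pi(a' \mid s')\, Q^\pi(s', a') \right]
$$

For the optimal value functions the average over $\pi$ is replaced by a maximum:

$$
\begin{aligned}
V^*(s) &= \max_{a} Q^*(s, a) \\
Q^*(s, a) &= \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \max_{a'} Q^*(s', a') \right]
\end{aligned}
$$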

