How do we derive the Bellman equation for the optimal value function $V^*(k)$ using the optimal action-value function $Q^*(k,a)$?

So far I have the following: $V^*(k) := \max_{\pi} E\left(\sum^{\infty}_{n=0} R(X_{n}) \mid X_{0}=k\right)$

$Q^*(k,a) = R(k,a) + \sum_{l} P^{a}_{kl} V^*(l)$

I know that I'll need to take the maximum of $Q^*(k,a)$ over the actions $a$ in order to express the Bellman equation in terms of this function.
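
Written out, I would expect that step to take roughly the following form. This is only a sketch of the target equation, not a derivation, and it assumes that $P^{a}_{kl}$ denotes the probability of moving from state $k$ to state $l$ under action $a$, and that the infinite sum of rewards is finite:

$$V^*(k) = \max_{a} Q^*(k,a) = \max_{a} \left[ R(k,a) + \sum_{l} P^{a}_{kl} V^*(l) \right]$$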

I'm attempting to use first-step analysis to do this, but I haven't been able to derive a satisfactory equation.

eunice
    https://stats.stackexchange.com/questions/243384/deriving-bellmans-equation-in-reinforcement-learning – Alex R. Nov 01 '17 at 19:53

0 Answers