Can ε-greedy with Q-learning / SARSA have a stochastic policy?

Asked Jan 17 '21 at 09:56

Active Jan 17 '21 at 09:56

Viewed 27 times

Hello, I'm studying Q-learning and SARSA with ε-greedy and softmax strategies, and I have a question about my readings. They say that SARSA with ε-greedy causes value-function oscillations in the case of stochastic policies. But I thought that only a softmax strategy gives a stochastic policy, not a greedy strategy. Maybe the reading means that, because SARSA is on-policy, Q(s', a') oscillates, but I don't understand whether that implies SARSA can have a stochastic policy. Please tell me what I'm missing.
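For concreteness, the two action-selection rules mentioned above can be sketched as follows. This is only an illustrative sketch, not taken from the readings: the function names are hypothetical, it assumes tabular Q-values for a single state given as a plain list, and it assumes ties for the greedy action are broken by lowest index.

```python
import math
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities under epsilon-greedy for one state (hypothetical helper)."""
    n = len(q_values)
    # Greedy action; ties go to the lowest index (an assumption of this sketch).
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n        # exploration mass spread uniformly over all actions
    probs[greedy] += 1.0 - epsilon   # remaining mass goes to the greedy action
    return probs


def softmax_probs(q_values, temperature):
    """Action probabilities under a softmax (Boltzmann) rule for one state."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng=random):
    """Draw one action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With three actions and ε = 0.3, `epsilon_greedy_probs` assigns 0.1 to each non-greedy action and 0.8 to the greedy one; only with ε = 0 does it put all probability on a single action.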

BE LEO

0 Answers