
I'm a student who is just starting to study RL.

While studying MDPs and going through the gridworld example, I came up with a question.

In the gridworld, we usually assume that four actions are available in every state: up, down, left, and right.

In this case, if we have a random policy, meaning that each action is chosen with probability 0.25 in every state, does that policy count as a stochastic policy?

From what I found while searching, a deterministic policy maps each state to exactly one action, which it takes with probability 1.0. So I think the random policy is a stochastic policy, but I'm not sure.
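To make the question concrete, here is a minimal sketch of the two kinds of policy described above, each written as a table of action probabilities for a state. All names here (`ACTIONS`, `uniform_random_policy`, `deterministic_policy`, `sample_action`, `is_deterministic`) are my own for illustration and do not come from any RL library, and the choice of always moving "up" in the deterministic case is arbitrary.

```python
import random

# The four gridworld actions assumed in the question.
ACTIONS = ["up", "down", "left", "right"]


def uniform_random_policy(state):
    """Return pi(a | state) as a dict: every action has probability 0.25."""
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}


def deterministic_policy(state):
    """Hypothetical deterministic policy: always 'up', with probability 1.0."""
    return {a: (1.0 if a == "up" else 0.0) for a in ACTIONS}


def sample_action(policy, state, rng=random):
    """Draw one action according to the probabilities pi(. | state)."""
    probs = policy(state)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


def is_deterministic(policy, state):
    """True if the policy puts all probability mass on a single action."""
    return max(policy(state).values()) == 1.0
```

Written this way, both policies have the same form, a probability distribution over actions for each state. The deterministic one is the special case where a single action gets probability 1.0, while the uniform random one spreads the mass equally, so sampling from it can return different actions in the same state.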

beef stew
