I'm stuck in solving a simple dynamic probabilistic model. I have Three
states {Sunny, Cloudy, Rainy}
.
I have the Transition Probability Matrix
for the states transitioning to another state (for eg. Sunny -> Cloudy or Sunny -> Sunny). For the Action Space
I have {"Bring Umbrella", "Don't Bring Umbrella"}
and I have decided on the Reward Matrix
.
Now, I want to solve this problem. That is, I want to find the best policy.
I was referring to various models and was directed towards Markov Decision Process
. How can I solve the same with the above given information?
I have looked for python and R packages to solve the same. I came across mdptoolbox
. To solve this problem the library requires the transition matrix with actions, i.e. for each given action, what is the corresponding transition matrix. (I don't know how to find these).
How shall I proceed further? State Transition Matrix
and Reward Matrix
is all the information that I have.