Given a map $\pi: S \to A$, is there an MDP with state and action spaces $S$ and $A$ for which $\pi$ is an optimal policy, assuming an infinite time horizon and the expected total discounted reward as the optimality criterion?
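
To make the question concrete, here is a minimal sketch of one candidate construction. Everything in it is my own assumption rather than part of the problem statement: finite $S$ and $A$, an arbitrary random transition kernel `P`, a discount factor of 0.9, and a hypothetical reward $r(s,a) = 1$ if $a = \pi(s)$ and $0$ otherwise. Value iteration is then used to check numerically whether $\pi$ is greedy with respect to the resulting value function.

```python
import numpy as np

def indicator_reward(pi, n_actions):
    """Hypothetical reward: 1 when the action agrees with pi, else 0."""
    n_states = len(pi)
    r = np.zeros((n_states, n_actions))
    r[np.arange(n_states), pi] = 1.0
    return r

def value_iteration(P, r, gamma=0.9, tol=1e-10, max_iter=10_000):
    """P has shape (A, S, S), r has shape (S, A). Returns (V, Q)."""
    V = np.zeros(r.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = r[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = r + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = r + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q

# Illustrative instance (sizes, seed, and pi are arbitrary choices).
rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
pi = np.array([2, 0, 1, 1])

# Random transition kernel, normalised so each row is a distribution.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)

r = indicator_reward(pi, n_actions)
V, Q = value_iteration(P, r)
greedy = Q.argmax(axis=1)
print(np.array_equal(greedy, pi))
```

In this sketch the transitions play no role in which action is greedy, because following $\pi$ collects the maximal reward at every step. Whether such a degenerate reward counts as a satisfactory answer, or whether extra constraints on the MDP are intended, is exactly what I am unsure about.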

Similarly, let $\pi$ be a sequence of maps $d_i: S \to A$ for $i = 1, \dots, N$. Is there an $N$-step finite-horizon MDP for which $\pi$ is an optimal policy when the optimality criterion is the expected total reward?
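
For the finite-horizon version, the same idea can be sketched with backward induction. Again, the assumptions are mine and not part of the question: finite $S$ and $A$, a stationary random kernel `P`, zero terminal value, and a hypothetical time-dependent reward $r_i(s,a) = 1$ if $a = d_i(s)$ and $0$ otherwise. The names `target`, `rewards`, and `backward_induction` are illustrative only.

```python
import numpy as np

def backward_induction(P, rewards):
    """P: (A, S, S); rewards: list of N arrays of shape (S, A), one per step.
    Returns the greedy decision rules and the value function at step 1."""
    V = np.zeros(rewards[0].shape[0])  # terminal value assumed to be zero
    rules = []
    for r_t in reversed(rewards):
        # Undiscounted one-step lookahead for the current stage.
        Q = r_t + np.einsum("ast,t->sa", P, V)
        rules.append(Q.argmax(axis=1))
        V = Q.max(axis=1)
    rules.reverse()
    return rules, V

# Illustrative instance (sizes and seed are arbitrary choices).
rng = np.random.default_rng(1)
n_states, n_actions, horizon = 3, 2, 4

# The given decision rules d_1, ..., d_N, drawn at random for the demo.
target = [rng.integers(n_actions, size=n_states) for _ in range(horizon)]

P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)

# Hypothetical stage rewards: 1 when the action agrees with d_i, else 0.
rewards = []
for d in target:
    r_t = np.zeros((n_states, n_actions))
    r_t[np.arange(n_states), d] = 1.0
    rewards.append(r_t)

rules, V1 = backward_induction(P, rewards)
print(all(np.array_equal(a, b) for a, b in zip(rules, target)))
```

Under these assumptions the recovered decision rules coincide with the given $d_i$, and the value at the first step equals $N$ in every state.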

Vincent L.

0 Answers