MDP optimal policy inverse problem

Asked Apr 15 '21 at 18:01

Active Apr 15 '21 at 18:09

Viewed 11 times

Given a map $\pi: S \to A$, is there an MDP with state and action spaces $S$ and $A$ that has $\pi$ as an optimal policy, assuming an infinite time horizon and the expected total discounted reward as the optimality criterion?
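To make the infinite-horizon question concrete, here is a minimal sketch of how one could test a candidate MDP numerically. It assumes finite $S$ and $A$ and a discount factor $\gamma < 1$. The helper names `optimal_q` and `is_optimal` are hypothetical. The indicator reward $r(s,a) = 1$ if $a = \pi(s)$ and $0$ otherwise is only one trivial candidate (a constant reward would make every policy optimal), and the transition probabilities are arbitrary.

```python
import random

def optimal_q(P, R, gamma, iters=2000):
    """Approximate Q* by value iteration.
    P[s][a] is a list of next-state probabilities, R[s][a] is a reward."""
    n_s, n_a = len(R), len(R[0])

    def backup(V):
        return [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
                 for a in range(n_a)] for s in range(n_s)]

    V = [0.0] * n_s
    for _ in range(iters):
        V = [max(row) for row in backup(V)]
    return backup(V)

def is_optimal(pi, P, R, gamma, tol=1e-8):
    """pi is optimal iff it is greedy with respect to Q* in every state."""
    Q = optimal_q(P, R, gamma)
    return all(Q[s][pi[s]] >= max(Q[s]) - tol for s in range(len(pi)))

# Illustrative instance: 4 states, 3 actions, arbitrary random transitions.
random.seed(0)
n_s, n_a, gamma = 4, 3, 0.9
pi = [2, 0, 1, 1]  # the given map S -> A, encoded as a list

P = []
for s in range(n_s):
    row = []
    for a in range(n_a):
        w = [random.random() for _ in range(n_s)]
        total = sum(w)
        row.append([x / total for x in w])
    P.append(row)

# Assumed candidate reward: 1 when the action agrees with pi, 0 otherwise.
R = [[1.0 if a == pi[s] else 0.0 for a in range(n_a)] for s in range(n_s)]

print(is_optimal(pi, P, R, gamma))
```

With this reward, $\pi$ collects the maximal per-step reward at every step, so its value $1/(1-\gamma)$ matches the upper bound on any policy's value regardless of the transitions.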

Similarly, let $\pi = (d_1, \dots, d_N)$ be a sequence of maps $d_i: S \to A$ for $i = 1, \dots, N$. Is there an $N$-step finite-horizon MDP that has $\pi$ as an optimal policy when the optimality criterion is the expected total reward?
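The finite-horizon version can be checked the same way with backward induction. The sketch below assumes finite $S$ and $A$, zero terminal value, time-dependent rewards $r_t(s,a)$ and stationary transitions. The names `backward_induction` and `is_optimal_finite` are hypothetical, and steps are 0-indexed, so `d[t]` plays the role of $d_{t+1}$. Being greedy at every $(t, s)$ is a sufficient condition for optimality, which is all the check tests.

```python
def backward_induction(P, R):
    """Return Q[t][s][a] for an N-step MDP with zero terminal value.
    P[s][a] is a list of next-state probabilities, R[t][s][a] is a reward."""
    N, n_s, n_a = len(R), len(P), len(P[0])
    V = [0.0] * n_s
    Q = [None] * N
    for t in reversed(range(N)):
        Q[t] = [[R[t][s][a] + sum(p * v for p, v in zip(P[s][a], V))
                 for a in range(n_a)] for s in range(n_s)]
        V = [max(row) for row in Q[t]]
    return Q

def is_optimal_finite(d, P, R, tol=1e-9):
    """Sufficient check: every decision rule d[t] is greedy for Q[t]."""
    Q = backward_induction(P, R)
    return all(Q[t][s][d[t][s]] >= max(Q[t][s]) - tol
               for t in range(len(d)) for s in range(len(d[t])))

# Illustrative instance: 2 states, 2 actions, N = 3 steps.
# Action 0 stays in the current state, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
d = [[0, 1], [1, 1], [0, 0]]  # d[t][s]: action prescribed at step t in state s

# Assumed candidate reward: 1 when the action agrees with d[t], 0 otherwise.
R = [[[1.0 if a == d[t][s] else 0.0 for a in range(2)]
      for s in range(2)] for t in range(3)]

print(is_optimal_finite(d, P, R))
```

Under this assumed reward the policy earns 1 at each of the $N$ steps, which is the largest total any policy can reach.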

edited Apr 15 '21 at 18:09

Vincent L.

0 Answers