I have two agents that both follow a baseline behavioral policy pi(a|s). If I then modify the state-action distribution for the two agents (resulting in two new policies), is there a standard measure I can use to tell how "far" the policies are from the baseline policy, or each other?

More generally, is there a standard way to measure the difference between two policies that operate on the same state space?

Divergence measures (like KL divergence) are useful, but are not actually distance metrics, so I'm wondering whether there's some known method I'm missing.

Dirk
  • Hey, I posted the answer (no) and a link to that question. If you still think it's without value, I'll delete it. – Dirk Jan 28 '21 at 18:54
  • Your question's on-topic here, so no need to delete: (1) it may attract further valid answers (the question on AI doesn't insist on "true" distances), & (2) it remains, for people searching on CV, as a pointer to the answer on the AI site (on which they may find much else of interest besides). – Scortchi - Reinstate Monica Feb 02 '21 at 16:40

1 Answer

It looks like the answer is no: there are no known distance metrics you can use to compare two policies.

Thanks to nbro for this question, which is helpful and has a lot of good information about non-metric distances between policies: https://ai.stackexchange.com/q/25166/2444
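To make the "non-metric" point concrete, here is a minimal sketch (not taken from the linked question) that compares two tabular policies by a state-weighted KL divergence. The policies, the state weights, and the helper names `kl_divergence` and `mean_policy_kl` are all hypothetical, and the sketch assumes every action the first policy can take has nonzero probability under the second. Swapping the arguments gives a different number, which is one reason KL is a divergence rather than a distance metric.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0 (otherwise the divergence is infinite).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_policy_kl(policy_a, policy_b, state_weights):
    """State-weighted average of KL(policy_a(.|s) || policy_b(.|s))."""
    return sum(
        w * kl_divergence(policy_a[s], policy_b[s])
        for s, w in state_weights.items()
    )

# Hypothetical tabular policies: state -> action probabilities.
baseline = {"s0": [0.5, 0.5], "s1": [0.9, 0.1]}
modified = {"s0": [0.8, 0.2], "s1": [0.6, 0.4]}

# Hypothetical state weighting (e.g. how often each state is visited).
weights = {"s0": 0.5, "s1": 0.5}

forward = mean_policy_kl(baseline, modified, weights)
reverse = mean_policy_kl(modified, baseline, weights)

# The two directions disagree, so symmetry (a metric axiom) fails.
print(forward, reverse)
```

The choice of state weighting is itself an assumption: weighting by the baseline's state visitation, by the modified policy's, or uniformly will generally give different values.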

Dirk