Applying Bayes rule in the context of reinforcement learning

Question

I was watching this video on reinforcement learning. At 1:28, it says the following:

$$Pr(s'|a,z,s)=\frac{Pr(z|s',a,s)Pr(s'|a,s)}{Pr(z|a,s)}$$

I could not see how this was obtained. I pondered it a bit and came up with a possible line of reasoning, but I am still unsure whether it is correct. This is what I have so far:

$Pr(s'|a,z,s) = \frac{Pr(s',a,z,s)}{Pr(a,z,s)}$ ... equation (1), by the definition of conditional probability
$Pr(z|s',a,s) = \frac{Pr(s',a,z,s)}{Pr(a,s',s)}$ ... by the definition of conditional probability
$\therefore Pr(s',a,z,s) = Pr(z|s',a,s)Pr(a,s',s)$ ... equation (2)
$Pr(s'|a,z,s) = \frac{Pr(z|s',a,s)Pr(a,s',s)}{Pr(a,z,s)}$ ... substituting equation (2) into equation (1)

Now I need to show that $Pr(s'|a,s) = Pr(s',a,s)$ and $Pr(z|a,s) = Pr(z,a,s)$. From the context available to me, the event $a\cap s=(a,s)$ seems to form the whole sample space (I am not sure of this, but it seems so after watching the video from the start). That is, both events $s'$ and $z$ are subsets of the event $(a,s)$. Would that make $Pr(s'|a,s) = Pr(s',a,s)$ and $Pr(z|a,s) = Pr(z,a,s)$? If so, I think I can recover the originally quoted equation. Am I correct?

PS: I believe $Pr(s'|a,z,s)$ means that $s'$ depends on all of $a$, $z$ and $s$.

I answered a somewhat similar question some time ago (https://stats.stackexchange.com/a/493683/271601). The main idea is that Bayes' rule ($p(A|B) = \frac{p(B|A)p(A)}{p(B)}$) remains valid for conditional probabilities: you can write $p(A|B,C) = \frac{p(B|A,C)p(A|C)}{p(B|C)}$. — Camille Gontier, Dec 13 '20 at 13:27
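A minimal numerical sanity check of the conditional form of Bayes' rule from the comment, offered as a sketch: it assumes an arbitrary, randomly generated joint distribution over three binary random variables $A$, $B$, $C$ (all entries strictly positive, so no conditioning event has probability zero) and compares both sides exactly with `fractions.Fraction`. The helper names `random_joint`, `prob`, `cond` and `conditional_bayes_holds` are hypothetical and not part of the original post.

```python
from fractions import Fraction
from itertools import product
import random

NAMES = ("A", "B", "C")

def random_joint(seed=0):
    """Hypothetical joint pmf over three binary RVs: {(a, b, c): Fraction}."""
    rng = random.Random(seed)
    # Weights in 1..10 keep every joint probability strictly positive.
    weights = {v: rng.randint(1, 10) for v in product((0, 1), repeat=len(NAMES))}
    total = sum(weights.values())
    return {v: Fraction(w, total) for v, w in weights.items()}

def prob(joint, **fixed):
    """Marginal probability of `fixed`, summing the joint over unnamed RVs."""
    return sum(p for v, p in joint.items()
               if all(v[NAMES.index(k)] == x for k, x in fixed.items()))

def cond(joint, target, given):
    """P(target | given) = P(target, given) / P(given)."""
    return prob(joint, **target, **given) / prob(joint, **given)

def conditional_bayes_holds(joint):
    """Check p(A|B,C) == p(B|A,C) p(A|C) / p(B|C) for every assignment."""
    for a, b, c in product((0, 1), repeat=3):
        lhs = cond(joint, {"A": a}, {"B": b, "C": c})
        rhs = (cond(joint, {"B": b}, {"A": a, "C": c})
               * cond(joint, {"A": a}, {"C": c})
               / cond(joint, {"B": b}, {"C": c}))
        if lhs != rhs:
            return False
    return True

print(conditional_bayes_holds(random_joint()))  # True
```

Exact fractions are used instead of floats so the comparison can be a plain equality rather than a tolerance check.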

score 2 · Accepted Answer · answered Dec 13 '20 at 13:29

You don't need any further assumptions. A simple way to see this is to remove the RVs that are common to the conditioning side (i.e. to the right of $|$) of every term, here $a$ and $s$: $$P(s'|z)=\frac{P(z|s')P(s')}{P(z)}$$

Conversely, you can add the same set of RVs to the conditioning side of every probability in this formula and still obtain a valid formula.
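The "add the same set of RVs everywhere" argument can also be checked numerically. This sketch assumes a hypothetical, randomly generated joint distribution over binary $s'$, $z$, $a$, $s$ (named `s_next`, `z`, `a`, `s` in the code, all illustrative) and checks $P(s'|z,E)=\frac{P(z|s',E)P(s'|E)}{P(z|E)}$ for every conditioning set $E \subseteq \{a, s\}$: $E=\emptyset$ is the plain Bayes rule and $E=\{a,s\}$ is the formula from the video.

```python
from fractions import Fraction
from itertools import combinations, product
import random

# Hypothetical names: s_next stands for s', z for the observation,
# a for the action, s for the current state. All are binary here.
NAMES = ("s_next", "z", "a", "s")

def random_joint(seed=0):
    """Hypothetical joint pmf P(s', z, a, s) with strictly positive entries."""
    rng = random.Random(seed)
    weights = {v: rng.randint(1, 10) for v in product((0, 1), repeat=len(NAMES))}
    total = sum(weights.values())
    return {v: Fraction(w, total) for v, w in weights.items()}

def prob(joint, fixed):
    """Marginal probability of the assignment `fixed` (dict name -> value)."""
    return sum(p for v, p in joint.items()
               if all(v[NAMES.index(k)] == x for k, x in fixed.items()))

def cond(joint, target, given):
    """P(target | given), from the definition of conditional probability."""
    return prob(joint, {**target, **given}) / prob(joint, given)

def bayes_holds_with_extra(joint, extra):
    """Check P(s'|z,E) == P(z|s',E) P(s'|E) / P(z|E) for all values of E."""
    for s_next, z, *rest in product((0, 1), repeat=2 + len(extra)):
        given = dict(zip(extra, rest))  # empty dict when E is the empty set
        lhs = cond(joint, {"s_next": s_next}, {"z": z, **given})
        rhs = (cond(joint, {"z": z}, {"s_next": s_next, **given})
               * cond(joint, {"s_next": s_next}, given)
               / cond(joint, {"z": z}, given))
        if lhs != rhs:
            return False
    return True

joint = random_joint()
# Conditioning sets (), ("a",), ("s",), ("a", "s"); each line ends in True.
for k in range(3):
    for extra in combinations(("a", "s"), k):
        print(extra, bayes_holds_with_extra(joint, extra))
```

The check relies only on every conditioning event having positive probability, which the strictly positive weights guarantee; no assumption about $(a,s)$ covering the whole sample space is needed.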

Alternatively, directly from the definition of conditional probability:

$$P(s'|a,z,s)=\frac{P(s',a,z,s)}{P(a,z,s)}=\frac{P(z|s',a,s)P(s'|a,s)P(a,s)}{P(z|a,s)P(a,s)}=\frac{P(z|s',a,s)P(s'|a,s)}{P(z|a,s)}$$
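The middle step applies the chain rule of probability to the numerator and to the denominator, after which the common factor $P(a,s)$ cancels (assuming $P(a,s)>0$):

$$P(s',a,z,s) = P(z\mid s',a,s)\,P(s',a,s) = P(z\mid s',a,s)\,P(s'\mid a,s)\,P(a,s)$$

$$P(a,z,s) = P(z\mid a,s)\,P(a,s)$$

Because $P(a,s)$ appears in both, the identities $P(s'|a,s)=P(s',a,s)$ and $P(z|a,s)=P(z,a,s)$ are not needed; they would only hold in the special case $P(a,s)=1$.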

