
In the machine learning paper *Overcoming catastrophic forgetting in neural networks*, the authors present equation 1, the log of Bayes' rule:

$$ \log p(\theta|D) = \log p(D|\theta) + \log p(\theta) - \log p(D) \quad (1)$$

where $\theta$ are the model parameters and $D$ is the dataset being fitted. OK, I get that. Then they say that "the data is split into two independent parts, one defining task A ($D_a$) and the other task B ($D_b$). Then, we can re-arrange equation 1:"

$$ \log p(\theta|D) = \log p(D_b|\theta) + \log p(\theta|D_a) - \log p(D_b) \quad (2) $$

Can someone guide me step by step how they go from (1) to (2)? Many thanks in advance!


1 Answer


Assuming that the two datasets are independent both marginally and conditionally on $\theta$, you have the following relationships: $$p(D)=p(D_a)p(D_b), \qquad p(D|\theta)=p(D_a|\theta)p(D_b|\theta)$$

Using these, we can write the original Bayes rule as $$p(\theta|D)=\frac{p(D_a|\theta)p(D_b|\theta)p(\theta)}{p(D_a)p(D_b)}=\frac{p(\theta|D_a)p(D_a)p(D_b|\theta)}{p(D_a)p(D_b)}=\frac{p(\theta|D_a)p(D_b|\theta)}{p(D_b)}$$
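
The second equality there is worth spelling out: it is Bayes' rule applied to task A alone,

$$p(D_a|\theta)\,p(\theta) = p(\theta|D_a)\,p(D_a),$$

which is what lets the prior $p(\theta)$ and the task-A likelihood be folded into the task-A posterior $p(\theta|D_a)$.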

Taking the logarithm of both sides, you obtain (2).
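
Written out, that last step is just additivity of the logarithm:

$$\log p(\theta|D) = \log\frac{p(\theta|D_a)\,p(D_b|\theta)}{p(D_b)} = \log p(D_b|\theta) + \log p(\theta|D_a) - \log p(D_b),$$

which is exactly equation (2).

If a numerical check helps, here is a minimal sketch (a hypothetical Bernoulli toy model, not from the paper): it puts $\theta$ on a small discrete grid, splits some coin flips into $D_a$ and $D_b$, plugs the factorisations $p(D|\theta)=p(D_a|\theta)p(D_b|\theta)$ and $p(D)=p(D_a)p(D_b)$ into equation (1), and confirms that the result matches the right-hand side of equation (2) for every grid value of $\theta$.

```python
import numpy as np

# Hypothetical toy model: theta is a coin bias restricted to a small grid,
# so every probability in equations (1) and (2) can be computed exactly.
thetas = np.array([0.2, 0.5, 0.8])     # discrete grid of parameter values
prior = np.full(3, 1.0 / 3.0)          # p(theta), uniform over the grid

D_a = np.array([1, 0, 1])              # task A data (1 = heads)
D_b = np.array([1, 1, 0, 1])           # task B data

def likelihood(data, theta):
    """p(data | theta) for i.i.d. Bernoulli(theta) flips."""
    return np.prod(np.where(data == 1, theta, 1.0 - theta))

lik_a = np.array([likelihood(D_a, t) for t in thetas])   # p(D_a | theta)
lik_b = np.array([likelihood(D_b, t) for t in thetas])   # p(D_b | theta)

p_Da = np.sum(lik_a * prior)           # p(D_a), prior-weighted marginal
p_Db = np.sum(lik_b * prior)           # p(D_b), prior-weighted marginal
post_a = lik_a * prior / p_Da          # p(theta | D_a), Bayes' rule on task A only

# Left-hand side: equation (1) with the independence factorisations plugged in,
#   log p(theta|D) = log p(D_a|theta) + log p(D_b|theta) + log p(theta)
#                    - log p(D_a) - log p(D_b)
lhs = np.log(lik_a) + np.log(lik_b) + np.log(prior) - np.log(p_Da) - np.log(p_Db)

# Right-hand side: equation (2),
#   log p(theta|D) = log p(D_b|theta) + log p(theta|D_a) - log p(D_b)
rhs = np.log(lik_b) + np.log(post_a) - np.log(p_Db)

print(np.allclose(lhs, rhs))           # True: (1) and (2) agree on the grid
```

Both sides agree for every value of $\theta$ on the grid, which is just the algebra above carried out numerically.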
