Maximum Mean Discrepancy Implementation

Question

I am just beginning to learn about maximum mean discrepancy (MMD) as a way to measure the difference between two probability distributions, using this tutorial. I want to implement it in code, but I don't understand how it is computed empirically.

At a high level, it seems to apply the kernel trick to two sets of data and then take the difference between the expected values of each "kernelized" dataset. Is that correct?

So you have one set of i.i.d. random variables, $X_1, X_2, \dots, X_N \sim P$, and another set of i.i.d. random variables, $Y_1, Y_2, \dots, Y_M \sim Q$. We want to know whether $P \neq Q$.

So first we find the empirical kernel mean embedding (seems like an odd name to me) of each dataset by applying some kernel function, $\phi$, to every sample and averaging: $$\hat{\mu}_P = \frac{1}{N} \sum_{n=1}^N \phi(X_n)$$ $$\hat{\mu}_Q = \frac{1}{M} \sum_{m=1}^M \phi(Y_m)$$
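To make the two averages concrete, here is a minimal sketch of how I picture them in NumPy. It treats $\phi$ as an explicit, finite-dimensional feature map, $\phi(x) = (x, x^2)$, which is purely my own illustrative assumption; for kernels such as the Gaussian RBF, $\phi$ is infinite-dimensional and would not be computed directly. The names `feature_map` and `empirical_mean_embedding` are hypothetical helpers, not from any library.

```python
import numpy as np

def feature_map(x):
    # Hypothetical explicit feature map phi(x) = (x, x^2) for scalar inputs.
    x = np.asarray(x, dtype=float)
    return np.stack([x, x**2], axis=-1)

def empirical_mean_embedding(samples):
    # Average of phi over the sample: (1/N) * sum_n phi(X_n).
    return feature_map(samples).mean(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(loc=0.0, scale=1.0, size=500)  # sample from P
Y = rng.normal(loc=0.0, scale=2.0, size=400)  # sample from Q: same mean, larger variance

mu_p = empirical_mean_embedding(X)
mu_q = empirical_mean_embedding(Y)

# Squared Euclidean distance between the two embeddings; for this
# explicit phi it is a (biased) plug-in estimate of MMD^2.
mmd2_plugin = float(np.sum((mu_p - mu_q) ** 2))
print(mu_p, mu_q, mmd2_plugin)
```

With this particular $\phi$, the first coordinate of each embedding is the sample mean and the second is the sample second moment, so the two embeddings differ mainly in the second coordinate here.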

Then the MMD statistic can be estimated as follows (the formula as written assumes equal sample sizes, $M = N$):

$$\begin{aligned} \widehat{\mathrm{MMD}}^{2} = {} & \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j \neq i}^{N} \left( k(X_i, X_j) + k(Y_i, Y_j) \right) \\ & - \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( k(X_i, Y_j) + k(X_j, Y_i) \right) \end{aligned}$$
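A rough sketch of how I would translate the formula above into NumPy, term by term. The Gaussian kernel, the `bandwidth` parameter, and the function names `gaussian_kernel` and `mmd2_estimate` are my own assumptions (the formula does not fix a kernel), and the code assumes both samples have the same size, as the formula does.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    # Pairwise k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)).
    # A has shape (n, d), B has shape (m, d); the result has shape (n, m).
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2_estimate(X, Y, bandwidth=1.0):
    # Estimate of MMD^2 following the formula above; requires M = N.
    n = X.shape[0]
    if Y.shape[0] != n:
        raise ValueError("this form of the estimator assumes equal sample sizes")
    K_xx = gaussian_kernel(X, X, bandwidth)
    K_yy = gaussian_kernel(Y, Y, bandwidth)
    K_xy = gaussian_kernel(X, Y, bandwidth)
    # Within-sample terms: sum over j != i, so subtract the diagonal.
    within = (K_xx.sum() - np.trace(K_xx) + K_yy.sum() - np.trace(K_yy)) / (n * (n - 1))
    # Cross terms: k(X_i, Y_j) + k(X_j, Y_i) over all i, j equals 2 * sum(K_xy).
    cross = 2.0 * K_xy.sum() / n**2
    return within - cross

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))  # sample from P
Y = rng.normal(3.0, 1.0, size=(200, 1))  # sample from Q, shifted mean
print(mmd2_estimate(X, Y))
```

If this reading is right, the estimate should be close to zero when both samples come from the same distribution and clearly positive when they do not; the kernel matrices are all that is needed, so $\phi$ never has to be evaluated explicitly.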

My questions are:

1. Is what I wrote above correct? (Please clarify if need be.)
2. How do you get the MMD estimate from the estimated kernel mean embeddings?
3. Does the MMD two-sample test tell you how the two distributions differ (for example, whether the mean differs, the variance differs, or both)?
