
I am given the means, covariances, and initial weights of a Gaussian mixture model. How can I generate samples from them? In brief, I need a function like

X = GMMSamples(W, mu, sigma, d)

where W is the weight vector, mu the mean vector, sigma the covariance vector, and d the dimension of the samples. How can I implement this in Python? I found the scikit-learn library, which has a GaussianMixture class. It takes sample values as input and estimates the means and covariances itself. My case is almost the reverse: I am given the means, covariances, and the other parameters mentioned above, and I need to generate sample data. Thank you.

Tim
Shyamkkhadka

1 Answer


Sampling from a mixture distribution is simple. The algorithm is as follows:

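  1. Sample $I$ from a categorical distribution parametrized by the vector $\boldsymbol{w} = (w_1,\dots,w_d)$, such that $w_i \ge 0$ and $\sum_i w_i = 1$ (here $d$ is the number of mixture components).
  2. Sample $x$ from the normal distribution parametrized by $\mu_I$ and $\sigma_I$, i.e. the parameters of the component selected in step 1.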

This thread on StackOverflow describes how to sample from a categorical distribution.
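The two steps can be sketched in Python with NumPy. This is a minimal illustration rather than a definitive implementation: the function name `gmm_samples`, its arguments, and the example parameter values are hypothetical. It assumes `means` has shape (k, dim) and `covariances` has shape (k, dim, dim) for k components, so the sample dimension is inferred from `means` instead of being passed separately.

```python
import numpy as np


def gmm_samples(weights, means, covariances, n_samples, seed=None):
    """Draw n_samples points from a Gaussian mixture (illustrative sketch).

    weights:     shape (k,), non-negative and summing to 1
    means:       shape (k, dim)
    covariances: shape (k, dim, dim)
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covariances = np.asarray(covariances, dtype=float)

    # Step 1: draw a component index I for every sample, with P(I = i) = w_i.
    components = rng.choice(len(weights), size=n_samples, p=weights)

    # Step 2: draw each x from the normal distribution of its selected component.
    samples = np.empty((n_samples, means.shape[1]))
    for i in range(len(weights)):
        mask = components == i
        count = int(mask.sum())
        if count > 0:
            samples[mask] = rng.multivariate_normal(
                means[i], covariances[i], size=count
            )
    return samples, components


# Example with two 2-dimensional components (made-up parameter values).
W = [0.3, 0.7]
mu = [[0.0, 0.0], [5.0, 5.0]]
sigma = [np.eye(2), 0.5 * np.eye(2)]
X, I = gmm_samples(W, mu, sigma, n_samples=1000, seed=0)
```

Returning the component indices alongside the samples makes the role of $I$ explicit: each row of `X` was drawn from the component whose index is stored at the same position in `I`.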

Tim
  • Tim, can you please elaborate? – Shyamkkhadka Oct 31 '16 at 15:04
  • What is unclear to you? – Tim Oct 31 '16 at 15:04
  • What does "sample $I$ from a categorical distribution" mean? What is the role of the value $I$? – Shyamkkhadka Oct 31 '16 at 15:08
  • @Shyamkkhadka Actually, I found another thread that asks exactly the same thing: http://stats.stackexchange.com/questions/226834/sampling-from-a-mixture-of-two-gamma-distributions/226837#226837 , so please check it, as your question is a duplicate of it. – Tim Oct 31 '16 at 15:09
  • @Shyamkkhadka The idea of a mixture is that you have $d$ components, each appearing with probability $w_i$, so sampling $I$ is a way of saying "take the $i$-th component with probability $w_i$", which follows from the definition of a mixture distribution. – Tim Oct 31 '16 at 15:10
  • OK. But in the second step of your algorithm, the $I$-th component is not used anymore. I have to implement this in code. Can you please elaborate on the algorithm? – Shyamkkhadka Oct 31 '16 at 15:15
  • @Shyamkkhadka It *is* used, since you sample from the $N(\mu_I,\sigma_I)$ distribution, i.e. from the distribution of the $I$-th component, which has mean $\mu_I$ and sd $\sigma_I$. You can check the linked thread for a code example. – Tim Oct 31 '16 at 15:17
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/47763/discussion-between-shyamkkhadka-and-tim). – Shyamkkhadka Oct 31 '16 at 15:25