That is, the data were generated from a mixture of 5 normal distributions:

first  <- rnorm(10000, sd = 0.5)
second <- rnorm(1000, mean = 2, sd = 1.5)
third  <- rnorm(800, mean = -2, sd = 0.5)
fourth <- rnorm(500, mean = 4, sd = 2)
fifth  <- rnorm(600, mean = -4, sd = 0.25)

I know the number of components in the mixture (5 in this case). What I want to do is infer the means. I know the dependency between the means and the SDs: $\sigma = f(\mu)$, and I know $f$. I also know that if some of the data were generated from a normal distribution with mean $\mu_1$, then a substantial amount of data was also generated from a normal distribution with mean $-\mu_1$ (with a different SD).
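To make these constraints concrete, the setup can be written as a constrained mixture log-likelihood. This is only one reading of the problem: the mixing weights $w_k$, the free magnitudes $m_1, m_2$, and the single component centred at 0 (taken from the simulated example) are notation and assumptions introduced here.

$$
\ell(w, m_1, m_2) = \sum_{i=1}^{n} \log \sum_{k=0}^{4} \frac{w_k}{f(\mu_k)} \, \varphi\!\left(\frac{x_i - \mu_k}{f(\mu_k)}\right),
\qquad (\mu_0, \mu_1, \mu_2, \mu_3, \mu_4) = (0,\, m_1,\, -m_1,\, m_2,\, -m_2),
\qquad \sum_{k=0}^{4} w_k = 1,
$$

where $\varphi$ is the standard normal density. Under these assumptions there are 6 free parameters (2 magnitudes and 4 weights) instead of the 14 of an unconstrained 5-component mixture (5 means, 5 SDs, 4 weights).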

Is a modified $k$-means clustering "optimal" for this problem, or are there more sophisticated algorithms that could help?

UPDATE1: Lab mates told me about the package mclust. I used it and obtained "good" results, but only after cutting off everything within 1 SD of 0 (it did not work with the original data). I am pretty sure that the additional information could improve the results.
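One way the additional information could be used is to fit the mixture by direct maximum likelihood with the constraints built into the parameterization. The following is a minimal sketch in base R, not a tested solution: the function `f` is a hypothetical stand-in (linear interpolation through the mean/SD pairs of the simulation), the helper names `unpack` and `negloglik` are made up for illustration, and a single component fixed at mean 0 is assumed, as in the simulated example.

```r
set.seed(1)

# Simulated data, same recipe as in the question
x <- c(rnorm(10000, sd = 0.5),
       rnorm(1000, mean = 2, sd = 1.5),
       rnorm(800, mean = -2, sd = 0.5),
       rnorm(500, mean = 4, sd = 2),
       rnorm(600, mean = -4, sd = 0.25))

# HYPOTHETICAL stand-in for the known f: linear interpolation through the
# (mean, sd) pairs used in the simulation. Replace with the real f.
f <- approxfun(c(-4, -2, 0, 2, 4), c(0.25, 0.5, 0.5, 1.5, 2), rule = 2)

# Map unconstrained parameters to constrained mixture parameters.
# theta[1:2]: logs of the two positive magnitudes m1, m2
# theta[3:6]: log-ratios of weights relative to the zero-mean component
unpack <- function(theta) {
  m  <- exp(theta[1:2])
  mu <- c(0, m[1], -m[1], m[2], -m[2])   # means come in +/- pairs
  w  <- exp(c(0, theta[3:6]))
  list(mu = mu, sigma = f(mu), weights = w / sum(w))
}

# Negative log-likelihood of the constrained mixture
negloglik <- function(theta) {
  p <- unpack(theta)
  # n x 5 matrix of weighted component densities
  dens <- sapply(1:5, function(k) p$weights[k] * dnorm(x, p$mu[k], p$sigma[k]))
  -sum(log(rowSums(dens) + 1e-300))      # small constant guards against log(0)
}

# Rough starting values (an assumption; results may depend on them)
start <- c(log(c(1.5, 3.5)), rep(-2, 4))
fit <- optim(start, negloglik, method = "Nelder-Mead",
             control = list(maxit = 5000))
est <- unpack(fit$par)
est$mu
```

Because the SDs are computed from the means and the means are tied in pairs, the optimizer searches over 6 parameters only. Like any mixture likelihood, this one can have local optima, so trying several starting values would be sensible.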


kjetil b halvorsen
German Demidov
  • You can check this recent thread: https://stats.stackexchange.com/questions/192309/estimate-group-averages/192333#192333 – Tim Feb 01 '16 at 12:23
  • @Tim thank you. I guess I should just modify the procedure to handle the two paired means in one step. I do not know how to do it yet, but I hope I will figure it out. – German Demidov Feb 01 '16 at 12:28
  • Tools designed for finite mixture models are meant for this kind of problem; k-means is a much more primitive algorithm. – Tim Feb 01 '16 at 12:34
