
Let's say I have a data set $X = [1,2,3,4,5]$. And I want to measure how close it is to a Gaussian distribution. Is there a way to use cross-validation to do this?

For example, if I do leave-one-out cross-validation, I could calculate the sample mean and variance of $\{2,3,4,5\}$, then evaluate the density of the held-out point under the fitted Gaussian, i.e. $p(1)$. I repeat this procedure for each element in the data set and average the results. If the resulting average is "high", the distribution is fairly normal; if it is "low", it is not very normal. A rough sketch of what I have in mind is below.
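Here is a minimal sketch of that procedure, assuming the score for each fold is the held-out point's log-density under a Gaussian fitted (by sample mean and sample standard deviation) to the remaining points:

```python
import numpy as np
from scipy.stats import norm

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

log_densities = []
for i in range(len(X)):
    held_out = X[i]
    rest = np.delete(X, i)
    mu = rest.mean()
    sigma = rest.std(ddof=1)  # sample standard deviation of the remaining points
    # score the held-out point by its log-density under the fitted Gaussian
    log_densities.append(norm.logpdf(held_out, loc=mu, scale=sigma))

# average held-out log-density: the quantity whose "high"/"low" interpretation
# I am asking about
print(np.mean(log_densities))
```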

It's not very clear to me exactly what "high" and "low" would mean in this case, so I'm wondering if there is any standard metric for doing this (like log loss or error rate in supervised learning). Or is this just a completely ridiculous idea to begin with?

I'd be particularly interested in books or academic articles that review some of the pros and cons of this technique.


Note: My question is not how to measure how well a distribution fits a data set (as in, e.g., this question). I am specifically asking about how to use cross-validation to do this, and whether that is ever a good idea.
