
Let's say I have a data set $X = [1,2,3,4,5]$. And I want to measure how close it is to a Gaussian distribution. Is there a way to use cross-validation to do this?

For example, if I do leave-one-out cross-validation, I could calculate the sample mean and variance of $\{2,3,4,5\}$, then evaluate the density of the held-out point under the fitted Gaussian, i.e. $p(1)$. I repeat this procedure for each element in the data set and average the results. If the resulting average is "high", the distribution is fairly normal; if it is "low", it is not very normal. A rough sketch of what I have in mind is below.
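Here is a minimal sketch of that procedure, assuming the score for each fold is the held-out point's log-density under a Gaussian fitted (by sample mean and sample standard deviation) to the remaining points:

```python
import numpy as np
from scipy.stats import norm

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

log_densities = []
for i in range(len(X)):
    held_out = X[i]
    rest = np.delete(X, i)
    mu = rest.mean()
    sigma = rest.std(ddof=1)  # sample standard deviation of the remaining points
    # score the held-out point by its log-density under the fitted Gaussian
    log_densities.append(norm.logpdf(held_out, loc=mu, scale=sigma))

# average held-out log-density: the quantity whose "high"/"low" interpretation
# I am asking about
print(np.mean(log_densities))
```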

It's not very clear to me exactly what "high" and "low" would mean in this case, so I'm wondering if there is any standard metric for doing this (like log loss or error rate in supervised learning). Or is this just a completely ridiculous idea to begin with?

I'd be particularly interested in books or academic articles that review some of the pros and cons of this technique.


Note: My question is not how to measure how well a distribution fits a data set (as in, e.g., this question). I am specifically asking about how to use cross-validation to do this, and whether that is ever a good idea.
