Confidence Interval when Central Limit Theorem Doesn't Hold

Question

I have a large sample set (>10,000 samples), but the distribution is not normal at all. When I derive the sampling distribution of the statistic (the mean), it is not Normal either. Hence, I cannot reliably use the standard confidence interval formulas here. How can I get a more reliable estimate of the confidence interval for the sample mean?

The underlying problem is that I am measuring the number of hazardous events per mile of operation. Hazardous events are very rare, so most of my samples are 0 events per mile, and the occasional event shows up as something like 1 event in 100 miles, i.e. 0.01, or samples like that. This is why the data are so skewed, and why even the sampling distribution of the mean events per mile is not Normal.
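Because the quantity of interest is a count of events over a known exposure (miles), one option that does not rely on the CLT at all is the exact (Garwood) Poisson interval for a rate. This is a swapped-in technique, not one proposed in the thread, and it assumes the total number of events follows a Poisson distribution with mean `rate * miles`. The function names and the figures used below (100 events over 10,000 miles) are hypothetical; the sketch uses only the standard library and solves for the bounds by bisection on the Poisson CDF.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), with each term computed in log space."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    log_mu = math.log(mu)
    return sum(math.exp(-mu + i * log_mu - math.lgamma(i + 1)) for i in range(k + 1))


def _solve_cdf(j, target):
    """Find mu with poisson_cdf(j, mu) == target by bisection (the CDF decreases in mu)."""
    lo, hi = 0.0, 2.0 * (j + 1)
    while poisson_cdf(j, hi) > target:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if poisson_cdf(j, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_rate_ci(events, miles, level=0.95):
    """Exact (Garwood) CI for an event rate, ASSUMING events ~ Poisson(rate * miles)."""
    alpha = 1.0 - level
    # Lower bound: P(X >= events | mu) = alpha/2, i.e. CDF(events - 1) = 1 - alpha/2.
    mu_lo = 0.0 if events == 0 else _solve_cdf(events - 1, 1.0 - alpha / 2.0)
    # Upper bound: P(X <= events | mu) = alpha/2.
    mu_hi = _solve_cdf(events, alpha / 2.0)
    return mu_lo / miles, mu_hi / miles


# Hypothetical numbers: 100 events observed over 10,000 miles.
lo, hi = exact_poisson_rate_ci(events=100, miles=10_000, level=0.95)
print(f"rate = {100 / 10_000:.4f} per mile, 95% CI = ({lo:.5f}, {hi:.5f})")
```

Unlike a CLT-based interval, this one stays non-negative and still gives a usable upper bound when no events at all have been observed (for zero events at 95%, the upper bound on the expected count is $-\ln(0.025) \approx 3.69$).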

One idea I had is to derive the sampling distribution of the statistic empirically: draw N samples with replacement from the sample set, compute the mean, repeat this many times, and take the resulting collection of means as the distribution of the statistic (i.e., mimicking how the sampling distribution in the central limit theorem is defined). Then I would take the 5th and 95th percentiles of that distribution as the bounds of the confidence interval. Would this always be a more reliable, albeit less accurate (if the CLT holds), approach?
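The resampling idea described above is the nonparametric percentile bootstrap. Below is a minimal sketch using only the Python standard library, run on hypothetical 0/1 per-mile data with an assumed event rate of 0.01. The function name `bootstrap_percentile_ci` and the parameter choices (`n_boot=1000`, `level=0.90`) are illustrative assumptions. Each resample has the same size as the original data. Taking the 5th and 95th percentiles gives a 90% interval; a 95% interval would use the 2.5th and 97.5th percentiles.

```python
import random
import statistics


def bootstrap_percentile_ci(data, n_boot=1000, level=0.90, seed=0):
    """Percentile bootstrap CI for the mean of `data`.

    Draws len(data) values with replacement n_boot times, computes the mean
    of each resample, and returns nearest-rank quantiles of those means.
    """
    rng = random.Random(seed)
    n = len(data)
    boot_means = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = 1.0 - level
    lo_idx = round((alpha / 2) * (n_boot - 1))        # e.g. 5th percentile for level=0.90
    hi_idx = round((1 - alpha / 2) * (n_boot - 1))    # e.g. 95th percentile for level=0.90
    return boot_means[lo_idx], boot_means[hi_idx]


# Hypothetical data: 10,000 one-mile segments, each containing an event with
# assumed probability 0.01 (roughly "1 event in 100 miles").
data_rng = random.Random(42)
events_per_mile = [1.0 if data_rng.random() < 0.01 else 0.0 for _ in range(10_000)]

lo, hi = bootstrap_percentile_ci(events_per_mile, n_boot=1000, level=0.90)
print(f"mean = {statistics.fmean(events_per_mile):.4f}, 90% CI = ({lo:.4f}, {hi:.4f})")
```

One limitation worth knowing: if the data contain zero events, every resample mean is 0 and the bootstrap interval collapses to (0, 0), so the bootstrap says nothing useful in that extreme case.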

If you have $> 10,000$ samples, the chances are very good that the sample mean is extremely close to Normally distributed unless you have some reason to believe the variance of the underlying distribution is infinite. After all, the Uniform distribution is "not normal at all", but a sample size of about 10 is good enough to get a sample mean that is reasonably close to Normally distributed for testing purposes... — jbowman, Nov 28 '17 at 01:51
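To make this comparison concrete, here is a minimal sketch of the standard Normal-approximation (Wald) interval, $\bar{x} \pm z \, s/\sqrt{n}$, applied to hypothetical 0/1 per-mile data with an assumed event rate of 0.01. The data and the function name `normal_approx_ci` are illustrative, not from the thread. With on the order of 100 events among 10,000 samples, this interval would be expected to land close to a bootstrap percentile interval on the same data; computing both is a simple way to check whether the CLT approximation is adequate here.

```python
import math
import random
import statistics


def normal_approx_ci(data, level=0.90):
    """CLT-based (Wald) interval for the mean: mean +/- z * s / sqrt(n)."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)            # estimated standard error
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for level=0.90
    return mean - z * se, mean + z * se


# Hypothetical data: 10,000 one-mile segments, each containing an event with
# assumed probability 0.01 (roughly "1 event in 100 miles").
data_rng = random.Random(42)
events_per_mile = [1.0 if data_rng.random() < 0.01 else 0.0 for _ in range(10_000)]

lo, hi = normal_approx_ci(events_per_mile, level=0.90)
print(f"mean = {statistics.fmean(events_per_mile):.4f}, 90% CI = ({lo:.4f}, {hi:.4f})")
```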
I see. So the underlying problem is that I am measuring the number of hazardous events per mile of operation. Hazardous events are very rare, so most of my samples are 0, and the occasional event shows up as something like 1 event in 100 miles, i.e. 0.01. This is why the data are so skewed, and why even the sampling distribution of the mean events per mile is not Normal. — SriK, Nov 28 '17 at 03:24
Thanks @kjetilbhalvorsen . Do you have a link describing this technique to get the confidence interval? Is it simply sampling with replacement? — SriK, Nov 29 '17 at 18:17
You need to read an introduction to the bootstrap; there are many. Search this site! Start with https://stats.stackexchange.com/questions/19340/bootstrap-based-confidence-interval or https://stats.stackexchange.com/questions/26088/explaining-to-laypeople-why-bootstrapping-works — kjetil b halvorsen, Nov 29 '17 at 18:40
