I have a large sample set (more than 10,000 samples), but the data are far from normally distributed. When I derive the sampling distribution of the statistic (the mean), it is not normal either. Hence, I cannot reliably use the standard confidence interval formulas for this. How can I get a more reliable confidence interval for the sample mean here?
The underlying problem is that I am measuring the number of hazardous events per mile of operation. Hazardous events are very rare, so most of my samples are 0 events per mile, and the occasional event produces a value such as 1 event in 100 miles, i.e. 0.01. This is why the data are so heavily skewed that even the sampling distribution of the mean events per mile is not approximately normal.
One idea I had is to derive the sampling distribution of the statistic empirically: repeatedly draw N samples with replacement from the sample set, compute the mean of each draw, and build the distribution of the statistic from those means, similar to how the central limit theorem is usually illustrated. I would then take the 5th and 95th percentiles of that distribution as the bounds of the confidence interval (a 90% interval). Would this always be a more reliable, albeit less accurate (if the CLT holds), approach?
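To make the idea concrete, here is a minimal sketch of the resampling procedure, which is commonly called the percentile bootstrap. It assumes NumPy is available. The function name `bootstrap_mean_ci`, the number of resamples, and the simulated data (a 2% chance of a 0.01 events-per-mile sample, otherwise 0) are all made up for illustration and are not my real measurements.

```python
import numpy as np

def bootstrap_mean_ci(samples, n_resamples=2000, alpha=0.10, seed=0):
    """Percentile interval for the mean via resampling with replacement.

    alpha=0.10 corresponds to taking the 5th and 95th percentiles.
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    means = np.empty(n_resamples)
    for i in range(n_resamples):
        # Draw n values with replacement from the observed sample set.
        resample = rng.choice(samples, size=n, replace=True)
        means[i] = resample.mean()
    # Empirical percentiles of the resampled means form the interval.
    lower, upper = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Hypothetical data: mostly 0 events per mile, occasionally 0.01.
data_rng = np.random.default_rng(1)
events_per_mile = np.where(data_rng.random(10_000) < 0.02, 0.01, 0.0)

lower, upper = bootstrap_mean_ci(events_per_mile)
print(f"sample mean: {events_per_mile.mean():.6f}")
print(f"90% percentile interval: [{lower:.6f}, {upper:.6f}]")
```

The interval here is read directly off the resampled means, so no normality assumption enters the calculation; whether its coverage is actually adequate for data this sparse is exactly what I am unsure about.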