Let's say that I want to compute a confidence interval for the mean purity of a crystal. I know for a fact that the purity of any chemical substance cannot exceed 100%. How can I construct a confidence interval for the mean of a quantity that has an upper (or lower) bound?

Assume the following:

  1. sample size = 21, std = 3, mean = 99.1
  2. samples are normally distributed
  3. the mean has an upper bound of 100

With SciPy, I can construct the 95% confidence interval like this:

>>> import numpy as np
>>> from scipy import stats
>>> stats.t.interval(1 - 0.05, 21 - 1, loc=99.1, scale=3 / np.sqrt(21))
(97.73441637228476, 100.46558362771523)
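For reference, that SciPy call evaluates the standard Student-t interval for a mean. Plugging in the question's numbers (t quantile rounded to three decimals) shows where the upper limit above 100 comes from:

$$
\bar{x} \pm t_{0.975,\,20}\,\frac{s}{\sqrt{n}} = 99.1 \pm 2.086 \times \frac{3}{\sqrt{21}} \approx 99.1 \pm 2.086 \times 0.6547 \approx 99.1 \pm 1.366 = (97.734,\ 100.466)
$$

Nothing in this formula knows about the 100% ceiling: the interval is symmetric about the sample mean, so whenever the mean lies within about 1.366 of 100 the upper limit overshoots.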

The upper limit of this confidence interval for the mean exceeds 100, which is physically impossible.

How should I report my conclusion in this case? Is it acceptable to truncate the interval at 100, like this?

(97.73441637228476, 100)

Or is there any special modification that I need to make?

Eric Kim
  • I'd try a logistic regression or beta regression. See [this post](https://stats.stackexchange.com/q/373835/21054). – COOLSerdash Jun 30 '19 at 09:17
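The comment points toward modelling on a bounded scale. A lightweight relative of that idea, which needs only the summary statistics in the question, is to build the t interval for the mean on the logit scale with the delta method and then back-transform. This is a swapped-in technique, not logistic or beta regression itself, and the delta-method standard error is only a rough approximation when the mean sits this close to the boundary:

```python
import numpy as np
from scipy import stats
from scipy.special import expit, logit

# Summary statistics from the question (purity in percent).
n, xbar, s = 21, 99.1, 3.0

# Work with the mean as a proportion in (0, 1).
p = xbar / 100
se_p = (s / 100) / np.sqrt(n)

# Delta method: d/dp logit(p) = 1 / (p * (1 - p)).
se_logit = se_p / (p * (1 - p))
t_crit = stats.t.ppf(0.975, df=n - 1)

# Symmetric interval on the logit scale, then map back to percent.
lo_logit = logit(p) - t_crit * se_logit
hi_logit = logit(p) + t_crit * se_logit
ci = (100 * expit(lo_logit), 100 * expit(hi_logit))
print(ci)
```

With the question's numbers this gives roughly (96.0, 99.8). The interval is asymmetric and cannot leave (0, 100), because `expit` maps every real number into (0, 1).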

3 Answers

If you want to construct a proper confidence interval, one way to do it is to assume a distribution other than the normal. The normal distribution has unbounded support, whereas in your problem the support is bounded between 0% and 100%.
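One way to follow this suggestion with only the question's summary statistics is a parametric bootstrap from a Beta distribution. The following is an illustration, not the answerer's code: it rescales purity to a proportion, fits Beta(a, b) by the method of moments, and takes a percentile interval of simulated sample means. The seed and the 10,000 resamples are arbitrary choices:

```python
import numpy as np

# Summary statistics from the question, rescaled to proportions in (0, 1).
n, xbar, s = 21, 99.1, 3.0
m, v = xbar / 100, (s / 100) ** 2

# Method-of-moments Beta(a, b) fit; valid only when v < m * (1 - m).
common = m * (1 - m) / v - 1
a, b = m * common, (1 - m) * common

# Parametric bootstrap: each row is one simulated sample of size n.
rng = np.random.default_rng(0)
boot_means = rng.beta(a, b, size=(10_000, n)).mean(axis=1)

# Percentile 95% interval, reported back in percent.
lo, hi = 100 * np.percentile(boot_means, [2.5, 97.5])
print(lo, hi)
```

Because every simulated observation lies in [0, 1], every simulated mean does too, so the percentile interval cannot cross 100%. With raw data you would fit the Beta distribution to the observations themselves (for example by maximum likelihood) rather than to two summary numbers.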

C.D

Would you accept a Bayesian answer? You get a credible interval instead of a confidence interval, which is a different beast. On the plus side, it is very natural to use a prior that only has support from 0 to 1 (0% to 100%), and then to model each observation as coming from a truncated normal centered on the characteristic mean purity. If these assumptions seem reasonable given what you know about crystals, the posterior will be a correspondingly good summary of the crystal purity.
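A minimal grid-approximation sketch of this idea, assuming (beyond what the answer states) a flat prior on the centre `mu` over [0, 100] and a known scale `sigma` equal to the sample SD of 3. Because the sum of squared deviations can be written as (n - 1)s² + n(x̄ - mu)², the truncated-normal likelihood can be evaluated from n, x̄ and s alone:

```python
import numpy as np
from scipy import stats

# Summary statistics from the question (purity in percent).
n, xbar, s = 21, 99.1, 3.0
lower, upper = 0.0, 100.0

# Assumption: the truncated normal's scale sigma is known and equal to s;
# only its centre mu is unknown. Flat prior on mu over [0, 100].
sigma = s
mu = np.linspace(lower, upper, 100_001)

# Truncated-normal log-likelihood written with summary statistics via
# sum_i (x_i - mu)^2 = (n - 1) s^2 + n (xbar - mu)^2.
sum_sq = (n - 1) * s**2 + n * (xbar - mu) ** 2
mass = stats.norm.cdf((upper - mu) / sigma) - stats.norm.cdf((lower - mu) / sigma)
log_post = -sum_sq / (2 * sigma**2) - n * np.log(mass)

# Normalise on the grid and read off an equal-tailed 95% credible interval.
post = np.exp(log_post - log_post.max())
cdf = np.cumsum(post) / post.sum()
ci = (mu[np.searchsorted(cdf, 0.025)], mu[np.searchsorted(cdf, 0.975)])
print(ci)
```

By construction the credible interval for `mu` lies inside [0, 100]. With the question's numbers the likelihood keeps increasing up to the bound, so the posterior should pile up just below 100; that behaviour depends heavily on the fixed-`sigma` assumption, so with raw measurements it would normally be better to put a prior on `sigma` as well.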

If I understand correctly, your samples are not normally distributed (contrary to your point 2) but more likely follow something like a truncated normal. So you should use a confidence interval construction for the truncated normal (or whatever distribution you deem most fitting).
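One possible construction, sketched under the assumption that the truncated normal's scale is known (`sigma` = 3): a likelihood-ratio interval for the location `mu` of the underlying normal, mapped to the mean of the truncated distribution. `mu` itself may exceed 100, but the truncated mean cannot, so the resulting interval stays below 100. The grid limits are arbitrary illustration choices:

```python
import numpy as np
from scipy import stats

# Summary statistics from the question (purity in percent).
n, xbar, s = 21, 99.1, 3.0
lower, upper = 0.0, 100.0
sigma = s  # assumption: scale known and equal to the sample SD

# The location mu of the *untruncated* normal may exceed 100; the mean of
# the truncated distribution cannot. Search mu over a generous grid.
mu = np.linspace(90.0, 130.0, 40_001)
alpha, beta = (lower - mu) / sigma, (upper - mu) / sigma
mass = stats.norm.cdf(beta) - stats.norm.cdf(alpha)

# Log-likelihood from summary statistics (terms free of mu dropped).
loglik = -n * (xbar - mu) ** 2 / (2 * sigma**2) - n * np.log(mass)

# Likelihood-ratio 95% set for mu: 2 * (max loglik - loglik) <= chi2_1(0.95).
keep = 2 * (loglik.max() - loglik) <= stats.chi2.ppf(0.95, df=1)

# Mean of the truncated normal at each retained mu; always inside (0, 100).
trunc_mean = mu + sigma * (stats.norm.pdf(alpha) - stats.norm.pdf(beta)) / mass
ci = (trunc_mean[keep].min(), trunc_mean[keep].max())
print(ci)
```

With `sigma` known, the maximum-likelihood truncated mean equals the sample mean, so the interval should bracket 99.1. Estimating `sigma` as well would require a two-parameter profile likelihood.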

frank