Variable data measured discretely with poor resolution

Question

Is there an appropriate method for estimating variation when variable data is measured discretely and the resolution is poor? Here's an example: you are looking at how long it takes for a solid to degrade while in solution, and you check it twice a day (n = 15). At 120 hours they are all still there, at 132 hours 10 remain, and by 144 hours they are all gone. Is it possible to get a reasonable estimate of the mean and standard deviation based on this?

BruceET · Answer 1 · 2020-10-29T20:28:19.680

Estimates of $\mu$ and $\sigma$ will necessarily be rough with observations spaced so far apart. Because nearly all of a normal distribution lies within three standard deviations of its mean, you might guess that the $24$ hours between $120$ and $144$ span about six standard deviations, so that $\sigma = (144-120)/6 = 4.$ If you saw that about half of the $n=15$ had degraded by hour $132,$ then you might suppose the population mean (also the median) is about $132.$ However, with only five of fifteen degraded (about $1/3$), you might suppose that the population mean (also the median) is a little above $132.$
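
A minimal R sketch of these two rough guesses, assuming degradation times are normal: the $24$-hour window is treated as about six standard deviations, and the mean is placed so that the fraction of the normal distribution below hour $132$ matches the observed fraction. The helper name `mu_from_fraction` is hypothetical, introduced only for illustration.

```r
# Rough guess for sigma: treat the 24-hour window as about 6 standard deviations
sigma_guess <- (144 - 120) / 6

# Hypothetical helper: mean of a normal distribution with standard deviation
# `sigma` that puts a fraction `p` of its probability below time `t`
mu_from_fraction <- function(p, sigma, t = 132) {
  t - sigma * qnorm(p)
}

mu_from_fraction(1/2, sigma_guess)   # half degraded by 132 h: mean exactly 132
mu_from_fraction(5/15, sigma_guess)  # 5 of 15 degraded by 132 h: mean a little above 132
```

Because `qnorm(1/2)` is $0,$ the first call returns $132$ exactly; a fraction below one half makes `qnorm(p)` negative and pushes the mean above $132.$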

Then we see, from R, that the normal distribution $\mathsf{Norm}(\mu=133.72, \sigma=4)$ has about $0.03\%$ of its probability below $120,$ about $33\%$ below $132,$ and about $99.5\%$ below $144,$ as shown by the R computation below.

pnorm(c(120, 132, 144), 133.72, 4)
[1] 0.0003017906 0.3335978206 0.9949150743

So if you assume that degradation time is normal, that degradation begins not long after $120$ h, and that it ends not long before $144$ h, then $\mu = 133.72, \sigma = 4$ are compatible with your observations.
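
One way to make this compatibility concrete, sketched in R under the same normal assumption: convert $\mathsf{Norm}(\mu=133.72, \sigma=4)$ into expected counts, out of $n = 15,$ for each inspection interval and set them beside what was seen ($0$ degraded by $120$ h, $5$ in $(120, 132],$ $10$ in $(132, 144],$ and none left after $144$ h). The variable names are illustrative.

```r
# Inspection times bracketing the observations, with open ends added
breaks <- c(-Inf, 120, 132, 144, Inf)
observed <- c(0, 5, 10, 0)   # solids degraded in each interval (n = 15)

# Expected counts under Norm(mean = 133.72, sd = 4)
expected <- 15 * diff(pnorm(breaks, mean = 133.72, sd = 4))

# Side-by-side comparison, one row per interval
data.frame(interval = c("<=120", "(120,132]", "(132,144]", ">144"),
           observed = observed,
           expected = round(expected, 2))
```

The expected counts come out close to the observed $0, 5, 10, 0,$ which is the sense in which these parameter values fit the data.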

Slightly different approaches and slightly different (or additional) assumptions may give somewhat different conclusions about $\mu$ and $\sigma.$

For example, assuming that $\sigma = 3,$ so that there are eight standard deviations between $120$ and $144,$ leads to estimating the mean as $\mu = 133.29,$ as shown below. Similarly, assuming that $\sigma = 24/5 = 4.8$ (five standard deviations between $120$ and $144$) implies that $\mu = 134.06.$ Although I prefer the solution above (with $\mu = 133.72$), it seems difficult to rule out either of these two possibilities.

pnorm(c(120, 132, 144), 133.29, 3)
[1] 4.711654e-06 3.335978e-01 9.998215e-01

pnorm(c(120, 132, 144), 134.06, 24/5)
[1] 0.001699361 0.333900970 0.980812813
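
The three candidate solutions differ only in the assumed $\sigma;$ once $\sigma$ is fixed, the mean follows from putting probability $1/3$ below $132.$ A short R sketch (same normal assumption, illustrative names) repeats this for each assumed $\sigma$ and tabulates the implied probabilities below $120,$ $132,$ and $144:$

```r
# Assumed standard deviations: the 24-hour window spread over 8, 6 and 5 SDs
sigmas <- c(24/8, 24/6, 24/5)

# For each sigma, choose the mean that puts probability 1/3 below 132 h
mus <- 132 - sigmas * qnorm(1/3)

# Implied probabilities below each inspection time, one row per sigma
probs <- t(sapply(seq_along(sigmas), function(i)
  pnorm(c(120, 132, 144), mean = mus[i], sd = sigmas[i])))
colnames(probs) <- c("P(<120)", "P(<132)", "P(<144)")

cbind(sigma = sigmas, mu = round(mus, 2), round(probs, 4))
```

For $\sigma = 4.8$ the matching mean is $132 + 4.8(0.4307) \approx 134.07;$ the value $134.06$ used in the `pnorm` check is a slight truncation and gives essentially the same probabilities.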

Note: The $1/3$ quantile of the standard normal distribution is about $-0.43.$ So with $\sigma = 4,$ the mean $\mu$ must satisfy $132 = \mu - 0.43(4),$ giving $\mu \approx 133.72,$ which is slightly above $132.$

132 - 4*qnorm(1/3)
[1] 133.7229
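
Even with $\sigma$ held at $4,$ the mean estimate depends on the observed fraction $5/15,$ which is itself uncertain with only $15$ solids. One way to gauge this, an extra step beyond the calculation above, is to treat the number degraded by hour $132$ as binomial, take an exact (Clopper-Pearson) 95% interval for the fraction from base R's `binom.test`, and push both endpoints through the same quantile relation. A sketch under those assumptions:

```r
# Exact (Clopper-Pearson) 95% interval for the fraction degraded by 132 h
ci <- binom.test(5, 15)$conf.int

# With sigma assumed to be 4, a larger fraction degraded by 132 h implies a
# smaller mean, so the interval endpoints swap order when mapped to mu
mu_range <- 132 - 4 * qnorm(rev(as.numeric(ci)))
round(mu_range, 1)
```

The resulting range for $\mu$ extends a few hours on either side of $133.72,$ which is why the estimates above should be read as rough.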
