
I can take samples of a random variable $X \sim U(a, b)$, where the length $b - a$ of the interval is known. I am interested in its mean $E[X]$, estimated with $\hat{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$, but I need some guarantees for this estimate. Specifically, given a $\delta > 0$, I want to find out how many samples I should take s.t. $(\hat{X}_n - \delta, \hat{X}_n + \delta)$ is a 95% confidence interval.

I would know how to do this for a normally distributed random variable, but I'm not sure about the uniform case. Maybe a very similar or even the same procedure can be justified using the CLT?

corazza
    The estimator $(X_{(1)}+X_{(n)})/2$ is both unbiased for the mean $(a+b)/2$ and has smaller variance than $\bar X$. You can derive its distribution and construct a confidence interval from a pivot, similar to what I describe in https://stats.stackexchange.com/a/352879/77222 – Jarle Tufto Mar 14 '21 at 12:12
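
A quick way to check the variance claim in this comment is a small simulation. The following is a minimal sketch, assuming NumPy, with $U(0, 1)$ and $n = 20$ chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Compare the variance of the sample mean with that of the midrange
# (min + max)/2 for samples from U(0, 1); both are unbiased for the
# true mean 0.5, but the midrange concentrates faster.
n, reps = 20, 100_000
x = rng.uniform(0.0, 1.0, size=(reps, n))

means = x.mean(axis=1)
midranges = (x.min(axis=1) + x.max(axis=1)) / 2

print("var(sample mean):", means.var())      # ≈ 1/(12 n)          ≈ 0.0042
print("var(midrange):   ", midranges.var())  # ≈ 1/(2 (n+1)(n+2))  ≈ 0.0011
```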

1 Answer

  1. The central limit theorem applies in this situation, and the normal approximation is actually good for fairly low numbers of observations (with $n=20$ you should be pretty safe); see the sample-size sketch after this list.

  2. However, the midrange (maximum + minimum)/2 is a better estimator than the sample mean; it has a smaller variance and therefore allows for smaller confidence intervals at the same level.

  3. I don't have the time to look for the exact distribution of the midrange, but it may be derived somewhere (sorry, incomplete answer). I don't think I have seen it written down in closed form, but I'm pretty sure it can be done, although it may look complex. The Wikipedia page https://en.wikipedia.org/wiki/Continuous_uniform_distribution#Estimation_of_midpoint covers the situation where the lower boundary is known to be zero, which is somewhat easier (more precisely, from what is given there one can easily derive a CI for half the maximum, which in that case estimates the mean). One can also simulate it (using equivariance properties to generalise to arbitrary values of $a$ and $b$; see the simulation sketch after this list) or use the bootstrap.

  4. It may be that knowledge of the length (b-a) can be used to improve the confidence interval even more, but I'm not quite sure how. This would require more time than I'd be willing to put into a Cross-Validated answer.
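
To make point 1 concrete: under the normal approximation, a 95% interval for the mean has half-width $1.96\,\sigma/\sqrt{n}$, and for a uniform variable $\sigma = (b-a)/\sqrt{12}$, so the required sample size follows directly from the known length. A minimal sketch (the length and $\delta$ values below are placeholders, not taken from the question):

```python
import math

def clt_sample_size(length, delta, z=1.959964):
    """Smallest n such that z * sd/sqrt(n) <= delta for X ~ U(a, b)
    with known length = b - a, using the CLT normal approximation."""
    sd = length / math.sqrt(12.0)          # sd of U(a, b) is (b - a)/sqrt(12)
    return math.ceil((z * sd / delta) ** 2)

# Example: interval length 1 and requested half-width delta = 0.05
print(clt_sample_size(1.0, 0.05))   # 129
```

This also makes explicit the point raised in the comments below: whether $n=20$ suffices depends on how small $\delta$ is relative to $b-a$.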
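
For the simulation route in point 3: the error of the midrange, $M - (a+b)/2$, depends only on $n$ and the known length $b-a$ (location equivariance), so its quantiles can be simulated and used as a pivot. The helper below is a hypothetical sketch along those lines, not code from the answer:

```python
import numpy as np

def midrange_ci(x, length, level=0.95, reps=100_000, seed=0):
    """Simulation-based CI for the mean of U(a, b), centred on the midrange
    and using only the known interval length b - a. By location equivariance,
    the error of the midrange depends on (b - a) and n, but not on a."""
    rng = np.random.default_rng(seed)
    n = len(x)
    m = (np.min(x) + np.max(x)) / 2

    # Simulate the error distribution of the midrange for U(0, length).
    u = rng.uniform(0.0, length, size=(reps, n))
    err = (u.min(axis=1) + u.max(axis=1)) / 2 - length / 2

    lo, hi = np.quantile(err, [(1 - level) / 2, (1 + level) / 2])
    return m - hi, m - lo

# Example with simulated data from U(2, 3), i.e. known length 1:
rng = np.random.default_rng(1)
x = rng.uniform(2.0, 3.0, size=20)
print(midrange_ci(x, length=1.0))
```

The same simulation can be inverted to answer the original question with this estimator: increase $n$ until the 97.5% quantile of the simulated error drops below the requested $\delta$.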

Christian Hennig
  • Thanks for the answer. I have two questions/notes however: (1) if my confidence interval is very small, it is possible that $n=20$ would not be enough, right? (2) I know the midrange value however, since I know $b - a$ (the length of the interval over which $X$ is uniform) – corazza Mar 14 '21 at 12:13
  • You can only know the midrange if you know both b and a; knowledge of (b-a) isn't enough to know the midrange. But if you know both a and b, the true mean is known to be (a+b)/2, so you don't need to estimate it. I'm not sure what you mean by "very small" for your CI, but I'm pretty confident that $n=20$ is a large enough sample size for the normal approximation. It's not possible to say more without knowing the precise data and information that you have. – Christian Hennig Mar 14 '21 at 12:30
  • I think I confused your "(maximum-minimum)/2" for $(b-a)/2$, since $b$ and $a$ are the bounds of my interval and hence the minimum and maximum values. But you're referring to the min/max of the sampled values, correct? – corazza Mar 14 '21 at 12:35
  • Sorry, I meant "(maximum+minimum)/2", corrected! – Christian Hennig Mar 14 '21 at 12:38
  • Also (b+a)/2 in case you know a and b. – Christian Hennig Mar 14 '21 at 12:40
  • I don't know them, I only know $b-a$ (the length of the interval). Regarding $n=20$ samples, this surely depends on the length of the confidence interval; if you take a much smaller interval, you might end up needing $n=400$ samples for the same confidence level. – corazza Mar 14 '21 at 12:41
  • True, fair enough. The statement about $n=20$ regards the validity of the CLT normal approximation. – Christian Hennig Mar 14 '21 at 12:42