
The book Computer Age Statistical Inference introduces the James-Stein estimator. Brad Efron runs through an example in which batting averages are estimated from each player's 90 at-bats. The observed average $p_i$ of player $i$ is modeled as

$$p_i\sim N(P_i,\sigma_0^2)$$

where $\sigma_0^2$ is the binomial variance

$$\sigma_0^2=\bar{p}(1-\bar{p})/90$$

Here $P_i$ is the true average (approximated by the average over about 300 additional at-bats), and $\bar{p}$ is the mean of the observed $p_i$.
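For reference, here is a minimal sketch of the equal-at-bats calculation as I understand it, shrinking each $p_i$ toward $\bar{p}$ by the factor $1-(N-3)\sigma_0^2/\sum_i (p_i-\bar{p})^2$, with $N$ the number of players. The function name `james_stein_equal_n` and the averages in `observed` are made up for illustration; they are not the data from the book.

```python
def james_stein_equal_n(p, n):
    """Shrink observed averages p toward their mean, assuming every
    player has the same number of at-bats n (hypothetical helper)."""
    N = len(p)
    if N < 4:
        raise ValueError("the (N - 3) factor needs at least 4 players")
    p_bar = sum(p) / N
    sigma0_sq = p_bar * (1 - p_bar) / n        # binomial variance
    S = sum((pi - p_bar) ** 2 for pi in p)     # spread of the observed averages
    if S == 0:
        return [p_bar] * N                     # all averages identical
    shrink = 1 - (N - 3) * sigma0_sq / S       # may go negative when S is small
    return [p_bar + shrink * (pi - p_bar) for pi in p]


# Hypothetical averages, not the values from the book
observed = [0.400, 0.350, 0.300, 0.250, 0.200, 0.150]
estimates = james_stein_equal_n(observed, n=90)
print(estimates)
```

The only place the number of at-bats enters is `sigma0_sq`, which is why the denominator matters once the counts differ between players.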

All of the above is clear, but in my situation the number of at-bats is not 90 for each player. In fact, the counts are never equal: one player has 1080 at-bats, another has 1223, another has 1700, and so on.

What should the denominator in $\sigma_0^2=\bar{p}(1-\bar{p})/90$ be when the numbers of at-bats are unequal? Is the average number of at-bats a reasonable choice?

sdittmar
Alex

1 Answer


Technically, your example has unequal variances, which is a special, more complex case of the Stein estimation problem. See "Stein Estimator with unequal variances" or "Stein Estimator with unequal variances - Part 2" for more details.

However, given that n is rather large for every player in your example, the underlying variances should not differ much, so using the average number of at-bats as the denominator may be a practical, although not strictly correct, approach.
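One way to judge whether the shortcut is acceptable is to compare each player's own binomial variance, $\bar{p}(1-\bar{p})/n_i$, with the single variance obtained from the average number of at-bats. The sketch below does that. The at-bat counts are the three from the question, while the averages in `observed` and the helper name `binomial_variances` are assumptions made for illustration.

```python
def binomial_variances(p, at_bats):
    """Return (per-player variances, variance based on the mean at-bat count).

    Uses the simple mean of p as p_bar, as in the question; a mean
    weighted by at-bats would be another reasonable choice.
    """
    p_bar = sum(p) / len(p)
    per_player = [p_bar * (1 - p_bar) / n for n in at_bats]
    n_mean = sum(at_bats) / len(at_bats)
    from_mean_n = p_bar * (1 - p_bar) / n_mean
    return per_player, from_mean_n


at_bats = [1080, 1223, 1700]        # counts mentioned in the question
observed = [0.280, 0.265, 0.301]    # hypothetical batting averages
per_player, from_mean_n = binomial_variances(observed, at_bats)

for n, v in zip(at_bats, per_player):
    # A ratio near 1 means the average-n variance is close to this player's own
    print(n, round(v / from_mean_n, 3))
```

Each ratio equals the mean at-bat count divided by that player's count, so it does not depend on the averages themselves. If the ratios stray far from 1, the single-variance approximation becomes harder to defend and the unequal-variance treatment in the linked posts is the safer route.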

sdittmar