I have read the Wikipedia article and some other sources (jmanton's blog, Wasserman's blog).
The background is:
We have independent $X_i \sim N(\theta_i, 1)$, $i = 1, \ldots, n$, and we want to estimate each $\theta_i$.
For the MSE risk of estimating the whole vector, based on $(X_1, \ldots, X_n)$, when $n \ge 3$:
- the MLE estimator $\hat\theta_i = x_i$ is inadmissible;
- while the James-Stein estimator dominates the MLE in this case (see the quick simulation below).
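For reference, the estimator I mean is $\hat\theta^{JS} = \left(1 - \frac{n-2}{\lVert X\rVert^{2}}\right)X$. Here is a quick simulation I put together to check the claim (a minimal sketch; the dimension $n = 10$, the particular $\theta$ vector, and the number of replications are arbitrary choices just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10                         # dimension; the result needs n >= 3
theta = np.linspace(-2, 2, n)  # arbitrary true means, just for illustration
n_rep = 100_000                # Monte Carlo replications

# One draw of X ~ N(theta, I_n) per replication
X = theta + rng.standard_normal((n_rep, n))

# MLE: just the observation itself
mle = X

# James-Stein: shrink X towards the origin by a data-dependent factor
shrink = 1 - (n - 2) / np.sum(X**2, axis=1, keepdims=True)
js = shrink * X

# Total squared-error loss, averaged over replications
risk_mle = np.mean(np.sum((mle - theta) ** 2, axis=1))
risk_js = np.mean(np.sum((js - theta) ** 2, axis=1))

print(f"MLE risk ~ {risk_mle:.3f}")  # comes out close to n = 10
print(f"JS  risk ~ {risk_js:.3f}")   # comes out noticeably smaller
```

The James-Stein risk does come out clearly below the MLE risk of about $n$, even though the coordinates are independent, and that is exactly what puzzles me.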
Looking at it from the shrinkage angle, here is a quote from Larry Wasserman:

> Note that the James-Stein estimator shrinks $X$ towards the origin. (In fact, you can shrink towards any point; there is nothing special about the origin.)
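To make the "any point" part concrete, the shrink-towards-a-fixed-point-$\nu$ version I have in mind is

$$
\hat\theta^{JS}_{\nu} = \nu + \left(1 - \frac{n-2}{\lVert X - \nu\rVert^{2}}\right)(X - \nu),
$$

with $\nu = 0$ giving the usual form.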
I understand that the James-Stein estimator differs from the MLE through its shrinkage behavior. But I still don't get it, since the variables are independent.
Why can shrinking towards an arbitrary point improve the MSE risk over the MLE, which does not shrink at all?
And also from Larry Wasserman:

> This can be viewed as an empirical Bayes estimator ....
If I view it from the empirical Bayes angle, it is shrinkage towards an overall mean, but that seems like nonsense when the variables to estimate are independent.
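For context, the empirical Bayes connection as I understand it goes like this: with a common prior $\theta_i \sim N(\mu, \tau^2)$, the posterior mean is

$$
E[\theta_i \mid X_i] = \mu + \left(1 - \frac{1}{1+\tau^2}\right)(X_i - \mu),
$$

and empirical Bayes replaces $\mu$ and $1/(1+\tau^2)$ with estimates computed from the whole vector, e.g. $\hat\mu = \bar X$ and (if I have the constant right) $(n-3)/\sum_j (X_j - \bar X)^2$, which yields the shrink-towards-the-overall-mean form of James-Stein. So the overall mean enters only through the shared prior, even though the $X_i$ are independent.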
Is there a better example or an intuitive explanation for this? Thanks.