
The asymptotic variance of a maximum likelihood estimator can be estimated from the inverse of the observed information matrix (the negative Hessian of the log-likelihood) at the MLE, and the variance of derived quantities can then be obtained via the multivariate delta method.

After developing a new model, I'd like to verify the accuracy of these variance estimates in finite-sample situations, to get a sense of how large $N$ needs to be for the error bars derived by these methods to be useful.

Assume that I can generate $T \rightarrow \infty$ data sets from some underlying data-generating process. For each data set $i$, I fit a maximum likelihood model to estimate a statistic and its uncertainty, $(\hat{\mu}^{(i)}, \sigma_{\hat{\mu}}^{(i)})$. What statistic should I then use to check that the empirical distribution of the MLEs is consistent with what would be expected from the estimated asymptotic variances?
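For concreteness, here is a minimal sketch of the setup for a hypothetical normal-mean example (the names n.trials, fits, mu.hat, and se.hat are mine, not from any library); the MLE of $\mu$ is $\bar x$ and its estimated standard error is $\hat\sigma/\sqrt{n}$:

set.seed(1)
n.trials <- 1000          # the T in the question
n <- 50; mu <- 1; sigma <- 2   # hypothetical data-generating process

# For each of the T data sets, record the MLE of mu and its estimated standard error.
fits <- t(replicate(n.trials, {
  x <- rnorm(n, mu, sigma)
  sig.mle <- sqrt(mean((x - mean(x))^2))   # MLE of sigma
  c(mu.hat = mean(x), se.hat = sig.mle / sqrt(n))
}))
head(fits)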

Poor solution #1:

  • Select one of the trials to be "special", say the first ($i=1$), and assume that we know the "true" value of $\mu$. We can use the rest of the trials to try to verify $\sigma_{\hat{\mu}}^{(1)}$: for the remaining trials, $i>1$, $\hat{\mu}^{(i)}$ should be distributed as $\mathcal{N}(\mu, \sigma_{\hat{\mu}}^{(1)})$, which we could check with a Q-Q plot or something (see the sketch after the comments below). But this seems very wasteful, because it only uses one of the $\sigma_{\hat{\mu}}$s. What's the better way?
  • Distributions do not match estimates. Distributions match distributions. Form the frequency distribution of the standardized estimates, to be compared with/tested against their theoretical asymptotic distribution. – Alecos Papadopoulos Dec 16 '14 at 05:20
  • I tried to clarify the wording of the question. – Robert T. McGibbon Dec 16 '14 at 05:34
  • Generate the T data sets using the same underlying parameters so you have hypothetical/theoretical parameters to compare to. Any other procedure wouldn't accomplish what you're trying to accomplish – shadowtalker Dec 16 '14 at 14:24
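A minimal sketch of the Q-Q check described in "poor solution #1" (hypothetical, continuing from the harness sketch above and treating the true $\mu$ as known):

# Standardize the remaining estimates by trial 1's estimated standard error
# and compare against N(0, 1).
z1 <- (fits[-1, "mu.hat"] - mu) / fits[1, "se.hat"]
qqnorm(z1); abline(0, 1)   # points should fall close to the 45-degree line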

2 Answers


It appears possible to get a pretty good test for this based on the empirical distribution of the standardized pairwise differences between the estimates, $z_{ij} = \frac{\hat{\mu}^{(i)} - \hat{\mu}^{(j)}}{\sqrt{(\sigma_{\hat{\mu}}^{(i)})^2 + (\sigma_{\hat{\mu}}^{(j)})^2}}$, which should be $\mathcal{N}(0,1)$ if everything is kosher.

I made a short example in the IPython notebook.
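A minimal R sketch of the same check (hypothetical, reusing the fits matrix and column names from the harness sketch in the question):

mu.hat <- fits[, "mu.hat"]; se.hat <- fits[, "se.hat"]
idx <- combn(nrow(fits), 2)                       # all pairs (i, j) with i < j
z <- (mu.hat[idx[1, ]] - mu.hat[idx[2, ]]) /
     sqrt(se.hat[idx[1, ]]^2 + se.hat[idx[2, ]]^2)

qqnorm(z); abline(0, 1)   # should hug the 45-degree line if the variances are right
ks.test(z, "pnorm")       # rough check only: the z_ij are not mutually independent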


Simple example: The MLE of the mean $\mu$ is known to be $\bar x$. For an iid sample, the asymptotic variance is $\sigma^2$ ($\sqrt{n}(\bar x -\mu)\to_d N(0,\sigma^2)$). The MLE of $\sigma^2$ is $\sum_i(x_i-\bar x)^2/n$.

In this example, the bias of the MLE of $\sigma^2$ is known analytically, but if the situation is more complicated, you could estimate it by Monte Carlo simulation:

mlesig <- function(n, mu, sigma){
  x <- rnorm(n, mu, sigma)          # simulate one sample of size n
  xbar <- mean(x)
  sighat.sq <- mean((x - xbar)^2)   # MLE of sigma^2 (divides by n, not n - 1)
  sighat.sq - sigma^2               # return this replication's estimation error
}

mean(replicate(10000, mlesig(10, 1, 2)))  # Monte Carlo estimate of the bias

Here, the analytical bias is $-\sigma^2/n$, which equals $-0.4$ for the given parameter values, and the simulation recovers it rather precisely.

You might similarly assess the variance of the variance estimator, and so on.
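For instance, under the normal example above the exact finite-sample variance of the variance MLE is $2\sigma^4(n-1)/n^2$, while the asymptotic approximation is $2\sigma^4/n$. A sketch of the comparison, reusing mlesig from above:

sims <- replicate(10000, mlesig(10, 1, 2))   # each value is sighat.sq - sigma^2
var(sims)                    # Monte Carlo variance of the variance MLE
2 * 2^4 * (10 - 1) / 10^2    # exact finite-sample value: 2*sigma^4*(n-1)/n^2 = 2.88
2 * 2^4 / 10                 # asymptotic approximation: 2*sigma^4/n = 3.2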

Christoph Hanck