
Given output from optim that includes a Hessian matrix, how do I calculate parameter confidence intervals using that Hessian?

fit <- optim(..., hessian = TRUE)

hessian <- fit$hessian

I am mostly interested in the context of maximum likelihood analysis, but am curious to know whether the method can be extended beyond it.

Etienne Low-Décarie

1 Answer


If you are maximising a likelihood, then the covariance matrix of the estimates is (asymptotically) the inverse of the negative of the Hessian. The standard errors are the square roots of the diagonal elements of this covariance matrix (credit to Prof. Thomas Lumley and Spencer Graves, from elsewhere on the web!).

For a 95% confidence interval

fit <- optim(pars, li_func, control = list(fnscale = -1), hessian = TRUE, ...)
fisher_info <- solve(-fit$hessian)      # inverse negative Hessian = asymptotic covariance
prop_sigma <- sqrt(diag(fisher_info))   # standard errors
# prop_sigma <- diag(prop_sigma)        # erroneous: turns the vector into a matrix (see comments)
upper <- fit$par + 1.96 * prop_sigma
lower <- fit$par - 1.96 * prop_sigma
interval <- data.frame(value = fit$par, upper = upper, lower = lower)
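A self-contained sketch of this recipe, using a normal model fitted to simulated data (the model, sample size, and starting values are my own illustrative assumptions, not part of the question):

```r
# Simulated data from an assumed normal model (mu = 5, sd = 2 for illustration)
set.seed(1)
x <- rnorm(200, mean = 5, sd = 2)

# Log-likelihood; par = c(mu, log_sd) -- the log scale keeps sd positive
loglik <- function(par) sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))

# fnscale = -1 tells optim to maximize; hessian = TRUE returns the Hessian of loglik
fit <- optim(c(mean(x), log(sd(x))), loglik,
             control = list(fnscale = -1), hessian = TRUE)

fisher_info <- solve(-fit$hessian)       # asymptotic covariance matrix
prop_sigma  <- sqrt(diag(fisher_info))   # standard errors
interval <- data.frame(value = fit$par,
                       lower = fit$par - 1.96 * prop_sigma,
                       upper = fit$par + 1.96 * prop_sigma)
print(interval)
```

For the mean, the Hessian-based standard error should be close to the textbook value sd/sqrt(n), here roughly 2/sqrt(200) ≈ 0.14.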

Note that:

  • If you are maximizing the log(likelihood), then the NEGATIVE of the Hessian is the "observed information" (such as here).
  • If you MINIMIZE a "deviance" = (-2)*log(likelihood), then HALF of the Hessian is the observed information.
  • In the unlikely event that you are maximizing the likelihood itself, you need to divide the negative of the Hessian by the likelihood to get the observed information.
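The deviance case in the second bullet can be sketched as follows; the normal model and simulated data here are illustrative assumptions, not part of the question:

```r
# Simulated data from an assumed normal model (mu = 5, sd = 2 for illustration)
set.seed(1)
x <- rnorm(200, mean = 5, sd = 2)

# Deviance = -2 * log(likelihood); par = c(mu, log_sd)
deviance_fn <- function(par) -2 * sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))

# Minimization is optim's default, so no fnscale is needed here
fit <- optim(c(mean(x), log(sd(x))), deviance_fn, hessian = TRUE)

observed_info <- fit$hessian / 2        # HALF the Hessian is the observed information
se <- sqrt(diag(solve(observed_info)))  # standard errors, as in the log-likelihood case
```

Because the deviance is just -2 times the log-likelihood, these standard errors match those obtained by maximizing the log-likelihood directly.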

See this for further limitations due to the optimization routine used.

Etienne Low-Décarie
  • Corey Chivers provided the answer. – Etienne Low-Décarie Apr 25 '12 at 21:12
  • Have you verified that this method gives confidence intervals with the correct coverage? I was looking for confidence intervals for optim parameters too, but this method seems to give intervals that were too wide in the examples I tried, although it's possible that I made a mistake. – mark999 Apr 26 '12 at 00:36
  • No, I have not yet done simulations to check this out. It would be great to compare this to other methods. – Etienne Low-Décarie Apr 26 '12 at 11:34
  • 2
    (+1) The **inverse** of the negative hessian is an estimator of the asymptotic covariance matrix - I know this appears in your code but I think it's important to point out. – Macro Apr 26 '12 at 12:20
  • @mark999, I have used a similar method in a number of situations to get confidence intervals and have found the coverage probabilities to be about right, although I had a decent sample size (this is an asymptotic result). But, I estimated the fisher information by the (co)variance of the score function (using individual scores as samples from the score function) - this method should produce similar results, though. – Macro Apr 26 '12 at 12:31
  • @Macro, thanks for your comment. Does the above method only apply if the function being optimised is a likelihood function? I was using `optim` to fit a non-linear function to some data using least squares. I was using `optim` instead of `nls` because it was harder to get `nls` to converge to a solution. – mark999 Apr 26 '12 at 20:20
  • Yes, I was referring to when the optimized function is the log-likelihood. The non-linear least squares solution can still be thought of as an MLE if the errors are normally distributed. Note that there are other estimating equations that have the property that the (inverse) negative expected hessian is the asymptotic covariance matrix. – Macro Apr 26 '12 at 20:51
  • 1
    Excellent answer, should the upper and lower bounds read `upper – forecaster Jun 16 '15 at 00:03
  • 4
    Why not delete line 4? – Jason Sep 12 '16 at 19:34
  • 3
    Is including the line `prop_sigma – Mark Miller Jan 21 '17 at 02:58
  • One important distinction I was not aware of: For this to work, you need to be optimizing the *sum* of the log-likelihood, not the *mean*. – jwdink Apr 22 '19 at 18:48
  • FWIW `numDeriv::hessian()` might be worth a look. https://stat.ethz.ch/pipermail/r-help/2009-August/401566.html – swihart Apr 01 '20 at 14:50