
I am estimating a mixed logit discrete choice model using Halton draws, so the likelihood is approximated by (quasi-Monte Carlo) simulation. The code is written in MATLAB, and I use MATLAB's `fminunc` to maximize the total likelihood of the data.

Everything works fine and I am able to get convergence. Now comes the hardest part: computing standard errors. I am currently using the Hessian of the total likelihood function evaluated at the estimated parameter values. Let $H$ be the Hessian and $G$ the gradient of the total likelihood function at the estimated parameter values.

I understand that if the model is specified correctly, I can simply use $H^{-1}$ as the covariance matrix of the parameters and compute standard errors as the square roots of its diagonal elements.

But I am fairly sure the model is misspecified, and for that reason (among others) I would like to use robust standard errors computed with a sandwich estimator.

I am trying to compute sandwich estimator of covariance using the following formula:

$$\operatorname {Avar}\left[(\hat \beta_{ML}-\beta)\right] = H^{-1} (GG') H^{-1}$$

Here again, $H$ is the Hessian of total likelihood function at estimated parameter values and $G$ is the gradient vector of total likelihood function at the estimated parameter values.
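Since I am not particular about MATLAB, here is a minimal self-contained Python/NumPy sketch of the two computations on a hypothetical toy model (an ordinary logit, not my actual mixed logit; all names and data are illustrative). It reproduces the same pattern I describe below: the Hessian-based standard errors look reasonable, while the sandwich built from the total gradient comes out vanishingly small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: plain logit with two parameters
# (a stand-in for the real mixed logit likelihood).
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

# Maximize the total log-likelihood by Newton-Raphson.
beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))
    G = X.T @ (y - mu)                          # total gradient (2-vector)
    H = -(X * (mu * (1 - mu))[:, None]).T @ X   # Hessian of total log-lik
    beta = beta - np.linalg.solve(H, G)

# Recompute G and H at the optimum.
mu = 1 / (1 + np.exp(-X @ beta))
G = X.T @ (y - mu)
H = -(X * (mu * (1 - mu))[:, None]).T @ X
Hinv = np.linalg.inv(H)

# (1) SEs straight from the Hessian: square roots of diag(H^{-1}),
#     up to sign (H is negative definite at a maximum).
se_hessian = np.sqrt(np.diag(-Hinv))

# (2) "Sandwich" built from the TOTAL gradient, as in my formula above.
V_total = Hinv @ np.outer(G, G) @ Hinv
se_total = np.sqrt(np.diag(V_total))

print("Hessian-based SEs:        ", se_hessian)
print("Total-gradient 'sandwich':", se_total)
```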

However, when I use the sandwich estimator I get some really small numbers like these:

 [1.05168538219926e-06
  5.9339853972794e-07
  7.01710867889658e-07
  7.97963642190759e-07
  1.13620559901437e-07
  1.13044986953749e-06
  6.55610827346093e-07
  4.43881339088365e-07
  1.84278850463206e-07
  2.22828877738744e-08
  4.10861810690839e-05
  4.54442639404077e-06
  2.21053267440383e-06
  3.37916846394782e-05
  8.28011445689887e-06]

Whereas when I compute standard errors directly from the inverse of the Hessian, $H^{-1}$, I get numbers like these:

   [0.0577079149534054
    0.0404864700640107
    0.0461888908236772
    0.0261238062383719
    0.0102708428131226
    0.0281533361985351
    0.0140083835916406
    0.0363304266097451
    0.00659924659481049
    0.0138356739092131
    0.160015160061869
    0.148202640816394
    0.142970918172629
    0.165068255115429
    0.0475734540195273]

The two sets are nothing alike, and I am not sure where my computations go wrong. I would greatly appreciate it if somebody could point out the error. Thanks!

  • I'm not familiar with MATLAB but I suspect that the problem is the specification of the gradient. To compute the sandwich estimator, $G$ needs to be the matrix of _observation-wise_ contributions to the gradient rather than the vector of _sums_ of contributions over all observations. In your case $G$ seems to be the latter and hence is essentially zero, because this is just the first-order condition of your optimization. – Achim Zeileis Apr 19 '15 at 13:38
  • Thank you, that makes sense. I was thinking about this since many papers also write about the sandwich estimator in that way. Can you please let me know how these contributions are computed? I cannot get my head around this, since my likelihood function cannot be differentiated analytically. I can compute the entire gradient at the parameter values, but I am not sure how to compute observation-wise contributions to the gradient. Thanks again for your very informative reply. – Chenna Cotla Apr 19 '15 at 22:07
  • I am not particular about MATLAB here at all. An answer in whatever language you are comfortable with, or even just in notation, is perfectly fine. Thank you. – Chenna Cotla Apr 19 '15 at 22:20
  • You need the derivative of the log-likelihood for one observation with respect to the parameter vector. If your observations are independent, the total gradient vector that you computed is just the sum of these observation-wise derivatives. I'm not familiar with the specific mixed logit model you want to estimate, but I know that the R package `mlogit` supports some types of mixed logit models and returns the full matrix as `$gradient`. This can then also be used with the `sandwich` package for estimating the covariance matrix. – Achim Zeileis Apr 19 '15 at 22:34
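Following up on the comments above, when the (simulated) log-likelihood has no analytic derivative, one option is to build the observation-wise score matrix by central finite differences of the per-observation log-likelihood contributions. Here is a hedged Python sketch; `loglik_obs` is a hypothetical function you would supply that returns the $n$-vector of per-observation log-likelihoods, and the same simulation (Halton) draws must be reused at every evaluation so the differences are not dominated by simulation noise:

```python
import numpy as np

def obs_scores(loglik_obs, beta, h=1e-5):
    """Numerical observation-wise scores by central differences.

    loglik_obs(beta) must return an n-vector of log-likelihood
    contributions, one per decision maker. Returns an n x k matrix
    whose rows are the per-observation gradient contributions g_i.
    """
    k = len(beta)
    n = len(loglik_obs(beta))
    g = np.empty((n, k))
    for j in range(k):
        e = np.zeros(k)
        e[j] = h
        g[:, j] = (loglik_obs(beta + e) - loglik_obs(beta - e)) / (2 * h)
    return g

def sandwich_se(loglik_obs, beta_hat, H):
    """Robust SEs from the score matrix and the Hessian H of the
    *total* log-likelihood at beta_hat (the minus signs in the two
    H^{-1} factors cancel)."""
    g = obs_scores(loglik_obs, beta_hat)
    B = g.T @ g                       # sum of outer products g_i g_i'
    Hinv = np.linalg.inv(H)
    V = Hinv @ B @ Hinv               # sandwich covariance matrix
    return np.sqrt(np.diag(V))
```

As a sanity check, on a model with known analytic scores (e.g. a normal mean with known variance, where $g_i = x_i - \mu$ and $H = -n$) the numerical result matches the analytic sandwich formula.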

0 Answers