
I want to maximize a log-likelihood function $L$ of the parameters $\beta_i$, $i=1,\dots,k$, and $\alpha_1, \alpha_2$. Ideally, I would estimate all parameters in one step. Unfortunately, the form of my model does not allow one-step estimation. However, for fixed values of $\alpha_1, \alpha_2$, I can find the maximum likelihood estimates of $\beta_i$, $i=1,\dots,k$. So I defined a function $f(\alpha_1, \alpha_2)=L(\alpha_1, \alpha_2,\hat{\beta}(\alpha_1,\alpha_2))$, i.e. the maximized log-likelihood corresponding to given values of $\alpha_1, \alpha_2$, and then maximized $f$ numerically with respect to $\alpha_1, \alpha_2$.
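A minimal sketch of this nested (profile) scheme in Python. It assumes a hypothetical model that is not from the question, $y_i = \beta_1 e^{-\alpha_1 t_i} + \beta_2 e^{-\alpha_2 t_i} + \varepsilon_i$ with Gaussian errors, so that for fixed $(\alpha_1,\alpha_2)$ the $\beta$'s ($k=2$ here) come from linear least squares and the error variance can also be profiled out. The outer maximization uses a simple bounded compass search as a stand-in for whatever optimizer was actually used; the data, bounds, starting values, and function names are all illustrative.

```python
import math
import random

# Hypothetical model (an assumption for illustration, not the asker's model):
#   y_i = beta1*exp(-alpha1*t_i) + beta2*exp(-alpha2*t_i) + eps_i,  eps_i ~ N(0, sigma^2)

def simulate(n=51, alpha=(0.5, 2.0), beta=(3.0, 1.0), sd=0.01, seed=1):
    """Generate synthetic data on t = 0, 0.1, ..., 0.1*(n-1)."""
    rng = random.Random(seed)
    t = [0.1 * i for i in range(n)]
    y = [beta[0] * math.exp(-alpha[0] * ti) + beta[1] * math.exp(-alpha[1] * ti)
         + rng.gauss(0.0, sd) for ti in t]
    return t, y

def beta_hat(a1, a2, t, y):
    """MLE of (beta1, beta2) for fixed alphas: solve the 2x2 normal equations."""
    e1 = [math.exp(-a1 * ti) for ti in t]
    e2 = [math.exp(-a2 * ti) for ti in t]
    s11 = sum(u * u for u in e1)
    s22 = sum(v * v for v in e2)
    s12 = sum(u * v for u, v in zip(e1, e2))
    r1 = sum(u * yi for u, yi in zip(e1, y))
    r2 = sum(v * yi for v, yi in zip(e2, y))
    det = s11 * s22 - s12 * s12  # nonzero as long as a1 != a2
    return (s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det

def f(a1, a2, t, y):
    """Profile log-likelihood f(a1, a2) = L(a1, a2, beta_hat(a1, a2)), sigma^2 set to RSS/n."""
    b1, b2 = beta_hat(a1, a2, t, y)
    rss = sum((yi - b1 * math.exp(-a1 * ti) - b2 * math.exp(-a2 * ti)) ** 2
              for ti, yi in zip(t, y))
    n = len(y)
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)

def compass_maximize(g, x0, bounds, step=0.5, tol=1e-6):
    """Derivative-free compass search: maximize g(*x) inside box bounds."""
    x, gx = list(x0), g(*x0)
    while step > tol:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for d in (step, -step):
                cand = list(x)
                cand[i] = min(hi, max(lo, cand[i] + d))  # clip to the bounds
                gc = g(*cand)
                if gc > gx:
                    x, gx, improved = cand, gc, True
        if not improved:
            step /= 2  # no better neighbour: refine the pattern
    return x, gx

t, y = simulate()
bounds = [(0.1, 1.0), (1.2, 4.0)]  # disjoint ranges keep alpha1 != alpha2
(a1_hat, a2_hat), f_max = compass_maximize(lambda a1, a2: f(a1, a2, t, y), (0.3, 3.0), bounds)
b1_hat, b2_hat = beta_hat(a1_hat, a2_hat, t, y)
print(a1_hat, a2_hat, b1_hat, b2_hat, f_max)
```

The $\beta$'s are never searched over numerically: they are recomputed inside every evaluation of `f`, so the optimizer only ever sees a function of $(\alpha_1,\alpha_2)$.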

Does this approach solve the inconsistency of the two-stage estimation that I have? Is it a valid approach at all? If not, is there another estimation method I could use?

Stat
  • I think that what you are doing is a form of "coordinate descent". If you google it, you should find quite a lot of literature on the topic, as it is a widely used class of optimization algorithms. – Matteo Fasiolo May 08 '14 at 22:20
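A minimal sketch of block coordinate ascent, on a hypothetical two-exponential model that is an assumption for illustration and not the asker's model. It cycles through the blocks $\beta$ (closed form), $\alpha_1$, and $\alpha_2$ (each by a golden-section line search), holding the other blocks fixed. Because the error variance is profiled out, maximizing the likelihood within a block is the same as minimizing the residual sum of squares. The helper names, bounds, and starting values are illustrative.

```python
import math
import random

# Hypothetical model (assumption): y = b1*exp(-a1*t) + b2*exp(-a2*t) + N(0, sd^2) noise.
def simulate(n=51, alpha=(0.5, 2.0), beta=(3.0, 1.0), sd=0.01, seed=1):
    rng = random.Random(seed)
    t = [0.1 * i for i in range(n)]
    y = [beta[0] * math.exp(-alpha[0] * ti) + beta[1] * math.exp(-alpha[1] * ti)
         + rng.gauss(0.0, sd) for ti in t]
    return t, y

def rss(a1, a2, b1, b2, t, y):
    """Residual sum of squares; with sigma^2 profiled out, minimizing it maximizes L."""
    return sum((yi - b1 * math.exp(-a1 * ti) - b2 * math.exp(-a2 * ti)) ** 2
               for ti, yi in zip(t, y))

def beta_step(a1, a2, t, y):
    """Exact update of the beta block: solve the 2x2 normal equations."""
    e1 = [math.exp(-a1 * ti) for ti in t]
    e2 = [math.exp(-a2 * ti) for ti in t]
    s11 = sum(u * u for u in e1)
    s22 = sum(v * v for v in e2)
    s12 = sum(u * v for u, v in zip(e1, e2))
    r1 = sum(u * yi for u, yi in zip(e1, y))
    r2 = sum(v * yi for v, yi in zip(e2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det

def golden_min(g, lo, hi, tol=1e-8):
    """Golden-section search for a minimum of a 1-D function on [lo, hi]."""
    invphi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    gc, gd = g(c), g(d)
    while b - a > tol:
        if gc < gd:  # minimum lies in [a, d]
            b, d, gd = d, c, gc
            c = b - invphi * (b - a)
            gc = g(c)
        else:        # minimum lies in [c, b]
            a, c, gc = c, d, gd
            d = a + invphi * (b - a)
            gd = g(d)
    return (a + b) / 2

t, y = simulate()
a1, a2 = 0.4, 2.5  # starting values
b0 = beta_step(a1, a2, t, y)
rss_start = rss(a1, a2, b0[0], b0[1], t, y)
for sweep in range(200):
    b1, b2 = beta_step(a1, a2, t, y)                                        # beta block
    new_a1 = golden_min(lambda a: rss(a, a2, b1, b2, t, y), 0.1, 1.0)      # alpha1 block
    new_a2 = golden_min(lambda a: rss(new_a1, a, b1, b2, t, y), 1.2, 4.0)  # alpha2 block
    converged = abs(new_a1 - a1) + abs(new_a2 - a2) < 1e-7
    a1, a2 = new_a1, new_a2
    if converged:
        break
b1, b2 = beta_step(a1, a2, t, y)
print(sweep, a1, a2, b1, b2)
```

Each sweep can only increase the likelihood when every line search finds its block optimum; with strongly correlated parameters, many sweeps may be needed before the estimates settle.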

1 Answer


This seems analogous to a profile likelihood approach. If you always get the same MLE for $(\alpha_1, \alpha_2)$ no matter what $\beta$ is, then you can maximize with respect to $(\alpha_1, \alpha_2)$ first, and then maximize with respect to $\beta$ conditional on $(\hat{\alpha}_1, \hat{\alpha}_2)$.
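The reasoning can be written as a short chain of inequalities. Write $\alpha=(\alpha_1,\alpha_2)$, let $\hat{\alpha}(\beta)$ be the maximizer of $L(\alpha,\beta)$ over $\alpha$ for fixed $\beta$, and let $\hat{\beta}(\hat{\alpha})$ be the maximizer over $\beta$ at $\alpha=\hat{\alpha}$. The condition is $\hat{\alpha}(\beta)=\hat{\alpha}$ for every $\beta$, so for every $(\alpha,\beta)$

$$
L(\alpha,\beta) \;\le\; L\big(\hat{\alpha}(\beta),\beta\big) \;=\; L(\hat{\alpha},\beta) \;\le\; L\big(\hat{\alpha},\hat{\beta}(\hat{\alpha})\big),
$$

which means that maximizing over $\beta$ with $\alpha$ held at $\hat{\alpha}$ attains the joint maximum.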

If that's not true, then you could switch to a two-dimensional profile likelihood approach: for each point on a suitably fine two-dimensional grid of $(\alpha_1, \alpha_2)$, compute the MLE $\hat{\beta}(\alpha_1,\alpha_2)$ of the $\beta$'s and evaluate the joint log-likelihood $L(\alpha_1,\alpha_2, \hat{\beta})$. The grid point with the largest value gives the overall MLE. It's a brute-force method, but provided the grid covers the maximum and is fine enough, it finds the joint MLE up to the resolution of the grid.
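A minimal sketch of the grid version in Python, assuming a hypothetical two-exponential model (not from the question) so that $\hat{\beta}(\alpha_1,\alpha_2)$ has a closed form. The function `profile_loglik` is an illustrative name; it evaluates $L(\alpha_1,\alpha_2,\hat{\beta})$ with the error variance also profiled out, and the grid ranges and spacings are arbitrary choices.

```python
import math
import random

# Hypothetical model (assumption): y = b1*exp(-a1*t) + b2*exp(-a2*t) + N(0, sd^2) noise.
def simulate(n=51, alpha=(0.5, 2.0), beta=(3.0, 1.0), sd=0.01, seed=1):
    rng = random.Random(seed)
    t = [0.1 * i for i in range(n)]
    y = [beta[0] * math.exp(-alpha[0] * ti) + beta[1] * math.exp(-alpha[1] * ti)
         + rng.gauss(0.0, sd) for ti in t]
    return t, y

def profile_loglik(a1, a2, t, y):
    """L(a1, a2, beta_hat(a1, a2)) for the hypothetical model, with sigma^2 = RSS/n."""
    e1 = [math.exp(-a1 * ti) for ti in t]
    e2 = [math.exp(-a2 * ti) for ti in t]
    s11 = sum(u * u for u in e1)
    s22 = sum(v * v for v in e2)
    s12 = sum(u * v for u, v in zip(e1, e2))
    r1 = sum(u * yi for u, yi in zip(e1, y))
    r2 = sum(v * yi for v, yi in zip(e2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * r1 - s12 * r2) / det  # inner MLE of beta1 at this grid point
    b2 = (s11 * r2 - s12 * r1) / det  # inner MLE of beta2 at this grid point
    rss = sum((yi - b1 * u - b2 * v) ** 2 for yi, u, v in zip(y, e1, e2))
    n = len(y)
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)

t, y = simulate()
grid_a1 = [0.1 + 0.02 * i for i in range(46)]  # about 0.10, 0.12, ..., 1.00
grid_a2 = [1.2 + 0.05 * j for j in range(57)]  # about 1.20, 1.25, ..., 4.00
table = {(a1, a2): profile_loglik(a1, a2, t, y) for a1 in grid_a1 for a2 in grid_a2}
(a1_hat, a2_hat), ll_max = max(table.items(), key=lambda kv: kv[1])
print(a1_hat, a2_hat, ll_max)
```

Every cell of `table` uses its own $\hat{\beta}$. Refining the grid around `(a1_hat, a2_hat)`, or using that point as the starting value for a numerical optimizer, narrows the estimate further.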

jsk
  • Thanks jsk. The two-dimensional profile likelihood approach that you mentioned seems to be similar to what I did if we set $f(\alpha_1,\alpha_2)=L(\alpha_1,\alpha_2,\hat{\beta})$. I also used lower and upper bounds for $\alpha_1$ and $\alpha_2$ when maximizing numerically. The difference is that here we build a two-dimensional table to find the optimum values, whereas I used a numerical optimizer. Am I right? – Stat May 08 '14 at 23:29
  • @Stat Not sure. Did you calculate a different $\hat{\beta}$ for every point in the 2-dim table? – jsk May 09 '14 at 00:00
  • Yes, I did. For different values of $\alpha_1$ and $\alpha_2$, I obtained different $\hat{\beta}$ (i.e. ML estimates) and therefore different log-likelihoods. – Stat May 09 '14 at 00:01
  • @Stat Then yes, your approach sounds identical to a two-dimensional profile likelihood approach, which is itself a numerical approach. – jsk May 09 '14 at 00:09
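The equivalence can be checked directly. With $\alpha=(\alpha_1,\alpha_2)$, $\hat{\beta}(\alpha)$ the inner MLE, $f(\alpha)=L(\alpha,\hat{\beta}(\alpha))$, and $\hat{\alpha}$ the maximizer of $f$, every $(\alpha,\beta)$ satisfies

$$
L(\alpha,\beta) \;\le\; L\big(\alpha,\hat{\beta}(\alpha)\big) \;=\; f(\alpha) \;\le\; f(\hat{\alpha}) \;=\; L\big(\hat{\alpha},\hat{\beta}(\hat{\alpha})\big),
$$

so $(\hat{\alpha},\hat{\beta}(\hat{\alpha}))$ is the joint MLE, provided the inner maximization is exact and the outer search (grid or optimizer) finds the global maximum of $f$ rather than a local one.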