nonlinear least squares versus maximum likelihood in R, nls() or nlm()?

Question

I am estimating the model $$E(Y|X) = Pr(Y=1|X) = \alpha_0 + (1 - \alpha_0 - \alpha_1)\phi(X'\beta),$$ where $\alpha_0$ and $\alpha_1$ are parameters, $\beta$ is a $p$-length vector of parameters, $X$ is a $p \times n$ matrix of data, $Y$ is the binary dependent variable, and $\phi()$ comes from a probit model, i.e. it is the cumulative distribution function of the standard normal distribution. To derive the expectation the assumption was made that the errors are normal and mean zero.

The source for the model is here (see equations 6 and 7), and per the paper I can estimate the model either via nonlinear least squares or maximum likelihood. I tried both approaches in R, using the nls() function for nonlinear least squares and the nlm() function for maximum likelihood. Experimentation suggests the results are very similar for my application, but nls() is faster. Is there a reason to favor one approach over the other? How should I think about picking a method, e.g. do similar assumptions underlie both approaches?

Any suggestions for thinking through the differences between these two approaches, or suggestions for relevant literature to consult would be greatly appreciated.

With regard to the faster solution, did you ensure that nls and nlm used the same optimizer? If you used different optimizers, that could explain why one is faster than the other. — forecaster, Jan 30 '17 at 02:03
Good question. I did not check that - I will investigate. Speed is not so important in my application that I would be ill-disposed towards using either model. I'm more interested in knowing whether there are theoretical reasons to prefer one over the other. — gfgm, Jan 30 '17 at 02:17
What kind of variable is $y$? Is it numerical, categorical, binary, etc.? What is $\phi$? And what is your statistical model? What you wrote is not a model - there is no error term. At most, it could be an expression for the conditional mean of $y$ with respect to $x$. If $y$ is a continuous random variable and you assume additive iid Gaussian errors with zero mean and constant variance, then the NLS estimator and the MLE estimator are the same. — DeltaIV, Feb 01 '17 at 15:28
Thanks @DeltaIV, I have edited the post to make it clearer. It is an expression of the conditional mean of y with respect to x as you surmised, and y is binary. — gfgm, Feb 01 '17 at 15:43

score 6 · Accepted Answer · answered Feb 07 '17 at 15:54

> To derive the expectation the assumption was made that the errors are normal and mean zero.

If that is your assumption, MLE and NLS should be mathematically identical, and differences would probably be explained by the choice / setting of the optimizer.
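
To spell out why, here is a short derivation under that assumption. The symbols $m$, $\theta$, $\sigma^2$ and $\varepsilon_i$ are my own notation, not the paper's: write $m(X_i;\theta) = \alpha_0 + (1 - \alpha_0 - \alpha_1)\phi(X_i'\beta)$ with $\theta = (\alpha_0, \alpha_1, \beta)$, and suppose $Y_i = m(X_i;\theta) + \varepsilon_i$ with $\varepsilon_i \sim N(0, \sigma^2)$ independently across the $n$ observations. The log-likelihood is then

$$\ell(\theta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(Y_i - m(X_i;\theta)\bigr)^2 .$$

For any fixed $\sigma^2 > 0$, only the last term depends on $\theta$, and it enters with a negative sign, so

$$\hat\theta_{ML} = \arg\max_\theta \ell(\theta, \sigma^2) = \arg\min_\theta \sum_{i=1}^{n}\bigl(Y_i - m(X_i;\theta)\bigr)^2 = \hat\theta_{NLS}.$$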

Whether a normal distribution for a binary response is a good idea is another question. An alternative would be a logistic glm with your nonlinear predictor, estimated with MLE.
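
As a rough illustration of what changes once the response is treated as binary, the sketch below fits the question's mean function by nls() and by nlm() with a Bernoulli likelihood. The Bernoulli likelihood is my choice here, not the logistic glm mentioned above. The data are simulated, and the sample size, true parameter values, starting values, and the names `dat`, `truth` and `negll` are all assumptions made for the example. Under a Bernoulli likelihood the two estimators are no longer identical, although both target the same conditional mean.

```r
set.seed(42)

# Simulated data; every value here is an illustrative assumption.
n     <- 5000
truth <- c(a0 = 0.05, a1 = 0.10, b0 = -0.5, b1 = 1.0)
x     <- rnorm(n, sd = 2)
p     <- truth["a0"] + (1 - truth["a0"] - truth["a1"]) * pnorm(truth["b0"] + truth["b1"] * x)
dat   <- data.frame(y = rbinom(n, size = 1, prob = p), x = x)

# Nonlinear least squares on the conditional mean E(Y | X).
fit_nls <- nls(
  y ~ a0 + (1 - a0 - a1) * pnorm(b0 + b1 * x),
  data  = dat,
  start = list(a0 = 0.1, a1 = 0.1, b0 = 0, b1 = 1)
)

# Negative Bernoulli log-likelihood for the same mean function.
negll <- function(theta) {
  p <- theta[1] + (1 - theta[1] - theta[2]) * pnorm(theta[3] + theta[4] * dat$x)
  p <- pmin(pmax(p, 1e-10), 1 - 1e-10)  # keep probabilities strictly inside (0, 1)
  -sum(dbinom(dat$y, size = 1, prob = p, log = TRUE))
}
fit_ml <- nlm(negll, p = c(0.1, 0.1, 0, 1), hessian = TRUE)

# Side-by-side estimates: close, but not expected to be equal.
est <- cbind(truth = truth, nls = coef(fit_nls), mle = fit_ml$estimate)
print(round(est, 3))
```

If the two columns differ by much more than sampling noise would suggest, that points to an optimizer or starting-value issue rather than to the estimators themselves.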

If you do MLE, you might want to consider using the bbmle package (https://cran.r-project.org/web/packages/bbmle/index.html) instead of nlm(); it offers more options for CIs and so on.
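
A minimal sketch of what that could look like with bbmle's mle2(), assuming the package is installed and again using a Bernoulli likelihood on simulated data. The simulated values, the starting values, and the function name `negll` are assumptions for illustration only.

```r
library(bbmle)  # assumes the CRAN package bbmle is installed

set.seed(1)

# Simulated data; the true values (0.05, 0.10, -0.5, 1.0) are illustrative.
n <- 5000
x <- rnorm(n, sd = 2)
p_true <- 0.05 + (1 - 0.05 - 0.10) * pnorm(-0.5 + 1.0 * x)
y <- rbinom(n, size = 1, prob = p_true)

# Negative log-likelihood with one named argument per parameter,
# which is the form mle2() expects when given a function.
negll <- function(a0, a1, b0, b1) {
  p <- a0 + (1 - a0 - a1) * pnorm(b0 + b1 * x)
  p <- pmin(pmax(p, 1e-10), 1 - 1e-10)  # keep probabilities strictly inside (0, 1)
  -sum(dbinom(y, size = 1, prob = p, log = TRUE))
}

fit <- mle2(negll, start = list(a0 = 0.1, a1 = 0.1, b0 = 0, b1 = 1))
print(summary(fit))

# Wald-type 95% intervals from the estimated covariance matrix.
se      <- sqrt(diag(vcov(fit)))
wald_ci <- cbind(lower = coef(fit) - 1.96 * se, upper = coef(fit) + 1.96 * se)
print(round(wald_ci, 3))
```

Calling confint(fit) on the fitted object should give profile-likelihood intervals instead, which can be slower but do not rely on the quadratic approximation.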
