When performing regression to fit a function $f(x, {\bf \beta})$ to a set of observed data $y_i(x_i)$, we seek the parameters ${\bf \beta}$ of the fitting function that minimize some measure of the deviation between $f(x_i, {\bf \beta})$ and $y_i$, relative to the measurement uncertainties.
For a non-linear least squares regression model that assumes heteroscedastic errors, with an uncertainty $\sigma_{y,i}$ on each measurement $y_i$ and $\sigma_{x,i}$ on each explanatory variable $x_i$, we are trying to maximize the likelihood (i.e. the probability of the observed data given the parameters), $$ L \propto \prod_i \exp\left[ -\frac{1}{2} \left( \frac{y_i - f(x_i + \sigma_{x,i}, {\bf \beta})}{\sigma_{y,i}} \right)^2 \right], $$
by optimizing the parameters ${\bf \beta}$ of our model. This is equivalent to minimizing the error-weighted sum of squared deviations between the fit and the measurements. But is there an explicit reason why the likelihood takes the Gaussian form displayed above? If our errors $\sigma_{y,i}$ and $\sigma_{x,i}$ were not normally distributed but instead followed some other distribution (e.g. Poisson, Lorentzian), how would the above likelihood function have to be modified?
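
For concreteness, here is a minimal sketch of what I mean by maximizing this likelihood in practice. It uses made-up data arrays and a simple linear model purely for illustration, ignores the $\sigma_{x,i}$ term, and minimizes the negative log of the Gaussian likelihood above (which is the same as minimizing the weighted sum of squared residuals) with `scipy.optimize.minimize`:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: explanatory variable x_i, measurements y_i,
# and per-point uncertainties sigma_{y,i} (made up for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sigma_y = np.array([0.3, 0.2, 0.4, 0.3, 0.5])

def f(x, beta):
    # Example model chosen only for illustration: a straight line,
    # f(x, beta) = beta[0] + beta[1] * x.
    return beta[0] + beta[1] * x

def neg_log_likelihood(beta):
    # Negative log of the Gaussian likelihood above, up to an additive constant:
    # 0.5 * sum_i ((y_i - f(x_i, beta)) / sigma_{y,i})^2, i.e. chi-squared / 2.
    r = (y - f(x, beta)) / sigma_y
    return 0.5 * np.sum(r**2)

# Maximizing L is the same as minimizing -log L.
result = minimize(neg_log_likelihood, x0=[0.0, 1.0])
print("best-fit beta:", result.x)
```

(The linear model and `scipy.optimize.minimize` are just stand-ins here; the particular model and optimizer are not the point of the question.)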