Linear regression with error dispersion dependent on the independent variable

Question

Suppose $y=ax+z$, where $x, y, z$ are real-valued random variables, $\mathbf E[x]=0$, and the conditional distribution $p(z|x)$ is either

1) normal distribution $N(0,\sigma(x)^2)$ with mean $0$ and standard deviation $\sigma(x)$ as an unknown function of $x$;

2) student t-distribution $t_{\nu(x)}$ with degrees of freedom $\nu(x)$ an unknown function of $x$,

and $a$ is an unknown constant. Suppose $(x_i,y_i)_{i=1}^n$ is a sample of observations of $(x,y)$. How do we estimate the following?

1) $(a,\sigma(x))$;

2) $(a,\nu(x))$.

Note: This is not the heteroscedasticity problem in the conventional sense where the dispersion parameter depends on the index $i$. The dispersion parameter now depends on the independent variable $x$.

A little more context would be nice. It would be fairly straightforward to write down the maximum likelihood equations for this, and *maybe* to solve for the MLE (I haven't tried). Computationally, you could consider this (at least the first, and maybe the second) a *generalized least squares* problem, and fit it (e.g. with `gls()` from the `nlme` package in R). – Ben Bolker, Feb 22 '19 at 03:46

score 3 · Accepted Answer · answered Feb 22 '19 at 05:10


My guess is that you can reasonably estimate $a$ with OLS in (1), and perhaps with a more robust estimator such as IRLS in (2) (though OLS might still be OK there).
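To make the OLS suggestion concrete for case (1), here is a minimal sketch on simulated data. Since the model $y=ax+z$ has no intercept, the OLS slope is $\hat a=\sum_i x_i y_i/\sum_i x_i^2$. The true slope `a_true`, the uniform distribution of $x$, and the dispersion function $\sigma(x)=e^{0.5x}$ are assumptions made up for the illustration, not part of the question.

```python
import math
import random


def ols_slope(xs, ys):
    """No-intercept OLS slope: the minimiser of sum_i (y_i - a * x_i)**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


random.seed(0)
a_true = 2.0  # illustrative value, not from the question
# x is uniform on [-2, 2], so E[x] = 0 as the question assumes
xs = [random.uniform(-2.0, 2.0) for _ in range(5000)]
# hypothetical dispersion function sigma(x) = exp(0.5 * x)
ys = [a_true * x + random.gauss(0.0, math.exp(0.5 * x)) for x in xs]

a_hat = ols_slope(xs, ys)
print(f"OLS estimate of a: {a_hat:.3f}")
```

Because $\mathbf E[z|x]=0$ in case (1), the estimate stays unbiased even though the noise level changes with $x$; what OLS loses relative to a weighted fit is efficiency.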

Estimating $\sigma(x)$ and $\nu(x)$ is harder. I think you need to decide how to parameterize or quantize (bin) them with respect to $x$. For instance, if you choose some function $f$ with parameters $\theta$ and define your estimate to be $\hat{\sigma}(x)=f(x;\theta)$, then you can fit $\theta$ numerically by maximizing the log-likelihood over the dataset $(x_i, z_i)=(x_i,y_i-ax_i)$, with $a$ replaced by its estimate. The choice of $f$ acts as a kind of "prior" on $\sigma(x)$ or $\nu(x)$.
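For reference, the log-likelihoods being maximized can be written out as follows. This is a sketch under the parameterization $\sigma(x)=f(x;\theta)$ mentioned above; for case (2) I assume, analogously, that $\nu_i=\nu(x_i;\theta)$ comes from some chosen parametric family. In case (1):

$$\ell_1(a,\theta)=-\sum_{i=1}^n\left[\log f(x_i;\theta)+\frac{(y_i-ax_i)^2}{2f(x_i;\theta)^2}\right]-\frac n2\log(2\pi),$$

and in case (2):

$$\ell_2(a,\theta)=\sum_{i=1}^n\left[\log\Gamma\!\left(\frac{\nu_i+1}{2}\right)-\log\Gamma\!\left(\frac{\nu_i}{2}\right)-\frac12\log(\pi\nu_i)-\frac{\nu_i+1}{2}\log\!\left(1+\frac{(y_i-ax_i)^2}{\nu_i}\right)\right].$$

With $a$ held fixed at an estimate, each is a function of $\theta$ alone and can be handed to a generic numerical optimizer.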

I guess you can also estimate $a$ jointly with $\sigma$ or $\nu$ by alternating between the two optimization steps above: update $a$ with the dispersion held fixed, then update the dispersion parameters with $a$ held fixed, and repeat.
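Here is a rough sketch of that alternating scheme for the Gaussian case (1) only. Everything specific in it is an assumption for illustration: the log-linear family $\log\sigma(x)=\theta_0+\theta_1 x$, the function names, and the simulated data. For the dispersion step I use Fisher scoring as one convenient way to maximize the likelihood in $\theta$; any numerical optimizer would do.

```python
import math
import random


def wls_slope(xs, ys, sigmas):
    """No-intercept slope with weights 1/sigma_i**2 (Gaussian MLE of a for fixed sigma)."""
    num = sum(x * y / s ** 2 for x, y, s in zip(xs, ys, sigmas))
    den = sum(x * x / s ** 2 for x, s in zip(xs, sigmas))
    return num / den


def fit_log_sigma(xs, resid, theta, n_iter=50):
    """Fisher scoring for the assumed model log sigma(x) = t0 + t1 * x, given residuals."""
    t0, t1 = theta
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx
    for _ in range(n_iter):
        # Score vector: sum_i (r_i^2 / sigma_i^2 - 1) * (1, x_i)
        g0 = g1 = 0.0
        for x, r in zip(xs, resid):
            w = r * r * math.exp(-2.0 * (t0 + t1 * x)) - 1.0
            g0 += w
            g1 += w * x
        # Expected information is 2 * [[n, sx], [sx, sxx]]; solve the 2x2 system by hand
        d0 = (sxx * g0 - sx * g1) / (2.0 * det)
        d1 = (n * g1 - sx * g0) / (2.0 * det)
        t0 += d0
        t1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return t0, t1


def fit_alternating(xs, ys, n_outer=10):
    """Alternate between updating theta (a fixed) and updating a (theta fixed)."""
    # Start from the unweighted OLS slope and a constant sigma
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    mean_r2 = sum((y - a * x) ** 2 for x, y in zip(xs, ys)) / len(xs)
    theta = (0.5 * math.log(mean_r2), 0.0)
    for _ in range(n_outer):
        resid = [y - a * x for x, y in zip(xs, ys)]
        theta = fit_log_sigma(xs, resid, theta)
        sigmas = [math.exp(theta[0] + theta[1] * x) for x in xs]
        a = wls_slope(xs, ys, sigmas)
    return a, theta


random.seed(1)
a_true, theta_true = 2.0, (0.0, 0.5)  # illustrative values, not from the question
xs = [random.uniform(-2.0, 2.0) for _ in range(4000)]
ys = [a_true * x + random.gauss(0.0, math.exp(theta_true[0] + theta_true[1] * x))
      for x in xs]

a_hat, theta_hat = fit_alternating(xs, ys)
print(f"a_hat = {a_hat:.3f}, theta_hat = ({theta_hat[0]:.3f}, {theta_hat[1]:.3f})")
```

Each half-step does not decrease the Gaussian likelihood in the block being updated (exactly so for the weighted slope, and at convergence for the scoring step), which is why alternating is a reasonable way to approach the joint maximum. For case (2) the same loop structure should carry over, with the t log-likelihood replacing the Gaussian one in both steps.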

Related Links

answered Feb 22 '19 at 05:10

user3658307


+1 and thank you for the prompt and informative answer. Let me read through the links. – Hans Feb 22 '19 at 08:51
@Hans hopefully it helps. PS I think you can get/estimate $\sigma$ from robust standard errors in (1). The same might also work for (2). Perhaps see https://stats.stackexchange.com/questions/275925/linear-regression-with-changing-variance or maybe https://stats.stackexchange.com/questions/258485/simulate-linear-regression-with-heteroscedasticity – user3658307 Feb 22 '19 at 13:10
Sorry for the late acceptance of your answer. Thank you again for your richly informative answer. – Hans Apr 14 '19 at 22:05
