I have read in several papers that one can regress the squared residuals from a conditional mean regression of a variable $X$ on a set of predictor variables, and interpret the fitted values as the conditional variance of $X$. For example:

> One can regress excess returns onto a set of conditioning variables. The resulting squared residuals are then regressed onto the same set of conditioning variables. The conditional variance will be the fitted values from the second regression (Filipovic & Khalilzadeh, 2021).

What is the underlying logic here? Are the squared residuals interpreted as an estimate of the variance of $X$? Also, why does it have to be the same set of conditioning variables? I have not come across this approach before and am unaware of the reasoning behind it.
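
For concreteness, here is a minimal sketch of the two-step procedure as I understand it (the data and variable names are my own illustration, not from the paper):

```python
import numpy as np
import statsmodels.api as sm

# Simulated stand-in data: y = excess returns, Z = conditioning variables.
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 2))
y = 0.3 * Z[:, 0] + rng.normal(size=n) * np.exp(0.5 * Z[:, 1])

Zc = sm.add_constant(Z)

# Stage 1: conditional mean regression; the residuals estimate y - E[y | Z].
mean_fit = sm.OLS(y, Zc).fit()
e2 = mean_fit.resid ** 2

# Stage 2: regress the squared residuals on the same conditioning variables.
# Since the residuals have (approximately) zero conditional mean,
# E[e^2 | Z] = Var(y | Z), so the fitted values estimate the conditional variance.
# Caveat: an unconstrained linear fit can produce negative variance estimates.
var_fit = sm.OLS(e2, Zc).fit()
cond_var = var_fit.fittedvalues
```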

shenflow

1 Answer

You can check my question Analysing the residuals themselves; what you are describing is a variation of a residual model, or residual index model. What the authors seem to be doing is a poor man's latent variable model: instead of using a proper latent variable model, the estimation is done in two stages. It can be done, but in many cases approaches like this are suboptimal. Notice that the first regression assumes the errors are independent and identically distributed, hence of constant variance, while the second regression assumes the variance changes with the conditioning variables; the two stages make contradictory assumptions, so at least one of the models must be misspecified.

A more reasonable approach would be a single model for both the mean and the variance, without the assumption of constant variance. Such "custom" models can easily be fitted using probabilistic programming frameworks like Stan or PyMC, or with R's nlme package, which lets you define regression models with custom variance components.
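
For example, a minimal PyMC sketch of such a joint mean-variance model (the linear mean, log-linear standard deviation, simulated data, and priors are illustrative assumptions, not the only possible choices):

```python
import numpy as np
import pymc as pm

# Simulated stand-in data: replace with your returns y and conditioning variables X.
rng = np.random.default_rng(0)
n, k = 500, 2
X = rng.normal(size=(n, k))
y = 0.3 * X[:, 0] + rng.normal(size=n) * np.exp(0.5 * X[:, 1])

with pm.Model() as model:
    # Mean function f(X): a linear regression.
    a_mu = pm.Normal("a_mu", 0, 10)
    b_mu = pm.Normal("b_mu", 0, 10, shape=k)
    mu = a_mu + pm.math.dot(X, b_mu)

    # Standard deviation g(X): log-linear, so sigma stays positive.
    a_s = pm.Normal("a_s", 0, 10)
    b_s = pm.Normal("b_s", 0, 10, shape=k)
    sigma = pm.math.exp(a_s + pm.math.dot(X, b_s))

    # Likelihood: y ~ N(f(X), g(X)), mean and variance modeled jointly.
    pm.Normal("obs", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample()
```

The posterior of `sigma` then gives the conditional standard deviation at each observation, estimated jointly with the mean rather than in two mutually contradictory stages.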

Tim
  • Thank you for your answer. Since I usually use Python, PyMC seems interesting. However, what would such a "custom" model be? (That I could then implement via PyMC.) – shenflow Feb 15 '22 at 10:15
  • @shenflow it depends on what assumptions you wish to make about the model. But in general, something like $$ \mu = f(X) \\ \sigma = g(X) \\ y \sim \mathcal{N}(\mu, \sigma) $$ where $f$ and $g$ are linear regressions (or something else). – Tim Feb 15 '22 at 10:18
  • Okay, I understand. And what would my LHS variable in $g(X)$ be, as opposed to the squared residuals resulting from $f(X)$ in the above-mentioned approach? – shenflow Feb 15 '22 at 10:31
  • @shenflow there is no LHS; you can write the same model as $y \sim \mathcal{N}(f(X), g(X))$. – Tim Feb 15 '22 at 10:34