For example, suppose I'm using a machine learning model like gradient boosting that, given an input $x_i$, predicts the expected output $f(x_i) = \hat{y}_i$.
However, I'm also interested in estimating the conditional variance of the output for each input, $\operatorname{Var}(Y \mid X = x_i)$. This variance should differ across inputs: each $x_i$ is a different feature vector, so each induces its own conditional distribution over the output.
Since I know $\operatorname{Var}(Y) = E[Y^2] - E[Y]^2$, and the same identity holds conditionally, $\operatorname{Var}(Y \mid X) = E[Y^2 \mid X] - E[Y \mid X]^2$, can I estimate the variance for each sample using the following process?
- Fit the model to the training data to generate estimates of $E[Y \mid X]$.
- Fit the model to the training data again, but with the transformed target $y^* := y^2$, to generate estimates of $E[Y^2 \mid X]$.
- Now, given some input $x_i$, I can predict both $\hat{y}_i$ and $\widehat{y^2_i}$, and use $\widehat{y^2_i} - \hat{y}_i^2$ to estimate the variance of $y_i$ given $x_i$. (A sketch of this procedure follows the list.)
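To make this concrete, here is a minimal sketch of the procedure. I'm assuming scikit-learn's `GradientBoostingRegressor` as the gradient boosting model and inventing a small heteroscedastic dataset purely for illustration; the clip at zero is there because nothing forces the two fitted models to be mutually consistent:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic heteroscedastic data: Var(Y | X = x) = (0.5 + x)^2,
# so the true conditional variance differs across inputs.
n = 5000
X = rng.uniform(0.0, 2.0, size=(n, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5 + X[:, 0])

# Step 1: fit a model for E[Y | X].
mean_model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Step 2: fit a second copy of the model on the squared target,
# aiming at E[Y^2 | X].
sq_model = GradientBoostingRegressor(random_state=0).fit(X, y ** 2)

# Step 3: plug into Var(Y | X) = E[Y^2 | X] - E[Y | X]^2, clipping
# at zero since the raw difference can come out negative.
x_new = np.array([[0.25], [1.0], [1.75]])
var_hat = np.clip(
    sq_model.predict(x_new) - mean_model.predict(x_new) ** 2, 0.0, None
)

print(var_hat)                   # estimated conditional variances
print((0.5 + x_new[:, 0]) ** 2)  # true values, for comparison
```

The fact that the raw difference can be negative at all, when a variance never is, is part of what makes me suspicious of the approach.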
This doesn't feel correct, but I can't put my finger on why. My more general question is: if I have a model that predicts expected outcomes, can I use that same model to predict variances?