For example, suppose I'm using a machine learning model like gradient boosting that, given an input $x_i$, predicts the expected output $f(x_i) = \hat{y}_i$.
However, I'm also interested in estimating the conditional variance of the output for each input, $\operatorname{Var}(Y \mid X = x_i)$. This variance should differ across inputs: each $x_i$ is a different feature vector, so each induces its own conditional distribution over the output.
Since I know $\operatorname{Var}(Y) = E[Y^2] - E[Y]^2$, and the same identity holds conditionally, $\operatorname{Var}(Y \mid X) = E[Y^2 \mid X] - E[Y \mid X]^2$, can I estimate the variance for each sample using the following process?
- Fit the model to the training data to generate estimates of $E[Y \mid X]$.
- Fit the model to the training data again, but with the transformed target $y^* := y^2$, to generate estimates of $E[Y^2 \mid X]$.
- Now, given some input $x_i$, I can predict both $\hat{y}_i$ and $\widehat{y^2_i}$, and use $\widehat{y^2_i} - \hat{y}_i^2$ to estimate the variance of $y_i$ given $x_i$. (A sketch of this procedure follows the list.)
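To make this concrete, here is a minimal sketch of the procedure. I'm assuming scikit-learn's `GradientBoostingRegressor` as the gradient boosting model and inventing a small heteroscedastic dataset purely for illustration; the clip at zero is there because nothing forces the two fitted models to be mutually consistent:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic heteroscedastic data: Var(Y | X = x) = (0.5 + x)^2,
# so the true conditional variance differs across inputs.
n = 5000
X = rng.uniform(0.0, 2.0, size=(n, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5 + X[:, 0])

# Step 1: fit a model for E[Y | X].
mean_model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Step 2: fit a second copy of the model on the squared target,
# aiming at E[Y^2 | X].
sq_model = GradientBoostingRegressor(random_state=0).fit(X, y ** 2)

# Step 3: plug into Var(Y | X) = E[Y^2 | X] - E[Y | X]^2, clipping
# at zero since the raw difference can come out negative.
x_new = np.array([[0.25], [1.0], [1.75]])
var_hat = np.clip(
    sq_model.predict(x_new) - mean_model.predict(x_new) ** 2, 0.0, None
)

print(var_hat)                   # estimated conditional variances
print((0.5 + x_new[:, 0]) ** 2)  # true values, for comparison
```

The fact that the raw difference can be negative at all, when a variance never is, is part of what makes me suspicious of the approach.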
This doesn't feel correct, but I can't put my finger on why. My more general question is: if I have a model that predicts expected outcomes, can I use that same model to predict variances?