What are methods for evaluating the predictive probability, or something equivalent, in a regression problem? In classification problems, a predictive probability is output for a given input. For example, in a neural network or a logistic regression, if the task is to classify a cat or a dog given a flattened image as a vector input, the model would output 70% dog and 30% cat, and that could be used as a way to know how "confident" the model is in its prediction. What is the equivalent in a non-classification regression problem, and how is it calculated?
-
Could you please give an explicit example of what you mean by a probability prediction (e.g., logistic regression)? – Dave Sep 13 '21 at 19:37
-
@Dave updated post – user8714896 Sep 13 '21 at 19:44
-
Perhaps you could use [prediction intervals](https://en.wikipedia.org/wiki/Prediction_interval)? – Adrià Luz Sep 13 '21 at 20:01
3 Answers
A fully specified regression model (of any kind) gives you a parameterized model of the regression function $E[Y | X]$, from which, perhaps together with other information, you can construct a model of the distribution $P(Y | X)$.
In logistic regression $P(Y | X)$ is assumed to be Bernoulli (or Binomial), so $E[Y | X] = P(Y = 1 | X)$ (the conditional mean just is the conditional probability), and that is all you need to characterize the Bernoulli (or Binomial) PMF.
In regression $P(Y | X)$ might rather be Normal, so you'll still get $E[Y | X]$ (and it's still a conditional mean) but you'll also need (and get) a model of $\text{Var}[Y | X]$ (a separate conditional variance). Together these quantities are enough to characterize any Normal PDF.
There are, of course, lots of ways to evaluate probabilistic predictions, but the details will always depend on the distribution in question. For discrete regressions like the Bernoulli: calibration, precision, recall, etc. For continuous ones like the Normal: mean squared error, log predictive density, predictive checks, etc.
The bottom line is that you're getting $P(Y | X)$ from all regression models, one way or another; it just looks different depending on what you assume about the nature of $Y$. So you evaluate those probabilities in a way that makes sense for $Y$.
If you don't want to evaluate the confidence of a prediction but rather just express it, then @adrià-luz's suggestion is a natural one. That boils down the conditional distribution to a mean and an interval that is expected to contain some proportion (e.g., 95%) of observations of $Y$ for a particular value of $X$. Prediction intervals also fold parameter-estimation uncertainty into the construction of the $P(Y | X)$ model, but it's the mean-plus-interval construction that's doing the expressive work.
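A minimal sketch of the Normal case, assuming plain least squares (numpy/scipy; the data and names are illustrative): fit $E[Y|X]$, estimate $\text{Var}[Y|X]$ from the residuals, then score held-out points under the resulting Normal $P(Y|X)$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, 200)
x_tr, x_te, y_tr, y_te = x[:150], x[150:], y[:150], y[150:]

# E[Y|X]: least-squares line; Var[Y|X]: residual variance (assumed constant)
beta, alpha = np.polyfit(x_tr, y_tr, 1)
sigma2 = np.var(y_tr - (alpha + beta * x_tr), ddof=2)

# P(Y|X) is modeled as Normal(alpha + beta*x, sigma2); evaluate it on held-out data
log_score = norm.logpdf(y_te, loc=alpha + beta * x_te, scale=np.sqrt(sigma2)).mean()
print("average log predictive density:", log_score)
```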

-
Do you have any good suggestions for constructing a Normal model of $P(Y|X)$ for a regression model that isn't, say, a neural network? How would you construct a model like that? I'm assuming the input is $X$ and the output is $Y$, along with a $\mu$ and $\sigma$? – user8714896 Sep 14 '21 at 19:46
-
@user8714896 What people call 'neural networks' is rather broad lately, but I imagine plain old linear regression (e.g., what Eoin describes) isn't one, nor is a regression tree (or better, a regression forest) or a kernel smoother. Support Vector Machines aren't neural networks either. For some of these you'll have to adjust the outputs to get conditional uncertainty out of them, for others not. – conjugateprior Sep 15 '21 at 12:55
-
Re notation: think of it maybe like this. The true expected value of $Y$ (case output) for some value of $X$ (case input, which may be a vector of attributes) is $E[Y|X]$ by definition. That's the regression function. You're going to approximate it from sample data with some function $f(X)$. That function is going to be augmented with something like $\sigma^2$ (a variance) to express the fact that the $Y$s aren't going to be exactly equal to $f(X)$. Now you're committed to $P(Y|X)$ being approximated by $\text{Normal}(f(X), \sigma^2)$. However, the choice of $f$ is yours. – conjugateprior Sep 15 '21 at 13:34
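A minimal sketch of that recipe with a non-network $f$, here a random forest (scikit-learn assumed; all names and data are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from scipy.stats import norm

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 300)

# f(X): any regression model approximating E[Y|X]
f = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:200], y[:200])

# sigma: residual SD estimated on held-out data
sigma = np.std(y[200:] - f.predict(X[200:]))

# P(Y|X) ~= Normal(f(x), sigma^2); e.g., a 95% interval at a new point
x_new = np.array([[5.0]])
mu = f.predict(x_new)[0]
lo, hi = norm.interval(0.95, loc=mu, scale=sigma)
print(f"P(Y|x=5) ~= Normal({mu:.2f}, {sigma:.2f}^2), 95% interval [{lo:.2f}, {hi:.2f}]")
```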
A simpler answer: you could communicate confidence in a regression prediction using any of the following.
For clarity, let's say your regression model is
- $\mu = \alpha + \beta X$ (Expected value)
- $y = \mu + \epsilon$ (Observed values are expected values plus residuals)
- $\epsilon \sim N(0, \sigma)$ (Residuals are Normally distributed with SD $\sigma$)
You could report
- Mean absolute error, $\text{mean}(|\epsilon|)$: In the training data, how far, on average, each point was from the regression line.
- Root mean squared error, $\sigma$: In the training data, the standard deviation of distances from the line.
- Confidence interval: Reflects uncertainty in the expected value $\mu$.
- Prediction interval: Reflects both uncertainty in $\mu$ and the spread of the data due to $\sigma$.
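A minimal sketch of how these could be computed, assuming ordinary least squares via statsmodels (the data and variable names are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 100)   # alpha = 2, beta = 0.5, sigma = 1

fit = sm.OLS(y, sm.add_constant(x)).fit()

print("MAE:  ", np.abs(fit.resid).mean())     # mean absolute error
print("sigma:", np.sqrt(fit.scale))           # residual SD (root mean squared error)

# 95% confidence and prediction intervals at a new point x = 5
x_new = sm.add_constant(np.array([5.0]), has_constant="add")
frame = fit.get_prediction(x_new).summary_frame(alpha=0.05)
print(frame[["mean", "mean_ci_lower", "mean_ci_upper",   # confidence interval for mu
             "obs_ci_lower", "obs_ci_upper"]])            # prediction interval for y
```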

- One method for regression is the MDN, or mixture density network. The idea is to model the output of the network as a mixture of Gaussians: instead of outputting a single number each time, the network outputs several numbers that describe a Gaussian distribution, for example the $\mu$ and $\sigma$ of a Normal distribution. The goal is then to maximize the average log likelihood (i.e. the loss is the negative log likelihood). In this case each prediction comes with its own $\sigma$, which describes the confidence in that prediction (a minimal sketch is given after this list).
You can read more about it here: https://towardsdatascience.com/a-hitchhikers-guide-to-mixture-density-networks-76b435826cca
- Another method, which is good both for regression and for classification, is the Bayesian neural network. The basic idea here is to model the weights as Normal distributions instead of point values; these distributions describe the posterior over the weights. To get the confidence you sample weights from the posterior, run the network forward for each sample, and calculate the standard deviation of the resulting predictions (see the second sketch below).
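A minimal sketch of the single-Gaussian special case of an MDN, assuming PyTorch (layer sizes and names are illustrative):

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Network that outputs the mean and SD of a Normal distribution."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, 1)
        self.log_sigma = nn.Linear(hidden, 1)   # predict log sigma so sigma > 0

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.log_sigma(h).exp()

def nll_loss(mu, sigma, y):
    # negative log likelihood of y under Normal(mu, sigma)
    return -torch.distributions.Normal(mu, sigma).log_prob(y).mean()

# usage: one training step on fake data
model = GaussianHead(in_dim=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 3), torch.randn(32, 1)
mu, sigma = model(x)
loss = nll_loss(mu, sigma, y)
opt.zero_grad(); loss.backward(); opt.step()
# after training, sigma gives a per-input confidence for each prediction
```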
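And a minimal sketch of the Bayesian prediction step, assuming a Gaussian approximate posterior whose means and SDs (`w_mu`, `w_sd`) have already been learned (the one-layer model and all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical learned posterior over the weights of a one-layer model y = x @ w
w_mu = np.array([0.5, -1.2, 2.0])   # posterior means
w_sd = np.array([0.1, 0.3, 0.2])    # posterior standard deviations

def predict(x, n_samples=1000):
    # draw weight samples from the posterior and run a forward pass for each
    w = rng.normal(w_mu, w_sd, size=(n_samples, 3))
    preds = w @ x                     # one prediction per weight sample
    return preds.mean(), preds.std()  # predictive mean and its uncertainty

x = np.array([1.0, 0.5, -0.3])
mean, sd = predict(x)
print(f"prediction: {mean:.2f} +/- {sd:.2f}")
```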
