What methods can we use to predict probability distributions?

Question

I'm wondering what methods we can use to predict a probability distribution. Essentially, given some observation $x$, I'm interested in calculating quantities such as $P(y = 3 | x)$ or $P(y = -2 | x)$ and so on. I know most ML methods are focused on giving point estimates rather than distributions. Does anybody have any advice on methods to look into?

The most basic procedure, OLS regression, does this. Almost all parametric regression models and procedures do this. The idea is that these "point estimates" pin down the conditional distributions. — whuber, May 07 '21 at 13:27
There are several methods for this. If you would like to predict a *discrete* probability distribution, you could consider Naive Bayes or logistic/multinomial regression. I suppose you would actually like to predict a *continuous* probability distribution (a density function)? In that case you could look into quantile regression (https://en.wikipedia.org/wiki/Quantile_regression). Please specify the type of your observation data $x$: is it numerical or categorical? Is it low- or high-dimensional? And do you have any assumptions in mind about the relation between $x$ and $y$? — Bas van der Reijden, Nov 24 '19 at 19:35
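
For the discrete case mentioned in the comment above, a minimal sketch of how a multinomial logistic (softmax) model turns an observation $x$ into a full distribution over outcomes might look like this. The coefficients, the label set {-2, 0, 3} and the helper names `softmax` and `predict_distribution` are made up for illustration; in practice the coefficients would be fitted to data.

```python
import math

def softmax(scores):
    """Turn real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_distribution(x, weights, biases, labels):
    """Return {label: P(y = label | x)} under a multinomial logistic model."""
    scores = [
        sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return dict(zip(labels, softmax(scores)))

# Hypothetical, hand-picked coefficients (one row per outcome); not fitted.
labels = [-2, 0, 3]
weights = [[0.5, -1.0], [0.0, 0.0], [-0.3, 0.8]]
biases = [0.1, 0.0, -0.2]

dist = predict_distribution([1.0, 2.0], weights, biases, labels)
print("P(y = 3 | x)  =", dist[3])
print("P(y = -2 | x) =", dist[-2])
```

Because $y$ is discrete here, the returned values are genuine probabilities and sum to one over the label set.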

kjetil b halvorsen · Accepted Answer · 2021-09-07T18:44:46.583

Other answers say that traditional regression models, such as linear or logistic regression, already do this, since regression models conditional expectations (or conditional probabilities, conditional hazards, conditional ...). As soon as you calculate a prediction interval with a regression model, you are doing some kind of probabilistic forecasting. See also: Definition and delimitation of regression model.
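
As a rough sketch of that point, the following fits a simple least-squares line and then treats $y \mid x$ as Gaussian around the fitted value. The data, the Gaussian-error assumption and the helper names `fit_ols` and `conditional_dist` are illustrative assumptions. This is a plug-in approximation: it ignores uncertainty in the estimated coefficients, so the interval it gives is somewhat narrower than a proper prediction interval.

```python
import math
from statistics import NormalDist

def fit_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x, plus residual std. dev."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(rss / (n - 2))  # two estimated coefficients
    return intercept, slope, sigma

def conditional_dist(x, intercept, slope, sigma):
    """Plug-in conditional distribution of y given x (Gaussian errors assumed)."""
    return NormalDist(mu=intercept + slope * x, sigma=sigma)

# Made-up data for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]

intercept, slope, sigma = fit_ols(xs, ys)
d = conditional_dist(1.5, intercept, slope, sigma)

density_at_3 = d.pdf(3.0)                  # a density value, not a probability
prob_near_3 = d.cdf(3.5) - d.cdf(2.5)      # P(2.5 < y < 3.5 | x = 1.5)
lo, hi = d.inv_cdf(0.025), d.inv_cdf(0.975)  # approximate 95% interval
print(density_at_3, prob_near_3, (lo, hi))
```

The point estimate (the fitted mean) and the residual spread together pin down the whole conditional distribution, from which densities, interval probabilities and quantiles all follow.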

But the term "probabilistic forecasting" seems to be used more when probability distributions are forecast or predicted in some more flexible or nonparametric way. That is far too large a topic for one answer, but a few links follow:

score 0 · Answer 2 · answered Nov 24 '19 at 19:32

You need a method that estimates the conditional distribution $p(y|x)$. For example, a Bayesian interpretation of linear regression lets you compute $p(y=3|x)$, $p(y=-2|x)$, etc. Note that if $y$ is continuous, these are density values, not probabilities. In general, Bayesian perspectives reinterpret most ML methods so that they yield $p(y|x)$. Similarly, some methods, such as logistic regression or a softmax layer in a neural net, aim to estimate $p(y|x)$ directly.
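
To make the Bayesian linear regression case concrete, here is a minimal sketch for the simplest conjugate setup: one slope, no intercept, a Gaussian prior on the slope and a known noise standard deviation. The model choice, the default values of `noise_sd` and `prior_sd`, the data and the function name `posterior_predictive` are all assumptions for illustration.

```python
from statistics import NormalDist

def posterior_predictive(xs, ys, x_new, noise_sd=1.0, prior_sd=10.0):
    """Posterior predictive p(y | x_new, data) for the assumed model
    y = w * x + eps,  eps ~ N(0, noise_sd^2),  w ~ N(0, prior_sd^2)."""
    # Conjugate Gaussian update for the slope w.
    precision = 1.0 / prior_sd**2 + sum(x * x for x in xs) / noise_sd**2
    post_var = 1.0 / precision
    post_mean = post_var * sum(x * y for x, y in zip(xs, ys)) / noise_sd**2
    # Predictive variance = observation noise + uncertainty about w.
    pred_mean = post_mean * x_new
    pred_var = noise_sd**2 + x_new**2 * post_var
    return NormalDist(mu=pred_mean, sigma=pred_var ** 0.5)

# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.2, 1.9, 3.2, 3.8]

pred = posterior_predictive(xs, ys, x_new=3.0)
print("p(y = 3 | x)  =", pred.pdf(3.0))    # density value
print("p(y = -2 | x) =", pred.pdf(-2.0))   # density value
print("P(2.5 < y < 3.5 | x) =", pred.cdf(3.5) - pred.cdf(2.5))  # probability
```

The first two printed numbers are densities; only the last one, obtained by integrating the density over an interval, is a probability.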
