Evaluating quality of predicted distributions

Question

I have a set of data points $(X_i, y_i)$, where the $X_i$ are the independent variables, and I believe each $y_i$ can be modeled as being drawn from an exponential distribution with parameter $\lambda_i$.

If I use $X_i$ to predict $\lambda_i$, how can I evaluate the quality of my predicted distributions with respect to the observations $y_i$?

Edit: This is essentially the same question as "How to evaluate quality of probability estimator for Bernoulli experiments?", but in a continuous context rather than a binomial one. It's not obvious to me what to use in this case instead of cross-entropy.

Matthew Drury · Accepted Answer · 2018-07-10T13:53:38.707

The standard approach is to use the log-likelihood of the exponential distribution. This is exactly how the cross-entropy is derived: it is the negative log-likelihood of the Bernoulli distribution.

In the case of an exponential distribution, the pdf is:

$$ f(y; \lambda) = \lambda e^{-\lambda y} $$

So the log-likelihood is:

$$ LL(\lambda_i; y_i ) = \log(f(y_i; \lambda_i)) = \log(\lambda_i) - \lambda_i y_i$$

So, if $y_i$ are your true values and $\lambda_i$ are your predictions, an exponential model would maximize:

$$ LL(\{\lambda_i\}; \{y_i\}) = \sum_i \left( \log(\lambda_i) - \lambda_i y_i \right) $$
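As a rough illustration, the summed log-likelihood can be computed directly from the predicted rates and the observations. The sketch below uses NumPy and made-up synthetic data. The function names and the constant-rate baseline are illustrative assumptions, not part of the original answer. Higher values indicate better predicted distributions.

```python
import numpy as np

def exponential_log_likelihood(rates, y):
    """Per-observation log-likelihood: log(lambda_i) - lambda_i * y_i."""
    rates = np.asarray(rates, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(rates <= 0):
        raise ValueError("rates must be strictly positive")
    if np.any(y < 0):
        raise ValueError("observations must be non-negative")
    return np.log(rates) - rates * y

def mean_log_score(rates, y):
    """Average log-likelihood per observation (higher is better)."""
    return float(np.mean(exponential_log_likelihood(rates, y)))

# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(0)
true_rates = rng.uniform(0.5, 3.0, size=5000)
y = rng.exponential(scale=1.0 / true_rates)

# Score of the "model" (here: the true rates) versus a baseline that
# predicts one shared rate for every observation. The maximum-likelihood
# estimate of a single shared rate is 1 / mean(y).
score_model = mean_log_score(true_rates, y)
baseline_rate = 1.0 / y.mean()
score_baseline = mean_log_score(np.full_like(y, baseline_rate), y)

print(f"model:    {score_model:.4f}")
print(f"baseline: {score_baseline:.4f}")
```

Comparing against a constant-rate baseline gives the raw number some context: a model whose score does not beat the baseline on held-out data is not extracting useful information from $X_i$.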

Fitting models by maximizing the log-likelihood in this way leads to the theory of generalized linear models; the exponential model is a special case.

score 3 · Answer 2 · answered Jul 10 '18 at 06:13

The standard way to assess predictive distributions is via scoring rules. The log-likelihood that Matthew Drury recommends is one example: it is the logarithmic scoring rule. There are others as well. Merkle & Steyvers (2013, Decision Analysis) discuss how different scoring rules relate to one another and how to choose among them.
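To make "others" concrete, here is a sketch of one alternative that this answer does not itself name: the continuous ranked probability score (CRPS). It is my choice of example, not a recommendation from the cited paper. For an exponential forecast with rate $\lambda$ and an observation $y \ge 0$, the CRPS has the closed form $y + \frac{2}{\lambda} e^{-\lambda y} - \frac{3}{2\lambda}$. Unlike the log-likelihood, lower is better. The function name is hypothetical.

```python
import numpy as np

def exponential_crps(rates, y):
    """Closed-form CRPS of an Exp(rate) forecast at observation y >= 0.

    CRPS = y + 2 * exp(-rate * y) / rate - 3 / (2 * rate); lower is better.
    """
    rates = np.asarray(rates, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(rates <= 0):
        raise ValueError("rates must be strictly positive")
    if np.any(y < 0):
        raise ValueError("observations must be non-negative")
    return y + 2.0 * np.exp(-rates * y) / rates - 1.5 / rates

# Average over all observations to get one number per model.
predicted_rates = np.array([2.0, 2.0, 0.5])   # illustrative values only
observed = np.array([0.0, 0.5, 1.0])
print(exponential_crps(predicted_rates, observed).mean())
```

The CRPS is expressed in the same units as $y$ and is less sensitive than the logarithmic score to observations that fall far into the tail of the predicted distribution, which is one reason the two rules can rank models differently.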

More information can be found in the tag wiki, and we have a number of questions carrying the scoring-rules tag.
