
Intuitively, it seems to me that, if one is able to make accurate predictions about a variable, then one has also (perhaps implicitly) produced a good estimate of its marginal or conditional distribution.

Conversely, it seems that if one has fitted a distribution function and has ground-truth knowledge that the estimate is fairly accurate, then one must necessarily be able to generate good predictions.

I'm deliberately leaving the prediction and distribution-fitting methods unspecified.

Are there any counterexamples to my intuition?

shadowtalker
  • Regarding the "Conversely,..." statement: this makes sense if you are predicting the past. But what about predicting the future? Once you are restricted with regard to the covariates you can use (e.g. only past covariates allowed), your choice of distribution fitting is also restricted, and your nicely fitting distribution might not be permissible (due to inclusion of forbidden covariates). But if you start by delineating what kind of covariates you are allowed to use and then ask the question, prediction and distribution fitting look much the same. (Does that extend beyond time series?) – Richard Hardy Jan 25 '15 at 16:59
  • @RichardHardy I wasn't specifically thinking about time series. I guess I was implicitly assuming that the data-generating process is the same in the past and future, and that the same set of variables is available at all times. That also sounds to me like a sufficient characterization of what makes "past" data different from "future" data, so maybe I _am_ asking only about past data. Maybe a detailed answer could bring up that distinction. – shadowtalker Jan 25 '15 at 17:03

1 Answer


one has also (perhaps implicitly) produced a good estimate of its marginal or conditional distribution.

Point predictions don't necessarily do this.

one must necessarily be able to generate good predictions.

Possibly -- depending on how we define "good".

I'm deliberately leaving the prediction and distribution-fitting methods unspecified.

Okay, then consider prediction using a linear model with a single predictor - one fitted by choosing the slope of the line so as to make the Spearman correlation between residuals and $x$ as close to 0 as possible (if there's an interval at 0, choosing the center of that interval). Rather similar to what was done in this answer to fit a line

[Figure: Spearman correlation between the residuals and $x$, plotted against the candidate slope; the red 'curve' crosses 0 near a slope of $3.714$.]

(choosing the slope at which the red 'curve' crosses 0, yielding a slope estimate of $3.714$), except that the intercept is then obtained from the residuals in a different way from the one used in that previous answer. Instead, consider this:

Given that slope, estimate the intercept from the $y-\hat{\beta}x$ values using a 3-part Hampel redescending M-estimator of location. (We could do the whole line fit via M-estimation, but I wanted to give some idea of the sheer variety of perfectly reasonable approaches to prediction that are available.)
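For reference (this notation isn't from the answer itself), the usual three-part Hampel $\psi$, with tuning constants $0 < a < b < c$, is

$$\psi(u)=\begin{cases} u, & |u|\le a,\\ a\,\operatorname{sign}(u), & a<|u|\le b,\\ a\,\dfrac{c-|u|}{c-b}\,\operatorname{sign}(u), & b<|u|\le c,\\ 0, & |u|>c, \end{cases}$$

so it is exactly 0 beyond $c$, which is the "redescending" behaviour referred to further down.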

A point prediction at some $x$, say $x_\text{new}$, is then obtained from the fitted value for that $x$.

So doing that on the cars data in R that I fitted at the other link (using the defaults in robustbase::lmrob with psi="hampel"), I obtained an intercept of $-15.79$.

Resulting in this fitted line:

[Figure: the cars data with the fitted line; the point prediction at $x=21$ is marked on the line.]

The prediction at $x=21$ ($62.2$) is marked on the plot. It appears to be a perfectly reasonable prediction.
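For concreteness, here is a rough R sketch of the whole procedure on the cars data (dist against speed). The root-search interval, the use of uniroot, and the exact lmrob call for the location step are my guesses at how to reproduce the fit, so the output may differ slightly from the $3.714$, $-15.79$ and $62.2$ quoted above.

```r
# Sketch: slope by zeroing the Spearman correlation of residuals with x,
# intercept by a Hampel redescending M-estimate of location, then a point
# prediction at x = 21. (Details of the calls are assumptions, not the
# answerer's exact code.)
library(robustbase)   # lmrob, lmrob.control

x <- cars$speed
y <- cars$dist

# Spearman correlation between the residuals and x, as a function of the slope
spearman_resid <- function(b) cor(y - b * x, x, method = "spearman")

# It decreases (in steps) as b increases, so locate where it crosses 0
slope <- uniroot(spearman_resid, interval = c(0, 10))$root

# Hampel M-estimate of location of y - slope * x, via an intercept-only fit
z <- y - slope * x
intercept <- coef(lmrob(z ~ 1, control = lmrob.control(psi = "hampel")))[[1]]

# Point prediction at x_new = 21 from the fitted line
x_new <- 21
c(slope = slope, intercept = intercept, prediction = intercept + slope * x_new)
```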

We've certainly assumed linearity, but there's no distributional assumption made in obtaining that prediction - the slope was obtained nonparametrically (i.e. with a distribution-free method), while the intercept used M-estimation (and while that grows out of ML estimation, the $\psi$-functions which redescend to 0, such as the Hampel, correspond to no actual distribution).

Clearly point prediction at least needn't involve or relate to a distributional fit, and so the answer to the title question is (demonstrably) "not so".

--

Indeed, if we then generated a confidence interval or a prediction interval by bootstrapping, we would have interval prediction without fitting a distribution (unless you call using/re-sampling the ECDF 'fitting'; it might well count as estimation, depending on what you intend the question to encompass). [However, I think there are also ways to get intervals for some fits generated along similar lines that don't use bootstrapping. For example, we can generate a confidence interval for the slope by inverting the critical values in the Spearman test; at least some kinds of prediction should allow us to do something similar for intervals. There are nonparametric tolerance intervals, for example.]
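As an illustration of the bootstrapping idea (my own sketch, not part of the original answer; the percentile interval here is for the fitted value at $x=21$ rather than a full prediction interval):

```r
# Sketch: percentile bootstrap interval for the fitted value at x = 21,
# refitting the same distribution-free line to each resample of (x, y) pairs.
library(robustbase)

fit_line <- function(x, y) {
  f <- function(b) cor(y - b * x, x, method = "spearman")
  slope <- uniroot(f, interval = c(-5, 15))$root
  z <- y - slope * x
  intercept <- coef(lmrob(z ~ 1, control = lmrob.control(psi = "hampel")))[[1]]
  c(intercept = intercept, slope = slope)
}

x <- cars$speed
y <- cars$dist
x_new <- 21

set.seed(1)
boot_fits <- replicate(2000, {
  i <- sample(seq_along(x), replace = TRUE)          # resample (x, y) pairs
  cf <- fit_line(x[i], y[i])
  unname(cf["intercept"] + cf["slope"] * x_new)      # fitted value at x_new
})

# Percentile interval -- no parametric distribution fitted anywhere
quantile(boot_fits, c(0.025, 0.975))
```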

Glen_b
  • I actually _was_ thinking of using the ECDF of the fitted/predicted values. Even if we impose potentially restrictive assumptions like linearity, if the predictions are somehow "good" then I don't see how the ECDF of the predictions can be anything but similarly "good." This of course is limited by the "maximum possible goodness of fit" for the model, e.g. a linear model must have some maximum possible performance on quadratic data – shadowtalker Jan 25 '15 at 20:39