
I am writing a research paper about time series forecasting using neural networks. In my results I created tables containing error values (RMSE, MAE and RMSSE) for the predictions and made plots showing the predicted values over the original data.

Now I have been told that I need to add confidence intervals to my predictions, because with limited amounts of data confidence intervals can become very wide. My time series is approximately one year long.

I understand what confidence intervals are, but I don't understand how they will make my results better.

How important are confidence intervals? How do they change how you analyse the predictions of a model?

Marcus
  • Consider you have some money to invest. Your bank says that investment A will give a guaranteed annual profit of 2%, whereas investment B may give you a profit somewhere between -8% and +12% with an expected value of 2%. Will you throw dice over which investment to take, because both have the same expected profit, or does it make a difference that in one case the profit is fixed while the other one is taking chances? – Bernhard Apr 20 '21 at 08:43
  • Some people think in terms of confidence intervals, so they find them an aid to the interpretation of results. Typically they will look at how wide they are (as a proxy for uncertainty), whether they contain $0$ (as a proxy for significance), and what scale of minimum and middle change the interval suggests (substance). Some will confuse confidence intervals with prediction intervals. – Henry Apr 20 '21 at 08:46
  • I strongly suspect that whoever you are talking to is asking for a *prediction interval*. [PIs are *not* confidence intervals.](https://stats.stackexchange.com/tags/prediction-interval/info) When you write that you understand what CIs are, this difference may be tripping you up. @Bernhard: do you want to post your comment(s) as an answer? [Better to have a short answer than no answer at all.](https://stats.meta.stackexchange.com/a/5326/1352) Anyone who has a better answer can post it. – Stephan Kolassa Apr 20 '21 at 08:53
  • @StephanKolassa Augmented (improved?) and posted as a short answer that may be better than no answer. – Bernhard Apr 20 '21 at 09:04
  • I remember one of the first examples I learned was age: I'm 100% sure that random people in a certain room are within the ages 0 to 200, but I'm only 95% sure they're all within 5 to 90. – BCLC Apr 20 '21 at 23:00
  • @BCLC: and that is yet a third thing, neither a confidence interval nor a prediction interval. – Stephan Kolassa Apr 21 '21 at 13:19
  • @StephanKolassa Ah, thanks for sharing. What is that, please? – BCLC Apr 21 '21 at 14:21
  • @BCLC: a confidence interval is used in estimating an *unobservable parameter*. You draw a random sample from the population, fit a model, and estimate the parameter with a CI. It's essentially an *algorithm* such that, when you apply *the entire procedure* (sampling, fitting & testing) many times, the resulting CIs (each one will be different, of course) will cover the true parameter value (say) 95% of the time. A prediction interval is an interval that will cover a *single* as-yet unobserved but *observable* actual value with probability (say) 95%. ... – Stephan Kolassa Apr 21 '21 at 14:27
  • ... Your interval does not have a common name, but of course 2.5% and 97.5% *empirical quantiles* do have a name, and your interval could consist of two such quantiles. (Or of a 1% and 96% quantile.) It covers a certain *proportion* of an *observed sample*. As such, yes, it is closer to a PI, but the difference I see is that you are describing an already observed sample, not predicting for a new one. PIs are usually too narrow, because of DGP drift and similar; your concept does not suffer from this. – Stephan Kolassa Apr 21 '21 at 14:30
  • OK, look, this wasn't actually me. This was from a series of statistics lectures I watched back in 2010-2011, over a decade ago. Anyway, I quit applied maths for pure maths over 5 years ago, and I quit probability 3 years ago, so... thanks anyway. – BCLC Apr 21 '21 at 14:32
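To make the CI-vs-PI distinction from the comments above concrete, here is a minimal simulation sketch on synthetic normal data (all parameters are arbitrary choices, not from the question): applying the *entire procedure* many times, the 95% confidence interval covers the unobservable mean about 95% of the time, while the 95% prediction interval covers a single new observation about 95% of the time, at a very different width.

```python
import numpy as np
from scipy import stats

# Simulation sketch: repeat the whole procedure (sample, fit, build
# interval) many times and count coverage. The CI targets the unobservable
# mean; the PI targets a single new observable draw.
rng = np.random.default_rng(42)
mu, sigma, n, reps = 10.0, 2.0, 30, 10_000
t_crit = stats.t.ppf(0.975, df=n - 1)

ci_hits = pi_hits = 0
for _ in range(reps):
    sample = rng.normal(mu, sigma, size=n)
    xbar, s = sample.mean(), sample.std(ddof=1)
    # 95% confidence interval for the mean
    ci_half = t_crit * s / np.sqrt(n)
    ci_hits += xbar - ci_half <= mu <= xbar + ci_half
    # 95% prediction interval for one future observation
    pi_half = t_crit * s * np.sqrt(1 + 1 / n)
    new_obs = rng.normal(mu, sigma)
    pi_hits += xbar - pi_half <= new_obs <= xbar + pi_half

print(f"CI coverage of the true mean: {ci_hits / reps:.3f}")  # close to 0.95
print(f"PI coverage of a new draw:    {pi_hits / reps:.3f}")  # close to 0.95
```

Both intervals hit "95%", but they answer different questions; the PI half-width exceeds the CI half-width by a factor of roughly sqrt(n + 1) here, which is why confusing the two is so consequential.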

3 Answers


Consider you have some money to invest. Your bank says that investment A will give a guaranteed annual profit of 2%, whereas investment B may give you a profit somewhere between -8% and +12% with an expected value of 2%. Will you throw dice over which investment to take, because both have the same expected profit, or does it make a difference that in one case the profit is fixed while the other one is taking chances?

Imagine the same scenario, but in addition, you can borrow money at 1.6%. You could borrow as much as possible, invest it in A, and have a guaranteed 0.4% for yourself. With investment B there is a risk of losing money, so you should probably not borrow as much as possible, but only so much that you could afford the possible loss. So different decisions will be taken for the same expected gain, depending on its uncertainty.

So very often it is of utmost importance not only to give a point prediction but also to communicate some measure of uncertainty around that point prediction. Does this necessarily have to be confidence intervals or prediction intervals or credible intervals or standard errors? Depending on the circumstances and the habits within your field, people may have certain expectations, partly based on tradition. If you can meet these expectations, it will improve your communication of results.
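For the original poster's setting, a minimal reporting sketch: the snippet below fits a simple ARIMA model to a synthetic stand-in series and reports forecasts *with* 95% interval bounds rather than point values alone. The data, model order, and horizon are placeholders, not the OP's; a neural network forecaster would need its own uncertainty mechanism (e.g. quantile loss or residual-based intervals), but the reporting pattern is the same.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for a roughly one-year-long series (weekly here);
# in the paper this would be the real data and the chosen model.
rng = np.random.default_rng(0)
y = np.zeros(52)
for t in range(1, len(y)):
    y[t] = 0.7 * y[t - 1] + rng.normal(scale=1.0)

model = ARIMA(y, order=(1, 0, 0)).fit()
forecast = model.get_forecast(steps=8)
mean = forecast.predicted_mean
bounds = forecast.conf_int(alpha=0.05)
# Despite the method name, for ARIMA forecasts these bounds cover future
# *observations*, i.e. they behave like prediction intervals.

for h, (m, (lo, hi)) in enumerate(zip(mean, bounds), start=1):
    print(f"h={h}: point={m:6.2f}, 95% interval=[{lo:6.2f}, {hi:6.2f}]")
```

With only about a year of data the intervals will typically be wide, which is exactly the information a table of RMSE values and point forecasts alone hides.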

Bernhard (edited by Stephan Kolassa)
  • @StephanKolassa Thank you very much for language editing. Much appreciated. – Bernhard Apr 20 '21 at 10:58
  • I think a key point is whether one can reasonably know what the confidence interval is with time series. You have to know what the distribution is; how reasonable is that? I create ranges around the data 5 and 10 percent off to show, rather than use confidence intervals; this makes no distribution assumptions. But then I am hardly an expert on time series assumptions. :) – user54285 Apr 20 '21 at 16:40
  • AM-GM inequality means A is the better investment. :) – Cong Chen Apr 20 '21 at 18:23
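On user54285's distribution-free concern: one common assumption-light variant (not the ±5/10% bands the comment describes, but in the same spirit) forms intervals from empirical quantiles of historical forecast errors. A hedged sketch on synthetic numbers; it quietly assumes the errors are roughly exchangeable over time.

```python
import numpy as np

# Distribution-free sketch: a 95% interval from empirical quantiles of
# historical forecast errors, with no normality assumption. All numbers
# below are synthetic placeholders.
rng = np.random.default_rng(1)
actuals = rng.gamma(shape=2.0, scale=3.0, size=200)    # skewed, non-normal
predictions = actuals + rng.normal(0, 1.5, size=200)   # some model's output

errors = actuals - predictions
lo, hi = np.quantile(errors, [0.025, 0.975])

point_forecast = 6.0  # hypothetical new point prediction
print(f"95% empirical interval: [{point_forecast + lo:.2f}, {point_forecast + hi:.2f}]")
```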

My view may be a bit unorthodox, but for me, a confidence interval (and similar uncertainty measurements) is for telling me what I cannot do with a number.

We humans like thinking in numbers (at least in positive integers and decimals). We use them every day, to the point where we don't even notice the operations we are doing with them, as long as we don't have to pull out a calculator. And in this intuitive, super-easy use, we forget that numbers are only a representation of the real thing, and sometimes the answer we get by number manipulation is not the answer we are looking for. Confidence intervals can help us avoid some specific misapplications of numbers.

  • First example: Measurement. I have a kitchen scale whose product description proclaims an accuracy of 2 grams. Let's assume that this statement means that, if I measure out 1 kg of sugar every day to make jam, on 99% of days the actual amount of sugar will be between 998 and 1002 grams, so this is a roundabout way to state a confidence interval about the results of repeated uses of the scale. This is quite sufficient for making great jam, but not at all sufficient if I want to measure 4 grams of gelatin, because using 2 or 6 grams instead of 4 will likely make a recipe fail. So, by knowing the width of the confidence interval, I know that my scale is useless for a certain kind of ingredient.

  • Second example: Comparison. Let's say that somebody publishes an article on the caffeine content of types of tea, and describes that the average cup of black tea has 39.4 mg of caffeine, while the average cup of green tea has 31.8 mg. Newspaper articles and nutrition guides shorten this to "green tea has less caffeine than black tea", and people who want to reduce their caffeine intake may decide to switch their tea. But if you look into the original research, you are likely to see a confidence interval of 30 to 50 mg for green tea and 25 to 60 mg for black tea*. These are very wide intervals, and they overlap a lot. If you were to switch from black tea to green tea, there is quite a good chance that your caffeine intake would go up instead of down (the short simulation after the footnote below illustrates this). This makes the conclusion "black tea has more caffeine than green tea" incorrect, at least the way it is understood in everyday communication - but you cannot know that until you have looked at the confidence intervals.

  • Third example. Let's leave the kitchen and look at something a real statistician might do - calculate an odds ratio from a medical study. These calculations are quite important, with huge consequences for approval, or at least for warnings that affect prescription patterns. Let's say that on a new medicine, the odds ratio for myocardial infarction is 1.7 for women under 30, 1.2 for men aged 50-65, and quite close to 1 for all other groups. Now what should the regulator do? If you don't look at the confidence intervals, you might think that a warning against prescription should be issued for both groups, or at least for the women. But it may turn out that the correct thing is to issue a warning for the men, but not for the young women - if it so happens that the confidence interval for the young women crosses 1, while that for the men is tighter and lies completely above 1, meaning that we can be quite certain their risk is indeed higher when on the medicine. In this case, we are not allowed to make a policy decision based on the huge-looking 1.7 odds ratio - but we can only know this by using the confidence interval.

These examples are somewhat randomly picked; I am sure there are more, or better, ones. But in a nutshell, when you start looking at confidence intervals (without even doing fancy calculations with them), you can save yourself from some serious mistakes in interpretation.

* In reality, this type of study is most likely to simply report the range observed in the tested samples, but for the sake of argument, let's assume the authors actually calculated confidence intervals.
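To make the tea comparison concrete, here is a purely illustrative Monte Carlo that treats the quoted intervals as if they described the spread of individual cups - an assumption made only for illustration, on top of numbers that are themselves hypothetical.

```python
import numpy as np

# Illustration only: treat the quoted intervals (green 30-50 mg,
# black 25-60 mg) as rough uniform spreads of individual cups and ask
# how often a switch from black to green *raises* caffeine intake.
rng = np.random.default_rng(7)
n = 100_000
green = rng.uniform(30, 50, n)
black = rng.uniform(25, 60, n)

p_backfire = np.mean(green > black)
print(f"P(green cup has more caffeine than black cup): {p_backfire:.2f}")
# Despite green tea's lower mean, the heavy overlap leaves a sizeable
# chance that the switch goes the wrong way for any given pair of cups.
```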

rumtscho

For a slightly more definitional answer:

A 95% confidence interval around an estimated value represents the range of models for which, if you were to test any of those models against your data, you would not reject the model at the 5% significance level.

Said differently, if a p-value can be interpreted as a measure of compatibility between a suggested model and the data (or, of how typical that data is for that model), then a confidence interval is the range of models with which that data is compatible above our arbitrary 5% threshold.

In this sense, it is worth reporting both your result and a confidence interval, since the former reflects accuracy/location, whereas the latter reflects precision.

Specifically, if that range is very wide, it means there are many models for which that data would be typical; if that range were so wide that it also included the 'null hypothesis', then your result is so imprecise that it is effectively indistinguishable from a null result, since the null is also compatible with the data given our stated 5% threshold*. Whereas if the range of compatible models is very small, then this reflects high precision around your result.


* Note that the null in this case is indistinguishable only in terms of judging by acceptance/rejection alone - the actual p-value of the null hypothesis will tell you more about how its compatibility with the data compares against any other model within this range of compatible models.
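This test-inversion reading can be checked numerically: on synthetic data (all numbers below are arbitrary), the set of hypothesized means that a two-sided one-sample t-test fails to reject at the 5% level coincides, up to grid resolution, with the textbook 95% t-interval. A minimal sketch:

```python
import numpy as np
from scipy import stats

# Numerical check of the test-inversion view: the set of hypothesized
# means NOT rejected by a two-sided one-sample t-test at the 5% level
# matches the textbook 95% t-interval (up to grid resolution).
rng = np.random.default_rng(3)
data = rng.normal(loc=5.0, scale=2.0, size=25)

grid = np.linspace(3.0, 7.0, 2001)  # candidate 'models' (hypothesized means)
pvals = np.array([stats.ttest_1samp(data, mu0).pvalue for mu0 in grid])
kept = grid[pvals >= 0.05]
print(f"non-rejected region: [{kept.min():.3f}, {kept.max():.3f}]")

xbar, se = data.mean(), stats.sem(data)
lo, hi = stats.t.interval(0.95, len(data) - 1, loc=xbar, scale=se)
print(f"95% t-interval:      [{lo:.3f}, {hi:.3f}]")
```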


Highly recommended article for learning about p-values and confidence intervals and their proper interpretation: Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, Altman DG. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European journal of epidemiology. 2016 Apr 1;31(4):337-50.

  • +1 for the nice answer and the good reference. But please note that the original poster did not at all place confidence intervals in a testing context but in a prediction context, so p-values and null hypotheses are not actually a topic here. – Bernhard Apr 21 '21 at 06:30
  • @Bernhard Thank you for this comment. I disagree slightly with the assertion that the use of p-values and CIs is exclusive to null-hypothesis-testing settings, but you're right, the OP should probably be looking at credible intervals in this case, not confidence intervals... – Tasos Papastylianou Apr 21 '21 at 06:51
  • Could you give me a short hint as to where a p-value is relevant other than for testing? I was not concerned about confidence vs. credible but about prediction versus testing. – Bernhard Apr 21 '21 at 07:46
  • @Bernhard In the more general sense, a p-value is simply a well-defined measure of 'compatibility' between a probability distribution and a data point, with a particular interpretation. Given a probabilistic prediction and a target, you can easily ascribe a p-value to describe the compatibility between the two. Alternatively, you can use any other such compatibility measure, e.g. the continuous ranked probability score (CRPS). The fact that p-values find natural use in NHT doesn't strip them of this more general property. In fact, in theory one could equally perform NHT using CRPS if they really wanted to. – Tasos Papastylianou Apr 21 '21 at 08:53
  • I guess I would return to an earlier point of mine: how can one reasonably know what the distribution for a time series is? Is it really very likely to be normally distributed? And the CLT probably does not apply. – user54285 Apr 21 '21 at 18:34
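For readers curious about the CRPS mentioned in these comments: for a Gaussian predictive distribution it has a standard closed form, which makes a tiny self-contained sketch possible. The three forecasts below are invented purely for illustration.

```python
import numpy as np
from scipy import stats

def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of a Gaussian predictive distribution N(mu, sigma^2)
    at observation y (lower is better): a 'compatibility' score for a whole
    probabilistic forecast, not just a point."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * stats.norm.cdf(z) - 1)
                    + 2 * stats.norm.pdf(z)
                    - 1 / np.sqrt(np.pi))

# Invented forecasts for the same actual value, for illustration only.
y_actual = 10.0
print(f"sharp, unbiased N(10, 1):  CRPS = {crps_gaussian(y_actual, 10.0, 1.0):.3f}")
print(f"vague, unbiased N(10, 5):  CRPS = {crps_gaussian(y_actual, 10.0, 5.0):.3f}")
print(f"sharp but biased N(13, 1): CRPS = {crps_gaussian(y_actual, 13.0, 1.0):.3f}")
```

Note how the score penalizes both vagueness and bias, which is why it is a natural way to judge interval-producing forecasts beyond accept/reject decisions.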