
There is no standard recommendation for the length of error bars used to show the spread of data or of means in graphics. Standard deviation (SD), standard error of the mean (SEM) and 95% confidence intervals (CI) are all used.

It is also common to show 2 means with error bars in a single graph. An obvious question arises: does non-overlap of the error bars indicate that the difference between the 2 means is statistically significant (P<0.05)?

With reference to the discussions in this and this question, it seems that the following is usually accurate:

For 2 means to be significantly different (P<0.05), error bars of length SEM*√2 should not overlap.

In contrast to the above, error bars of length 2*SEM (approximately the 95% confidence interval) may overlap even when the difference between the 2 series is significant (P<0.05).

On the other hand, error bars of length SEM may fail to overlap even when the difference between the 2 series is not significant (P>0.05). Hence, SEM error bars often give a misleading impression of a significant difference between two similar series.
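As a quick check, here is a minimal R sketch (with made-up data; the sample sizes and parameters are arbitrary) that tests the three rules above against a two-sample t-test. Bars of half-width k*SEM drawn on two means overlap exactly when the gap between the means is smaller than k*(SEM_x + SEM_y):

```
# Hypothetical data: two samples of n = 30 each
set.seed(42)
x <- rnorm(30, mean = 10, sd = 2)
y <- rnorm(30, mean = 11, sd = 2)

sem_x <- sd(x) / sqrt(length(x))
sem_y <- sd(y) / sqrt(length(y))

# Bars of half-width k*SEM overlap when the gap between the means
# is smaller than the sum of the two half-widths
overlaps <- function(k) abs(mean(x) - mean(y)) < k * (sem_x + sem_y)

overlaps(1)           # SEM bars
overlaps(1.96)        # ~95% CI bars
overlaps(sqrt(2))     # SEM*sqrt(2) bars
t.test(x, y)$p.value  # compare the verdicts with P < 0.05
```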

In view of the above, why shouldn't error bars of length SEM*√2 be used as a standard for graphical purposes?

Also, is there any specific name for this value: SEM*√2?

rnso
  • Considering your comment under the answer by @dariober, it is not clear what your question is: Are you asking why error bars are drawn *one* SE in each direction and not *two* SEs? – cdalitz Nov 25 '20 at 12:06
  • Since confidence intervals if computed correctly tend to be asymmetric, it is hard to get too interested in methods that assume symmetry of errors. – Frank Harrell Nov 25 '20 at 12:24
  • @cdalitz I am suggesting that error bar on each side should be `SEM*√2` – rnso Nov 25 '20 at 12:38
  • @FrankHarrell For asymmetric confidence intervals, what would be the length where overlap starts at P=0.05 ? – rnso Nov 25 '20 at 12:40
  • Error bars are a way to visualize mean and sd in a single plot, and are not meant to visualize confidence intervals. – cdalitz Nov 25 '20 at 12:43
  • @Frank One might think that, but when you consider that comparison of two CIs compares the *upper* limit of one to the *lower* limit of the other, the asymmetries tend to balance out. Thus, this kind of application is much more robust than one might initially imagine. – whuber Nov 25 '20 at 17:30
  • No, it is not appropriate to compare two CIs. Need to form a customized CI for the difference. – Frank Harrell Nov 25 '20 at 21:32
  • "not appropriate" means there are limitations. Are these limitations so great that the use of `SEM*√2` is completely useless? – rnso Nov 26 '20 at 01:20
  • Never said it's useless, just that it shouldn't be used as often as it is. Recognize that both sampling distributions and posterior distributions are typically asymmetric. Regarding comparing two CIs that is almost never appropriate. Compute the single CI on the right estimand. – Frank Harrell Nov 26 '20 at 12:26
  • @Frank For an account that discusses (a) when one might be obliged to compare CIs (it is a common occurrence) and (b) how one might do it in a reasonable and principled way, see https://stats.stackexchange.com/a/18259/919. – whuber Nov 29 '20 at 22:53
  • @whuber that is extremely well written. I just don't favor having to manipulate individual confidence intervals for the purpose of making comparisons, and rather would like to see confidence intervals constructed for the specific comparison of interest. – Frank Harrell Nov 30 '20 at 12:03
  • @Frank That is indeed the best approach. – whuber Nov 30 '20 at 14:26

1 Answer


why shouldn't error bars of length SEM*√2 be used by default for graphical purposes?

Maybe what you are describing is related to the least significant difference? Crawley's R Book (a PDF seems to be freely available here) has a nice description in the Analysis of Variance chapter, page 514 (a t-test can be considered a special case of ANOVA, right?):

With standard errors we could be sure that the means were not significantly different when the bars did overlap. And with confidence intervals we can be sure that the means are significantly different when the bars do not overlap. But the alternative cases are not clear-cut for either type of bar. Can we somehow get the best of both worlds, so that the means are significantly different when the bars do not overlap, and the means are not significantly different when the bars do overlap?

The answer is yes, we can, if we use least significant difference (LSD) bars. Let us revisit the formula for Student’s t test:

t = (a difference) / (standard error of the difference)

We say that the difference is significant when t > 2 (by the rule of thumb, or t > qt(0.975,df) if we want to be more precise). We can rearrange this formula to find the smallest difference that we would regard as being significant. We can call this the least significant difference:

LSD = qt(0.975,df) × standard error of a difference ≈ 2 × sediff.
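To make the rearrangement concrete, here is a short R sketch (with hypothetical data, assuming equal variances) showing that comparing the observed difference against the LSD gives exactly the same verdict as the pooled two-sample t-test:

```
# Hypothetical data: two samples of n = 20 each
set.seed(1)
x <- rnorm(20, mean = 10, sd = 2)
y <- rnorm(20, mean = 12, sd = 2)
n1 <- length(x); n2 <- length(y)

# Pooled SD and standard error of the difference (equal-variance case)
sp <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
se_diff <- sp * sqrt(1 / n1 + 1 / n2)

# Least significant difference
lsd <- qt(0.975, n1 + n2 - 2) * se_diff

abs(mean(x) - mean(y)) > lsd                   # same verdict as ...
t.test(x, y, var.equal = TRUE)$p.value < 0.05  # ... the pooled t-test
```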

In general, though, I would keep in mind that CIs and p-values express two different characteristics (all this in the perspective of comparing just two groups, of course):

  • the CI tells you how precisely the mean of this group has been estimated, and it is therefore unrelated to the mean and precision of the other group

  • the p-value relates to the difference between the groups, and therefore it must account for the precision of the estimates in both groups jointly

(The Q&As you link to have a more formal explanation.)

So in this view I think it shouldn't be too surprising that CIs can overlap while still giving p < 0.05. Then, depending on what you want to emphasize, it may be more appropriate to show the CI, the LSD, or something else.
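If it helps to see the visual difference, here is a base-R plotting sketch (again with made-up data) drawing the same two means with SEM, ~95% CI, and SEM*√2 bars side by side:

```
# Hypothetical data: two samples of n = 20 each
set.seed(1)
x <- rnorm(20, mean = 10, sd = 2)
y <- rnorm(20, mean = 12, sd = 2)
m  <- c(mean(x), mean(y))
se <- c(sd(x) / sqrt(length(x)), sd(y) / sqrt(length(y)))

# Half-width multipliers for the three kinds of bars
k <- c("SEM" = 1, "~95% CI" = 1.96, "SEM*sqrt(2)" = sqrt(2))

plot(NULL, xlim = c(0.5, 6.5), ylim = range(m) + c(-2, 2),
     xaxt = "n", xlab = "", ylab = "Mean")
for (i in seq_along(k)) {
  at <- (i - 1) * 2 + 1:2                       # two bars per rule
  points(at, m, pch = 19)
  arrows(at, m - k[i] * se, at, m + k[i] * se,
         angle = 90, code = 3, length = 0.05)   # draw the error bars
  axis(1, at = mean(at), labels = names(k)[i], tick = FALSE)
}
```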

dariober
  • Very well explained. However, the LSD is `2*sediff`, while error bars generally show the SE of the means (not of the difference). Can we also call `SEM*√2` an LSD? – rnso Nov 25 '20 at 11:39