I am trying to find an appropriate plot to visualize the observations in this table of means and standard deviations of recall scores:

\begin{array} {c|c c|c c|} & \text{Control} & & \text{Experimental} & \\ & \text{Mean} & \text{SD} &\text{Mean} &\text{SD} \\ \hline \text{Recall} & 37 & 8 & 21 & 6 \\ \hline \end{array}

What is the best way to do that? Is a bar chart a good way to do it? If so, how can I illustrate the standard deviation?

Silverfish
l..
  • If you don't have more data, I would not create a graph. It would be a waste of space. – Roland Oct 19 '15 at 13:33
  • If you don't have more than this, a full analysis is difficult, as these means and SDs are compatible with many different distributions. – Nick Cox Oct 19 '15 at 17:27
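To make that last point concrete, here is a small Python check with two made-up data sets (hypothetical, not from the question). Both share the control group's mean of 37 and SD of 8, but their shapes are very different. It uses the population SD (`statistics.pstdev`) so the arithmetic comes out exact; the sample SD (`statistics.stdev`) would give slightly different numbers.

```python
import statistics

# Two hypothetical data sets, both with mean 37 and population SD 8.
symmetric = [29, 45]            # half the scores at 29, half at 45
skewed = [33, 33, 33, 33, 53]   # most scores at 33, one high outlier

for name, xs in [("symmetric", symmetric), ("skewed", skewed)]:
    print(f"{name}: mean={statistics.mean(xs)}, "
          f"sd={statistics.pstdev(xs)}, median={statistics.median(xs)}")
```

Both rows report mean 37 and SD 8.0, yet the medians are 37.0 and 33, so the summary table alone cannot tell these situations apart.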

3 Answers

Standard deviations can be shown on a bar graph by adding error bars.

The visualization below (source) is an example:

[Image: bar chart of group means with error bars]
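If you want to draw this kind of chart without a plotting package, here is a minimal sketch using only the Python standard library. It writes an SVG bar chart from the values in the question's table: each bar's height is the group mean, and the error bar spans mean - SD to mean + SD. The function name `bar_chart_svg` and the axis maximum of 50 are illustrative choices, not part of the original answer. In practice, matplotlib's `Axes.bar` with its `yerr` argument does the same job in a couple of lines.

```python
# Minimal sketch: SVG bar chart with +/- 1 SD error bars, standard library only.
# (label, mean, SD) taken from the question's table.
groups = [("Control", 37, 8), ("Experimental", 21, 6)]


def bar_chart_svg(groups, width=300, height=240, margin=30, y_max=50):
    """Return SVG markup: one bar per group (height = mean) plus a vertical
    error bar from mean - SD to mean + SD. y_max is the top of the value axis."""
    plot_h = height - 2 * margin
    slot = (width - 2 * margin) / len(groups)

    def y(value):
        # Map a data value to an SVG y coordinate (SVG y grows downwards).
        return height - margin - value / y_max * plot_h

    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    # Zero baseline: the bars are anchored here.
    parts.append(f'<line x1="{margin}" y1="{y(0)}" x2="{width - margin}" y2="{y(0)}" stroke="black"/>')
    for i, (label, mean, sd) in enumerate(groups):
        cx = margin + (i + 0.5) * slot
        bar_w = slot * 0.5
        parts.append(f'<rect x="{cx - bar_w / 2}" y="{y(mean)}" width="{bar_w}" '
                     f'height="{y(0) - y(mean)}" fill="lightgrey" stroke="black"/>')
        # Error bar drawn after the bar so its lower half is not hidden.
        parts.append(f'<line x1="{cx}" y1="{y(mean - sd)}" x2="{cx}" y2="{y(mean + sd)}" stroke="black"/>')
        for v in (mean - sd, mean + sd):  # short horizontal caps
            parts.append(f'<line x1="{cx - 6}" y1="{y(v)}" x2="{cx + 6}" y2="{y(v)}" stroke="black"/>')
        parts.append(f'<text x="{cx}" y="{height - margin / 3}" text-anchor="middle">{label}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Redirect to a file (e.g. `python bars.py > recall_bars.svg`) and open it in a browser.
    print(bar_chart_svg(groups))
```

Drawing the error bar after the bar keeps both halves of the interval visible, which matters for the criticism of "dynamite plots" raised in the comments.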


Following the discussion in the comments below, showing only the error whiskers, without the bars, seems a better way to visualize such data. The graph could then look something like this:

[Image: group means as points with error whiskers, no bars]
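A points-and-whiskers version needs no bars at all. As a plotting-library-free sketch, the code below prints each group on its own horizontal row, in the spirit of a Cleveland dot plot: `o` marks the mean and the whisker runs from mean - SD to mean + SD. The function name `dot_whisker_text` and the 0 to 50 axis are illustrative assumptions. With matplotlib, `Axes.errorbar` with `fmt="o"` draws the same thing graphically.

```python
# Minimal text rendering of a dot-and-whisker plot (no bars), one row per group.
groups = [("Control", 37, 8), ("Experimental", 21, 6)]


def dot_whisker_text(groups, lo=0, hi=50, width=50):
    """Return a text plot: '|' marks mean - SD and mean + SD, '-' fills the
    whisker and 'o' marks the mean. The axis runs from lo to hi over `width`
    columns; every mean +/- SD is assumed to lie inside lo..hi."""
    scale = width / (hi - lo)

    def col(v):
        # Column index for a data value on the lo..hi axis.
        return round((v - lo) * scale)

    label_w = max(len(label) for label, _, _ in groups)
    rows = []
    for label, mean, sd in groups:
        row = [" "] * (width + 1)
        left, right = col(mean - sd), col(mean + sd)
        for c in range(left, right + 1):
            row[c] = "-"
        row[left] = row[right] = "|"
        row[col(mean)] = "o"
        rows.append(f"{label:>{label_w}} " + "".join(row).rstrip())
    # Axis end labels under the plot area.
    axis = str(lo).ljust(width + 1 - len(str(hi))) + str(hi)
    rows.append(" " * (label_w + 1) + axis)
    return "\n".join(rows)


print(dot_whisker_text(groups))
```

Because nothing is anchored at zero, the reader's eye goes to the positions and spreads of the two groups rather than to the area of a bar.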

Dawny33
  • The principle is clearly along the right lines, but I'd suggest refinements to your graph. If bins are for touching intervals, then the bars should touch too and indicating bin boundaries alone is sufficient. Regardless of that, the cross-hatching is, in my view, just a distraction here. BTW, how would you denote error for a zero observed count? – Nick Cox Oct 19 '15 at 12:29
  • At least this example has the error bars on both sides, the worst "[dynamite plots](http://biostat.mc.vanderbilt.edu/wiki/pub/Main/TatsukiKoyama/Poster3.pdf)" don't even have those, see [here](http://stats.stackexchange.com/q/1173/1036) for one example. – Andy W Oct 19 '15 at 12:32
  • @NickCox I just lifted it from the web. And yeah, the bins should touch for touching intervals (but here they aren't, so I think it's alright here). And yeah, the hatching is distracting. I need to plot one for getting my head around how to get a zero observed count on it. – Dawny33 Oct 19 '15 at 12:33
  • @AndyW After reading that article, I too hate dynamite plots now :D These are good enough then. So, are these dynamite plots with less opacity? (Maybe ;-) ) – Dawny33 Oct 19 '15 at 12:35
  • This plot comes with a HUGE caveat: it is only meaningful if the value zero has some special meaning, because the yellow bars all stretch down to zero and so emphasize it immensely. An (often preferred) alternative is to remove the bars altogether, leaving only points and whiskers. – amoeba Oct 19 '15 at 12:40
  • Nick and I touched on the points I think are most noteworthy. Even setting aside the "show the data" point and just discussing how to show the point estimates and standard errors, the bars are generally superfluous, and they are even worse when they "cover up" half of the interval. Bar conventions like anchoring to zero and having discrete bins can also impede the visualization. Tukey has a great example of that in EDA. – Andy W Oct 19 '15 at 12:40
  • @amoeba So, you mean having only the error bars, instead of the _error whiskers + bars_ setup? – Dawny33 Oct 19 '15 at 12:43
  • No! I meant plotting error whiskers without plotting the bars. Bars are bad. – amoeba Oct 19 '15 at 12:43
  • @amoeba Yeah, I meant _whiskers_. That should look neat, but you need to have the dotted lines in the graph though. – Dawny33 Oct 19 '15 at 12:51
  • I think bars can be fine for small counts, as in this example, and for some other measured quantities also with natural origin and reference level zero, so long as they don't occlude error bars. But bars can be silly and distracting (rather than bad) when it's not an issue whether values are or aren't zero. – Nick Cox Oct 19 '15 at 13:33
  • Another possibility is a Cleveland dot plot ([pdf](https://www.perceptualedge.com/articles/b-eye/dot_plots.pdf)), which is essentially the same as your dot & whisker version, except they go horizontally. Error bars are less common on dot plots, but are certainly acceptable. – gung - Reinstate Monica Oct 19 '15 at 15:31
  • @NickCox Replaced Image-1 with a better one. – Dawny33 Oct 19 '15 at 17:10
  • Thanks for removing the cross-hatching. Cutting out comments that are now irrelevant is (mostly) too difficult. – Nick Cox Oct 19 '15 at 17:25
  • @NickCox I'd let them be there, as people can look at the edit history and learn why poor images shouldn't be used in answers :D – Dawny33 Oct 19 '15 at 17:27

I'd suggest a dot plot:

Although there is still some room for improvement (perhaps dimming the edges of the big rectangle surrounding the data), almost all of the ink is being used to display information.

ToughTea
  • How does this answer the OP's question? How do you use dotplot with means and standard deviations? – kjetil b halvorsen Oct 19 '15 at 18:36
  • [This Stack Overflow page](http://stackoverflow.com/a/18207651/5044791) discusses how to generate dotplots from means and SDs. – EdM Oct 19 '15 at 18:46
  • @kjetilbhavlorsen: The mean is the dot, and the standard deviation (or optionally, standard error of the mean) is shown using the length of the lines adjacent to the dot. –  Oct 19 '15 at 19:05
  • (+1) The term "dot plot" is rather overloaded, my first thought was that you were going to suggest drawing dots for each data point (which of course the OP can't do, not having the raw data). I suspect this is what @kjetil wondered too. Does this variety of "dot plot" have a more specific name which distinguishes it from the "dot for each data point" type of plot? – Silverfish Oct 19 '15 at 20:27

Perhaps the best way to visualise the kind of data that gives rise to those sorts of results is to simulate a data set of a few hundred or a few thousand data points in which one variable (control) has mean 37 and standard deviation 8 and the other (experimental) has mean 21 and standard deviation 6. The simulation is simple enough in a spreadsheet or your favourite stats package. You can then graph the two distributions to get an impression of how much the two sets of recall scores vary.
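Here is a minimal Python sketch of that simulation. The answer does not say which distribution to draw from, so normality is an assumption, and the helper name `simulate_scores` is hypothetical. The draws are rescaled linearly so that the simulated mean and SD match the table exactly (up to floating-point rounding) rather than only approximately. Be aware that a normal model can occasionally produce scores below zero (the experimental mean is only 3.5 SDs above zero), which real recall scores presumably cannot take.

```python
import random
import statistics


def simulate_scores(mean, sd, n, seed=None):
    """Draw n standard-normal values (an assumption; the real distribution is
    unknown) and rescale them linearly so the sample mean and sample SD equal
    `mean` and `sd`."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m, s = statistics.mean(z), statistics.stdev(z)
    return [mean + sd * (x - m) / s for x in z]


control = simulate_scores(37, 8, 1000, seed=1)
experimental = simulate_scores(21, 6, 1000, seed=2)

for name, xs in [("Control", control), ("Experimental", experimental)]:
    print(f"{name}: n={len(xs)}, mean={statistics.mean(xs):.2f}, "
          f"sd={statistics.stdev(xs):.2f}, min={min(xs):.1f}, max={max(xs):.1f}")
```

The resulting lists can be pasted into a spreadsheet or passed to any plotting tool to compare the two distributions.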

[Image: simple Excel graph of the two simulated distributions]

With a simulated data set you can also easily construct summary graphs such as box plots or histograms with error bars.
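As a rough sketch of the numbers behind those summary graphs, the standard library can compute the five values a basic box plot is built from and the bin counts of a histogram. To keep the block self-contained it draws its own normal samples with a fixed seed; normality, the seed, the 5-point bin width and the helper names `five_number_summary` and `bin_counts` are all illustrative assumptions.

```python
import math
import random
import statistics
from collections import Counter

rng = random.Random(42)
# Hypothetical simulated recall scores, assuming normal distributions.
control = [rng.gauss(37, 8) for _ in range(1000)]
experimental = [rng.gauss(21, 6) for _ in range(1000)]


def five_number_summary(xs):
    """Min, lower quartile, median, upper quartile, max: the values a basic
    box plot is built from (before any outlier rule for the whiskers)."""
    q1, q2, q3 = statistics.quantiles(xs, n=4)
    return min(xs), q1, q2, q3, max(xs)


def bin_counts(xs, bin_width=5):
    """Histogram counts keyed by the lower edge of each bin of width bin_width."""
    return Counter(math.floor(x / bin_width) * bin_width for x in xs)


bw = 5
for name, xs in [("Control", control), ("Experimental", experimental)]:
    print(name, ["%.1f" % v for v in five_number_summary(xs)])
    counts = bin_counts(xs, bw)
    for edge in range(min(counts), max(counts) + bw, bw):
        # One '#' per 10 observations: a crude text histogram.
        print(f"{edge:>4} {'#' * (counts[edge] // 10)}")
```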