I have some data like

A  2
A  4
A  76
B  8
B  13
.. ....

Basically, what is the spread of A, B, etc. What would be a suitable graph to visualize such information in R?

Gala
Kumar Vaibhav
  • Can you provide some more information about your data? How many points might there be for "A", "B", etc.? Are the values in the second column presumably continuous? – Ananda Mahto Jun 30 '13 at 09:42
  • Sure. Thanks for your response. The numeric values are age, so numeric data is not continuous but discrete. There can be any number of points for A, B, etc., ranging from, say, 1 to 150. – Kumar Vaibhav Jun 30 '13 at 09:46
  • How many values in each group? – Glen_b Jun 30 '13 at 09:49
  • So, there can be varying number of values in each group. Say, 20 for A and then 100 for B and then maybe 67 for C, and so on. – Kumar Vaibhav Jun 30 '13 at 09:52

3 Answers

There are numerous possible displays, depending on what specifically you want to see.

One example would be a boxplot for each group (A, B, ...) (assuming there are enough values in each group to support one*):

boxplot(len ~ supp, data = ToothGrowth, horizontal = TRUE, boxwex = 0.7)

[Figure: boxplot]

But you might also want to look at histograms, ECDFs (empirical cumulative distribution functions), or a number of other possibilities.

* Edit: from your later comments, it looks like there's enough data for boxplots.
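
As a rough illustration of the ECDF option mentioned above, here is a minimal base R sketch. The data frame `ages` and its columns `group` and `age` are made up for the example (the group sizes follow the ones mentioned in the comments on the question); they are not the asker's data.

```r
# Hypothetical data: ages in three groups of unequal size.
set.seed(42)
ages <- data.frame(
  group = rep(c("A", "B", "C"), times = c(20, 100, 67)),
  age   = c(sample(18:60, 20, replace = TRUE),
            sample(25:80, 100, replace = TRUE),
            sample(5:90, 67, replace = TRUE))
)

# One ECDF per group; each is a step function giving the proportion
# of values at or below a given age.
ecdfs <- lapply(split(ages$age, ages$group), ecdf)

# Draw the first curve, then overlay the others on the same axes.
cols <- c("black", "red", "blue")
plot(ecdfs[[1]], xlim = range(ages$age), col = cols[1],
     main = "ECDF of age by group", xlab = "Age")
for (i in 2:length(ecdfs)) {
  plot(ecdfs[[i]], add = TRUE, col = cols[i])
}
legend("bottomright", legend = names(ecdfs), col = cols, lty = 1)
```

A curve that rises steeply over a narrow range of ages indicates a tightly clustered group; a curve that rises slowly indicates a wide spread.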

Glen_b

You already got some excellent answers but let me suggest another plot that was not mentioned yet (this is an example that I created to answer another question):

[Figure: dot plot]

In R, it is available e.g. through stripchart() or ggplot2's geom_point() or geom_jitter(). (Jitter adds a little bit of noise to avoid too much overlap.) This plot allows you to look at the data somewhat more directly than histograms (which can be badly misleading, see Glen_b's great answer to another question) or boxplots (which are great but a little more complicated to understand and explain).

In ggplot2, you can also combine boxplots and jittered dots; see the documentation.
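
The same combination of boxplots and jittered points can also be sketched in base R with `boxplot()` and `stripchart()`. This is only one possible way to do it, using the built-in `ToothGrowth` data from the boxplot answer rather than the asker's data; the jitter amount and plotting symbol are arbitrary choices.

```r
# ToothGrowth ships with R: 'len' is numeric, 'supp' is a two-level factor.
# Outliers are not drawn by boxplot() because every point is overlaid below;
# ylim is set from the full data so that no overlaid point falls off the plot.
bp <- boxplot(len ~ supp, data = ToothGrowth, horizontal = TRUE,
              outline = FALSE, boxwex = 0.7,
              ylim = range(ToothGrowth$len))

# Overlay the raw observations, jittered to reduce overlap.
# stripchart() is horizontal by default, which matches the boxplot above.
set.seed(1)  # jitter is random; fix the seed for a reproducible picture
stripchart(len ~ supp, data = ToothGrowth, method = "jitter", jitter = 0.15,
           pch = 1, add = TRUE)
```

Calling `stripchart()` on its own (without `add = TRUE`) gives the plain strip chart without the boxes.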

Gala
  • +1. There are many, many names for this plot. Dot and strip plots or charts are two of the most common. As "dot" is widely used in other senses, the name element "strip" is perhaps preferable, at least in searching for implementations (there are several, not just in R but also in all leading packages or languages, e.g. Stata `stripplot`). – Nick Cox Jun 30 '13 at 12:17
  • This is awesome! – Kumar Vaibhav Jun 30 '13 at 12:28
  • @NickCox +1 Thanks for the precisions, I was so confused by the names that I tried very hard not to choose one… Your comment reminded me of another R function to create a stripchart (now added to my answer). – Gala Jun 30 '13 at 12:48
  • But how do I make this if one of the axes doesn't have any numeric data? – Kumar Vaibhav Jun 30 '13 at 14:02
  • No problem i did it - ggplot() + geom_point(data=mydata, mapping=aes(x=mydata[,3], y=mydata[,2])) + coord_cartesian(xlim = c(5, 90)) + xlab("Age") + ylab("Something") + ggtitle("Demographic Analysis of Age and something") – Kumar Vaibhav Jun 30 '13 at 14:15
  • As the number of points increases, I would also suggest adding transparency *as well as* jitter. Or, you can use hollow circles.... These are created somewhat, by default, when using the `densityplot()` function from "lattice". – Ananda Mahto Jul 01 '13 at 09:03
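
A minimal base R sketch of the transparency-plus-jitter suggestion from the last comment, assuming made-up data (`vals`, with columns `group` and `value`) and an arbitrary alpha of 0.3:

```r
# Hypothetical data: 500 discrete values (0 to 50) spread over five groups.
set.seed(1)
vals <- data.frame(group = sample(LETTERS[1:5], 500, replace = TRUE),
                   value = sample(0:50, 500, replace = TRUE))

# Semi-transparent colour: overlapping points look darker than isolated ones.
faded <- adjustcolor("black", alpha.f = 0.3)

# Filled circles (pch = 16) with jitter and transparency combined.
stripchart(value ~ group, data = vals, method = "jitter", jitter = 0.2,
           pch = 16, col = faded, xlab = "Value", ylab = "Group")
```

Replacing `pch = 16` with `pch = 1` and dropping `col` gives the hollow-circle variant instead.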

As mentioned by Glen_b, there are a number of possibilities.

Here is an example of a histogram and density plot using the "lattice" package. I've also provided some sample data.

set.seed(1)
mydf <- data.frame(V1 = sample(LETTERS[1:5], 500, replace = TRUE),
                   V2 = sample(0:50, 500, replace = TRUE))
head(mydf)
tail(mydf)
library(lattice)
histogram(~V2 | V1, data = mydf)

[Figure: lattice histograms of V2, one panel per level of V1]

densityplot(~V2 | V1, data = mydf)

[Figure: lattice density plots of V2, one panel per level of V1]

Both plots use the default settings.

Ananda Mahto