6

Suppose I have samples drawn from categories A, B, C. Within those categories, I have subcategories d, e, f, which are found in all 3 categories. I want to visualize how many samples I have from categories A, B, C and the proportional composition of subcategories d, e, f within each category.

One way to do this is a bar plot (I'm using ggplot2, not that it matters too much) with bars for A, B, C, heights proportional to their total number of samples. Within each bar I partition it by fill color based on the composition of d,e,f within the category. The problem with this is that since A, B, and C will be different heights, it's almost impossible to visually compare the proportions - e.g. proportion of d in A with the proportion of d in B.

To see the proportions, I can renormalize the heights to 100% instead of the sample count so that bars for A, B, and C are now equal height. However, now I can't visualize the counts in A, B, and C.

Is there an elegant way to visualize both of these pieces of information simultaneously?

kjetil b halvorsen
user4733
  • Hadley Wickham's product plots seem to be a fruitful approach to what you are suggesting; see [this answer](http://stats.stackexchange.com/a/20900/1036) for an example and further references. – Andy W May 31 '12 at 18:25
  • Thanks @Andy W ... a spine plot is actually perfect for this since the subcategories are the same across categories (I also tried mosaic plots, but a spine plot is better for comparisons). Using the horizontal width for the category counts seems so obvious in retrospect. Now if only I could do this in ggplot2 without those messy horizontal and vertical stacking calculations (there's a spineplot function in the base package, but it looks pretty ugly). – user4733 Jun 01 '12 at 17:29
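
For anyone who wants to see what those horizontal and vertical stacking calculations look like, here is a minimal sketch in plain Python (chosen only so that it runs without any plotting library). The function name `spine_rects` and the example counts are made up for illustration and are not from the question. Each category gets a horizontal slice whose width is its share of all samples, and inside that slice the subcategories are stacked vertically by their within-category proportion. It assumes every category has at least one sample.

```python
def spine_rects(counts):
    """Compute spine plot rectangles from nested counts.

    counts: {category: {subcategory: count}} (hypothetical input layout).
    Returns one dict per (category, subcategory) with xmin/xmax/ymin/ymax
    in the unit square. Assumes every category total is positive.
    """
    grand_total = sum(sum(sub.values()) for sub in counts.values())
    rects = []
    x = 0.0  # left edge of the current category
    for category, sub in counts.items():
        cat_total = sum(sub.values())
        width = cat_total / grand_total  # width encodes the category count
        y = 0.0  # bottom edge of the current subcategory
        for subcategory, n in sub.items():
            height = n / cat_total  # height encodes the within-category share
            rects.append({
                "category": category,
                "subcategory": subcategory,
                "xmin": x, "xmax": x + width,
                "ymin": y, "ymax": y + height,
            })
            y += height
        x += width
    return rects


# Made-up counts, purely for illustration.
example_counts = {
    "A": {"d": 10, "e": 20, "f": 10},
    "B": {"d": 5, "e": 5, "f": 10},
    "C": {"d": 20, "e": 10, "f": 10},
}

for r in spine_rects(example_counts):
    print(r)
```

The resulting xmin/xmax/ymin/ymax columns are the kind of input a rectangle layer takes (in ggplot2, `geom_rect` uses aesthetics with these names), with fill mapped to the subcategory. Because all bars share the same height, proportions of d can be compared across A, B, and C, while the widths still show the sample counts.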

1 Answer

3

This example of embedded/layered bar plots may represent one alternative. The three main categories are represented by individual bars, then embedded within are subcategory bars (created in ggplot2).

Blog Link (Learning R)

John