
I'm a beginner at R. I was trying out a basic qplot() on R's built-in 'mtcars' dataset.

#Kernel Density Plot for mpg (miles per gallon)
#grouped by number of gears (indicated by color)
qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(0.5),
      main="Distribution of Gas Mileage", xlab="Miles Per Gallon",
      ylab="Density")

and the graph is as follows: [image not included]

I don't understand what the decimal Density values on the y-axis mean. What do they say about the distribution of Miles Per Gallon (x-axis)?

LearneR
  • You probably should buy a book on univariate statistics. –  Apr 01 '15 at 01:22
  • Ok...thanks.. shall do that.. But what does that Density thing mean on Y-axis?? Is it telling that 16 Miles Per Gallon mileage is provided nearly a 11% of the times for 3 geared cars?? something like that?? –  Apr 01 '15 at 01:50
  • Why use density if you don't know what it is? `density = counts / sum(counts * bar width)`. –  Apr 01 '15 at 02:05
  • Short answer is that the absolute numbers there probably aren't telling you anything that's very useful. The plot itself and the relative points are useful, the y axis is hard to interpret and you probably don't need to interpret it. This isn't an R-specific question, just google "what does density plot y axis mean" or something like that :) –  Apr 01 '15 at 02:35
  • True, it's not an R specific question, but FALSE that it doesn't tell you anything useful. @Pascal got it right. It's a per-x-unit estimate of counts. – DWin Apr 01 '15 at 04:49
  • possible duplicate of ["The total area underneath a probability density function is 1" - relative to what?](http://stats.stackexchange.com/questions/133369/the-total-area-underneath-a-probability-density-function-is-1-relative-to-wh) or http://stats.stackexchange.com/questions/4220/a-probability-distribution-value-exceeding-1-is-ok?lq=1 – Tim Apr 11 '15 at 08:09

1 Answer


Your comment has the right idea:

Is it telling that 16 Miles Per Gallon mileage is provided nearly a 11% of the times for 3 geared cars?? something like that??

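You are pretty much right, with one caveat: density is measured per unit of x, so a density of about 0.11 at 16 mpg means that roughly 11% of the 3-gear cars fall within a 1-mpg-wide interval around 16 mpg. The area under each curve is 1.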
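One way to check that reading numerically is the minimal base-R sketch below. It is not part of the plots in this answer: the variable names are my own, and `density()` is called with its default kernel and bandwidth, which I am assuming is close to, but not necessarily identical to, what qplot draws. It verifies that the area under the 3-gear density curve is about 1 and compares the density at 16 mpg with the actual share of 3-gear cars in a 1-mpg-wide window around 16.

```r
# mpg values for the 3-gear cars only
mpg3 <- mtcars$mpg[mtcars$gear == 3]

# Kernel density estimate (default Gaussian kernel and bandwidth)
d <- density(mpg3)

# Area under the curve via the trapezoid rule; should be close to 1
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(area)

# Estimated density at 16 mpg, interpolated from the evaluation grid
dens16 <- approx(d$x, d$y, xout = 16)$y
print(dens16)

# Observed share of 3-gear cars with mpg in [15.5, 16.5),
# i.e. a 1-mpg-wide window around 16, for comparison with dens16
share <- mean(mpg3 >= 15.5 & mpg3 < 16.5)
print(share)
```

With only a handful of cars per gear group, the smoothed density and the raw share will not agree exactly, but they should be of the same order.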

First, a side note: in my version of R, the mtcars dataset has gear as a numeric variable, not a factor, so to get a plot that looks like yours, I have to do it this way:

qplot(x=mpg, data=mtcars, geom='density', fill=as.factor(gear), alpha=I(0.5))

You might find it instructive to compare the plot you made to these:

qplot(x=mpg, data=mtcars, geom='histogram', fill=as.factor(gear),
     alpha=I(0.5), binwidth=2)

That graph "stacks" the histograms for the gear categories on top of each other, so to get a histogram more comparable to your density plot, try:

qplot(x=mpg, data=mtcars, geom='histogram', fill=as.factor(gear), 
     alpha=I(0.5), binwidth=2, position='identity')

Now it's a bit hard to see the histograms, so try

qplot(x=mpg, data=mtcars, geom='histogram', fill=as.factor(gear), 
     alpha=I(0.5), binwidth=2, position='identity', color=I('black'))

to clearly see the outlines of the histogram.

You might not know it, but the default value of the y aesthetic in geom_histogram() is equal to the count of the values in the data that are in the histogram bin. Thus this produces an identical plot to the one above:

qplot(x=mpg, data=mtcars, geom='histogram', fill=as.factor(gear), 
     alpha=I(0.5), binwidth=2, position='identity', color=I('black'),
     y=..count..)

Now, instead of plotting the absolute counts, you can plot each bin's proportion of the total count (a fraction between 0 and 1) by dividing by the total:

qplot(x=mpg, data=mtcars, geom='histogram', fill=as.factor(gear),
     alpha=I(0.5), binwidth=2, position='identity', color=I('black'),
     y=..count../sum(..count..))
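For readers on a recent ggplot2, here is a sketch of what I believe is the equivalent call written with `ggplot()` and `after_stat()`. As far as I know, `qplot()` and the `..count..` notation are deprecated from ggplot2 3.4.0 onward, though they still work with a warning. The object name `p` is my own, and this assumes ggplot2 3.3.0 or later is installed.

```r
library(ggplot2)

# Same proportion histogram as above, using after_stat() in place of ..count..
p <- ggplot(mtcars, aes(x = mpg, fill = as.factor(gear))) +
  geom_histogram(
    aes(y = after_stat(count / sum(count))),  # bin count / total count
    binwidth = 2,
    position = "identity",  # overlay the groups instead of stacking them
    alpha = 0.5,
    colour = "black"
  )

print(p)
```

Note that `sum(count)` here runs over all bins of all gear groups, so the bar heights are fractions of all the cars in the dataset, not of each gear group separately.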

Does that y-axis now look anything like the density plot you asked about?

qplot(x=mpg, data=mtcars, geom='density', fill=as.factor(gear),
     alpha=I(0.5), position='identity', color=I('black'),
     y=..count../sum(..count..))
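
The three y-axis scales used above (count, proportion, density) can also be compared numerically. This is a minimal base-R sketch for the 3-gear cars only, assuming a bin width of 2 as in the histograms above; the bin edges from 10 to 22 and the variable names are my own choices, picked to cover the data.

```r
# mpg values for the 3-gear cars only
mpg3 <- mtcars$mpg[mtcars$gear == 3]

binwidth <- 2
breaks <- seq(10, 22, by = binwidth)  # assumed bin edges covering the data

# Bin the data without drawing anything
h <- hist(mpg3, breaks = breaks, plot = FALSE)

# Proportion: share of the cars that fall in each bin
proportion <- h$counts / sum(h$counts)

# Density: proportion per unit of x (here, per 1 mpg)
dens <- h$counts / (sum(h$counts) * binwidth)

print(data.frame(bin_start = head(breaks, -1),
                 count = h$counts,
                 proportion = proportion,
                 density = dens))

# hist() computes the same density itself
print(all.equal(dens, h$density))

# Bar height times bar width, summed over bins, gives a total area of 1
print(sum(dens * binwidth))
```

Because density is proportion divided by bin width, the density values change when the bin width changes, while the total area stays 1. That is why the numbers on a density axis are read as "per unit of x" and not as plain percentages.
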
Curt F.
  • I'd be happy for downvoters to comment so I could have a chance to improve the answer. Thanks! – Curt F. Apr 01 '15 at 03:12
  • Already answered in comments. And this question should be closed because this has nothing to do with programming. –  Apr 01 '15 at 03:21
  • Thanks for the comment Pascal. I was under the impression that my answer provided more detail than the comments. Apologies that you did not find it to be the case. Are there other questions that you think should be closed that are currently not? I will make sure not to answer those either. – Curt F. Apr 01 '15 at 03:29