
I recently asked a question that I should probably reformulate as a more general one.

I came across this: Using scatter plots to understand multiple values of Y for a given X, and thought the accepted answer was very good. What's unclear to me is this: given a nasty scatter plot, how would you visually get an idea of what sort of relationship the x's and y's have? In my mind, there's no way of telling whether a linear, quadratic, etc. regression is appropriate.

Here is the plot that is currently confounding me: scatterplot

My idea was to somehow plot the AVERAGE of f(x), with x clumped together in intervals, instead of each observation. Is this how you would go about it? If not, what other way do you visually make sense of this data?

  • Just an observation on your particular plot, but the y values appear to be fixed at integer intervals. If the y values are known to be accurate, you only need regression on the x values. Or, to look at it another way, as in Glen_b's bins example, you already have bins. I'm no expert on this stuff though. – Jaydee Jul 31 '14 at 15:27

2 Answers


You might want to consider a lowess (or loess) smooth, or a similar local smoother.

Consider:

[image: scatter plot with a loess smooth overlaid]

This one was generated in R with scatter.smooth. It's an estimate of the local mean, but one that varies smoothly; this is directly akin to using a kernel density estimate instead of a histogram.
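A minimal sketch of that kind of plot follows. The data behind the figure is not stated, so this uses the mcycle data from the MASS package as a stand-in, and the span value is an illustrative assumption rather than a recommendation.

```r
library(MASS)  # provides mcycle; assumed stand-in data, not the question's data

# Scatter plot with a loess curve drawn on top.
# span = 0.3 is an illustrative choice; larger values give a smoother curve.
scatter.smooth(mcycle$times, mcycle$accel,
               span = 0.3, degree = 1,
               xlab = "times", ylab = "accel",
               lpars = list(col = "red", lwd = 2))

# A comparable fit kept as an object, so the smoothed local means can be inspected.
# family = "symmetric" matches the scatter.smooth default (loess itself defaults to "gaussian").
fit <- loess(accel ~ times, data = mcycle,
             span = 0.3, degree = 1, family = "symmetric")
smoothed <- predict(fit)  # one fitted value per observation
```

Varying span is the analogue of changing the bandwidth of a kernel density estimate: small values follow the data closely, large values approach a single straight-line fit.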

If you must have bins, see here, which describes how to produce something like this regressogram (bin smoother):

regressogram of mcycle data
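A rough sketch of a bin smoother of this kind, again on mcycle: cut x into equal-width intervals and draw the mean of y within each one. The number of bins (n_bins) is a hypothetical choice, and this is not necessarily how the linked figure was produced.

```r
library(MASS)  # provides mcycle

n_bins <- 10  # hypothetical choice; more bins give a rougher step function

# Equal-width bin edges spanning the observed x range
breaks <- seq(min(mcycle$times), max(mcycle$times), length.out = n_bins + 1)
bin <- cut(mcycle$times, breaks = breaks, include.lowest = TRUE)

# Mean of y within each bin (NA for any bin that contains no points)
bin_means <- tapply(mcycle$accel, bin, mean)

# Raw points, then one horizontal segment per bin at its mean
plot(mcycle$times, mcycle$accel, col = "grey50",
     xlab = "times", ylab = "accel")
segments(x0 = head(breaks, -1), y0 = bin_means,
         x1 = tail(breaks, -1), y1 = bin_means,
         col = "red", lwd = 2)
```

This is essentially the "average of f(x) over intervals of x" idea from the question; the loess curve replaces the hard bin edges with smoothly weighted neighbourhoods.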

Glen_b
  • Very interesting! I think your answer in the other question was what I was looking for here as well, though. – Benjamin Lindqvist Jul 31 '14 at 12:30
  • +1 - a slight amount of jitter, and making the points smaller and semi-transparent in the original plot, would make it more informative as well (which takes minimal data manipulation). I can't figure out the precision of the values on the x-axis, but they already appear to be "clumped together", as the circles overlap exactly. – Andy W Jul 31 '14 at 16:24
  • @AndyW By 'original plot', I presume you mean the one in the question (I didn't attempt to reproduce that aspect of the x-variable); I agree completely with your thoughts about making the plot more informative. – Glen_b Jul 31 '14 at 16:54
  • Yes I meant the plot in Benjamin's question. – Andy W Jul 31 '14 at 17:15

Your idea is called a 2D density plot. Here are some examples in R and ggplot, and here is contouring in ggplot. I recently needed contours that also show the expected joint distribution, so here is example code in R.

    library(ggplot2)
    # Shuffle hp to break its link with wt: the green contours show the joint density expected under independence
    shuffled <- transform(mtcars, hp = sample(hp))
    ggplot(mtcars, aes(x = hp, y = wt)) +
      stat_density_2d() +
      stat_density_2d(data = shuffled, color = "green", show.legend = FALSE)
ran8