
I'm a bit confused about how to interpret the sample-vs-sample Q-Q plot produced by R's `qqplot` function.

OUT_number_F <- c(440, 461, 379, 518, 402, 470, 599, 543, 330, 537, 683, 397, 428, 531, 655)
OUT_number_P <- c(530, 409, 296, 474, 305, 567, 580, 579, 358, 594, 530, 440, 619, 622, 527)
qqplot(OUT_number_F, OUT_number_P, xlab = "F sample quantiles", ylab = "P sample quantiles")
abline(0, 1, col = "blue", lwd = 3)

The resulting image is below (image: sample-vs-sample Q-Q plot).

So, based on the image above, I could draw one of the following conclusions:

a) The samples came from two completely different distributions (for instance, a normal and a Poisson distribution).

b) The samples could have come from the same kind of distribution (e.g. normal) but be skewed relative to each other (as I understand it, the data in this case came from two different normal distributions).

Which of the statements above is correct? My second question: in the code example, I added the blue line to the Q-Q plot with this command:

abline(0, 1, col = "blue", lwd = 3)

As far as I know, there is no dedicated R function for this (I cannot use the `qqline()` function in this case). Is my approach an accurate and correct way to add a reference line to a sample-vs-sample Q-Q plot?
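For a line that adapts to the scale of the data, one possibility is to mirror what `qqline` does by default for a sample-vs-theoretical plot, which is to join the first and third quartile points. The sketch below is only an illustration under that assumption: `qq_quartile_line` is a hypothetical helper, not a base R function, and the red dashed line it produces answers a different question from `abline(0, 1)`. The quartile line shows whether the two samples have a similar shape, while the 0-1 line shows whether they have the same distribution, including location and scale.

```r
# Data from the question
OUT_number_F <- c(440, 461, 379, 518, 402, 470, 599, 543, 330, 537, 683, 397, 428, 531, 655)
OUT_number_P <- c(530, 409, 296, 474, 305, 567, 580, 579, 358, 594, 530, 440, 619, 622, 527)

# Hypothetical helper (not part of base R): intercept and slope of the line
# through the points (Q1 of x, Q1 of y) and (Q3 of x, Q3 of y)
qq_quartile_line <- function(x, y, probs = c(0.25, 0.75)) {
  qx <- quantile(x, probs, names = FALSE)
  qy <- quantile(y, probs, names = FALSE)
  slope <- diff(qy) / diff(qx)
  intercept <- qy[1] - slope * qx[1]
  c(intercept = intercept, slope = slope)
}

cf <- qq_quartile_line(OUT_number_F, OUT_number_P)

qqplot(OUT_number_F, OUT_number_P,
       xlab = "F sample quantiles", ylab = "P sample quantiles")
# Identity line: the points fall near it if both samples share one distribution
abline(0, 1, col = "blue", lwd = 3)
# Quartile line: the points fall near it if the samples share a shape,
# even when location and scale differ
abline(a = cf[["intercept"]], b = cf[["slope"]], col = "red", lty = 2, lwd = 2)
```

With only fifteen points per sample, both lines are noisy guides rather than a formal test.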

kjetil b halvorsen
Denis
  • At such a small sample size the deviation from the line might only be chance. If you do a number of plots of random data from a normal distribution and compare it with your plot, does it look unusual? Note that if you have data from two different normal distributions the only effect is to change the scale (in effect, the numbers on the axes); it has no impact on the shape at all. You can adapt the code that produced the plot [here](http://stats.stackexchange.com/a/111013/805) ... ctd – Glen_b May 18 '16 at 21:36
  • ctd ... (which does a set of random sample-vs-theoretical Q-Q plots) to your case (sample-vs-sample) by having two lines like the "`xz = `..." line, one for each data set, and then in the loop putting both the x and the y arguments into the `qqnorm` call, and finally replacing the `qqline` with an `abline` call. Please see that post for additional explanation. Fifteen points is barely enough to discern shape information above the noise, so things will tend to look pretty random even when they're from the same distribution. ...ctd – Glen_b May 18 '16 at 21:43
  • ctd... If you use different normal distributions, the relationships look the same but their "lines" may be far from the line with intercept 0 and slope 1. If you experiment with other distributions in place of the second one, you'll tend to see more distinct deviations from a linear relationship. – Glen_b May 18 '16 at 21:45
  • Thanks! I much appreciate your response. If I understand you correctly, the direct answer to my question is "A". Right? What is the minimum sample size for testing normality of the data with `qqnorm` (`qqplot`) or the `Kolmogorov-Smirnov test`? As far as I know, for the `Shapiro-Wilk test` even three observations in the sample are enough. – Denis May 18 '16 at 23:32
  • Besides, each time I have to adjust the parameters of the `abline` function (`intercept` and `slope`), because they depend on the scale of my data and, as a consequence, of the sample-vs-sample Q-Q plot. So there is no simple way to produce the appropriate line automatically. Is that correct? I guess it could be done with the `lm` function. – Denis May 18 '16 at 23:46
  • 1. If you're trying to compare samples from normal distributions, you *want* to keep the (0,1) abline, since that's showing the differences. 2. There's no specific "cut off" in sample size; it's an issue of how much information you have to determine what the shape is. Fifteen points isn't much, especially when you're trying to judge by eye. Yes, you can physically carry out (say) a Shapiro-Wilk test at n=3 because you can compute a statistic at that sample size ... but what's the power like? When looking at a plot, you're relying not on a fairly efficient statistic but on less efficient judgement by eye – Glen_b May 19 '16 at 01:07
  • 3. If you're not interested in anything but whether the two groups have the same distributional shape you can adapt the code that qqline itself uses (it just joins the first and third quartile points by default). In fact if you're clever you could almost do it directly with qqline (by making the theoretical distribution function the function returned by `ecdf`, though you may want to adjust the quantiles along the lines of the `ppoints` function). – Glen_b May 19 '16 at 01:15
  • Thanks again. I'll check out the `ecdf` and `ppoints` functions and try to figure it out. – Denis May 19 '16 at 11:25
  • If I understand you correctly, I have to adapt the code from the link you provided above. In particular, I should 1) generate multiple artificial samples with the `rnorm` function (although I'm not sure that's entirely appropriate, because I have made no assumption about the normality of the data), 2) draw multiple sample-vs-sample Q-Q plots with the `qqplot` and `abline` functions (plotting the quantiles of the artificial samples against each other) plus one sample-vs-sample Q-Q plot for the real pair of samples, and 3) examine the resulting plots by eye. – Denis May 19 '16 at 11:52
  • You don't have to do normal -- do any other distributions you like -- but you mentioned normal as an example (and also said "as i understand, data in this case came from two different normal distributions") so it would be a good idea to at least do that case, surely. You don't *have* to do any of the things I suggested but it's a good way to develop an intuitive understanding of how "linear" these things should look at some given sample size (since you lack that intuition presently). I don't suggest that as a way to do anything other than build your own intuition about Q-Q plots – Glen_b May 19 '16 at 12:02
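The intuition-building exercise discussed in these comments could be sketched roughly as follows. This is not the code from the linked post. It is a minimal illustration that assumes, purely for the sake of the exercise, that both samples come from a single normal distribution whose mean and standard deviation are estimated from the pooled data. The seed, the number of simulated panels, and the names `pooled`, `mu`, `sigma`, and `sims` are arbitrary choices.

```r
# Data from the question
OUT_number_F <- c(440, 461, 379, 518, 402, 470, 599, 543, 330, 537, 683, 397, 428, 531, 655)
OUT_number_P <- c(530, 409, 296, 474, 305, 567, 580, 579, 358, 594, 530, 440, 619, 622, 527)

set.seed(1)                 # arbitrary seed, for reproducibility only
n <- length(OUT_number_F)   # 15 observations per sample

# ASSUMPTION (illustration only): one common normal distribution,
# with parameters estimated from the pooled data
pooled <- c(OUT_number_F, OUT_number_P)
mu <- mean(pooled)
sigma <- sd(pooled)

# Eight pairs of simulated samples drawn from that same distribution
sims <- replicate(
  8,
  list(x = rnorm(n, mu, sigma), y = rnorm(n, mu, sigma)),
  simplify = FALSE
)

# 3 x 3 grid: eight simulated Q-Q plots followed by the observed one
op <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
for (s in sims) {
  qqplot(s$x, s$y, xlab = "", ylab = "", main = "simulated")
  abline(0, 1, col = "blue")
}
qqplot(OUT_number_F, OUT_number_P, xlab = "", ylab = "", main = "observed")
abline(0, 1, col = "blue")
par(op)
```

If the observed panel does not stand out from the simulated ones, the deviation from the line is plausibly just chance at this sample size. Replacing one of the `rnorm` calls with another generator (for example `rexp` or `rpois`) shows what a genuinely different shape tends to look like.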
