1

First of all, I have already read the answer to the question "Testing two independent samples for null of same skew?", but I would like to know how to apply it to my data. I have two sets of p-values (approximately 16,000 values), and both distributions look very similar, but I want to test whether the tails are different (ideally, I would like to test the null hypothesis that they come from the same distribution). Also, would it be possible to solve this problem in R? I was thinking of converting the p-values to z-scores, setting a threshold of, say, 2 standard deviations, and then applying a two-sample z-test to the tails.
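One possible reading of the thresholding idea is a two-proportion z-test on the share of z-scores beyond the cutoff in each set. The sketch below is in Python rather than R, and it rests on assumptions not stated in the question: the function name `tail_fraction_test` is hypothetical, the tails are taken as two-sided (|z| > 2), and the p-values within each set are treated as independent.

```python
from math import sqrt
from statistics import NormalDist


def tail_fraction_test(p_a, p_b, threshold=2.0, eps=1e-12):
    """Two-proportion z-test on the share of |z| values beyond `threshold`.

    p_a, p_b: sequences of p-values (hypothetical inputs).
    Returns (z statistic, two-sided p-value).
    """
    nd = NormalDist()

    def tail_count(pvals):
        # Clip away exact 0 and 1, where the normal quantile is undefined,
        # then count z-scores whose absolute value exceeds the threshold.
        zs = (nd.inv_cdf(min(max(p, eps), 1 - eps)) for p in pvals)
        return sum(abs(z) > threshold for z in zs)

    n_a, n_b = len(p_a), len(p_b)
    k_a, k_b = tail_count(p_a), tail_count(p_b)

    # Pooled tail proportion under the null of equal tail fractions.
    pooled = (k_a + k_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Either no value or every value is in the tails: nothing to compare.
        return 0.0, 1.0

    stat = (k_a / n_a - k_b / n_b) / se
    p_value = 2 * (1 - nd.cdf(abs(stat)))
    return stat, p_value
```

This only compares how much mass lies beyond the cutoff, not the shape of the tails, and the result depends on the chosen threshold.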

Any help would be really appreciated

user2380782
  • 135
  • 7
  • There are generic methods to test if two distributions are the same (Kolmogorov-Smirnov test, Mann-Whitney U test, etc.), but these wouldn't focus exclusively on the tails or take advantage of the fact that these are p-values, which might be important for interpretation. – jwimberley Nov 07 '16 at 13:16

1 Answer

1

How about a permutation test using the difference between the skewnesses of the z-values as the test statistic? It's an exact test, which you can approximate to arbitrary precision by resampling: shuffle the group labels (i.e., sample them without replacement), recompute the difference statistic, repeat many times, and compare your observed difference with the distribution of differences obtained from the permutations. The null hypothesis here is technically that the two distributions are identical, but the test will be sensitive to differences in the skewness of the distributions.

You could do this with any other statistic, e.g., kurtosis. The basic story remains the same.
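A minimal sketch of the permutation procedure, written in Python with only the standard library (the same steps should carry over to R using `qnorm()` and `sample()`). The function names, the moment-based skewness formula, the two-sided comparison, and the clipping constant `eps` are all assumptions made for illustration, not something specified above.

```python
import random
from statistics import NormalDist, fmean


def p_to_z(pvals, eps=1e-12):
    """Map p-values to z-scores with the standard normal quantile function."""
    nd = NormalDist()
    # Clip away exact 0 and 1, where the quantile function is undefined.
    return [nd.inv_cdf(min(max(p, eps), 1 - eps)) for p in pvals]


def skewness(x):
    """Moment-based sample skewness, m3 / m2**1.5."""
    m = fmean(x)
    m2 = fmean((v - m) ** 2 for v in x)
    m3 = fmean((v - m) ** 3 for v in x)
    return m3 / m2 ** 1.5


def perm_test_skew(z_a, z_b, n_perm=2000, seed=0):
    """Approximate permutation test for a difference in skewness.

    Returns (observed difference, two-sided Monte Carlo p-value).
    """
    rng = random.Random(seed)
    observed = skewness(z_a) - skewness(z_b)
    pooled = list(z_a) + list(z_b)
    n_a = len(z_a)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values is equivalent to permuting group labels.
        rng.shuffle(pooled)
        diff = skewness(pooled[:n_a]) - skewness(pooled[n_a:])
        if abs(diff) >= abs(observed):
            hits += 1
    # The +1 counts the observed labelling as one of the permutations,
    # so the estimated p-value can never be exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A call would look like `perm_test_skew(p_to_z(pvals_a), p_to_z(pvals_b))`, where `pvals_a` and `pvals_b` are placeholder names for the two sets of p-values. To test kurtosis instead, swap `skewness` for a fourth-moment statistic. With roughly 16,000 values per group, this pure-Python loop will be slow, so a vectorized version may be preferable in practice.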

BigBendRegion
  • 4,593
  • 12
  • 22
  • Thanks @peter, could you provide some pseudocode? – user2380782 Nov 08 '16 at 15:42
  • I played around with this using R, and after looking at the results it does not seem so great. It's fine under the null case, but in the alternative case, the resampled skewness and kurtosis difference statistics have very high variance, thus low power. It seems like this is perhaps a fertile area for research. Maybe you just need really big n's for this problem. – BigBendRegion Nov 27 '16 at 17:14