Determine if there is a difference between two large vectors of different sizes of non-normal quantitative data in R

Question

I have two vectors of unequal sizes. They contain quantitative data that is not normally distributed. I would like to see whether or not they are statistically different. Is there a good function to use in R to achieve this?

I am thinking I should not use a t-test (as it assumes normality) or the Wilcoxon test (as it requires paired groups of equal size).

Can you say more about the variables you're observing? Are they discrete, for example? Bounded? How are they obtained? Also, there are many ways that distributions can differ; what kinds of difference are you interested in? — Glen_b, Mar 04 '15 at 06:35

Glen_b · Accepted Answer · 2015-03-06T00:21:19.423

There are two tests named after Wilcoxon.

The one you mention is the signed rank test, but there's also the rank sum test, which might well be suitable for your problem. It's sometimes called the Mann-Whitney test.
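As a minimal sketch of the rank sum test in R, assuming two independent samples: the vectors x and y below are simulated placeholders (exponential draws of sizes 500 and 800), not the asker's data. wilcox.test defaults to paired = FALSE, so the two vectors may have different lengths.

```r
# Hypothetical skewed (non-normal) samples of unequal size;
# replace x and y with your own vectors.
set.seed(1)
x <- rexp(500, rate = 1)
y <- rexp(800, rate = 0.8)

# Wilcoxon rank sum (Mann-Whitney) test on two independent samples.
# paired = FALSE is the default, so unequal lengths are allowed.
res_w <- wilcox.test(x, y, alternative = "two.sided")
res_w$statistic  # W: number of (x, y) pairs with x > y
res_w$p.value
```

With samples this large and no ties, wilcox.test uses a normal approximation rather than the exact null distribution.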

It can be used to test for a location shift, or a scale shift, or more general alternatives, including stochastic dominance. For more general differences still, you might consider a two-sample Kolmogorov-Smirnov test.
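A sketch of the two-sample Kolmogorov-Smirnov test in R, again on simulated placeholder data (a gamma and a lognormal sample chosen only for illustration). The statistic D is the largest vertical gap between the two empirical CDFs; the last lines recompute it by hand as a check.

```r
# Hypothetical samples of unequal size from different skewed distributions
set.seed(2)
x <- rgamma(300, shape = 2, rate = 1)
y <- rlnorm(450, meanlog = 0.5, sdlog = 0.6)

# Two-sample Kolmogorov-Smirnov test: sensitive to any difference
# between the two distributions, not only a location shift
res_ks <- ks.test(x, y)
res_ks$statistic  # D
res_ks$p.value

# D is the maximum absolute difference between the two ECDFs. Both are
# step functions that only jump at observed values, so evaluating them
# on the pooled sample is enough to find the maximum.
z <- c(x, y)
d_manual <- max(abs(ecdf(x)(z) - ecdf(y)(z)))
all.equal(unname(res_ks$statistic), d_manual)
```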

You might also consider a permutation/randomization test (particularly if you're interested in a difference in means).

(In R, both Wilcoxon tests can be obtained via wilcox.test, and both the one-sample and two-sample Kolmogorov-Smirnov tests via ks.test. Randomization tests are also very easy to carry out.)
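One way to carry out a randomization test for a difference in means, sketched in base R. The helper perm_test_mean, its default of 10000 resamples, and the simulated data are illustrative assumptions, not part of any package. Under the null hypothesis the group labels are exchangeable, so the observed difference is compared against differences obtained by reshuffling the labels.

```r
# Hypothetical helper: two-sided permutation test for mean(x) - mean(y)
perm_test_mean <- function(x, y, n_perm = 10000) {
  observed <- mean(x) - mean(y)
  pooled <- c(x, y)
  n_x <- length(x)
  # Randomly reassign n_x of the pooled values to "x", the rest to "y"
  perm_stats <- replicate(n_perm, {
    idx <- sample.int(length(pooled), n_x)
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # Two-sided p-value; the +1 terms count the observed labelling itself
  p_value <- (sum(abs(perm_stats) >= abs(observed)) + 1) / (n_perm + 1)
  list(observed = observed, p.value = p_value)
}

# Simulated placeholder samples of unequal size
set.seed(3)
x <- rexp(200, rate = 1)
y <- rexp(350, rate = 1)
res_p <- perm_test_mean(x, y, n_perm = 5000)
res_p$observed
res_p$p.value
```

Adding 1 to both numerator and denominator keeps the p-value strictly positive, which is the usual convention when the permutations are sampled at random rather than enumerated exhaustively.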

I think this still has the problem described here: [Are large data sets inappropriate for hypothesis testing?](http://stats.stackexchange.com/questions/2516/are-large-data-sets-inappropriate-for-hypothesis-testing) — Haitao Du, Apr 05 '17 at 17:55
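A possible way to address this concern, sketched under assumptions: with very large samples even a small shift can produce a tiny p-value, so it can help to report an effect size alongside the test. Setting conf.int = TRUE in wilcox.test adds a Hodges-Lehmann estimate of the location shift and a confidence interval for it. The data below (two exponential samples of 5000 each, one shifted by 0.1) are simulated purely for illustration.

```r
# Hypothetical large samples with a small, known location shift
set.seed(4)
x <- rexp(5000, rate = 1)
y <- rexp(5000, rate = 1) + 0.1

# Rank sum test plus an estimate of the shift and its 95% interval
res <- wilcox.test(x, y, conf.int = TRUE)
res$p.value   # can be very small at this n even though the shift is small
res$estimate  # Hodges-Lehmann estimate of the shift (x minus y)
res$conf.int  # confidence interval for that shift
```

Whether a shift of this size matters is a subject-matter question that the p-value alone cannot answer.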

