7

From a larger sample of tabular data, I have picked certain rows that meet a certain condition (this condition is unrelated to the actual data in the rows).

Now, I want to know if the distribution of this subset that I have created is similar to the distribution of the original, larger sample.

What test(s) can I use for this purpose?

Thanks! I appreciate the help.

Debi

2 Answers

4

You could test whether several descriptive statistics of a distribution are the same in the subsample and in the remaining sample. For example, you could conduct tests for:

  • mean difference
  • median difference
  • stochastic dominance
  • different variance
  • shape
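As a rough illustration of the first few items in this list, here is a minimal sketch that compares the subsample with the remaining rows using a permutation test. Permutation testing is a swapped-in technique chosen so the example needs only the Python standard library; it is not something the answer prescribes. The function name `permutation_pvalue` and the simulated data are hypothetical.

```python
import random
import statistics


def permutation_pvalue(subset, rest, stat, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference stat(subset) - stat(rest)."""
    rng = random.Random(seed)
    observed = abs(stat(subset) - stat(rest))
    pooled = list(subset) + list(rest)
    k = len(subset)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign rows to "subset" and "rest" under the null hypothesis.
        rng.shuffle(pooled)
        if abs(stat(pooled[:k]) - stat(pooled[k:])) >= observed:
            hits += 1
    # The add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: both groups are drawn from the same distribution.
data_rng = random.Random(42)
rest = [data_rng.gauss(50, 10) for _ in range(300)]
subset = [data_rng.gauss(50, 10) for _ in range(60)]

statistics_to_compare = [
    ("mean", statistics.fmean),
    ("median", statistics.median),
    ("variance", statistics.variance),
]
results = {
    name: permutation_pvalue(subset, rest, stat)
    for name, stat in statistics_to_compare
}
for name, p in results.items():
    print(f"{name}: permutation p-value = {p:.3f}")
```

A small p-value for a given statistic suggests the subsample differs from the remaining rows in that respect. A large p-value is not, by itself, evidence of similarity.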

While you are at it, since you are interested in similarity, I would also explore tests for equivalence of all such measures (for example, using TOST, the two one-sided tests procedure), probably combining inferences from difference and equivalence tests.
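To make the equivalence idea concrete, here is a minimal sketch of two one-sided tests for a difference in means. It assumes a large-sample normal approximation (a t-based version would be more usual for small samples), and the function name `tost_mean_z`, the simulated data, and the choice `delta=3.0` are all hypothetical. The equivalence margin `delta` is the largest difference you would still regard as unimportant, and it has to come from subject-matter judgment rather than from the data.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def tost_mean_z(x, y, delta):
    """Two one-sided z-tests for equivalence of means within +/- delta.

    Returns (difference in means, TOST p-value). A small p-value supports
    the claim that the true difference lies inside (-delta, +delta).
    """
    diff = fmean(x) - fmean(y)
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = NormalDist()
    # H01: true difference <= -delta; rejected when (diff + delta) / se is large.
    p_lower = 1.0 - z.cdf((diff + delta) / se)
    # H02: true difference >= +delta; rejected when (diff - delta) / se is small.
    p_upper = z.cdf((diff - delta) / se)
    # Equivalence is concluded only if both one-sided nulls are rejected.
    return diff, max(p_lower, p_upper)


# Hypothetical data: subsample and remaining rows from the same distribution.
rng = random.Random(1)
rest = [rng.gauss(50, 10) for _ in range(300)]
subset = [rng.gauss(50, 10) for _ in range(60)]

diff, p_equiv = tost_mean_z(subset, rest, delta=3.0)
print(f"difference in means = {diff:.2f}, TOST p-value = {p_equiv:.4f}")
```

Reporting this alongside an ordinary difference test gives four possible outcomes (different, equivalent, both, or neither), which is usually more informative than either test alone.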

Something else you may want to consider: why are you interested in this similarity? The answer may help you decide which, if any, of these tests to explore. For example, if your sample size is smallish, you may not have enough power for the Kolmogorov–Smirnov test mentioned by soakley, although you might still have enough power to make inferences about, say, the sample mean. If you are only interested in comparing sample means, that may be OK for your purposes.

Alexis
  • Hi, as a follow-up: the TOST test (in R) asks for a magnitude of the region of similarity. Any idea what this is? – Debi May 13 '14 at 19:00
  • Yup: you need to make a decision about *how large a difference is meaningful*. In other words, the minimum size of difference that you are willing to accept as important. See for more info: http://stats.stackexchange.com/tags/tost/info (and pay attention to what $\Delta$ is). – Alexis May 13 '14 at 19:36
4

Since you want to compare the entire distributions, I'd recommend the two-sample Kolmogorov-Smirnov test.

More information can be found here:

http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
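For readers who want to see what the test computes, here is a minimal sketch of the two-sample statistic: the largest vertical gap between the two empirical distribution functions, with a p-value from the asymptotic Kolmogorov distribution. It assumes continuous data without many ties and samples that are not too small; the function name `two_sample_ks` and the simulated data are hypothetical. In practice you would normally call a library routine instead, such as `ks.test` in R.

```python
import bisect
import math
import random


def two_sample_ks(x, y):
    """Return (D, asymptotic p-value) for the two-sample Kolmogorov-Smirnov test."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # The largest gap between the two ECDFs occurs at an observed data point.
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / n  # ECDF of x evaluated at v
        fy = bisect.bisect_right(ys, v) / m  # ECDF of y evaluated at v
        d = max(d, abs(fx - fy))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series below converges slowly here; the limiting value is 1.
        p = 1.0
    else:
        series = sum(
            (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, 101)
        )
        p = min(1.0, max(0.0, 2.0 * series))
    return d, p


# Hypothetical data: the subset versus the rows that were not selected.
rng = random.Random(7)
rest = [rng.gauss(0, 1) for _ in range(400)]
subset = [rng.gauss(0, 1) for _ in range(80)]

d_stat, p_value = two_sample_ks(subset, rest)
print(f"D = {d_stat:.3f}, asymptotic p-value = {p_value:.3f}")
```

Comparing the subset against the rows that were not selected, rather than against the full sample that contains it, keeps the two groups disjoint.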

soakley
  • One of the assumptions is that the two samples are mutually independent. Given your description, it may be more appropriate to perform a one-sample KS test of the sample versus the parent population. – soakley May 13 '14 at 18:37