I have two sets, A and B, containing the heights of students. Set A contains the heights of all students in the class, and Set B contains the heights of some students from the same class. Hence, Set B is a subset of Set A.

I would like to test the hypothesis that the distribution of Set B is the same as that of Set A. However, I am unsure which test (KS, Mann-Whitney, etc.) to use, given the relation between Set A and Set B.

It would be great to know which test to use and the rationale behind it.

Edit: A similar question focuses on comparing means. This question differs in that it asks for tests that compare the distributions, which may or may not involve comparing means.

  • If A is made up of disjoint subsets B and C, it is better to compare B with C than to compare A with B. The latter would compare some individuals with themselves. – BruceET Jul 17 '19 at 14:42
  • The duplicate makes the salient point that you merely need to compare the subset to its complement. That applies no matter what kind of comparison you are making and reduces your question to "how do I compare two datasets?" That's too broad (as evidenced by a great many different answers in other threads on this site). – whuber Jul 17 '19 at 17:26

1 Answer

Brief demonstration. x1 represents part of a class, x2 represents the rest of the class, and x3 represents the whole class.

set.seed(717)
x1 = rnorm(20, 90, 15);  x2 = rnorm(10, 120, 20) 

A Welch t test (in R) finds a significant difference between the two parts; P-value below 1%.

t.test(x1,x2)$p.val
[1] 0.007744975

However, no significant difference at the 5% level is found between the whole class and the larger part of it. This second t test is not only inconclusive, it is improper. The two "samples" are not independent, as required for a two-sample t test. In fact, they consist largely of the same 20 students.

x3 = c(x1,x2)
t.test(x3,x1)$p.val
[1] 0.08184957
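Since the question asks about comparing distributions rather than only means, the same part-versus-rest idea can be sketched with the KS and Mann-Whitney tests. This is a minimal sketch and not a recommendation of one test over the other. The logical flag in.B and the names B and C are hypothetical, introduced only to show how a subset and its complement might be separated when membership is recorded per student.

```r
# Same simulated class as above: x1 plays the role of subset B,
# x2 the rest of the class, x3 the whole class A.
set.seed(717)
x1 = rnorm(20, 90, 15);  x2 = rnorm(10, 120, 20)
x3 = c(x1, x2)

# Hypothetical membership flag: TRUE if the student belongs to subset B.
# In real data this would come from student IDs, not from the heights.
in.B = rep(c(TRUE, FALSE), times = c(20, 10))

B = x3[in.B]     # the subset
C = x3[!in.B]    # its complement: disjoint from B, so the samples do not overlap

ks = ks.test(B, C)       # two-sample Kolmogorov-Smirnov test
mw = wilcox.test(B, C)   # Wilcoxon rank-sum (Mann-Whitney) test

ks$p.value
mw$p.value
```

The KS test responds to any difference between the two empirical distribution functions, while the Mann-Whitney test is mainly sensitive to one group tending to have larger values than the other. Either way, the comparison is between B and C, not between B and A.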

Here are boxplots:

boxplot(x1,x2,x3, col="skyblue2")

[Figure: boxplots of x1, x2, and x3]

Note: The disadvantages of comparing the whole with one of its parts are discussed elsewhere on this site. Here is one such Q & A.

BruceET