
Let's say I measure the height of 100 individuals every week in both Illinois and Texas while allowing for the same individual to be measured multiple times in the following weeks. This way, my samples within each state are not completely independent as there is some overlap in the composition of individuals over time.

For example, sample 1 included 100 individuals. Sample 2 included 100 individuals, but 2 individuals were also measured in sample 1. Sample 3 can be fully independent. Sample 54 can have 5 individuals that were also measured before. Thus, my 100 samples from each state can have some overlap. (I have an ID for each individual, so I know in which weeks they appear.)
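To make the setup concrete, here is a minimal sketch of how such data could be laid out and how the overlap between weekly samples could be read off the IDs. The record layout, the ID strings, the heights, and the helper names `repeat_counts` and `weekly_overlap` are all made up for illustration; it also assumes each person is measured at most once per week.

```python
from collections import Counter

# Hypothetical long-format records: (state, week, person_id, height_cm).
# All IDs and heights are invented for illustration only.
records = [
    ("IL", 1, "IL-001", 170.2),
    ("IL", 1, "IL-002", 165.0),
    ("IL", 2, "IL-001", 170.4),  # same person measured again in week 2
    ("IL", 2, "IL-003", 181.3),
    ("TX", 1, "TX-001", 175.5),
    ("TX", 2, "TX-002", 168.9),
]


def repeat_counts(records, state):
    """Count how many records (weekly samples) each person in `state` has."""
    return Counter(pid for st, _week, pid, _h in records if st == state)


def weekly_overlap(records, state, week_a, week_b):
    """Return the set of person IDs present in both weekly samples."""
    ids_a = {pid for st, wk, pid, _h in records if st == state and wk == week_a}
    ids_b = {pid for st, wk, pid, _h in records if st == state and wk == week_b}
    return ids_a & ids_b


il_counts = repeat_counts(records, "IL")
il_overlap = weekly_overlap(records, "IL", 1, 2)
print(il_counts)   # how often each Illinois ID appears
print(il_overlap)  # IDs shared by Illinois weeks 1 and 2
```

In this toy data the dependence comes only from IDs that recur within a state; no ID appears in both states.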

In this scenario, the samples are neither fully dependent nor fully independent. What test can I perform to test for statistical significance of the difference in mean height between Illinois and Texas? I imagine both the t-test and the Wilcoxon test assume that samples are completely independent of each other.

Scortchi - Reinstate Monica
Voldemort
  • The search term is 'partially dependent data', but unfortunately that term is also used in database theory, so you need more specific search queries to find papers like this one: https://journals.sagepub.com/doi/10.1177/0962280215577111 – Bernhard Nov 22 '20 at 08:09
  • Do you have subject IDs? See also https://stats.stackexchange.com/questions/25941/t-test-for-partially-paired-and-partially-unpaired-data – kjetil b halvorsen Nov 22 '20 at 14:10
  • Thanks. In my case, both Texas and Illinois have different individuals. There is no overlap between the two populations. Within Illinois and Texas, I have 100 samples each that can have some overlap. In other words, I have drawn multiple samples of size X from Illinois with replacement and also the same for Texas. Does it still meet the definition of partially dependent data? – Voldemort Nov 22 '20 at 15:46
  • But, for each of the states, do you have the 100 IDs for each of the weekly samples? Maybe you can show a few lines of data as an example ... – kjetil b halvorsen Nov 22 '20 at 18:36
  • Yes, I do. Actually, this example is only for communication; the real problem is field-specific, and I thought it would be difficult to demonstrate. I have subject IDs. I have drawn 100 samples from Illinois and 100 from Texas. A person from Illinois is not part of any Texas sample, and the same is true in reverse for Texas. However, a person from Illinois can be part of multiple Illinois samples. Does that clarify it? Apologies for the confusion. I am looking for a test to compare the Illinois and Texas distributions. I am not sure how the independence assumption is violated in such a case – Voldemort Nov 22 '20 at 20:07
  • Welcome to Cross Validated! Please edit questions to include clarifications requested in comments - I've added in that individuals are identified. – Scortchi - Reinstate Monica Nov 23 '20 at 08:43
