What to do with paired data when a minority of pairs have more or less than two members?

Question

Let's say I have two sets of measurements that go on like this:

Subject  Measurement A    Measurement B  Subjects with this pattern
1        a1,a2            -              1
2        -                -              1
3        a3,a4,a5         b1             1
4        a6               b2,b3          1
5        a7               -              6
6        a8,a9            b4             16
7        a10,a11,a12,a13  b5             3
8        -                b6             3
9        a14              b7             76

I would like to run a paired test on them, but I am not sure what to do when one side of a pair has more than one measurement. Should I take an average if there is more than one measurement? Should I pair each measurement with the corresponding measurement, so that some values would be counted more than once (like a8 with b4 and a9 with b4)? Should I pick one measurement wherever there is more than one (so pick either a3, a4 or a5 to go with b1)?

I feel that some weighting would be appropriate: i.e. pair b5 with each of a10, a11, a12 and a13 at weight 1, use weight 2 for measurements like those of subjects 4 and 6, weight 1.33 for subject 3, and weight 4 for the rest. But what would I do with unpaired data like subjects 1, 2, 5 and 8?
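The weights proposed above appear to follow one rule: 4 divided by the number of A-B pairings a subject contributes. That rule is an inference from the numbers quoted, not something stated outright, and the `counts` dictionary below is a hypothetical encoding of a few rows of the table. A minimal sketch of the arithmetic:

```python
# (number of A values, number of B values) for a few subjects from the table.
# This encoding is an illustration, not the real data set.
counts = {
    3: (3, 1),  # a3,a4,a5 with b1
    4: (1, 2),  # a6 with b2,b3
    6: (2, 1),  # a8,a9 with b4
    7: (4, 1),  # a10..a13 with b5
    9: (1, 1),  # one A value with b7
}

# Assumed rule: the subject with the most pairings (subject 7, four pairings)
# gets weight 1 per pairing, and every other subject is scaled to match.
max_pairings = max(n_a * n_b for n_a, n_b in counts.values())

weights = {
    subject: max_pairings / (n_a * n_b)
    for subject, (n_a, n_b) in counts.items()
}

for subject, weight in weights.items():
    print(f"subject {subject}: weight {weight:.2f} per pairing")
```

Under this rule every subject contributes the same total weight regardless of how many pairings it has. Subjects with no A or no B value have zero pairings and fall outside the scheme entirely.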

Whether I get nicely paired measurements like those for subject 9 is random (as listed in the last column, I have such measurements for 76 subjects, so almost 70 % of the sample). If I run a paired test on these 76 results (Wilcoxon; the underlying data are almost but not completely normal), I get p-values of 0.000001 and lower. Is it even worth trying to use the rest? I am using scipy (and numpy).
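A minimal sketch of the complete-pairs analysis described here, using `scipy.stats.wilcoxon`. The numeric values and the dictionary layout are hypothetical stand-ins for the real measurements; only subjects with exactly one A and one B value are kept.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical data: subject -> (list of A values, list of B values).
data = {
    1: ([4.1, 4.3], []),
    2: ([], []),
    3: ([3.9, 4.0, 4.2], [5.0]),
    4: ([4.4], [5.1, 5.6]),
    5: ([4.0], []),
    9: ([4.2], [5.3]),
    10: ([3.8], [4.6]),
    11: ([4.5], [5.2]),
    12: ([4.1], [5.5]),
    13: ([3.7], [4.9]),
    14: ([4.3], [4.8]),
}

# Keep only the subjects that form a clean pair (one A, one B).
complete = {
    subject: (a[0], b[0])
    for subject, (a, b) in data.items()
    if len(a) == 1 and len(b) == 1
}

a_vals = np.array([a for a, _ in complete.values()])
b_vals = np.array([b for _, b in complete.values()])

# Two-sided Wilcoxon signed-rank test on the paired differences.
res = wilcoxon(a_vals, b_vals)
print(f"{len(complete)} complete pairs, W = {res.statistic}, p = {res.pvalue:.4g}")
```

This discards every subject with missing or repeated measurements, which is the trade-off the question is asking about.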

Well, most of them are, I have about 70 pairs and then about ten examples like above (one measurement of A corresponds to two or three measurements of B). — sup, May 28 '15 at 15:16
So this is a duplicate of http://stats.stackexchange.com/questions/127393/paired-t-test-when-each-data-point-was-repeatedly-measured-different-number-of-t — sup, Jun 16 '15 at 16:50

Greg Snow · Accepted Answer · 2015-06-15T15:13:42.370

3

When you have more than 2 measurements linked then you are working with a randomized block design rather than matched pairs (matched pairs are a special case of randomized block).

Depending on what software you are using, there are several options (all better than your suggestions) for analyzing this type of data. Mixed effects models are a popular and general solution (but there may be something more specific in your software that will accomplish what you want).

Here are some links to get you started (I don't use python, so don't have anything specific to Scipy):

Randomized Block Design

Mixed Effects Models
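For readers working in Python, a random-intercept mixed-effects model of the kind mentioned above could be sketched with statsmodels (a separate package, not part of scipy). The data below are simulated, and the column names `subject`, `condition` and `value` are assumptions made for illustration. The key step is reshaping the data to long format, one row per measurement, so that subjects with zero, one or several values of A or B can all be included.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated long-format data (NOT the asker's data): each subject has
# between 0 and 3 measurements of A and between 0 and 3 measurements of B.
rows = []
for subject in range(40):
    subject_effect = rng.normal(0.0, 1.0)  # per-subject baseline
    n_a = rng.integers(0, 4)
    n_b = rng.integers(0, 4)
    for _ in range(n_a):
        rows.append((subject, "A", subject_effect + rng.normal(0.0, 0.5)))
    for _ in range(n_b):
        # True B - A difference is set to 1.0 in this simulation.
        rows.append((subject, "B", subject_effect + 1.0 + rng.normal(0.0, 0.5)))

data = pd.DataFrame(rows, columns=["subject", "condition", "value"])

# Fixed effect for condition (B relative to A), random intercept per subject.
model = smf.mixedlm("value ~ condition", data, groups=data["subject"])
result = model.fit()

effect = result.params["condition[T.B]"]
p_value = result.pvalues["condition[T.B]"]
print(f"estimated B - A difference: {effect:.3f}, p = {p_value:.3g}")
```

The subject grouping plays the role of the block in a randomized block design, so unbalanced and unpaired subjects contribute what information they have instead of being dropped.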

edited Jun 15 '15 at 15:13

answered May 26 '15 at 17:11

Greg Snow


I am using Scipy. I found out that even if I exclude the data that are not paired, the p-values are in the one-millionth-and-lower range (and it is random which blocks have more than two measurements). I was not able to google anything seemingly useful for the keywords "Mixed effects models" and "randomized block design" with scipy (and without it, everything seems awfully general; I am doing no rocket science). – sup Jun 13 '15 at 15:03
1

@sup, I added a couple of links above. – Greg Snow Jun 15 '15 at 15:13
1

Yes, I saw those, but they give me a headache. I expanded the answer greatly so it is more specific. It still feels like using a cannon to hunt a pigeon; I am only a linguistics-department major who does not want to write his thesis in a way "this number is higher than that number, so it means something and I made a great discovery". – sup Jun 15 '15 at 16:08
