I have a question regarding paired and unpaired tests. I know the difference between the two. I am using R to run a Wilcoxon test. My test is a paired test, but X and Y do not have the same length and R gives an error. I do not want to use an unpaired test, and I do not think I should.

For example, I have a subject who makes some cookies every hour. I apply a special kind of treatment to increase his stamina. Before the treatment he can only work 4 hours (after that he gets tired) and produces some cookies. After the treatment he produces more cookies per hour and works 7 hours without getting tired.

X contains the number of cookies for each hour before the treatment and Y contains the number of cookies for each hour after the treatment. X contains 4 values and Y contains 7 values. If I try to use a paired test, R gives an error.

What should I do? Is there a solution or an explanation for this kind of situation? Can I just pad X with NA values?

(This is just an example; please do not point out mistakes in the example itself.)

Thank you.

EDIT

Here is the basic R script that I use:

```r
# Blank cells in the CSV are read as NA.
someData <- read.csv(file = "cookie_data.csv", header = TRUE, sep = ",")
wilcox.test(someData$X, someData$Y, paired = TRUE)
```

Sample data:

```
X,Y
2,3
3,2
3,3
2,2
,3
,7
,2
```

When I use this script, R does not give any error. However, when I print someData$X, it prints 4 values followed by NA NA NA. I noticed that R automatically fills blank values with NA. The script gives me a p-value, but I do not know whether it is correct.

mdewey
user3900
  • Is there some significance to the division into hours? If you lumped the times into days instead of hours, it seems you wouldn't have this problem anymore, at the expense of losing time specificity. Replacing your missing values with NA would not solve the problem, since the paired Wilcoxon test computes the difference within each pair. If one of those values is unknown, the difference is also unknown. If the rate of cookie-making matters most, compare only the four hours of overlap using the Wilcoxon test, but using a day instead of an hour may suit your data better. – Christopher Aden Sep 02 '11 at 22:46
  • Thank you, Christopher, for your comment. I updated the question, please have a look. The example was meant to show the problem with the total number of X and Y values. The main question is how to handle a paired test when the numbers of X and Y values are not equal. – user3900 Sep 02 '11 at 22:58
  • Why not just create a subset of the data that has only complete cases? – Peter Flom Sep 03 '11 at 10:51
  • Peter, could you please explain a little more what you mean by creating a subset? Do you mean I should just discard the 3 data points? Don't you think that will bias the results? Also, if you look at the example, the last 3 points are important. – user3900 Sep 03 '11 at 15:32
  • Your last 3 data points have no importance unless they are complete pairs. You chose a paired test, so provide it with pairs. Or choose the independent-samples Mann-Whitney test, which will take all your data (but an independent-samples test is less powerful). – ttnphns Sep 03 '11 at 18:35
  • If you don't want people objecting to your example (which is unpaired), edit to make a better example - or better, describe your actual situation. – Glen_b Nov 20 '12 at 23:36
  • See [this question](http://stats.stackexchange.com/questions/25941/t-test-for-partially-paired-and-partially-unpaired-data) (which refers to paired and unpaired observations in a t-test) - not an exact duplicate but the same basic issue. Also, while a bit further away, [the same issue again](http://stats.stackexchange.com/questions/12343/anova-with-some-paired-and-some-unpaired-subjects) – Glen_b Nov 20 '12 at 23:42
  • Possible duplicate of [t-test for partially paired and partially unpaired data](https://stats.stackexchange.com/questions/25941/t-test-for-partially-paired-and-partially-unpaired-data) – kjetil b halvorsen Jan 16 '19 at 21:00

2 Answers

If you're just analyzing the one subject it's not a paired test. There's nothing to pair across. You also need to be careful in how you describe it because you can only make inferences about the performance of the individual subject and not subjects in general.

If you're analyzing multiple subjects then you need to actually have paired data, which means aggregating across subjects to comparable paired measures, such as how many cookies/day or mean cookies/hr. You're not allowed to have more than one measure per predictor/level in a paired test for each subject. You have two levels, stamina0 and stamina+. Therefore, you can only have two measures / subject.
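
To make the aggregation step concrete, here is a minimal sketch in base R. The data frame `cookies` and all of its values are invented purely for illustration (three hypothetical subjects with unequal numbers of hours per condition); only `aggregate`, `reshape`, and `wilcox.test` are real functions.

```r
# Made-up long-format data: hourly counts, unequal hours per condition.
cookies <- data.frame(
  subject   = c(rep("s1", 5), rep("s2", 6), rep("s3", 5)),
  condition = c("before", "before", "after", "after", "after",
                "before", "before", "before", "after", "after", "after",
                "before", "before", "after", "after", "after"),
  count     = c(2, 3, 4, 5, 3,
                1, 2, 2, 3, 4, 2,
                3, 3, 5, 4, 6)
)

# Collapse to one mean cookies/hr per subject and condition.
means <- aggregate(count ~ subject + condition, data = cookies, FUN = mean)

# One row per subject, with count.before and count.after columns.
wide <- reshape(means, idvar = "subject", timevar = "condition",
                direction = "wide")

# Both vectors now have one value per subject, matched by row.
res <- wilcox.test(wide$count.before, wide$count.after, paired = TRUE)
res$p.value
```

After aggregation each subject contributes exactly one before/after pair, so the unequal number of hours no longer matters to the test.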

Alternatively, you could use mixed effects modelling that will allow you to use the number of cookies and hours and generate a much more precise model of what is going on.

John
  • John, thank you for your answer. I mentioned in my question that it is just an example. My main problem is how to use the paired Wilcoxon test when X and Y have different numbers of values. Could you please also explain why the R script in my question works? Why does R add NA for blank values and compute the test without any error? – user3900 Sep 02 '11 at 23:37
  • The short answer is... because R just computes and doesn't babysit you for statistical correctness... much of the time. If you're going to analyze days or hours you need to explicitly make them a predictor in your model and you need something much more complex than a t-test. – John Sep 03 '11 at 02:47

If you have 7 days with data, with data after treatment on all 7, but data before treatment on only 4, I don't think there's a simple non-parametric test that you can use, except to omit the days that lack before-treatment data.
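
As a rough illustration of omitting the incomplete days, the sketch below rebuilds the question's sample data by hand (instead of reading `cookie_data.csv`) and runs the paired test both on all rows and on the complete pairs only. It relies on `wilcox.test` discarding incomplete pairs itself when `paired = TRUE`, so the two p-values should agree.

```r
# The sample data from the question; blank X cells become NA.
someData <- data.frame(
  X = c(2, 3, 3, 2, NA, NA, NA),
  Y = c(3, 2, 3, 2, 3, 7, 2)
)

# Keep only the rows where both X and Y are present.
paired <- someData[complete.cases(someData), ]

# suppressWarnings() hides warnings about ties and zero differences
# that such a tiny sample may trigger.
p_all <- suppressWarnings(
  wilcox.test(someData$X, someData$Y, paired = TRUE)$p.value
)
p_complete <- suppressWarnings(
  wilcox.test(paired$X, paired$Y, paired = TRUE)$p.value
)

nrow(paired)                          # number of complete pairs used
isTRUE(all.equal(p_all, p_complete))  # same test either way
```

So the p-value the original script reports is based on the four complete pairs only; the three extra Y values do not enter the calculation.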

You could use a parametric test (an analogue of the paired t-test), but it would require a mixed model (such as with the lme4 package for R).
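
A minimal sketch of what such a model might look like, assuming `lme4` is installed and treating the question's sample values as daily measurements. The long-format layout and the column names `day`, `treatment`, and `cookies` are my own choices for illustration, and with this little data the fit may well be reported as singular, so read it as a template and not as an analysis.

```r
library(lme4)  # assumed to be installed

# Long format: one row per observation, day is the pairing unit.
long <- data.frame(
  day       = factor(c(1:4, 1:7)),
  treatment = factor(c(rep("before", 4), rep("after", 7)),
                     levels = c("before", "after")),
  cookies   = c(2, 3, 3, 2,   3, 2, 3, 2, 3, 7, 2)
)

# Random intercept per day. The treatment coefficient estimates the
# after-minus-before shift, and days without a "before" value still
# contribute to the fit.
fit <- lmer(cookies ~ treatment + (1 | day), data = long)
fixef(fit)
```

Unlike the paired test, this keeps all eleven observations, at the cost of assuming roughly normal residuals.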

Karl
  • Thank you for your answer. Why does the script in my question work in R with the paired parameter? It is a simple paired Wilcoxon test, and the data has different numbers of values for X and Y. – user3900 Sep 02 '11 at 23:35
  • `wilcox.test` will just drop the last three data points (for which `X` is missing). – Karl Sep 02 '11 at 23:52
  • Ah!!! That is too bad! How can I use lme4 with t.test? – user3900 Sep 03 '11 at 00:04