I have a dataset of the following form:

Month     2000     2001
-----    ------   ------
March    100.71   101.54
April    102.56   103.01
May      101.45   101.23
June     102.78   103.37
July     104.79   105.62
August   105.61   105.63

These are the average amounts of a chemical measured in different months of 2000 and 2001.

I want to formally test whether the average amount of the chemical during 2001 is no greater than that of 2000. I was considering using the Wilcoxon Signed-Rank test but was unsure whether this test is the most suitable. I was also unsure whether this data is paired.

Because the sample size is small, I was under the impression that a t-test is out of the question and that a non-parametric test is required. So is the Wilcoxon Signed-Rank Test the best test to use in this case?

  • It's paired if there's a substantial "month" effect (months tend to have different means in a regular way). If not\*\* then I would presume they're not paired. It's time-series, so you also have to worry about whether there's serial correlation (which would invalidate your signed-rank test assumptions), or *autocorrelation*. $\qquad$ \*\*(assuming there was also not a strong lag-12 autocorrelation in the series outside an obvious month-cycle, though) – Glen_b Dec 28 '14 at 22:12

1 Answer

The Wilcoxon signed rank test is a nonparametric test for two populations when the observations are paired. Applied to two paired samples, s1 and s2, it tests the null hypothesis that s1 – s2 comes from a distribution with zero median and a density that is symmetric about that median (thanks @ttnphns for spotting this). It is not concerned with averages (i.e. means) at any point.
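To make the mechanics concrete, here is a minimal sketch of the exact one-sided signed rank test on the data from the question, using only the Python standard library. It assumes the pairs are independent (which is exactly what is in doubt here) and that the alternative of interest is "2001 tends to be greater than 2000". The helper names `midranks` and `signed_rank_exact_greater` are hypothetical, not a library API; in practice `scipy.stats.wilcoxon` does this job.

```python
from itertools import product

# Data from the question (monthly averages, March to August).
y2000 = [100.71, 102.56, 101.45, 102.78, 104.79, 105.61]
y2001 = [101.54, 103.01, 101.23, 103.37, 105.62, 105.63]

# Paired differences (2001 minus 2000), rounded to 2 decimals so that
# floating-point noise does not break the tie between the two 0.83 values.
diffs = [round(b - a, 2) for a, b in zip(y2000, y2001)]


def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1          # 1-based rank of first occurrence
        count = ordered.count(v)              # size of the tie group
        rank_of[v] = first + (count - 1) / 2  # average rank within the group
    return [rank_of[v] for v in values]


def signed_rank_exact_greater(d):
    """Exact one-sided p-value for H1: differences tend to be > 0.

    Assumes there are no zero differences (hypothetical helper).
    """
    ranks = midranks([abs(x) for x in d])
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    # Under H0 every difference is positive or negative with probability 1/2,
    # independently, so all 2^n sign patterns are equally likely.
    as_extreme = 0
    for signs in product([0, 1], repeat=len(d)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if w >= w_plus:
            as_extreme += 1
    return w_plus, as_extreme / 2 ** len(d)


w_plus, p_value = signed_rank_exact_greater(diffs)
print(w_plus, p_value)
```

With only six pairs there are just 64 sign patterns, so the attainable p-values are coarse; any dependence between months would make the nominal p-value optimistic.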

My main concern would be that the Wilcoxon signed rank test requires each pair to be chosen randomly and independently. Your data appear to be part of a time series, so I would suspect a seasonal component to come into play.

I do not think that the Wilcoxon signed rank test assumptions are fulfilled for your particular case. You might want to "bend the rules" and say that each pair is random and independent of the others (so you are OK to use the W.s.r. test) but this is your choice to make.

usεr11852
  • +1 for the thoughtful analysis. But wouldn't a seasonal component (assuming it's *additive*) be irrelevant in comparing the pairs? I suspect your concern might more forcibly be directed at the possibility of positive temporal autocorrelation among the seasonally-adjusted values. Even a little bit of correlation would reduce the effective degrees of freedom. That could turn what is at the moment a marginally significant difference ($p\lt 0.05$) according to either test into a potentially insignificant difference. – whuber Dec 28 '14 at 18:13
  • "...comes from a distribution with zero median. It is not concerned with averages (ie. means) at any point." Not sure I quite agree with the way this is put. The median *is* an average too! To see whether you can make this a test of median, you need to check some assumptions - see http://stats.stackexchange.com/questions/19524/what-are-the-assumptions-and-h0-for-wilcoxon-signed-rank-test – Silverfish Dec 28 '14 at 18:19
  • I'm not sure this answer considers the original poster's query about whether the data are "paired" or not. If you think there is a seasonal component then pairing makes sense for the reason given by whuber. If there is no reason to suspect a monthly pattern then pairing seems less reasonable. But I think this answer is right to point towards time series analysis, which seems a better way to deal with dependence over time. – Silverfish Dec 28 '14 at 18:25
  • @Silverfish: When one says "*average amount*" he usually means an arithmetic mean and not a median. Both statistics are valid descriptors but only one of them is used in the W.s.r. test. I noted this distinction as the OP never mentions the concept of "order" or "median" in the original post and I wanted $H_0$ to be clear. – usεr11852 Dec 28 '14 at 19:12
  • @usεr11852, I must confess I don't quite agree with `null hypothesis that s1 – s2 comes from a distribution with zero median`. The null for Wilcoxon is that this distribution is _symmetric about zero_, which is a bit different thing. Indeed, the observed median of `s1 – s2` may be exactly zero yet the test be significant. But if you choose to _assume_ that the distribution in the population is symmetric in shape then the null hypothesis is that its mean (=median) is zero. – ttnphns Dec 28 '14 at 19:21
  • @whuber: I agree with your comment. As I said one might just bend the rules, say it's all independent and go his merry way. If he accepts a temporal component then proper correction for $\alpha$ should be done. In light of this I think disregarding the W.s.r test is more straightforward. This is why I was not concerned with the issue of pairing (Silverfish). Clearly if we assume that the data are not paired then the test is inapplicable to begin with. – usεr11852 Dec 28 '14 at 19:22
  • @ttnphns: You are right. I should have mentioned that the distribution has "a density that is symmetric about the median $\mu$" and we "wish to test $\mu = 0$". I will correct it; thanks for the careful read. – usεr11852 Dec 28 '14 at 21:01
  • In light of those changes, the comments about averages are a little puzzling: the median of a symmetric distribution always equals its mean, so a test of the median is indeed a test of the mean. – whuber Dec 28 '14 at 21:33
  • @whuber: I was thinking about that; Wilcoxon's 1945 paper refers to the means actually. I guess in literature (eg. *Statistical Models* (2008) by Davison - Ch. 7.4, *Nonparametric Statistical Inference* (2003) by Gibbons & Chakraborti - Ch. 8.2, etc.) medians are used because they appear more natural within a non-parametric framework. Also in the case of "approximately symmetric" the "always equal" would be violated. – usεr11852 Dec 28 '14 at 22:42
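
For readers who want to check the "either test" remark in the comments above, here is a rough sketch of the paired t statistic on the same data, standard library only. It assumes the differences are independent and roughly normal, both of which the thread questions, and it compares against the tabulated one-sided 5% critical value for 5 degrees of freedom rather than computing a p-value (`scipy.stats.ttest_rel` would give one directly).

```python
from math import sqrt
from statistics import mean, stdev

# Data from the question (monthly averages, March to August).
y2000 = [100.71, 102.56, 101.45, 102.78, 104.79, 105.61]
y2001 = [101.54, 103.01, 101.23, 103.37, 105.62, 105.63]

# Paired differences, 2001 minus 2000.
diffs = [b - a for a, b in zip(y2000, y2001)]
n = len(diffs)

# Paired t statistic: mean difference divided by its standard error.
# stdev() uses the n - 1 denominator, as the t-test requires.
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))

# Tabulated one-sided 5% critical value of Student's t with n - 1 = 5 df.
T_CRIT_5DF = 2.015

print(round(t_stat, 2), t_stat > T_CRIT_5DF)
```

The statistic clears the critical value, but not by much, which fits the description of the result as marginal; positive autocorrelation among the differences would shrink the effective sample size and could erase that margin.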