
After some discussion (below), I now have a clearer picture of a focused question, so here is a revised question, though some of the comments might now seem unconnected with the original question.

It seems that t-tests converge quickly for symmetric distributions, that the signed-rank test assumes symmetry, and that, for a symmetric distribution, there is no difference between means/pseudomedians/medians. If so, under what circumstances would a relatively inexperienced statistician find the signed-rank test useful, when s/he has both the t-test and sign test available? If one of my (e.g. social science) students is trying to test whether one treatment performs better than another (by some relatively easily interpreted measure, e.g. some notion of "average" difference), I am struggling to find a place for the signed-rank test, even though it seems to be generally taught, and the sign test ignored, at my university.

justme
  • The t.test performs much worse in this situation (around 70% false positives). But the high rate of false positives with the Wilcoxon signed-rank test is disturbing. – JonB Nov 22 '16 at 11:00
  • Thanks for the comment @JonB! However, I think the reason you are getting such bad false positives for the t-test is that you are not adjusting for the change in null hypothesis. For the t-test, we should change the first line to `data = replicate(1000,rlnorm(50))-exp(1/2)` so that we have a _mean_ of zero. If I do that, I get a false positive rate closer to 10%. (Still not great, but much better than the Wilcoxon). – justme Nov 22 '16 at 11:05
  • The rank sum test does not assume symmetry. The point of such nonparametric tests (e.g. signed rank, Kruskal-Wallis, etc.) is that the only distributional assumption they make is i.i.d. – Alexis Nov 22 '16 at 16:19
  • Justme: of course, I didn't think about that. – JonB Nov 22 '16 at 17:43
  • @Alexis -- thank you for commenting! Though I'm talking about the signed-rank test, not the rank sum test. What you are saying about nonparametric tests is the prevailing teaching from everyone I know. And yet the simulation above, the way the statistic is calculated, and the fact that symmetry is mentioned as an assumption in passing in many places (e.g. assumption #3 at [Laerd](https://statistics.laerd.com/spss-tutorials/wilcoxon-signed-rank-test-using-spss-statistics.php)) all make me suspect that the conventional wisdom might not be all it's cracked up to be? – justme Nov 22 '16 at 21:52
  • @JonB it's a sneaky blighter hiding in the code waiting to trip you up!! – justme Nov 22 '16 at 21:54
  • It depends on *whose* conventional wisdom you're looking at; my experience of it is very different from yours. Certainly it's easy to find resources that clearly state that symmetry of difference scores is assumed under the null (and that it matters). But note that this is *under the null* -- as a result, finding lack of symmetry in difference scores in a sample isn't necessarily relevant -- you're not required to have symmetry under the alternative. If you're highly confident that if the null were true the symmetry would hold -- and in many cases it's a highly plausible assumption -- ... ctd – Glen_b Nov 23 '16 at 03:10
  • ctd... then there's no issue. The problem is, if you're not prepared to assume it beforehand you don't know if a rejection was caused by assumption failure; the obvious thing to do then is simply *not* assume it. – Glen_b Nov 23 '16 at 03:10
  • @Glen_b -- thank you, this is very helpful, but I'm struggling to wrap my head around a few details, if I may ask? If I understand you correctly (which I might not), symmetry is (part of) an often sensible null, and asymmetry allows us to reject that? But if we do not assume symmetry in the alternative, how do we judge whether our "changed distribution" represents an improvement or a degradation, since the differences could still have zero median? Also, when you say the obvious thing is not to assume it -- do you mean to use a different test (e.g. sign test) instead? Thank you! – justme Nov 23 '16 at 10:09
  • @Glen_b -- to give you some context -- I work for a very young university that has no statistics department and only recently started a maths department. The standard approach taught to the students is that they should test their data for normality, and if that fails to show "normality", they should use the "equivalent" nonparametric test and compare medians. I'm very aware of the nonsensicalness of normality testing in this context, but it seems to me that in the one-/paired-samples context it is even more dangerous, because it seems to scoop off the worst performance from both tests... – justme Nov 23 '16 at 10:13
  • Looking at your second comment first: (on top of what you already mentioned), note that 1. normal assumptions don't exhaust parametric tests. 2. The signed rank test isn't actually a test of medians but of one-sample Hodges-Lehmann statistics / pseudomedians (though if you add the assumption of symmetry to the alternative, it will also test for medians, and where means exist, also for means, among many other things). Similarly, the rank sum test is not a test of medians but of median pairwise differences. You're right that the level of the signed rank test can be quite sensitive to asymmetry. – Glen_b Nov 23 '16 at 11:18
  • On your earlier comment: 1. Symmetry isn't generally seen as part of the null, but as part of the assumptions you need in order that the permutations be exchangeable under the null. 2. As previously mentioned, it's not actually a test of medians, but of pseudomedians, and this holds true even under an asymmetric alternative. It's true that interpretation is sometimes easier if you make some restrictive assumptions, but the restrictions required to make it a reasonable test for medians needn't be as strict as assuming symmetry under the alternative. – Glen_b Nov 23 '16 at 11:26
  • I'd like to turn this into an answer but I don't know that I really address everything necessary. – Glen_b Nov 23 '16 at 11:28
  • @Glen_b -- apologies for the slow reply, I needed time to digest and experiment! But this is SO helpful, thank you! Especially the pseudomedian comment, which I had missed. But, does this mean that my simulation above should achieve lower false positives if I gave my distribution zero pseudomedian rather than zero median? I have tried to estimate the pseudomedian using simulation, but I'm not seeing much improvement in the simulation; I'm not sure if that is because my estimate is too noisy, or because we still require symmetry somewhere, even when we are looking at pseudomedians? – justme Nov 26 '16 at 18:36
  • @Glen_b I also rephrased the question in case that makes it focused enough for a proper answer! -- Though it is possible that my rephrasing misunderstands something of what you have said... – justme Nov 26 '16 at 18:38
  • I think the OP is still misunderstanding "symmetry." $H_{0}: P(X_{A} = X_{B}) = 0.5$ is a symmetric **probability** under the null, but implies nothing about a **symmetric distribution**, since both (a) asymmetric distributions can exist under the null, and (b) symmetric distributions can exist under a set of alternative hypotheses. – Alexis Jan 10 '17 at 00:55
  • Thanks @Alexis! I'm really trying to wrap my head around this, but struggling to piece together all the nuances I'm getting from people's comments and my own simulations. Am I right in saying that that should be a ">", i.e. $H_0: P(X_A>X_B)=0.5$ ? I can see how this could solve our problems with a **rank-sum** test, but I still struggle to see how it changes anything for a **signed-rank** test, because presumably this is equivalent to $H_0: P(X_A-X_B>0) = 0.5$, i.e. that the median of differences is 0? But, both my simulation in the original post (will copy that below) and examination... – justme Jan 10 '17 at 12:36
  • ...of the formula used for the signed-rank test seem to point towards that not being robust to asymmetry in terms of false positives. Glen_b's comments about pseudomedians seemed to make sense as a possible way out of this, but I haven't managed to make the simulation much better using pseudomedians instead of medians, and I'm still struggling to wrap my head around where the pseudomedians would fit in. So I'm still (as you say) confused as to what our real null and assumptions are and what kind of reinterpretation of the test allows me to ignore symmetry in the distn of differences... – justme Jan 10 '17 at 12:42
  • This is the simulation (a consolidated version appears after these comments): – justme Jan 10 '17 at 12:43

        data = replicate(1000, rlnorm(50)) - exp(0)
        is.false.positive = function(x) wilcox.test(x)$p.value < 0.05
        false.positives = apply(data, 2, is.false.positive)
        false.positive.rate = mean(false.positives)
        print(sprintf("%d runs gives false positive rate %f",
                      length(false.positives), false.positive.rate))
  • @justme The null hypothesis I articulated **is not a null assuming median equality** and rejecting it **does not test for median difference *without the additional assumptions that (1) the distributions have the same shape, and (2) have the same variance.*** The basic null for the signed-rank test says a randomly selected paired difference between group A and group B is equally likely to be positive as negative. – Alexis Jan 12 '17 at 22:02
  • @justme The $>$ as opposed to $\ge$ is because, since these tests assume continuous distributions, true equality has zero probability. Either because of precision issues, or because the data are not actually continuous, data with *ties* require special adjustments to the test statistics (for the rank sum test). – Alexis Jan 12 '17 at 22:05
  • Thanks @Alexis! But in your comment you used a $=$ rather than a $>$ or $\geq$, which is where that confusion lay (rather than between the latter two). Regardless, in my simulation, $P(X_A \geq X_B) = 0.5$ by design, but when I crank up the asymmetry in the distribution of $X_A - X_B$, the false positives go through the roof. So it must either be an assumption, or incorporated into the null? This is also what I see on [Wikipedia](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test#Test_procedure), where it is incorporated into the null. – justme Jan 18 '17 at 17:46
  • @justme Whoops! You are right... I totally did. Properly: $H_{0}: P\left(X_{A} > X_{B}\right) = 0.5$ and $H_{a}: P\left(X_{A} > X_{B}\right) \ne 0.5$. I would be happy to see the deets on your simulation and discuss. My email is alexis.dinnoHIPPOPOTAMUS@pdxAARDWOLF.edu (minus the quadruped-ish bits). – Alexis Jan 18 '17 at 20:03
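
Here is a minimal consolidated sketch of the simulations discussed in the comments above (not from any of the posters): it centres the lognormal at its mean $e^{1/2}$ for the t-test, at its median $1$ for the sign test, and at a simulated estimate of its pseudomedian (the one-sample Hodges-Lehmann statistic, i.e. the median of the Walsh averages $(x_i + x_j)/2$) for the signed-rank test. Running it makes it easy to check justme's report that even pseudomedian-centring gives little improvement in the signed-rank test's false-positive rate under this much asymmetry.

```r
set.seed(1)
nsim <- 2000   # simulated datasets per test
n    <- 50     # sample size per dataset

# One-sample Hodges-Lehmann statistic: the median of all Walsh
# averages (x_i + x_j)/2 with i <= j.
pseudomedian <- function(x) {
  walsh <- outer(x, x, "+") / 2
  median(walsh[upper.tri(walsh, diag = TRUE)])
}

# Estimate the standard lognormal's pseudomedian from one large sample.
pm <- pseudomedian(rlnorm(2000))

# False-positive rate of a test when the data are centred at `center`.
fp_rate <- function(center, pval) {
  mean(replicate(nsim, pval(rlnorm(n) - center) < 0.05))
}

fp_rate(exp(1/2), function(x) t.test(x)$p.value)                  # mean-centred t
fp_rate(1,        function(x) binom.test(sum(x > 0), n)$p.value)  # median-centred sign
fp_rate(pm,       function(x) wilcox.test(x)$p.value)             # pseudomedian-centred signed rank
```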

1 Answer


Consider a distribution of pair-differences that is somewhat heavier-tailed than normal, but not especially "peaky"; then the signed rank test will often be more powerful than the t-test, and also more powerful than the sign test.

For example, at the logistic distribution, the asymptotic relative efficiency of the signed rank test relative to the t-test is 1.097, so the signed rank test should be more powerful than the t (at least in larger samples), but the asymptotic relative efficiency of the sign test relative to the t-test is 0.822, so the sign test would be less powerful than the t (again, at least in larger samples).
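
(For reference, these figures follow from the standard ARE expressions for a distribution with variance $\sigma^2$, density $f$ and median $m$:

$$\operatorname{ARE}(\text{sign},\, t) = 4\sigma^2 f(m)^2, \qquad \operatorname{ARE}(\text{signed rank},\, t) = 12\sigma^2 \left( \int f(x)^2 \, dx \right)^{2}.$$

For the logistic with scale $s$ we have $\sigma^2 = \pi^2 s^2/3$, $f(m) = 1/(4s)$ and $\int f^2 = 1/(6s)$, giving $\pi^2/12 \approx 0.822$ and $\pi^2/9 \approx 1.097$ respectively.)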

As we move to heavier-tailed distributions (while still avoiding overly-peaky ones), the t will tend to perform relatively worse, while the sign-test should improve somewhat, and both sign and signed-rank will outperform the t in detecting small effects by substantial margins (i.e. will require much smaller sample sizes to detect an effect). There will be a large class of distributions for which the signed-rank test is the best of the three.

Here's one example -- the $t_3$ distribution. Power was simulated at n=100 for the three tests, for a 5% significance level. The power for the $t$ test is marked in black, that for the Wilcoxon signed rank in red, and that for the sign test in green. The sign test's available significance levels didn't include any especially near 5%, so in that case a randomized test was used to get close to the right significance level. The x-axis is the $\delta$ parameter, which represents the shift from the null case (the tests were all two-sided, so the actual power curve would be symmetric about 0).

*[Plot: power curves for the t (black), Wilcoxon signed rank (red), and sign (green) tests at the $t_3$ distribution, with n = 100 and a 5% significance level.]*

As we see in the plot, the signed rank test has more power than the sign test, which in turn has more power than the t-test.
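
For anyone who wants to reproduce something like this, here is a minimal sketch of the same kind of simulation (not the original code behind the plot; it uses the ordinary, slightly conservative sign test via `binom.test` rather than a randomized one):

```r
set.seed(1)
n      <- 100        # sample size
nsim   <- 2000       # simulated datasets per value of delta
deltas <- seq(0, 0.6, by = 0.1)

power_at <- function(delta) {
  # Each column holds TRUE/FALSE rejection indicators for one dataset.
  rejections <- replicate(nsim, {
    x <- rt(n, df = 3) + delta   # t3 pair-differences shifted by delta
    c(t           = t.test(x)$p.value < 0.05,
      signed.rank = wilcox.test(x)$p.value < 0.05,
      sign        = binom.test(sum(x > 0), n)$p.value < 0.05)
  })
  rowMeans(rejections)           # rejection rate for each test
}

round(sapply(deltas, power_at), 2)   # one column per delta
```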

Glen_b
  • Thanks a lot for this @Glen_b ! I'm still struggling to work out where it fits in our syllabus, when we have students for whom even the concept of power is beyond the scope of their studies, and why we teach Wilcoxon as the main alternative to the paired t. But this does give some useful motivations. Thank you! – justme Jun 08 '17 at 11:43
  • Incidentally after considering what distributional feature impacts the asymptotic variance of the median (and hence the power of the sign test), an example occurred to me where the relative positions of the t and sign test are reversed; as a result I think there's a good possibility of constructing a case where the signed rank test may do considerably better than either of the two other tests. I'll play with it some more when I can and maybe write something on it. – Glen_b Jun 08 '17 at 23:04
  • As far as your syllabus goes, it's clear there are definitely cases where the signed rank outperforms both other tests (as I outlined in my answers - distributions that are somewhat heavier tailed than normal, but not especially peaked); the t is better at the normal or lighter, and the sign test is better when the distribution has a strong peak (which often tends to go along with very heavy tails, but doesn't have to). [Beware, however, confusing these ideas with mere changes in spread, which doesn't alter their relative properties.] ... I'm sure you could squeeze a few such sentences in – Glen_b Jun 08 '17 at 23:30
  • Thanks a lot @Glen_b ! The trouble is I'm not teaching the syllabus, just supporting it! The syllabus in most departments seems to be: (i) use a hypothesis test of normality (kill me now) and based on that (ii) either use Wilcoxon or t-Test. So the finer details of the shoulders of the distribution etc are never even touched, and nor is power, just whether assumptions are met (in a slightly rubbish way). But your thoughts are very helpful for me personally, at least! – justme Jun 09 '17 at 11:18
  • Great post @Glen_b! So in terms of selecting between these tests, can I conclude that we should always compute power first, rather than following the rule of always using the sign test if the difference distribution is not normal? Thanks! – Lumos May 29 '18 at 06:48
  • @Kay You can only compute power if you know the distribution you're sampling from, which in practice you don't (if you did, you could do better than these tests). If you've seen other samples from this or closely related populations you may be able to garner enough information about distributional characteristics to make a reasonable choice of test. The present sample should not form the basis of that choice, because if you use the sample itself to choose between tests the test will no longer have its nominal properties (both significance level and power will be impacted from what you expect). – Glen_b May 29 '18 at 11:09
  • I would not always choose the sign test; in a range of situations that could reasonably occur in practice the signed rank test would be better. On the other hand if we were dealing with *light tailed* symmetric non-normality (like a beta(2,2) distribution, say), the t-test will beat the other two. On the gripping hand, depending on the precise nature of the alternatives of interest, I might choose something different from all three. – Glen_b May 29 '18 at 11:28