Why does using a non-parametric test decrease power?

Question

I am thinking about using the Mann Whitney U test over Student's classic t-test. But I was warned that I'd lose power and would require a higher sample size to compensate.

I don't understand: Why does using a non-parametric test decrease power?

Can you add a reference to this? Where did you read about this? — Greenparker, Mar 27 '16 at 19:29
http://www.graphpad.com/guides/prism/6/statistics/index.htm?stat_the_power_of_nonparametric_tes.htm — , Mar 27 '16 at 19:31
see http://stats.stackexchange.com/questions/163915/why-would-parametric-statistics-ever-be-preferred-over-nonparametric/163928#163928 — , Mar 28 '16 at 09:30
For a more quantitative approach, you can consider the asymptotic relative efficiencies. For instance, [Why is the asymptotic relative efficiency of the Wilcoxon test 3/π compared to Student's t-test for normally distributed data?](http://stats.stackexchange.com/q/130562/22228) ... But note the ARE will often favour the non-parametric test instead if the data are not drawn from a normal distribution. — Silverfish, Mar 28 '16 at 10:50

score 12 · Answer 1 · answered Mar 27 '16 at 21:34


The reason that parametric tests are sometimes more powerful than randomisation tests and tests based on ranks is that the parametric tests make use of some extra information about the data: the nature of the distribution from which the data are assumed to have come. However, their power advantage is not guaranteed: it is often minimal, and sometimes the parametric tests have less power.

See pages 96 and onwards of David Colquhoun's old but still golden textbook Lectures on Biostatistics. It is available as a free pdf here: http://www.dcscience.net/Lectures_on_biostatistics-ocr4.pdf

Non-parametric tests are usually almost as powerful as parametric tests in the circumstances where the parametric tests are appropriate. However, in circumstances where the parametric test may not be appropriate because its assumptions are too badly violated, the non-parametric test may be more powerful.

Michael Lew

+1, but I'll give parametric tests a plug: they often have *much* more power when dealing with small samples. For example, you couldn't possibly reject the null hypothesis at $\alpha = 0.05$ with a Wilcoxon Signed Rank Test without $n \ge 6$. You can with $n = 2$ using a t-test. And for many bio experiments, $n = 5$ is considered an expensively large sample size! – Cliff AB Mar 27 '16 at 22:31
@CliffAB Yes, true enough, but you are talking about samples of as few as 5 for a permutation test to be less powerful than a t-test. Are the properties of statistical tests with samples of just 2 useful for inference? Surely you know more from prior and external information than the data can tell you in total. I would be very concerned about inferences based on a P-value from a sample of n=2. – Michael Lew Mar 28 '16 at 20:13
I'm talking about sample sizes as small as 5 for a permutation test to be *completely useless*. And there are plenty of researchers out there who don't get more than 5 samples! For larger samples, I don't know exactly what the power curve would look like, but I'm guessing it takes a while before the permutation test catches up to be nearly equivalent when the data are truly normal. Don't get me wrong, I prefer non-parametrics! But there is a very real need for parametric tests as well. – Cliff AB Mar 28 '16 at 20:18
In regards to $n = 2$, most bio journals require $n \ge 3$, so I assume your faith is restored? I agree that I wouldn't put 100% faith in a conclusion based on 3 samples, but in some studies, that's literally all that is possible. In the (kind of) defense of the bio world, they will usually test, say, 4 different aspects that all inspect their fundamental hypothesis. Obviously we could put more faith in the results if $n = 100$ instead, but do we make science come to a halt because we cannot afford the sample size to make asymptotic tendencies kick in? – Cliff AB Mar 28 '16 at 20:24
@Cliff Can you give some more specific examples of situations in biology where such low sample sizes are normal? I work in neuroscience, where a typical experiment will usually have $n$ on the order of several dozen if it's rats or, say, batches of fruit flies (and hundreds if it's neurons). When you say that $n=5$ is already expensively large, this is five *of what?* – amoeba Mar 29 '16 at 15:55
@amoeba: this came up quite often when comparing KO mice with WT mice, especially in preliminary studies (ie grant writing). – Cliff AB Mar 29 '16 at 16:59
@Cliff I see. I don't quite understand what is so expensive about running several additional mice, but of course in pilot experiments one often has very small sample sizes. – amoeba Mar 29 '16 at 17:02
@amoeba: I couldn't tell you the exact costs behind it, but I've been told many times that "we can't afford more mice". Again, these are usually pilot studies for proposing a grant. – Cliff AB Mar 29 '16 at 21:39
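
The small-sample point raised in these comments can be checked with a little arithmetic. The sketch below is an illustration, not something posted in the thread. It assumes a two-sided exact Wilcoxon signed-rank test with no ties and no zero differences, in which case the most extreme outcome (all differences having the same sign) accounts for 2 of the 2^n equally likely sign patterns. The function names are made up for this example.

```python
def min_two_sided_p(n):
    """Smallest attainable two-sided p-value of the exact signed-rank test.

    Assumes n untied, nonzero differences. Under the null hypothesis each
    of the 2**n sign patterns is equally likely, and only 2 of them
    (all positive or all negative) are as extreme as possible.
    """
    return 2 / 2 ** n


def smallest_n_that_can_reject(alpha=0.05):
    """Smallest n for which the test can reject at level alpha at all."""
    n = 1
    while min_two_sided_p(n) > alpha:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, min_two_sided_p(n))
    print("smallest usable n:", smallest_n_that_can_reject(0.05))
```

Under these assumptions the test cannot reject at the 5% level until there are 6 differences, whereas a t-test has no such floor on its p-value.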

score 11 · Answer 2 · edited Jun 11 '20 at 14:32

> I am thinking about using the Mann Whitney U test over Student's classic test.

Generally speaking, there's a lot to be said for the Mann-Whitney.

> But was warned that I'd lose Power and would require a higher sample size to compensate.
>
> I don't understand: Why does using a non-parametric test decrease power?

It doesn't, generally. In many cases, quite the opposite.

If the assumptions of the t-test hold perfectly, and the nonparametric test you use is the Mann-Whitney, then you lose a tiny amount of power$^\dagger$, because the t-test is the most powerful test at the normal under a location-shift alternative. (The t-test uses all the available information in the sample if the assumptions hold: equal variance, normal distribution, independence, etc. But if you don't have normal distributions, it doesn't, and in many such cases the Mann-Whitney actually makes more efficient use of the available information.)

And even if you were exactly at the normal, the power loss is quite small (in large samples, it corresponds to needing 4.7% more observations to get the same power ... less than one in 20).
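
The 4.7% figure is consistent with the asymptotic relative efficiency of 3/π mentioned in the comments on the question. As a quick check (a sketch, with variable names chosen only for this illustration), the sample-size inflation is the reciprocal of the efficiency:

```python
import math

# Asymptotic relative efficiency of the Mann-Whitney (Wilcoxon) test
# relative to the t-test when the data are exactly normal.
are_at_normal = 3 / math.pi

# To match the t-test's power in large samples, the rank test needs
# roughly 1 / ARE times as many observations.
extra_fraction = 1 / are_at_normal - 1

if __name__ == "__main__":
    print(f"ARE at the normal: {are_at_normal:.4f}")
    print(f"extra observations needed: {100 * extra_fraction:.1f}%")
```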

$^\dagger$ (There are other nonparametric tests that don't lose power against the t-test at the normal, but that doesn't mean the Mann-Whitney is a bad choice for a test of location shift, even if you're confident you have a population distribution close to the normal distribution.)

[This argument would be like arguing against buying very cheap insurance on the basis that if nobody was ever involved in an accident it would be cheaper.]

Do you know that the data were drawn from a normal distribution? Often it's possible to tell -- even without looking at the data -- that they can't be (often one can tell simply by knowing that the variable is bounded; if it can't be negative, for example, it can't actually be normal). And if the distribution that the data were drawn from is even a little heavier tailed than the normal, the t-test is likely to be less powerful, not more; and it can be much less powerful.
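
A rough way to see both effects is a small simulation. The sketch below is an illustration under stated assumptions, not part of the original answer: 30 observations per group, a Student t distribution with 3 degrees of freedom as the heavy-tailed case, arbitrary shift sizes, a 5% two-sided level, and a normal approximation (no continuity correction) for the Mann-Whitney statistic. All names are hypothetical, and it uses only the standard library.

```python
import math
import random

N = 30                  # observations per group (assumed for this sketch)
T_CRIT_DF58 = 2.0017    # two-sided 5% critical value of t with 2*N - 2 = 58 df
Z_CRIT = 1.9600         # two-sided 5% critical value of the standard normal


def t_rejects(x, y):
    """Pooled-variance two-sample t-test at the 5% level (N per group)."""
    n, m = len(x), len(y)
    mx, my = sum(x) / n, sum(y) / m
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    se = math.sqrt(ss / (n + m - 2) * (1 / n + 1 / m))
    return abs(mx - my) / se > T_CRIT_DF58


def mann_whitney_rejects(x, y):
    """Mann-Whitney U test at the 5% level, normal approximation, no ties."""
    n, m = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(r for r, (_, g) in enumerate(pooled, start=1) if g == 0)
    u = rank_sum_x - n * (n + 1) / 2
    z = (u - n * m / 2) / math.sqrt(n * m * (n + m + 1) / 12)
    return abs(z) > Z_CRIT


def draw_normal(rng):
    return rng.gauss(0, 1)


def draw_t3(rng):
    # Student's t with 3 df: standard normal over sqrt(chi-square(3) / 3).
    return rng.gauss(0, 1) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3)


def estimate_power(draw, shift, reps=2000, seed=1):
    """Return (t-test power, Mann-Whitney power) estimated on the same samples."""
    rng = random.Random(seed)
    t_hits = mw_hits = 0
    for _ in range(reps):
        x = [draw(rng) for _ in range(N)]
        y = [draw(rng) + shift for _ in range(N)]
        t_hits += t_rejects(x, y)
        mw_hits += mann_whitney_rejects(x, y)
    return t_hits / reps, mw_hits / reps


if __name__ == "__main__":
    print("normal, shift 0.7 (t, MW):", estimate_power(draw_normal, 0.7))
    print("t(3),   shift 1.0 (t, MW):", estimate_power(draw_t3, 1.0))
```

With settings like these one would expect the two tests to be close under normality and the Mann-Whitney to come out ahead under the heavy-tailed distribution, though the exact numbers depend on the seed and the choices above.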

(In very small samples other considerations than power come into play and I sometimes argue for a parametric test then, even though they can be sensitive to assumptions.)

thanks glen. but if nonparametric methods rarely lose power, why don't we always use them over parametric ones? — , Mar 28 '16 at 11:53
I don't think I said *rarely*; it depends on the situation even if you're comparing MW to t (for example, the MW *does* lose more power with lighter tailed distributions), but as you phrased it you're going even further and generalizing from a discussion about specific tests to something much more general, and there's no support for that. For example, the power loss of the sign test against the parametric equivalent under normality is not small. You may want to be more specific about the situations in which you're now proposing to always use nonparametric tests. (Isn't this a new question?) — Glen_b, Mar 28 '16 at 12:06
@zero Another reason a person may choose a parametric method is for decision rule parsimony on future data. A simple linear regression with M covariates and N training samples only requires me to store between O(M) and O(M^2) data (the coefficients, intercept, and possibly covariances for standard errors), while most non-parametric models (for example, some kind of locally weighted regression) require me to keep O(N) data (the actual training samples themselves). For cases when N is extremely large, this can be impractical even if the non-parametric method is desirable. — ely, Mar 28 '16 at 12:55
@MrF The discussion here is about a nonparametric model for the distributional form, rather than a nonparametric model for the relationship with another variable or variables (like locally weighted regression). However, you could make a similar point to the one you made, at least in respect of some kinds of nonparametric procedure. — Glen_b, Mar 29 '16 at 16:02
