Significance tests

Question

I have two sample means from two different samples. I hypothesize that the true population means are identical.

$H_{0}$: $\mu_{1}=\mu_{2}$ and $H_{1}$: $\mu_{1}\ne\mu_{2}$

What significance test would I use to determine evidence for or against the null hypothesis?

Sample 1: {4,3,1,3,5,2,2,3,5,3}

Sample 2: {5,3,2,2,3,3,1,4,5,5}

When you ask a homework question on here, if you tag it as homework and you also show what you've tried to do to solve it then you'll usually get better help. Actually, if you include what you tried that usually helps for most questions. — John, Mar 20 '12 at 02:05
@John, I would tag it as homework if it were homework. But it's not homework. — dplanet, Mar 20 '12 at 12:33
ok... seemed very much like homework, or an assignment, or something you're doing in a class. Regardless, it's a good idea to put in what you've tried to do to solve the problem. — John, Mar 20 '12 at 17:10
Are the samples paired or not? For example, does the first number 4 of the Sample 1 have any correspondence with the first number 5 of Sample 2? Maybe the first patient had 4 bad nails before the treatment, and after treatment he has 5 bad nails. Tell us a little more about the meaning of your data. — Zen, Mar 20 '12 at 19:29
@Zen: Sorry for not replying the first time. The two sets of data are completely independent of each other. — dplanet, Mar 20 '12 at 20:14

score 9 · Answer 1 · answered Mar 22 '12 at 16:31

This reply describes two good solutions, a permutation test and a Student t-test, and compares and contrasts them.

Michael Lew recommends a permutation test. This is good advice: such a test is conceptually simple and makes few assumptions. It interprets the null hypothesis as meaning it makes no difference which sample a value is from, because both samples are drawn from the same distribution. (Notice that this adds an unstated but common assumption; namely, that the distribution with mean $\mu_1$ has exactly the same shape as that with mean $\mu_2$.)

Because this dataset is so small--only 20 numbers are involved in two samples of 10 each--no simulation is needed to carry out the permutation test: we can directly obtain all $\binom{20}{10} = 184756$ distinct ways in which $10$ of the values can be drawn from all $20$ numbers. In each case we can compare the mean of the $10$ values (taken to represent possible values of $x$ under the null hypothesis) to the mean of the $10$ values that remain (i.e., the values of $y$): this is a natural statistic for comparing two means.

Here is a working R example:

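x <- c(5,3,2,2,3,3,1,4,5,5)       # One sample (Sample 2 in the question)
y <- c(4,3,1,3,5,2,2,3,5,3)       # The other sample (Sample 1 in the question)

# Construct a test statistic: the mean of a candidate sample u minus the
# mean of the values left over (sum.all and n.y are the same for every split)
sum.all <- sum(c(x,y))
n.y <- length(y)
test.statistic <- function(u) mean(u) - (sum.all - sum(u)) / n.y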

# Apply it to all possible ways in which x could have occurred.
perms <- combn(c(x,y), length(x))
p <- apply(perms, 2, test.statistic)

# Display the sample distribution of the test statistic.
hist(p)

[Figure: histogram of the permutation distribution of the test statistic, produced by hist(p)]

To use this histogram, note that the value of the test statistic for the actual observations is 0.2:

> test.statistic(x)
[1] 0.2

It is apparent in the histogram that many of the permutation results are larger in absolute value than 0.2. We will quantify this in a moment, but at this point it is already clear that the observed difference is relatively small.

It is worth noticing that the test statistic can only have a value in the range $[-2,2]$ in multiples of $0.2$: its sampling distribution is discrete.
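The reason is visible once the statistic is rewritten in terms of sums. Writing $S = 64$ for the total of all 20 values and $n = 10$ for each group size, for any candidate sample $u$:

$$T(u) = \bar u - \frac{S - \sum u}{n} = \frac{2\sum u - S}{n} = \frac{2\sum u - 64}{10}.$$

Because $\sum u$ is an integer, $T(u)$ moves in steps of $2/10 = 0.2$. The ten smallest values $(1,1,2,2,2,2,3,3,3,3)$ sum to $22$ and the ten largest $(3,3,3,4,4,5,5,5,5,5)$ sum to $42$, giving the endpoints $T = (44-64)/10 = -2$ and $T = (84-64)/10 = 2$. The observed sample has $\sum x = 33$, so $T(x) = (66-64)/10 = 0.2$. Swapping a split with its complement replaces $\sum u$ by $S - \sum u$ and hence $T$ by $-T$, so the distribution is symmetric about $0$.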

> table(p)
p
   -2  -1.8  -1.6  -1.4  -1.2    -1  -0.8  -0.6  -0.4  -0.2     0 
   35   154   560  1502  3316  6320 10356 15192 19679 23164 24200 
  0.2   0.4   0.6   0.8     1   1.2   1.4   1.6   1.8     2 
23164 19679 15192 10356  6320  3316  1502   560   154    35

(The numbers running -2 -1.8 ... 2 are the values of $p$ and beneath them are the numbers of times each occurs.)

We find, easily enough, that (a) 86.9% of the values are equal to or exceed the observed test statistic in size:

> length(p[abs(p) >= abs(test.statistic(x))]) / length(p)
[1] 0.8690164

and (b) 61.8% of the values strictly exceed the observed test statistic in size:

> length(p[abs(p) > abs(test.statistic(x))]) / length(p)
[1] 0.6182641

There is little basis to choose one of these figures over the other; we might indeed just split the difference and take their average, equal to 0.744. This tells us that randomly dividing the 20 data values into two groups of 10 each, to simulate conditions under the null hypothesis, produces a greater mean difference 87%, 62%, or 74% of the time, depending on whether "greater" includes ties (87%), excludes them (62%), or counts them as half (74%). Such large proportions indicate that the observed difference could readily be attributed to chance alone: there is no basis for inferring that the null hypothesis is false.
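All three figures can be recovered by hand from the counts in the `table(p)` output, without re-running the enumeration. A minimal sketch in R, assuming those printed counts; the averaged figure, which gives tied splits half weight, is what is usually called the mid-p-value.

```r
# Counts copied from the table(p) output
n.splits <- choose(20, 10)     # 184756 possible splits
n.zero   <- 24200              # splits whose statistic is exactly 0
n.tie    <- 2 * 23164          # splits whose statistic is exactly -0.2 or +0.2

p.ge  <- (n.splits - n.zero) / n.splits   # |statistic| >= 0.2
p.gt  <- p.ge - n.tie / n.splits          # |statistic| >  0.2
p.mid <- (p.ge + p.gt) / 2                # ties counted with half weight
round(c(p.ge = p.ge, p.gt = p.gt, p.mid = p.mid), 3)
# values: 0.869, 0.618 and 0.744
```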

Anyone carrying out the calculations shown here would likely wait a few seconds for them to complete. They would not be practicable for larger datasets: in such cases there are just too many possible ways that sample $x$ could have occurred among all the numbers (with two groups of 30, for instance, there are $\binom{60}{30} \approx 1.2 \times 10^{17}$ splits). That's why, when the two groups look similar and do not present a terribly skewed distribution, we often look first to a Student t-test. This test is an approximation to the permutation test. It is intended to produce a comparable result while circumventing the large number of calculations needed to run the permutation test.
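Another common way around full enumeration, not used in this answer, is a Monte Carlo permutation test: draw a few thousand random splits instead of listing all of them. A minimal sketch, assuming 10,000 random splits and an arbitrary seed; on this data it should land near the enumerated 0.869, up to simulation error.

```r
x <- c(5,3,2,2,3,3,1,4,5,5)
y <- c(4,3,1,3,5,2,2,3,5,3)
pooled <- c(x, y)
n.x <- length(x)

# Difference in means between the values at positions idx and all the others
mean.diff <- function(idx) mean(pooled[idx]) - mean(pooled[-idx])

set.seed(1)                          # arbitrary seed, for reproducibility only
B <- 10000                           # number of random splits (an assumption)
obs <- abs(mean.diff(seq_len(n.x)))  # observed |difference|, about 0.2
tol <- sqrt(.Machine$double.eps)     # count near-equal differences as ties
sims <- replicate(B, abs(mean.diff(sample(length(pooled), n.x))))

# Proportion of random splits at least as extreme as the observed one;
# the +1 terms count the observed split itself among the permutations.
p.mc <- (sum(sims >= obs - tol) + 1) / (B + 1)
p.mc
```

Because the cost grows with `B` rather than with the number of possible splits, this approach scales to sample sizes where enumeration is hopeless.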

First we check that the t-test results may be applicable to these data:

> require("moments") # For skewness()
> sd(x)
[1] 1.418136
> sd(y)
[1] 1.286684
> skewness(x)
[1] -0.06406292
> skewness(y)
[1] 0.1385547
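For readers without the moments package, the skewness values above can be reproduced from the moment definition: the third central moment divided by the second central moment raised to the power 1.5, both with divisor $n$. A dependency-free sketch; the helper name `skew` is hypothetical.

```r
x <- c(5,3,2,2,3,3,1,4,5,5)
y <- c(4,3,1,3,5,2,2,3,5,3)

# Hypothetical helper: moment-based sample skewness (divisor n throughout)
skew <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# sd() uses divisor n - 1; the skewness moments use divisor n
c(sd.x = sd(x), sd.y = sd(y), skew.x = skew(x), skew.y = skew(y))
# skew.x is about -0.0641 and skew.y about 0.1386, as reported above
```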

The two groups have comparable standard deviations and low skewnesses. Although they are small in size (10 numbers each), they are not too small. The t-test should therefore work well. Let's apply it:

> t.test(x,y, var.equal=TRUE, alternative="two.sided")

        Two Sample t-test

data:  x and y 
t = 0.3303, df = 18, p-value = 0.745
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -1.072171  1.472171 
sample estimates:
mean of x mean of y 
      3.3       3.1

The output is instantaneous, because little calculation is needed. As we saw before, the means differ by $3.3-3.1 = 0.2$. The p-value of 0.745 is remarkably close to the permutation test's result of 0.744 (q.v.).
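The t statistic in this output comes from the pooled-variance formula $t = (\bar x - \bar y) / \big(s_p \sqrt{1/n_x + 1/n_y}\big)$ with $n_x + n_y - 2 = 18$ degrees of freedom. Here $s_p^2 = (18.1 + 14.9)/18 \approx 1.833$, the standard error is about $0.6055$, and $t \approx 0.2/0.6055 \approx 0.330$. A minimal sketch that recomputes this by hand and compares it with `t.test`:

```r
x <- c(5,3,2,2,3,3,1,4,5,5)
y <- c(4,3,1,3,5,2,2,3,5,3)
n.x <- length(x); n.y <- length(y)

# Pooled variance: the two sample variances weighted by their degrees of freedom
df   <- n.x + n.y - 2                       # 18
s2.p <- ((n.x - 1) * var(x) + (n.y - 1) * var(y)) / df
se   <- sqrt(s2.p * (1 / n.x + 1 / n.y))    # standard error of mean(x) - mean(y)

t.stat <- (mean(x) - mean(y)) / se
p.val  <- 2 * pt(-abs(t.stat), df)          # two-sided p-value
ci     <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df) * se   # 95% interval

# Built-in version for comparison
tt <- t.test(x, y, var.equal = TRUE)
c(t = t.stat, p = p.val)                    # about 0.330 and 0.745
```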

score 2 · Answer 2 · Michael Lew

You might consider a permutations test.

A permutations test assumes that the observations are drawn from one population and that treatments are then randomly allocated. Thus, in a permutations test for a difference in means, the null hypothesis (no treatment effect) is equivalent to the statement that any difference between the groups is a consequence only of the random allocation of the values into the groups. The significance of the observed difference between the treatment groups is therefore just a measure of how unusual the observed allocation is relative to all possible random allocations.

The significance is thus calculable by enumerating all possible allocations and finding from that list the distribution of random differences between group means. The probability under the null hypothesis of obtaining a difference as great as that observed or greater is equal to the proportion of the population of differences that is as great as that observed or greater.
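The procedure described here (enumerate every allocation, compute the difference between group means for each, and report the proportion at least as large as the observed difference) can be sketched as a small R function. The name `exact.perm.test` is hypothetical and this is not the code from the linked software; it compares absolute differences, giving a two-sided test to match the $\mu_1 \ne \mu_2$ alternative in the question, and full enumeration is only feasible for small samples.

```r
# Hypothetical helper: exact two-sided permutation test for a difference in means
exact.perm.test <- function(a, b) {
  pooled <- c(a, b)
  n.a <- length(a)
  n.b <- length(b)
  total <- sum(pooled)
  # One column per way of allocating n.a of the pooled values to group "a"
  sums.a <- colSums(combn(pooled, n.a))
  diffs <- sums.a / n.a - (total - sums.a) / n.b
  observed <- mean(a) - mean(b)
  # Small tolerance so allocations that tie the observed difference are not
  # lost to floating-point rounding
  tol <- sqrt(.Machine$double.eps)
  mean(abs(diffs) >= abs(observed) - tol)
}

exact.perm.test(c(1, 2), c(3, 4))   # 2 of the 6 allocations are as extreme: 1/3

sample1 <- c(4,3,1,3,5,2,2,3,5,3)
sample2 <- c(5,3,2,2,3,3,1,4,5,5)
exact.perm.test(sample1, sample2)   # about 0.869
```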

More detail, some references and free software (a bit archaic...) can be had from my webpage: http://www.pharmacology.unimelb.edu.au/statboss/permutations%20test.html

edited Mar 22 '12 at 01:47 · answered Mar 20 '12 at 05:32 · Michael Lew


How, specifically, does one test a *mean* with a permutation test? Merely permuting the samples won't change the mean of either sample or their difference, so you must have something more complex in mind. – whuber Mar 20 '12 at 16:01
As the question appears to be a homework assignment, I chose to give only a direction rather than a full explanation. If the mean of each permutation is taken as the statistic of interest then the test will test the mean. The permutation test does not require the data to be normally distributed or continuous, so it is probably a good choice for the data provided. – Michael Lew Mar 20 '12 at 20:16

@MichaelLew: I completed the course in which I studied this material 2 years ago; I only needed my memory refreshed. – dplanet Mar 20 '12 at 20:49

My point, Michael, is that the mean does not vary when you permute the data, so how can you possibly test any hypothesis about the means with this approach? I am likely misunderstanding your intention (but at the moment don't see exactly how), which is why some clarification would be very helpful. – whuber Mar 21 '12 at 15:36
@whuber: The test I'm suggesting is one where you effectively tabulate all possible permutations of the data in groups of the same sample size as the original observations (resampling without replacement). Calculate the means for each permutation and the difference between the means. Then you compare the observed difference between means to the distribution of the resampled differences between means. The proportion of resampled differences that equals or exceeds the observed difference is the P value. – Michael Lew Mar 21 '12 at 20:21
See this paper: Why Permutation Tests Are Superior to t and F Tests in Biomedical Research John Ludbrook and Hugh Dudley The American Statistician Vol. 52, No. 2 (May, 1998), pp. 127-132. http://www.jstor.org/discover/10.2307/2685470?uid=3737536&uid=2129&uid=2&uid=70&uid=4&sid=55930348183 – Michael Lew Mar 21 '12 at 20:24
Thanks, Michael. Just to be clear: it sounds like you are proposing *grouping* all data first and that a "permutation" in your application consists of a reordering of all 20 individual values (and not, say, reordering 10 ordered pairs of values as suggested in a comment by Zen) followed by (effectively random) segregation into two groups. Your clarification is potentially so helpful that it would be nice to see you insert it directly in your reply :-). – whuber Mar 21 '12 at 20:25
(+1) Thanks for updating the reply: that makes it a good resource in itself rather than just being a pointer to somewhere else. – whuber Mar 22 '12 at 02:21
