I want to compare a parameter between two independent groups of volunteers (see the MWE below). All volunteers had a baseline reading (V1) and a reading at a later visit (V2), with or without an intervention (Interv: Yes/No). The parameters are gene-expression data expressed as fold change (i.e. V2/V1). For brevity, I show only one parameter, Par1.
I came from here, but I didn't find what I wanted; how to report was added to that question but, I think, never addressed, so it may deserve a separate question like this one.
To report the central tendency, I intend to use the median, as it is less affected by outliers, perhaps together with the MAD (median absolute deviation); the MAD is not used very often, but I think it is the equivalent of the SD for the mean.
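As a sketch of how I would compute these (assuming the dfFold data frame from the dput() output further down; note that R's mad() rescales by 1.4826 by default, so constant = 1 is needed for the raw MAD):

```r
# Group-wise median and MAD of the fold changes.
# mad() multiplies by 1.4826 by default (consistency with the SD
# under normality); constant = 1 gives the raw median absolute deviation.
aggregate(Fold ~ Interv, data = dfFold,
          FUN = function(x) c(median = median(x),
                              MAD = mad(x, constant = 1)))
```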
Subjects without intervention were n = 15, whereas those with intervention were n = 19. I am not a statistician, but I know from my field that with gene-expression data it is always good to log-transform before analysis (to make the data closer to a normal distribution). To report the median, however, I used the fold-change readings because they are more intuitive to the reader and make more sense than log values.
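Judging from the values, the Log column in the dataset below is the natural logarithm of the fold change (e.g. log(181.552) ≈ 5.2015), so it can be recreated as:

```r
# log(V2/V1) = log(V2) - log(V1); natural log, judging from the data
dfFold$Log <- log(dfFold$Fold)
```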
To measure the difference between the two groups (inferential statistics), I am more inclined to use the permutation techniques available in R, such as the coin package, than the t-test, thinking they are more robust and less demanding of assumptions (confession: biologists do not respect these aliens; they will meet anybody except assumptions).
After log transformation, I ran a permutation test, a Wilcoxon test (Mann-Whitney U test), and a t-test, just for the sake of comparison and learning.
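For my own intuition I also hand-rolled the label-shuffling idea behind the permutation test; this is only a sketch (coin standardizes its linear statistic internally, but the logic is the same): shuffle the Interv labels, recompute the difference in group means of Log, and count how often the shuffled statistic is at least as extreme as the observed one.

```r
set.seed(1234)
# Observed difference in mean log fold change between the two groups
obs <- diff(tapply(dfFold$Log, dfFold$Interv, mean))
# Recompute the statistic under 9999 random relabelings of Interv
perm <- replicate(9999, {
  diff(tapply(dfFold$Log, sample(dfFold$Interv), mean))
})
mean(abs(perm) >= abs(obs))  # two-sided Monte Carlo p-value
```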
The dataset (to recreate it, assign the dput() output below back to the name, e.g. dfFold <- structure(...)):
> dput(dfFold)
structure(list(Interv = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Yes",
"No"), class = "factor"), variable = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Par1",
"Par2"), class = "factor"), Fold = c(181.552149430047, 365.287611324048,
1.37615543475921, 29.4749766533613, 7.30460900092265, 0.55729573732714,
0.856344617183982, 12.2151438678328, 8.22023432258123, 7.52091826273063,
11.3960901635291, 141.40637778502, 1.19646833351009, 4234.12741125762,
9410.18608208175, 0.11188317488791, 3.39142462873992, 1.54986235136038,
0.138010480189387, 0.150720706883034, 8.58182709551405, 1.65995914129576,
1.02664060360863, 40.9872498215584, 51.7412085216083, 3.64546611583102,
5.64546161492277, 8.92993632885674, 4.71129389070826, 1497.44850394305,
0.0209326180515642, 2.90945271783875, 5.09753127386217, 9.85421224939179
), Log = c(5.20154293711853, 5.90068501952248, 0.31929369443907,
3.38354165434038, 1.98850551891357, -0.584659233229834, -0.15508239368987,
2.50267648260455, 2.10659871505853, 2.01768823991267, 2.43327032846195,
4.95163785700696, 0.179374162082163, 8.35093254365055, 9.14954800731289,
-2.19030003343294, 1.22125007758044, 0.438166121413757, -1.98042565330713,
-1.89232677812372, 2.14964683902473, 0.506792988388275, 0.0262919218996518,
3.71326103837802, 3.9462545340732, 1.29348423542785, 1.73085196838103,
2.18940926483575, 1.54996258170508, 7.31151794138033, -3.86644666427115,
1.0679649940249, 1.62875635858491, 2.28789900330443)), .Names = c("Interv",
"variable", "Fold", "Log"), class = "data.frame", row.names = c(NA,
-34L))
What it looks like:
> head(dfFold)
Interv variable Fold Log
1 Yes Par1 181.5521494 5.2015429
2 Yes Par1 365.2876113 5.9006850
3 Yes Par1 1.3761554 0.3192937
4 Yes Par1 29.4749767 3.3835417
5 Yes Par1 7.3046090 1.9885055
6 Yes Par1 0.5572957 -0.5846592
To report the central tendency (I included the mean as well):
> median(dfFold[dfFold$Interv=="Yes",]$Fold)
[1] 11.39609
> median(dfFold[dfFold$Interv=="No",]$Fold)
[1] 3.645466
> mean(dfFold[dfFold$Interv=="Yes",]$Fold)
[1] 960.8452
> mean(dfFold[dfFold$Interv=="No",]$Fold)
[1] 86.71587
Now the tests: the permutation test, then the wilcox test from coin, and the t-test from the stats package:
> pvalue(oneway_test(Log~Interv, data=dfFold ,distribution=approximate(B=9999),conf.int=TRUE))
[1] 0.03590359
99 percent confidence interval:
0.03128298 0.04097196
Warning message:
In independence_test.IndependenceProblem(object, check = check, :
additional arguments conf.int will be ignored
> set.seed(1234)
> pvalue(wilcox_test(Log~Interv, data=dfFold,distribution=approximate(B=9999),conf.int=TRUE))
[1] 0.05890589
99 percent confidence interval:
0.05300214 0.06523398
> t.test (Log~Interv, data=dfFold)
Welch Two Sample t-test
data: Log by Interv
t = 2.1519, df = 27.43, p-value = 0.04036
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.09772892 4.04392270
sample estimates:
mean in group Yes mean in group No
3.183037 1.112211
I got a warning message from oneway_test(); can anybody help explain it?
So with the permutation test using an approximate distribution (B = 9999), the difference was significant at the 0.05 level, in agreement with the t-test but not with the wilcox test.
Now the questions:
1. If I were to report this difference based on the permutation test, what should I mention? Do I have to include the 99% confidence interval of oneway_test() (see the output)? How do I change it to 95%?
2. What is the conceptual difference between the statistic of oneway_test() and that of the t-test? Isn't permutation a computer-intensive technique that can be used with many statistical tests? When I use oneway_test(), what kind of test does this function eventually use?
Any help to raise my statistical knowledge would be appreciated. A few lines of comment on the result, as one would write in a report, would be much appreciated as well.