0

I have around 200 subjects and I would like to check the power for 90 of those subjects. It is a case control study.

I did this:

library(ssizeRNA)

fc <- function(x){exp(rnorm(x, log(2), 0.5*log(2)))}
#fc=2
size1 <- ssizeRNA_single(nGenes = 544, pi0 = 0.8, m = 90, mu = 3489, disp = 0.17, fc = fc,  fdr = 0.05,power = 0.8, maxN = 120)
 size1$ssize
 p=size1$power
 p=as.data.frame(p)
 p[p$n==90,]

I got this:

p[p$n==90,]
n   0.8
90 0.949

 pi0 ssize power
0.8    17   0.8

Plot is attached. Does this mean that to achieve power of 80% I wold need 17 cases and 17 controls? Or to achieve power of 90% I would need 90 cases and 90 controls?

Does it make more sense to fix fc=2 in that case I would be to detecting log fold changes of at least 2 or greater given FDR=0.05 as opposed to detecting log2 fold changes across the entire fold-change distribution (as I did above)? Is this a paired analysis for power/sample size?

Package used is: https://cran.r-project.org/web/packages/ssizeRNA/vignettes/ssizeRNA.pdf

I should mention that I do have to test this for paired analysis and this software doesn't seem to have option for "paired"

Do you have any recommendation for software which would do power analysis for paired study for RNAseq data? enter image description here

anamaria
  • 17
  • 5

1 Answers1

1

It's good to see investigators examining power before they do the experiments.

This method for calculating power in RNA expression experiments is based on this paper, which seems to contain the answers to some of your questions. The work in that paper was for 2-sample tests, not paired tests, and the paper examines the situation where $n=n_1=n_2$, so the reported sample size is for each group. The manual for this function agrees that size1$ssize represents "sample sizes (for each treatment) at which desired power is first reached."

As I read the power graph, the 80% power with 17 in each group is correct. If you need to restrict analysis to a total of 90, that would be 45 per group or a power of 0.92 as I read the graph. With 90 per group you seem to have 0.95 power from the graph.

Having paired case/control selection to balance with respect to other covariates is probably a good idea, but it's not clear that you would gain anything with paired analysis. With $n$ cases in each group a simple unpaired analysis would have $2n-2$ degrees of freedom, while the paired analysis would only have half as many, $n-1$. For a paired analysis to be more effective you thus need the pairing to explain a lot of the variance among the observations. I wonder whether that's the case with RNA expression data.

You have to judge for yourself whether it makes more sense to "fix fc=2 ...to [detect] log fold changes of at least 2 or greater ... as opposed to detecting log2 fold changes across the entire fold-change distribution." That would depend on what you are trying to accomplish with this study. Discuss that with your colleagues.

My hunch is that the power estimates depend heavily on the other parameter settings: numbers of genes nGenes, proportion not differentially expressed pi0, mean counts in control group mu, and expression dispersion disp. I'd recommend playing with those settings rather than putting too much faith in any one power estimate. If the budget allows, go up even higher

EdM
  • 57,766
  • 7
  • 66
  • 187
  • Hi Thank you so much for this elaborate answer. So after consulting with my PI it seems we will be doing paired test. So let's say I want to test this for 90 subjects and they will be subjected for say high and low glucose treatment. In that case would it make sense to calculate power for that by doing: check.power(m = 90, mu = 3489, disp = 0.1, fc = 2, sims = 10) – anamaria Jul 31 '20 at 19:12
  • @anamaria according to the manual page for that function, `m` is the number per group, not the total number. I can't say anything about whether the values of `mu` and `disp` are appropriate, that depends on your experimental situation. I would err on the high side for the number of simulations, so stay with the default `sims` of 100 or go even higher. You run a risk by accepting the default values `nGenes = 10000` and `pi0 = 0.8` if they don't represent your situation. Again, play around with the parameter settings to see how sensitive your power estimates are to changes in the values. – EdM Jul 31 '20 at 19:45
  • Thank you so much for these inputs! Just I am still not clear can I use these results for paired analysis and how to interpret that. If I use check.power(nGenes =19742,m = 90, mu = 3489, disp = 0.17, fc = fc, sims = 10,pi0 = 0.5) I get pow_bh_ave=0.948. Can I say than that the power is 95% for paired analysis for these 90 subjects? I will increase simulation number, this is just for testing purposes. – anamaria Jul 31 '20 at 23:13
  • Sorry to bother you but do you have any any recommendation for software which would do power analysis for paired study for RNAseq data? – anamaria Aug 01 '20 at 00:35
  • @anamaria when you set `m = 90` that means 90 _in each group_, or 180 total. This program does not handle paired analysis, the stated power is for unpaired 2-sample tests. I don't know software for paired analyses for RNAseq. That would require an additional parameter to set, an estimate of the correlations of expression levels between each of a pair. If you plan for unpaired 2-sample tests then you will have even more power if the pairing is effective. See [this page](https://stats.stackexchange.com/q/38102/28500) for discussion of paired versus 2-sample tests. – EdM Aug 01 '20 at 02:08
  • Thank you so much I will look into that. – anamaria Aug 01 '20 at 02:45