1

Suppose a gene is mutated in 30% of the samples.

I want plot the number of samples required to see 30% of them mutated at various level of power.

I want to do this through a simulation so I generated different sample sizes and for each sample size I generated a 1000 trials of simulated number of mutated samples based on p.

p = 0.3
# from sample size 1:100, I generated 1000 binomial random variable
for (n in 1:100) {
  k = rbinom(1000, n, p)
}

How do I construct the power analysis after I have generated some simulated data?

kylelyk02
  • 113
  • 4
  • 2
    Did you want to run some test? Eg, to show that the observed proportion is <.5 .3="" a="" ci="" did="" get="" given="" include="" of="" or="" that="" the="" time="" to="" want="" width="" would="" you=""> – gung - Reinstate Monica Mar 14 '15 at 23:15
  • I want to see how many samples would I need to see 30% samples mutated at 80% power. – kylelyk02 Mar 16 '15 at 21:12
  • What does that mean? Do you want to run a binomial test against a null of .5? – gung - Reinstate Monica Mar 16 '15 at 21:15
  • 1
    i think my null p0=.01 which a sample will be mutated in a gene in a background model, and my p1=.3 – kylelyk02 Mar 16 '15 at 21:37
  • i think my problem is quite similar to this [question](http://stats.stackexchange.com/questions/38439/power-analysis-for-binomial-data-when-the-null-hypothesis-is-that-p-0) except my P0 is .01, so the minimum number of success to reject null should be some number greater than 1 i think. – kylelyk02 Mar 16 '15 at 21:58
  • What language is that? Is it supposed to be `R`? – gung - Reinstate Monica Mar 17 '15 at 17:08
  • Yes. it's R. "The first step is to identify a threshold c for the number of successes such that the probability to get at least c successes in a sample of size n is very low under the null hypothesis " I'm not sure how to determine my threshold c for my case. – kylelyk02 Mar 17 '15 at 17:54

1 Answers1

1

You may be able to figure out everything you need to know from my answer here: Simulation of logistic regression power analysis - designed experiments, which is quite comprehensive.

The basic procedure for simulating a power analysis is to:

  1. Simulate data according to your preferred scenario (the alternative hypothesis).
  2. Run the test you intend to use.
  3. Do this many times, and see what percentage of the time your results are significant.
  4. If you want to solve for the required $N$ to achieve a given power, do the above with different $N$s and find the one that yields the power you want.

In your case, you need to apply the binomial test with the null proportion set to $.01$.

Here's an example in R (note that your code had errors):

set.seed(8063)
p = 0.3
# from sample size 1:100, I generated 1000 binomial random variable
k = matrix(NA, nrow=1000, ncol=100)
for(n in 1:100){
  k[,n] = rbinom(1000, n, p)
}

p.mat  = matrix(NA, nrow=1000, ncol=100)
for(i in 1:1000){
  for(n in 1:100){
    p.mat[i,n] = binom.test(x=k[i,n], n=n, p=0.01)$p.value
  }
}
power = c()
power = apply(p.mat, 2, FUN=function(x){ mean(x<.05) })
min(which(power>.8))
# [1] 5
power[4]
# [1] 0.763
power[5]
# [1] 0.82

Brute force search:

tl = NULL
tl[paste("n", 1:10, sep="")] = list(NULL)
pl = tl
for(n in 1:10){
  tl[[n]] = 0:n
  for(h in 0:n){
    pl[[n]][h+1] = binom.test(tl[[n]][h+1], n, 0.01, "g")$p.value
  }
}
msig = lapply(pl, function(x){ min(which(x<.05))-1 })
pow = c()
for(n in 1:10){
  pow[n] = 1-sum(dbinom(0:(msig[[n]]-1), n, .3))
}
names(pow) = paste("n", 1:10, sep=" = ")
min(which(pow>.8))
# [1] 5
pow[4:5]
#   n = 4   n = 5 
# 0.75990 0.83193 
gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
  • Wouldnt it be binom.test(x=k[i,n], n=n, p=0.01, alternative="greater") in my case? – kylelyk02 Mar 17 '15 at 20:06
  • @kylelyk02, you could do that if you wanted. – gung - Reinstate Monica Mar 17 '15 at 20:22
  • i noticed that the power flucuate when sample is low, would it be safer to say power > .8 at power[9] instead of power[5]? – kylelyk02 Mar 17 '15 at 20:51
  • @kylelyk02, I don't follow that comment, but it occurs to me that your situation is so simple it can easily be found analytically by brute force search. I updated by answer. Note that the analytical results agree very closely w/ the original simulation. – gung - Reinstate Monica Mar 18 '15 at 00:27
  • here's what i meant by [power fluctuate](http://imgur.com/1GEiwua). As you can see there are occasional dips in the power as sample size increases. – kylelyk02 Mar 18 '15 at 23:17
  • @kylelyk02, did you get that from expanding on the simulation? Notice that the analytical solution found by brute force search also yields n=5 as where you get 80% power for your scenario. – gung - Reinstate Monica Mar 18 '15 at 23:20
  • yes i generated that from the simulation and plotted the power at each sample size. – kylelyk02 Mar 19 '15 at 00:38
  • @kylelyk02, it could be anything. It might be due to the discontinuous nature of counts, or something in the code, or would be smoothed out by more iterations. But at any rate, you can get your answer from the search of the analytical solution. – gung - Reinstate Monica Mar 19 '15 at 00:42
  • Thank you very much for your thorough answer. I learnt a lot. – kylelyk02 Mar 19 '15 at 16:03
  • @gung-ReinstateMonica How do I run this test when there is uncertainty about `p`. In a binomial scenario, `p = number of successes/number of trials`. What if the number of trials(and number of successes) keep increasing? How do we incorporate those uncertainties into the simulation – Effective_cellist Sep 24 '20 at 09:30
  • @dpk, when you simulate, you decide what `p` you are interested in up front. It doesn't change, because it's the value that you are theoretically interested in, not an empirical value. – gung - Reinstate Monica Sep 24 '20 at 11:34