
I'm doing a multiple regression analysis in R on an epidemiological data set with five predictor variables as follows:

y ~ x1 + x2 + x3 + x4 + x5

With my data (n = 240), I get a highly significant p value (< 10^-8) for x1, and p values in the range of 0.09 to 0.35 for the remaining four variables.

Power analysis using pwr.f2.test() from the pwr package in R shows that the power is near 1. (Incidentally, allowing factor interactions does not significantly improve the fit. The residuals are Gaussian.)

I need to determine whether we should collect more data and, if so (this is the crux of the problem), what sample size would provide a power of >= 0.90 for each individual predictor variable. For instance, the p value for x2 is 0.091. How much additional data do we need to collect to avoid a Type II error (at power >= 0.90) for this variable? Or, more generally, how can I determine these requisite individual sample sizes (preferably in R)?

I'm considering the following two approaches:

  1. Use simple linear regression: Model each of the remaining variables individually (e.g., y ~ xi, where xi refers to x2, x3, x4 or x5 by itself), and use the results as inputs to pwr.f2.test().
  2. Work off the above multiple regression model: Calculate the partial regression values corresponding to each individual variable from the multiple regression model, and use them as inputs to pwr.f2.test().
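
For concreteness, approach 2 could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated placeholders (not the study data), the helper names partial_f2, f2_power and required_n are hypothetical, and the power calculation uses base R's noncentral F distribution with noncentrality lambda = f2 * (u + v + 1), which I believe is the convention pwr.f2.test() follows. The observed partial f2 is itself a noisy estimate, so the resulting n inherits that uncertainty.

```r
# Sketch of approach 2: partial f2 for one predictor, then solve for n.
# All data below are simulated placeholders, NOT the study data.
set.seed(1)
n <- 240
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                x4 = rnorm(n), x5 = rnorm(n))
d$y <- 1.0 * d$x1 + 0.1 * d$x2 + rnorm(n)  # hypothetical coefficients

# Cohen's partial f2 for `term`: the gain in R^2 from adding `term`
# to a model that already contains the other predictors.
partial_f2 <- function(term, data, all_terms = paste0("x", 1:5)) {
  full    <- lm(reformulate(all_terms, response = "y"), data = data)
  reduced <- lm(reformulate(setdiff(all_terms, term), response = "y"),
                data = data)
  r2_full <- summary(full)$r.squared
  r2_red  <- summary(reduced)$r.squared
  (r2_full - r2_red) / (1 - r2_full)
}

# Power of the F test with u numerator df and v = n - p - 1 denominator df.
# Assumption: noncentrality lambda = f2 * (u + v + 1).
f2_power <- function(f2, n, p = 5, u = 1, alpha = 0.05) {
  v <- n - p - 1
  lambda <- f2 * (u + v + 1)
  pf(qf(1 - alpha, u, v), u, v, ncp = lambda, lower.tail = FALSE)
}

# Smallest total n reaching the target power (doubling, then bisection).
# Returns NA if the target is not reached by n_max.
required_n <- function(f2, target = 0.90, p = 5, u = 1, alpha = 0.05,
                       n_max = 1e7) {
  lo <- p + 2  # smallest n that leaves v >= 1
  if (f2_power(f2, lo, p, u, alpha) >= target) return(lo)
  hi <- 2 * lo
  while (f2_power(f2, hi, p, u, alpha) < target) {
    lo <- hi
    hi <- 2 * hi
    if (hi > n_max) return(NA_integer_)
  }
  while (hi - lo > 1) {  # invariant: power(lo) < target <= power(hi)
    mid <- (lo + hi) %/% 2
    if (f2_power(f2, mid, p, u, alpha) >= target) hi <- mid else lo <- mid
  }
  hi
}

f2_x2 <- partial_f2("x2", d)
n_x2  <- required_n(f2_x2)  # total n, to be compared with the current 240
```

The same call would be repeated for x3, x4 and x5, with the largest resulting n driving the data collection target.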

Is either of these approaches statistically sound or is there a better way for doing this? Thank you very much in advance.

gtp7061
  • Closely related: [Simulation of logistic regression power analysis - designed experiments](https://stats.stackexchange.com/q/35940/7290). – gung - Reinstate Monica Aug 19 '19 at 19:05
  • Because legitimate p-values are *always* between 0 and 1, by "very high p values (< 10^8)" what do you mean?? Since power (and Type II errors) are functions of effect size, what do you mean by "power is near 1"?? – whuber Aug 19 '19 at 21:10
