
I'm planning to run a survey experiment where I have a pre-measurement of my outcome variable, then I'm randomly assigning my respondents to one of six treatments/stimuli ("NO" = baseline group, Tier A, Tier B, Tier C, Premium 1, Premium 2) followed by a post measurement of my outcome variable. Please find below a completely made-up example of my data in R code.

(Just a side note: am I correct that this is still a between-subjects design, because each respondent sees a different treatment? Or is it within-subjects because I'm measuring my outcome two times (pre and post)?)

My analysis idea is to turn the factor variable "stimulus" with the six stimuli into dummies (well, R does this automatically; see the small illustration after the list) and then run a regression with:

  • pre measurement of the outcome as one predictor
  • the six stimuli as predictors
  • the interaction between pre measurement and stimuli as predictors (I plan to do this to "control" for potentially different pre measurement values across the different stimuli)
  • post measurement as my outcome / dependent variable.
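
To make the dummy coding explicit, here is a minimal, self-contained toy illustration (the values are made up and have nothing to do with my real data):

# Toy illustration of the dummy coding R applies automatically to a factor
stim <- factor(c("NO", "Tier A", "Premium 1"),
               levels = c("NO", "Tier A", "Tier B", "Tier C", "Premium 1", "Premium 2"))
model.matrix(~ stim)  # "NO" becomes the reference level; the other five levels turn into 0/1 dummy columns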

My research question is:

  • Which stimulus has the largest effect on the outcome?

With this as some background, I want to calculate the minimum required sample size for such an analysis; however, I'm not sure how to do that. I checked GPower, but I'm not sure which of its tests is the correct one here, or whether GPower can give me the sample size at all.

Specifically, I want to make sure that I can detect differences in the regression coefficients as significant at the 95% level, assuming a relatively small effect size of the predictors (say d = 0.1).

Any ideas on how to conduct such a sample size calculation (preferably in R or GPower)?
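
In case it helps to see what I had in mind: below is a rough simulation sketch for the power to detect a single stimulus-vs-baseline coefficient. The outcome SD of 0.07, the pre/post slope of 0.8, the group size of 200 and the reduction to just two groups are all placeholder assumptions on my part, and I'm not sure translating d = 0.1 into a mean shift this way is even correct:

# Rough power simulation for one stimulus-vs-baseline coefficient (placeholder assumptions)
set.seed(123)
n_per_group <- 200                 # candidate group size; vary this to find the required n
n_sims      <- 1000
sd_out      <- 0.07                # assumed SD of the outcome (made up)
effect      <- 0.1 * sd_out        # d = 0.1 expressed as a mean shift

p_values <- replicate(n_sims, {
  stimulus <- rep(c("NO", "Tier A"), each = n_per_group)
  pre      <- rnorm(2 * n_per_group, mean = 0.5, sd = sd_out)
  post     <- 0.6 + 0.8 * (pre - 0.5) +                      # assumed pre/post relationship
              effect * (stimulus == "Tier A") +
              rnorm(2 * n_per_group, sd = sd_out)
  fit <- lm(post ~ pre + stimulus)
  summary(fit)$coefficients["stimulusTier A", "Pr(>|t|)"]
})

mean(p_values < 0.05)              # estimated power at this group size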


Data + some initial analysis:

library(broom)

df <- structure(list(pre_outcome  = c(0.43, 0.41, 0.51, 0.49, 0.55, 0.56, 0.49, 0.55, 0.52, 0.58, 0.4, 0.5, 0.43, 0.5, 0.47, 0.56, 0.43, 0.53, 0.58, 0.43, 0.6, 0.56, 0.57, 0.6, 0.53, 0.54, 0.59, 0.59, 0.59, 0.55, 0.51, 0.51, 0.5, 0.44, 0.55, 0.6, 0.52, 0.48, 0.54, 0.47, 0.6, 0.48),
                     post_outcome = c(0.54, 0.61, 0.59, 0.54, 0.56, 0.56, 0.65, 0.54, 0.67, 0.76, 0.59, 0.49, 0.6, 0.64, 0.54, 0.75, 0.48, 0.72, 0.71, 0.47, 0.65, 0.66, 0.73, 0.76, 0.57, 0.61, 0.71, 0.7, 0.66, 0.57, 0.66, 0.53, 0.6, 0.64, 0.72, 0.71, 0.55, 0.57, 0.53, 0.62, 0.8, 0.57),
                     stimulus     = c("NO", "Tier A", "Tier B", "Tier C", "Premium 1", "Premium 2", "NO", "Tier A", "Tier B", "Tier C", "Premium 1", "Premium 2", "NO", "Tier A", "Tier B", "Tier C", "Premium 1", "Premium 2", "NO", "Tier A", "Tier B", "Tier C", "Premium 1", "Premium 2", "NO", "Tier A", "Tier B", "Tier C", "Premium 1", "Premium 2", "NO", "Tier A", "Tier B", "Tier C", "Premium 1", "Premium 2", "NO", "Tier A", "Tier B", "Tier C", "Premium 1", "Premium 2")),
                class = "data.frame", row.names = c(NA, -42L))

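# Interaction model: pre measurement, stimulus dummies (lm() treats the character "stimulus" as a factor with "NO" as the reference level) and their interaction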
lm_mod <- lm(post_outcome ~ pre_outcome * stimulus, data = df)

tidy(lm_mod, conf.int = TRUE)