8

I recently completed a study in which I randomly assigned participants to one of two treatment groups. I tested participants at baseline, immediately post-intervention, 1 month, and 4 months on a fairly large number of outcome variables. I was planning to run several mixed ANOVAs to examine group x time interactions. Some of the comparisons will be 2 (group) x 2 (time: baseline and post-intervention) comparisons and some will be 2 (group) x 3 (time: baseline, 1 month, 4 months) comparisons.
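For example, for one of the 2 (group) x 3 (time) outcomes I was picturing something along these lines (a rough sketch in Python/statsmodels with made-up file and column names, just to show the intended group x time model):

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical long-format file: one row per subject per assessment,
# with columns subject, group, time (baseline/1mo/4mo), and score
df = pd.read_csv("outcomes_long.csv")

# random intercept per subject; the group:time terms carry the
# interaction of interest
model = smf.mixedlm("score ~ group * time", data=df, groups=df["subject"])
print(model.fit().summary())
```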

Before beginning my analyses, I compared the two treatment groups on all baseline variables. The groups differ on 4 baseline variables at an alpha level of .05, or on 2 baseline variables at an alpha level of .01.
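For what it's worth, the baseline comparison itself was done roughly like this (a sketch only, with made-up file and column names, not my actual code):

```python
import pandas as pd
from scipy.stats import ttest_ind

# hypothetical wide-format file: one row per participant, a 'group'
# column, and the 24 baseline variables named base_1 ... base_24
wide = pd.read_csv("baseline_wide.csv")
baseline_cols = [c for c in wide.columns if c.startswith("base_")]

pvals = {col: ttest_ind(wide.loc[wide["group"] == "A", col],
                        wide.loc[wide["group"] == "B", col]).pvalue
         for col in baseline_cols}

print(sum(p < .05 for p in pvals.values()))  # 4 variables in my data
print(sum(p < .01 for p in pvals.values()))  # 2 variables in my data
```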

I have two questions about this:

  1. What alpha level should I be using to compare the groups at baseline? I was thinking an alpha level of .01, because I am comparing the two groups on 24 baseline characteristics and thought I should choose a more stringent alpha level than .05 to reduce the family-wise error rate given the large number of tests being performed, but from my readings it seems most people use .05. What do you recommend? (A sketch of the correction I had in mind appears after this list.)

  2. What do I do about these differences? I could include these variables as covariates, but my sample size is quite small and using 4 covariates does not seem appropriate (which is also partly why I am favouring only accepting differences that are significant at the .01 level).
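Regarding question 1, the kind of family-wise correction I had in mind was along these lines (a sketch, reusing the pvals dictionary from the snippet above):

```python
from statsmodels.stats.multitest import multipletests

# Bonferroni: each of the 24 tests is judged at .05 / 24 ~= .0021
reject, p_adjusted, _, _ = multipletests(list(pvals.values()),
                                         alpha=0.05, method="bonferroni")
print(int(reject.sum()))  # baseline differences surviving the correction
```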

Any help on this would be very much appreciated!

Rachel

3 Answers

8

As Stephen Senn has written, it is not appropriate to compare baseline distributions in a randomized study. The way I like to talk about this is to ask the question "where do you stop?", i.e., how many other baseline covariates should you go back and try to retrieve? You will find counter-balancing covariates if you look hard enough.

The basis for choosing a model is not post-hoc differences but rather a priori subject-matter knowledge about which variables are likely to be important predictors of the response variable. The baseline version of the response variable is certainly a dominating predictor, but there are others that are likely to be important. The goal is to explain explainable heterogeneity in the outcome so as to maximize precision and power. There is almost no role for statistical significance testing in model formulation.

A pre-specified model will take care of chance differences on the variables that matter - those predicting the outcome.
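As a rough illustration only (hypothetical variable names, and the covariates must be chosen a priori from subject-matter knowledge, not from baseline tests), such a pre-specified model might look like:

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical trial data: one row per participant
trial = pd.read_csv("trial.csv")

# follow-up outcome adjusted for its baseline value plus covariates
# pre-specified because they are expected to predict the outcome
# (age and severity are placeholders)
fit = smf.ols("outcome_4mo ~ group + outcome_baseline + age + severity",
              data=trial).fit()
print(fit.summary())
```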

Frank Harrell
  • Thank you all for your responses. The baseline variables that differ from one another based on the multiple t-tests are the baseline levels of some of the outcome variables (e.g., baseline depression scores differed and depression at 1 and 4 months is one of the outcome measures). – Rachel Nov 17 '12 at 15:53
3

Normally, what you should care about in comparing the two groups at baseline is not so much the statistical significance of differences as their size: is any of these differences large enough to matter to the study? Large enough to affect the group comparisons and variable relationships that are the focus of the research? Large enough that adjusting for it (by using it as a covariate) is necessary?

Now, your case is a little bit interesting in that, even with random assignment, you've got 4 out of 24 variables showing differences significant at the .05 level (17% instead of the expected 5%). That may seem to cast doubt on your randomization process or some other aspect of the study. But theoretically, if the randomization were done flawlessly and there was no attrition in either group afterwards, the binomial probability of exactly 4 'significant' differences out of 24 tests is 24!/(4!(24-4)!) (.05^4) (.95^20) ≈ .024, and the probability of 4 or more is about .03. That is not really such a rare occurrence after all. What you have could well be a set of random differences. I'd stick with judging based on magnitude of differences.
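These figures are quick to check (a sketch using scipy, treating the 24 tests as independent, which they may not be exactly):

```python
from scipy.stats import binom

n, p = 24, 0.05  # 24 baseline tests, each with a 5% false-positive rate

print(binom.pmf(4, n, p))  # P(exactly 4 "significant") ~= 0.024
print(binom.sf(3, n, p))   # P(4 or more "significant") ~= 0.030
```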

rolando2
  • 2
    Excellent point about multiplicity. Regarding assessing differences, I think that looking for large differences is very highly correlated with looking for small P-values; I don't recommend either. – Frank Harrell Nov 17 '12 at 15:19
  • How do I know if any of these differences is large enough to matter to the study and large enough that adjusting for it (by using it as a covariate) is necessary? The effect sizes for the four differences at baseline (using Cohen's d) are 0.78, 0.64, 1.06, and 0.89, respectively. – Rachel Nov 17 '12 at 15:51
  • 2
    You don't and can't. Think about formulating the right model up front rather than post hoc adjustments. – Frank Harrell Nov 17 '12 at 15:53
  • Okay that makes sense. Should I analyze my results in another way then rather than by using a mixed model design? Or is it enough to mention the differences but not adjust for them? – Rachel Nov 17 '12 at 15:55
  • 3
    My only thought is to ask a subject matter expert what the important predictors of the response variable are likely to be, without telling the expert about the differences you found, then adjust for these predictors. – Frank Harrell Nov 18 '12 at 04:26
  • 1
    Please see this related question: https://stats.stackexchange.com/questions/486139/how-to-handle-large-baseline-difference-in-a-randomized-trial?noredirect=1#comment897710_486139 – rnso Sep 06 '20 at 04:06
2

+1 to @FrankHarrell. I might add one small point. If you randomly assigned your participants to the groups, any 'significant' differences in covariate values prior to intervention are necessarily type I errors.
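A small simulation sketch makes this concrete (assuming nothing about your data): the covariate values are generated before assignment, assignment is purely random, so every 'significant' t-test below is by construction a type I error, and the rejection rate hovers around alpha.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n, reps, alpha = 40, 10_000, 0.05

rejections = 0
for _ in range(reps):
    covariate = rng.normal(size=n)         # baseline values, fixed first
    treated = rng.permutation(n) < n // 2  # random 50/50 assignment
    _, p = ttest_ind(covariate[treated], covariate[~treated])
    rejections += p < alpha

print(rejections / reps)  # ~0.05: exactly the nominal type I error rate
```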

gung - Reinstate Monica
  • 1
    Nicely put, and your comment points out the difficulty of specifying exactly what population inference baseline difference testing is aimed at. – Frank Harrell Nov 17 '12 at 15:11
  • 2
  • @gung - hello! What about this point of view: in an RCT, the 2 groups are all we have. Of course they come from the same population: there are not 2 populations about which to make any errors, Type I or otherwise. So statistical significance is irrelevant, but large differences could well matter and could well require adjustment via the use of covariates. – rolando2 Nov 17 '12 at 15:22
  • 2
    I like the first part, but the last part is more complex than it seems, and post-hoc adjustments can create bias while failing to adjust for large response heterogeneity explainers. In addition, the data are incapable of telling us which set of covariates to adjust for. – Frank Harrell Nov 17 '12 at 15:27
  • @rolando2, the way I think about it is this: Your population is the population from which your sample was drawn; the 'treatment' is your random assignment procedure; & the response variable is the covariate you are checking. The t-test checks to see if random assignment procedure is associated with the mean value of the covariate. Now, if your assignment procedure is flawed, it is perfectly reasonable that it may be associated w/ resulting covariate values, but if it is truly random, by definition it can't be & thus every 'significant' finding is a type I error. – gung - Reinstate Monica Nov 17 '12 at 15:37
  • As for your second point, it has a lot to speak for it. I often think it's more important to concentrate on how large an effect is & what that might mean substantively, than whether it's significant. However, I agree w/ FrankHarrell that large effects & small p-values tend to go together, so I don't think that gets us out of the problem. I can imagine someone looking at the data, and thinking that there is a risk of getting a result due to a failure of the randomization procedure, but the appropriate response would be to run another study, not make post-hoc adjustments. – gung - Reinstate Monica Nov 17 '12 at 15:42
  • 2
    You are adjusting for your outcome variable at baseline anyway; that's standard. You then are trusting that your randomization procedure is valid and therefore affords valid inferences. If you believe your assignment procedure was flawed & that your resulting inferences are invalid, you have to start over by gathering a new sample, assigning your participants to treatment groups via a truly random procedure that will allow you to have confidence in your conclusions, & re-running the study. – gung - Reinstate Monica Nov 17 '12 at 15:53
  • @gung Can I assume that any baseline differences are random error then and proceed with the analyses without adjusting for these values? – Rachel Nov 17 '12 at 16:11
  • You adjust for the outcome at baseline. This is standard procedure. Other than that, you fit the model you had planned on before you began gathering data. If your randomization procedure is valid, then your inferences should be valid. It's entirely possible that this study will yield some type I or type II errors in the final analysis, but that is always part of data analysis, & is accounted for. – gung - Reinstate Monica Nov 17 '12 at 16:38
  • If I am interested in looking at change in outcome variables over time and I control for outcome variables at baseline, am I no longer able to examine changes? – Rachel Nov 17 '12 at 16:46
  • I don't understand your question. I assume you are fitting a mixed effects model w/ random effects for the intercept & time, you adjust for the level of the outcome variable at baseline, & possibly for other covariates you had decided on in advance. The parameter estimates you get from this model are unbiased, & the tests / p-values are valid. (A rough sketch of such a model appears after these comments.) – gung - Reinstate Monica Nov 17 '12 at 16:52
  • Thanks that answers my question. One final question: should I use the baseline outcome variable as a covariate only when testing that outcome (i.e., not to test other outcomes) or in all tests? – Rachel Nov 17 '12 at 17:01
  • The model gives you tests of all the covariates, & includes the baseline. Thus it is included in all tests, which is appropriate. Note that all of this info is entailed in Frank Harrell's answer (by my reading of it), so you should probably accept it by clicking on the check mark to its left. – gung - Reinstate Monica Nov 17 '12 at 17:05
  • Please see this related question: https://stats.stackexchange.com/questions/486139/how-to-handle-large-baseline-difference-in-a-randomized-trial?noredirect=1#comment897710_486139 – rnso Sep 06 '20 at 04:06
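To make the model described in the comments concrete, here is a sketch of that kind of fit (placeholder file and column names; random intercept and slope for time within subject, adjusting for the baseline outcome):

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical long-format data: one row per subject per post-baseline
# assessment, with each subject's baseline score carried in its own column
df = pd.read_csv("followup_long.csv")

model = smf.mixedlm("score ~ group * time + baseline_score",
                    data=df, groups=df["subject"],
                    re_formula="~time")  # random intercept & time slope
print(model.fit().summary())
```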