
I am investigating data from a randomised controlled trial in which treatment allocation is done on a 2:1 ratio (2 patients on the experimental treatment for every 1 patient on placebo). For example, there might be 400 patients on the experimental treatment and 200 on placebo.

I want to build a model to investigate which covariates have an impact on death (binary outcome). In the model I also want to include a treatment term (binary) and look at interactions between the treatment term and other covariates. The death rate on both treatment arms is almost identical (25%, for example).

The dataset comprises only categorical covariates.

My plan is to build a logistic regression model, choosing the covariates by AIC (using best subset selection). I will then manually add the treatment term and use AIC to decide which interactions between the selected covariates and the treatment term to include.
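To illustrate why the treatment term has to be added manually, here is a minimal sketch. It assumes exactly 25% mortality on each arm, so the counts (100 deaths out of 400, 50 out of 200) are illustrative rather than taken from the real data. For a model with only an intercept, or only an intercept and a binary treatment term, the maximum-likelihood fitted probabilities are simply the observed proportions, so the AIC can be computed in closed form without a model-fitting library. The helper names are hypothetical.

```python
import math

def bernoulli_loglik(deaths, n):
    """Log-likelihood of n binary outcomes evaluated at the MLE p = deaths / n."""
    p = deaths / n
    loglik = 0.0
    if deaths > 0:
        loglik += deaths * math.log(p)
    if n - deaths > 0:
        loglik += (n - deaths) * math.log(1 - p)
    return loglik

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * loglik

# Illustrative counts (assumption): 25% mortality on both arms.
arms = {"experimental": (100, 400), "placebo": (50, 200)}

total_deaths = sum(d for d, _ in arms.values())
total_n = sum(n for _, n in arms.values())

# Intercept-only model: one common death probability (1 parameter).
aic_null = aic(bernoulli_loglik(total_deaths, total_n), n_params=1)

# Intercept + treatment: one death probability per arm (2 parameters).
loglik_treatment = sum(bernoulli_loglik(d, n) for d, n in arms.values())
aic_treatment = aic(loglik_treatment, n_params=2)

print(f"AIC, intercept only:        {aic_null:.3f}")
print(f"AIC, intercept + treatment: {aic_treatment:.3f}")
```

Under this assumption the two models have the same log-likelihood, so the treatment model's AIC is higher by exactly the 2-point penalty for its extra parameter. An AIC-driven search would therefore not select treatment as a main effect on its own.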

My questions are: Does the fact that the treatment arms are unequal (400 experimental, 200 placebo) have any impact on the conclusions that I can draw from the logistic regression model?

Do any adjustments need to be made or other methods need to be used to account for this imbalance (my immediate thoughts are no)? I have considered using upsampling to balance the treatment arms to ensure that the variance np(1-p) is the same on both arms. Is there anything such as weighted logistic regression or conditional logistic regression that would be suitable?
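To make the variance concern concrete, here is a minimal sketch of the arithmetic. It again assumes exactly 25% mortality on each arm, and it compares the 2:1 split with a hypothetical 300/300 split of the same 600 patients. It uses the standard large-sample (Woolf) standard error of a log odds ratio from a 2x2 table, which is the quantity an unadjusted logistic regression with only a treatment term would estimate. The function names are illustrative.

```python
import math

def binomial_variance(n, p):
    """Variance n * p * (1 - p) of the number of deaths on an arm of size n."""
    return n * p * (1 - p)

def log_odds_ratio_se(deaths_1, n_1, deaths_0, n_0):
    """Woolf standard error of the log odds ratio from a 2x2 table."""
    return math.sqrt(
        1 / deaths_1 + 1 / (n_1 - deaths_1)
        + 1 / deaths_0 + 1 / (n_0 - deaths_0)
    )

p = 0.25  # assumed death rate on both arms

# Variance of the death count on each arm under 2:1 allocation.
var_experimental = binomial_variance(400, p)  # 75.0
var_placebo = binomial_variance(200, p)       # 37.5

# Standard error of the treatment log odds ratio: 2:1 versus 1:1 allocation.
se_2to1 = log_odds_ratio_se(100, 400, 50, 200)  # about 0.20
se_1to1 = log_odds_ratio_se(75, 300, 75, 300)   # about 0.19

print(f"Var(deaths): experimental={var_experimental}, placebo={var_placebo}")
print(f"SE(log OR): 2:1 split={se_2to1:.4f}, 1:1 split={se_1to1:.4f}")
print(f"Ratio of standard errors: {se_2to1 / se_1to1:.3f}")
```

In this sketch the 2:1 allocation inflates the standard error of the treatment effect by roughly 6% relative to an equal split. That is a modest loss of precision, and the standard-error formula already reflects the unequal arm sizes.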

Huge thanks in advance for any thoughts!

wholmes57
  • Does https://stats.stackexchange.com/questions/6067/does-an-unbalanced-sample-matter-when-doing-logistic-regression help? – mdewey Dec 28 '20 at 16:01
  • I rather like this discussion of class imbalance: https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he. – Dave Dec 28 '20 at 16:08
  • Thank you both for your suggestions; these are certainly interesting discussions. However, they mainly focus on imbalance in the dependent variable. I was hoping to focus more on the deliberate imbalance in one of the covariates (the treatment term). – wholmes57 Dec 29 '20 at 09:42
