
I'm looking at a common situation in medical research: only aggregate results get published, but we would really like information that is not directly provided.

For a randomized controlled trial (RCT) comparing drug A vs. placebo in terms of some continuous outcome, the journal article will typically give results like these:

| Population | Treatment group | n | Least-squares mean change from baseline (regression model*) | SE |
|---|---|---|---|---|
| Overall trial population | Drug A | 500 | 22.5 | 1.1 |
| Overall trial population | Placebo | 500 | 17.5 | 1.1 |

| Subgroup | n | Treatment difference (regression model*) | SE |
|---|---|---|---|
| Overall trial population | 1000 | 5.0 | 1.5 |
| Male | 500 | 4.5 | 2.1 |
| Female | 500 | 5.5 | 2.1 |
| Age >= 65 | 499 | 4.9 | 2.1 |
| Age < 65 | 501 | 5.1 | 2.1 |
| Baseline value < 10 | 500 | 3.0 | 2.1 |
| Baseline value >= 10 | 500 | 7.0 | 2.1 |
| Some disease severity score < 50 | 490 | 8.0 | 2.2 |
| Some disease severity score >= 50 | 510 | 2.0 | 2.0 |

\* The regression model has the form $\text{Change from baseline}_i = \beta_0 + \beta_1 \times \text{treatment} + \beta_2 \times \text{baseline value} + \epsilon_i$ with i.i.d. $\epsilon_i \sim N(0, \sigma^2)$, and is fitted separately to the overall population and to each subgroup.

I would also have the usual baseline characteristics table, which reports, overall and by treatment group, how many of the randomized patients are male or female, as well as the mean (SD) (or perhaps median and IQR) of age, baseline value and disease severity score.

From other sources (such as another small trial for which I have individual patient data, IPD), I know something about the joint distribution of baseline value, sex, age, the disease severity score and so on (these are definitely not independent), but nothing about the treatment effect. I'm really interested in the regression coefficients of a "full model for everything", i.e.

$$\begin{aligned}\text{Change}_i ={}& \beta_0 + \beta_1 \times \text{treatment} + \beta_2 \times \text{baseline value} + \beta_3 \times \text{female indicator} + \beta_4 \times \text{age} + \beta_5 \times \text{disease severity} \\ &+ \beta_6 \times \text{baseline value} \times \text{treatment} + \beta_7 \times \text{female indicator} \times \text{treatment} + \beta_8 \times \text{age} \times \text{treatment} \\ &+ \beta_9 \times \text{disease severity} \times \text{treatment} + \epsilon_i.\end{aligned}$$
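To make the link between the published subgroup results and the full model explicit, here is a rough worked relation. It assumes that the full model holds and that, because of randomization, the baseline-adjusted treatment difference reported for a subgroup $G$ approximately targets the average treatment effect in that subgroup. The symbols $\Delta_G$ (published treatment difference in $G$), $b$ (baseline value), $f$ (female indicator), $a$ (age) and $s$ (disease severity) are shorthand introduced only for this illustration.

$$\Delta_G \approx \beta_1 + \beta_6 \, E[b \mid G] + \beta_7 \, P(f = 1 \mid G) + \beta_8 \, E[a \mid G] + \beta_9 \, E[s \mid G]$$

For the female and male subgroups this gives

$$\Delta_{\text{female}} - \Delta_{\text{male}} \approx \beta_7 + \beta_6 \left(E[b \mid f=1] - E[b \mid f=0]\right) + \beta_8 \left(E[a \mid f=1] - E[a \mid f=0]\right) + \beta_9 \left(E[s \mid f=1] - E[s \mid f=0]\right),$$

so the observed difference of $5.5 - 4.5 = 1.0$ equals $\beta_7$ only if baseline value, age and disease severity are balanced between the sexes. Otherwise, the conditional means needed to disentangle the coefficients would have to come from the external knowledge about the joint covariate distribution.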

Suppose I'm willing to make all sorts of assumptions (e.g. linear effects of the covariates on the outcome and on the treatment effect, joint normality and whatever else helps), and I'm perfectly happy to take a Bayesian approach that jointly models the IPD and the published aggregate data. Are there existing solutions for getting there? Can you point me to any methods or articles that try to solve this problem?
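As a starting point for discussion, here is a crude moment-matching sketch in Python of what such a solution might look like. It is not the joint Bayesian model asked about. It assumes that each published subgroup difference approximates the subgroup-average treatment effect under the full model, it treats the subgroup covariate means as known, and it ignores the correlation between overlapping subgroups. The joint covariate distribution (means, SDs, correlations, the logistic link for sex) is entirely invented to stand in for the external IPD; only the treatment differences and SEs come from the table above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for external IPD: all distribution parameters
# below are invented for illustration only.
n = 200_000
mean = np.array([10.0, 65.0, 50.0])  # baseline value, age, severity
sd = np.array([3.0, 10.0, 15.0])
corr = np.array([[1.0, 0.2, 0.4],
                 [0.2, 1.0, 0.3],
                 [0.4, 0.3, 1.0]])
cov = corr * np.outer(sd, sd)
baseline, age, severity = rng.multivariate_normal(mean, cov, size=n).T
# Sex depends on severity through a logistic link (also an assumption).
p_female = 1.0 / (1.0 + np.exp(-(severity - 50.0) / 30.0))
female = (rng.random(n) < p_female).astype(float)

# Effect modifiers in the order beta6, beta7, beta8, beta9, centered at
# their overall means so the intercept is the effect at average covariates.
X = np.column_stack([baseline, female, age, severity])
Xc = X - X.mean(axis=0)

# Subgroups in the same order as the published table.
masks = [
    np.ones(n, dtype=bool),
    female == 0, female == 1,
    age >= 65, age < 65,
    baseline < 10, baseline >= 10,
    severity < 50, severity >= 50,
]

# Published treatment differences and standard errors.
diff = np.array([5.0, 4.5, 5.5, 4.9, 5.1, 3.0, 7.0, 8.0, 2.0])
se = np.array([1.5, 2.1, 2.1, 2.1, 2.1, 2.1, 2.1, 2.2, 2.0])

# Each row: [1, E[baseline|G], P(female|G), E[age|G], E[severity|G]] (centered).
M = np.array([np.concatenate(([1.0], Xc[m].mean(axis=0))) for m in masks])

# Weighted least squares with weights 1/SE^2, treating the subgroup
# estimates as independent (they are not, since the subgroups overlap).
theta, *_ = np.linalg.lstsq(M / se[:, None], diff / se, rcond=None)

names = ["effect at mean covariates", "beta6 (baseline)", "beta7 (female)",
         "beta8 (age)", "beta9 (severity)"]
for name, value in zip(names, theta):
    print(f"{name}: {value:.3f}")
```

Because the covariates are centered, the first coefficient is the treatment effect for a patient with average covariates rather than $\beta_1$ itself. A proper treatment would replace the independence assumption with the covariance implied by the subgroup overlaps and would propagate uncertainty about the covariate distribution, which is where a joint Bayesian model would come in.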

Björn
