Reconstruct logistic regression for external validation

Question

I want to externally validate a logistic regression model that my former colleague constructed together with another company, and that has been published. I have data from another cohort to which I want to apply the model. I'm using R, but I think my question is more about the concept of logistic regression in general.

I have found a lot of topics on how to validate a newly constructed model in R (glm class), but because I don't have the original dataset, I have to "reconstruct" the logistic regression to determine an individual's probability of the outcome. The article only presents prevalences and odds ratios.

I thought of two strategies:

1. Using the proportions and odds ratios from the article to work out the model's intercept (it's not given in the article), using $$ \text{intercept} = \log(\text{baseline\_odds}) - \left( \log(OR_1) \cdot X_1 + \dots + \log(OR_n) \cdot X_n \right) $$ with $X_i$ the prevalence (between $0$ and $1$) in the study population of the characteristic associated with $OR_i$.
I want to use this intercept to calculate the probability of the outcome in all individuals of my new population:
$$ \text{odds} = \exp\left( \text{intercept} + \log(OR_1) \cdot Y_1 + \dots + \log(OR_n) \cdot Y_n \right) \\ p = \frac{\text{odds}}{1 + \text{odds}} $$ so with $Y_i$ the prevalences in the validation population.
2. Multiplying the odds in the new population by the odds ratios from the article, if that characteristic applies, so $$ \text{odds} = \text{baseline\_odds} \cdot OR_i \\ p = \frac{\text{odds}}{1 + \text{odds}} $$
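To make the two strategies concrete, here is a minimal sketch in plain Python. All numbers (`odds_ratios`, `train_prevalence`, `train_outcome_prevalence`, `new_population_odds`) are made up for illustration and do not come from the article. The sketch also assumes two binary characteristics, reads `baseline_odds` in strategy 1 as the outcome odds in the training cohort, and reads $Y_i$ as an individual's 0/1 indicator; these are interpretations, not something the article confirms.

```python
import math

# Hypothetical values, NOT taken from the article.
odds_ratios = [2.0, 0.5]           # published odds ratios (assumed)
train_prevalence = [0.30, 0.60]    # prevalence of each characteristic in the training cohort (assumed)
train_outcome_prevalence = 0.20    # outcome prevalence in the training cohort (assumed)
new_population_odds = 0.25         # outcome odds in the validation cohort (assumed)


def logistic(lp):
    """Convert a linear predictor (log odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-lp))


def strategy1_intercept(outcome_prevalence, ors, prevalences):
    """Strategy 1: back-compute an intercept from marginal odds and prevalences."""
    log_odds = math.log(outcome_prevalence / (1.0 - outcome_prevalence))
    return log_odds - sum(math.log(o) * x for o, x in zip(ors, prevalences))


def strategy1_probability(intercept, ors, y):
    """Strategy 1: probability for one individual with 0/1 indicators y."""
    lp = intercept + sum(math.log(o) * yi for o, yi in zip(ors, y))
    return logistic(lp)


def strategy2_probability(baseline_odds, ors, y):
    """Strategy 2: multiply the baseline odds by each odds ratio that applies."""
    odds = baseline_odds
    for o, yi in zip(ors, y):
        if yi:
            odds *= o
    return odds / (1.0 + odds)


intercept = strategy1_intercept(train_outcome_prevalence, odds_ratios, train_prevalence)
person = [1, 0]  # has the first characteristic, not the second
p_strategy1 = strategy1_probability(intercept, odds_ratios, person)
p_strategy2 = strategy2_probability(new_population_odds, odds_ratios, person)
print(intercept, p_strategy1, p_strategy2)
```

With these inputs the two strategies give different probabilities for the same person, because strategy 1 shifts the starting log odds by the prevalence-weighted log odds ratios and strategy 2 does not.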

Is my way of thinking correct?

I have multiple problems however:

If I calculate the intercept, it's not exactly the same as the original intercept (I checked with a model with all parameters known). Is this all because of rounding errors?
I am aware that the intercept also contains some information from the specific population (like prevalence in the training population), and corrects for over- or underfitting.
The probabilities I compute differ between the two strategies. This makes sense because option #2 ignores all the information that was stored in the intercept, but that option allows me to adjust for a different prevalence in the new population.
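One way to check whether rounding alone could explain the intercept mismatch is to run the back-calculation on a toy population where nothing is rounded. The sketch below is only an illustration under stated assumptions: the "true" intercept, the odds ratios, and the prevalences are invented, and the two binary characteristics are assumed to be independent so that the population can be enumerated exactly.

```python
import math
from itertools import product

# Hypothetical "true" model, NOT from the article.
true_intercept = -2.0
log_ors = [math.log(3.0), math.log(0.4)]
prevalence = [0.30, 0.60]  # independent binary characteristics (assumption)


def logistic(lp):
    """Convert a linear predictor (log odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-lp))


# Exact marginal outcome probability: average the conditional
# probabilities over all four covariate patterns, weighted by how
# common each pattern is in the population.
marginal_p = 0.0
for x in product([0, 1], repeat=len(prevalence)):
    weight = 1.0
    for xi, p in zip(x, prevalence):
        weight *= p if xi else (1.0 - p)
    lp = true_intercept + sum(b * xi for b, xi in zip(log_ors, x))
    marginal_p += weight * logistic(lp)

# Back-compute the intercept from the marginal odds and the prevalences,
# i.e. log(marginal odds) minus the prevalence-weighted log odds ratios.
marginal_log_odds = math.log(marginal_p / (1.0 - marginal_p))
back_computed_intercept = marginal_log_odds - sum(
    b * p for b, p in zip(log_ors, prevalence)
)

print(true_intercept, back_computed_intercept)
```

In this toy setup the back-computed intercept does not equal the true one even though every quantity is exact, so the gap here cannot be attributed to rounding. It arises because the log odds of an averaged probability is not the average of the individual log odds.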

My question is which strategy I should use, and how I can tackle the problems I encounter.

Thank you in advance!
(Of course I searched on StackExchange and Google, and I found a lot of articles about logistic regression, but unfortunately I couldn't get the answer from there.
I saw "Help me understand adjusted odds ratio in logistic regression", "Odds and odds ratios in logistic regression", and "Estimating predicted probabilities from logistic regression: different methods correspond to different target populations" (but that one was too difficult for me), and many more.)

I'm not sure how simple proportions would allow full conditioning on covariates such that you could back-compute the intercept. Also you have cast the problem so that only the special case where the predictors act linearly in the model will work. — Frank Harrell, Mar 09 '16 at 15:29
Thanks for your reply! I figured that algebraically, since `odds = exp(B0 + BiXi)`, that `log(odds) = B0 + BiXi`, ergo `B0 = log(odds) - BiXi`. Isn't this true? Maybe I shouldn't have used `baseline_odds`, but `mean_odds` above? — Jasper, Mar 10 '16 at 07:24
Watch what you mean by log(odds). It is the log of the odds that $Y=1$ when all the covariates are set to zero. It is not an average of anything. — Frank Harrell, Mar 10 '16 at 13:45
Thanks @FrankHarrell for your reply. I'm confused, doesn't the intercept create the odds for a hypothetical subject with all covariates 0? The mean of all predicted odds (with all covariates at the appropriate value) should be close to the observed mean odds in the training population, shouldn't it? — Jasper, Mar 10 '16 at 15:19
yes; no; you are confusing unconditional probabilities with conditional probabilities. Another way of understanding this is that to get the correct intercept computed you need all the raw data and need to use the computed linear predictor (excluding the intercept) as an offset term in a new maximum likelihood estimation procedure to estimate the intercept. — Frank Harrell, Mar 10 '16 at 15:38
Okay, thank you very much: something for me to dive into. It seems I need to ask the pharmaceutical company either to provide the intercept or to share the entire dataset. You have answered my question; too bad I cannot "accept" a comment as an answer. Thanks again! — Jasper, Mar 10 '16 at 16:20
You can still click on 'this comment adds something useful to the post' — Frank Harrell, Mar 10 '16 at 19:18
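The offset approach described in the comments can be sketched as follows: keep the published log odds ratios fixed, compute each subject's linear predictor without the intercept, and estimate only the intercept by maximum likelihood. This is a minimal sketch in plain Python, not the original authors' procedure. The data are simulated from invented coefficients, the function name `fit_intercept_with_offset` is hypothetical, and Newton's method is just one convenient way to solve the one-parameter likelihood equation.

```python
import math
import random


def logistic(lp):
    """Convert a linear predictor (log odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-lp))


def fit_intercept_with_offset(y, offset, tol=1e-10, max_iter=100):
    """Maximum likelihood intercept with all other coefficients held fixed.

    `offset` holds each subject's linear predictor excluding the intercept.
    """
    b0 = 0.0
    for _ in range(max_iter):
        p = [logistic(b0 + o) for o in offset]
        score = sum(yi - pi for yi, pi in zip(y, p))         # first derivative
        information = sum(pi * (1.0 - pi) for pi in p)       # minus second derivative
        step = score / information
        b0 += step
        if abs(step) < tol:
            break
    return b0


# Simulated raw data from a hypothetical model, NOT from the article.
random.seed(1)
true_intercept = -2.0
log_ors = [math.log(3.0), math.log(0.4)]
prevalence = [0.30, 0.60]

offsets, outcomes = [], []
for _ in range(5000):
    x = [1 if random.random() < p else 0 for p in prevalence]
    lp_no_intercept = sum(b * xi for b, xi in zip(log_ors, x))
    offsets.append(lp_no_intercept)
    p_outcome = logistic(true_intercept + lp_no_intercept)
    outcomes.append(1 if random.random() < p_outcome else 0)

estimated_intercept = fit_intercept_with_offset(outcomes, offsets)
fitted = [logistic(estimated_intercept + o) for o in offsets]
print(estimated_intercept, sum(outcomes) / len(outcomes), sum(fitted) / len(fitted))
```

At the fitted intercept the mean predicted probability matches the observed outcome proportion, which is the sense in which the intercept carries the cohort's prevalence. In R, the same idea is usually written as `glm(y ~ 1, offset = lp, family = binomial)`, where `y` and `lp` are placeholder names for the outcome and the intercept-free linear predictor. Either way, subject-level data are required.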
