I have a question about adjustment versus matching when the distribution of confounders differs greatly between groups. For instance, men are more prone to Parkinson's disease and vascular diseases, whereas women are more susceptible to Alzheimer's disease and MS.

Say that one wishes to assess the effect of vascular risk on Parkinson's disease and dementia. In this case, age and sex are known to be strong confounders of both the risk factor and the outcome. Is adjusting for the confounders in a regression more reliable, or is matching?

I am asking because I got very different results in a very well sampled population-based cohort. Before matching, vascular risk was strongly associated with the outcomes (OR=14.4 [5.92, 35.2]), but the association was completely gone after I matched the two groups (disease vs. disease-free) (OR=1.29 [0.92, 1.82]). The results in the matched groups were quite robust (I tried several matching ratios and several matching methods).

I personally think that, with such a large difference in the age and sex distributions, regression adjustment may not fully account for confounding, and that the results from matching are therefore more reliable. One piece of evidence is that, after matching, PD contributes only a 0.1 increment to the vascular risk score, so it seems unlikely that the association was real.

Willie
  • Also this: [Matching vs simple regression for causal inference?](https://stats.stackexchange.com/questions/431939/matching-vs-simple-regression-for-causal-inference) – Noah Dec 23 '20 at 02:47

1 Answer

Generally speaking, matching is suggested if

  • not all data have been collected and you want to save $ or
  • the measurements you want to adjust for are hard to model (usually because of having a large number of distinct categories), e.g. occupation or zip code

Your situation may be more suitable for model-based adjustment, but the modeling exercise will expose the absence-of-interaction assumptions you would need to make depending on having unbalanced data.
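As a minimal sketch of what a no-interaction adjustment assumes, the Python snippet below compares stratum-specific odds ratios with a crude odds ratio and a Mantel-Haenszel pooled odds ratio. The counts, the stratum names, and the helper functions are all hypothetical and chosen only for illustration; they are not the cohort described in the question. Mantel-Haenszel pooling is used here as a simple stand-in for model-based adjustment, since it also assumes a common odds ratio across strata.

```python
# Hypothetical 2x2 tables per age stratum (illustrative only, not the poster's data).
# Each tuple: (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
strata = {
    "younger": (5, 95, 45, 855),
    "older": (180, 720, 20, 80),
}


def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table."""
    return (a * d) / (b * c)


def crude_or(tables):
    """Odds ratio after collapsing all strata into one table (no adjustment)."""
    a, b, c, d = (sum(t[i] for t in tables.values()) for i in range(4))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio; assumes a common OR across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables.values())
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables.values())
    return num / den


stratum_ors = {name: odds_ratio(*t) for name, t in strata.items()}
crude = crude_or(strata)
pooled = mantel_haenszel_or(strata)

print("stratum-specific ORs:", stratum_ors)
print("crude OR:", round(crude, 2))
print("Mantel-Haenszel OR:", round(pooled, 2))
```

In this made-up example the exposure and the disease are both concentrated in the older stratum, so the crude odds ratio is well above 1 even though the odds ratio within each stratum is 1. If the stratum-specific odds ratios differed markedly from each other, a single pooled estimate would hide an interaction, which is the assumption the modeling exercise is meant to expose.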

Frank Harrell
  • This makes sense. When I saw the results from the matched cohort, I was wondering if the interaction was the way to go. And, it turns out that when I add interaction terms into my analyses, all the weird associations disappeared. And, the interaction terms were not associated with the outcomes either. This means that they are not real. Also, as pointed out in many simulation studies, matching can strip off the effects of the matched factors entirely. – Willie Dec 18 '20 at 19:08
  • Be sure to not try to interpret interaction terms, or main effect terms, in isolation. Concentrate on "chunk tests" for combined effects, e.g., pooled effect of main effect + interaction effects. Such pooled tests are independent of coding, unlike the separate tests. If the factors involved are A and B, the pooled test of A + (AxB interaction) with at least 2 d.f. tests whether A has an effect for **any** level of B. – Frank Harrell Dec 19 '20 at 13:38
  • Thanks for the comment. I want to make sure that I understand correctly. Since my interaction terms 'cancelled' each other out (i.e., age remained as a risk but not for the cardiovascular risk and the age*vascular term), I should interpret the original positive association with cardiovascular risk as a 'false' association. Am I correct? – Willie Dec 21 '20 at 21:17
  • I wouldn't say it that way. Form the contrasts of interest (single differences or double differences, i.e., differential or interaction effects) and get confidence intervals for them. Don't worry so much about individual terms in the model. – Frank Harrell Dec 22 '20 at 12:51
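
The chunk test and the contrasts discussed in the comments above can be sketched in Python for the simplest possible case. This is an illustration under strong assumptions: A and B are both binary, the cell counts are hypothetical, and all function names are made up for the example. With two binary factors the logistic model containing A, B, and A×B is saturated, so its fitted risks equal the observed cell proportions and no iterative fitting is needed; with continuous covariates such as age one would fit the models with regression software instead.

```python
import math

# Hypothetical (cases, non-cases) counts for each (A, B) cell; illustrative only.
cells = {
    (0, 0): (20, 180),
    (1, 0): (30, 170),
    (0, 1): (60, 140),
    (1, 1): (70, 130),
}


def binom_loglik(cases, noncases, p):
    """Binomial log-likelihood (up to a constant) for one group with risk p."""
    return cases * math.log(p) + noncases * math.log(1 - p)


def loglik_full(table):
    """Saturated model (A, B, AxB): fitted risk is the observed proportion per cell."""
    return sum(binom_loglik(y, n, y / (y + n)) for y, n in table.values())


def loglik_b_only(table):
    """Model with B only: pool over A within each level of B."""
    total = 0.0
    for b in (0, 1):
        y = sum(c[0] for (_, b_), c in table.items() if b_ == b)
        n = sum(c[1] for (_, b_), c in table.items() if b_ == b)
        total += binom_loglik(y, n, y / (y + n))
    return total


def log_or_a(table, b):
    """Log odds ratio for A at a given level of B, with its Woolf variance."""
    y1, n1 = table[(1, b)]
    y0, n0 = table[(0, b)]
    est = math.log((y1 * n0) / (n1 * y0))
    var = 1 / y1 + 1 / n1 + 1 / y0 + 1 / n0
    return est, var


def wald_ci(est, var, z=1.959964):
    """Approximate 95% Wald confidence interval on the log odds ratio scale."""
    half = z * math.sqrt(var)
    return est - half, est + half


# Chunk test: A main effect + AxB interaction jointly (2 d.f. likelihood ratio test).
lr_stat = 2 * (loglik_full(cells) - loglik_b_only(cells))
p_chunk = math.exp(-lr_stat / 2)  # chi-square upper tail probability for 2 d.f.

# Single differences: effect of A within each level of B.
est0, var0 = log_or_a(cells, 0)
est1, var1 = log_or_a(cells, 1)

# Double difference: the interaction contrast (cells are independent, so variances add).
dd_est, dd_var = est1 - est0, var0 + var1

print("chunk test LR statistic:", lr_stat, "p-value:", p_chunk)
print("log OR of A at B=0 and its CI:", est0, wald_ci(est0, var0))
print("log OR of A at B=1 and its CI:", est1, wald_ci(est1, var1))
print("double difference and its CI:", dd_est, wald_ci(dd_est, dd_var))
```

The chunk test asks whether A matters at any level of B, while the contrasts with their confidence intervals describe how large the effect of A could plausibly be at each level of B and how much it differs between levels. None of these quantities depends on how the individual model terms happen to be coded.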