Suppose we have case-control data, where cases have some disease ($Y$) and controls don't, and we are interested in the association between the disease and some other variable(s) ($X$). I know that in this scenario we cannot treat the disease as the response variable in the usual way because of the study design (the marginal distribution of disease is fixed by sampling).

I also know, however, that the odds ratio can still be calculated in such designs, because it takes the same value whether it is computed from the conditional distribution of $X|Y$ or of $Y|X$.
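
For binary $X$ and $Y$ the symmetry can be written out explicitly. This is only a sketch, and the notation $p_{xy} = P(X=x, Y=y)$ is introduced here for illustration:

$$
\frac{P(Y=1\mid X=1)\,/\,P(Y=0\mid X=1)}{P(Y=1\mid X=0)\,/\,P(Y=0\mid X=0)}
= \frac{p_{11}/p_{10}}{p_{01}/p_{00}}
= \frac{p_{11}\,p_{00}}{p_{10}\,p_{01}}
= \frac{p_{11}/p_{01}}{p_{10}/p_{00}}
= \frac{P(X=1\mid Y=1)\,/\,P(X=0\mid Y=1)}{P(X=1\mid Y=0)\,/\,P(X=0\mid Y=0)}
$$

Both conditional forms reduce to the same cross-product ratio, so it does not matter which margin was fixed by the sampling.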

My question is: is it appropriate to use logistic regression in this case to model the odds of disease? I.e., $\text{logit}\left\{P(Y=1 \mid X)\right\} = \log\left\{\dfrac{P(Y=1 \mid X)}{1-P(Y=1 \mid X)}\right\} = \beta_0 + \mathbf{\beta} X$

Context: GWAS (Genome-Wide Association Studies) are typically case-control studies, where one wants to assess the association between disease and the number of minor alleles of a particular SNP. $P$-values are typically obtained from a chi-squared test of independence, but this doesn't allow you to add other covariates. Many of the packages that offer GWAS analysis also allow you to do logistic regression. I just wanted to verify that such an analysis is valid.

bdeonovic

1 Answer

Logistic regression is a valid inferential method here because, as you've noted, you're modeling the odds. The coefficients on the explanatory variables $X$ are valid estimates of the log odds ratios. However, the intercept term $\beta_0$ is not valid: the numbers of positive and negative outcomes are fixed by the case-control design, so the intercept reflects the sampling ratio of cases to controls rather than the prevalence of disease in the population. The intercept is therefore meaningless on its own, but your other estimates are fine. More information is in Agresti, An Introduction to Categorical Data Analysis (second edition; 2007), p. 105.
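
To make this concrete, here is a minimal sketch in Python with a single binary covariate. The population counts and sampling fractions are made up for illustration, and the helper names (`logistic_fit_2x2`, `case_control_sample`) are hypothetical. With one binary $X$ the logistic model is saturated, so the maximum likelihood estimates can be read straight off the 2x2 table without an iterative fit.

```python
import math

# Hypothetical population counts, keyed by (exposure x, disease y).
# These numbers are invented for illustration only.
population = {
    (0, 0): 9000, (0, 1): 100,
    (1, 0): 4000, (1, 1): 200,
}

def logistic_fit_2x2(counts):
    """MLE of logit P(Y=1|X) = b0 + b1*X for a single binary X.

    The model is saturated, so the MLE reproduces the observed log odds:
    b0 is the log odds of disease when X=0, b1 is the log odds ratio.
    """
    b0 = math.log(counts[(0, 1)] / counts[(0, 0)])
    b1 = math.log(counts[(1, 1)] / counts[(1, 0)]) - b0
    return b0, b1

def case_control_sample(counts, case_fraction, control_fraction):
    """Expected cell counts when cases and controls are sampled at different rates."""
    frac = {1: case_fraction, 0: control_fraction}
    return {(x, y): n * frac[y] for (x, y), n in counts.items()}

# Fit in the full population, then in a case-control sample that keeps
# every case but only 5% of the controls (assumed sampling fractions).
pop_b0, pop_b1 = logistic_fit_2x2(population)
cc = case_control_sample(population, case_fraction=1.0, control_fraction=0.05)
cc_b0, cc_b1 = logistic_fit_2x2(cc)

# The intercept is shifted by the log ratio of the sampling fractions.
offset = math.log(1.0 / 0.05)

print("slope, population vs case-control:", pop_b1, cc_b1)
print("intercept, population vs case-control:", pop_b0, cc_b0)
print("intercept shift vs log(case_fraction / control_fraction):", cc_b0 - pop_b0, offset)
```

In this sketch the slope is the same in both fits, while the intercept moves by the log of the ratio of the sampling fractions. Those fractions are usually unknown in a real case-control study, which is why the intercept cannot be interpreted without outside information.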

Sycorax
  • +1 - Also see [this response](http://stats.stackexchange.com/a/68726/1036) about how to estimate the correct intercept plus other references. – Andy W Sep 09 '13 at 14:05
  • Thank you sir. This is what I believed. I was also reading Agresti, and just wanted to verify I was interpreting what he was saying correctly. – bdeonovic Sep 09 '13 at 14:15