I'm trying to understand what the assumptions for logistic regression are when you intend to interpret a parameter as causal. The assumptions for causal OLS regression are well known, but I can't find a good source for the analogous assumptions for logistic regression.
From what I can find on the internet, I think the following assumptions need to hold:
- Errors are distributed according to a logistic distribution and are independent of each other
- No multicollinearity
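For concreteness, the model I believe these assumptions refer to is the latent-variable form of the logit. The notation ($y_i^*$, $\beta_0, \beta_1, \beta_2$, $\varepsilon_i$) is my own and is not taken from any particular source:

```latex
\begin{aligned}
y_i^* &= \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i,
  \qquad \varepsilon_i \sim \text{Logistic}(0, 1), \\
y_i &= \mathbf{1}[\, y_i^* > 0 \,], \\
P(y_i = 1 \mid x_{1i}, x_{2i}) &= \frac{1}{1 + \exp\!\big(-(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})\big)}
  \quad \text{if } \varepsilon_i \text{ is independent of } (x_{1i}, x_{2i}).
\end{aligned}
```

The last line is where independence between the regressors and the error seems to enter: the logistic form of the response probability is derived under it.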
My intuition tells me that the independent variables should not be correlated with the error term (no endogeneity), as in the case of OLS regression, but I can't find support for this anywhere. Does anyone have a mathematical argument for this? That is, where would estimation go wrong?
On the same point: suppose I'm interested in the coefficient on X1 as the causal parameter, and X1 is not correlated with the error term but X2 is. I'm not interested in the coefficient on X2 in a causal sense. Can I still run this logistic regression and interpret the coefficient on X1 as causal? In other words, would the endogeneity of X2 distort the estimate of the coefficient on X1?
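To make the X1/X2 setup concrete, here is a minimal simulation sketch of the situation I have in mind. Everything in it is my own assumption for illustration: the latent-variable data-generating process, the helper names (`simulate`, `fit_logit`, `solve`), the parameters `rho` (how strongly X2 loads on the error) and `link` (how strongly X2 depends on X1), and the coefficient values. It uses only the Python standard library and fits the logit by plain Newton-Raphson, so it is not a substitute for a proper statistics package.

```python
import math
import random


def simulate(n, rho, link=0.5, seed=0, b0=0.0, b1=1.0, b2=1.0):
    """Draw (x1, x2, y) rows from a latent-variable logit.

    x1 is independent of the error e; x2 is endogenous whenever rho != 0.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Standard logistic draw via the inverse CDF (u clamped away from 0 and 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        e = math.log(u / (1.0 - u))
        x1 = rng.gauss(0.0, 1.0)
        x2 = link * x1 + rng.gauss(0.0, 1.0) + rho * e
        y = 1 if b0 + b1 * x1 + b2 * x2 + e > 0 else 0
        rows.append((x1, x2, y))
    return rows


def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        # Partial pivoting: move the largest remaining entry in column c to the diagonal.
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_logit(rows, iters=50, tol=1e-8):
    """Maximum-likelihood logit of y on (1, x1, x2) via Newton-Raphson."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for x1, x2, y in rows:
            x = (1.0, x1, x2)
            eta = sum(b * v for b, v in zip(beta, x))
            eta = max(-30.0, min(30.0, eta))  # keep exp() in a safe range
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(3):
                grad[i] += (y - p) * x[i]          # score of the log-likelihood
                for j in range(3):
                    hess[i][j] += w * x[i] * x[j]  # negative Hessian
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


if __name__ == "__main__":
    # rho = 0.0: X2 exogenous; rho = 1.0: X2 correlated with the error.
    for rho in (0.0, 1.0):
        beta = fit_logit(simulate(4000, rho))
        print(f"rho={rho}: estimated coefficient on X1 = {beta[1]:.3f} (true value 1.0)")
```

Comparing the two printed estimates of the X1 coefficient, and repeating with `link=0.0` so that X1 and X2 are unrelated, should show empirically whether the endogeneity of X2 carries over to X1 in this particular setup. What I'm after is the mathematical reason behind whatever the simulation shows.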
Also, I read that the errors are not identically distributed, but I'm not sure why. Can anyone explain why this is true?
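My tentative guess, which I'd like confirmed or corrected, is that the claim refers to the error on the observed scale rather than the latent logistic error. Writing $p(x) = P(y = 1 \mid x)$ and defining $u = y - p(x)$ (my notation, not from a source), the binary outcome gives:

```latex
\begin{aligned}
u \mid x &=
  \begin{cases}
    1 - p(x) & \text{with probability } p(x), \\
    -p(x)    & \text{with probability } 1 - p(x),
  \end{cases} \\
\mathbb{E}[u \mid x] &= 0, \qquad
\operatorname{Var}(u \mid x) = p(x)\,\big(1 - p(x)\big).
\end{aligned}
```

Since this variance changes with $x$, the observed-scale errors cannot be identically distributed, even if the latent errors are assumed i.i.d. logistic. I'm not sure this is the sense in which the statement was meant.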
Are there any other assumptions for logistic regression when you want to use it for causal inference?