My dataset has a binary response, several factors, and several covariates. Some of the covariates are always present when factor1=="A" and always missing (NA) when factor1=="B". This missingness is structural, not random: the covariate has no meaningful value when factor1=="B".

To fit this as a logistic regression, I centered each of these covariates about its mean and then set the covariate to 0 wherever it had been NA. My reasoning is that the covariate then contributes nothing to the linear predictor whenever factor1=="B", which is appropriate.
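
A minimal sketch of this encoding, using made-up toy data (the values and the variable names `covariate`, `x_centered`, and `is_a` are illustrative assumptions, not part of my real dataset). It also builds an indicator column for factor1=="A" so that design matrices with and without the indicator can be compared:

```python
from statistics import mean

# Hypothetical toy data: the covariate is None (structurally missing)
# whenever factor1 == "B".
factor1 = ["A", "A", "A", "B", "B"]
covariate = [2.0, 4.0, 6.0, None, None]

# Center using only the observed (factor1 == "A") values.
observed_mean = mean(x for x in covariate if x is not None)

# Centered covariate, with structural NAs replaced by 0.
x_centered = [0.0 if x is None else x - observed_mean for x in covariate]

# Indicator for factor1 == "A", the group where the covariate exists.
is_a = [1 if f == "A" else 0 for f in factor1]

# Design matrix rows: [intercept, is_a, x_centered].
design = [[1, a, x] for a, x in zip(is_a, x_centered)]

for row in design:
    print(row)
```

In this toy example the third "A" row sits exactly at the mean, so its centered covariate is 0, the same value as the zero-filled "B" rows. Only the `is_a` column tells those rows apart.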

Is this a reasonable approach to fitting these data?

What is the name for this approach, and where (web, textbook) is there a good discussion?

kjetil b halvorsen
Jack Tanner
  • Note that after removing the mean from your regressor, you will no longer fit it. So you may have to include an additional regressor to distinguish the NA and non-NA cases. – user12719 Feb 03 '13 at 21:11
  • I know I can [add an indicator regressor](http://stats.stackexchange.com/q/6563/8207), but why is re-centering not sufficient? What do you mean by "you will not fit it anymore"? – Jack Tanner Feb 03 '13 at 23:55
  • Effectively, you are fitting only the slope with your new regressor, and no intercept. This might or might not be correct in your case. If you are unsure, I'd fit the model both with and without the indicator regressor and try to understand the output. – user12719 Feb 04 '13 at 13:44
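
The "slope but no intercept" point in the comments can be illustrated on the log-odds scale. This is only a sketch under assumed values: the function `log_odds` and the coefficients `b0`, `b_x`, and `b_a` are hypothetical, not estimates from any fitted model.

```python
def log_odds(x_centered, is_a, b0, b_x, b_a=0.0):
    """Linear predictor: intercept + optional group shift + covariate slope."""
    return b0 + b_a * is_a + b_x * x_centered

# Hypothetical coefficients, chosen only for illustration.
b0, b_x, b_a = -1.0, 0.5, 2.0

# Without the indicator: a "B" row (covariate zero-filled) and an "A" row
# whose covariate equals the mean (centered value 0) get the same log-odds.
b_row_no_ind = log_odds(0.0, 0, b0, b_x)
a_at_mean_no_ind = log_odds(0.0, 1, b0, b_x)

# With the indicator: the "A" group gets its own intercept shift b_a.
b_row_ind = log_odds(0.0, 0, b0, b_x, b_a)
a_at_mean_ind = log_odds(0.0, 1, b0, b_x, b_a)

print(b_row_no_ind, a_at_mean_no_ind)
print(b_row_ind, a_at_mean_ind)
```

Without the indicator, the model is forced to treat "B" rows as if they were "A" rows with an average covariate value. Adding the indicator removes that constraint, which is why comparing both fits is informative.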
