
I am interested in studying the effect of 7 predictors (x1 to x7) on one dependent variable (Y). Three of the 7 predictors (x1, x2 and x3) are mutually exclusive.

Is it OK to include x1, x2 and x3 along with the other predictors in the same regression equation? Or

should I fit three regression equations, so that each one tests only one of the three mutually exclusive variables (x1, x2 or x3) along with the rest of the variables?

The variables x1, x2 and x3 are measured on a Likert scale from 1 to 7. Conceptually, these variables represent experiences that can never happen at the same time.

kjetil b halvorsen
Abdul1123
  • What are these mutually exclusive variables? If they are classes or categories of a factor (e.g. gender: male or female), there is no problem with including them all. – Frans Rodenburg Oct 12 '17 at 14:20
  • They are variables measured on a Likert scale from 1 to 7. Conceptually, these variables represent experiences that can never happen at the same time. – Abdul1123 Oct 12 '17 at 14:25
  • Can't you just say *what* those variables are? That makes it easier to provide a clear answer. If they are mutually exclusive, you could consider disregarding them altogether and instead creating a variable with categories x1, x2, x3, depending on which of these mutually exclusive variables were observed. – Frans Rodenburg Oct 12 '17 at 14:33
  • Thank you for your help! They represent how satisfied a volunteer is with the amount of work he/she does in a given day (x1: too much work), (x2: too little work), and (x3: no work at all). – Abdul1123 Oct 12 '17 at 14:48
  • Then I would go with the approach I suggested. You can only make inferences about the satisfaction scores within those variables for the subsets of respondents that filled them in anyway. – Frans Rodenburg Oct 12 '17 at 14:50
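The recoding Frans Rodenburg suggests can be sketched as follows. This is a minimal illustration under stated assumptions, not the poster's actual analysis: it assumes "not applicable" is stored as NaN, uses a single column `x4` as a stand-in for the remaining predictors x4 to x7, and simulates noise-free toy data. The helper names `recode` and `design_matrix` are hypothetical. The three Likert columns collapse into one categorical "situation" variable (too much / too little / no work) plus one satisfaction score, and the model gets a separate satisfaction slope per situation, so each slope is estimated only from the respondents in that situation.

```python
import numpy as np

# Hypothetical labels for the situation each column describes (x1, x2, x3).
CATEGORIES = ("too_much", "too_little", "no_work")


def recode(x1, x2, x3):
    """Collapse three mutually exclusive Likert columns into
    (category index, satisfaction score). NaN marks 'does not apply'."""
    trio = np.column_stack([x1, x2, x3])
    observed = ~np.isnan(trio)
    if not np.all(observed.sum(axis=1) == 1):
        raise ValueError("each row must have exactly one of x1, x2, x3")
    category = observed.argmax(axis=1)  # 0, 1 or 2: which column was answered
    score = trio[observed]              # exactly one value per row, in row order
    return category, score


def design_matrix(category, score, x4):
    """Intercept, dummies for categories 1 and 2 (category 0 is the
    reference), one satisfaction slope per category, and x4."""
    d = np.eye(3)[category]  # one-hot encoding, shape (n, 3)
    return np.column_stack([
        np.ones(len(category)),
        d[:, 1], d[:, 2],                                    # category shifts
        score * d[:, 0], score * d[:, 1], score * d[:, 2],  # per-category slopes
        x4,
    ])


# Simulated toy data (hypothetical): 60 respondents, 20 per situation.
rng = np.random.default_rng(0)
n = 60
true_category = np.arange(n) % 3
true_score = rng.integers(1, 8, size=n).astype(float)
x4 = rng.integers(1, 8, size=n).astype(float)

# Spread the score into x1/x2/x3, with NaN where the situation did not apply.
x1, x2, x3 = (np.where(true_category == k, true_score, np.nan) for k in range(3))

category, score = recode(x1, x2, x3)
X = design_matrix(category, score, x4)
beta_true = np.array([2.0, -1.0, 0.5, 0.3, -0.2, 0.1, 0.4])
y = X @ beta_true  # noise-free, so ordinary least squares should recover beta_true
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 3))
```

In a formula-based package the same model would typically be written as an interaction between a categorical situation factor and the score; the per-category slopes here are one design choice, and a common slope across situations is the obvious simpler alternative.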

1 Answer


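If by "regression equation" you mean linear regression, then you are in trouble, because linear models are not well suited for missing data: ordinary linear regression needs a value for every predictor in every row, and here two of x1, x2 and x3 are missing by design for every respondent. You could try to combine these data using dummy variables if they are just three different measures of something similar, but that does not work in general.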
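A quick way to see the problem is to count how many rows a complete-case (listwise deletion) regression could actually use. The sketch below is a hypothetical toy example in Python/NumPy, assuming NaN marks an unanswered column and a single column stands in for the other predictors.

```python
import numpy as np

# Hypothetical toy data: columns are x1, x2, x3 and one other predictor.
# Exactly one of x1, x2, x3 is observed in each row.
x = np.array([
    [5.0, np.nan, np.nan, 3.0],
    [np.nan, 2.0, np.nan, 6.0],
    [np.nan, np.nan, 7.0, 1.0],
    [4.0, np.nan, np.nan, 5.0],
])

# Rows usable by one regression containing x1, x2, x3 and the other predictor.
complete = ~np.isnan(x).any(axis=1)
n_complete = int(complete.sum())
print(n_complete)  # 0

# Rows usable by three separate regressions, each with one of x1, x2, x3
# plus the other predictor: each one fits a different, disjoint subset.
per_column = [int((~np.isnan(x[:, [k, 3]])).all(axis=1).sum()) for k in range(3)]
print(per_column)  # [2, 1, 1]
```

Filling the NaNs with 0 instead would not help either: it would silently treat "does not apply" as a satisfaction score below the 1 to 7 range.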

On the other hand, you might resort to different regression methods that can deal with missing data, such as regression trees or random forests.

See, e.g., the "advantages" listed on page 2 of http://www.stat.cmu.edu/~cshalizi/350-2006/lecture-10.pdf

and the accepted answer to "How do decision tree learning algorithms deal with missing values (under the hood)".

Bernhard