I have a dummy explanatory variable that indicates whether a subject responded "yes" to a question, plus the sub-question response (on a continuous scale from 1 to 5) for those who said yes. For the subjects who said "no", I have no sub-question data, since the sub-question was only asked of those who answered "yes". How should I account for this in a regression?
One solution is here (maybe a dup): https://stats.stackexchange.com/questions/372257/how-do-you-deal-with-nested-variables-in-a-regression-model/372258#372258 – kjetil b halvorsen Sep 16 '20 at 14:56
1 Answer
I would suggest either $(1)$ running a separate regression on the "yes" data alone to understand the effect of the continuous variable $\in [1,5]$, or $(2)$ assigning a value of $0$ to the sub-question for all of the "no" responses and running a regression on the full sample (though this is not an ideal solution, to be sure).
Unfortunately, because you only have the continuous data for a subset of your respondents, there is no fully straightforward answer. You can run a regression on that subset and then, in your analysis write-up, preface the explanation with something like "For users who responded 'yes' to Question (a), simple linear regression found that [... relationship ...]".
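To make the two options concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the variable names (`answered_yes`, `score`, `y`), the coefficients, and the choice to keep the yes/no dummy in the model alongside the zero-filled score for option $(2)$. It uses plain least squares via NumPy rather than any particular regression package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated data; names and coefficients are hypothetical.
answered_yes = rng.integers(0, 2, size=n)            # dummy: 1 = "yes", 0 = "no"
score = np.where(answered_yes == 1,
                 rng.uniform(1, 5, size=n), np.nan)  # sub-question, missing for "no"
y = (2.0 + 1.5 * answered_yes
     + 0.8 * np.nan_to_num(score)                    # NaN -> 0, so "no" rows get no score effect
     + rng.normal(0, 1, size=n))

yes = answered_yes == 1

# Option (1): separate regression on the "yes" subset, y ~ 1 + score.
X_yes = np.column_stack([np.ones(yes.sum()), score[yes]])
coef_yes, *_ = np.linalg.lstsq(X_yes, y[yes], rcond=None)

# Option (2): zero-fill the score for "no" and keep the dummy,
# y ~ 1 + answered_yes + score_filled, fitted on everyone.
score_filled = np.where(yes, score, 0.0)
X_all = np.column_stack([np.ones(n), answered_yes, score_filled])
coef_all, *_ = np.linalg.lstsq(X_all, y, rcond=None)

print("slope on score, 'yes' subset only:", coef_yes[1])
print("slope on score, full sample      :", coef_all[2])
```

With the dummy in the model, the slope on the score is the same in both fits, and the intercept of the full-sample fit is simply the mean outcome of the "no" group. If the dummy is left out, the "no" respondents are instead treated as if they sat at a score of $0$ on the same line as the "yes" respondents, which is the main reason plain zero-filling is not ideal.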

ERT
Also, see [this answer](https://stats.stackexchange.com/a/6565/28500) to a related question. Coded properly, you don't have to run separate regressions, just interpret appropriately the results of a single regression incorporating the dummy for having answered along with the continuous predictor. – EdM Jul 24 '18 at 15:25
Thanks @EdM, that answer is very helpful, actually! Although, some may find it easier to interpret the results of separate regressions, especially in the case of writing an analysis for general consumption. – ERT Jul 24 '18 at 15:27