
I'm running a multinomial logistic regression on 8 independent variables for 180 observations in Stata (version 11). The dependent variable is categorical with 7 outcome categories: categories 1-6 (solving strategies for a household with a resource conflict) and category 0 (no solving strategy, because the household doesn't encounter any resource conflicts).

I was advised to exclude the observations of category 0. My questions are

  1. What happens if I keep the sample as it is? Will my regression results be biased?
  2. What is the trade-off between selection/sample bias and data exclusion? In the latter case, the sample would be reduced by more than half.
  3. Is there a simple alternative to sequential/nested logit (nlogit) as a solution to this problem? (I tried it, but I think it exceeds my Stata skills and my expectations of this estimation.)
Annette
  • Depends on your definition of "selection bias". I'd call it asking a different question. Clearly, you need to report what you did. Your question 3 is hard to decode. If you're asking for specific Stata solutions, that's off-topic here. – Nick Cox Feb 11 '15 at 10:51
  • I'm thinking of selection bias or selection effect in the sense that the distortion does not occur during data collection. Rather, it arises from selecting individuals, groups or data for analysis in such a way that proper randomization is not achieved, so that the sample obtained is not representative of the population intended to be analyzed. Is there a better technical term for what I'm looking for? I would appreciate any input, as this might help me find a suitable solution. For question 3: in sequential logits, a first and a second run are done to obtain information without data exclusion. – Annette Feb 11 '15 at 12:02
  • I don't understand the problem, so you need a sociologist or economist to weigh in here. At the simplest level, if you were (say) interested in people but chose women only, clearly that differs from being interested in women and choosing women only. – Nick Cox Feb 11 '15 at 12:07
  • I think it is the former case: my data is on people, or more specifically households, but I'm only interested in the ones in conflict over forest resources (category 0: no conflict; categories 1-6: in conflict). For those in conflict, I specifically want to know which of the forest-conflict solving strategies 1-6 they choose. If I use the complete data set, then part of it (coded 0 for conflict) remains unused and only selected data is considered. In my understanding, sequential logit distinguishes between category 0 and categories 1-6 in a first step, and in a second step tells me who has which of 1 to 6. – Annette Feb 11 '15 at 12:28
  • The term "bias" is thrown around a lot and it can be confusing. Bias is always defined relative to some "true" underlying parameter value. The reason nobody knows how to answer this question is that you haven't expressed what the bias is relative to. You need to state the underlying model you are trying to estimate – shadowtalker Feb 11 '15 at 16:27
  • In the context of causal theory, selection bias is a type of collider bias. See http://stats.stackexchange.com/a/33895/2981 for a brief overview. If your selection variable is a confounder, confounding bias will be removed. If your selection variable is a collider, bias will be introduced. – jthetzel Feb 11 '15 at 16:51
  • Thanks. @ssdecontrol: Yes, I may have misused the term, but somehow I assumed that selection bias/effect is an established concept. I didn't even consider the direction of the bias, only that the arbitrary use of a selected part of my data might cause "some" bias. So the question is rather basic and trivial. @jthetzel: Thanks for the link. However, I'm unsure whether this is what I'm looking for, and maybe I really misunderstood what is conventionally meant by selection bias. I have no specific confounder/collider variable, but I would deliberately regress on only part of my data (according to one criterion). – Annette Feb 11 '15 at 18:00
  • @Annette it _is_ an established concept. But the notion of "some" bias is a misconception based on the sloppy use of the term in econometrics – shadowtalker Feb 11 '15 at 18:02
  • @ssdecontrol: OK, then I might narrow the presumed selection bias down to sampling bias; at least, that is what it amounts to in my perception. About the model I'm estimating: I "take" a non-random sample from my data set of 180 observations (namely all households in conflict with a solving strategy from outcome categories 1-6) in the sense that I run a regression on all observations but am not interested in outcome category 0 (households without conflict and, consequently, no solving strategy). Because I leave these observations out of my interpretation, I wondered whether I have to take precautions. – Annette Feb 11 '15 at 18:15
  • Could you expand on why you were advised to exclude subjects without exposure to resource conflicts? Is non-exposure to resource conflicts associated with the other independent or dependent variables? – jthetzel Feb 11 '15 at 19:49
  • @jthetzel: It's not that I have excluded them yet; it's that I'm not interested in part of my data. Just to reiterate: at the moment, I run a multinomial logistic regression on all of my observations. The dependent variable is categorical and has 7 outcome categories: 0 is for households without conflicts and thus without conflict-solving strategies; 1-6 are 6 different solving strategies of households in conflict. I'm interested in the households with solving strategies, i.e. outcome categories 1-6. Thus, I consciously exclude households coded 0 from interpretation, even though they're part of my regression. – Annette Feb 11 '15 at 20:51
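
The two-step reading of the sequential logit described in the comments (first category 0 versus categories 1-6, then the choice among 1-6) can be sketched in a few lines. This is an illustration only, written in Python rather than Stata. The outcome codes and the linear predictors (`scores`) are made-up values, not taken from the 180 observations, and no model is estimated. The second half shows a property that holds if the multinomial logit specification is taken at face value: the probabilities among strategies 1-6, conditional on being in conflict, equal the probabilities from a multinomial logit over categories 1-6 alone. It is an identity about model probabilities, not a claim about bias in any particular sample.

```python
import math


def softmax(scores):
    """Multinomial logit probabilities from a list of linear predictors."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical outcome codes for ten households (0 = no conflict,
# 1-6 = chosen solving strategy). Made-up values for illustration.
outcomes = [0, 3, 0, 1, 6, 0, 2, 0, 5, 4]

# Step 1 of the sequential reading: conflict (1-6) versus no conflict (0).
in_conflict = [1 if y != 0 else 0 for y in outcomes]

# Step 2: only households in conflict enter the choice among strategies.
strategies = [y for y in outcomes if y != 0]

# Hypothetical linear predictors for one household, categories 0..6,
# with category 0 as the base outcome (predictor fixed at 0).
scores = [0.0, 0.4, -0.2, 1.1, 0.3, -0.7, 0.5]

p_all = softmax(scores)            # P(y = j) over all 7 categories
p_conflict = 1.0 - p_all[0]        # P(y != 0)

# P(y = j | y != 0) derived from the full 7-category model ...
p_given_conflict = [p / p_conflict for p in p_all[1:]]

# ... matches a multinomial logit restricted to categories 1-6.
p_subset = softmax(scores[1:])

print(in_conflict)
print(strategies)
print(max(abs(a - b) for a, b in zip(p_given_conflict, p_subset)))
```

The last printed value is the largest difference between the two sets of conditional probabilities, which is zero up to floating-point error. Whether the estimates from a real sample behave this way depends on whether the multinomial logit assumptions are reasonable for the data.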

0 Answers