I am running a binary logistic regression with compositional predictors that sum to 100% (demographic categories). I've looked at several posts about this, but can't find a good solution to my problem. Would dropping a single predictor be useful when many cases have 0% in that category? For example, if my predictors are race proportions and I drop "Hispanic/Latino", the Hispanic/Latino rate in my data ranges only from 0% to 6% across my cases, so in many or most cases the remaining predictors still sum to nearly 100% and are still correlated.

Would a transformation be appropriate here?

I can calculate a (rough) count for each category, since I have the total number of individuals in each case, but I am more interested in the effect of the proportions of the racial categories on my dependent variable.

I've found these, but they don't present a solution.

What regression model to use when independent variables are percentages to predict % outcome?

Proportions (compositions) in logistic regression

SLAstats
  • I see explicit solutions in the first thread you reference, so could you please elaborate on what you might be looking for in addition to them? – whuber Jan 10 '17 at 17:02
  • I'm sorry if it wasn't clear: for a majority of the cases, 5/7 of my categories are 0%. So I'm not certain if dropping a category with low explanatory power, or several categories even, would help: they would still sum to 100%, and be correlated due to the racial population of the city I'm researching. In the first case referenced, it could be expected that none of those categories (bone/muscle/fat) would be 0, which is not the case in my data. – SLAstats Jan 10 '17 at 17:09
  • I'm afraid I don't follow: could you explain the distinctions between a "case," a "category," and a "predictor"? – whuber Jan 10 '17 at 17:12
  • Each of my cases represents a group that is broken down by demographic data that I'm using as predictors for my binary outcome. In this case, mutually exclusive race categories that sum to 100%. – SLAstats Jan 10 '17 at 17:57

1 Answer

You can use the additive log-ratio (or multivariate logit) transformation to transform your predictor variables, taking one category as the reference:

$z_i = \log\left(\frac{x_i}{x_1}\right), \quad i = 2, \ldots, D$, where $D$ is the number of categories, and then perform logistic regression on the $z_i$.
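As a rough illustration of the transformation above, here is a minimal Python sketch that turns each case's composition into $D-1$ log-ratios against a reference category. The function name `alr`, the `ref_index` argument, and the example rows are made up for illustration. The `pseudocount` argument is an assumption that is not part of the answer: the formula is undefined when a proportion is exactly 0, so some adjustment of zeros would be needed for data like that in the question, and the right choice depends on the data.

```python
import math

def alr(composition, ref_index=0, pseudocount=0.0):
    """Log-ratios log(x_i / x_ref) for every component except the reference.

    `pseudocount` is a hypothetical knob (not from the answer): it is added to
    every component before the ratios are taken, so zeros do not hit log(0).
    """
    shifted = [x + pseudocount for x in composition]
    if any(x <= 0 for x in shifted):
        raise ValueError("all components must be positive; zeros need adjusting first")
    ref = shifted[ref_index]
    return [math.log(x / ref) for i, x in enumerate(shifted) if i != ref_index]

# Made-up example: two cases, three categories each, in percent.
rows = [
    [50.0, 25.0, 25.0],
    [80.0, 15.0, 5.0],
]

# Each case becomes D - 1 = 2 predictors for the logistic regression.
z = [alr(row) for row in rows]
```

With `pseudocount=0.0` the result is the same whether the composition is given in percent or as fractions, because only ratios enter the formula. The rows of `z` would then replace the raw proportions as predictors in whatever logistic regression routine is being used.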

Ferdi
Michail
  • This was your fifth answer which was very short. Please extend your answer and read our tour http://stats.stackexchange.com/tour – Ferdi Feb 28 '17 at 08:11
  • Actually this is the answer. I cannot do anything about it. – Michail Mar 01 '17 at 12:14