I have a binary disease outcome (0/1), with some independent variables that are proportions and other covariates such as age and sex.

There are packages that model proportions as the outcome variable, but not the other way round (proportions as predictors).

Data sample:

Disease P1 P2 P3 Age Sex
0 0.1 0 0.9 59 1
0 0.9 0 0.1 59 1
1 0.6 0.2 0.2 79 0
1 0.3 0.2 0.5 59 0
0 0.2 0.8 0 89 1

P1, P2, P3 are proportions ranging from 0 to 1 inclusive.

How do I proceed?

Edit: P1 + P2 + P3 = 1

Edit 2: @EdM: The denominator is 2. P1, P2, and P3 are the numbers of copies of the parental chromosomes from three ancestries, so the raw values can be, for example, 1.72, 0.20, and 0.07. To make the analysis simpler, I divide P1, P2, and P3 by two.

Death Metal
  • A proportion is just a number, and pretty much all packages will accept numbers as independent variables... what is the issue here? – jbowman Oct 12 '18 at 19:34
  • Can I simply run `Disease ~ P1 + P2 + Age + Sex`? P3 is 1 - (P1 + P2), so should I leave it out? The issue is that I'm unsure whether I can run a logistic regression. – Death Metal Oct 12 '18 at 19:41
  • You can, but you will notice that the software will automatically remove one of the variables among P1, P2, and P3 because of perfect collinearity. – Penguin_Knight Oct 12 '18 at 20:31
  • Sometimes it's better, instead of using proportions as predictors, to use the actual numerator and denominator values that go into the proportions as predictors. If you could provide more information about the numerator and denominator for P1 and P2 (as @Penguin_Knight points out, P3 is redundant) then it might be possible to give a better answer. – EdM Oct 12 '18 at 20:34
  • @EdM Provided details in post about denominator and numerator. Thank you for your replies. :) – Death Metal Oct 12 '18 at 20:42

1 Answer

P1, P2, and P3 evidently represent the fractions of 3 different ancestries in each individual's chromosomes. If you are sure that P1 + P2 + P3 = 1 (that is, there are no other ancestries), then you should include only two of the three in your model, say P1 and P2. If you don't transform the proportions, the intercept in your model will represent the situation when P1 = P2 = 0 (with the other covariates also at 0), which is equivalent to P3 = 1. So you don't lose any information this way.
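To see why only two of the three proportions can enter the model, here is a minimal sketch in plain Python that checks the rank of the design matrix built from the five sample rows in the question. The helper `matrix_rank` is a hypothetical name written for this illustration, not a function from any package, and Age and Sex are left out to keep the example small.

```python
from fractions import Fraction

# Ancestry proportions (P1, P2, P3) from the question's data sample,
# kept as exact fractions so the rank check has no rounding issues.
rows = [
    ("0.1", "0", "0.9"),
    ("0.9", "0", "0.1"),
    ("0.6", "0.2", "0.2"),
    ("0.3", "0.2", "0.5"),
    ("0.2", "0.8", "0"),
]
props = [tuple(Fraction(v) for v in r) for r in rows]


def matrix_rank(matrix):
    """Rank via Gaussian elimination (hypothetical helper for this sketch)."""
    m = [list(r) for r in matrix]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot row with a non-zero entry.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


# Design matrix columns: intercept, P1, P2, P3 (all three proportions).
full = [[Fraction(1), p1, p2, p3] for p1, p2, p3 in props]
# Design matrix columns: intercept, P1, P2 (P3 dropped).
reduced = [[Fraction(1), p1, p2] for p1, p2, _ in props]

rank_full = matrix_rank(full)        # 4 columns, but one is redundant
rank_reduced = matrix_rank(reduced)  # 3 columns, all needed

print("columns:", len(full[0]), "rank:", rank_full)
print("columns:", len(reduced[0]), "rank:", rank_reduced)
```

With all three proportions the matrix has four columns but fewer independent ones, because the intercept column equals P1 + P2 + P3 in every row. Dropping P3 removes the redundancy without reducing the rank.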

The proportions might work OK as predictors; from this perspective it doesn't really matter whether you are doing linear or logistic regression. There is a danger (as with any continuous predictor), however, that the assumption of linearity (in logistic regression, linearity of the log-odds) with respect to the proportions won't hold. In that case you may need to try some transformation of the proportions to come closer to linearity. See this page and this page for some discussion of transforming proportions used as predictors. After such a transformation, however, the simple interpretation of the intercept given above won't hold; the intercept will instead correspond to the situation where the transformed values of P1 and P2 are both 0.
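As an illustration of the point about the intercept, here is a small sketch that uses a logit transformation of the proportions. The logit is only one possible choice and is my assumption here, not necessarily what the linked pages recommend. The clamping offset `EPS` and the helper name `logit_clamped` are also assumptions made for this example; some adjustment of this kind is needed because the data contain proportions of exactly 0.

```python
import math

EPS = 0.005  # assumed clamping offset; not a value from the question


def logit_clamped(p, eps=EPS):
    """Logit of a proportion, clamped to [eps, 1 - eps] so 0 and 1 stay finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# P1 and P2 from the question's data sample.
p1_values = [0.1, 0.9, 0.6, 0.3, 0.2]
p2_values = [0.0, 0.0, 0.2, 0.2, 0.8]

t1 = [logit_clamped(p) for p in p1_values]
t2 = [logit_clamped(p) for p in p2_values]

# The transformed predictor is 0 when the proportion is 0.5, so with this
# transformation the intercept refers to P1 = P2 = 0.5, not to P1 = P2 = 0.
baseline = logit_clamped(0.5)

for p1, p2, a, b in zip(p1_values, p2_values, t1, t2):
    print(f"P1={p1:.1f} -> {a:+.3f}   P2={p2:.1f} -> {b:+.3f}")
```

The transformed columns `t1` and `t2` would then replace P1 and P2 in the model formula. Whether this or any other transformation actually improves linearity has to be checked against the data.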

EdM