I have a dataset derived in the fashion of a zero-sum game (RNA-Seq data: the total number of reads is fixed, so including a read for one feature means excluding a read for another feature). I imagine such a situation yields dependency between variables and violates the linearity assumption required for a linear model. Right?
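To make the structure concrete, here is a minimal sketch (in Python, with made-up Poisson counts; the features and numbers are purely illustrative) of the zero-sum constraint I have in mind:

```python
# Illustrative only: three hypothetical features under a fixed sequencing depth.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(lam=[50, 30, 20], size=(100, 3))  # raw read counts
props = counts / counts.sum(axis=1, keepdims=True)     # per-sample proportions

print(props.sum(axis=1))     # every row sums to exactly 1.0 by construction
print(np.corrcoef(props.T))  # off-diagonal entries tend to be negative:
                             # a gain in one feature's share is a loss
                             # in the others'
```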
Not quite. What it violates is not linearity (i.e., that the response variable is a linear function of the parameters) but the assumption of no perfect collinearity (that no linear combination of the predictor variables adds up to a constant).
Lack of collinearity is not a necessary assumption of linear modeling, although it makes some analytical approaches more convenient. Strong (but not perfect) collinearity is not as much of a problem as people think (especially for predictive models). Perfect collinearity (as in the case where a set of the predictor variables adds up to 1.0) is often handled automatically by statistical software (essentially by throwing out one of the predictors), but you can always drop one category yourself. (See this related question ...)
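To see the perfect-collinearity point concretely, here is a minimal sketch in Python (synthetic data; `props` stands for any set of proportions that sum to 1, as in the question): with an intercept included, the design matrix is rank-deficient, and dropping one proportion column restores full column rank.

```python
# Sketch with synthetic proportions: intercept + shares that sum to 1
# is perfectly collinear; dropping one share column fixes it.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(lam=[50, 30, 20], size=(100, 3))
props = counts / counts.sum(axis=1, keepdims=True)   # rows sum to 1

X_full = np.column_stack([np.ones(100), props])          # intercept + all 3 shares
X_drop = np.column_stack([np.ones(100), props[:, :2]])   # intercept + 2 shares

# The intercept column equals the sum of the three share columns, so one
# column is redundant: rank 3 instead of 4.
print(np.linalg.matrix_rank(X_full))  # 3
print(np.linalg.matrix_rank(X_drop))  # 3 (full column rank for 3 columns)
```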

Ben Bolker
Thanks for the clarification. I sense these linear model assumptions (independence, linearity, and absence of multicollinearity) are somehow connected, right? In such a zero-sum situation, there is a dependency between variables, and this dependency causes collinearity and consequently a non-linear relationship to Y. Do I have that right? – unicorn Dec 27 '18 at 05:08