I am trying to analyze the effect of different land-use cover predictors (% of land area) on my response of interest. The land-use predictors cover the dominant types of land-use and hence collectively almost sum to 100% of the land area. At the moment, I am fitting a standard multiple regression in R:

lm(response ~ land_use1 + land_use2 + land_use3 + land_use4 + land_use5, data = mydata)

However, given my land-use predictors are inherently correlated — when one land-use cover increases, other land-use covers must decrease (since they sum to 100%) — I am not sure if this standard regression model is appropriate.

The pairwise correlations among the land-use predictors aren't especially high (all |r| < 0.7). Variance inflation factors suggest some multicollinearity, but again nothing extreme (VIF about 4).
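For reference, a minimal sketch of how these two diagnostics could be computed in base R. The data here are simulated stand-ins, not the real `mydata`; the column names simply mirror the model formula, and the sixth "other" cover that is dropped is an assumption made so that the five predictors only almost sum to 100%.

```r
# Simulated stand-in data (assumption: not the real data set)
set.seed(1)
n <- 200
raw <- matrix(rgamma(n * 6, shape = 2), ncol = 6)  # 6th column = unmeasured "other" cover
pct <- 100 * raw / rowSums(raw)                    # each row sums to 100
mydata <- as.data.frame(pct[, 1:5])                # keep only the five measured covers
names(mydata) <- paste0("land_use", 1:5)
mydata$response <- 2 + 0.05 * mydata$land_use1 - 0.03 * mydata$land_use2 + rnorm(n)

predictors <- paste0("land_use", 1:5)

# Largest absolute pairwise correlation among the predictors
cor_mat <- cor(mydata[, predictors])
max_abs_r <- max(abs(cor_mat[upper.tri(cor_mat)]))

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mydata))$r.squared
  1 / (1 - r2)
})

print(round(max_abs_r, 2))
print(round(vif, 2))
```

Computing the VIFs by hand avoids depending on an add-on package and makes it explicit that each VIF only measures how well one cover is predicted by the remaining ones.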

Still, it troubles me that my land-use predictors sum to 100%. I am not sure how to interpret the regression coefficient of each land-use predictor, since it doesn't make sense to estimate the effect of a one-percentage-point increase in one land cover without accounting for the accompanying decreases in the others.
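One way to see what such coefficients mean is to treat the land cover left out of the model as a reference category: each coefficient then describes a one-point increase in that cover at the expense of the omitted one. The sketch below illustrates this with simulated data, under the assumption that a hypothetical residual column `other` makes the covers sum to exactly 100; none of the numbers come from the real data.

```r
# Simulated stand-in data (assumption); `other` is a hypothetical residual cover
set.seed(42)
n <- 300
raw <- matrix(rgamma(n * 6, shape = 2), ncol = 6)
pct <- 100 * raw / rowSums(raw)                    # rows sum to exactly 100
mydata <- as.data.frame(pct)
names(mydata) <- c(paste0("land_use", 1:5), "other")
mydata$response <- 1 + 0.04 * mydata$land_use1 - 0.02 * mydata$land_use3 +
  rnorm(n, sd = 0.5)

# Fit A: `other` is omitted, so it acts as the reference cover
fit_a <- lm(response ~ land_use1 + land_use2 + land_use3 + land_use4 + land_use5,
            data = mydata)

# Fit B: land_use1 is omitted instead
fit_b <- lm(response ~ land_use2 + land_use3 + land_use4 + land_use5 + other,
            data = mydata)

# Both parameterisations describe the same fitted surface
same_fit <- isTRUE(all.equal(fitted(fit_a), fitted(fit_b), tolerance = 1e-6))

# Replacing `other` by land_use1 (fit A) is the mirror image of
# replacing land_use1 by `other` (fit B)
swap_check <- isTRUE(all.equal(unname(coef(fit_b)["other"]),
                               -unname(coef(fit_a)["land_use1"]),
                               tolerance = 1e-6))

print(c(same_fit = same_fit, swap_check = swap_check))
```

Changing the omitted cover changes the individual coefficients but not the fit, which is why each coefficient is best read as a substitution effect between two covers rather than the effect of one cover in isolation.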

Are there suggestions of types of models that might be more appropriate?

MarianD
  • The intrinsic correlation is not a problem. Indeed, it's routine: everyone who uses categorical regressors faces this problem. It's possible, though, that the relationships between your land cover proportions and `response` aren't linear and that a transformation of the proportions might linearize it. An attractive class of transformations is a natural generalization of the [ILR](https://stats.stackexchange.com/questions/259208). – whuber Dec 21 '20 at 22:19
  • thanks @whuber that looks really interesting! – user11998664 Dec 22 '20 at 10:22
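A minimal sketch of the standard isometric log-ratio (ILR) transform in base R, for readers who want to try the idea from the comment above. This is the plain ILR with a Helmert-type basis, not the generalization the comment links to. The data are simulated, the `other` column is a hypothetical residual cover, and all parts are assumed to be strictly positive (zero proportions would need separate handling, because the transform takes logarithms).

```r
# Simulated stand-in composition (assumption): six strictly positive parts
set.seed(7)
n <- 250
raw <- matrix(rgamma(n * 6, shape = 2), ncol = 6)
comp <- raw / rowSums(raw)                         # rows sum to 1
colnames(comp) <- c(paste0("land_use", 1:5), "other")

D <- ncol(comp)

# Centred log-ratio: log of each part minus the row mean of the logs
clr <- log(comp) - rowMeans(log(comp))

# Orthonormal basis of the zero-sum subspace, built from Helmert contrasts
H <- contr.helmert(D)
V <- sweep(H, 2, sqrt(colSums(H^2)), "/")          # scale columns to unit length

# ILR coordinates: D - 1 unconstrained columns
ilr <- clr %*% V
colnames(ilr) <- paste0("ilr", seq_len(D - 1))

# Simulated response (assumption) and an ordinary regression on the coordinates
response <- 1 + 0.8 * ilr[, 1] + rnorm(n, sd = 0.5)
fit_ilr <- lm(response ~ ., data = data.frame(response = response, ilr))

print(round(coef(fit_ilr), 3))
```

The D - 1 coordinates are free of the sum constraint, so they can all enter the model together with an intercept. Their coefficients refer to log-ratios between groups of covers and depend on the basis chosen, so the basis should be picked to match the contrasts of interest.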

0 Answers