Say I'm constructing a GAM for a response variable R in terms of predictors A, B, C, and D. Something like this (in quasi-R-code):
R ~ s(A) + s(B) + s(C) + s(D)
Before I construct this model, I check for collinearity by calculating Pearson's correlation coefficients. This shows that A is slightly correlated with all of the other predictors (e.g., values of around ±0.3). As the correlation coefficients aren't very high individually, I'm okay with proceeding. The best model according to AIC is R ~ s(A).
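For reference, the pairwise check described above looks roughly like the sketch below. It uses simulated stand-in data rather than my real data: the data frame `dat`, the sample size, and the 0.3 coefficients are all assumptions chosen only to mimic the "around ±0.3" situation.

```r
# Simulated stand-in data; names and coefficients are illustrative assumptions.
set.seed(1)
n <- 500
B <- rnorm(n)
C <- rnorm(n)
D <- rnorm(n)
# A is weakly related to each of B, C and D (target correlations near +/-0.3).
A <- 0.3 * B - 0.3 * C + 0.3 * D + sqrt(1 - 3 * 0.3^2) * rnorm(n)
dat <- data.frame(A = A, B = B, C = C, D = D)

# Pairwise Pearson correlations between all predictors.
cor_mat <- cor(dat, method = "pearson")
print(round(cor_mat, 2))
```

Each off-diagonal entry only describes one pair of predictors at a time, which is the limitation I'm worried about.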
Now, I'm interpreting the model results and they don't make a lot of sense physically (i.e., the shape of the relationship between A and R). What I'm concerned about is that the top model was selected because A is kind of a composite of all of the predictors, and therefore acts as a proxy for them without incurring the penalty of including all of the extra terms.
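To illustrate what I mean by "composite", here is a rough sketch on simulated data (again, every name and number is an assumption, not my real data). It compares how much of A each other predictor explains on its own with how much B, C, and D explain jointly, using an ordinary linear regression as a stand-in for whatever the proper method would be.

```r
# Simulated stand-in data; names and coefficients are illustrative assumptions.
set.seed(1)
n <- 500
B <- rnorm(n)
C <- rnorm(n)
D <- rnorm(n)
A <- 0.3 * B - 0.3 * C + 0.3 * D + sqrt(1 - 3 * 0.3^2) * rnorm(n)

# Share of variance in A explained by each predictor individually.
pairwise_r2 <- c(B = cor(A, B)^2, C = cor(A, C)^2, D = cor(A, D)^2)

# Share of variance in A explained by B, C and D together (linear only).
joint_r2 <- summary(lm(A ~ B + C + D))$r.squared

print(round(pairwise_r2, 3))
print(round(joint_r2, 3))
```

The joint value can never be smaller than the largest individual one, so looking at pairwise correlations alone may understate how much A overlaps with the rest. This linear version would also miss any nonlinear dependence, which matters for smooth terms.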
My question: is there a test/method that accounts for cumulative collinearity across multiple predictors, like this? I'm sure that my terminology is incorrect, so if someone could tell me what I need to Google, that would be a great help.