I'm using the concurvity function to check for concurvity in my model, a negative binomial GAM fitted with the mgcv package in R. The output I get when comparing each term to the rest of the model (i.e. full=TRUE) shows that many of the variables have concurvity values higher than 0.9. However, when comparing the terms pairwise, most values are very small, less than 0.1 (with the worst around 0.5). I am not sure where to go from here. Should the full model output be cause for concern, and should I refit the model eliminating some terms with high concurvity? Or are the pairwise concurvities more informative? Is there anything else I can do? The terms in the model are mostly interactions, with a few smooths and one parametric term.
1 Answer
If you have concurvity values that high, I would want to do some additional checks to see whether the concurvity is leading to problems in the estimation of the smooths, much as I would if I were fitting a GLM with highly collinear covariates (i.e. with large VIFs).
The reason for the difference between full = TRUE (the default) and full = FALSE is that the former considers whether any particular term is concurved with some combination of all the other terms in the model. In other words, with full = TRUE we are concerned with identifying which smooths can be approximated by any combination of the other smooths in the model.
When we have full = FALSE, we are no longer looking at combinations of the other smooths but at the pairwise concurvities that combine to give the full = TRUE concurvity value.
In your example, the 0.9 concurvity with full = TRUE is telling us that this smooth can be well approximated by some combination of the other smooths (or parametric terms, if the smooths are close to those kinds of functions). The full = FALSE information breaks down this 0.9 number and tells us which of the other smooths (or parametric effects) are most strongly concurved with the indicated variable. So that 0.9 might be the result of fairly strong concurvity with the variable whose pairwise concurvity is 0.5, plus the concurvity with some of the other smooths.
As such, use full = TRUE to indicate which variables might give you cause for concern. Then use full = FALSE to see if there is one variable or a small set of variables responsible for the concurvity.
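As a rough illustration of that two-step check, here is a minimal sketch on simulated data. The variables x1, x2, x3 and the model m are made up for the example and are not your model; x3 is deliberately built from x1 and x2 so that s(x3) is approximated much better by the other two smooths jointly than by either one alone.

```r
library(mgcv)

set.seed(1)
n  <- 400
x1 <- runif(n)
x2 <- runif(n)
# Hypothetical covariate: close to x1 + x2, so it is concurved with the
# combination of the other two smooths more than with either one alone
x3 <- x1 + x2 + rnorm(n, sd = 0.05)
y  <- rnbinom(n, mu = exp(1 + sin(2 * pi * x1) + 0.5 * x2), size = 2)

m <- gam(y ~ s(x1) + s(x2) + s(x3), family = nb(), method = "REML")

# Step 1: each term against all the other terms combined.
# Returns a matrix with rows "worst", "observed", "estimate".
conc_full <- concurvity(m, full = TRUE)
print(round(conc_full, 2))

# Step 2: pairwise breakdown.
# Returns a list of matrices named "worst", "observed", "estimate".
conc_pair <- concurvity(m, full = FALSE)
print(round(conc_pair$estimate, 2))
```

Terms that stand out in the first table are the ones to look up in the second, to see which other terms they are most strongly related to.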
Once you have identified the potential variables associated with high concurvity, you can try dropping one of these variables from the model, refitting, and comparing the estimated smooths that appear in both models.
You should be watching, for example, for smooth functions that change sign or shape strongly depending on whether the covariates with which the smooth is concurved are included in the model. You will need to use your domain knowledge to decide whether to retain the highly concurved smooth or not; perhaps you don't need it because the "effect" represented by the smooth is actually contained in the smooths of other covariates (for example, if we include smooths of temperature, altitude and slope in the same model, we probably don't need all of them...).
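One way to carry out the drop-and-refit comparison is sketched below, again on simulated data with hypothetical names (x1, x2, x3, m_full, m_drop). It extracts each smooth's contribution on the link scale with predict(..., type = "terms") and summarises how much the smooths shared by both models move when s(x3) is removed. Treat it as a starting point under those assumptions rather than a definitive diagnostic.

```r
library(mgcv)

set.seed(1)
n  <- 400
x1 <- runif(n)
x2 <- runif(n)
# Hypothetical covariate that is concurved with x1 and x2
x3 <- x1 + x2 + rnorm(n, sd = 0.05)
y  <- rnbinom(n, mu = exp(1 + sin(2 * pi * x1) + 0.5 * x2), size = 2)

m_full <- gam(y ~ s(x1) + s(x2) + s(x3), family = nb(), method = "REML")
m_drop <- gam(y ~ s(x1) + s(x2),         family = nb(), method = "REML")

# Per-term contributions at the observed data, one column per term
terms_full <- predict(m_full, type = "terms")
terms_drop <- predict(m_drop, type = "terms")

# Smooths present in both fits
shared <- intersect(colnames(terms_full), colnames(terms_drop))

# Largest absolute change in each shared smooth between the two fits
change <- sapply(shared, function(nm) {
  max(abs(terms_full[, nm] - terms_drop[, nm]))
})
print(round(change, 3))
```

A numerical summary like this only flags that something moved; plotting the same smooth from both fits, e.g. plot(m_full, select = 1) next to plot(m_drop, select = 1), shows whether it changed sign or shape.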
