
I am currently using a GAM to describe the effects of several variables (es_sum, nat_sum, sect_sum, and prop_spec) on another variable (btwn). Each of es_sum, nat_sum, sect_sum, and prop_spec varies with a further factor, bin_deg (4 levels). I've used a factor smooth for each term to address this:

Bgam <- mgcv::gam(btwn ~
                    s(es_sum, bin_deg, bs = "fs", k = 9) +
                    s(nat_sum, bin_deg, bs = "fs", k = 8) +
                    s(sect_sum, bin_deg, bs = "fs", k = 7) +
                    s(prop_spec, bin_deg, bs = "fs"),
                  data = nz_dat,
                  method = "REML",
                  family = tw,
                  select = TRUE)  # double penalty approach

There is a degree of concurvity (quite a bit!) between these smooths, which makes sense to me, as each varies by the level of bin_deg. In a different approach I'm using a location-scale model (family=gaulss) to deal with residual variance varying with each level of bin_deg, but that's a separate issue. What I'm interested in here is concurvity, which, for many of the smooths in my model Bgam, is greater than 1. Here's the output of concurvity():

concurvity(Bgam, full=TRUE)
         para s(es_sum,bin_deg) s(nat_sum,bin_deg) s(sect_sum,bin_deg) s(prop_spec,bin_deg)
worst       1      3.943891e+32       3.304069e+32        1.070997e+31            1.0000000
observed    1      8.378792e-01       4.576376e+00        5.610474e-01            0.6003964
estimate    1      9.896317e-01       9.778687e-01        9.632215e-01            0.8870775
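
The table above is the full=TRUE summary, which compares each term against all the others at once. A pairwise breakdown can help show which smooths overlap with which. The following is a minimal sketch on simulated data, not on nz_dat: the data frame `toy`, its columns, the Gaussian response, and the choice of k are all made up for illustration. It assumes mgcv is installed. Because every bs="fs" term for the same factor carries its own per-level intercepts, some overlap between such terms is probably expected by construction.

```r
library(mgcv)

set.seed(1)
n <- 400

# Hypothetical toy data: a 4-level factor and three covariates,
# two of which are deliberately correlated.
toy <- data.frame(grp = factor(sample(letters[1:4], n, replace = TRUE)),
                  x1  = runif(n))
toy$x2 <- toy$x1 + rnorm(n, sd = 0.1)   # correlated with x1
toy$x3 <- runif(n)                      # unrelated covariate
toy$y  <- sin(2 * pi * toy$x1) + 0.5 * toy$x3 +
  as.integer(toy$grp) / 4 + rnorm(n, sd = 0.3)

# Same model structure as Bgam, but with a Gaussian response for simplicity
m <- gam(y ~ s(x1, grp, bs = "fs", k = 5) +
             s(x2, grp, bs = "fs", k = 5) +
             s(x3, grp, bs = "fs", k = 5),
         data = toy, method = "REML")

# full = FALSE returns one term-by-term matrix per concurvity measure
pairwise <- concurvity(m, full = FALSE)
print(round(pairwise$estimate, 2))
```

On the real model the equivalent call would be concurvity(Bgam, full=FALSE); large off-diagonal entries point at the pairs of terms worth revisiting.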

Clearly I need to deal with this (I will!), but more fundamentally I'm wondering if anyone could explain how it is possible to get concurvity values >1?
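
For context on why values above 1 are surprising: the mgcv help page for concurvity() describes each measure as an index between 0 and 1, based on splitting a term f into a part g that lies in the space of the other terms and a remainder h. The sketch below is a toy base-R illustration of that idea, not the mgcv implementation. It assumes g is an orthogonal projection, and it uses made-up data with polynomial columns standing in for smooth bases. Under that assumption the ratio of squared lengths cannot exceed 1.

```r
# Toy illustration only: made-up data, hypothetical design matrices.
set.seed(42)
n  <- 200
x1 <- runif(n)
x2 <- x1 + rnorm(n, sd = 0.05)        # nearly collinear with x1

X_f     <- cbind(x1, x1^2, x1^3)      # columns of the term of interest
X_other <- cbind(1, x2, x2^2, x2^3)   # intercept plus the other term

beta_f <- c(1, -2, 0.5)               # arbitrary coefficients for the term
f <- drop(X_f %*% beta_f)             # the term's contribution to the fit

# g: the part of f that the other columns can reproduce
# (orthogonal projection of f onto the column space of X_other)
g <- qr.fitted(qr(X_other), f)

# Share of f's squared length explained by the other terms
ratio <- sum(g^2) / sum(f^2)
print(ratio)
```

Since a projection can never be longer than the vector being projected, a reported value far above 1 (such as the 1e+32 entries in the "worst" row) looks more like a numerical problem, for example a near-singular basis, than a meaningful measure. That reading is a guess rather than something confirmed here.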

    The values aren't supposed to be >1 so there is clearly some bug or issue here. I suggest you email Simon Wood with this example so he can see what's happening and fix it. The summary suggests most of the smooths are almost entirely unidentifiable, which suggests bigger problems than just concurvity; all these are sums of something, are those somethings related in some way? – Gavin Simpson Dec 04 '20 at 16:24
    Thanks @GavinSimpson - appreciate the feedback. I'll email Simon Wood. Agreed, there are issues with the dataset itself: the sums are equivalent to 'richness' or 'cumulative species', but are associated with individuals and their network, i.e. how many nationalities or sectors you are connected to. These all depend to some degree, but not entirely, on the number of connections you have. I'm working on solving this, but it's turning into a StackOverflow question! Thanks for the help with this! – RMiller Dec 09 '20 at 15:43
