I am modelling an outcome with gam (mgcv) where two variables (x1 and x2) are continuous and the other four are factors. I suspect that x1 and x2 might be collinear, so I want to check that. As I understand it, for a GAM I need to check this with concurvity. The values for the two smooths are up to approx. 0.38 (worst case), which I expect to be OK. Is that right?
A1 <- gam(y ~ s(x1) + s(x2), data = df, method = "REML")
concurvity(A1)
                 para     s(x1)     s(x2)
worst    1.604547e-25 0.3765562 0.3765562
observed 1.604547e-25 0.3637563 0.2497401
estimate 1.604547e-25 0.2926522 0.2767589
concurvity(A1, full = FALSE)
$worst
              para        s(x1)        s(x2)
para  1.000000e+00 9.035414e-26 7.145648e-26
s(x1) 9.194812e-26 1.000000e+00 3.765562e-01
s(x2) 7.033065e-26 3.765562e-01 1.000000e+00
$observed
              para        s(x1)        s(x2)
para  1.000000e+00 4.660693e-34 4.259220e-30
s(x1) 9.194812e-26 1.000000e+00 2.497401e-01
s(x2) 7.033065e-26 3.637563e-01 1.000000e+00
$estimate
              para        s(x1)        s(x2)
para  1.000000e+00 3.875674e-28 3.293079e-28
s(x1) 9.194812e-26 1.000000e+00 2.767589e-01
s(x2) 7.033065e-26 2.926522e-01 1.000000e+00
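The problem is that when I include my factors, I get high concurvity for the parametric terms (the para column, approx. 0.80), which I cannot understand. To check, I replaced x3-x6 with completely random binary variables (0 or 1), and I still get the high concurvity in the full output, even though the pairwise values involving para stay essentially zero.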
df$x3 <- sample(c(0, 1), replace = TRUE, size = 368)
df$x4 <- sample(c(0, 1), replace = TRUE, size = 368)
df$x5 <- sample(c(0, 1), replace = TRUE, size = 368)
df$x6 <- sample(c(0, 1), replace = TRUE, size = 368)
df$x3 <- as.factor(df$x3)
df$x4 <- as.factor(df$x4)
df$x5 <- as.factor(df$x5)
df$x6 <- as.factor(df$x6)
A2 <- gam(y ~ s(x1) + s(x2) + x3 + x4 + x5 + x6, data = df, method = "REML")
concurvity(A2)
              para     s(x1)     s(x2)
worst    0.8005751 0.3817906 0.3921831
observed 0.8005751 0.3701165 0.3551481
estimate 0.8005751 0.2994385 0.2887118
concurvity(A2, full = FALSE)
$worst
              para        s(x1)        s(x2)
para  1.000000e+00 9.035414e-26 7.145648e-26
s(x1) 9.038433e-26 1.000000e+00 3.765562e-01
s(x2) 7.125601e-26 3.765562e-01 1.000000e+00
$observed
              para        s(x1)        s(x2)
para  1.000000e+00 3.565108e-34 1.338613e-31
s(x1) 9.038433e-26 1.000000e+00 3.393421e-01
s(x2) 7.125601e-26 3.637036e-01 1.000000e+00
$estimate
              para        s(x1)        s(x2)
para  1.000000e+00 3.875674e-28 3.293079e-28
s(x1) 9.038433e-26 1.000000e+00 2.767589e-01
s(x2) 7.125601e-26 2.926522e-01 1.000000e+00
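Since I cannot share my data, here is a minimal self-contained sketch of the same comparison on simulated data. Everything about the simulation is an assumption made up for illustration (the data frame sim, the seed, the way x1, x2 and y are generated, and the model names B1 and B2); only the model formulas and the concurvity calls match what I ran above. I have not confirmed that this toy data shows the same jump in the para column as my real data.

```r
library(mgcv)

set.seed(1)   # arbitrary seed, only so the run is repeatable
n <- 368      # same number of rows as used for the random factors above

# Hypothetical stand-in for my data: x2 is partly driven by x1
sim <- data.frame(x1 = runif(n))
sim$x2 <- 0.5 * sim$x1 + 0.5 * runif(n)
sim$y  <- sin(2 * pi * sim$x1) + sim$x2^2 + rnorm(n, sd = 0.3)

# Four purely random two-level factors, built the same way as x3-x6
for (v in paste0("x", 3:6)) {
  sim[[v]] <- factor(sample(c(0, 1), size = n, replace = TRUE))
}

# Same two model structures as A1 and A2
B1 <- gam(y ~ s(x1) + s(x2), data = sim, method = "REML")
B2 <- gam(y ~ s(x1) + s(x2) + x3 + x4 + x5 + x6, data = sim, method = "REML")

# Full concurvity: one column per term, rows worst / observed / estimate
full1 <- concurvity(B1)
full2 <- concurvity(B2)

# Pairwise concurvity: a list of term-by-term matrices
pair2 <- concurvity(B2, full = FALSE)

print(round(full1, 4))
print(round(full2, 4))
print(round(pair2$worst, 4))
```

The comparison of interest is the para column of full1 against full2, next to the para row and column of pair2$worst.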
Can someone please explain why this happens and what I am doing wrong?