Currently, I am working with trying to fit a non-linear model with stan_gamm4
. This can be done through specifying smoothing functons, such as:
s(...,k = -1)
. The following is some sample data that demonstrates issues I am unclear on:
library(mgcv)
library(rstan)
library(rstanarm)
data(trees)
> head(trees)
Girth Height Volume
1 8.3 70 10.3
2 8.6 65 10.3
3 8.8 63 10.2
4 10.5 72 16.4
5 10.7 81 18.8
6 10.8 83 19.7
> dim(trees)
[1] 31 3
> length(unique(trees$Girth))
[1] 27
> length(unique(trees$Height))
[1] 21
> length(unique(trees$Volume))
[1] 30
Now I will regress Volume
on Girth
and Height
, but with k=22
on Height
:
> out1 <- stan_gamm4(Volume ~ s(Height, k = 22) + s(Girth, k = -1), family = Gamma(link = log), data = trees, iter = 1000)
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
A term has fewer unique covariate combinations than specified maximum degrees of freedom
I understand that the number of unique covariates on Height
must be less than or equal to 21. However, I am unsure how to determine the optimal knots to be used for each variable.
1) I had assumed that the s(Girth, k = -1)
smoother on Girth
with k=-1
will automatically set it using Generalized Cross Validation according to this post: Selecting knots for a GAM. However, it seems that with k=-1
the default is always k=10
.
2) I have read in some posts that one should use one-tenth of the unique covariates. However, I am unsure whether there are any papers to back this up for an object like stan_gamm4
.
Does anyone have any idea how I can determine what might be the best number of knots k
to use? Is it the case that the closer we get to the unique values above the better? Thanks.