The usual advice for choosing k is to set it as high as is practical, trading off against computation time (e.g. Choosing k in mgcv's gam()).
However, is it acceptable to restrict k to avoid overly complicated smooths that are likely to be biologically unrealistic? And would this affect the model-checking process (via randomised quantile residuals)?
For example, I am modelling the daily activity cycles of foxes using data collected with camera traps. Camera traps provide only a snapshot of behaviour when an animal happens to walk in front of them (as opposed to something like GPS collars, where you get the full picture). A model with k = 10 produces more wiggliness than I believe to be realistic; I suspect this is an artefact of the imperfect sampling process rather than real structure. On the other hand, a model with k = 5 looks much more like what I'd expect, but gam.check() hints that k is set too low. A reproducible sketch of the comparison is below.
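To make this concrete, here is a minimal sketch of the comparison (simulated data standing in for my camera-trap records, so the sample size, column names, and the true cycle are all placeholders):

library(mgcv)

set.seed(1)
## Simulated stand-in: binary detections over a 24-hour cycle
sim <- data.frame(hour = runif(500, 0, 24))
sim$fox <- rbinom(500, 1, plogis(-1 + sin(2 * pi * sim$hour / 24)))

## Cyclic smooths at the two basis dimensions in question;
## knots = c(0, 24) closes the full daily cycle
m10 <- bam(fox ~ s(hour, bs = "cc", k = 10), data = sim,
           family = binomial, knots = list(hour = c(0, 24)))
m5  <- bam(fox ~ s(hour, bs = "cc", k = 5),  data = sim,
           family = binomial, knots = list(hour = c(0, 24)))

## Basis-dimension diagnostics: k' vs edf, k-index, and p-value
gam.check(m5)
gam.check(m10)

## Compare the estimated activity cycles visually
plot(m10, shade = TRUE, main = "k = 10")
plot(m5,  shade = TRUE, main = "k = 5")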
I guess I am mainly concerned about (i) arbitrarily parametrising models to meet my expectations, (ii) justifying this in the manuscript, and (iii) whether this will affect the model-checking process. Am I being sketchy or just overthinking this?
Additionally, I am restricting k for another term that should arguably enter as a linear effect (the activity of one species declining with the activity of another); fitting it as a low-k smooth means all of my covariates are subject to the same double-penalty approach to model selection (as recommended by Gavin Simpson in GAM selection when both smooth and parametric terms are present). This is how I am specifying the model (other covariates omitted):
model <- bam(fox ~ s(hour, bs = "cc", k = 5) +                     # cyclic smooth of time of day
               s(predicted_predator_activity, bs = "ts", k = 3),   # near-linear shrinkage smooth
             data = data, family = binomial, select = TRUE,        # double penalty on all smooths
             knots = list(hour = c(0, 24)))                        # close the full 24-h cycle
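For the residual-checking side of the question, this is how I would compute the randomised quantile residuals; I am assuming the DHARMa package here (its simulation-based scaled residuals are randomised quantile residuals, and it lists mgcv among its supported packages):

library(DHARMa)

## Simulation-based (randomised quantile) residuals for the fit above
sim_res <- simulateResiduals(model, n = 1000)
plot(sim_res)            # QQ plot plus residuals vs predicted
testUniformity(sim_res)  # formal test that the scaled residuals are uniform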