Signal regression, or scalar-on-function regression, is a sub-class of functional regression in which the response variable is a scalar and the predictors are one or more functions, each entering the model as a weighted integral, i.e.,
$Y_i = \int\beta(s)X_i(s)ds + \ldots + \epsilon_i$
where $X_i(s)$ is a function: an independent variable measured at multiple values of $s$ for observation $i$, and $\beta(s)$ is the coefficient function to be estimated. There are several ways of estimating $X_i(s)$ for each observation $i$, but generally these amount to smoothing.
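In practice each curve is only observed on a grid, so the integral is replaced by a weighted sum. The following base R sketch illustrates that discretisation on simulated data. The grid, the curves, the coefficient function and the trapezoidal weights are all hypothetical choices made for illustration, not taken from any package.

```r
# Simulated setup: everything here is hypothetical, for illustration only
set.seed(42)
n <- 5                                 # number of observations (curves)
J <- 101                               # number of grid points per curve
s <- seq(0, 1, length.out = J)         # common grid over the domain
ds <- s[2] - s[1]                      # grid spacing

beta <- sin(2 * pi * s)                # assumed coefficient function beta(s)

# n x J matrix: each row is one curve X_i(s) observed on the grid
X <- t(replicate(n, cumsum(rnorm(J, sd = 0.1))))

# Trapezoidal integration weights (half weight at the two end points)
w <- rep(ds, J)
w[c(1, J)] <- ds / 2

# Discretised integral: sum_j w_j * beta(s_j) * X_i(s_j) for each i
signal <- as.vector(X %*% (w * beta))

# Scalar response: signal plus noise
y <- signal + rnorm(n, sd = 0.01)
```

Each element of `signal` approximates $\int\beta(s)X_i(s)ds$ for one observation, which is why the functional predictor ends up stored as a matrix with one row per observation.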
Questions about interactions in GAMs have been answered on this forum (for example here), but this question concerns functional regression, which can be fitted using GAMs. In functional regression, one or more observed variables are functions rather than scalars (the linked question considers scalars). This is achieved by passing matrices (not the typical vectors) as variables when fitting, so these models require a different formula specification, to which the typical specification of GAM interactions does not carry over intuitively.
To underline this point, Simon Wood devotes several pages of the 2nd edition of his Generalized Additive Models (2017) to describing a signal regression with mgcv. It is important to note that this type of regression is distinct from everything else he covers in the book, and is a particular application of GAMs. It is not the much more familiar scalar-on-scalar application of GAMs, which has been well covered on this forum.
In R these regressions can be fitted using various packages, including Simon's (mgcv), and some of these packages rely on mgcv::gam() as the workhorse.
I would like to fit a model of this form in which the additive and interactive effects of two functional predictors are included. Because specifying these models in mgcv::gam() is not at all intuitive to users familiar with the standard application of R's formula interface, or even with the particular dialect used in mgcv::gam(), it is difficult to know how to specify an interaction effect. The refund::pfr() function provides a very useful wrapper with lots of value-added services, but it is still not clear how to specify an interaction between two functions in its formula.
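To make the target concrete, one way such a model could be written is given below. This is an assumed form, not taken from Wood (2017) or the refund documentation: $X_{1i}$ and $X_{2i}$ stand for the two functional predictors, $\beta_1$ and $\beta_2$ for their additive coefficient functions, and $\gamma(s,t)$ for a hypothetical bivariate coefficient surface carrying the interaction.
$Y_i = \int\beta_1(s)X_{1i}(s)ds + \int\beta_2(t)X_{2i}(t)dt + \int\int\gamma(s,t)X_{1i}(s)X_{2i}(t)\,ds\,dt + \epsilon_i$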
Here is an example drawn from the refund::pfr() help page that illustrates a scalar response, pasat, regressed on a single functional predictor, cca, provided as a matrix in the DTI1 data frame. In cca, rows are observations (i.e. curves) and columns are the observed data points used to fit each curve.
library(refund)
data(DTI)
DTI1 <- DTI[DTI$visit==1 & complete.cases(DTI),]
fit.lf <- pfr(pasat ~ lf(cca, k=30, bs="ps"), data=DTI1)
refund::pfr() passes the following formula to mgcv::gam(), modifying the dimensions of the provided variables appropriately (and renaming them) so that they accomplish a functional regression:
pasat ~ s(x = cca.tmat, by = L.cca, k = 30, bs = "ps")
where cca.tmat provides the indices of the domain across which the cca variable is measured and L.cca holds the measured values. The k and bs parameters set the basis dimension and basis type of the smooth and are not important here.
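For readers unfamiliar with this convention, here is a minimal sketch of the same idea built by hand on simulated data. It relies on the behaviour documented under mgcv's linear functional terms: when both the covariate and the by variable of s() are matrices, the smooth is evaluated at every element and the products are summed across each row. The names tmat and LX, the simulated curves, and the equal-width integration weights are hypothetical simplifications and are not necessarily what pfr constructs internally.

```r
library(mgcv)

set.seed(1)
n <- 200                          # number of curves (hypothetical)
J <- 50                           # grid points per curve (hypothetical)
s <- seq(0, 1, length.out = J)    # common domain grid

X <- matrix(rnorm(n * J), n, J)   # simulated curves, one per row
beta <- sin(2 * pi * s)           # assumed "true" coefficient function
y <- as.vector(X %*% beta) / J + rnorm(n, sd = 0.05)

# Matrix covariates: each row of tmat repeats the grid; LX carries the
# curve values times simple equal-width integration weights (1 / J).
tmat <- matrix(s, nrow = n, ncol = J, byrow = TRUE)
LX <- X / J

# For row i the term contributes sum_j f(tmat[i, j]) * LX[i, j],
# a discretised version of the integral of beta(s) * X_i(s).
fit <- gam(y ~ s(tmat, by = LX, k = 10, bs = "ps"), method = "REML")

summary(fit)
```

Calling plot(fit) on the result displays the estimated smooth, which in this setup plays the role of the coefficient function $\beta(s)$.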
The DTI1 data, subset from the DTI data provided with refund, also includes the variable rcst. My question then is:
How might one specify a formula in refund::pfr(), or more generally mgcv::gam(), to fit an interaction between the functional variables cca and rcst, both of which are matrices?