I don't care for fda's use of Inception-like list-within-list-within-list object structures, but my response will abide by the system the package writers have created.
I think it's instructive to first think about what we're doing exactly. Based on your description of what you've done so far, this is what I believe you're doing (let me know if I have misinterpreted something). I'll continue using notation and, for lack of your real data, an example from Ramsay and Silverman's Functional Data Analysis and Ramsay, Hooker, and Graves's Functional Data Analysis with R and MATLAB (some of the following equations and code are lifted directly from these books).
We are modelling a scalar response via a functional linear model, i.e.
$y_i = \beta_0 + \int_0^T X_i(s)\beta(s)ds + \epsilon_i$
We expand the $\beta$ in some basis. We use, say, $K$ basis functions. So,
$\beta(s) = \sum\limits_{k=1}^K b_k\theta_k (s)$
In matrix notation, this is $\beta (s)=\boldsymbol{\theta}'(s)\mathbf{b}$.
We also expand the covariate functions in some basis (say, $L$ basis functions). So,
$X_i(s) = \sum\limits_{k=1}^L c_{ik}\psi_k (s)$
Again, in matrix notation, this is $\mathbf{X}(s)=\mathbf{C}\boldsymbol{\psi}(s)$, where $\mathbf{X}(s)$ stacks the $X_i(s)$ and $\mathbf{C}$ is the $N\times L$ matrix of coefficients $c_{ik}$.
And thus, if we let $\mathbf{J}=\int\boldsymbol{\psi}(s)\boldsymbol{\theta}'(s)ds$, our model can be expressed as
$\mathbf{y} = \beta_0\mathbf{1} + \mathbf{CJb} + \boldsymbol{\epsilon}$.
And if we let $\mathbf{Z} = [\mathbf{1}\quad \mathbf{CJ}]$ and $\boldsymbol{\xi} = [\beta_0\quad \mathbf{b}']'$, our model is
$\mathbf{y} = \mathbf{Z}\boldsymbol{\xi} + \boldsymbol{\epsilon}$
And this looks much more familiar to us.
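Just to make the matrix notation concrete before going further, here's a tiny sketch of how $\mathbf{J}$ can be computed with the fda function inprod(). The bases and dimensions below are arbitrary toy choices of mine, not the ones used in the weather example later:
library(fda)
# toy bases: psi for the covariate functions, theta for beta
psibasis   <- create.fourier.basis(c(0,1), 5)
thetabasis <- create.fourier.basis(c(0,1), 3)
# J = int psi(s) theta'(s) ds, a 5 x 3 matrix of cross-products between the two bases
J <- inprod(psibasis, thetabasis)
# given Cmat, an N x 5 matrix of covariate coefficients c_ik, the design matrix would be
# Zmat <- cbind(1, Cmat %*% J)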
Now I see you're adding some sort of regularization. The fda package works with roughness penalties of the form
$P = \lambda\int[L\beta(s)]^2 ds$
for some linear differential operator $L$. Now it can be shown (details left out here -- it's really not hard to show this) that if we define the penalty matrix $\mathbf{R}$ as
$\mathbf{R}= \left(\begin{array}{cc}
0 & \mathbf{0}' \\
\mathbf{0} & \mathbf{R}_1
\end{array}\right)
$
where $\mathbf{R}_1$ is the $K\times K$ matrix with entries $[\mathbf{R}_1]_{jk}=\int [L\theta_j(s)][L\theta_k(s)]\,ds$ -- that is, the roughness penalty applied to the basis expansion of $\beta$, with the zero row and column corresponding to the unpenalized intercept -- then we minimize the penalized sum of squares:
$\left(\mathbf{y}-\mathbf{Z}\boldsymbol{\xi}\right)'\left(\mathbf{y}-\mathbf{Z}\boldsymbol{\xi}\right) + \lambda\boldsymbol{\xi}'\mathbf{R}\boldsymbol{\xi}$,
and so our problem is merely a ridge regression with solution:
$\hat{\boldsymbol{\xi}}=(\mathbf{Z}'\mathbf{Z}+\lambda\mathbf{R})^{-1}\mathbf{Z}'\mathbf{y}$.
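(For completeness: the solution comes from setting the gradient of the penalized sum of squares with respect to $\boldsymbol{\xi}$ to zero,
$-2\mathbf{Z}'\left(\mathbf{y}-\mathbf{Z}\boldsymbol{\xi}\right) + 2\lambda\mathbf{R}\boldsymbol{\xi} = \mathbf{0} \quad\Longrightarrow\quad \left(\mathbf{Z}'\mathbf{Z}+\lambda\mathbf{R}\right)\boldsymbol{\xi} = \mathbf{Z}'\mathbf{y},$
which is just the generalized ridge version of the normal equations.)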
I walked through the above because (1) I think it's important to understand what we're doing, and (2) some of it is necessary for understanding the code I'll use later on. On to the code...
Here's a data example with R code. I'm using the Canadian weather dataset provided in the fda package. We'll model log annual precipitation for a number of weather stations via a functional linear model, using each station's temperature profile (temperature was recorded once a day for 365 days) as a functional covariate. We'll proceed similarly to the way you describe your situation. Data were recorded at 35 stations. I'll break the dataset up into 34 stations, which will be used as my data, and one held-out station (the first column below), which will be my "new" dataset.
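Before splitting things up, it's worth glancing at the objects we'll use from the package (the dimensions below are what I see in my copy of fda -- check yours):
# quick look at the built-in Canadian weather data
dim(daily$tempav)   # 365 x 35: average daily temperature, one column per station
dim(daily$precav)   # 365 x 35: average daily precipitation
head(day.5)         # 0.5 1.5 2.5 ...: day midpoints, used as argument values below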
I continue via R code and comments (I assume you are familiar enough with the fda package that nothing in the following is too surprising -- if that isn't the case, please let me know):
# pick out data and 'new data'
dailydat <- daily$precav[,2:35]
dailytemp <- daily$tempav[,2:35]
dailydatNew <- daily$precav[,1]
dailytempNew <- daily$tempav[,1]
# set up response variable
annualprec <- log10(apply(dailydat,2,sum))
# create a Fourier basis and smooth the temperature (covariate) functions
tempbasis <- create.fourier.basis(c(0,365),65)
tempSmooth <- smooth.basis(day.5,dailytemp,tempbasis)
tempfd <- tempSmooth$fd
# create the covariate list (the functional 'design matrix'):
# a constant for the intercept and the temperature curves
templist <- vector("list",2)
templist[[1]] <- rep(1,34)
templist[[2]] <- tempfd
# create constant basis (for intercept) and
# fourier basis objects for remaining betas
conbasis <- create.constant.basis(c(0,365))
betabasis <- create.fourier.basis(c(0,365),35)
betalist <- vector("list",2)
betalist[[1]] <- conbasis
betalist[[2]] <- betabasis
# set roughness penalty for betas
Lcoef <- c(0,(2*pi/365)^2,0)
harmaccelLfd <- vec2Lfd(Lcoef, c(0,365))
lambda <- 10^12.5
betafdPar <- fdPar(betabasis, harmaccelLfd, lambda)
betalist[[2]] <- betafdPar
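# (an aside on lambda: I'm just using 10^12.5 from the book; if you'd rather choose it
#  from the data, leave-one-out cross-validation with fRegress.CV is one option --
#  roughly like the sketch below, which you should check against your version of fda)
# loglam <- seq(5, 15, 0.5)
# SSE.CV <- rep(0, length(loglam))
# for(i in seq_along(loglam)){
#   betalisti <- betalist
#   betalisti[[2]] <- fdPar(betabasis, harmaccelLfd, 10^loglam[i])
#   SSE.CV[i] <- fRegress.CV(annualprec, templist, betalisti)$SSE.CV
# }
# lambda <- 10^loglam[which.min(SSE.CV)]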
# regress
annPrecTemp <- fRegress(annualprec, templist, betalist)
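As a sanity check on the derivation at the top, you can build $\mathbf{Z}$ and $\mathbf{R}$ explicitly and verify that the ridge formula reproduces the coefficients fRegress returns, up to numerical integration error. This is a sketch based on the math above, not the exact code path fRegress takes internally, so treat the object names as mine. The only wrinkle is the intercept: fRegress treats the scalar 1 as a constant function on [0,365] and integrates it against the constant basis function for $\beta_0$, so the corresponding column of $\mathbf{Z}$ is 365 rather than 1.
# J = int psi(s) theta'(s) ds and C = coefficient matrix of the temperature curves
Jmat <- inprod(tempbasis, betabasis)            # 65 x 35
Cmat <- t(tempfd$coefs)                         # 34 x 65
Zmat <- cbind(365, Cmat %*% Jmat)               # 34 x 36: intercept column, then CJ
# penalty matrix: zero for the intercept, int [L theta_j][L theta_k] ds for the rest
Rmat <- matrix(0, 36, 36)
Rmat[-1,-1] <- eval.penalty(betabasis, harmaccelLfd)
# ridge solution xi-hat = (Z'Z + lambda*R)^{-1} Z'y
xihat <- solve(crossprod(Zmat) + lambda*Rmat, crossprod(Zmat, annualprec))
# compare with fRegress's estimates (constant-basis coefficient for the intercept,
# then the 35 Fourier coefficients of beta(s))
cbind(xihat, c(annPrecTemp$betaestlist[[1]]$fd$coefs,
               annPrecTemp$betaestlist[[2]]$fd$coefs))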
Now when I was first taught about functional data a year or so ago, I played around with this package. I was also unable to get predict.fRegress to give me what I wanted. Looking back at it now, I still don't know how to make it behave. So, we'll just have to get the predictions semi-manually. I'll be using pieces that I pulled straight out of the code for fRegress(). Again, I continue via code and comments.
First, the set-up:
# smooth the new station's temperature curve with the same covariate basis
tempSmoothNew <- smooth.basis(day.5,dailytempNew,tempbasis)
tempfdNew <- tempSmoothNew$fd
# create the covariate list for the new data
templistNew <- vector("list",2)
templistNew[[1]] <- rep(1,1)
templistNew[[2]] <- tempfdNew
# convert the intercept into an fd object
onebasis <- create.constant.basis(c(0,365))
templistNew[[1]] <- fd(matrix(templistNew[[1]],1,1), onebasis)
Now to get the predictions
$\hat{\mathbf{y}}_{\mathrm{new}}=\mathbf{Z}_{\mathrm{new}}\hat{\boldsymbol{\xi}}$
I just take the code that fRegress uses to calculate yhatfdobj and edit it slightly. fRegress calculates yhatfdobj by estimating the integral $\int_0^T X_i(s)\beta(s)\,ds$ via the trapezoid rule (with $X_i$ and $\beta$ expanded in their respective bases).
Normally, fRegress calculates the fitted values by looping through the covariates stored in annPrecTemp$xfdlist. So for our problem, we replace this covariate list with the corresponding one built from the new data, i.e., templistNew. Here's the code (identical to the code found in fRegress with two edits, some deletions of unneeded code, and a couple of comments added):
# set up yhat matrix (in our case it's 1x1)
yhatmat <- matrix(0,1,1)
# loop through covariates
p <- length(templistNew)
for(j in 1:p){
  xfdj <- templistNew[[j]]
  xbasis <- xfdj$basis
  xnbasis <- xbasis$nbasis
  xrng <- xbasis$rangeval
  # evaluate the covariate and its estimated coefficient function on a fine grid
  nfine <- max(501,10*xnbasis+1)
  tfine <- seq(xrng[1], xrng[2], len=nfine)
  deltat <- tfine[2]-tfine[1]
  xmat <- eval.fd(tfine, xfdj)
  betafdParj <- annPrecTemp$betaestlist[[j]]
  betafdj <- betafdParj$fd
  betamat <- eval.fd(tfine, betafdj)
  # estimate int(x*beta) via trapezoid rule
  fitj <- deltat*(crossprod(xmat,betamat) -
                  0.5*(outer(xmat[1,],betamat[1,]) +
                       outer(xmat[nfine,],betamat[nfine,])))
  yhatmat <- yhatmat + fitj
}
(Note: if you look at this chunk and the surrounding code in fRegress, you'll see the steps I outlined above.)
I tested the code by re-running the weather example using all 35 stations as the data and compared the output from the above loop to annPrecTemp$yhatfdobj, and everything matches up. I also ran it a couple of times using different stations as my "new" data and everything seems reasonable.
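If you want a quicker sanity check that doesn't involve copying code out of fRegress, you can also let inprod() do the integration (it's the same integral, just a different quadrature routine, so expect tiny numerical differences from the trapezoid version above):
# intercept term int_0^365 1 * beta0(s) ds, plus int_0^365 X_new(s) beta(s) ds
beta0fd <- annPrecTemp$betaestlist[[1]]$fd
betafd  <- annPrecTemp$betaestlist[[2]]$fd
yhatNew <- inprod(templistNew[[1]], beta0fd) + inprod(tempfdNew, betafd)
yhatNew   # should be (numerically) very close to yhatmat from the loop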
Let me know if any of the above is unclear or if anything is not working correctly. Sorry for the overly detailed response. I couldn't help myself :) And if you don't already own them, check out the two books I used to write up this response. They are really good books.