I have run a principal component analysis (PCA) on a dataset (using prcomp in R), and now I want to determine what the principal component scores would be for one or more new samples. How can I do this?

I'm sure the information is encoded somewhere in the prcomp output, but I can't figure out which table holds it or how to turn it into a function. I'm also sure someone has already written a command to do this, but I can't find it.

amoeba
Jautis

1 Answer

The code

require(graphics)

## the variances of the variables in the USArrests data
## vary by orders of magnitude, so scaling is appropriate
res <- prcomp(USArrests, scale. = TRUE, retx = TRUE)

computes the principal components. In the result, res$center holds the centers (the column means), res$scale holds the scaling factors (the column standard deviations), and res$rotation holds the coefficients that transform the centered and scaled data into the PCs.

Because I passed retx=TRUE in the call to prcomp above (it is also the default), res$x contains the principal component scores computed by prcomp().

You can compute res$x yourself as follows; the same steps also work for 'new' observations (see the end of this answer):

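# center and scale one observation (one row of the data)
c.fun <- function(df, center, scale) {
  return((df - center) / scale)
}
# apply() over rows returns one column per observation (variables x observations)
centeredData <- apply(USArrests, MARGIN = 1, FUN = c.fun, res$center, res$scale)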

# compute the principal components
pcs<-t(res$rotation) %*% centeredData

# compare with the results of prcomp (option retx=TRUE returns the PCs in x)
head(t(pcs))
head(res$x)
# check that the results are the same (the sum should be numerically zero)
sum(abs(t(pcs) - res$x))
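
The same projection can also be written without the row-wise apply(). The following is only a sketch of an equivalent formulation, assuming the same USArrests fit as above; the names scaledData and scores are mine, not part of the prcomp output. It uses base R's scale() with the stored centers and scaling factors, then multiplies by the rotation matrix, so observations stay in rows and no transpose is needed.

```r
# Refit so this snippet runs on its own
res <- prcomp(USArrests, scale. = TRUE, retx = TRUE)

# Center and scale every column using the values stored in the fit
scaledData <- scale(USArrests, center = res$center, scale = res$scale)

# Project onto the principal axes: observations x PCs, same layout as res$x
scores <- scaledData %*% res$rotation

# Should report TRUE if the manual scores match those from prcomp
all.equal(scores, res$x, check.attributes = FALSE)
```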

To compute the components for new data (I use fake data here), do the following:

# some fake 'new data'
newdata <- USArrests[1:10, ]
centeredNewData <- apply(newdata, MARGIN = 1, FUN = c.fun, res$center, res$scale)
# as before, t(pcsnew) has one row per new observation
pcsnew <- t(res$rotation) %*% centeredNewData
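
As for a ready-made command: R's stats package provides a predict() method for prcomp objects, which applies the stored centering, scaling and rotation to new data. A minimal sketch, assuming the new data has the same column names as the data used for the fit (scores_new and manual are illustrative names of mine):

```r
# Refit so this snippet runs on its own
res <- prcomp(USArrests, scale. = TRUE)

# Fake 'new data': the first ten rows of the original data
newdata <- USArrests[1:10, ]

# Scores for the new observations, one row per observation
scores_new <- predict(res, newdata = newdata)

# Cross-check against the manual projection
manual <- scale(newdata, center = res$center, scale = res$scale) %*% res$rotation
all.equal(scores_new, manual, check.attributes = FALSE)
```

Because the fake new data here are rows of the original dataset, scores_new should also agree with res$x[1:10, ].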