
I would like to know the best way to estimate the latest value of a principal component when I only have some of the latest data points for the variables.

Suppose I have 5 variables:

> head(rr)
                     USD           EUR           JPY           KRW           ZAR
2010-02-12  2.648171e-03 -0.0016930704  0.0007601882  0.0010594185 -0.0041776791
2010-02-15  2.789115e-05 -0.0014012767 -0.0002707860  0.0010577844 -0.0038725131
2010-02-16 -7.923771e-03  0.0037916403 -0.0096992764  0.0005012469  0.0004297328
2010-02-17  2.479928e-03 -0.0056491918 -0.0031176212  0.0044569950  0.0091811611
2010-02-18  1.302002e-03 -0.0002913731 -0.0034022229 -0.0010499123  0.0009181197
2010-02-19  9.438061e-04 -0.0006170278 -0.0047146407 -0.0023523910 -0.0038729740

And their principal component loadings:

> eigen(cor(rr))
$values
[1] 1.9858909 1.3388494 0.8530661 0.5337612 0.2884322

$vectors
            [,1]        [,2]       [,3]       [,4]       [,5]
[1,]  0.61173691 -0.23437356  0.1077678  0.2219310  0.7141286
[2,] -0.02097264  0.76835192 -0.3309138 -0.3429975  0.4266665
[3,]  0.57193284 -0.05324265  0.1981990 -0.7311797 -0.3100829
[4,] -0.24235617 -0.58496196 -0.6108094 -0.4126492  0.2360413
[5,] -0.48938168 -0.09843319  0.6830162 -0.3580450  0.3951064

Usually to get the first principal component I would do

> rr %*% eigen(cor(rr))$vectors[, 1] 

so basically apply the PC1 loadings to the variables.
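One detail worth checking: eigenvectors of cor(rr) are loadings for the standardized variables, so strictly they should be applied to scale(rr) rather than to the raw returns. The minimal sketch below uses simulated data (a hypothetical stand-in for rr, not the real series) and checks that this projection matches prcomp(rr, scale. = TRUE) up to the arbitrary sign of an eigenvector:

```r
# Simulated stand-in for rr (hypothetical data, not the series from the question)
set.seed(1)
rr <- matrix(rnorm(200 * 5, sd = 0.005), ncol = 5,
             dimnames = list(NULL, c("USD", "EUR", "JPY", "KRW", "ZAR")))

# Eigenvectors of the correlation matrix are loadings for *standardized* data
e <- eigen(cor(rr))
pc1_manual <- drop(scale(rr) %*% e$vectors[, 1])

# prcomp with scale. = TRUE works on the same standardized data
pc1_prcomp <- prcomp(rr, scale. = TRUE)$x[, 1]

# Eigenvectors are only defined up to sign, so compare absolute values
all.equal(abs(pc1_manual), abs(unname(pc1_prcomp)))
```

Applied to the raw returns, the same loadings still produce a series, but it no longer corresponds exactly to the correlation-based PC1, because each variable then enters on its own scale.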

But what if I want the latest value of the principal component and have only 3 of the 5 required data points?

For example, say today's values for JPY and ZAR are missing:

                     USD          EUR           JPY           KRW           ZAR
2014-11-05  0.0047538401  0.000676224 -0.0045230806 -0.0050895701 -0.0038969653
2014-11-06  0.0051497837 -0.002184867  0.0028327631  0.0045528606 -0.0046996766
2014-11-07 -0.0019551509  0.000684713 -0.0001589247 -0.0008318505 -0.0020331115
2014-11-10 -0.0013404183 -0.001190019 -0.0014909916 -0.0018504471 -0.0005352551
2014-11-11  0.0001727452  0.003490628 -0.0070546749 -0.0071174438  0.0030901127
2014-11-12 -0.0008993663 -0.001086504            NA  0.0023363802            NA

What are my options for estimating the value of PC1 for 2014-11-12? I know that if I were to use the covariance matrix, then loadings would be equal to regression coefficients, but that would be for the PC being the independent variable, and here I need it the other way around? So basically, how do I estimate the latest PC1 value with only 3 out of the 5 input variables to hand?
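One way to do it "the other way around", as described above, is to treat the historical PC1 scores as the response, regress them on the three variables that are available today, and predict today's value from the fitted model. This is a minimal base-R sketch, not the approach taken in the answer below. Assumptions: the history is simulated (a hypothetical stand-in for rr), the loadings come from cor() and are therefore applied to standardized data, and the relationships are assumed stable over the estimation window. The names past, obs and pc1_today are illustrative only.

```r
# Sketch: predict today's PC1 from the subset of variables observed today.
# Simulated history stands in for rr (hypothetical data, not the real series).
set.seed(42)
vars  <- c("USD", "EUR", "JPY", "KRW", "ZAR")
Sigma <- matrix(0.4, 5, 5)
diag(Sigma) <- 1
past <- 0.005 * matrix(rnorm(500 * 5), ncol = 5) %*% chol(Sigma)
colnames(past) <- vars

# PC1 loadings from the correlation matrix, applied to standardized history
w   <- eigen(cor(past))$vectors[, 1]
mu  <- colMeans(past)
s   <- apply(past, 2, sd)
pc1 <- drop(scale(past, center = mu, scale = s) %*% w)

# Regress the historical PC1 score (response) on the variables available today
obs <- c("USD", "EUR", "KRW")
df  <- data.frame(pc1 = pc1, past[, obs])
fit <- lm(pc1 ~ ., data = df)

# Today's partial row from the question (JPY and ZAR are NA)
today <- data.frame(USD = -0.0008993663, EUR = -0.001086504, KRW = 0.0023363802)
pc1_today <- unname(predict(fit, newdata = today))
summary(fit)$r.squared  # share of PC1 variance the three variables explain
```

The R² of this regression is a useful diagnostic: if the three available variables explain little of PC1 historically, any estimate for a day with missing inputs is correspondingly uncertain.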

amoeba
Thomas Browne

1 Answer

You need to handle the missing values in some fashion. One of the simplest methods is to replace each NA with the mean of that variable. A more sophisticated approach is readily available in R through the missMDA package. Its imputation typically improves on simple mean replacement because it takes into account both the relationships between variables and the similarities between individuals (here, dates).

# create a dataset of similar structure to yours
rr <- read.table(header = FALSE, text = '
2010-02-12  2.648171e-03 -0.0016930704  0.0007601882  0.0010594185 -0.0041776791
2010-02-15  2.789115e-05 -0.0014012767 -0.0002707860  0.0010577844 -0.0038725131
2010-02-16 -7.923771e-03  0.0037916403 -0.0096992764  0.0005012469 0.0004297328
2010-02-17  2.479928e-03 -0.0056491918 -0.0031176212  0.0044569950 0.0091811611
2010-02-18  1.302002e-03 -0.0002913731 -0.0034022229 -0.0010499123 0.0009181197
2010-02-19  9.438061e-04 -0.0006170278 NA -0.0023523910 NA
                 ')
row.names(rr) <- rr[,1]
rr <- as.matrix(rr[,2:6])
colnames(rr) <- c("USD", "EUR", "JPY", "KRW", "ZAR")

# impute missing values
library(missMDA)
# estimate number of components
nb <- estim_ncpPCA(rr, ncp.min=0, ncp.max=5)
# actual impute
rr.impute <- imputePCA(rr, ncp=nb$ncp)

# run PCA on the completed (imputed) data
pca.fit <- prcomp(rr.impute$completeObs)
> pca.fit$rotation
           PC1        PC2        PC3         PC4         PC5
USD -0.6416800  0.1775006  0.4015143 -0.59493862  0.20389865
EUR  0.5018248  0.1099833  0.1971757 -0.58092170 -0.59977271
JPY -0.4836460  0.4292212 -0.1581251  0.31041583 -0.67859722
KRW -0.1720260 -0.2078281 -0.8481108 -0.45556326 -0.01961543
ZAR -0.2700229 -0.8537997  0.2358231  0.06842622 -0.37123994

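Of course, you can still do it manually with eigen(), as in the question, but it is convenient to have a function that does the work for you. The PC1 scores, including the one for the row that had missing values, are in pca.fit$x[, "PC1"]. Note that prcomp() centers but does not scale by default, so these loadings are covariance-based; use prcomp(..., scale. = TRUE) to match the correlation-based eigen(cor(rr)) loadings in the question.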
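For comparison with the missMDA route, the two simplest ways to fill today's gaps can be written in base R and then projected with the full PC1 loadings. This is a sketch on simulated history (a hypothetical stand-in for rr) and is not what imputePCA does; imputePCA iterates a PCA fit (regularized by default). With standardized loadings, mean replacement contributes exactly zero for each missing variable, so it amounts to dropping those terms from the PC1 sum. The conditional-mean fill instead uses the historical covariances to predict JPY and ZAR from USD, EUR and KRW; the resulting PC1 value equals an ordinary least-squares regression of historical PC1 scores on those three variables. The helper pc1_of is hypothetical, defined here for illustration.

```r
# Sketch: fill today's missing values, then apply the full PC1 loadings.
# Simulated history (hypothetical stand-in for the real rr series).
set.seed(7)
vars  <- c("USD", "EUR", "JPY", "KRW", "ZAR")
Sigma <- matrix(0.4, 5, 5)
diag(Sigma) <- 1
past <- 0.005 * matrix(rnorm(500 * 5), ncol = 5) %*% chol(Sigma)
colnames(past) <- vars

mu <- colMeans(past)
s  <- apply(past, 2, sd)
S  <- cov(past)
w  <- eigen(cor(past))$vectors[, 1]

# Today's row from the question, with JPY and ZAR missing
today <- c(USD = -0.0008993663, EUR = -0.001086504, JPY = NA,
           KRW = 0.0023363802, ZAR = NA)
m <- is.na(today)

# Hypothetical helper: standardize one row and project it onto PC1
pc1_of <- function(x) sum(w * (x - mu) / s)

# (a) Mean replacement: missing entries set to their historical means
x_mean <- today
x_mean[m] <- mu[m]

# (b) Conditional mean: E[x_m | x_o] = mu_m + S_mo S_oo^{-1} (x_o - mu_o)
x_cond <- today
x_cond[m] <- mu[m] + drop(S[m, !m] %*% solve(S[!m, !m], today[!m] - mu[!m]))

c(mean_fill = pc1_of(x_mean), conditional_fill = pc1_of(x_cond))
```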

cdeterman