I would like to know the best way to estimate the latest value of a principal component when only some of the input variables have an observation for the latest date.
Assume I have 5 variables:
> head(rr)
                     USD           EUR           JPY           KRW           ZAR
2010-02-12  2.648171e-03 -0.0016930704  0.0007601882  0.0010594185 -0.0041776791
2010-02-15  2.789115e-05 -0.0014012767 -0.0002707860  0.0010577844 -0.0038725131
2010-02-16 -7.923771e-03  0.0037916403 -0.0096992764  0.0005012469  0.0004297328
2010-02-17  2.479928e-03 -0.0056491918 -0.0031176212  0.0044569950  0.0091811611
2010-02-18  1.302002e-03 -0.0002913731 -0.0034022229 -0.0010499123  0.0009181197
2010-02-19  9.438061e-04 -0.0006170278 -0.0047146407 -0.0023523910 -0.0038729740
And their principal component loadings:
> eigen(cor(rr))
$values
[1] 1.9858909 1.3388494 0.8530661 0.5337612 0.2884322
$vectors
            [,1]        [,2]       [,3]       [,4]       [,5]
[1,]  0.61173691 -0.23437356  0.1077678  0.2219310  0.7141286
[2,] -0.02097264  0.76835192 -0.3309138 -0.3429975  0.4266665
[3,]  0.57193284 -0.05324265  0.1981990 -0.7311797 -0.3100829
[4,] -0.24235617 -0.58496196 -0.6108094 -0.4126492  0.2360413
[5,] -0.48938168 -0.09843319  0.6830162 -0.3580450  0.3951064
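Usually, to get the first principal component, I would do
> scale(rr) %*% eigen(cor(rr))$vectors[, 1]
that is, apply the PC1 loadings to the standardized variables (standardized because the loadings come from the correlation matrix rather than the covariance matrix).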
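As a minimal sketch of this step, the block below uses simulated data in place of rr. The object rr_sim and its one-factor structure are made up for illustration and are not the series shown above. It assumes the variables are centred and scaled before the eigenvectors of cor() are applied, and cross-checks the result against prcomp().

```r
# Hypothetical stand-in for rr: five simulated return series driven by
# one common factor (illustrative only, not the original data).
set.seed(1)
n <- 500
f <- rnorm(n)
rr_sim <- cbind(USD = f + rnorm(n),
                EUR = rnorm(n),
                JPY = f + rnorm(n),
                KRW = -0.5 * f + rnorm(n),
                ZAR = -f + rnorm(n)) * 0.005

e <- eigen(cor(rr_sim))

# The eigenvectors come from the correlation matrix, so they are applied
# to the standardized (centred and scaled) variables.
z   <- scale(rr_sim)
pc1 <- as.numeric(z %*% e$vectors[, 1])

# Cross-check against prcomp(); scores agree up to a sign flip.
p <- prcomp(rr_sim, center = TRUE, scale. = TRUE)
max_diff <- max(abs(abs(pc1) - abs(p$x[, 1])))

# The variance of the PC1 scores equals the first eigenvalue.
c(var_pc1 = var(pc1), eigenvalue_1 = e$values[1], max_diff = max_diff)
```

The sign of an eigenvector is arbitrary, which is why the comparison with prcomp() is made on absolute values.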
But what if I want to calculate the latest value of the principal component when only 3 of the 5 variables have been observed?
For example, say today's values for JPY and ZAR are missing:
                     USD          EUR           JPY           KRW           ZAR
2014-11-05  0.0047538401  0.000676224 -0.0045230806 -0.0050895701 -0.0038969653
2014-11-06  0.0051497837 -0.002184867  0.0028327631  0.0045528606 -0.0046996766
2014-11-07 -0.0019551509  0.000684713 -0.0001589247 -0.0008318505 -0.0020331115
2014-11-10 -0.0013404183 -0.001190019 -0.0014909916 -0.0018504471 -0.0005352551
2014-11-11  0.0001727452  0.003490628 -0.0070546749 -0.0071174438  0.0030901127
2014-11-12 -0.0008993663 -0.001086504            NA  0.0023363802            NA
What are my options for estimating the value of PC1 for 2014-11-12? I know that if I used the covariance matrix, the loadings would equal the coefficients from regressing each variable on the PC. In that regression the PC is the independent variable, whereas here I need it the other way around: the PC as the dependent variable and the available variables as predictors. So, how do I estimate the latest PC1 value with only 3 of the 5 input variables to hand?
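To make the "other way around" regression concrete, here is a minimal sketch of one possible option, not necessarily the best one: regress the historical PC1 scores on the three variables that are available, then predict today's score. The data are simulated (rr_sim is a made-up stand-in for rr), and today's three observed values from the table above are standardized with the simulated means and standard deviations purely for illustration. The sketch also shows that this regression gives the same number as filling in the two missing standardized values by their linear conditional expectation and then applying the loadings.

```r
# Hypothetical stand-in for rr (illustrative only, not the original data).
set.seed(1)
n <- 500
f <- rnorm(n)
rr_sim <- cbind(USD = f + rnorm(n),
                EUR = rnorm(n),
                JPY = f + rnorm(n),
                KRW = -0.5 * f + rnorm(n),
                ZAR = -f + rnorm(n)) * 0.005

mu <- colMeans(rr_sim)
s  <- apply(rr_sim, 2, sd)
z  <- scale(rr_sim)                      # standardized history
R  <- cor(rr_sim)
v1 <- eigen(R)$vectors[, 1]
names(v1) <- colnames(rr_sim)
pc1 <- as.numeric(z %*% v1)              # historical PC1 scores

obs <- c("USD", "EUR", "KRW")            # observed today
mis <- c("JPY", "ZAR")                   # missing today

# Option A: regress historical PC1 on the observed variables only.
hist_df <- data.frame(pc1 = pc1, z[, obs])
fit <- lm(pc1 ~ USD + EUR + KRW, data = hist_df)

# Today's observed values (from the 2014-11-12 row), standardized with
# the simulated means and standard deviations (an assumption of this sketch).
today   <- c(USD = -0.0008993663, EUR = -0.001086504, KRW = 0.0023363802)
z_today <- (today - mu[obs]) / s[obs]
pc1_hat <- as.numeric(predict(fit, newdata = as.data.frame(t(z_today))))

# Option B: impute the missing standardized values by their linear
# conditional expectation given the observed ones, then apply the loadings.
z_mis_hat <- as.numeric(R[mis, obs] %*% solve(R[obs, obs]) %*% z_today)
pc1_alt   <- sum(v1[obs] * z_today) + sum(v1[mis] * z_mis_hat)

c(regression = pc1_hat, imputation = pc1_alt)
```

Both routes are linear in the observed values and coincide when fitted on the same history; summary(fit)$r.squared gives a rough idea of how much of PC1 the three available variables explain.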