When you run a principal component analysis (PCA), the dataset usually looks like this:

Country    Var1    Var2    Var3
  A          2      18      23
  B          3      16      28
  C          1      19      33

But what happens when you have more than one observation per country, or in my case per actor, because the data form a time series? For example:

Obs.   BACE    PWR     CC      SC      TASK    DIS     IGB     I1    I2    I3    I4a   I4b   Punish  Threaten  Oppose  Appeal  Promise  Reward  P1    P2    P3    P4    P5
Ar_P1  0.3176  0.2115  0.6527  0.4615  0.6627  0.1185  0.0718  0.75  0.24  0.60  0.25  0.19  0.06    0.00      0.06    0.81    0.03     0.03    0.40  0.15  0.17  0.32  0.9456
Ar_P2  0.2838  0.2292  0.5847  0.4983  0.6312  0.1608  0.1350  0.47  0.23  0.16  0.53  0.69  0.12    0.02      0.12    0.46    0.05     0.22    0.30  0.12  0.10  0.24  0.9760
Ar_P3  0.2831  0.2081  0.6175  0.5222  0.6041  0.1385  0.1161  0.56  0.25  0.23  0.44  0.51  0.09    0.02      0.11    0.55    0.07     0.16    0.42  0.21  0.11  0.25  0.9725
Bu_P1  0.3902  0.2680  0.5426  0.3049  0.5194  0.1143  0.1522  0.58  0.30  0.17  0.42  0.47  0.06    0.04      0.11    0.50    0.12     0.17    0.43  0.25  0.11  0.35  0.9615
Bu_P2  0.3575  0.2781  0.5661  0.3219  0.5066  0.2685  0.1646  0.54  0.23  0.20  0.46  0.53  0.10    0.05      0.08    0.53    0.08     0.16    0.30  0.14  0.09  0.30  0.9730
Bu_P3  0.4185  0.2975  0.5674  0.2854  0.4879  0.2447  0.1532  0.58  0.26  0.20  0.42  0.60  0.11    0.03      0.07    0.53    0.08     0.19    0.39  0.21  0.12  0.34  0.9592
Ch_P1  0.2963  0.2331  0.6130  0.4507  0.7299  0.0548  0.1227  0.61  0.32  0.20  0.39  0.49  0.05    0.04      0.11    0.53    0.09     0.19    0.50  0.27  0.15  0.29  0.9565
Ch_P2  0.3215  0.2720  0.5821  0.4531  0.6480  0.2281  0.1203  0.35  0.15  0.10  0.65  0.73  0.16    0.06      0.11    0.41    0.06     0.21    0.28  0.13  0.09  0.23  0.9793
Ch_P3  0.3509  0.3119  0.5627  0.3972  0.5885  0.1889  0.1538  0.38  0.16  0.12  0.62  0.73  0.17    0.02      0.13    0.42    0.07     0.20    0.25  0.11  0.10  0.25  0.9750

When you do the PCA and plot it, all the actors will appear three times (because of the three observations per actor). The more observations, the more often the actor label appears.

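 # Assumes `data` holds only the numeric columns, with the Obs. labels as row names
 pca.data <- prcomp(data, scale. = TRUE)  # standardize each variable before the PCA
 biplot(pca.data)                         # every row (actor x period) gets its own label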

How can I run a PCA on time series data so that every actor is plotted only once?

feder80
  • This does not seem to be about R specifically. To get one point per country, you could reduce each country to a single observation, e.g. the average of its values. It's your choice, depending on what you want to do. Alternatively, perhaps PCA is not what you want to do: there is no sign here of what your underlying research question is. – Nick Cox Dec 08 '13 at 11:01
  • Hi Nick, I have eight actors, 23 variables and 16 observations per actor and variable (4 years, 4 quarters per year). I want to find out how these variables influence each other (not just a correlation matrix). A PCA would be perfect, but the observations are not independent, because the values differ by "Actor" and by "Date" (for the quarters). What would you do in this case? – feder80 Dec 08 '13 at 14:04
  • The research question "how these variables influence each other" is not sharp enough for me to offer advice here. In addition, your data set doesn't sound much like your original question. I recommend editing that. – Nick Cox Dec 08 '13 at 14:45
  • So, that is a subset of my dataset. I want to know whether DIS has an effect on P1, I1, SC etc., or whether DIS and IGB have an effect on each other and on variables like P1 and I1. – feder80 Dec 08 '13 at 16:32
  • I am with @NickCox: I think PCA is not the best technique for your problem. I see you have panel data (http://en.wikipedia.org/wiki/Panel_data), so I would suggest using panel analysis methods (http://en.wikipedia.org/wiki/Panel_analysis), which are more generalized linear models that take the time dependencies into account. – Emer Dec 08 '13 at 17:06
  • +1 to @Emer's suggestion that you think of panel (longitudinal) data modelling. There are variants of PCA that cope with this kind of data structure, but they aren't prominent in many statistical environments. Dependence in time is no barrier to PCA; otherwise it would hardly be routine in meteorology and oceanography. But the biggest puzzle is why you think PCA sounds "ideal"; for study of what influences what you are better off with some predictive model. – Nick Cox Dec 08 '13 at 17:20
  • Thank you. I am using R as my statistical environment. Someone suggested using Multichannel Singular Spectrum Analysis. What do you think of that? For a predictive model I need a clue about which variables "go together". I wanted to use PCA to get an insight into the nexus of these variables over time. – feder80 Dec 08 '13 at 17:29
  • If you really have no insight into what goes together, you are not in a position to analyse your data at all well. In any case, PCA tells you about correlations, not directions of influence. There is no magic method that will do your thinking for you and you are reaching for complicated ones when you are struggling with precisely what simpler ones can do for you. Bluntly: not a good strategy! – Nick Cox Dec 08 '13 at 17:45
  • OK, I know what you mean. But PCA tells me about the correlations, and that's what I want. It doesn't work with dependent data, though, does it? What would be the alternative to PCA in this case? – feder80 Dec 08 '13 at 17:51
  • As already said, PCA is compatible with dependent data. It's a transformation method, and dependence is no barrier to that. – Nick Cox Dec 08 '13 at 18:29
  • Now I am confused :-) See: http://matematicas.unex.es/~idelpuerto/WEB_EYSM/Articles/pt_paulo_canas_art.pdf – feder80 Dec 08 '13 at 18:31
  • There is no inconsistency between what I said and what that paper is trying to do. That's on all fours with the fact that you can calculate means and correlations and ignore dependence in time. The means and correlations are well-defined quantities, regardless. – Nick Cox Dec 08 '13 at 19:17
  • Ok, fine! Thank you very much. I really appreciate that! – feder80 Dec 08 '13 at 19:21
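
A minimal sketch in base R of two ways to get one point per actor, following the comments above. It is an illustration under stated assumptions, not a recommendation of PCA for the research question. It uses only the first seven variables of the nine rows shown in the question, and it assumes that the actor is the part of the row label before the underscore. The object names (dat, actor, actor_means, centroids) are made up for this example. Option 1 averages each actor's values before the PCA, which discards all variation over time. Option 2 runs the PCA on every observation and averages the scores per actor only for plotting.

```r
# First seven variables of the nine observations shown in the question.
dat <- read.table(header = TRUE, row.names = 1, text = "
Obs   BACE   PWR    CC     SC     TASK   DIS    IGB
Ar_P1 0.3176 0.2115 0.6527 0.4615 0.6627 0.1185 0.0718
Ar_P2 0.2838 0.2292 0.5847 0.4983 0.6312 0.1608 0.1350
Ar_P3 0.2831 0.2081 0.6175 0.5222 0.6041 0.1385 0.1161
Bu_P1 0.3902 0.2680 0.5426 0.3049 0.5194 0.1143 0.1522
Bu_P2 0.3575 0.2781 0.5661 0.3219 0.5066 0.2685 0.1646
Bu_P3 0.4185 0.2975 0.5674 0.2854 0.4879 0.2447 0.1532
Ch_P1 0.2963 0.2331 0.6130 0.4507 0.7299 0.0548 0.1227
Ch_P2 0.3215 0.2720 0.5821 0.4531 0.6480 0.2281 0.1203
Ch_P3 0.3509 0.3119 0.5627 0.3972 0.5885 0.1889 0.1538
")

# Assumption: the actor is the label prefix, e.g. "Ar" for "Ar_P1".
actor <- sub("_.*$", "", rownames(dat))

# Option 1: reduce each actor to the mean of its observations, then do the PCA.
actor_means <- aggregate(dat, by = list(actor = actor), FUN = mean)
rownames(actor_means) <- actor_means$actor
pca_means <- prcomp(actor_means[, -1], scale. = TRUE)
biplot(pca_means)  # one label per actor

# Option 2: PCA on all observations, then plot each actor's mean score.
pca_all <- prcomp(dat, scale. = TRUE)
centroids <- aggregate(pca_all$x[, 1:2], by = list(actor = actor), FUN = mean)
plot(centroids$PC1, centroids$PC2, type = "n", xlab = "PC1", ylab = "PC2")
text(centroids$PC1, centroids$PC2, labels = centroids$actor)
```

With only three actors in this excerpt, Option 1 leaves three rows, so at most two components carry any variance. With the full data (eight actors) the same code applies, but the averaging still throws away the quarterly dynamics.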
