Why is it impossible to run a PCA in R with principal from the psych package without warnings when the matrix has more columns than rows (dim(t) = 6 x 2404)? With prcomp, everything is fine. The difference between the two methods is that principal computes a correlation or covariance matrix, while prcomp uses SVD.

These warnings occur:

The determinant of the smoothed correlation was zero.
This means the objective function is not defined.
Chi square is based upon observed residuals.
The determinant of the smoothed correlation was zero.
This means the objective function is not defined for the null model either.
The Chi square is thus based upon observed correlations.
In factor.stats, the correlation matrix is singular, an approximation is used
Warning messages:
1: In cor.smooth(r) : Matrix was not positive definite, smoothing was done
2: In fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs,  :
  In factor.stats, the correlation matrix is singular, and we could not calculate the beta weights for factor score estimates
3: In psych::principal(transposed_matrix, nfactors = 3) :
  The matrix is not positive semi-definite, scores found from Structure loadings

Basically, I'm doing this: I have a matrix with 2404 temperature samples over time as variables/columns and 6 measurement locations as observations/rows (this is called "T-mode PCA" by Richman, 1986). The matrix contains no missing values and is the transpose of a matrix in which the samples are rows and the stations are columns. The original matrix causes no problems, neither with psych::principal (i.e. eigendecomposition) nor with prcomp (SVD). I'd like to know why the transposed matrix causes such problems with psych::principal.

Here is a MWE, which throws the warnings as well:

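# 2404 time samples (rows) x 6 stations (columns) of random "temperatures"
original_matrix = data.frame(replicate(6, sample(250:300, 2404, replace = TRUE)))
# T-mode layout: 6 stations (rows) x 2404 time samples (columns)
transposed_matrix = t(original_matrix)
pca_temper = psych::principal(transposed_matrix, nfactors = 3)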
sequoia
  • Could you explain the sense of "impossible"? Does it mean that the software forbids it, or does it mean that some kind of error occurs when you try? If the latter, what is the error? – whuber Aug 17 '18 at 16:35
  • @whuber I added the error message returned by `principal` – sequoia Aug 17 '18 at 17:48
  • This error message seems to come from something else, because it refers to "chi square" and "factor score estimates" as well as alluding to some kind of correlation smoother (`cor.smooth`), none of which are a standard part of any PCA calculation. You will need to explain what you're doing to create this message. – whuber Aug 17 '18 at 18:05
  • I've added an explanation and MWE – sequoia Aug 17 '18 at 18:19
  • I don't see any errors, I see warnings. – amoeba Aug 17 '18 at 18:57
  • @amoeba you're right. But what about those warnings. Edited again... – sequoia Aug 17 '18 at 19:09
  • The default for `principal()` is to include a `varimax` rotation computed by `GPA()` from the `GPArotation` package in R. I suspect that this is the source of the warnings and unexpected behavior versus other (straightforward) PCA routines. Try calling `principal()` with the parameter setting `rotate="none"`. – EdM Aug 17 '18 at 19:21
  • @EdM the warnings remain – sequoia Aug 17 '18 at 19:44

1 Answer

As you note, principal() starts with a correlation matrix. With your examples:

> dim(cor(original_matrix))
[1] 6 6
> dim(cor(transposed_matrix))
[1] 2404 2404

The correlation matrix of transposed_matrix is 2404 x 2404 but is computed from only 6 observations, so its rank is at most 5 and it is certainly singular. Warnings of one type or another are therefore not surprising.
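
The rank deficiency can be checked directly. The following is a minimal sketch, assuming a smaller 6 x 300 random matrix as a stand-in for the 6 x 2404 one so that it runs quickly; the names `x`, `r`, `ev` and `n_nonzero` are illustrative and not part of the question's code.

```r
# Sketch: a correlation matrix of many variables computed from few
# observations is rank deficient (assumed stand-in data, 6 x 300).
set.seed(1)
n_obs  <- 6     # rows: measurement locations
n_time <- 300   # columns: time samples (2404 in the question)
x <- matrix(sample(250:300, n_obs * n_time, replace = TRUE), nrow = n_obs)

r <- cor(x)     # n_time x n_time correlation matrix
dim(r)

# Count eigenvalues that are not numerically zero.
ev <- eigen(r, symmetric = TRUE, only.values = TRUE)$values
n_nonzero <- sum(ev > 1e-8 * max(ev))
n_nonzero       # cannot exceed n_obs - 1, because each column is centered
```

All remaining eigenvalues are zero up to rounding error, so the determinant is zero, which is the condition the psych messages complain about.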

The warnings come either from principal() itself or from what seem to be summary functions in the psych package that it calls. Examine the code for principal(), fa.stats(), and factor.stats().

In contrast, base R svd() wisely limits the number of singular values and vectors it produces to the smaller of the number of rows and the number of columns.
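
As a rough illustration of that limit, here is a sketch that again assumes a 6 x 300 random stand-in matrix (the names `x`, `s` and `p` are illustrative). It shows that both svd() and prcomp() return only 6 components for such a wide matrix, without ever forming the 300 x 300 correlation matrix.

```r
# Sketch: svd() and prcomp() on a wide matrix (assumed stand-in data, 6 x 300).
set.seed(1)
x <- matrix(sample(250:300, 6 * 300, replace = TRUE), nrow = 6)

# SVD of the column-centered data, which is what prcomp() works from.
s <- svd(scale(x, center = TRUE, scale = FALSE))
length(s$d)       # 6 singular values: min(nrow, ncol)
dim(s$v)          # 300 x 6 right singular vectors

p <- prcomp(x)    # centers by default, no scaling
length(p$sdev)    # 6 component standard deviations
dim(p$rotation)   # 300 x 6 loadings

# prcomp's standard deviations are the singular values over sqrt(n - 1).
all.equal(p$sdev, s$d / sqrt(nrow(x) - 1))
```

Because of the centering, the last of the 6 singular values is numerically zero, so at most 5 components carry any variance.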

EdM
  • Thanks. With the help of your answer, I've done some research about it. Anyway, is it foolish to use `principal()` in this case? Or can the warnings be ignored? – sequoia Aug 18 '18 at 08:31
  • @sequoia I can’t say for sure. You should get the same fundamental PCA results, but I don’t know whether this might pose issues for the other processing (rotations etc.) that `principal()` seems to do to make the analysis seem more like factor analysis; I don’t have much direct experience with that function or its package. For example, does the “smoothing” done with a singular matrix somehow artificially increase its apparent rank? You might check the code, consult an R-help site, or ask the authors of the package. – EdM Aug 18 '18 at 12:43
  • @sequoia also make sure that your time series is suitable for PCA; see for example [this answer](https://stats.stackexchange.com/a/159042/28500). Just because you can do PCA doesn’t mean that its results will be useful with a non-stationary time series. – EdM Aug 18 '18 at 12:51