I am using XLStat for a PCA of time-series water chemistry data. I have 23 analytes and 29 samples. I am using a correlation matrix for PCA as I find it more interpretable in the context of hydrochemistry. The data is also standardized to a variance of 1 and a mean of 0 to avoid the effect of differing units.
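As a side note on the setup above, PCA on a correlation matrix and PCA on the covariance matrix of z-scored data are the same thing, so doing both is harmless but redundant. A minimal sketch with synthetic data (the array `X` below is made up with the same 29 x 23 shape, it is not the water chemistry data):

```python
import numpy as np

# Synthetic stand-in: 29 samples, 23 variables on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(29, 23)) * rng.uniform(0.1, 100.0, size=23)

# Standardize each column to mean 0 and (sample) variance 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

R = np.corrcoef(X, rowvar=False)   # correlation matrix of the raw data
C = np.cov(Z, rowvar=False)        # covariance matrix of the z-scores

same = np.allclose(R, C)           # the two matrices coincide

# Eigenvalues of R are the PCA variances; they sum to the number of variables.
eigvals = np.linalg.eigvalsh(R)[::-1]
print(same, eigvals.sum())
```

Either input therefore leads to the same components; the standardization step does not need to be applied twice.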

The results of the PCA look great: they are easy to interpret and everything makes sense. There are numerous significant correlations in the correlation matrix (alpha=0.5). A KMO sampling adequacy test yields a value of 0.64. The problem is that I keep getting an observed chi-squared of "-Inf" for Bartlett's Sphericity Test. Essentially, this means that the chi-squared could not be computed.

  1. What is going on here? This value makes no sense given the strong correlations in the matrix.

  2. Can I continue with PCA despite the failed test?

  3. Could the problem be that by normalizing the data I am imposing normality upon it falsely?

Data:

http://www.filedropper.com/wcrb_1

kjetil b halvorsen
Matt
  • KMO isn't needed for PCA; it is actually for factor analysis ([see](http://stats.stackexchange.com/a/48503/3277) and a link therein). Bartlett's test: hard to say what went wrong without having the data (you could show your data, btw). This test is for a large sample from a normal population (e.g. [see](http://stats.stackexchange.com/q/92791/3277)). This test is mainly for factor analysis. What might be a reason to use it in the context of PCA, as long as PCA is seen as just a data reduction transformation? – ttnphns Sep 26 '14 at 15:10
  • Thanks for the response. I didn't realize KMO was more aimed at factor analysis. The real problem is the failure of Bartlett's Sphericity Test. How do I include my data? – Matt Sep 29 '14 at 14:58
  • If you want to give data, you could publish it in the question body (formatted as code) or leave a link there to an external file-hosting site. – ttnphns Sep 29 '14 at 18:01
  • I have left a link to a shared file. The data shown is the transformed data and the results of PCA. Thanks for your help! – Matt Sep 29 '14 at 18:19
  • Thanks for sharing it. Exemplarily done work! I ran PCA in SPSS and confirm every figure except Bartlett's (and contributions / cosines, which I didn't check. BTW, how did you compute them?) Now, "my" Bartlett's was: `Approx. Chi-square 997.054; df 253; Sig. .00000`. SPSS computes the test as written [here](http://pic.dhe.ibm.com/infocenter/spssstat/v22r0m0/topic/com.ibm.spss.statistics.algorithms/alg_factors_optionalstats.htm). Could it be that your program simply considered the determinant of the matrix so close to 0 that it skipped computing the chi-sq value? – ttnphns Sep 29 '14 at 19:00
  • Wow. Thanks for checking this out, and for the praise. I appreciate it a lot. It is interesting that SPSS had no problem computing a chi-squared. I wonder if I have uncovered a bug in XLStat? It is nice to have external confirmation of the results. I have to claim ignorance about the computation of the contributions and cosines. I checked the box in XLStat and that is what I got. I suppose that is the risk of powerful stats programs in the hands of inexperienced users: too much information and not enough knowledge to handle it properly. – Matt Sep 30 '14 at 21:24
  • It may be a bug in XLStat, or it may be intended behaviour. As I said, your correlation matrix is virtually singular, and the program might be designed to skip such cases. XLStat may also be computing the chi-sq value in a slightly different way than SPSS does. – ttnphns Oct 01 '14 at 03:19