I am attempting to use PCA to perform cluster analysis on high-dimensional data from physical observations. All of the observations are positive in all variables. When I perform the PCA, some PC scores are negative for some observations. Ultimately, I would like each observation to be distributed amongst a subset of the PCs.

Here is what I am doing right now: PCA applied to $x_1, \dots, x_n$ gives $y_1, \dots, y_n$, where $y_i = [pc_1, pc_2, \dots, pc_n]$. Then:

$$\text{PositiveProportion}(y_{i,\,pc_k}) = \begin{cases} \dfrac{pc_k}{\sum_{j = 1,\ pc_j > 0}^{n} pc_j} & pc_k > 0 \\ 0 & pc_k \leq 0 \end{cases}$$

This is clearly flawed in that it destroys some information: every negative score is discarded. However, it was the best way I could think of to distribute each observation amongst PCs such that the distribution sums to one. Ultimately, I use these "positive proportions" of principal components to scale another aspect of each observation, and I want this scaling to be conservative in that it doesn't change the sum of the data being scaled.
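To make the calculation concrete, here is a minimal sketch of the positive-proportion step and the conservative scaling described above. The function names, the score values, and the quantity being scaled are all made up for illustration, and returning all zeros when no score is positive is my own assumption, since the formula is undefined in that case.

```python
def positive_proportions(scores):
    """Share of each positive PC score in the sum of the positive scores.

    Non-positive scores get a proportion of 0.
    """
    positive_total = sum(s for s in scores if s > 0)
    if positive_total == 0:
        # Assumption: the formula is undefined when no score is positive,
        # so this sketch returns all zeros rather than dividing by zero.
        return [0.0] * len(scores)
    return [s / positive_total if s > 0 else 0.0 for s in scores]


def allocate(quantity, scores):
    """Split `quantity` across PCs in proportion to the positive scores."""
    return [quantity * p for p in positive_proportions(scores)]


# Hypothetical PC scores for a single observation.
scores = [2.0, -1.5, 0.5, -0.25, 1.5]

weights = positive_proportions(scores)
shares = allocate(10.0, scores)

print(weights)       # proportions over the PCs
print(sum(weights))  # sums to one whenever at least one score is positive
print(shares)        # the quantity 10.0 split across the PCs
print(sum(shares))   # total is preserved
```

The two negative scores contribute nothing to the allocation, which is the information loss mentioned above.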

Is there another way to achieve a similar result? Is there another way to interpret the principal component values of individual observations in a way that is compatible with the physical world?

I hope this is a good question.

  • What significance do you think this PositiveProportion has? Why do you think it matters if a principal component is positive or negative? – elliotp Sep 02 '20 at 08:15
  • This is almost a FAQ: https://stats.stackexchange.com/questions/429377/why-the-first-principal-component-is-mostly-negative-while-the-second-component, https://stats.stackexchange.com/questions/420420/why-does-a-pca-component-have-negative-values-when-all-inputs-are-strictly-posit/420427, https://stats.stackexchange.com/questions/276725/my-dataset-has-only-positive-values-why-do-i-get-some-negative-pca-scores – kjetil b halvorsen Sep 02 '20 at 18:42
