
PCA gives me loadings with different signs. I understand that I may simply flip them (as explained in this thread), but I need an explanation (my boss insists) of why the Matlab implementation gives me these particular signs rather than the flipped ones.

UPDATE: I found empirically that in Matlab the loading with the highest absolute value is always positive.

zlon
  • Also relevant: http://stats.stackexchange.com/questions/34396. – whuber Feb 01 '17 at 19:30
  • @whuber I don't think it's a duplicate. This question specifically asks how Matlab chooses the sign. The fact that the sign is arbitrary seems to be clear to OP. But Matlab does *not* choose it randomly: if one runs `pca()` several times on the same data matrix, one gets the same signs (at least usually). I suppose that it has little to do with Matlab, but depends on some specific initialization choices in ARPACK. – amoeba Feb 01 '17 at 19:35
  • It's LAPACK not ARPACK. Anyway, I voted to reopen. – amoeba Feb 01 '17 at 19:47
  • @amoeba Your interpretation makes this purely a Matlab question. A *consistent* choice does not imply it is a *meaningful* choice, anyway. The duplicate explains that a choice is necessary as well as why it's arbitrary. It seems to me that fully answers all the statistical elements of this question. – whuber Feb 01 '17 at 19:55
  • It is an arbitrary choice, but it has to be made. So how is it made in Matlab, or in any other software (R or Stata)? – zlon Feb 01 '17 at 19:58
  • @whuber I think OP's question is meaningful. It might be considered off-topic (even though if it's not only about Matlab but about LAPACK library, then I'd say it's on topic), but it's not a dup. See the above comment by OP too. – amoeba Feb 01 '17 at 19:59
  • @Amoeba The duplicate explains why this question has *no* meaning for PCA. Given that fact, how and why one software app makes one choice while another makes another choice are questions about the software, not about PCA. In fact, the duplicate shows that `R` contains PCA procedures that make *different* choices! – whuber Feb 01 '17 at 20:02
  • @whuber Yes, it's definitely a question about software. That's why I said it might be considered off-topic. – amoeba Feb 01 '17 at 20:06
  • In any case, I found empirically that the biggest loading has a positive value. How may I answer my own question? – zlon Feb 01 '17 at 20:07
  • zlon, you cannot as long as it's closed. But you can edit your question to provide an Update with the results of your research/experimentation. – amoeba Feb 01 '17 at 20:10

2 Answers


Given any solution to PCA, a sign-flipped version of it is an equally valid solution. A numerical solver breaks this symmetry by finding one of these equally valid solutions. Implementation details and initial conditions determine which solution the solver will produce.

One way to think about PCA is that it maximizes the sum of the variance of the data projected onto the weight vectors, subject to the constraint that the weight vectors are orthonormal. Say the data set contains $n$ points $x_1, \dots, x_n$ in a $d$-dimensional space with mean $\mu$. We seek a set of orthonormal vectors $\{v_1, \dots, v_p\}$ that solves the following optimization problem:

$$\max_{v_1, \dots, v_p} \quad \sum_{i=1}^p \frac{1}{n} \sum_{j=1}^n \left[ (x_j - \mu)^T v_i \right]^2 \quad \quad \text{s.t.} \quad \begin{array}{ll} \|v_i\| = 1 & \forall i \\ v_i^T v_j = 0 & \forall i \ne j \end{array}$$

Say we have a set of weight vectors that solves this problem. You can see that flipping their signs gives the exact same value for the objective function, and the constraints remain satisfied. Hence, the sign-flipped solution is equally valid. There are various other ways of thinking about PCA and writing it as an optimization problem. Some of these ways sound conceptually different, but they all yield the same set of solutions, and the same reasoning holds for all of them.
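The invariance can be checked numerically. The following is a minimal sketch in plain Python; the data points, the orthonormal pair `v1`, `v2`, and the helper `pca_objective` are hypothetical and chosen only for illustration (the pair is not claimed to be the optimum).

```python
def pca_objective(points, vectors):
    """Sum, over the weight vectors, of the variance of the centered data projected onto each."""
    n = len(points)
    d = len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    total = 0.0
    for v in vectors:
        # Squared projection of each centered point onto v, averaged over the points.
        total += sum(
            sum((p[k] - mean[k]) * v[k] for k in range(d)) ** 2 for p in points
        ) / n
    return total

# Hypothetical toy data and an arbitrary orthonormal pair of weight vectors.
points = [(2.0, 1.0), (3.0, 4.0), (5.0, 4.5), (7.0, 8.0)]
v1, v2 = (0.6, 0.8), (-0.8, 0.6)
flipped = [tuple(-c for c in v1), v2]  # flip the sign of the first vector only

original_value = pca_objective(points, [v1, v2])
flipped_value = pca_objective(points, flipped)
print(original_value, flipped_value)
```

The two printed values agree because each projection only changes sign, and the sign disappears when the projection is squared.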

The reason a particular function implementing PCA returns any one of these solutions over the others comes down to implementation details and initial conditions. Say we're trying to solve the above problem using a standard optimization solver. It starts from some (possibly random) initial set of parameters, then iteratively updates them to increase the value of the objective function, while respecting the constraints. Imagine the objective function as a hilly landscape, where each location corresponds to a particular choice of parameters and the height at each location is the value of the objective function for those parameters. The constraints define the regions where the solver is allowed to go. Each solution to the problem is the highest allowed location on some surrounding hill. There are multiple hills, and the solutions all have the same height (i.e. are equally good). The solver starts from some initial location in this landscape and generally tries to move uphill, eventually stopping when it can't make any further uphill progress. So, the solution it finally attains is determined by the hill it starts on and how it chooses to step around the landscape.

Of course, one wouldn't typically solve PCA this way because there are more specialized, computationally efficient ways to do it. For example, one popular method is to obtain the weights as eigenvectors of the covariance matrix. But the eigenvalue solver is itself an iterative algorithm, and is subject to the same kinds of issues.
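As an illustration of how the starting point fixes the sign, here is a sketch using power iteration, one of the simplest iterative eigenvector methods. It is a stand-in for illustration only and is not the algorithm Matlab or LAPACK uses; the covariance matrix and the function `power_iteration` are hypothetical.

```python
import math

def power_iteration(matrix, start, steps=200):
    """Approximate the leading eigenvector of a symmetric matrix by repeated multiplication."""
    v = list(start)
    for _ in range(steps):
        # Multiply by the matrix, then rescale to unit length.
        w = [sum(row[k] * v[k] for k in range(len(v))) for row in matrix]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Hypothetical 2x2 covariance matrix (symmetric, positive definite).
cov = [[4.0, 1.5], [1.5, 1.0]]

from_start = power_iteration(cov, [1.0, 0.0])
from_opposite_start = power_iteration(cov, [-1.0, 0.0])
print(from_start, from_opposite_start)
```

Both runs converge to the same direction, but with opposite signs, purely because of where they started. A solver that always starts the same way and follows the same steps will therefore return the same sign on every run for the same input, without that sign carrying any meaning.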

user20160

An eigenvector remains an eigenvector when you change its sign. Thus, an eigenvector does not have a "sign". As loadings are just eigenvectors multiplied by the square root of the associated eigenvalue, they have no "sign" either.
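Written out, with $C$ standing for the covariance matrix, $v$ for an eigenvector, $\lambda$ for its eigenvalue and $\ell$ for the corresponding loading vector (these symbols are introduced here only for illustration):

$$C v = \lambda v \quad \Longrightarrow \quad C(-v) = -C v = -\lambda v = \lambda(-v), \qquad \ell = \sqrt{\lambda}\, v \quad \Longrightarrow \quad -\ell = \sqrt{\lambda}\,(-v)$$

So $-v$ is an eigenvector for the same eigenvalue, with the same norm as $v$, and $-\ell$ is an equally valid loading vector.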

As for why Matlab gives you a specific sign, I guess you would have to look at how the decomposition is computed internally. As far as I have seen in different topics (there or there), the sign is chosen arbitrarily.

Maybe this can help you. Otherwise, there has been some attempt to define what a "right" sign would be here.
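Because the sign is arbitrary, a convention can also be imposed after the decomposition. The sketch below assumes the convention the asker reports observing in the question's update (the entry with the largest absolute value in each component is made positive). `fix_signs` is a hypothetical helper written for illustration, not a Matlab function, and the input vector is made up.

```python
def fix_signs(components):
    """Flip each component so that its largest-magnitude entry is positive."""
    fixed = []
    for v in components:
        pivot = max(v, key=abs)  # entry with the largest absolute value
        sign = -1.0 if pivot < 0 else 1.0
        fixed.append([sign * c for c in v])
    return fixed

# Two sign-flipped versions of the same hypothetical component map to one result.
a = fix_signs([[0.6, -0.8]])
b = fix_signs([[-0.6, 0.8]])
print(a, b)
```

Applying such a rule to every resample makes results comparable across runs. If two entries tie for the largest absolute value, this sketch uses the first one.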

EDIT :

Here is a website listing the methods chosen to solve the problem depending on the type of input (type of data in the matrix). I did not want to put it as I cannot find the sources the author of the page uses. But as far as I have seen, it seems to be still true:

http://matlab.izmiran.ru/help/techdoc/ref/eig.html

LouisBBBB
  • Thanks. "MATLAB chooses to normalize the eigenvectors to have a norm of 1.0, the sign is arbitrary.". Could you explain what that means? – zlon Feb 01 '17 at 19:14
  • zlon, suppose you were to ask two surveyors to mark off a one-kilometer segment of a road. Surveyor *A* marks it from east to west, while surveyor *B* marks the same stretch from west to east. Who is correct? Obviously they both are--and there is no way to choose between them that is not arbitrary. PCA finds one-dimensional eigenspaces: these are like roads. The choice of a sign is a choice between the two (equally correct) surveyors. – whuber Feb 01 '17 at 19:33
  • whuber, I understand that it is arbitrary. My question is how Matlab chooses the "positive" direction. In your example I could say: let the east-west direction be positive, because the sun moves that way. I found that "MATLAB chooses to normalize the eigenvectors to have a norm of 1.0". My math background is too weak to understand this. What does "norm of 1" mean for eigenvectors? – zlon Feb 01 '17 at 19:56
  • Let's try this, then: because the sign makes no difference to PCA, you are free to impose whatever convention you wish. For instance, by switching signs as necessary, you can arrange to make the first nonzero component of each eigenvector be positive. *What Matlab chooses is meaningless.* Even if you could figure out how it chooses, that would only give you information about arbitrary, irrelevant aspects of the underlying numerical algorithms it uses. – whuber Feb 01 '17 at 20:06
  • I understand that. But my problem is this: I found that loading A is positive and loading B is negative. Then I decided to test the stability of the loadings by sampling observations with replacement. For each loading I found a perfectly bimodal distribution, but the variables are simply anticorrelated: some resampled sets give +/-, others give -/+. So I said: OK, the sign is arbitrary, and fixed loading 1 to always be positive. Now I obtain perfectly stable loadings. But I have to explain how my software chooses this arbitrary direction. It does not matter which software; it is important to explain. – zlon Feb 01 '17 at 20:12