Suppose we want to solve the following optimization problem (it is the PCA problem in this post):
$$ \underset{\mathbf w}{\text{maximize}}~~ \mathbf w^\top \mathbf{Cw} \\ \text{s.t.}~~~~~~ \mathbf w^\top \mathbf w=1 $$
As mentioned in the linked post, using a Lagrange multiplier we can change the problem into
$$ {\text{minimize}} ~~ \mathbf w^\top \mathbf{Cw}-\lambda(\mathbf w^\top \mathbf w-1) $$ Differentiating, we obtain $\mathbf{Cw}-\lambda\mathbf w=0$, which is the eigenvector equation. Problem solved, and $\lambda$ is the largest eigenvalue.
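To make the eigenvector equation concrete, here is a small sanity check on a made-up 2x2 covariance matrix (C_toy is just an arbitrary symmetric positive-definite matrix I picked for illustration, not the iris covariance used below):
# Sanity check of the eigenvector equation on a toy 2x2 covariance matrix
C_toy <- matrix(c(2, 1, 1, 3), nrow = 2)  # hypothetical symmetric PSD matrix
e <- eigen(C_toy)
w <- e$vectors[, 1]                        # eigenvector of the largest eigenvalue
lambda <- e$values[1]
C_toy %*% w - lambda * w                   # ~ 0: Cw = lambda*w
t(w) %*% w                                 # = 1: constraint satisfied
t(w) %*% C_toy %*% w                       # = lambda: the maximal objective value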
I am trying to work through a numerical example here to understand better how the Lagrange multiplier changes the problem, but I am not sure my validation process is correct.
I experimented with the iris data's covariance matrix (see code). The figure shows the geometric solution to the problem, where the black curves are the contours of the objective function and the green curve is the constraint. The red curve shows the optimal solution that maximizes the objective while satisfying the constraint.
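For reference, the figure can be reproduced with something like the following sketch (it assumes the same two centered iris columns as in the code below; the grid range is arbitrary):
# Sketch of the figure: contours of w'Cw, the unit-circle constraint, and the top eigenvector
X <- scale(iris[, c(1, 3)], center = TRUE, scale = FALSE)
C <- cov(X)
w1 <- seq(-2, 2, length.out = 100)
w2 <- seq(-2, 2, length.out = 100)
obj <- outer(w1, w2, function(a, b) C[1, 1] * a^2 + 2 * C[1, 2] * a * b + C[2, 2] * b^2)
contour(w1, w2, obj)                           # black contours of the objective
theta <- seq(0, 2 * pi, length.out = 200)
lines(cos(theta), sin(theta), col = "green")   # constraint w'w = 1
v <- eigen(C)$vectors[, 1]
arrows(0, 0, v[1], v[2], col = "red")          # optimal direction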
In my code, I am trying to use optimx to minimize an unconstrained objective function, replacing $\lambda$ with the largest eigenvalue from the eigendecomposition.
# center the two iris columns (Sepal.Length and Petal.Length)
X = iris[, c(1, 3)]
X$Sepal.Length = X$Sepal.Length - mean(X$Sepal.Length)
X$Petal.Length = X$Petal.Length - mean(X$Petal.Length)

# covariance matrix and its eigendecomposition
C = cov(X)
r = eigen(C)

# unconstrained objective, with lambda fixed at the largest eigenvalue
obj_fun <- function(x) {
  w = matrix(x, ncol = 1)
  lambda = r$values[1]
  v = t(w) %*% C %*% w + lambda * (t(w) %*% w - 1)
  return(as.numeric(v))
}

# gradient of the objective above
gr <- function(w) {
  lambda = r$values[1]
  v = 2 * C %*% w + 2 * lambda * w
  return(as.numeric(v))
}

res = optimx::optimx(c(1, 2), obj_fun, gr, method = "BFGS")
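For completeness, this is how I read off the result and compare it with the eigendecomposition (assuming the usual optimx data-frame output with columns p1, p2 and value):
res[, c("p1", "p2", "value")]   # optimizer's w and objective value
r$vectors[, 1]                  # top eigenvector from eigen()
r$values[1]                     # largest eigenvalue = optimal w'Cw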
I am getting the following results, where the objective value is the negative of the optimal value from the graphical solution, and the two parameters p1 and p2 are both 0.
My question is: is such a validation method correct? That is, can we replace $\lambda$ with the largest eigenvalue and minimize the objective function $\mathbf w^\top \mathbf{Cw}-\lambda(\mathbf w^\top \mathbf w-1)$ to get a solution?