
In Principal Component Analysis, we start with $m$ observations $x_1,\dots,x_m$, each of which is an $n$-dimensional vector. Assume we have centered the data; that is, we have subtracted the variable means from each observation. To project each observation into a $1$-dimensional subspace while maximizing sample variance, we want to compute a unit length basis vector, call it $v$. So we get the constrained problem

$$ \mathop{\arg\,\max}\limits_{v}\,\frac{1}{m}\sum_{i=1}^{m}\left\lVert\langle x_i,v\rangle v\right\rVert^2 $$

$$ \text{subject to }v^Tv=1 $$

Since $v$ has unit length, $\left\lVert\langle x_i,v\rangle v\right\rVert^2=\langle x_i,v\rangle^2=v^Tx_ix_i^Tv$, so this becomes

$$ \mathop{\arg\,\max}\limits_{v}\,v^TCv $$

$$ \text{subject to }v^Tv=1 $$

where $C=\frac{1}{m}\sum_{i=1}^{m}x_ix_i^T$ is the sample covariance matrix of the (centered) data. Then the Lagrangian objective function is

$$ \mathcal{L}(v,\lambda)=v^TCv-\lambda(v^Tv-1) $$

If we start taking partial derivatives of $\mathcal{L}$, setting them to zero, and so on, how do we know that we are maximizing $\mathcal{L}$? Do we need to check that the Hessian is negative definite? Even then, I think that only guarantees a local maximum?
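For concreteness, here is a small numerical sketch (variable names are my own, not from any of the linked posts): setting $\nabla_v\mathcal{L}=0$ gives $Cv=\lambda v$, so the critical points are the eigenvectors of $C$, and at each one the objective value $v^TCv$ equals the corresponding eigenvalue. Checking all critical points then amounts to comparing eigenvalues:

```python
# Sketch: the critical points of v^T C v on the unit sphere are the
# eigenvectors of C, and the objective at each equals its eigenvalue,
# so the global maximum is the largest eigenvalue.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X -= X.mean(axis=0)            # center the data
C = X.T @ X / X.shape[0]       # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
for lam, v in zip(eigvals, eigvecs.T):
    # at each critical point, v^T C v equals the eigenvalue lambda
    assert np.isclose(v @ C @ v, lam)

# the global maximum over the sphere is attained at the top eigenvector
v_best = eigvecs[:, -1]
print(np.isclose(v_best @ C @ v_best, eigvals.max()))  # True
```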

  • Because the constraint $v^\prime v=1$ defines a *compact* domain and the objective function is continuously differentiable, you are guaranteed that at least one critical point is a global maximum. – whuber Dec 04 '18 at 19:01
  • @whuber And not a global minimum? – waynemystir Dec 04 '18 at 19:17
  • It seems the relevant answers on this site don't attempt to check all the critical points of $\mathcal{L}$ (for example, https://stats.stackexchange.com/questions/10251/what-is-the-objective-function-of-pca https://stats.stackexchange.com/questions/301532/is-pca-optimization-convex) – Frank Dec 04 '18 at 19:51
  • Thanks Frank, good point and very helpful links. I'll also add this https://stats.stackexchange.com/questions/217995/what-is-an-intuitive-explanation-for-how-pca-turns-from-a-geometric-problem-wit where @amoeba 's first proof uses Lagrange multipliers. Aside from checking critical points, I would think a negative definite Hessian would suffice. – waynemystir Dec 04 '18 at 20:38
  • Whether you need to check for a global minimum *versus* a global maximum depends on the algorithm. Many of them simply iteratively apply the matrix, relying on the fact that the result has probability $1$ of converging exponentially to the maximum rather than the minimum. – whuber Dec 05 '18 at 19:16
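Whuber's last comment describes power iteration; a minimal sketch of that idea (a random starting vector is almost surely not orthogonal to the top eigenvector, so repeated multiplication by $C$ converges to the maximizer, not the minimizer):

```python
# Sketch of power iteration: repeatedly applying C to a random vector
# and renormalizing converges to the top eigenvector with probability 1.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X -= X.mean(axis=0)
C = X.T @ X / X.shape[0]

v = rng.normal(size=4)            # random start
for _ in range(500):
    v = C @ v                     # apply the matrix
    v /= np.linalg.norm(v)        # project back onto the unit sphere

top = np.linalg.eigh(C)[1][:, -1] # reference: top eigenvector from eigh
print(np.isclose(abs(v @ top), 1.0))  # True: converged (up to sign)
```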

0 Answers