
I am trying to derive the first principal component direction from the definition and need help finding which step goes wrong. Here's my attempt:

Let $\mathbf{X} \in \mathbb{R}^{N \times p}$ be the centered data matrix. Finding the first principal component direction means finding a vector $\mathbf{v}$ such that, when $\mathbf{X}$ is projected onto $\mathbf{v}$, the variance of the projected data is maximized.

Now, the variance of the projected data is given by

$$ \begin{aligned} \hat{\sigma}^2 &= \frac{1}{N} \frac{\mathbf{v}^\intercal\mathbf{X}^\intercal\mathbf{X}\mathbf{v}}{(\mathbf{v}^\intercal\mathbf{v})^2} \\ &= \frac{\mathbf{v}^\intercal \mathbf{S}\mathbf{v}}{(\mathbf{v}^\intercal\mathbf{v})^2} \end{aligned} $$ where $\mathbf{S}$ is the sample covariance matrix of the original data.

The $\mathbf{v}$ that maximizes $\hat{\sigma}^2$ should satisfy $\frac{d }{d \mathbf{v}} \hat{\sigma}^2 = 0$.

$$ \begin{aligned} d (\hat{\sigma}^2) &= d \Big(\frac{\mathbf{v}^\intercal \mathbf{S}\mathbf{v}}{(\mathbf{v}^\intercal\mathbf{v})^2}\Big) \\ &= \frac{d \big(\mathbf{v}^\intercal \mathbf{S}\mathbf{v}\big)}{(\mathbf{v}^\intercal\mathbf{v})^2} + (\mathbf{v}^\intercal \mathbf{S}\mathbf{v} ) d\big( (\mathbf{v}^\intercal\mathbf{v})^{-2}\big) \\ &= \frac{2 \mathbf{v}^\intercal \mathbf{S}}{(\mathbf{v}^\intercal\mathbf{v})^2} d \mathbf{v} + (\mathbf{v}^\intercal \mathbf{S}\mathbf{v} ) (-2) \frac{2 \mathbf{v}^\intercal}{(\mathbf{v}^\intercal\mathbf{v})^3} d \mathbf{v} \end{aligned} $$

$$ \frac{d}{d \mathbf{v}} \hat{\sigma}^2 = \frac{2}{(\mathbf{v}^\intercal\mathbf{v})^2} \Big(\mathbf{v}^\intercal \mathbf{S} - \frac{2 (\mathbf{v}^\intercal \mathbf{S}\mathbf{v}) \mathbf{v}^\intercal}{\mathbf{v}^\intercal\mathbf{v}} \Big) $$

Setting the derivative to zero (and taking the transpose) gives

$$ \begin{aligned} \mathbf{S}\mathbf{v} &= 2 \frac{\mathbf{v}^\intercal\mathbf{S}\mathbf{v}}{\mathbf{v}^\intercal\mathbf{v}} \mathbf{v} \\ &=2 \frac{\mathbf{v}^\intercal\mathbf{S}\mathbf{v}}{(\mathbf{v}^\intercal\mathbf{v})^2} (\mathbf{v}^\intercal\mathbf{v})\mathbf{v} \\ &= 2 \hat{\sigma}^2 (\mathbf{v}^\intercal\mathbf{v}) \mathbf{v} \end{aligned} $$

From the above, I can see that $\mathbf{v}$ has to be an eigenvector of $\mathbf{S}$. To make the first principal component direction unique, I require $\mathbf{v}$ to be a unit vector, which gives $$ \mathbf{S}\mathbf{v} = 2 \hat{\sigma}^2 \mathbf{v} $$

Now, to maximize the variance, $\mathbf{v}$ has to be the principal eigenvector of $\mathbf{S}$ (because the eigenvalue is proportional to the variance).

However, something has to be wrong here, because I know that the variance is not just proportional to, but equal to, the largest eigenvalue of $\mathbf{S}$. Where did I go wrong?
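A quick numerical sanity check may help localize the problem. The sketch below assumes NumPy and synthetic data (neither is part of the derivation above; the names `single_power` and `squared_power` are made up for illustration). It compares the empirical variance of the data projected onto a non-unit $\mathbf{v}$ with two candidate formulas, and also checks the variance along the principal eigenvector against the largest eigenvalue of $\mathbf{S}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): N samples, p correlated features.
N, p = 500, 3
X = rng.normal(size=(N, p)) @ rng.normal(size=(p, p))
X = X - X.mean(axis=0)          # center the columns
S = X.T @ X / N                 # sample covariance with the 1/N convention used above

v = np.array([2.0, -1.0, 3.0])  # deliberately not a unit vector
vtv = v @ v

# Empirical variance of the data projected onto the direction of v.
scores = X @ v / np.sqrt(vtv)
var_projected = np.mean(scores ** 2)   # the mean of scores is zero because X is centered

single_power = v @ S @ v / vtv         # v'Sv / (v'v)
squared_power = v @ S @ v / vtv ** 2   # v'Sv / (v'v)^2, the objective used above

# Largest eigenvalue of S and the variance along its (unit) eigenvector.
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues are returned in ascending order
u = eigvecs[:, -1]
var_top = u @ S @ u

print("empirical:", var_projected)
print("v'Sv/(v'v):", single_power, " v'Sv/(v'v)^2:", squared_power)
print("largest eigenvalue:", eigvals[-1], " variance along it:", var_top)
```

If the empirical variance agrees with only one of the two formulas, that points at the objective function rather than at the differentiation steps.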

ethelion
  • At the point where you started differentiating you began working far too hard, increasing the likelihood of making some algebraic mistake. Some helpful ways to think about this problem are discussed in my post at https://stats.stackexchange.com/a/301561/919. – whuber Apr 10 '21 at 12:39
  • @whuber Thanks for the comment and the pointer. In your answer there, you say "we may try to maximize the unconstrained object $\mathbf{x}^\intercal \mathbf{A} \mathbf{x} / \mathbf{x}^\intercal \mathbf{x}$". I tried differentiating this objective function and I arrived at the correct solution. However, my objective function is of the form $\mathbf{x}^\intercal \mathbf{A} \mathbf{x} / (\mathbf{x}^\intercal\mathbf{x})^2$. This must be where I went wrong, right? – ethelion Apr 10 '21 at 15:11
  • In case it's helpful, I have posted another question specifically to verify the calculation of variance of data projected onto a vector that is not necessarily a unit vector. https://stats.stackexchange.com/questions/519050/variance-of-projected-data – ethelion Apr 10 '21 at 16:06
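
For reference, the calculation mentioned in the comments can be sketched as follows. It assumes the objective is the quotient $\mathbf{v}^\intercal \mathbf{S}\mathbf{v} / \mathbf{v}^\intercal\mathbf{v}$ with a single power in the denominator (the form quoted from the linked answer, not something confirmed by an answer on this page), and it reuses the differential notation from the question.

$$ \begin{aligned} d \Big(\frac{\mathbf{v}^\intercal \mathbf{S}\mathbf{v}}{\mathbf{v}^\intercal\mathbf{v}}\Big) &= \frac{2 \mathbf{v}^\intercal \mathbf{S}}{\mathbf{v}^\intercal\mathbf{v}} d \mathbf{v} - (\mathbf{v}^\intercal \mathbf{S}\mathbf{v}) \frac{2 \mathbf{v}^\intercal}{(\mathbf{v}^\intercal\mathbf{v})^2} d \mathbf{v} \\ &= \frac{2}{\mathbf{v}^\intercal\mathbf{v}} \Big(\mathbf{v}^\intercal \mathbf{S} - \frac{\mathbf{v}^\intercal \mathbf{S}\mathbf{v}}{\mathbf{v}^\intercal\mathbf{v}} \mathbf{v}^\intercal \Big) d \mathbf{v} \end{aligned} $$

Setting this to zero and taking the transpose gives

$$ \mathbf{S}\mathbf{v} = \frac{\mathbf{v}^\intercal\mathbf{S}\mathbf{v}}{\mathbf{v}^\intercal\mathbf{v}} \mathbf{v} $$

so under this assumed objective the eigenvalue is the quotient itself, with no factor of 2 and no leftover $\mathbf{v}^\intercal\mathbf{v}$.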
