
(This question is from my pattern recognition course.) There is this exercise:

Imagine we have $N$ samples with $n$ dimensions.

First it asks us to find a point $m$ for which the sum of Euclidean distances from the samples to $m$ is minimum. Then imagine another vector $e$ of unit length that passes through $m$; every point on this line can be written as $x = m + \alpha e$. For each sample $x_k$, the value $\alpha_k$ locates the point on the line whose distance to $x_k$ is minimum. The exercise then asks me to find the values of $\alpha_k$ for which these distances are minimum (i.e., the dashed lines in the figure below).
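A short derivation sketch (not spelled out in the exercise): setting the derivative of the squared distance $\|m + \alpha e - x_k\|^2$ with respect to $\alpha$ to zero, and using $\|e\| = 1$, gives the orthogonal projection

$$\frac{d}{d\alpha}\|m + \alpha e - x_k\|^2 = 2e^t(m + \alpha e - x_k) = 0 \quad\Longrightarrow\quad \alpha_k = e^t(x_k - m).$$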

The last part asks me to prove that the desired values of $\alpha_k$ are actually the eigenvector with the maximum eigenvalue of the following estimate of the covariance matrix: $$\Sigma=\frac{1}{N}\sum_{k=0}^{N}(x_k-m)(x_k-m)^t.$$

[Figure: the samples $x_k$, the line $x = m + \alpha e$ through $m$ in direction $e$, and the dashed segments marking each sample's minimum distance to the line]
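Not part of the original exercise, but a minimal NumPy sanity check of the claim: it generates hypothetical data, computes $m$ as the sample mean and the projections $\alpha_k = e^t(x_k - m)$, and verifies numerically that the eigenvector with the largest eigenvalue of the covariance estimate yields a smaller sum of squared distances than random directions (the data and the names `samples` and `sq_residuals` are mine, not from the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 200, 3
# Hypothetical data: N samples in n dimensions, stretched along the axes.
samples = rng.normal(size=(N, n)) @ np.diag([3.0, 1.0, 0.5])

m = samples.mean(axis=0)   # minimizer of the sum of squared distances
X = samples - m            # centered data

def sq_residuals(e):
    """Sum of squared distances from the samples to the line x = m + alpha*e."""
    e = e / np.linalg.norm(e)   # enforce ||e|| = 1
    alpha = X @ e               # alpha_k = e^t (x_k - m): orthogonal projections
    return np.sum(np.linalg.norm(X - np.outer(alpha, e), axis=1) ** 2)

cov = X.T @ X / N                       # covariance estimate from the question
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
e_top = eigvecs[:, -1]                  # eigenvector with the largest eigenvalue

print(sq_residuals(e_top))                   # smallest attainable value
for _ in range(5):
    print(sq_residuals(rng.normal(size=n)))  # each should be >= the line above
```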

  • If there is anything unclear about it, just ask. These questions are really hard for me! – A.s Oct 06 '13 at 16:34
  • If your least squares linear fit goes through the origin, they are related. – Memming Oct 06 '13 at 16:35
  • They are random samples; they may or may not be centered. – A.s Oct 06 '13 at 16:35
  • Regarding the relationship between eigenvectors & a regression line, it may help you to read this thread: [Making sense of principal component analysis, eigenvectors & eigenvalues](http://stats.stackexchange.com/questions/2691/), & possibly my answer here: [What is the difference between linear regression on Y with X and X with Y?](http://stats.stackexchange.com/questions/22718//22721#22721). – gung - Reinstate Monica Oct 06 '13 at 16:58
  • We welcome questions of this type, but we treat them differently (see our [help page](http://stats.stackexchange.com/help/on-topic)). We need to know what you understand / have done so far, & then we provide hints to get you unstuck. – gung - Reinstate Monica Oct 06 '13 at 17:07
  • The question makes no sense, because for each $e$ each $x_k$ determines its *own* value of $\alpha_k$, whence there are $N$, not just $1$, $\alpha_k$ for each $e$. Furthermore, the covariance matrix is $n$ by $n$, whence any eigenvector will have $n$ dimensions. Unless $N+1$ (the number of $\alpha_k$) and $n$ are identical, you can't possibly think of $(\alpha_k)$ being an eigenvector. Finally, if you do want to find the maximum eigenvalue (and its eigenvector), you want to find a direction $e$ for which the *sum of squares* of the $\alpha_k$ is *maximized*, not minimized. – whuber Oct 07 '13 at 14:51
  • I think you are right, there is something wrong. I have been thinking about it for days now and it makes no sense at all! I can see that $m$ is the mean value of the samples, and that if we want to minimize the distance to $x_k$, the estimate $m + \alpha e$ must be the orthogonal projection of $x_k$ onto that line. – A.s Oct 08 '13 at 19:17

1 Answer


This question is rather old, but it still shows up in searches, so I think it is worth answering in part.

What you are doing is fitting a least squares regression line through the mean $\bar x$ of your data $\vec x$. If you normalize your data, then the regression line is an eigenvector of the linear transformation matrix.

In this context, to normalize the data you would subtract the mean, $\bar x$, from all observations, $\vec x$, and divide each observation by the standard deviation, $s$, of the observations. This normalization yields a dataset with a mean of zero and a standard deviation of 1, so $\bar x$ sits at the origin.
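A minimal sketch of that normalization, assuming NumPy and a made-up 1-D array of observations (`x` is a placeholder name, not data from the question):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical observations
z = (x - x.mean()) / x.std()  # subtract the mean, divide by the standard deviation

print(z.mean())  # ~0: the normalized data is centered at the origin
print(z.std())   # 1.0: unit standard deviation
```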

  • Least squares regression of what variable against what other variable? With $n=2$, if you regress either component against the other, you will *not* obtain the line described in the question (unless the variables are perfectly correlated to begin with). For larger $n$ it's unclear what you would be regressing to obtain a line. – whuber Oct 26 '17 at 22:39