Local linear embedding (LLE) eliminates the need to estimate distances between distant objects and recovers global non-linear structure from local linear fits. LLE is advantageous because it involves no parameters such as learning rates or convergence criteria. LLE also scales well with the intrinsic dimensionality of $\mathbf{Y}$. The objective function for LLE is
\begin{equation}
\zeta(\mathbf{Y})=\left(\mathbf{Y}- \mathbf{W}\mathbf{Y}\right)^\top \left(\mathbf{Y}- \mathbf{W}\mathbf{Y}\right)
= \mathbf{Y}^\top (\mathbf{I}-\mathbf{W})^\top (\mathbf{I}-\mathbf{W})\mathbf{Y}
\end{equation}
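As a quick numerical check of this identity, the residual form and the quadratic form can be compared directly. The following is a minimal sketch in NumPy, using arbitrary random matrices as stand-ins for $\mathbf{Y}$ and $\mathbf{W}$ rather than data from this work:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 2
Y = rng.normal(size=(n, d))          # stand-in embedding coordinates
W = rng.normal(size=(n, n))          # stand-in weight matrix

# zeta(Y) written as the residual quadratic form (Y - WY)^T (Y - WY)
lhs = (Y - W @ Y).T @ (Y - W @ Y)

# zeta(Y) written with W collected into (I - W): Y^T (I - W)^T (I - W) Y
I = np.eye(n)
rhs = Y.T @ (I - W).T @ (I - W) @ Y

assert np.allclose(lhs, rhs)
\end{verbatim}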
The elements $w_{ij}$ of the weight matrix $\mathbf{W}$ are set to zero when object $j$ is not a nearest neighbor of object $i$; otherwise, the weights for the K-nearest neighbors of object $i$ are determined via a least squares fit of
\begin{equation}
\mathbf{U}=\mathbf{G}\boldsymbol{\beta}
\end{equation}
where the dependent variable $\mathbf{U}$ is a $K \times 1$ vector of ones, $\mathbf{G}$ is the $K \times K$ Gram matrix of the nearest neighbors of object $i$, and $\boldsymbol{\beta}$ is a $K \times 1$ vector of weights constrained to sum to unity.
Let $\mathbf{D}$ be the symmetric $K \times K$ matrix of pairwise distances among the K-nearest neighbors of the $p$-dimensional object $\mathbf{x}_i$. It can be shown that $\mathbf{G}$ is equal to the doubly-centered matrix $\boldsymbol{\tau}$ with elements
\begin{equation}
\tau_{lm}=-\frac{1}{2} \left( d_{lm}^2 - \frac{1}{K}\sum_l d_{lm}^2 - \frac{1}{K}\sum_m d_{lm}^2 + \frac{1}{K^2}\sum_l\sum_m d_{lm}^2 \right).
\end{equation}
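The double centering above can be written compactly as $\boldsymbol{\tau} = -\tfrac{1}{2}\mathbf{J}\mathbf{D}^{(2)}\mathbf{J}$, where $\mathbf{D}^{(2)}$ denotes the matrix of squared distances and $\mathbf{J} = \mathbf{I} - \tfrac{1}{K}\mathbf{1}\mathbf{1}^\top$ is the centering matrix. The following minimal sketch (NumPy, with an arbitrary simulated neighborhood of $K=9$ points) checks that the element-wise formula and the matrix form agree:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
K, p = 9, 5
neighbors = rng.normal(size=(K, p))          # stand-in K-nearest neighbors of x_i

# squared Euclidean distances between all pairs of neighbors
diff = neighbors[:, None, :] - neighbors[None, :, :]
D2 = (diff ** 2).sum(axis=2)

# element-wise double centering: subtract row and column means, add grand mean
tau = -0.5 * (D2
              - D2.mean(axis=0, keepdims=True)
              - D2.mean(axis=1, keepdims=True)
              + D2.mean())

# equivalent matrix form: tau = -1/2 * J D2 J with J = I - (1/K) 1 1^T
J = np.eye(K) - np.ones((K, K)) / K
assert np.allclose(tau, -0.5 * J @ D2 @ J)
\end{verbatim}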
The $K$ regression coefficients are determined numerically using
\begin{equation}
\underset{K \times 1}{\boldsymbol{\beta}}=\underset{K \times K}{(\boldsymbol{\tau}^\top \boldsymbol{\tau})^{-1}}\,\underset{K \times 1}{\boldsymbol{\tau}^\top\mathbf{U}}.
\end{equation}
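For concreteness, a minimal sketch of the weight solve for a single object is given below. It follows the commonly used LLE formulation in which the local Gram matrix is built from the differences $\mathbf{x}_j-\mathbf{x}_i$ and a small ridge term is added for numerical stability, so it should be read as an illustration of the step rather than a verbatim transcription of the equations above:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
K, p = 9, 5
x_i = rng.normal(size=p)                       # the object being reconstructed
neighbors = rng.normal(size=(K, p))            # its K-nearest neighbors (stand-ins)

# local Gram matrix of the centered neighborhood, G_lm = (x_l - x_i) . (x_m - x_i)
Z = neighbors - x_i
G = Z @ Z.T

U = np.ones(K)                                 # K x 1 vector of ones

# solve G beta = U; a small ridge term guards against a singular G
G_reg = G + 1e-3 * np.trace(G) * np.eye(K)
beta = np.linalg.solve(G_reg, U)

# rescale so the reconstruction weights sum to unity
beta = beta / beta.sum()
assert np.isclose(beta.sum(), 1.0)
\end{verbatim}
The ridge term is a standard safeguard for the case where $K$ exceeds the local dimensionality of the data and $\mathbf{G}$ becomes singular.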
The resulting coefficients are checked to confirm that they sum to unity. The values of $\boldsymbol{\beta}$ are placed into row $i$ of $\mathbf{W}$ at the column positions corresponding to the K-nearest neighbors of object $i$, along with the corresponding transpose elements. This procedure is repeated for every object $i$ in the dataset. It warrants noting that if the number of nearest neighbors $K$ is too small, $\mathbf{W}$ can be so sparse that the eigenanalysis becomes difficult. It was observed that $K=9$ nearest neighbors resulted in $\mathbf{W}$ matrices that were free of such pathologies during eigenanalysis. The objective function is minimized by finding the smallest non-zero eigenvalues of
\begin{equation}
(\mathbf{I}-\mathbf{W})^\top(\mathbf{I}-\mathbf{W})\mathbf{E}=\mathbf{E}\boldsymbol{\Lambda}.
\end{equation}
The reduced form of $\mathbf{X}$ is given by $\mathbf{Y}=\mathbf{E}$, where $\mathbf{E}$ has dimensions $n \times 2$ and contains the eigenvectors associated with the two smallest non-zero eigenvalues in $\boldsymbol{\Lambda}$.
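A minimal sketch of this final embedding step follows (NumPy). The weight matrix \texttt{W} is assumed to have been assembled row by row as described above; here a random row-stochastic matrix is used purely as a stand-in so that the snippet runs on its own:
\begin{verbatim}
import numpy as np

def lle_embedding(W, n_components=2):
    """Embed using the eigenvectors of (I - W)^T (I - W) associated with
    the smallest non-zero eigenvalues; the zero eigenvalue belongs to the
    constant eigenvector and is discarded."""
    n = W.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    eigvals, eigvecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    # skip index 0 (eigenvalue ~ 0), keep the next n_components eigenvectors
    return eigvecs[:, 1:n_components + 1]

# usage with an arbitrary stand-in weight matrix whose rows sum to unity
rng = np.random.default_rng(3)
W = rng.random((50, 50))
W /= W.sum(axis=1, keepdims=True)
Y = lle_embedding(W)                            # n x 2 reduced representation
print(Y.shape)                                  # (50, 2)
\end{verbatim}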