Local linear embedding (LLE) eliminates the need to estimate distances between distant objects and recovers global non-linear structure from local linear fits. LLE is advantageous because it involves no parameters such as learning rates or convergence criteria. LLE also scales well with the intrinsic dimensionality of $\mathbf{Y}$. The objective function for LLE is
\begin{equation}
\zeta(\mathbf{Y})=\left(\mathbf{Y}- \mathbf{W}\mathbf{Y}\right)^\top \left(\mathbf{Y}- \mathbf{W}\mathbf{Y}\right)
= \mathbf{Y}^\top (\mathbf{I}-\mathbf{W})^\top (\mathbf{I}-\mathbf{W})\mathbf{Y}
\end{equation}
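As a quick numerical check of this identity, the residual form and the quadratic form can be compared directly. The following is a minimal sketch in NumPy, using arbitrary random matrices as stand-ins for $\mathbf{Y}$ and $\mathbf{W}$ rather than data from this work:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 2
Y = rng.normal(size=(n, d))          # stand-in embedding coordinates
W = rng.normal(size=(n, n))          # stand-in weight matrix

# zeta(Y) written as the residual quadratic form (Y - WY)^T (Y - WY)
lhs = (Y - W @ Y).T @ (Y - W @ Y)

# zeta(Y) written with W collected into (I - W): Y^T (I - W)^T (I - W) Y
I = np.eye(n)
rhs = Y.T @ (I - W).T @ (I - W) @ Y

assert np.allclose(lhs, rhs)
\end{verbatim}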
The elements $w_{ij}$ of the weight matrix $\mathbf{W}$ are set to zero when object $j$ is not a nearest neighbor of object $i$; otherwise, the weights for the K-nearest neighbors of object $i$ are determined via a least squares fit of
\begin{equation}
\mathbf{U}=\mathbf{G}\boldsymbol{\beta}
\end{equation}
where the dependent variable $\mathbf{U}$ is a $K \times 1$ vector of ones, $\mathbf{G}$ is the $K \times K$ Gram matrix of the nearest neighbors of object $i$, and $\boldsymbol{\beta}$ is a $K \times 1$ vector of weights constrained to sum to unity.
Let $\mathbf{D}$ be the symmetric $K \times K$ matrix of pairwise distances among the K-nearest neighbors of the $p$-dimensional object $\mathbf{x}_i$. It can be shown that $\mathbf{G}$ is equal to the doubly-centered matrix $\boldsymbol{\tau}$ with elements
\begin{equation}
\tau_{lm}=-\frac{1}{2} \left( d_{lm}^2 - \frac{1}{K}\sum_l d_{lm}^2 - \frac{1}{K}\sum_m d_{lm}^2 + \frac{1}{K^2}\sum_l\sum_m d_{lm}^2 \right).
\end{equation}
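The double centering above can be written compactly as $\boldsymbol{\tau} = -\tfrac{1}{2}\mathbf{J}\mathbf{D}^{(2)}\mathbf{J}$, where $\mathbf{D}^{(2)}$ denotes the matrix of squared distances and $\mathbf{J} = \mathbf{I} - \tfrac{1}{K}\mathbf{1}\mathbf{1}^\top$ is the centering matrix. The following minimal sketch (NumPy, with an arbitrary simulated neighborhood of $K=9$ points) checks that the element-wise formula and the matrix form agree:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
K, p = 9, 5
neighbors = rng.normal(size=(K, p))          # stand-in K-nearest neighbors of x_i

# squared Euclidean distances between all pairs of neighbors
diff = neighbors[:, None, :] - neighbors[None, :, :]
D2 = (diff ** 2).sum(axis=2)

# element-wise double centering: subtract row and column means, add grand mean
tau = -0.5 * (D2
              - D2.mean(axis=0, keepdims=True)
              - D2.mean(axis=1, keepdims=True)
              + D2.mean())

# equivalent matrix form: tau = -1/2 * J D2 J with J = I - (1/K) 1 1^T
J = np.eye(K) - np.ones((K, K)) / K
assert np.allclose(tau, -0.5 * J @ D2 @ J)
\end{verbatim}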
The $K$ regression coefficients are determined numerically using
\begin{equation}
\underset{K \times 1}{\boldsymbol{\beta}}=\underset{K \times K}{(\boldsymbol{\tau}^\top \boldsymbol{\tau})^{-1}}\,\underset{K \times 1}{\boldsymbol{\tau}^\top\mathbf{U}}.
\end{equation}
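For concreteness, a minimal sketch of the weight solve for a single object is given below. It follows the commonly used LLE formulation in which the local Gram matrix is built from the differences $\mathbf{x}_j-\mathbf{x}_i$ and a small ridge term is added for numerical stability, so it should be read as an illustration of the step rather than a verbatim transcription of the equations above:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
K, p = 9, 5
x_i = rng.normal(size=p)                       # the object being reconstructed
neighbors = rng.normal(size=(K, p))            # its K-nearest neighbors (stand-ins)

# local Gram matrix of the centered neighborhood, G_lm = (x_l - x_i) . (x_m - x_i)
Z = neighbors - x_i
G = Z @ Z.T

U = np.ones(K)                                 # K x 1 vector of ones

# solve G beta = U; a small ridge term guards against a singular G
G_reg = G + 1e-3 * np.trace(G) * np.eye(K)
beta = np.linalg.solve(G_reg, U)

# rescale so the reconstruction weights sum to unity
beta = beta / beta.sum()
assert np.isclose(beta.sum(), 1.0)
\end{verbatim}
The ridge term is a standard safeguard for the case where $K$ exceeds the local dimensionality of the data and $\mathbf{G}$ becomes singular.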
The resulting coefficients are checked to confirm that they sum to unity. The values of $\boldsymbol{\beta}$ are placed into row $i$ of $\mathbf{W}$ at the column positions corresponding to the K-nearest neighbors of object $i$, along with the corresponding transpose elements. This procedure is repeated for every object $i$ in the dataset. It warrants noting that if the number of nearest neighbors $K$ is too small, $\mathbf{W}$ can be so sparse that the eigenanalysis becomes difficult. It was observed that $K=9$ nearest neighbors resulted in $\mathbf{W}$ matrices that were free of such pathologies during eigenanalysis. The objective function is minimized by finding the smallest non-zero eigenvalues of
\begin{equation}
(\mathbf{I}-\mathbf{W})^\top(\mathbf{I}-\mathbf{W})\mathbf{E}=\mathbf{E}\boldsymbol{\Lambda}.
\end{equation}
The reduced form of $\mathbf{X}$ is given by $\mathbf{Y}=\mathbf{E}$, where $\mathbf{E}$ has dimensions $n \times 2$ and contains the eigenvectors associated with the two smallest non-zero eigenvalues in $\boldsymbol{\Lambda}$.
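A minimal sketch of this final embedding step follows (NumPy). The weight matrix \texttt{W} is assumed to have been assembled row by row as described above; here a random row-stochastic matrix is used purely as a stand-in so that the snippet runs on its own:
\begin{verbatim}
import numpy as np

def lle_embedding(W, n_components=2):
    """Embed using the eigenvectors of (I - W)^T (I - W) associated with
    the smallest non-zero eigenvalues; the zero eigenvalue belongs to the
    constant eigenvector and is discarded."""
    n = W.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    eigvals, eigvecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    # skip index 0 (eigenvalue ~ 0), keep the next n_components eigenvectors
    return eigvecs[:, 1:n_components + 1]

# usage with an arbitrary stand-in weight matrix whose rows sum to unity
rng = np.random.default_rng(3)
W = rng.random((50, 50))
W /= W.sum(axis=1, keepdims=True)
Y = lle_embedding(W)                            # n x 2 reduced representation
print(Y.shape)                                  # (50, 2)
\end{verbatim}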