I have a question about probabilistic PCA (PPCA) and regular PCA, particularly regarding transforming to and from the latent space. The main question (detailed in the following) is: when are the eigenvalues of the covariance matrix used in the transformations?
In all cases, I assume $X \in \mathbb{R}^{m\times n}$, where each row is a datum and each column is a feature, but a single datum is written as a column vector $x_j\in\mathbb{R}^{n\times 1}\equiv \mathbb{R}^{n}$ when alone (unfortunate habits of some fields). Let's also assume $X$ is centered, for simplicity.
PCA
There are two methods to write the PC decomposition. One is to use the SVD of the data matrix: $$ X = U\Sigma V^T\;\;\;\implies\;\;\; XV = U\Sigma =: Z $$ thus we have that $V^T$ is the transformation matrix; i.e., $$ z_\ell = V^T x_\ell $$ is the mapping from the data space to the latent space. Geometrically, the columns of $V$ are orthogonal axes of the principal space, and we are simply projecting onto them.
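To make the notation concrete, here is a minimal numpy sketch of the SVD route (the sizes $m=200$, $n=5$ and the random data are just placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 5                       # placeholder sizes: m data points, n features
X = rng.normal(size=(m, n))
X -= X.mean(axis=0)                 # center, as assumed above

# Thin SVD: X = U @ diag(s) @ Vt, so the principal directions are the columns of V = Vt.T
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T                        # scores Z = X V = U Sigma
assert np.allclose(Z, U * s)        # U * s is U @ diag(s) via broadcasting
```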
The other method is eigendecomposition of the covariance matrix: $$ \hat{C} = V\Lambda V^T\;\;\; \text{where} \;\;\;\hat{C}=\frac{1}{m-1} X^T X$$ so that the eigenvectors of $\hat{C}$ (the columns of $V$) can form the basis of a new space. Each eigenvalue $\lambda_i$ in $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$ is often called the "explained variance" of axis $i$.
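And the covariance route, which (if I have the normalization right) should give the same $V$, with $\lambda_i = s_i^2/(m-1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 5                       # same placeholder data as above
X = rng.normal(size=(m, n))
X -= X.mean(axis=0)

C = X.T @ X / (m - 1)               # sample covariance of the centered data
lam, V = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]      # reorder to descending, matching the SVD convention

U, s, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(lam, s**2 / (m - 1))        # lambda_i = s_i^2 / (m - 1)
assert np.allclose(V @ np.diag(lam) @ V.T, C)  # C = V Lambda V^T
```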
For dimensionality reduction, we take only the first $k$ columns of $V$ as the basis of the latent space (so $z_j\in\mathbb{R}^k$), truncating $V$ into a new matrix $V_k \in \mathbb{R}^{n \times k}$. Then the transformation equations are: \begin{align} z &= V_k^T x \\ \hat{x} &= V_k z \end{align} where $x$ and $z$ are any vectors in the data and latent spaces, respectively. Notice that we do not use $\Lambda$ or $\Sigma$; hopefully this is correct.
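In code, this is how I understand the truncated transformations (placeholder data again, with $k=2$):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 5, 2                 # placeholder sizes
X = rng.normal(size=(m, n))
X -= X.mean(axis=0)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
Vk = Vt[:k].T                       # n x k: first k principal directions

x = X[0]                            # any centered data vector
z = Vk.T @ x                        # data space -> latent space, z in R^k
x_hat = Vk @ z                      # latent space -> data space (rank-k reconstruction)
# Neither Lambda nor Sigma appears; x_hat is just the orthogonal projection of x
# onto the span of the first k principal directions.
```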
PPCA
The PPCA article assumes the relation between $z$ and $x$ can be modelled by a probability model: \begin{align} p(x|z) &= \mathcal{N}( Wz + \mu, \sigma^2 I) \\ \mathbb{E}[z|x] &= M^{-1} W^T (x - \mu) \end{align} with $z\sim \mathcal{N}(0,I)$, $M = W^TW + \sigma^2 I$, and $\mu= 0$ since $X$ is centered.
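As a sanity check on my reading of the model, here is a small sketch of the generative direction and the posterior mean, using an arbitrary (not maximum likelihood) loading matrix $W$ and a made-up $\sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, sigma = 5, 2, 0.1             # made-up dimensions and noise level
W = rng.normal(size=(n, k))         # arbitrary loading matrix (mu = 0, X centered)

# Generative direction: z ~ N(0, I_k), then x | z ~ N(W z, sigma^2 I_n)
z = rng.normal(size=k)
x = W @ z + sigma * rng.normal(size=n)

# Posterior mean of z given x: M^{-1} W^T x, with M = W^T W + sigma^2 I (k x k)
M = W.T @ W + sigma**2 * np.eye(k)
z_post = np.linalg.solve(M, W.T @ x)
```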
The authors show that the maximum likelihood estimate of $W$ (for a given $k$) is $$ W = V_k (\Lambda_k - \sigma^2 I)^{1/2} R, $$ where $R$ is an arbitrary orthogonal matrix. So $W$ is not the orthogonal matrix $V$ of eigenvectors; it is instead a rotation and axis-wise scaling of the original principal directions. I'll assume $R = I$ (as in the original article).
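So, as I understand it, the ML $W$ would be built like this (treating $\sigma^2$ as given rather than using its ML value, the average of the discarded eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k, sigma = 200, 5, 2, 0.1     # placeholder sizes and noise level
X = rng.normal(size=(m, n))
X -= X.mean(axis=0)

C = X.T @ X / (m - 1)
lam, V = np.linalg.eigh(C)
lam, V = lam[::-1], V[:, ::-1]      # descending eigenvalues / eigenvectors
Vk, lam_k = V[:, :k], lam[:k]

# W = V_k (Lambda_k - sigma^2 I)^{1/2}, taking R = I (requires lam_k >= sigma^2)
W = Vk @ np.diag(np.sqrt(lam_k - sigma**2))
```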
So the transformations now look as follows. Given a point in the latent space, the mean of the data-space posterior is \begin{align} \mathbb{E}[x|z] &= Wz = V_k (\Lambda_k - \sigma^2 I)^{1/2} z, \end{align} while the mean of the latent-space posterior given a datum $x$ is $$ \mathbb{E}[z|x] = (W^TW + \sigma^2 I)^{-1} W^T x. $$ Now suppose $\sigma\rightarrow 0$. Then \begin{align} \mathbb{E}[x|z] &= V_k \sqrt{\Lambda_k}\, z \end{align} transforms from the latent space to the data space, and \begin{align} \mathbb{E}[z|x] &= (W^TW + \sigma^2 I)^{-1} W^T x \\ &= (W^TW)^{-1} W^T x \\ &= (\sqrt{\Lambda_k}\, V_k^T V_k \sqrt{\Lambda_k})^{-1} \sqrt{\Lambda_k}\, V_k^T x \\ &= \Lambda_k^{-1/2} \underbrace{(V_k^T V_k)^{-1}}_{I} V_k^T x \\ &= \Lambda_k^{-1/2} V_k^T x \end{align} transforms from the data space to the latent space.
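Numerically this seems to check out; with a tiny $\sigma$ standing in for the $\sigma\rightarrow 0$ limit (placeholder data again), the PPCA latent mean is the PCA score rescaled by $\Lambda_k^{-1/2}$, and composing the two maps gives back the ordinary PCA reconstruction:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k, sigma = 200, 5, 2, 1e-8    # near-zero sigma stands in for the limit
X = rng.normal(size=(m, n))
X -= X.mean(axis=0)

C = X.T @ X / (m - 1)
lam, V = np.linalg.eigh(C)
lam, V = lam[::-1], V[:, ::-1]
Vk, lam_k = V[:, :k], lam[:k]
W = Vk @ np.diag(np.sqrt(lam_k - sigma**2))

x = X[0]
M = W.T @ W + sigma**2 * np.eye(k)
z_ppca = np.linalg.solve(M, W.T @ x)           # E[z|x]
z_pca = Vk.T @ x                               # PCA score

# PPCA latent mean = PCA score scaled by Lambda_k^{-1/2}
assert np.allclose(z_ppca, z_pca / np.sqrt(lam_k))
# Composing both maps reproduces the ordinary PCA reconstruction
assert np.allclose(W @ z_ppca, Vk @ (Vk.T @ x))
```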
Notice that (1) these transformations are not the same as those of PCA, despite the disappearance of $\sigma$, and (2) $\Lambda$ appears in them. As far as I can tell, it merely scales the axes by the standard deviations along the principal directions, but why does it appear here and not in PCA? For reconstruction (as long as one is consistent) it seems not to matter, as the check above suggests, but how does it affect each mapping on its own (i.e., $z\rightarrow x$ and $x\rightarrow z$ separately)?
Furthermore, looking at this nice answer, the answerer notes that to map a $k$-dimensional vector representing a point in the "reduced" $k$-dimensional space back to the data space, one multiplies it by $S_k V_k^T$, where $S_k = \Sigma_k$ is the truncated matrix of singular values, i.e. $\sqrt{\Lambda_k}$ up to the $\sqrt{m-1}$ factor from the covariance normalization. The eigenvalues seemingly do not appear elsewhere in the transformations.
Question Summary
When should the eigenvalues of the covariance matrix be used to scale the components in PCA and PPCA? Does PPCA differ from regular PCA in this respect? I am asking specifically in the context of transforming to and from the latent space.