In probabilistic PCA (PPCA), dimensionality reduction is modeled as the probabilistic model
$t = Wx + \mu + \epsilon$
where
$W$ is a non-square matrix of size $(d \times q)$, with $q < d$
$x$ is a vector from the $q$-dimensional "latent" space
$t$ is a vector from the $d$-dimensional "observable" space
$\epsilon$ is a normally distributed random variable, covering the "insufficiency" of the transform from the low-dimensional space to the high-dimensional one.
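For concreteness, here is a minimal NumPy sketch of this generative model (the dimensions and noise scale are made-up values):

```python
import numpy as np

rng = np.random.default_rng(0)

d, q, sigma = 5, 2, 0.1          # observable dim, latent dim, noise scale (made up)
W = rng.standard_normal((d, q))  # the non-square (d x q) matrix W
mu = rng.standard_normal(d)      # mean vector in the observable space

x = rng.standard_normal(q)            # latent vector, x ~ N(0, I_q)
eps = sigma * rng.standard_normal(d)  # isotropic noise, eps ~ N(0, sigma^2 I_d)
t = W @ x + mu + eps                  # observable vector: t = Wx + mu + eps
```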
So, the forward transform (from the latent space to the observable space) is performed by the matrix
$W$
and the reverse transform is performed by the matrix
$(\sigma^2 I + W^T W)^{-1} W^T$
(the formula for the "posterior" mean, in the middle of a paragraph in chapter 3.3).
I know the matrices are not square, but why are the forward and reverse transforms so different?
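Continuing the sketch above, a quick numeric check of what this matrix does (it recovers the latent vector only approximately, shrunk toward zero by the noise term):

```python
# Reverse transform: posterior mean of x given t, using the matrix above.
M = sigma**2 * np.eye(q) + W.T @ W         # q x q, invertible whenever sigma > 0
x_post = np.linalg.solve(M, W.T @ (t - mu))

print(x)       # original latent vector
print(x_post)  # close to x, but shrunk toward 0 -- not an exact inverse of W
```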
UPDATE
Now I have seen formula (16),
which claims the reverse transform is done with the matrix
$W(W^T W)^{-1}(\sigma^2 I + W^T W)$
Anyway, the question is: why is it so complex, and is there any notion of a matrix inverse for non-square matrices in this case?
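A numeric sanity check of how the two formulas fit together (continuing the sketch above): composing the posterior-mean map with the matrix from formula (16) cancels the $\sigma^2 I + W^T W$ factors, leaving $W(W^T W)^{-1} W^T$, i.e. projection through the Moore-Penrose pseudoinverse:

```python
# Formula (16): map the posterior mean x_post back to the observable space.
t_hat = W @ np.linalg.solve(W.T @ W, M @ x_post) + mu

# The composition collapses to W pinv(W) applied to (t - mu): the orthogonal
# projection onto the column space of W, via the Moore-Penrose pseudoinverse.
t_proj = W @ np.linalg.pinv(W) @ (t - mu) + mu
print(np.allclose(t_hat, t_proj))  # True
```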