I think we need to break this question down further before we can answer it.
First, I think the primary comparison is between AE and VAE, given that both can be applied to dimensionality reduction. The advantage of VAE in this case is clearly answered here. The main point is that, in addition to the abilities of an AE, a VAE has more parameters to tune, which gives significant control over how we want to model the latent distribution.
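To make the "control over the latent distribution" point concrete, here is a minimal NumPy sketch of a typical VAE objective. It assumes a squared-error reconstruction term, a diagonal Gaussian posterior, and a standard normal prior. The function name `vae_loss` and the `beta` weight are illustrative assumptions, not something taken from the question.

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Sketch of a VAE loss: reconstruction error plus a weighted KL term.

    x, x_recon : arrays of shape (batch, features)
    mu, log_var: encoder outputs of shape (batch, latent_dim)
    beta       : hypothetical knob that trades reconstruction quality
                 against how closely the latent code follows N(0, I)
    """
    x = np.asarray(x, dtype=np.float64)
    x_recon = np.asarray(x_recon, dtype=np.float64)
    mu = np.asarray(mu, dtype=np.float64)
    log_var = np.asarray(log_var, dtype=np.float64)

    # Reconstruction term: the same quantity a plain AE would minimise.
    recon = np.sum((x - x_recon) ** 2, axis=1)

    # KL divergence between N(mu, diag(exp(log_var))) and N(0, I).
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)

    # Average over the batch.
    return float(np.mean(recon + beta * kl))
```

With `beta=0` this reduces to a plain reconstruction loss, so the KL term is what a VAE adds on top of an AE in this sketch.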
Second, what do you mean by "Saw well-applied VAE on mnist"? If it means you looked at the resulting images and concluded that they are of better quality for the VAE than for the SAE, I think you should also check the MSE loss values between the input and output of each network.
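A minimal sketch of that check, assuming you already have the test inputs and each model's reconstructions as arrays of the same shape (the function name `reconstruction_mse` is made up for illustration):

```python
import numpy as np

def reconstruction_mse(inputs, reconstructions):
    """Mean squared error between inputs and their reconstructions,
    averaged over every sample and every pixel/feature."""
    inputs = np.asarray(inputs, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)
    if inputs.shape != reconstructions.shape:
        raise ValueError("inputs and reconstructions must have the same shape")
    return float(np.mean((inputs - reconstructions) ** 2))
```

Computing this on the same held-out images for both models gives a number to compare alongside the visual impression. Keep in mind that a lower MSE does not always mean the images look better.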
Third, SAE is itself a version of AE in which the hidden layers are trained by unsupervised pre-training. So, at a higher level, it helps you find a better solution for the AE. On the other hand, a VAE may further improve the solution, depending on its settings.
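As a rough illustration of the unsupervised pre-training idea, the sketch below trains each hidden layer as a small autoencoder on the codes produced by the layer before it. Everything here is an assumption made for the example (tanh encoder, linear decoder, plain gradient descent, the function names and hyperparameters); it is not a reference implementation of SAE.

```python
import numpy as np

def train_autoencoder_layer(data, hidden_size, epochs=100, lr=0.05, seed=0):
    """Train one tanh-encoder / linear-decoder autoencoder with gradient
    descent on squared reconstruction error; return the encoder parameters."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    w_enc = rng.normal(0.0, 0.1, (d, hidden_size))
    b_enc = np.zeros(hidden_size)
    w_dec = rng.normal(0.0, 0.1, (hidden_size, d))
    b_dec = np.zeros(d)
    for _ in range(epochs):
        h = np.tanh(data @ w_enc + b_enc)      # encode
        recon = h @ w_dec + b_dec              # decode
        err = (recon - data) / n               # gradient of 0.5 * mean squared error
        dh = (err @ w_dec.T) * (1.0 - h ** 2)  # backpropagate through tanh
        w_dec -= lr * (h.T @ err)
        b_dec -= lr * err.sum(axis=0)
        w_enc -= lr * (data.T @ dh)
        b_enc -= lr * dh.sum(axis=0)
    return w_enc, b_enc

def greedy_pretrain(data, layer_sizes):
    """Pre-train a stack of encoder layers one at a time."""
    codes = np.asarray(data, dtype=np.float64)
    layers = []
    for size in layer_sizes:
        w, b = train_autoencoder_layer(codes, size)
        layers.append((w, b))
        codes = np.tanh(codes @ w + b)  # input for the next layer
    return layers, codes
```

The pre-trained layers would then normally be used to initialise the full AE before fine-tuning it end to end.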