As the title states: PCA is equivalent to an autoencoder with a single hidden layer and linear transfer functions.
Can someone explain this sentence to me? My understanding is that PCA gives the optimal linear mapping of data into a lower-dimensional space, in the sense of minimizing reconstruction error. Given that, if an autoencoder is trained with a loss function that seeks the best representation in a lower-dimensional space (say of dimension H), will PCA using the top H principal components produce exactly the same result as the autoencoder?
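For concreteness, here is a minimal numerical sketch of the comparison I have in mind (the data, names, and training setup are purely illustrative, and the autoencoder is trained with plain hand-written gradient descent on squared error):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 8 dimensions that mostly lie on a 2-D subspace,
# plus a little isotropic noise. All names and values here are illustrative.
n, d, H = 200, 8, 2
latent = rng.normal(size=(n, H))
mixing = rng.normal(size=(H, d)) / np.sqrt(d)
X = latent @ mixing + 0.1 * rng.normal(size=(n, d))
X = X - X.mean(axis=0)                 # center the data, as PCA assumes

# PCA: project onto the top-H right singular vectors (principal directions).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:H].T @ Vt[:H]
pca_mse = np.mean((X - X_pca) ** 2)

# Linear autoencoder: one hidden layer of width H, no nonlinearity,
# trained by gradient descent on the same mean-squared reconstruction error.
W_enc = 0.1 * rng.normal(size=(d, H))
W_dec = 0.1 * rng.normal(size=(H, d))
lr = 0.05
for _ in range(10000):
    code = X @ W_enc                        # encode to H dimensions
    X_hat = code @ W_dec                    # decode back to d dimensions
    grad_out = 2.0 * (X_hat - X) / (n * d)  # d(MSE)/d(X_hat)
    W_dec -= lr * (code.T @ grad_out)
    W_enc -= lr * (X.T @ (grad_out @ W_dec.T))

ae_mse = np.mean((X - X @ W_enc @ W_dec) ** 2)
print(pca_mse, ae_mse)
```

If the equivalence holds, `ae_mse` should approach `pca_mse` from above (PCA is the optimum within this rank-H linear family), even though the learned weights need not equal the principal components themselves, since any invertible remixing of the hidden code gives the same reconstruction.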
Which of the two is faster in practice, and is one a better approach than the other?