
I am reading about autoencoders in Ian Goodfellow's Deep Learning book, and it makes this statement about the autoencoder learning process on page 494:

"Unfortunately, if the encoder and the decoder are allowed too much capacity, the autoencoder can learn to perform the copying task without extracting useful information about the distribution of the data."

Can someone please explain what they mean by this sentence? Capacity in what sense?

arilwan
  • See https://stats.stackexchange.com/questions/312424/what-is-the-capacity-of-a-machine-learning-model/312578 – doubllle Jul 31 '20 at 11:47

1 Answer


A relevant question. The simplest autoencoder has the NN topology:

$n-k-n$

with $n$ the number of input/output nodes and $k$ the number of hidden nodes.

We speak of encoding/decoding when $k < n$. When the activation functions of the hidden and output nodes are linear ($y = x$), the autoencoder performs principal component analysis (PCA). This case has been analyzed in depth in the literature.
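To make the equivalence concrete, here is a minimal numpy sketch (my own illustration, not from the book): the closed-form optimum of a linear $n$-$k$-$n$ autoencoder under squared-error loss is projection onto the top-$k$ principal components, which we can compute directly via the SVD instead of training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples with n = 5 features, centred.
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)

# Closed-form optimum of a linear n-k-n autoencoder:
# project onto the top-k principal directions.
k = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:k].T               # top-k principal directions, shape (n, k)

codes = X @ V_k              # "encoder": n -> k
X_hat = codes @ V_k.T        # "decoder": k -> n

# X_hat is the best rank-k linear reconstruction of X.
err = np.linalg.norm(X - X_hat)
```

Note that nothing forces a gradient-trained linear autoencoder to recover exactly these directions; it converges to the same $k$-dimensional subspace, which gives the same reconstruction.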

The leading principal components capture 'signal', whereas the minor components mainly propagate 'noise'. The principal components are ranked by their eigenvalues, $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$.

So $k$ determines the capacity, and $k$ should be chosen so as to propagate the signal but not the noise. This is exactly the book's point: with $k$ too large, the autoencoder has enough capacity to copy its input (noise included) without extracting useful structure.
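One common heuristic for picking $k$ (my own sketch, with a made-up 95% threshold) is to rank the eigenvalues of the data covariance and keep just enough components to explain most of the variance:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data with 2 strong "signal" directions plus weak isotropic noise.
n, n_samples = 10, 500
signal = rng.normal(size=(n_samples, 2)) @ rng.normal(size=(2, n)) * 3.0
X = signal + 0.5 * rng.normal(size=(n_samples, n))
X -= X.mean(axis=0)

# Eigenvalues of the covariance matrix, sorted in descending order.
eigvals = np.linalg.eigvalsh(X.T @ X / n_samples)[::-1]

# Choose the smallest k whose components explain >= 95% of the variance.
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.95) + 1)
```

With a sharp drop-off in the eigenvalue spectrum, this recovers a small $k$ matching the number of signal directions; with a flat spectrum, no small $k$ works and the data has no low-dimensional linear structure to compress.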

Much more complex autoencoders, with nonlinear activation functions and more than one hidden layer, have also been developed. Because of their nonlinearity, these networks are harder to study analytically.
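Even so, a tiny nonlinear autoencoder is easy to sketch. The following numpy example (an illustration with arbitrary hyperparameters, not a recipe) trains a 3-1-3 autoencoder with a tanh encoder by plain gradient descent on data lying near a curved 1-D manifold, something a $k=1$ linear bottleneck cannot represent well:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data on a curved 1-D manifold embedded in 3-D, centred.
t = rng.uniform(-1, 1, size=(300, 1))
X = np.hstack([t, t**2, np.sin(3 * t)])
X -= X.mean(axis=0)

n, k = 3, 1
W1 = 0.1 * rng.normal(size=(n, k))   # encoder weights: n -> k
W2 = 0.1 * rng.normal(size=(k, n))   # decoder weights: k -> n
lr = 0.05

losses = []
for _ in range(2000):
    h = np.tanh(X @ W1)              # nonlinear code
    X_hat = h @ W2                   # linear decoder
    err = X_hat - X
    losses.append(float((err ** 2).mean()))

    # Backpropagate the squared error through the decoder and tanh.
    gW2 = h.T @ err / len(X)
    gh = (err @ W2.T) * (1 - h ** 2)
    gW1 = X.T @ gh / len(X)
    W1 -= lr * gW1
    W2 -= lr * gW2
```

The reconstruction loss falls during training, but the minimum it reaches, and what the code units end up representing, has no closed-form description like the PCA result above.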

Match Maker EE