Is it possible to achieve state-of-the-art results by using back-propagation only (without pre-training)?
Or is it the case that all record-breaking approaches use some form of pre-training?
Is back-propagation alone good enough?
Pre-training is no longer necessary. Its purpose was to find a good initialization for the network weights in order to facilitate convergence when a large number of layers was employed. Nowadays we have ReLU, dropout and batch normalization, all of which help solve the problem of training deep neural networks. Quoting from the reddit post linked above (by the winner of the Galaxy Zoo Kaggle challenge):
I would say that the “pre-training era”, which started around 2006, ended in the early ’10s when people started using rectified linear units (ReLUs), and later dropout, and discovered that pre-training was no longer beneficial for this type of networks.
From the ReLU paper (linked above):
deep rectifier networks can reach their best performance without requiring any unsupervised pre-training
With that said, while pre-training is no longer necessary, it can still improve performance in some cases, for example when unlabeled samples greatly outnumber labeled ones, as shown in this paper.
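To make the point concrete, here is a minimal sketch (assuming PyTorch; the layer sizes, dropout rate and learning rate are arbitrary placeholders) of a deep feed-forward network using ReLU, batch normalization and dropout, trained end-to-end with plain back-propagation from random initialization and no pre-training step at all:

```python
# Minimal sketch: deep net with ReLU + batch norm + dropout,
# trained from scratch with back-propagation only (no pre-training).
import torch
import torch.nn as nn

def hidden_block(in_dim, out_dim, p_drop=0.2):
    # One hidden block: affine -> batch norm -> ReLU -> dropout.
    return nn.Sequential(
        nn.Linear(in_dim, out_dim),
        nn.BatchNorm1d(out_dim),
        nn.ReLU(),
        nn.Dropout(p_drop),
    )

model = nn.Sequential(
    hidden_block(100, 256),
    hidden_block(256, 256),
    hidden_block(256, 256),
    hidden_block(256, 256),
    nn.Linear(256, 10),          # logits for 10 classes
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Toy random data just to exercise the training loop;
# substitute a real dataset in practice.
x = torch.randn(512, 100)
y = torch.randint(0, 10, (512,))

model.train()
for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()              # plain back-propagation
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

Networks like this routinely converge from random initialization, which is exactly why layer-wise unsupervised pre-training fell out of use for standard supervised settings.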