I've been looking at deep belief networks (DBNs):
- first, greedy (unsupervised) layer-wise pretraining;
- then, split the weights into recognition weights R and generative weights G, and run the wake-sleep algorithm (again unsupervised, i.e. on unlabelled data);
- finally, fine-tune with backpropagation on labelled data (a rough sketch of this pipeline is below the list).
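For concreteness, here is a minimal sketch of the first and last steps only, using scikit-learn's `BernoulliRBM`; the dataset, layer sizes, and hyperparameters are placeholders I picked for illustration. It omits the wake-sleep step and trains only a supervised head on top rather than backpropagating through the pretrained layers, so it is a simplified stand-in for the full recipe, not the exact procedure described above.

```python
# Greedy layer-wise (unsupervised) pretraining with stacked RBMs, followed by a
# supervised classifier on the top-level features. Labels are used only by the
# final stage; each RBM is trained on the previous layer's hidden representation.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import minmax_scale

X, y = load_digits(return_X_y=True)
X = minmax_scale(X)  # BernoulliRBM expects inputs in [0, 1]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = Pipeline([
    ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),  # supervised head, trained on pretrained features
])
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```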
This approach looks aesthetically very attractive to me.
But I have heard that "according to the keynote I saw from Bengio in December 2016, ReLU units solved the problem of deep nets, rendering pretraining largely obsolete."
Are these elegant techniques destined to be consigned to a chapter in the history books of ML, or are they still relevant?