Is there a solid reference on pre-training methods for deep neural networks that never see the actual inputs? Is anything like this known in the literature?
I suppose a more accurate term would be "initialization using gradient methods" rather than "pre-training".
I see it like this: generating layer-wise i.i.d. weights is the simplest approach. If we have a dataset, we can do better with unsupervised pre-training. But what can be done between these two extremes? A toy sketch of what I mean is below.
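
To make the question concrete, here is a minimal sketch of the kind of middle ground I have in mind (the variance-matching criterion and all names here are just my own placeholder illustration, not from any paper): start from i.i.d. weights, then refine them with gradient steps driven only by random noise, so no real data is ever seen.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Extreme 1: plain layer-wise i.i.d. initialization (PyTorch's default).
net = nn.Sequential(
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 10),
)

# "Pre-training" that never sees the actual inputs: feed Gaussian noise
# through the net and use gradient steps to push every linear layer's
# output variance toward 1 (a stand-in objective; any data-independent
# criterion would do here).
opt = torch.optim.SGD(net.parameters(), lr=0.01)
for step in range(200):
    x = torch.randn(128, 64)  # synthetic input, not real data
    loss = torch.tensor(0.0)
    h = x
    for layer in net:
        h = layer(h)
        if isinstance(layer, nn.Linear):
            loss = loss + (h.var() - 1.0) ** 2  # variance-matching penalty
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After this loop the weights are no longer i.i.d. (they have been shaped by gradients), yet the procedure used no dataset, which is what I mean by something between the two extremes.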