I understand that pretraining is used to avoid some of the issues with conventional training. If I train, say, an autoencoder with plain backpropagation, I know I'm going to run into time issues because backpropagation is slow, and also that I can get stuck in poor local optima and fail to learn certain features.
What I don't understand is how we pretrain a network and what, specifically, we do during pretraining. For example, if we're given a stack of restricted Boltzmann machines (RBMs), how would we pretrain this network?
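For concreteness, here is a rough sketch of what I *imagine* greedy layer-wise pretraining might look like, using one step of contrastive divergence (CD-1) per example. The toy data, layer sizes, and hyperparameters are all made up for illustration, and I'm not sure this is the right procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1):
    """Train one RBM with contrastive divergence (CD-1)."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
    b_v = np.zeros(n_visible)   # visible biases
    b_h = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        for v0 in data:
            # positive phase: sample hidden units given the data
            p_h0 = sigmoid(v0 @ W + b_h)
            h0 = (rng.random(n_hidden) < p_h0).astype(float)
            # negative phase: one Gibbs step back to a "reconstruction"
            p_v1 = sigmoid(h0 @ W.T + b_v)
            p_h1 = sigmoid(p_v1 @ W + b_h)
            # move toward the data statistics, away from the reconstruction's
            W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
            b_v += lr * (v0 - p_v1)
            b_h += lr * (p_h0 - p_h1)
    return W, b_v, b_h

# toy binary "data" just so the sketch runs
data = (rng.random((100, 20)) < 0.5).astype(float)

# pretrain layer 1, then feed its hidden probabilities upward
# to pretrain layer 2 on top of it (and so on for deeper stacks)
W1, bv1, bh1 = train_rbm(data, n_hidden=10)
hidden1 = sigmoid(data @ W1 + bh1)
W2, bv2, bh2 = train_rbm(hidden1, n_hidden=5)
```

Is the idea really just this: train each RBM in isolation on the previous layer's outputs, stack the learned weights, and then fine-tune the whole network with backpropagation?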