In Goodfellow's Deep Learning book (section 7.12, http://www.deeplearningbook.org/contents/regularization.html), the authors state:
Because we usually use an inclusion probability of 1/2, the weight scaling rule usually amounts to dividing the weights by 2 at the end of training, and then using the model as usual. Another way to achieve the same result is to multiply the states of the units by 2 during training.
Could someone explain the purpose of this rescaling when using dropout? I am having trouble grasping what exactly it is correcting for.
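To make the question concrete, here is a minimal numpy sketch of the two options the quote describes, as I understand them, for a single layer with inclusion probability 1/2 (the variable names and shapes are my own illustration, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.5                      # inclusion probability from the quote

x = rng.normal(size=(4, 8))          # a batch of hidden-unit activations
W = rng.normal(size=(8, 3))          # weights of the next layer

# --- Option 1: plain dropout in training, rescale weights at test time ---
mask = rng.random(x.shape) < keep_prob
train_out = (x * mask) @ W           # training: units dropped with prob 1/2
test_out = x @ (W * keep_prob)       # test: "dividing the weights by 2"

# --- Option 2: inverted dropout (rescale activations during training) ---
train_out_inv = (x * mask / keep_prob) @ W   # "multiply the states of the units by 2"
test_out_inv = x @ W                         # test: use the model as usual

# Both options give test-time outputs that differ only by the constant
# factor keep_prob, which is what the weight scaling rule accounts for.
print(np.allclose(test_out, test_out_inv * keep_prob))  # True
```

My current (possibly wrong) reading is that the rescaling is meant to keep the expected input to each unit the same at training and test time, but I would appreciate an explanation of why that correction is needed.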