I was reading the paper on Dropout. What I find difficult to understand is this: in the training phase, a unit is present with probability p and absent with probability 1 - p. In the test phase, all units are present, but we multiply each of them by the probability p.
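To make sure I'm picturing the two phases correctly, here is a minimal NumPy sketch of my current understanding (the activation values and p = 0.5 are made-up examples, and I'm using the paper's convention that p is the probability of *keeping* a unit):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                  # probability that a unit is KEPT
a = np.array([1.0, 2.0, 3.0, 4.0])       # activations of units a, b, c, d

# Training phase: each unit is independently kept with probability p.
mask = rng.binomial(1, p, size=a.shape)  # e.g. [1, 0, 1, 0] -> only a and c survive
train_out = a * mask

# Test phase: all units are present, but each is scaled by p.
test_out = a * p
```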
Now, suppose we have 4 input units originally named a, b, c, d. In the training stage, after applying dropout with a dropout rate of 0.5, we are left with units a and c. Since all the units are present in the test stage, do we then multiply each of the units by 0.5? Also, is p defined separately for each unit in the network, or once for the entire neural network? And in doing so, how does the result come out the same for the training and test stages?
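On that last question, what I suspect (but would like confirmed) is that the test-time scaling matches the *expected value* of the training-time output: each unit contributes a_i with probability p and 0 with probability 1 - p, so E[mask_i * a_i] = p * a_i. A quick empirical check of that intuition, under the same assumptions as the sketch above:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5
a = np.array([1.0, 2.0, 3.0, 4.0])

# Average the training-time (masked) output over many random dropout masks;
# by the law of large numbers it should approach the test-time output p * a.
masks = rng.binomial(1, p, size=(100_000, a.size))
print((masks * a).mean(axis=0))  # approximately [0.5, 1.0, 1.5, 2.0]
print(a * p)                     # exactly       [0.5, 1.0, 1.5, 2.0]
```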