Can someone give an intuition behind the dropout method used in convolutional neural networks?
What exactly is dropout doing?
As described in the paper introducing it, dropout proceeds like so:
The only difference is that for each training case in a mini-batch, we sample a thinned network by dropping out units. Forward and backpropagation for that training case are done only on this thinned network. [...] Any training case which does not use a parameter contributes a gradient of zero for that parameter.
If a unit is retained with probability $p$ during training, the outgoing weights of that unit are multiplied by $p$ at test time as shown in Figure 2. This ensures that for any hidden unit the expected output (under the distribution used to drop units at training time) is the same as the actual output at test time.
The intuition is that we'd like to find the Bayes optimal classifier, but doing that for a large model is prohibitive; per the paper, using the full network trained via dropout is a simple approximation that proves useful in practice. (See the paper for results on a variety of applications, including a convolutional architecture.)
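For concreteness, here is a minimal NumPy sketch of that scheme. It scales the activations by $p$ at test time, which has the same effect as scaling the outgoing weights as described in the quote; the array shapes and the value of `p_keep` are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(activations, p_keep, rng):
    """Training-time dropout: zero out each unit independently
    with probability 1 - p_keep, giving a 'thinned' network."""
    mask = rng.random(activations.shape) < p_keep
    return activations * mask

def dropout_test(activations, p_keep):
    """Test-time: keep all units but scale by p_keep so the expected
    output matches what the unit produced on average during training."""
    return activations * p_keep

# Illustrative hidden-layer activations for one mini-batch
h = rng.normal(size=(4, 8))
p_keep = 0.5

h_train = dropout_train(h, p_keep, rng)  # random units zeroed out
h_test = dropout_test(h, p_keep)         # all units kept, outputs scaled

# In expectation over the random masks, the two agree:
# E[h_train] = p_keep * h = h_test
```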
When you find that your model is overfitting, i.e. it performs well on cross-validation during training but suffers on an independent test set, you can add dropout layers to reduce its over-reliance on the training set.
https://www.quora.com/How-does-the-dropout-method-work-in-deep-learning/answer/Arindam-Paul-3
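As a minimal sketch of what "adding dropout layers" might look like in a small convolutional model (using Keras here; the architecture and the dropout rates are illustrative assumptions, not part of the linked answer):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical small convolutional classifier; layer sizes and
# dropout rates (fraction of units dropped) are illustrative only.
model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),        # dropout after the convolutional block
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),         # heavier dropout before the classifier
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Keras applies dropout only during training; at inference the Dropout
# layers are pass-through, so no manual rescaling is needed.
```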