Can someone give an intuition behind the dropout method used in convolutional neural networks?
What exactly is dropout doing?
As described in the paper introducing it, dropout proceeds like so:
The only difference is that for each training case in a mini-batch, we sample a thinned network by dropping out units. Forward and backpropagation for that training case are done only on this thinned network. [...] Any training case which does not use a parameter contributes a gradient of zero for that parameter.
If a unit is retained with probability $p$ during training, the outgoing weights of that unit are multiplied by $p$ at test time as shown in Figure 2. This ensures that for any hidden unit the expected output (under the distribution used to drop units at training time) is the same as the actual output at test time.
The intuition is that we'd like to find the Bayes optimal classifier, but doing that for a large model is prohibitive; per the paper, using the full network trained via dropout is a simple approximation that proves useful in practice. (See the paper for results on a variety of applications, including a convolutional architecture.)
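For concreteness, here is a minimal NumPy sketch of that scheme. It scales the activations by $p$ at test time, which has the same effect as scaling the outgoing weights as described in the quote; the array shapes and the value of `p_keep` are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(activations, p_keep, rng):
    """Training-time dropout: zero out each unit independently
    with probability 1 - p_keep, giving a 'thinned' network."""
    mask = rng.random(activations.shape) < p_keep
    return activations * mask

def dropout_test(activations, p_keep):
    """Test-time: keep all units but scale by p_keep so the expected
    output matches what the unit produced on average during training."""
    return activations * p_keep

# Illustrative hidden-layer activations for one mini-batch
h = rng.normal(size=(4, 8))
p_keep = 0.5

h_train = dropout_train(h, p_keep, rng)  # random units zeroed out
h_test = dropout_test(h, p_keep)         # all units kept, outputs scaled

# In expectation over the random masks, the two agree:
# E[h_train] = p_keep * h = h_test
```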
When you find that your model is overfitting, i.e. it performs well on cross-validation during training but suffers on an independent test set, you can add dropout layers to reduce its over-reliance on the training set.
https://www.quora.com/How-does-the-dropout-method-work-in-deep-learning/answer/Arindam-Paul-3
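As a minimal sketch of what "adding dropout layers" might look like in a small convolutional model (using Keras here; the architecture and the dropout rates are illustrative assumptions, not part of the linked answer):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical small convolutional classifier; layer sizes and
# dropout rates (fraction of units dropped) are illustrative only.
model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),        # dropout after the convolutional block
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),         # heavier dropout before the classifier
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Keras applies dropout only during training; at inference the Dropout
# layers are pass-through, so no manual rescaling is needed.
```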