3

From what I understand, after applying dropout you are supposed to rescale the activation layer by an amount proportional to how much you dropped, essentially making up for the contribution of the dropped units. My question is whether this rescaling is necessary or recommended after applying DropConnect to a matrix of weights.
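For context, here is a minimal sketch of the rescaling I mean (inverted dropout; the function and the name `keep_prob` are just my own illustration):

```python
import numpy as np

def inverted_dropout(activations, keep_prob=0.8, training=True):
    """Inverted dropout: rescale surviving activations by 1/keep_prob at training time."""
    if not training:
        # With inverted dropout, no extra scaling is needed at test time.
        return activations
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob
```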

piRSquared

1 Answer

2

Unlike dropout, inference in DropConnect is a kind of Monte Carlo method, as described in Section 3.2 of the paper.
It approximates each pre-activation value with a Gaussian distribution, draws a number of samples from that Gaussian, and takes the average of the activation values of those samples.
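If I recall the paper correctly, the Gaussian is obtained by moment matching, roughly $u \sim \mathcal{N}\big(pWv,\; p(1-p)(W \circ W)(v \circ v)\big)$ for a layer with weights $W$, input $v$, and keep probability $p$. A minimal NumPy sketch of that inference step for a single fully connected layer (all names are mine, not from the paper):

```python
import numpy as np

def dropconnect_inference(W, v, p=0.5, n_samples=1000, activation=np.tanh):
    """Approximate DropConnect inference (sketch of the paper's Sec 3.2).

    Each pre-activation u_i = sum_j M_ij W_ij v_j with M_ij ~ Bernoulli(p)
    is approximated by a Gaussian via moment matching; the activation is
    then averaged over samples drawn from that Gaussian.
    """
    mean = p * W @ v                          # E[u]
    var = p * (1 - p) * (W ** 2) @ (v ** 2)   # Var[u]
    samples = np.random.normal(mean, np.sqrt(var), size=(n_samples, len(mean)))
    return activation(samples).mean(axis=0)   # average activation over samples
```

Note that the keep probability $p$ already appears in the Gaussian mean, which is where the scaling effectively enters, rather than as a separate post-hoc rescaling of the activations.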

dontloo