I have a question related to the dropout function in the LSTM tutorial: http://deeplearning.net/tutorial/code/lstm.py
```python
def dropout_layer(state_before, use_noise, trng):
    proj = tensor.switch(use_noise,
                         (state_before *
                          trng.binomial(state_before.shape,
                                        p=0.5, n=1,
                                        dtype=state_before.dtype)),
                         state_before * 0.5)
    return proj
```
To my understanding, the code means that when `use_noise=1`, we multiply `state_before` by a random binary vector (i.e. the dropout procedure). But when `use_noise=0`, which is used when we validate the model, the hidden unit values are set to `state_before * 0.5`.
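To make the two branches concrete, here is a minimal NumPy sketch of my reading of the function (the NumPy RNG, the `p` parameter, and this standalone `dropout_layer` are my own stand-ins for illustration, not the tutorial's Theano code):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_layer(state_before, use_noise, p=0.5):
    """NumPy sketch of my reading of the Theano dropout_layer."""
    if use_noise:
        # training branch: multiply by a random binary mask,
        # where each unit is kept with probability p
        mask = rng.binomial(n=1, p=p, size=state_before.shape)
        return state_before * mask
    else:
        # validation/test branch: no masking, just scale by p
        return state_before * p

x = np.ones((2, 3))
train_out = dropout_layer(x, use_noise=True)   # entries randomly zeroed
test_out = dropout_layer(x, use_noise=False)   # every entry scaled to 0.5
```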
Why `* 0.5` here? Shouldn't it just be `state_before`, without multiplying by any constant?