
I'm trying to use an API with a feedforward neural network for time series forecasting. For dense, aggregate data it works fine, but for sparse data it sometimes forecasts negative values, even though my historical data contains only positive values.

The source code is very dense, and I might be missing a line or two, but as far as I can tell, the input and hidden layers all use ReLU nodes.

Assuming I am correct, how can a network with only ReLU layers lead to negative values, especially if none of the training data has negative values?

Skander H.

2 Answers


Consider the definition of the ReLU:

$$ f(x) = \max\{0, x\} $$

The output of a ReLU unit is non-negative, full stop. If the final layer of the network consists of ReLU units, then the output must be non-negative. If the output is negative, then something has gone wrong: either there is a programming error, or the output layer is not a ReLU.

Suppose that the last layer is linear, and this linear layer takes ReLU outputs as its input. A linear layer places no constraint on its outputs, so the output of the linear layer can be positive, negative, or zero.
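For concreteness, here is a minimal NumPy sketch (the question does not name the framework, and the weights below are made up for illustration) of a network whose hidden units are ReLUs but whose output unit is linear. Feeding it a positive input still yields a negative forecast:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Hidden layer of ReLU units: its activations are always non-negative.
W_hidden = np.array([[1.0, -0.5],
                     [0.5,  1.0]])
b_hidden = np.array([0.0, 0.0])

# Linear output unit with a negative weight: nothing stops it from going below zero.
w_out = np.array([0.3, -2.0])
b_out = -0.1

x = np.array([1.0, 2.0])            # positive input, like the historical series
h = relu(W_hidden @ x + b_hidden)   # hidden activations: [0.0, 2.5]
y = w_out @ h + b_out               # output: 0.3*0.0 + (-2.0)*2.5 - 0.1 = -5.1

print(h, y)                         # [0.  2.5] -5.1
```

The hidden activations are non-negative, as they must be, but a single negative weight in the linear read-out is enough to push the forecast below zero.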

Sycorax

Your final outputs are contingent on the activation function in your output layer. If your network uses only ReLU activations, including in the output layer, then the outputs will be non-negative; that's correct. However, a model with ReLU activations in the hidden layers and a different output activation (e.g. a linear function or tanh) can produce negative outputs.
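As an illustration (assuming a Keras-style API, which the question does not confirm), the only difference between a model that can forecast negative values and one that cannot is the activation given to the final `Dense` layer:

```python
from tensorflow import keras

# ReLU hidden layers but a linear output layer (Dense defaults to no activation):
# forecasts can be negative even if every training target is positive.
model_linear_out = keras.Sequential([
    keras.Input(shape=(4,)),                   # e.g. 4 lagged observations as features
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),                     # linear output
])

# Same hidden layers, but the output unit is also a ReLU:
# forecasts are constrained to be >= 0.
model_relu_out = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="relu"),
])
```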