
I'm trying to use an API with a feedforward neural network for time series forecasting. For dense, aggregate data it works fine, but for sparse data it sometimes forecasts negative values, even though my historical data contains only positive values.

The source code is very dense, and I might be missing a line or two, but as far as I can tell, the input and hidden layers all use ReLU nodes.

Assuming I am correct, how can a network with only ReLU layers lead to negative values, especially if none of the training data has negative values?

Skander H.

2 Answers


Consider the definition of the ReLU:

$$ f(x) = \max\{0, x\} $$

The output of a ReLU unit is non-negative, full stop. If the final layer of the network consists of ReLU units, then the output must be non-negative. If the output is negative, then something has gone wrong: either there is a programming error, or the output layer is not a ReLU.

Suppose that the last layer is linear, and this linear layer takes ReLU outputs as its input. A linear layer places no constraint on its outputs, so the output of the linear layer can be positive, negative, or zero.
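For concreteness, here is a minimal NumPy sketch (the question does not name the framework, and the weights below are made up for illustration) of a network whose hidden units are ReLUs but whose output unit is linear. Feeding it a positive input still yields a negative forecast:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Hidden layer of ReLU units: its activations are always non-negative.
W_hidden = np.array([[1.0, -0.5],
                     [0.5,  1.0]])
b_hidden = np.array([0.0, 0.0])

# Linear output unit with a negative weight: nothing stops it from going below zero.
w_out = np.array([0.3, -2.0])
b_out = -0.1

x = np.array([1.0, 2.0])            # positive input, like the historical series
h = relu(W_hidden @ x + b_hidden)   # hidden activations: [0.0, 2.5]
y = w_out @ h + b_out               # output: 0.3*0.0 + (-2.0)*2.5 - 0.1 = -5.1

print(h, y)                         # [0.  2.5] -5.1
```

The hidden activations are non-negative, as they must be, but a single negative weight in the linear read-out is enough to push the forecast below zero.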

Sycorax

Your final outputs are contingent on the activation function in your output layer. If your network uses only ReLU activations, including in the output layer, then the outputs will be non-negative; that's correct. However, a model with ReLU activations in the hidden layers and a different output activation (e.g. a linear function or tanh) can produce negative outputs.
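As an illustration (assuming a Keras-style API, which the question does not confirm), the only difference between a model that can forecast negative values and one that cannot is the activation given to the final `Dense` layer:

```python
from tensorflow import keras

# ReLU hidden layers but a linear output layer (Dense defaults to no activation):
# forecasts can be negative even if every training target is positive.
model_linear_out = keras.Sequential([
    keras.Input(shape=(4,)),                   # e.g. 4 lagged observations as features
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),                     # linear output
])

# Same hidden layers, but the output unit is also a ReLU:
# forecasts are constrained to be >= 0.
model_relu_out = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="relu"),
])
```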