
When performing stochastic gradient descent, must the training loss decrease

a) between iterations within an epoch? (I think the answer is no)

b) between epochs? (I think the answer is yes)

Here, the training loss is always defined over the entire dataset.


1 Answer


In both cases, a decrease is not guaranteed, because you're not using the full batch: stochastic gradient descent only approximates the full-batch gradient. However, between epochs (b) a decrease is much more likely than between iterations (a), because by the end of an epoch the entire training dataset has been passed through the network and the weights have been updated several times toward a local minimum.
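Here is a minimal sketch (plain NumPy, linear regression with squared loss, arbitrary data and hyperparameters) illustrating the point: the full-dataset training loss can go up after an individual mini-batch step, even though it tends to fall from epoch to epoch.

```python
# SGD on a toy least-squares problem; check the full-dataset loss after each
# mini-batch update. The data, learning rate, and batch size are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.5 * rng.normal(size=200)   # noisy targets

def full_loss(w):
    """Mean squared error over the entire dataset (the 'training loss')."""
    return np.mean((X @ w - y) ** 2)

w = np.zeros(5)
lr, batch_size = 0.05, 10

for epoch in range(3):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        grad = 2 * X[batch].T @ (X[batch] @ w - y[batch]) / batch_size
        before = full_loss(w)
        w -= lr * grad                         # one SGD step on a mini-batch
        after = full_loss(w)
        if after > before:                     # (a): loss can rise between iterations
            print(f"epoch {epoch}: full loss rose {before:.4f} -> {after:.4f}")
    print(f"end of epoch {epoch}: full training loss = {full_loss(w):.4f}")
```

Running this typically prints a few "loss rose" lines inside each epoch while the end-of-epoch loss still decreases overall, matching the answer above.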

gunes