
In SGD an epoch is the full presentation of the training data, and there would then be N weight updates per epoch (if there are N examples in the training set).

If we now use mini-batches instead, say of size 20, does one epoch consist of N/20 weight updates, or is an epoch 'lengthened' by a factor of 20 so that it contains the same number of weight updates?

I ask because, in a couple of papers, learning seems too quick for the number of epochs stated.

James
  • Possible duplicate of [Tradeoff batch size vs. number of iterations to train a neural network](http://stats.stackexchange.com/questions/164876/tradeoff-batch-size-vs-number-of-iterations-to-train-a-neural-network) – Franck Dernoncourt Aug 16 '16 at 15:29
  • The question is more about convention, i.e. if someone states they have trained a network for 10 epochs using mini-batches of 20, does this mean there have been 10*N weight updates, or 10*N/20? – James Aug 16 '16 at 15:38
  • I see, sorry for the confusion, maybe http://stats.stackexchange.com/a/164875/12359 answers your question? – Franck Dernoncourt Aug 16 '16 at 15:47

2 Answers


In neural network terminology:

  • one epoch = one forward pass and one backward pass of all the training examples
  • batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need.
  • number of iterations = number of passes, each pass using [batch size] number of examples. To be clear, one pass = one forward pass + one backward pass (we do not count the forward pass and backward pass as two different passes).

Example: if you have 1000 training examples, and your batch size is 500, then it will take 2 iterations to complete 1 epoch.
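
To make the bookkeeping concrete, here is a minimal plain-Python sketch (the function `count_updates` and its loop structure are illustrative, not from the answer) that reproduces the example above:

```python
def count_updates(num_examples, batch_size, num_epochs):
    """Count weight updates: one update per mini-batch, per epoch."""
    updates = 0
    for _ in range(num_epochs):
        # One epoch = one full pass over the training set,
        # consumed in chunks of `batch_size` examples.
        for start in range(0, num_examples, batch_size):
            # One iteration = one forward pass + one backward pass
            # on a single mini-batch, followed by one weight update.
            updates += 1
    return updates

# The example above: 1000 examples, batch size 500 -> 2 iterations per epoch.
assert count_updates(1000, batch_size=500, num_epochs=1) == 2
```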

Franck Dernoncourt

Franck's answer is not correct. It takes some nerve to say this, because he has far more reputation than I do and many people have already voted for his answer.

An epoch means a single pass through the training set, not a single forward/backward pass over all the training examples at once.

So, yes: if we do mini-batch GD instead of batch GD, say with batches of 20, one epoch now consists of N/20 weight updates, where N is the total number of samples.

To be concrete: in batch gradient descent, a single pass through the training set allows you to take only one gradient descent step. With mini-batch gradient descent (say 5,000,000 examples split into mini-batches of 1,000), a single pass through the training set, that is, one epoch, allows you to take 5,000 gradient descent steps.
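
To make that arithmetic explicit, here is a short plain-Python sketch (the helper name `updates_per_epoch` is mine, not from the answer) comparing batch GD, mini-batch GD, and per-example SGD:

```python
import math

def updates_per_epoch(num_examples, batch_size):
    # Weight updates in one pass through the training set; the last
    # (possibly smaller) batch still triggers an update, hence ceil.
    return math.ceil(num_examples / batch_size)

N = 5_000_000
print(updates_per_epoch(N, N))      # batch GD: 1 update per epoch
print(updates_per_epoch(N, 1_000))  # mini-batch (size 1,000): 5,000 updates
print(updates_per_epoch(N, 1))      # plain SGD: 5,000,000 updates per epoch
```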

aerin