
In SGD an epoch is the full presentation of the training data, and there would then be N weight updates per epoch (if there are N examples in the training set).

If we now use mini-batches instead, say of size 20, does one epoch consist of N/20 weight updates, or is an epoch 'lengthened' by a factor of 20 so that it contains the same number of weight updates?

I ask because, in a couple of papers, learning seems too quick for the number of epochs stated.

James
  • Possible duplicate of [Tradeoff batch size vs. number of iterations to train a neural network](http://stats.stackexchange.com/questions/164876/tradeoff-batch-size-vs-number-of-iterations-to-train-a-neural-network) – Franck Dernoncourt Aug 16 '16 at 15:29
  • The question is more about convention, i.e. if someone states they have trained a network for 10 epochs using mini-batches of 20, does this mean there have been 10*N weight updates, or 10*N/20? – James Aug 16 '16 at 15:38
  • I see, sorry for the confusion, maybe http://stats.stackexchange.com/a/164875/12359 answers your question? – Franck Dernoncourt Aug 16 '16 at 15:47

2 Answers


In neural network terminology:

  • one epoch = one forward pass and one backward pass of all the training examples
  • batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need.
  • number of iterations = number of passes, each pass using [batch size] number of examples. To be clear, one pass = one forward pass + one backward pass (we do not count the forward pass and backward pass as two different passes).

Example: if you have 1000 training examples, and your batch size is 500, then it will take 2 iterations to complete 1 epoch.
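
To make the bookkeeping concrete, here is a minimal plain-Python sketch (the function `count_updates` and its loop structure are illustrative, not from the answer) that reproduces the example above:

```python
def count_updates(num_examples, batch_size, num_epochs):
    """Count weight updates: one update per mini-batch, per epoch."""
    updates = 0
    for _ in range(num_epochs):
        # One epoch = one full pass over the training set,
        # consumed in chunks of `batch_size` examples.
        for start in range(0, num_examples, batch_size):
            # One iteration = one forward pass + one backward pass
            # on a single mini-batch, followed by one weight update.
            updates += 1
    return updates

# The example above: 1000 examples, batch size 500 -> 2 iterations per epoch.
assert count_updates(1000, batch_size=500, num_epochs=1) == 2
```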

Franck Dernoncourt

Franck's answer is not correct. It takes some nerve to say this, because he has far more reputation than I do and many people have already voted for his answer.

An epoch means a single pass through the training set, not a single forward/backward pass over all the training examples at once.

So, yes: if we do mini-batch GD instead of batch GD, say with batches of 20, one epoch now consists of N/20 weight updates, where N is the total number of samples.

To be concrete: in batch gradient descent, a single pass through the training set allows you to take only one gradient descent step. With mini-batch gradient descent (say 5,000,000 examples split into mini-batches of 1,000), a single pass through the training set, that is, one epoch, allows you to take 5,000 gradient descent steps.
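
To make that arithmetic explicit, here is a short plain-Python sketch (the helper name `updates_per_epoch` is mine, not from the answer) comparing batch GD, mini-batch GD, and per-example SGD:

```python
import math

def updates_per_epoch(num_examples, batch_size):
    # Weight updates in one pass through the training set; the last
    # (possibly smaller) batch still triggers an update, hence ceil.
    return math.ceil(num_examples / batch_size)

N = 5_000_000
print(updates_per_epoch(N, N))      # batch GD: 1 update per epoch
print(updates_per_epoch(N, 1_000))  # mini-batch (size 1,000): 5,000 updates
print(updates_per_epoch(N, 1))      # plain SGD: 5,000,000 updates per epoch
```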

aerin