
I think I understand what batches are for in neural network training, especially after reading this question. Batches also have a clear counterpart in libraries like Keras:

model.fit(X_trains, Y_trains, epochs=20, batch_size=2048)

However, in one NLP project I am trying to refactor, I encountered an alternative approach:

import random

BUCKET_SIZE = 2048
# Sort sentences by length so each bucket groups sequences of similar length.
data = sorted(data, key=len)
buckets_count = len(data) // BUCKET_SIZE + 1
buckets_indices = list(range(buckets_count))
for epoch in range(20):
    random.shuffle(buckets_indices)
    for bucket_index in buckets_indices:
        offset = bucket_index * BUCKET_SIZE
        data_bucket = data[offset:offset + BUCKET_SIZE]
        X_trains, Y_trains = data2tensors(data_bucket)
        model.fit(X_trains, Y_trains, epochs=1, batch_size=128)

You can see that there are both batches and buckets, where batch_size < BUCKET_SIZE. Also, in each epoch the buckets are fed to the network in a different (random) order.

  1. Why might buckets be used in addition to batches?
  2. Why is the order of the buckets changed for each epoch?
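One common motivation for the pattern above is padding: sequences in a batch must be padded to the longest sequence in that batch, so grouping sentences of similar length wastes fewer padded slots. The following is a minimal, self-contained sketch with toy data (all names and the toy dataset are made up for illustration, not taken from the project in question):

```python
import random

# Toy "sentences" of varying length (lists of token ids).
random.seed(0)
data = [[0] * random.randint(1, 50) for _ in range(10_000)]

def wasted_padding(data, batch_size):
    """Count padded slots when each batch is padded to its longest sequence."""
    total = 0
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        padded = len(batch) * max(len(seq) for seq in batch)
        total += padded - sum(len(seq) for seq in batch)
    return total

unsorted_cost = wasted_padding(data, 128)
sorted_cost = wasted_padding(sorted(data, key=len), 128)
print(unsorted_cost, sorted_cost)
```

On data in its original order, each batch mixes short and long sequences and pays heavy padding; after sorting by length, batches are nearly uniform in length and the padding cost drops sharply.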
dzieciou
    Since this is an NLP setting, this appears to be a trick to preserve local ordering (because word order is important in a sentence, or a paragraph) but ignore whatever ordering was used to make `data`, which may not be usefully ordered (perhaps because it just concatenates a bunch of tweets or wikipedia articles or whatever). Without asking the author why they did this, there's probably no definitive answer, because this method seems explicitly linked to the particular data set under analysis. – Sycorax Aug 27 '19 at 16:30
  • @Sycorax I didn't know that by default `model.fit()` shuffles dataset before splitting it into batches. Now I can see that before splitting dataset (sentences) into buckets, they are ordered by length (number of words in a sentence). In other words, the author wanted to group sentences by length. Shuffling whole dataset would destroy those groups. – dzieciou Aug 28 '19 at 06:57

0 Answers