I think I understand what batches are for in neural network training, especially after reading this question. They also have a clear correspondence in libraries like Keras:
model.fit(X_trains, Y_trains, epochs=20, batch_size=2048)
However, in one NLP project I am trying to refactor, I encountered an alternative approach:
import random

BUCKET_SIZE = 2048

data = sorted(data, key=len)  # similar-length examples end up adjacent
buckets_count = (len(data) + BUCKET_SIZE - 1) // BUCKET_SIZE  # ceiling division
buckets_indices = list(range(buckets_count))

for epoch in range(20):
    random.shuffle(buckets_indices)  # new bucket order every epoch
    for bucket_index in buckets_indices:
        offset = bucket_index * BUCKET_SIZE
        data_bucket = data[offset:offset + BUCKET_SIZE]
        X_trains, Y_trains = data2tensors(data_bucket)
        model.fit(X_trains, Y_trains, epochs=1, batch_size=128)
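To make the slicing behavior concrete, here is a minimal, self-contained sketch of the bucketing logic above, using toy lists of token ids in place of the project's real NLP examples (data2tensors and the model are omitted since they are not shown):

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

BUCKET_SIZE = 4

# Toy "sentences": lists of token ids with varying lengths (hypothetical
# stand-ins for the project's real training examples).
data = [list(range(random.randint(1, 20))) for _ in range(10)]
data = sorted(data, key=len)  # similar-length sequences become adjacent

# Ceiling division: number of buckets needed to cover every example.
buckets_count = (len(data) + BUCKET_SIZE - 1) // BUCKET_SIZE
buckets_indices = list(range(buckets_count))

random.shuffle(buckets_indices)  # a fresh order would be drawn each epoch

buckets = []
for bucket_index in buckets_indices:
    offset = bucket_index * BUCKET_SIZE
    buckets.append(data[offset:offset + BUCKET_SIZE])

# Every example lands in exactly one bucket, and because data was sorted
# before slicing, the sequence lengths inside each bucket are close together.
```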
You can see that there are both batches and buckets, where batch_size < BUCKET_SIZE. Also, in each epoch the buckets are presented to the network in a different (random) order.
- Why would buckets be used in addition to batches?
- Why is the order of the buckets shuffled for each epoch?