What is the effect of changing the weight decay and warm-up steps in fine-tuning PEGASUS?

Question

I am fine-tuning the PEGASUS model using this script. I am currently using the SAMSum dataset, and I have reached a point where the output no longer improves.

Examples:

The Actual Summary

Alexis and Carter met tonight. Carter would like to meet again, but Alexis is busy.

The Best Output Summary (based on human evaluation)

'Carter and Alexis are up for it.'

The Second Best Output Summary (based on human evaluation)

['Carter and Alexis are up for it, I want to see some tomorrow. But']

As seen above, the generated summaries do not share the meaning of the actual summary. Would changing either the weight decay or the warm-up steps help achieve better results? If so, would it be better to increase or decrease their values?

NOTES:

I am using a batch size of 1 because I am on Colab Pro, where the maximum GPU memory is 16280 MB; a larger batch size does not allow me to use the whole dataset, which leads to worse results. The current setting is 500 warm-up steps, with a total of 4000 steps in 2000 epochs and a weight decay of 0.01.
I have already tried different training/validation/testing splits. The default was 90/5/5, but I also tried 90/10/0, 70/15/15, and 70/30/0.
The Best Output is always produced at around 500 steps, and the Second Best Output is produced at 2500 steps, for the 90/10/0, 70/15/15, and 70/30/0 splits.
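
As a rough illustration of the schedule these numbers imply, here is a minimal sketch. It assumes the script uses linear warm-up followed by linear decay (not confirmed from the script itself), and the peak learning rate of 5e-5 and the function name are hypothetical, chosen only for illustration:

```python
def linear_warmup_decay_lr(step, peak_lr=5e-5, warmup_steps=500, total_steps=4000):
    """Learning rate at a given optimizer step.

    Assumed schedule: linear ramp from 0 to peak_lr over warmup_steps,
    then linear decay back to 0 at total_steps.
    peak_lr is a hypothetical value, not taken from the question.
    """
    if step < warmup_steps:
        # Warm-up phase: scale the peak rate by the fraction of warm-up completed.
        return peak_lr * step / warmup_steps
    # Decay phase: scale the peak rate by the fraction of post-warm-up steps remaining.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the rate at the steps mentioned in the notes.
    for step in (0, 250, 500, 2500, 4000):
        print(step, linear_warmup_decay_lr(step))
```

Under these assumptions, step 500 is exactly where the learning rate peaks, and by step 2500 it has decayed to a little under half of the peak. Changing the warm-up steps moves that peak, while weight decay is applied separately by the optimizer and does not appear in this schedule.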
Any further tips to improve the output would be much appreciated. Thank you in advance!
