
I am trying to use BERT-generated embeddings as input to a simple feed-forward model for a binary classification task, with ReLU activations and dropout (0.3) between two hidden layers of 256 and 128 units, respectively.

I know I could use BERT itself as a classifier for my text, but my idea is to also include other non-linguistic features in the model in later steps.

My initial model reports the following values over 100 epochs of training. I am using binary cross-entropy loss and the Adam optimizer.
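For reference, here is roughly what the setup looks like (a minimal sketch, assuming Keras and 768-dimensional embeddings from a BERT-base model; `X_train`/`y_train`/`X_val`/`y_val` are placeholders for my precomputed embeddings and binary labels):

```python
from tensorflow import keras
from tensorflow.keras import layers

embedding_dim = 768  # assumption: pooled/CLS embeddings from BERT-base

# Two hidden layers (256 and 128 units) with ReLU and dropout 0.3,
# followed by a sigmoid output for binary classification.
model = keras.Sequential([
    layers.Input(shape=(embedding_dim,)),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# history = model.fit(X_train, y_train,
#                     validation_data=(X_val, y_val),
#                     epochs=100)
```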

[Figure: training and validation loss curves over 100 epochs]

It is fairly clear to me that the model starts to overfit around epoch 20, as the val_loss increases from that point on.
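To make that concrete: I could simply stop training around that point, e.g. with an early-stopping callback (again a minimal sketch, assuming Keras; the patience value is just an example). But that only addresses the rising loss, not the gap itself.

```python
from tensorflow import keras

# Stop once val_loss has not improved for 5 epochs and roll back
# to the best weights seen so far.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss",
                                           patience=5,
                                           restore_best_weights=True)

# history = model.fit(X_train, y_train,
#                     validation_data=(X_val, y_val),
#                     epochs=100,
#                     callbacks=[early_stop])
```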

The validation loss never gets "near" the train_loss. Is that a sign that my problem is "too hard to solve" with the given architecture? Any hints on what can be done to make the val_loss drop closer to the train_loss?

Thanks
