I have two classifiers, linear1 and linearGP. linearGP has better accuracy, but its CE (cross-entropy) loss is higher than the CE of linear1.
linearGP is trained with a different loss. The data set is balanced. The x axis represents the samples seen during training; by the end of training, 30,000 samples had been passed through both models.
What is the reason?
I think that one model returns very high probabilities for its predictions, whereas the other one doesn't, even though it is better at predicting correctly.
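To illustrate what I mean, here is a minimal numerical sketch with made-up probabilities (these are not the actual outputs of linear1 or linearGP): a model that is correct more often, but assigns only moderate probability to the true class, can still end up with a higher cross-entropy than a less accurate but more confident model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30000
y = rng.integers(0, 2, n)  # balanced binary labels

# hypothetical "linear1"-like model: ~80% accurate, very confident when correct
p1 = np.where(rng.random(n) < 0.8,
              np.where(y == 1, 0.95, 0.05),   # correct and confident
              np.where(y == 1, 0.40, 0.60))   # wrong, mildly confident

# hypothetical "linearGP"-like model: ~90% accurate, but only mildly confident
p2 = np.where(rng.random(n) < 0.9,
              np.where(y == 1, 0.60, 0.40),   # correct but low confidence
              np.where(y == 1, 0.30, 0.70))   # wrong

def ce(y, p):
    # binary cross-entropy of predicted probability p for class 1
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def acc(y, p):
    # accuracy with a 0.5 decision threshold
    return np.mean((p > 0.5) == y)

print(f"model 1: acc={acc(y, p1):.3f}  CE={ce(y, p1):.3f}")
print(f"model 2: acc={acc(y, p2):.3f}  CE={ce(y, p2):.3f}")
```

With these made-up numbers, model 2 has higher accuracy (about 0.9 vs. 0.8) but also a higher CE (roughly 0.58 vs. 0.22), which is the same pattern I observe.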
I created a simulated example in a Jupyter notebook: https://github.com/cherepanovic/omwtuss/blob/master/CE_Acc_sim.ipynb
Would you agree, or do you have other explanations?
Thanks a lot!