
I have seen people speaking of the Log-Cosh Loss, which is twice differentiable and mimics the Mean Absolute Error once the error is far from 0. It is therefore useful for algorithms that need the Hessian.

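(The original image did not survive; for reference, the standard definition is)

$$L(y, \hat{y}) = \sum_i \log\!\big(\cosh(\hat{y}_i - y_i)\big), \qquad \log\cosh(x) \approx \begin{cases} x^2/2, & x \text{ near } 0,\\ |x| - \log 2, & |x| \text{ large.} \end{cases}$$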

Hence I'm not sure I understand why MAE is not sufficient, since it is non-differentiable at only one point, and that point is exactly where our prediction is perfect.

However, my intuition is that the log-cosh loss can be quite useful near 0, where it reduces the size of parameter updates by shrinking the gradient.
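A minimal NumPy sketch of that intuition (the variable names are mine):

```python
import numpy as np

# Gradient of each loss with respect to the residual r = y_hat - y:
#   d/dr log(cosh(r)) = tanh(r)  -> shrinks smoothly to 0 as r -> 0
#   d/dr |r|          = sign(r)  -> stays at +/-1 however small r gets
residuals = np.array([-2.0, -0.5, -0.01, 0.01, 0.5, 2.0])
print("log-cosh grad:", np.tanh(residuals))
print("MAE grad:     ", np.sign(residuals))
```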

Am I understanding this all right? Does anyone know of a specific example where this loss is used?

– Samos

MAE can lead to failures to converge because the optimizer continues to step over the optimum, provided the learning rate is large enough. – Demetri Pananos May 03 '20 at 23:07
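A toy illustration of that comment, assuming plain 1-D gradient descent with a fixed learning rate (the helper `descend` is hypothetical):

```python
import numpy as np

def descend(grad, x=3.2, lr=0.5, steps=30):
    # Plain gradient descent on a 1-D loss with a fixed learning rate.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# MAE gradient is always +/-1, so the iterate overshoots the optimum
# at 0 and ends up bouncing between two points lr apart around it.
print(descend(np.sign))  # stuck near ~0.2 / ~-0.3, never settles
# The log-cosh gradient tanh(x) vanishes near 0, so steps shrink.
print(descend(np.tanh))  # ~1e-7, effectively converged
```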

1 Answer


One way to find where a topic has been deployed is a Google Scholar search. For example, if we search https://scholar.google.com/scholar?hl=en&as_sdt=0%2C47&q=log-cosh+loss&btnG= we find a promising article:

"Log Hyperbolic Cosine Loss Improves Variational Auto-Encoder" Pengfei Chen, Guangyong Chen, Shengyu Zhang. The authors claim "We propose to train VAE with a new reconstruction loss, the log hyperbolic cosine (log-cosh) loss, which can significantly improve the performance of VAE and its variants in output quality, measured by sharpness and FID score."

– Sycorax