
In this paper, the authors claim to be using a robust likelihood function:

[Excerpt from paper showing the likelihood function]

The code for this paper is on GitHub, where the likelihood is referred to as t_likelihood. Isn't this just the log of a Gaussian likelihood? What's robust about this function?
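For intuition, here is a minimal sketch (mine, not the paper's `t_likelihood`) of why a Student-t likelihood is usually called "robust": an outlier is penalized quadratically under a Gaussian log-likelihood but only logarithmically under a t, so a few bad points cannot dominate the loss.

```python
# Sketch (not the paper's code): Gaussian vs. Student-t log-density
# on a typical point and on an outlier.
import numpy as np
from scipy import stats

mu, sigma, df = 0.0, 1.0, 3.0            # illustrative parameters only
for x in (0.5, 8.0):                     # typical point vs. outlier
    lg = stats.norm.logpdf(x, loc=mu, scale=sigma)
    lt = stats.t.logpdf(x, df, loc=mu, scale=sigma)
    print(f"x={x}: gaussian={lg:.2f}, student-t={lt:.2f}")
# The Gaussian log-density falls like -(x - mu)^2 / 2, the t only like
# -(df + 1) * log|x - mu|, so the outlier contributes far less to a t-based NLL.
```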

If you want to trace through the code:

  1. The model described in this paper is here
  2. The paper describes modelling the mean and variance of a distribution using two neural networks; the training happens approximately here (a sketch of what that setup typically looks like follows this list)
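I can't reproduce the linked code, but a minimal sketch of that setup (my own illustration in PyTorch, assuming a Gaussian negative log-likelihood; the paper's robust likelihood would replace the loss line) looks like:

```python
# Sketch (not the paper's code): two networks model the mean and variance
# of a Gaussian, trained by minimizing the negative log-likelihood.
import torch
from torch import nn

mean_net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
var_net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(
    list(mean_net.parameters()) + list(var_net.parameters()), lr=1e-3
)

x = torch.randn(256, 1)                        # toy inputs
y = 2 * x + 0.1 * torch.randn(256, 1)          # toy targets

for _ in range(100):
    mu = mean_net(x)
    var = nn.functional.softplus(var_net(x)) + 1e-6   # keep variance positive
    # Gaussian NLL per point: 0.5 * log(var) + 0.5 * (y - mu)^2 / var
    loss = (0.5 * var.log() + 0.5 * (y - mu) ** 2 / var).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```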
stevew
  • I'm not claiming to understand the paper in its entirety, but IIUC, the authors use the term "robust" when describing how they use a Bayesian hierarchical model with an inverse-gamma prior. They use the HT algorithm to "unbias" the mini-batch estimator, which they claim has better convergence properties than a full estimate, the mini-batch version being sparser due to less data in each sample. – Avraham Dec 22 '21 at 13:59
  • I've re-read the paper and also the online supplementary material, and I think this is the closest answer. By "robust" they mean the variance is sampled from an inverse-gamma distribution (in the paper, even though the code seems to use a regular gamma distribution). – stevew Dec 23 '21 at 10:37
  • So I'll turn it into an answer 8-) – Avraham Dec 23 '21 at 14:06

2 Answers


I'm not claiming to understand the paper in its entirety, but if I understand it correctly, the authors use the term "robust" to describe their use of a Bayesian hierarchical model with an inverse-gamma prior on the variance of the Gaussian distribution. The resulting predictive distribution is a Student-t, which is the common substitute for the normal when the variance is unknown.
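Concretely, with a generic inverse-gamma prior on the variance (the parameters $\alpha, \beta$ below are my notation, not necessarily the paper's), marginalizing out $\sigma^2$ gives

$$
\int_0^\infty \mathcal{N}(x \mid \mu, \sigma^2)\,\mathrm{Inv\text{-}Gamma}(\sigma^2 \mid \alpha, \beta)\,d\sigma^2
= \frac{\Gamma\!\left(\alpha + \tfrac{1}{2}\right)}{\Gamma(\alpha)\sqrt{2\pi\beta}}
\left(1 + \frac{(x - \mu)^2}{2\beta}\right)^{-\left(\alpha + \frac{1}{2}\right)},
$$

which is a non-standardized Student-t with $\nu = 2\alpha$ degrees of freedom, location $\mu$, and scale $\sqrt{\beta/\alpha}$.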

The authors use the HT algorithm to "unbias" the mini-batch estimator, which they claim has better convergence properties than a full estimate, the mini-batch version being sparser because each individual sample has less data to crunch.
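A quick Monte Carlo check of the Student-t claim (my sketch with arbitrary parameters, not the paper's code): draws from the normal-inverse-gamma hierarchy should match the Student-t above.

```python
# Sample sigma^2 ~ Inv-Gamma(alpha, beta), then x | sigma^2 ~ N(mu, sigma^2),
# and compare empirical quantiles of x with the analytic Student-t quantiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, alpha, beta = 0.0, 3.0, 2.0                 # arbitrary illustrative values

sigma2 = stats.invgamma.rvs(alpha, scale=beta, size=100_000, random_state=rng)
x = rng.normal(mu, np.sqrt(sigma2))

t = stats.t(df=2 * alpha, loc=mu, scale=np.sqrt(beta / alpha))
qs = [0.1, 0.5, 0.9]
print(np.quantile(x, qs))   # empirical quantiles
print(t.ppf(qs))            # analytic quantiles -- should agree closely
```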

Avraham

My intuition is that for many non-normal probability models the likelihood takes on the appearance of a normal likelihood as the sample size increases.

The justification in this setting may be similar to using a chi-squared approximation for the sampling distribution of the likelihood-ratio test statistic, or a normal approximation for the sampling distribution of the Wald test statistic when the data-generating process is non-normal. Here is a related thread on Wilks' theorem.
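As a toy illustration of that intuition (my own sketch, not from the answer's links): for an exponential model, the log-likelihood within a couple of standard errors of the MLE is increasingly well approximated by a quadratic, i.e. by a Gaussian log-likelihood, as the sample size grows.

```python
# Exponential model: compare the exact log-likelihood with its quadratic
# (Laplace) approximation a fixed number of standard errors from the MLE.
import numpy as np

rng = np.random.default_rng(1)

def loglik(lam, x):
    # Exponential log-likelihood: n*log(lam) - lam*sum(x)
    return len(x) * np.log(lam) - lam * x.sum()

for n in (5, 50, 5000):
    x = rng.exponential(scale=1.0, size=n)
    mle = n / x.sum()
    lam = mle * (1 + 2 / np.sqrt(n))   # two standard errors above the MLE
    # Observed information for this model is n / mle^2
    quad = loglik(mle, x) - n * (lam - mle) ** 2 / (2 * mle**2)
    print(n, loglik(lam, x) - quad)    # gap shrinks toward 0 as n grows
```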

Geoffrey Johnson
  • Thanks for your reply. Seems like they call it "robust" because they sample the variance from an inverse-gamma distribution. – stevew Dec 23 '21 at 10:38