t-distributed/robust likelihood

Question

In this paper, the authors claim to be using a robust likelihood function:

The code for this paper is on GitHub, where the function is called t_likelihood. Isn't this just a log Gaussian likelihood? What's robust about this function?

If you want to trace through the code:

The model described in this paper is here
The paper describes modelling the mean and variance of a distribution with two neural networks, which are trained approximately here

I'm not claiming to understand the paper in its entirety, but IIUC, the authors use the term "robust" to describe their use of a Bayesian hierarchical model with an inverse gamma prior. They use the HT algorithm to "unbias" the mini-batch estimator, which they claim has better convergence properties than a full estimate because it is sparser, with less data in each sample. — Avraham, Dec 22 '21 at 13:59
I've re-read the paper and the online supplementary material, and I think this is the closest answer. By "robust" they mean that the variance is sampled from an inverse gamma distribution (in the paper, that is; the code seems to use a regular gamma distribution). — stevew, Dec 23 '21 at 10:37

score 1 · Accepted Answer · answered Dec 23 '21 at 14:10

I'm not claiming to understand the paper in its entirety, but if I understand it correctly, the authors use the term "robust" to describe their use of a Bayesian hierarchical model with an inverse gamma prior on the variance (σ²) of the Gaussian distribution. Marginalizing over that variance gives a Student-t predictive distribution, which is the common substitute for the normal when the variance is unknown. Its heavier tails penalize large residuals far less severely than a Gaussian does, which is the usual sense in which such a likelihood is called robust.
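To make the contrast with a plain Gaussian concrete, here is a minimal sketch in pure Python. It does not reproduce the repository's t_likelihood; the function names and the prior values (alpha = beta = 1.5) are hypothetical and chosen only for illustration. It relies on the standard result that if σ² has an InvGamma(α, β) prior and x given σ² is N(μ, σ²), then x is marginally Student-t with ν = 2α degrees of freedom and scale sqrt(β/α).

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def student_t_logpdf(x, mu, scale, nu):
    """Log density of a location-scale Student-t with nu degrees of freedom."""
    z = (x - mu) / scale
    log_norm = (
        math.lgamma(0.5 * (nu + 1.0))
        - math.lgamma(0.5 * nu)
        - 0.5 * math.log(nu * math.pi)
        - math.log(scale)
    )
    return log_norm - 0.5 * (nu + 1.0) * math.log1p(z * z / nu)


def t_params_from_inverse_gamma(alpha, beta):
    """Map an InvGamma(alpha, beta) prior on sigma^2 to (nu, scale) of the marginal t."""
    return 2.0 * alpha, math.sqrt(beta / alpha)


# Hypothetical prior values, not taken from the paper or its code.
nu, scale = t_params_from_inverse_gamma(alpha=1.5, beta=1.5)

# Compare how each log-likelihood treats increasingly large residuals.
for residual in (0.0, 1.0, 3.0, 10.0):
    print(
        residual,
        gaussian_logpdf(residual, 0.0, 1.0),
        student_t_logpdf(residual, 0.0, scale, nu),
    )
```

The Gaussian term falls off quadratically in the residual, while the Student-t term falls off only logarithmically, so a single outlier contributes much less to the loss (and to its gradient). As ν grows, the two log-densities coincide.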

The authors use the HT algorithm to "unbias" the mini-batch estimator, which they claim has better convergence properties than a full estimate because it is sparser, with less data to crunch in each individual sample.

score 0 · Answer 2 · answered Dec 22 '21 at 13:37

My intuition is that for many non-normal probability models, the likelihood increasingly resembles a normal likelihood as the sample size grows.

The justification in this setting may be similar to using a chi-square approximation for the sampling distribution of the likelihood ratio test statistic, or using a normal approximation for the sampling distribution of the Wald test statistic when the data-generating process is non-normal. Here is a related thread on Wilks' theorem.
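The intuition that a non-normal likelihood looks more and more normal as the sample grows can be checked numerically. The sketch below is only an illustration under assumed data: a Bernoulli model with a hypothetical observed success rate of 0.3. It compares the peak-normalised likelihood with the normal curve centred at the maximum likelihood estimate, using the inverse Fisher information as the variance. The function name and grid size are made up for this example.

```python
import math


def max_gap_to_normal(n, p_hat=0.3, grid_size=2001):
    """Largest gap between the peak-normalised Bernoulli likelihood
    and its normal approximation, evaluated on a grid over (0, 1)."""
    k = n * p_hat  # successes implied by p_hat (hypothetical data)
    var = p_hat * (1.0 - p_hat) / n  # inverse Fisher information at p_hat

    def loglik(p):
        return k * math.log(p) + (n - k) * math.log(1.0 - p)

    peak = loglik(p_hat)  # the log-likelihood is maximised at p_hat
    gap = 0.0
    for i in range(1, grid_size):
        p = i / grid_size  # grid points strictly inside (0, 1)
        exact = math.exp(loglik(p) - peak)
        approx = math.exp(-0.5 * (p - p_hat) ** 2 / var)
        gap = max(gap, abs(exact - approx))
    return gap


# The gap should shrink as the sample size grows.
for n in (10, 100, 1000):
    print(n, max_gap_to_normal(n))
```

Working with the log-likelihood and subtracting its peak before exponentiating avoids underflow for large n.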

Thanks for your reply. Seems like they call it "robust" because they sample variance from an inverse Gamma distribution. — stevew, Dec 23 '21 at 10:38
