It was requested that I read the following article for work: https://support.sas.com/resources/papers/proceedings15/1400-2015.pdf
In Case II, the author starts by doing two things:
First, he computes the maximum likelihood estimate of the PD parameter $\lambda$, denoted $\hat\lambda$. Second, he chooses the prior $p(\lambda)$ so that its mean matches the MLE, i.e. $E_{p(\lambda)}[\lambda]=\hat\lambda$.
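To make that concrete (the Poisson/Gamma form here is my own illustrative assumption, not necessarily the exact family used in the paper): if default counts are modeled as Poisson with rate $\lambda$ and the prior is $\mathrm{Gamma}(\alpha,\beta)$, then matching the prior mean to the MLE amounts to

$$
\hat\lambda = \frac{\sum_{i=1}^n x_i}{n}, \qquad E_{p(\lambda)}[\lambda] = \frac{\alpha}{\beta} = \hat\lambda,
$$

so the data pins down the ratio $\alpha/\beta$, and the remaining degree of freedom (say $\beta$) only controls how tightly the prior is concentrated around $\hat\lambda$.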
I am not an expert in Bayesian inference, but my understanding is that this runs contrary to the philosophy of Bayesian inference. We are working with a very small data set, so there is very little information in the data. By using the data to construct the prior, we are essentially building a posterior that combines the information in the data with itself.

I am aware of a method called "empirical Bayes", but as I understand it, that involves estimating the prior's hyperparameters from the marginal distribution of the data $x$, i.e. $p(x) = \int p(x\mid\lambda)\,p(\lambda)\,d\lambda$, not from the conditional distribution $p(x\mid\lambda)$. In other words, if we have subgroups within the data, I understand empirical Bayes to mean using data from all subgroups to build a prior for a particular subgroup. In the article cited above, only the data from a particular subgroup is used to build the prior for that same subgroup.
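Here is a minimal sketch (entirely my own, not from the paper; the model, the conjugate Gamma prior, and all the numbers are illustrative assumptions) of why centering the prior on the MLE from the same subgroup worries me. With a Poisson likelihood and a Gamma prior, the posterior is available in closed form, so we can compare a prior centered on $\hat\lambda$ against a prior fixed before seeing this subgroup's data:

```python
# Sketch: "double counting" when the prior mean is set to the subgroup's own MLE.
# Assumed model: Poisson default counts with rate lambda, Gamma(alpha, beta) prior
# (conjugate), so the posterior is Gamma(alpha + sum(x), beta + n).

import numpy as np
from scipy import stats

x = np.array([0, 1, 0, 0, 2])      # tiny, made-up default counts for one subgroup
n, s = len(x), x.sum()
lam_hat = s / n                    # MLE of lambda

# (a) prior centered on the MLE, as in the approach I'm questioning:
beta0 = 10.0                       # arbitrary prior "sample size"
alpha0 = lam_hat * beta0           # forces E[lambda] = alpha0 / beta0 = lam_hat
post_a = stats.gamma(a=alpha0 + s, scale=1.0 / (beta0 + n))

# (b) a prior fixed without looking at this subgroup's data (placeholder values):
alpha1, beta1 = 1.0, 2.0
post_b = stats.gamma(a=alpha1 + s, scale=1.0 / (beta1 + n))

print(f"MLE:                   {lam_hat:.3f}")
print(f"Posterior mean/sd (a): {post_a.mean():.3f} / {post_a.std():.3f}")
print(f"Posterior mean/sd (b): {post_b.mean():.3f} / {post_b.std():.3f}")
# In (a) the posterior is pulled toward lam_hat by construction and looks more
# certain than 5 observations alone justify; in (b) the prior information did
# not come from this subgroup's data.
```

If my reading is right, in case (a) the posterior precision reflects the same five observations twice, once through the likelihood and once through the prior, which is exactly the double use of data I am asking about.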
Can someone tell me if this is common practice in Bayesian stats? I have never seen anyone do this, and I would like to sound more informed before telling my boss that the methodology is flawed.