It was requested that I read the following article for work: https://support.sas.com/resources/papers/proceedings15/1400-2015.pdf
In Case II, the author starts by doing two things:
First, he computes the maximum likelihood estimate of the PD parameter $\lambda$, denoted $\hat\lambda$. Second, he chooses the prior $p(\lambda)$ so that its mean matches the MLE, i.e. $E_{p(\lambda)}[\lambda]=\hat\lambda$.
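To make that concrete (the Poisson/Gamma form here is my own illustrative assumption, not necessarily the exact family used in the paper): if default counts are modeled as Poisson with rate $\lambda$ and the prior is $\mathrm{Gamma}(\alpha,\beta)$, then matching the prior mean to the MLE amounts to

$$
\hat\lambda = \frac{\sum_{i=1}^n x_i}{n}, \qquad E_{p(\lambda)}[\lambda] = \frac{\alpha}{\beta} = \hat\lambda,
$$

so the data pins down the ratio $\alpha/\beta$, and the remaining degree of freedom (say $\beta$) only controls how tightly the prior is concentrated around $\hat\lambda$.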
I am not an expert in Bayesian inference, but my understanding is that this runs contrary to the philosophy of Bayesian inference. We are working with a very small data set, so there is very little information in the data. By using the data to construct the prior, we are essentially building a posterior that combines the information in the data with itself.

I am aware of a method called "empirical Bayes", but as I understand it, that involves estimating the prior's hyperparameters from the marginal distribution of the data $x$, i.e. $p(x) = \int p(x\mid\lambda)\,p(\lambda)\,d\lambda$, not from the conditional distribution $p(x\mid\lambda)$. In other words, if we have subgroups within the data, I understand empirical Bayes to mean using data from all subgroups to build a prior for a particular subgroup. In the article cited above, only the data from a particular subgroup is used to build the prior for that same subgroup.
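Here is a minimal sketch (entirely my own, not from the paper; the model, the conjugate Gamma prior, and all the numbers are illustrative assumptions) of why centering the prior on the MLE from the same subgroup worries me. With a Poisson likelihood and a Gamma prior, the posterior is available in closed form, so we can compare a prior centered on $\hat\lambda$ against a prior fixed before seeing this subgroup's data:

```python
# Sketch: "double counting" when the prior mean is set to the subgroup's own MLE.
# Assumed model: Poisson default counts with rate lambda, Gamma(alpha, beta) prior
# (conjugate), so the posterior is Gamma(alpha + sum(x), beta + n).

import numpy as np
from scipy import stats

x = np.array([0, 1, 0, 0, 2])      # tiny, made-up default counts for one subgroup
n, s = len(x), x.sum()
lam_hat = s / n                    # MLE of lambda

# (a) prior centered on the MLE, as in the approach I'm questioning:
beta0 = 10.0                       # arbitrary prior "sample size"
alpha0 = lam_hat * beta0           # forces E[lambda] = alpha0 / beta0 = lam_hat
post_a = stats.gamma(a=alpha0 + s, scale=1.0 / (beta0 + n))

# (b) a prior fixed without looking at this subgroup's data (placeholder values):
alpha1, beta1 = 1.0, 2.0
post_b = stats.gamma(a=alpha1 + s, scale=1.0 / (beta1 + n))

print(f"MLE:                   {lam_hat:.3f}")
print(f"Posterior mean/sd (a): {post_a.mean():.3f} / {post_a.std():.3f}")
print(f"Posterior mean/sd (b): {post_b.mean():.3f} / {post_b.std():.3f}")
# In (a) the posterior is pulled toward lam_hat by construction and looks more
# certain than 5 observations alone justify; in (b) the prior information did
# not come from this subgroup's data.
```

If my reading is right, in case (a) the posterior precision reflects the same five observations twice, once through the likelihood and once through the prior, which is exactly the double use of data I am asking about.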
Can someone tell me if this is common practice in Bayesian stats? I have never seen anyone do this, and I would like to sound more informed before telling my boss that the methodology is flawed.