I have been using GAMs (mgcv's gam()) for a fairly complex and computationally intensive analysis: millions of observations and dozens of terms, including 2-D tensor product splines. The results are excellent for the most part, but some predictions deviate excessively from prior expectations. (I have a specific model whose predictions can serve as a prior; the deviations are measured relative to this "respected baseline" model.)
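For concreteness, the model has roughly this structure; everything below (the variable names, the toy data, and the linear stand-in for the baseline) is a placeholder, not the actual analysis:

    library(mgcv)

    ## Toy stand-in data; the real analysis has millions of rows and dozens of terms.
    set.seed(1)
    n <- 5000
    dat <- data.frame(x1 = runif(n), x2 = runif(n), lon = runif(n), lat = runif(n))
    dat$y <- sin(2 * pi * dat$x1) + dat$lon * dat$lat + rnorm(n, sd = 0.3)

    ## Same flavour of model: smooth main effects plus a 2-D tensor product spline.
    fit <- gam(y ~ s(x1) + s(x2) + te(lon, lat), data = dat, method = "REML")

    ## A stand-in for the "respected baseline" whose predictions act as the prior.
    baseline_fit <- lm(y ~ x1 + x2 + lon * lat, data = dat)
    deviation <- predict(fit) - predict(baseline_fit)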
This could be taken as a cue to switch to a more Bayesian approach, but the model already takes many hours to fit, bam() doesn't help, and I would like to avoid moving to something like MCMC, which I assume would be far more computationally intensive. Instead, I'm wondering whether I can impose a prior within the existing fitting framework, or at least with computational demands similar to what I'm using now.
For example, drawing on the interpretation of conjugate priors as pseudo-observations, would it be reasonable to add some amount of data that matches my prior and fit to the expanded dataset, roughly as sketched below? If not, what other possibilities should I consider?
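To make the pseudo-observation idea concrete, here is a rough sketch continuing the toy example above; the number of pseudo-observations and their weight are arbitrary tuning knobs I have not worked out:

    ## Pseudo-observations: covariate points at which the baseline model's
    ## predictions are treated as (down-weighted) extra data.
    prior_points <- dat[sample(nrow(dat), 1000), ]
    prior_points$y <- predict(baseline_fit, newdata = prior_points)

    ## Stack real and pseudo data; control the "prior strength" with weights.
    aug <- rbind(dat, prior_points)
    w   <- c(rep(1, nrow(dat)), rep(0.2, nrow(prior_points)))

    ## Refit the same formula on the augmented data, down-weighting the
    ## pseudo-observations via the weights argument.
    fit_aug <- gam(y ~ s(x1) + s(x2) + te(lon, lat),
                   data = aug, weights = w, method = "REML")

The weight on the pseudo-observations would presumably play the role of prior strength, but I'm not sure how principled that is for penalized splines, which is part of what I'm asking.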