
I have the following question about "Data Snooping" in Bayesian Models.

In Bayesian models, priors are generally said to come from an independent source, such as experts in the domain the data come from. Suppose you want to fit the following model to your data:

Y = b_0 + b_1*X1 + e, with e ~ N(0, sigma)

You decide to place a normal prior on b_0 ~ N(mu_1, sigma_1), a normal prior on b_1 ~ N(mu_2, sigma_2) and a log-normal prior on sigma ~ LogNormal(mu_3, sigma_3).
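For reference, the quantity a MAP estimate maximizes here can be written out explicitly. This sketch assumes Gaussian noise with standard deviation sigma, reads the second argument of N(·,·) and LogNormal(·,·) as a standard deviation (so log(sigma) ~ N(mu_3, sigma_3)), and takes the mode with respect to sigma itself. For n training points (x_i, y_i), the likelihood contributes n of the log(sigma) terms and the log-normal density contributes one more, giving, up to an additive constant:

```latex
\log p(b_0, b_1, \sigma \mid \text{data})
  = -(n+1)\log\sigma
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - b_0 - b_1 x_i\right)^2
    - \frac{(b_0 - \mu_1)^2}{2\sigma_1^2}
    - \frac{(b_1 - \mu_2)^2}{2\sigma_2^2}
    - \frac{(\log\sigma - \mu_3)^2}{2\sigma_3^2}
    + \text{const}
```

Each prior choice changes only the last three penalty terms, which is why Choice 1 and Choice 2 can give different MAP estimates from the same 70% sample.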

Suppose you choose two sets of values for mu_1, sigma_1, mu_2, sigma_2, mu_3, sigma_3:

  • (Model 1) Choice 1: mu_1 = a1, sigma_1 = a2, mu_2 = a3, sigma_2 = a4, mu_3 = a5, sigma_3 = a6
  • (Model 2) Choice 2: mu_1 = b1, sigma_1 = b2, mu_2 = b3, sigma_2 = b4, mu_3 = b5, sigma_3 = b6

Based on a randomly selected 70% sample of the data, you then find the Bayesian estimates (e.g. MAP) of b_0, b_1 and sigma for Choice 1 and Choice 2.
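A minimal sketch of this fitting step in plain Python, assuming the model and priors above with the second argument of each distribution read as a standard deviation. The function name `map_fit`, the coordinate-ascent scheme, the synthetic data and both prior tuples are hypothetical illustrations, not the asker's code. With sigma fixed, the normal priors act like ridge penalties, so the (b_0, b_1) update is an exact 2x2 linear solve. With (b_0, b_1) fixed, the log posterior is concave in log(sigma), so bisection on its derivative finds the sigma update. Coordinate ascent only guarantees a local mode.

```python
import math
import random


def map_fit(xs, ys, prior, iters=200):
    """Hypothetical MAP estimate of (b_0, b_1, sigma) via coordinate ascent.

    Assumed model: y = b_0 + b_1*x + e, e ~ N(0, sigma^2), with priors
    b_0 ~ N(mu_1, sigma_1), b_1 ~ N(mu_2, sigma_2), log(sigma) ~ N(mu_3, sigma_3).
    `prior` is the tuple (mu_1, sigma_1, mu_2, sigma_2, mu_3, sigma_3).
    """
    mu1, s1, mu2, s2, mu3, s3 = prior
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b0, b1, sigma = mu1, mu2, 1.0
    for _ in range(iters):
        # b-step: with sigma fixed, the priors pull (b_0, b_1) towards (mu_1, mu_2)
        # like ridge penalties; solve the 2x2 normal equations by Cramer's rule.
        lam0, lam1 = (sigma / s1) ** 2, (sigma / s2) ** 2
        a11, a12, a22 = n + lam0, sx, sxx + lam1
        r1, r2 = sy + lam0 * mu1, sxy + lam1 * mu2
        det = a11 * a22 - a12 * a12
        b0 = (r1 * a22 - a12 * r2) / det
        b1 = (a11 * r2 - a12 * r1) / det
        # sigma-step: search over t = log(sigma) (same maximiser as over sigma).
        # The derivative below is strictly decreasing in t, so bisection works.
        rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

        def dlogpost(t):
            return -(n + 1) + rss * math.exp(-2 * t) - (t - mu3) / s3 ** 2

        lo, hi, step = mu3 - 1.0, mu3 + 1.0, 1.0
        while dlogpost(lo) < 0:  # widen bracket to the left
            lo, step = lo - step, step * 2
        step = 1.0
        while dlogpost(hi) > 0:  # widen bracket to the right
            hi, step = hi + step, step * 2
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if dlogpost(mid) > 0:
                lo = mid
            else:
                hi = mid
        sigma = math.exp(0.5 * (lo + hi))
    return b0, b1, sigma


# Hypothetical synthetic data: true line y = 1 + 2x, noise sd 0.5.
rng = random.Random(0)
xs_all = [rng.uniform(0, 10) for _ in range(100)]
data = [(x, 1 + 2 * x + rng.gauss(0, 0.5)) for x in xs_all]
rng.shuffle(data)
train, test = data[:70], data[70:]  # random 70% / 30% split
x_tr, y_tr = [x for x, _ in train], [y for _, y in train]

choice_1 = (0.0, 10.0, 0.0, 10.0, 0.0, 1.0)  # hypothetical a1..a6: vague priors
choice_2 = (0.0, 10.0, 0.0, 0.01, 0.0, 1.0)  # hypothetical b1..b6: b_1 held near 0
fit_1 = map_fit(x_tr, y_tr, choice_1)
fit_2 = map_fit(x_tr, y_tr, choice_2)
print(fit_1, fit_2)
```

With the vague Choice 1 the slope estimate should land near the simulated value of 2, while the very tight prior on b_1 in Choice 2 holds the slope near 0 and inflates the sigma estimate to absorb the misfit.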

You now want to evaluate the performance of both models on the remaining 30% of the data. Suppose you notice that the MSE of Model 1 is lower than that of Model 2, suggesting that Model 1 is better.
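The comparison step can be sketched as follows. The holdout MSE depends only on the point estimates of b_0 and b_1; the sigma estimate does not enter it. The function name `holdout_mse`, the three test points and both coefficient pairs are made-up placeholders chosen so the arithmetic is easy to check, not results from any real fit.

```python
def holdout_mse(b0, b1, xs, ys):
    """Mean squared error of the line y_hat = b0 + b1*x on held-out points."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical held-out data and hypothetical MAP estimates (b_0, b_1) per prior choice.
x_test = [1.0, 2.0, 3.0]
y_test = [3.0, 5.0, 7.0]
fits = {"Model 1": (1.0, 2.0), "Model 2": (0.0, 2.0)}

scores = {name: holdout_mse(b0, b1, x_test, y_test) for name, (b0, b1) in fits.items()}
best = min(scores, key=scores.get)  # the prior choice that would be "selected"
print(scores, best)  # {'Model 1': 0.0, 'Model 2': 1.0} Model 1
```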

My question: since in this case you have effectively treated the "Bayesian priors" as "hyperparameters" and selected the priors based on model performance, is this equivalent to "data snooping"? Should Bayesian priors always be selected "prior" to fitting the model, with the choice of priors having no relation to the actual model performance?

Thanks!

stats_noob
  • In all of statistics the statistical model should be appropriate to the inferential task and the nature of the data. (I am here including the prior as part of the model, but it need not always be thought of in that way.) Some Bayesian approaches allow for validation or modification of the model during the analysis. That is often a good thing. Frequentist approaches often do not encourage model validation and modification. "Data snooping" is not necessarily a bad thing if it leads to the use of a better statistical model. – Michael Lew Dec 25 '21 at 20:30
  • I guess the answer to your final question about priors being selected before the analysis is this: not always. – Michael Lew Dec 25 '21 at 20:32
  • @Michael Lew: thank you for your replies! These are the kind of insights I am looking for! – stats_noob Dec 25 '21 at 20:34
  • I posted some strategies for Bayesian prior selection based on model performance here: https://stats.stackexchange.com/questions/557971/is-this-considered-cheating-in-bayesian-modelling Check it out! – stats_noob Dec 25 '21 at 20:36
