0

Suppose you have some data that corresponds to a single predictor variable and a single response variable. You are interested in fitting a regression model to this data : Y = B_0 + B_1 * X

If you wanted to perform Bayesian Linear Regression, this would require you to place priors on "B_0", "B_1" and "Sigma":

  • Prior on "B_0" = Normal(mu_1, sigma_1)
  • Prior on "B_1" = Normal(mu_2, sigma_2)
  • Prior on "Sigma" = Normal(mu_3, sigma_3)

These priors can then be multiplied by the joint likelihood, and the MAP estimates of "B_0", "B_1" and "Sigma" can be calculated (e.g. either via theoretical conjugacy properties or MCMC sampling).

My Question: If you decide to "strategically" select many values of the above priors (i.e. the 6 parameters: mu_1, sigma_1, mu_2, sigma_2, mu_3, sigma_3) until you identified the combination that resulted in the smallest cross validation error of the regression model - would this be considered as "cheating"?

Here are three methods I thought of for doing this:

Method 1: Grid Search for Prior Selection

Using the R programming language, a 6-Dimensional Grid can be made that contains different possible values of mu_1, sigma_1, mu_2, sigma_2, mu_3, sigma_3

#warning: do not run (the below grid is massive)

grid = expand.grid(mean_b0 = seq(0, 5, 0.25), sigma_b0 = seq(0, 5, 0.25), mean_b1 = seq(0, 5, 0.25), sigma_b1 = seq(0, 5, 0.25), mean_sigma = seq(0, 5, 0.25), sigma_simga = seq(0, 5, 0.25))

  mean_b0 sigma_b0 mean_b1 sigma_b1 mean_sigma sigma_simga
1     0.0        0       0        0          0           0
2     0.5        0       0        0          0           0
3     1.0        0       0        0          0           0
4     1.5        0       0        0          0           0
5     2.0        0       0        0          0           0
6     2.5        0       0        0          0           0

                                 ....

        mean_b0 sigma_b0 mean_b1 sigma_b1 mean_sigma sigma_simga
1771556     2.5        5       5        5          5           5
1771557     3.0        5       5        5          5           5
1771558     3.5        5       5        5          5           5
1771559     4.0        5       5        5          5           5
1771560     4.5        5       5        5          5           5
1771561     5.0        5       5        5          5           5

From here, a function can be written that takes the values from each row within this grid, uses them as a Bayesian Prior, calculates the estimates for "B_0", "B_1" and "Sigma", and records the overall model performance (i.e. cross validation error). Below, I show how this would look like for a single instance:

#load libraries
library(rstanarm)
library(see)
library(bayestestR)
library(performance)

#specify priors
my_prior <- normal(location = c(grid[1,1],  grid[1,3] , grid[1,5] ), scale = c( grid[1,2] , grid[1,4],  grid[1,6]))

#run model
model <- stan_glm(response_var ~ predictor_var, data=my_data, prior = my_prior, seed=111)

#record parameter estimates

estimates <- summary(model)

#record error
performance performance(model)

Of course, this would be a extremely computationally approach and might not be possible for high dimensional data.

Method 2: Evolutionary Algorithms for Prior Selection

A less computationally expensive and exhaustive search for prior selection can be done using Evolutionary Algorithms (e.g. Genetic Algorithm). A "fitness function" can be created in which the inputs are "mu_1, sigma_1, mu_2, sigma_2, mu_3, sigma_3" and the output (i.e. "fitness value") could be the cross validation error corresponding to the regression parameter estimates corresponding to a given selection of "mu_1, sigma_1, mu_2, sigma_2, mu_3, sigma_3" (this can be even treated as a multi-objective optimization problem in which the training error and test error are both considered).

The GA Library in R (https://cran.r-project.org/web/packages/GA/vignettes/GA.html) can be used for this:

# Define a function that performs Bayesian Regression which takes inputs as  "mu_1, sigma_1, mu_2, sigma_2, mu_3, sigma_3" and outputs "cross validation error":

fitness <- function(mu_1, sigma_1, mu_2, sigma_2, mu_3, sigma_3) {

#define function here

return(cross_validation_error)

}

#specify appropriate ranges of the function inputs, mutation parameters, and iteration rounds (and optimization constraints if necessary)

GA <- ga(type = "real-valued", 
         fitness = function(x)  fitness(x[1], x[2], x[3], x[4], x[5], x[6]),
         lower = c(0, 0, 0, 0, 0,0), upper = c(10, 10, 10, 10, 10,10), 
         popSize = 50, maxiter = 1000, run = 100)

Although Evolutionary Algorithms do not have any theoretical guarantees of converging to the maximum point, they still have the ability to explore large regions within the feasible set at a cheaper cost compared to grid search.

Method 3: Standard Optimization Algorithms

Although I am not sure about this, but perhaps this problem can be formulated as a standard optimization problem and the values of "mu_1, sigma_1, mu_2, sigma_2, mu_3, sigma_3" can be decided using Newton-Raphson or Gradient Descent.

Thus, to reiterate my question: Regardless of which of the above 3 methods are chosen - is choosing priors based on optimal model performance considered as "cheating" within the framework of Bayesian Modelling?

Thanks!

Note: Additional inputs can be added to the "fitness function" and additional columns can be added to the "grid" if the user is interested in choosing priors based on the multivariate normal distribution containing a specific type of variance-covariance matrix.

Richard Hardy
  • 54,375
  • 10
  • 95
  • 219
stats_noob
  • 5,882
  • 1
  • 21
  • 42
  • What are the definitions of "strategically select" and "cheating" in this context? – msuzen Dec 22 '21 at 05:52
  • 1
    @ Mehmet Süzen: Thank you for your reply! Usually, priors are selected independent to the data and model. What I meant by "strategically select" and "cheating" was using priors that increase model performance. – stats_noob Dec 22 '21 at 05:53
  • Naively thinking, this sounds like a prior selection problem https://stats.stackexchange.com/questions/78606/how-to-choose-prior-in-bayesian-parameter-estimation – msuzen Dec 22 '21 at 05:58

0 Answers0