So I came up with an analysis scheme that uses a nested cross-validation loop: logistic regression with an L1 penalty in the inner loop, and then logistic regression without the penalty in the outer loop. To break it down (a rough code sketch follows the steps):
Step 1. Split the data into training and test sets for the outer loop.
Step 2. Using only the training data, fit logistic regression with cross-validation and an L1 penalty.
Step 3. Take the intersection of the variables (features) whose coefficients did not shrink to zero across all of the cross-validation folds in Step 2.
Step 4. Train a new logistic regression model (without the penalty) on the training data, using only the features that survived the intersection in Step 3, and then predict on the unseen test data.
Step 5. Repeat until all data have been used for testing in the outer cross-validation loop.
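For concreteness, here is a minimal sketch of the scheme in scikit-learn. The dataset, fold counts, and penalty strength `C=0.1` are placeholders for illustration, not my actual setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

# Toy data standing in for my real dataset.
X, y = make_classification(n_samples=300, n_features=50,
                           n_informative=5, random_state=0)

outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []

for train_idx, test_idx in outer.split(X, y):            # Steps 1 and 5
    X_tr, X_te = X[train_idx], X[test_idx]
    y_tr, y_te = y[train_idx], y[test_idx]

    # Step 2: L1-penalized logistic regression on each inner fold.
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
    keep = np.ones(X.shape[1], dtype=bool)
    for in_tr, _ in inner.split(X_tr, y_tr):
        l1 = LogisticRegression(penalty='l1', solver='liblinear', C=0.1)
        l1.fit(X_tr[in_tr], y_tr[in_tr])
        # Step 3: intersect the features with nonzero coefficients.
        keep &= (l1.coef_.ravel() != 0)

    if not keep.any():
        keep[:] = True  # fall back to all features if the intersection is empty

    # Step 4: unpenalized fit on the surviving features, then predict.
    # (penalty=None needs scikit-learn >= 1.2; older versions use 'none'.)
    plain = LogisticRegression(penalty=None, max_iter=1000)
    plain.fit(X_tr[:, keep], y_tr)
    scores.append(accuracy_score(y_te, plain.predict(X_te[:, keep])))

print("mean outer-fold accuracy:", np.mean(scores))
```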
I have a couple of questions. I have just started learning about Bayesian analysis, and one takeaway I've gotten so far is that it involves a reallocation of credibility. Is this analysis considered "Bayesian" because, in a sense, I am reallocating credibility among the features, toward the ones likely to give better prediction accuracy, by using the L1 penalty with logistic regression in the inner loop? If not, how could I make it Bayesian?
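One connection I've come across (a standard fact about the penalty itself, not something specific to my scheme) is that L1-penalized maximum likelihood is equivalent to MAP estimation under a zero-mean Laplace prior on the coefficients:

$$\hat{\beta} = \arg\max_{\beta}\Big[\log p(y \mid X, \beta) - \lambda \lVert \beta \rVert_1\Big] = \arg\max_{\beta}\log\big[p(y \mid X, \beta)\,p(\beta)\big], \qquad p(\beta) \propto e^{-\lambda \lVert \beta \rVert_1},$$

but I'm not sure whether a point estimate under an implicit prior counts as "Bayesian" in the reallocation-of-credibility sense.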
The second question: using this method I end up with far better prediction accuracy on the test data than I would get from the same nested cross-validation scheme with plain logistic regression (without removing features). Is there any double dipping going on here? Do you see anything statistically unsound in this procedure?