I'll start by saying that I'm a software engineer, and while I took statistics courses, I'm far from an expert in the subject.

My job is essentially to build software that helps the data scientists on my team with their research.

One of the data scientists recently asked me for a tool that will do the following:

  1. Given $K$ features (a large number), regress $Y$ on them.
  2. Compute the $t$ statistic and $p$ value of each of these features.
  3. If any feature has a $p$ value greater than $0.05$, remove it from the set of features.
  4. Repeat this procedure until the $p$ value is $\leq 0.05$ for all remaining features. Report this model.
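
For concreteness, here is a minimal sketch of the procedure as I understand it, not the data scientist's actual code. The function names `ols_p_values` and `prune_by_p_value` are my own placeholders, an intercept is assumed, and to keep the dependencies down to NumPy the $p$ values use a normal approximation to the $t$ distribution, which is only reasonable when the number of observations is much larger than $K$.

```python
import math
import numpy as np

def ols_p_values(X, y):
    """Fit OLS with an intercept and return approximate two-sided
    p-values for the slope coefficients (intercept excluded)."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])            # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares fit
    resid = y - A @ beta
    dof = n - A.shape[1]                            # assumes n > K + 1
    sigma2 = resid @ resid / dof                    # residual variance
    cov = sigma2 * np.linalg.pinv(A.T @ A)          # coefficient covariance
    t = beta / np.sqrt(np.diag(cov))                # t statistics
    # Assumption: normal approximation to the t distribution (large dof).
    p = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])
    return p[1:]

def prune_by_p_value(X, y, alpha=0.05):
    """Repeatedly drop every feature with p > alpha and refit,
    until all remaining features have p <= alpha (or none remain)."""
    keep = list(range(X.shape[1]))
    while keep:
        p = ols_p_values(X[:, keep], y)
        survivors = [j for j, pj in zip(keep, p) if pj <= alpha]
        if len(survivors) == len(keep):             # nothing removed: stop
            break
        keep = survivors
    return keep                                     # column indices retained
```

Calling `prune_by_p_value(X, y)` on an `(n, K)` array returns the indices of the columns that survive. Note that step 3 is implemented literally: every feature above the threshold is dropped in the same pass, rather than one at a time.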

Now I can easily build this, but honestly, I am questioning the soundness of this technique.

I recall my statistics professor mentioning that you can't compare $t$ values across features in a multiple regression, and that a $t$ test should only be used to test the significance of a single variable, not a group of variables. I also researched a bit online, and it seems the most common ways of selecting features are 1) forward selection, 2) backward selection, and 3) stable selection, none of which use the $p$ value as a basis for removing variables.

Any idea if I am making a fuss out of nothing?

kjetil b halvorsen
AspiringMat
  • See answers here: https://stats.stackexchange.com/questions/133920/should-i-remove-non-significant-variables-from-my-regression-model – ofer-a Oct 20 '21 at 09:45
  • **NO**, it does not make much sense! – kjetil b halvorsen Oct 20 '21 at 18:58
  • Does this answer your question? [Algorithms for automatic model selection](https://stats.stackexchange.com/questions/20836/algorithms-for-automatic-model-selection) That might be the most widely read thread on this subject on this site. – EdM Oct 20 '21 at 19:14
  • The notorious accepted answer with a -57 rating... @EdM – Dave Oct 20 '21 at 19:21
  • @Dave I had more in mind the bountied answer currently at +387. And several other helpful ones. This site does, after all, allow those who pose questions to answer and accept answers, whether an answer makes sense or not. – EdM Oct 20 '21 at 19:25
  • It just cracks me up every time I see the accepted answer greyed out with a -57 rating. – Dave Oct 20 '21 at 19:29

0 Answers