
I have a very large dataset, and I'm trying to find which variable(s) best explain a certain outcome variable. I've considered just doing OLS on the variables that make logical sense, but I've stumbled upon elastic net regression and I'm wondering: is this a solution to p-hacking? I know it's used mostly for prediction, which is not my purpose, but does elastic net have any properties that would let me tailor it for descriptive statistics, i.e., allow me to say more decisively, "This variable is more important for describing this other variable"? (I don't have much of a background in causal inference, so if I'm thinking about this wrongly, please let me know.) Thank you so much!
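For concreteness, here is a minimal sketch (toy data; the variable counts and coefficients are made up, not from the question) of the property being asked about: a cross-validated elastic net shrinks most coefficients exactly to zero, so the surviving nonzero coefficients act as a rough variable-selection device, though not as valid hypothesis tests.

```python
# Minimal sketch: elastic net as a variable-selection device on toy data.
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.normal(size=(n, p))
# Only the first three predictors actually matter in this toy example.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

# Standardize so the penalty treats all predictors on the same scale.
Xs = StandardScaler().fit_transform(X)

# l1_ratio mixes the lasso (1.0) and ridge (0.0) penalties;
# cross-validation picks the penalty strength alpha.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(Xs, y)

selected = np.flatnonzero(model.coef_ != 0)
print("nonzero coefficients at indices:", selected)
print("their values:", model.coef_[selected])
```

In a setup like this the selected indices typically include 0, 1, and 2 plus a few noise variables, and which noise variables survive depends on the cross-validation split; that instability is part of why the comments below stress that ordinary p-values computed on the selected set are not valid.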

Daycent
  • There is a related thread about the three types of lasso [here](https://stats.stackexchange.com/a/444801/7071). – dimitriy Jan 06 '21 at 01:30
  • Thank you for this! I was not aware of these Stata commands; this simplifies things a lot. The Stata commands here mostly use lasso for selecting the controls. Are you familiar with any lasso-based selection for the causal variables (not the controls)? – Daycent Jan 08 '21 at 18:41
  • I should probably rephrase my question as "is elastic net/lasso a solution to multiple comparisons?" – Daycent Jan 08 '21 at 18:45
  • It is not inherently causal, but the inference lassos allow for standard statistical inference under a sparsity condition (only a few variables matter), so it should help. However, I am not aware of any research that guarantees a particular family-wise error rate. But this is an area of active work and one that I have only started to delve into, so take that with a large grain of salt. – dimitriy Jan 08 '21 at 18:58
  • Thanks for the help! I'm very interested in seeing the results of this type of research in the future. – Daycent Jan 08 '21 at 19:03
  • @maxIRimp data dredging/p-hacking and multiple comparisons are different issues. If your main interest is in multiple comparisons, it would be best to edit the question to reflect this. BTW, lasso does not prevent p-hacking. In fact, extra care must be taken when using lasso, because the regularization parameter is typically selected using the data. If you don't compute p-values in a special way that accounts for this, they'll be over-optimistic, i.e. you can unintentionally bias the p-values if you're not careful (see the sketch after these comments). – user20160 Jan 08 '21 at 19:24
  • Hi @user20160, I thought multiple comparisons and p-hacking were very similar issues? (p-hacking has an ethical bent to the term, but it seems that if everyone used multiple-comparisons corrections, p-hacking would be a moot point.) I just found [link](https://stats.stackexchange.com/questions/410173/lasso-regression-p-values-and-coefficients), so thanks for that. Somehow I imagined that something that optimizes out-of-sample prediction would also theoretically correct for something like multiple comparisons. – Daycent Jan 08 '21 at 21:15
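As a rough illustration of user20160's point above, here is a simulation sketch (settings are assumed, not from the thread): let a cross-validated lasso pick variables, then compute ordinary OLS p-values on the selected set. Even when every predictor is pure noise, the selected coefficients come out "significant" far more often than the nominal 5% level.

```python
# Sketch: naive post-lasso OLS p-values are over-optimistic under the null.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p, reps = 200, 30, 100
false_positives = 0
n_selected = 0

for _ in range(reps):
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)  # y is independent of every predictor
    sel = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_ != 0)
    if sel.size == 0:
        continue
    # Refit OLS on the lasso-selected columns and read off ordinary p-values.
    ols = sm.OLS(y, sm.add_constant(X[:, sel])).fit()
    false_positives += np.sum(ols.pvalues[1:] < 0.05)
    n_selected += sel.size

print("share of selected coefficients with naive p < 0.05:",
      false_positives / max(n_selected, 1))
```

Data splitting or the selective-inference methods discussed in the thread linked in the last comment are aimed at repairing exactly this kind of over-optimism.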

0 Answers