
Statistical learning and its results are currently pervasive in the social sciences. A couple of months ago, Guido Imbens said: "LASSO is the new OLS".

I have studied machine learning a little, and I know that its main goal is prediction. I also agree with Leo Breiman's distinction between the two cultures of statistics. So, from my point of view, causality is, to some extent, opposed to prediction.

Considering that the sciences usually try to identify and understand causal relations, is machine learning useful for this goal? In particular, what are the advantages of LASSO for causal analysis?

Are there any researchers (and papers) addressing those questions?

  • Well, OLS will not produce estimates of causal effects very often, so if LASSO is to replace OLS, it does not have the "burden" of discovering causal relations. That said, have a look at this page for some recent research in econometrics on causal effects and sparse methods: http://www.mit.edu/~vchern/ – Christoph Hanck Feb 04 '16 at 05:03
  • For me the more natural distinction here would be that by Shmueli (["To Explain or to Predict"](http://www.jstor.org/stable/41058949), 2010) rather than Breiman's, but perhaps Breiman's distinction is also fine. – Richard Hardy Feb 04 '16 at 07:30
  • @ChristophHanck Well, you're right. But the point is: OLS has been employed for estimating causal effects a lot. For example, 'Mostly Harmless Econometrics' addresses several subjects related to this. Therefore, if it is possible with OLS, why not with LASSO? Anyway, thank you for the reference. – Guilherme Duarte Feb 04 '16 at 11:00
  • @RichardHardy You're completely right. I know this paper. I just mentioned Breiman because I thought it would be easier to explain. – Guilherme Duarte Feb 04 '16 at 11:01
  • I don't disagree there: in cases in which OLS can be used to estimate causal effects, I do not see why the lasso should not also be applicable. – Christoph Hanck Feb 04 '16 at 11:06

1 Answer


I don't know all of them, I'm sure, so I hope no one will mind if we do this wiki-style.

One important one, though, is that the LASSO is biased: its $\ell_1$ penalty shrinks coefficient estimates toward zero (source: Wasserman, in lecture, sorry). While that bias is acceptable in prediction, it is a problem in causal inference. If you want causality, you probably want it for Science, so you're not just trying to estimate the most useful parameters (which happen, strangely, to predict well); you're trying to estimate the TRUE(!) parameters.
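Here is a minimal simulation sketch of that shrinkage bias, assuming scikit-learn's `Lasso` with an arbitrary penalty level `alpha=0.1` and a made-up true coefficient vector (the numbers are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 5
beta_true = np.array([2.0, -1.5, 1.0, 0.5, 0.0])  # hypothetical true coefficients

# Average coefficient estimates over many simulated datasets
ols_coefs, lasso_coefs = [], []
for _ in range(500):
    X = rng.normal(size=(n, p))
    y = X @ beta_true + rng.normal(size=n)
    ols_coefs.append(LinearRegression().fit(X, y).coef_)
    lasso_coefs.append(Lasso(alpha=0.1).fit(X, y).coef_)  # arbitrary penalty level

print("true:      ", beta_true)
print("OLS mean:  ", np.mean(ols_coefs, axis=0).round(3))    # centers on the truth
print("LASSO mean:", np.mean(lasso_coefs, axis=0).round(3))  # pulled toward zero
```

Averaged over the simulated datasets, the OLS estimates center on the true coefficients, while the LASSO estimates are shrunk toward zero.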

one_observation
  • Good answer! Actually, if you have bias, it's a big deal for causal estimates. But maybe LASSO could be employed as a preliminary step in a more complete procedure to assess causality. – Guilherme Duarte Feb 04 '16 at 17:00
  • Perhaps! That's why I'm eager to have other people chime in. – one_observation Feb 04 '16 at 17:02
  • @GuilhermeDuarte, it is the overall error that matters, not bias. Under squared loss we care about MSE, and that equals Bias$^2$ + Variance. The lasso may deliver a good tradeoff with relatively small MSE despite some bias, and as such it should be more useful for causal analysis than unbiased estimation with high MSE (see the simulation sketch after this thread). The real problem with the lasso is that it is difficult to get confidence intervals for it; currently that is an active research area. – Richard Hardy Mar 13 '17 at 19:17
  • @RichardHardy Sorry, you mean that when we care about causality, we shouldn't be concerned with bias but with MSE? This is not entirely clear to me. – Guilherme Duarte Mar 14 '17 at 13:20
  • @GuilhermeDuarte, just as in prediction, in causality we need precise estimates of model coefficients. Precision can be measured in terms of absolute error, squared error, etc., but not bias. For example, you can have low bias and high estimation error at the same time. So looking at bias you would think you are doing fine, but that would be misleading, as the estimation error (absolute, squared, or whichever) is high. It is estimation error, not bias, that matters when you consider effect sizes, statistical significance, etc. in causal inference. – Richard Hardy Mar 14 '17 at 13:43
  • @RichardHardy Sorry, Richard, I've been trained to think of causal models as the way of assessing the effects of a variable "x" on another "y". Even though we sometimes have a low $R^2$ or a high squared error, I was taught to look at the unbiased coefficient because it'd represent the real causal effect. Could you provide any references? – Guilherme Duarte Mar 14 '17 at 16:37
  • This is something so basic that it is difficult to find references for. I thought the way you are thinking until Glen_b said to me what I said to you, and it has made perfect sense to me ever since :) I think the best way to understand this is through counterexamples showing that an unbiased estimator with high variance works worse than a biased estimator with low variance, as long as the MSE of the latter is below the MSE of the former. And it does not really matter whether we are after causal or predictive modelling. – Richard Hardy Mar 14 '17 at 16:48
  • @RichardHardy I reopened the question. If you like, you could write a more complete answer and I'll mark your answer as correct. – Guilherme Duarte Mar 14 '17 at 17:53
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/55358/discussion-between-guilherme-duarte-and-richard-hardy). – Guilherme Duarte Mar 14 '17 at 17:58
  • My doubts are strongly related to GuilhermeDuarte's. For example, @RichardHardy writes: "just as in prediction, in causality we need precise estimates of model coefficients." I strongly disagree with this sentence. In prediction we are not focused on coefficients; we can even ignore them completely. Exactly for this reason, for example, we can compare the predictive performance of regression and neural network models. Only the predicted values per se matter. – markowitz Feb 06 '20 at 11:33
  • @RichardHardy, this is not the place for a long discussion, but if you would like to defend your position above, let me know. If so, I will open a new question. – markowitz Feb 06 '20 at 11:34
  • @markowitz, for any given model, imprecise estimation of coefficients will yield large prediction errors. Yes, in prediction we can ignore the point estimates of parameters, but if the estimates are off, the prediction will be off. In that sense we want precise estimates. Some confusion in this discussion might be coming from possibly different definitions of the true parameters (structural/causal vs. reduced-form); we might be talking about different things. See e.g. https://stats.stackexchange.com/questions/265739/. Opening a new question is not a bad idea. Please notify me if you do, thanks. – Richard Hardy Feb 06 '20 at 11:54
  • Clarified like that, it sounds much better to me. If I open another related question, I will notify you. – markowitz Feb 06 '20 at 15:33
  • @markowitz, I am not sure myself anymore. There might be a confusion of structural/causal vs. reduced-form parameters in my argumentation. I would have to think deeper about it before concluding. – Richard Hardy Feb 07 '20 at 13:33
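To make the bias-variance tradeoff discussed in the thread above concrete, here is a minimal simulation sketch; the true effect $\mu = 1$ and the 0.8 shrinkage factor (standing in for a lasso-style penalty) are illustrative assumptions, not anything from the thread:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 1.0  # hypothetical true effect
n, reps = 20, 100_000

# reps simulated samples of size n, each with mean mu and standard deviation 2
x = rng.normal(loc=mu, scale=2.0, size=(reps, n))
unbiased = x.mean(axis=1)  # sample mean: unbiased but relatively high variance
shrunk = 0.8 * unbiased    # shrinkage estimator: biased toward zero, lower variance

for name, est in [("unbiased", unbiased), ("shrunk", shrunk)]:
    bias, var = est.mean() - mu, est.var()
    mse = ((est - mu) ** 2).mean()
    print(f"{name:8s}  bias={bias:+.3f}  var={var:.3f}  "
          f"bias^2+var={bias**2 + var:.3f}  mse={mse:.3f}")
```

The shrunk estimator is biased, yet its lower variance gives it the smaller MSE, which is Richard Hardy's point: it is the overall estimation error, not bias alone, that determines how well an effect is estimated.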