Many science fields have shifted the focus from a more theoretical study of statistics to a more data-based focus. How about econometrics?

Question

In the last decade, scientific fields that depend on an empirical approach have shifted their focus from a more theoretical study of statistics to a more data-based approach provided by machine learning or statistical learning theory (automatic variable selection procedures, cross-validation procedures, ...). Simple examples come from physics, engineering and statistics itself. Econometrics does not seem to have followed this trend. Is that true? If so, why?

PS:

This question was motivated by @MatthewDrury.

It was based on an interesting discussion (in the comments) with @RichardHardy about an answer to this question.

This might be on topic on an economics site. It doesn't seem to be about statistics or machine learning *per se,* nor does it appear sufficiently focused on solving an actual problem, as required of all questions here. — whuber, Jan 01 '22 at 21:04

score 1 · Answer 1 · answered Jan 01 '22 at 12:04

Great questions. I am not sure the two approaches are mutually exclusive. See a nice view on Econometrics and Machine Learning, which describes the overlaps. The likely reason the data-centric approach has had a less hyped impact in econometrics is causal inference: traditionally, econometric models have embraced causal modelling. See a recent exposition by Hünermund and Bareinboim, Causal Inference and Data Fusion in Econometrics.

score 0 · Answer 2 · answered Feb 26 '20 at 03:37

Short answer:

The "classical" focus of econometrics is on Statistical Inference.

Why is that?

1) Because economists want to test their theories rather than build theories from data or make out-of-sample forecasts. They actually use economic theory to choose what to include in the model.

2) Because economists, in many situations, are interested in the coefficients rather than in the predicted variable. For instance, imagine a study that tries to explain the level of corruption using a regression model such as: $corruptionLevel = \beta_0 + \beta_1 yearsInPrison + \beta_2 numberConvicted + \cdots$

Note that the coefficients $\beta_1$ and $\beta_2$ provide information to guide public policy. Depending on the values of the coefficients, different public policies will be carried out. So their estimates should not be biased.

Recent data science approaches usually accept some bias in order to improve the forecast (by reducing the variance and the risk of overfitting). For instance, people use LASSO, Ridge and so on.

In the end:

Machine learning practitioners pay with bias and receive in return lower variance and a lower risk of overfitting. From the classical econometrics perspective, this trade does not seem acceptable.
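As a rough illustration of this trade, here is a minimal simulation sketch. All values (the true coefficient `beta`, the noise level `sigma`, the sample size `n`, the penalty `lam`) are assumptions chosen for illustration and do not come from any study. It compares the OLS estimate of a single coefficient with a ridge-type shrunken estimate across repeated samples.

```python
import random

random.seed(1)

# Assumed illustration values: small true effect, noisy data, small sample.
beta, sigma, n, lam, reps = 0.5, 3.0, 20, 5.0, 5000

ols_draws, ridge_draws = [], []
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [beta * xi + random.gauss(0, sigma) for xi in x]
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    ols_draws.append(sxy / sxx)            # unbiased OLS slope (no intercept)
    ridge_draws.append(sxy / (sxx + lam))  # ridge slope, shrunk toward zero


def summarize(draws):
    """Return (bias, variance, mean squared error) around the true beta."""
    mean = sum(draws) / len(draws)
    bias = mean - beta
    var = sum((b - mean) ** 2 for b in draws) / len(draws)
    return bias, var, bias ** 2 + var


ols_bias, ols_var, ols_mse = summarize(ols_draws)
ridge_bias, ridge_var, ridge_mse = summarize(ridge_draws)

print(f"OLS:   bias={ols_bias:+.3f}  var={ols_var:.3f}  mse={ols_mse:.3f}")
print(f"Ridge: bias={ridge_bias:+.3f}  var={ridge_var:.3f}  mse={ridge_mse:.3f}")
```

In this particular setup the shrunken estimator is biased toward zero but has lower variance and lower mean squared error. With a larger true coefficient or less noise, the ranking can reverse.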

Long answer:

1) In practice, most people who have run a simple econometric model have, to some extent, followed the bizarre procedure of removing non-significant variables from the regression based on t-values. So, in the end, these regressions may suffer from omitted variable bias: if the omitted variables affect the dependent variable and are correlated with the regressors, bias will arise.

2) If the idea is that we should trust the coefficients of the econometric regression model and we are working with high-dimensional data sets, perhaps we can accept paying with some bias to receive lower variance in return: “Bias-variance tradeoff holds not only for forecasts (which in the case of a linear model are simply linear combinations of the estimated coefficients) but also for individual coefficients. One can estimate individual coefficients more accurately (in terms of expected squared error) by introducing bias so as to cut variance. So in that sense biased estimators can be desirable. Remember: we aim at finding the true value. Unbiasedness does not help if variance is large and our estimates lie far away from the true value on average across repeated samples.” - @Richard_Hardy

3) Points (1) and (2) have motivated researchers to look for solutions that are acceptable to economists as well. Recent literature has approached this problem by choosing focus variables that are not penalized. These focus variables are the ones that are important to guide public policy. To avoid omitted variable bias, one also runs a regression of these focus variables on all the other independent variables using a shrinkage procedure (such as Lasso). The variables with nonzero coefficients are then included in the regression model as well. The authors show that this procedure has good asymptotic properties.
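The selection step described in point (3) can be sketched as follows. This is a simplified illustration on simulated data, not the implementation from any paper: the data-generating process, the penalty `lam`, and the helper names `ols`, `soft_threshold` and `lasso` are all assumptions made for this example. It only performs the step described above (Lasso of the focus variable on the controls); the published post-double-selection procedure also selects controls that predict the outcome, which is left out here for brevity.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the given columns (no intercept)."""
    k = len(columns)
    # Normal equations A b = c with A = X'X and c = X'y.
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
         for i in range(k)]
    c = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, k):
            f = a[r][i] / a[i][i]
            for j in range(i, k):
                a[r][j] -= f * a[i][j]
            c[r] -= f * c[i]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso(columns, y, lam, sweeps=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, k = len(y), len(columns)
    b = [0.0] * k
    resid = list(y)  # residual y - Xb, with b = 0 at the start
    for _ in range(sweeps):
        for j in range(k):
            xj = columns[j]
            zj = sum(v * v for v in xj) / n
            rho = sum(x * (r + x * b[j]) for x, r in zip(xj, resid)) / n
            new = soft_threshold(rho, lam) / zj
            if new != b[j]:
                delta = new - b[j]
                resid = [r - x * delta for r, x in zip(resid, xj)]
                b[j] = new
    return b


random.seed(0)
n, p = 300, 8
controls = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
# Assumed process: the focus variable d depends on controls 0 and 1,
# and the same two controls also affect the outcome y.
d = [controls[0][i] + 0.8 * controls[1][i] + random.gauss(0, 1) for i in range(n)]
y = [1.0 * d[i] + 2.0 * controls[0][i] + 1.5 * controls[1][i] + random.gauss(0, 1)
     for i in range(n)]

naive = ols([d], y)[0]  # omits every control: omitted variable bias
gamma = lasso(controls, d, lam=0.2)
selected = [j for j, g in enumerate(gamma) if g != 0.0]
# The focus variable enters unpenalized, together with the selected controls.
post = ols([d] + [controls[j] for j in selected], y)[0]

print("true effect: 1.0")
print(f"naive OLS on d only: {naive:.3f}")
print(f"selected controls: {selected}")
print(f"OLS on d plus selected controls: {post:.3f}")
```

The coefficient on the focus variable `d` is estimated by plain OLS in the final step, so it is not shrunk; the Lasso is only used to decide which controls to keep.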

To finish:

Two other points may delay turning economics into a more data-based science.

1) Recall that economics is an applied social science, and these new computational techniques are not trivial for most economists.

2) Economics is a very conservative science since most models cannot be tested experimentally. For instance, suppose that you have a model such as $Inflation = \beta_0 + \beta_1 interestRate + \cdots \; (Eq1)$. We cannot manipulate the interest rate to generate values of inflation. In this case, we can only use the small sample of data that is available on the Central Bank homepage. Furthermore, this data also exhibits endogeneity. In the model above, interestRate obviously affects inflation, but we may also have another model $InterestRate = \gamma_0 + \gamma_1 inflation + \cdots \; (Eq2)$. Note that $\beta_1<0$ and $\gamma_1>0$ have different signs, but we have only one data set.

(Eq1) means that if the interest rate is high, inflation is lower because fewer people buy, since the cost of money is high.

(Eq2) means that if inflation is high, a central bank member may choose to increase the interest rate so that inflation falls in the next period.
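To see why having only one data set is a problem here, consider a minimal simulation sketch of the two equations holding at the same time. The slopes `beta1` and `gamma1`, the unit-variance shocks and the zero intercepts are assumptions made for illustration only. The observed pairs are generated by solving (Eq1) and (Eq2) jointly, and inflation is then regressed on the interest rate by OLS.

```python
import random

random.seed(2)

# Assumed structural slopes: (Eq1) inflation = beta1 * rate + u,
#                            (Eq2) rate = gamma1 * inflation + w.
beta1, gamma1 = -0.5, 1.5
n = 20000

inflation, rate = [], []
denom = 1 - beta1 * gamma1
for _ in range(n):
    u = random.gauss(0, 1)  # shock to inflation
    w = random.gauss(0, 1)  # shock to the interest rate
    # Reduced form: both variables are determined jointly by both shocks.
    inflation.append((beta1 * w + u) / denom)
    rate.append((gamma1 * u + w) / denom)

mean_r = sum(rate) / n
mean_i = sum(inflation) / n
cov = sum((r - mean_r) * (i - mean_i) for r, i in zip(rate, inflation)) / n
var = sum((r - mean_r) ** 2 for r in rate) / n
ols_slope = cov / var

# Probability limit of the OLS slope with unit-variance shocks.
plim = (beta1 + gamma1) / (gamma1 ** 2 + 1)

print(f"true beta1: {beta1}")
print(f"OLS slope of inflation on rate: {ols_slope:.3f}")
print(f"large-sample value of that slope: {plim:.3f}")
```

Under these assumed values the OLS slope is positive even though the structural effect in (Eq1) is negative, because the regression mixes the two equations. Collecting more observations of the same kind does not remove this bias.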

Further reading:

H. R. Varian (2014) “Big data: New tricks for econometrics.” The Journal of Economic Perspectives, 28(2):3-27.

S. Mullainathan and J. Spiess (2017) “Machine learning: an applied econometric approach.” Journal of Economic Perspectives, 31(2):87-106.

A. Belloni, V. Chernozhukov, and C. Hansen (2014) “High-dimensional methods and inference on structural and treatment effects.” The Journal of Economic Perspectives, 28(2):29-50.

S. Athey and G. Imbens (2017) “The State of Applied Econometrics: Causality and Policy Evaluation.” Journal of Economic Perspectives, 31(2):3-32.

A. Goldberg (2015) “In defence of forensic social science.” Big Data and Society.

D. A. McFarland and K. Lewis (2015) “Sociology in the era of big data: the ascent of forensic social science.” American Sociology.
