Questions tagged [endogeneity]

Endogeneity refers to a situation where an explanatory variable in a model is correlated with the error term. Endogeneity induces biased parameter estimates. This is an important problem when working with observational data and the goal is causal inference.
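A short derivation shows why correlation with the error term produces bias. For a simple regression $y = \beta_0 + \beta_1 x + \varepsilon$ (a one-regressor model assumed here for illustration), the OLS slope is the sample covariance of $x$ and $y$ divided by the sample variance of $x$. Substituting the model, and assuming $\operatorname{Var}(x)$ is finite and nonzero:

$$\hat\beta_1 = \frac{\widehat{\operatorname{Cov}}(x,y)}{\widehat{\operatorname{Var}}(x)} = \beta_1 + \frac{\widehat{\operatorname{Cov}}(x,\varepsilon)}{\widehat{\operatorname{Var}}(x)} \xrightarrow{p} \beta_1 + \frac{\operatorname{Cov}(x,\varepsilon)}{\operatorname{Var}(x)}$$

So the slope does not converge to $\beta_1$ unless $\operatorname{Cov}(x,\varepsilon) = 0$, and the sign of the bias follows the sign of that covariance.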

If two variables, $X_1$ and $X_2$, are correlated with each other and both are entered into a model, the result will be some amount of collinearity. If one is left out of the model and it actually does have an effect on the response, the result will be endogeneity: by virtue of their correlation, the effect of the omitted variable will be attributed to the variable that remains in the model, causing bias. Endogeneity can also arise from other sources, such as measurement error in $X$.
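A small simulation can make both mechanisms concrete. The data-generating process below is a hypothetical one chosen for illustration (standard normal $X_1$, $X_2 = 0.5 X_1 +$ noise, $b_1 = 1$, $b_2 = 2$), and `ols_slope` and `simulate` are illustrative helpers, not functions from any library. Leaving $X_2$ out should push the slope on $X_1$ toward $b_1 + b_2 \operatorname{Cov}(X_1, X_2)/\operatorname{Var}(X_1) = 1 + 2 \times 0.5 = 2$, while classical measurement error with the same variance as $X_1$ should shrink the slope toward $b_1 \operatorname{Var}(X_1)/(\operatorname{Var}(X_1) + \sigma_u^2) = 0.5$.

```python
import random


def ols_slope(x, y):
    """Slope of y on x in a regression with an intercept: sample Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=100_000, b1=1.0, b2=2.0, rho=0.5, me_sd=1.0, seed=0):
    """Hypothetical data-generating process (an assumption for illustration):
    x1 ~ N(0, 1); x2 = rho * x1 + N(0, 1), so Cov(x1, x2) = rho
    y  = b1 * x1 + b2 * x2 + N(0, 1)   (x2 also affects the response)
    z  = b1 * x1 + N(0, 1)             (response with no omitted variable)
    x1_obs = x1 + N(0, me_sd**2)       (x1 observed with classical error)
    """
    rng = random.Random(seed)
    x1, x1_obs, y, z = [], [], [], []
    for _ in range(n):
        x1_i = rng.gauss(0, 1)
        x2_i = rho * x1_i + rng.gauss(0, 1)
        x1.append(x1_i)
        y.append(b1 * x1_i + b2 * x2_i + rng.gauss(0, 1))
        z.append(b1 * x1_i + rng.gauss(0, 1))
        x1_obs.append(x1_i + rng.gauss(0, me_sd))
    return {
        # Omitted variable: x2 is left out, so its effect loads onto x1.
        # Probability limit: b1 + b2 * rho / Var(x1) = 1 + 2 * 0.5 = 2.0
        "omitted_x2": ols_slope(x1, y),
        # Measurement error: slope shrinks by Var(x1) / (Var(x1) + me_sd**2) = 0.5
        "mismeasured_x1": ols_slope(x1_obs, z),
        # Benchmark: correctly measured x1, nothing omitted, so about b1 = 1.0
        "clean": ols_slope(x1, z),
    }


if __name__ == "__main__":
    for name, slope in simulate().items():
        print(f"{name}: {slope:.3f}")
```

With these settings the three slopes should land near 2.0, 0.5 and 1.0 respectively, up to simulation noise that depends on the seed.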

When working with observational data, excluding the possibility of endogeneity is rarely credible. It is therefore a substantial barrier to drawing causal conclusions, although it is not a problem for making predictions or for assessing marginal relationships.

A number of techniques (other than experiments) have been developed to help address this problem. They include regression discontinuity designs, difference in differences, quasi-experimental designs, natural experiments, and others.
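As one concrete example of these designs, the basic 2x2 difference-in-differences estimate can be computed from four group means: the before/after change in the treated group minus the same change in a control group. This is a minimal sketch assuming the parallel-trends condition holds; `did_estimate` is a hypothetical helper and the numbers in the usage line are made up.

```python
from statistics import fmean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences estimate from raw outcomes.

    Returns (treated change) - (control change). Under parallel trends, the
    control group's change stands in for the change the treated group would
    have shown without treatment, so subtracting it removes the common trend.
    """
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    return treated_change - control_change


if __name__ == "__main__":
    # Made-up outcomes: the treated group rises by 5, the control group by 2.
    effect = did_estimate([10, 12], [15, 17], [9, 11], [11, 13])
    print(effect)  # 3.0
```

A naive before/after comparison of the treated group alone would report 5, attributing the common trend of 2 to the treatment. The equivalent regression form includes group, period and group-by-period interaction terms, and the interaction coefficient equals this difference of means.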

274 questions
33 votes, 3 answers

When to use fixed effects vs using cluster SEs?

Suppose you have a single cross-section of data where individuals are located within groups (e.g. students within schools) and you wish to estimate a model of the form $Y_i = a + B X_i$, where $X$ is a vector of individual-level characteristics and $a$ a…
20 votes, 3 answers

Two-stage models: Difference between Heckman models (to deal with sample selection) and instrumental variables (to deal with endogeneity)

I am trying to get my head around the difference between sample selection and endogeneity and in turn how Heckman models (to deal with sample selection) differ from instrumental variable regressions (to deal with endogeneity). Is it correct to say…
kyrenia
16 votes, 3 answers

Estimating $b_1 x_1+b_2 x_2$ instead of $b_1 x_1+b_2 x_2+b_3x_3$

I have a theoretical economic model, which is as follows: $$ y = a + b_1x_1 + b_2x_2 + b_3x_3 + u $$ So theory says that $y$ is determined by the factors $x_1$, $x_2$ and $x_3$. Now I have the real data and I need to estimate $b_1$, $b_2$, $b_3$.…
renathy
12 votes, 2 answers

Does direction of causality between instrument and variable matter?

The standard scheme of instrumental variables in terms of causality (->) is Z -> X -> Y, where Z is an instrument, X an endogenous variable, and Y a response. Is it possible that the following relations, Z <- X -> Y and Z <-> X -> Y, are also valid? While…
cure
11 votes, 0 answers

Instrumental variables with interactions between endogenous variables

I have two endogenous variables $x_1$ and $x_2$ and am trying to estimate the following model: $$y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_{12} x_{12}$$ where $x_{12} = x_1\times x_2$. I'm particularly interested in the interaction term…
11 votes, 1 answer

Consistency of 2SLS with Binary endogenous variable

I have read that the 2SLS estimator is still consistent even with a binary endogenous variable (http://www.stata.com/statalist/archive/2004-07/msg00699.html). In the first stage, a probit treatment model will be run instead of a linear model. Is there…
Vincent
10 votes, 2 answers

Does serial correlation have something to do with endogeneity?

I'm a beginner in econometrics, and I've gathered that endogeneity is caused by omitted variable bias, measurement error, and reverse causality, and that it makes the OLS estimator biased. I've also learned that serial correlation, which refers to…
Kevin Kang
8 votes, 4 answers

Can zero covariance and zero expectation imply zero conditional expectation?

Let $x$ and $\epsilon$ be two random variables. If $$\operatorname{Cov}(x, \epsilon)=0$$ and $$E[\epsilon]=0,$$ does that imply $E[\epsilon|x]=0$?
7 votes, 0 answers

Endogeneity in spatially lagged regression model

The standard convention in spatial statistics is that the spatial lag term in a regression model will be biased due to simultaneity. Looking at the following model, it would be difficult to argue with this: $y_{i} = \rho W y_{i} + X_{i} \beta +…
user29145
7 votes, 2 answers

Omitted Variables and their consequences for other variables

Suppose we have the following regression model: $$ y_{i}=\boldsymbol{x_{i}'\beta}+e_{i} $$ where the vector $\boldsymbol{x_{i}'}$ contains two variables, $[x_{1i}\,x_{2i}]$. Suppose for a second that $e_{i}$ contains variables that are correlated…
ChinG
7 votes, 1 answer

Random vs Fixed variables in Linear Regression Model

Reading "Econometrics" by Fumio Hayashi (Princeton University Press, ISBN 0-691-01018-5), on page 13 under the subtitle "Fixed Regressors", it is stated: "We have presented the classical linear regression model, treating the regressors as random.…
LocoGris
7 votes, 1 answer

How can endogeneity arise in OLS estimation?

The definition of endogeneity is: $$\mathbb{E}(\varepsilon \vert X)\neq0,$$ which I know is not quite the same as: $$\text{Cov}(\varepsilon, X)\neq0.$$ My question is, how can this occur in OLS, where by construction $\varepsilon \perp X$? Is it…
mss
7 votes, 1 answer

Bad Controls and Omitted Variables

The traditional manner (in economics at least) to explain omitted variable bias involves the consideration of a Mincer-type regression: $$w_{it}=\alpha+x_{it}'\beta+\gamma E_{i}+\alpha_{i}+\epsilon_{it}$$ where the LHS denotes wage of individual…
ChinG
7 votes, 0 answers

Identification of peer/neighborhood effects in a multilevel framework

My question concerns estimation of “peer effects” or “neighborhood effects” in a multilevel framework. The idea of such an effect is that the behavior of a household (on level 1) is influenced by the behavior of others in the same…
KML
6 votes, 1 answer

Can I ignore the negative R-squared value when I am using instrumental variable regression?

I am running an instrumental variable regression using the 'ivreg' command in R. I find that all my validity tests related to endogeneity are satisfied, except that the R-squared value is negative. May I know whether I can ignore this…
Eric