Endogeneity in spatially lagged regression model

Question

The standard convention in Spatial Statistic is that the spatial lag term in a regression model will be biased due to simultaneity. Looking at the following model, it would be difficult to argue with this: $y_{i} = \rho W y_{i} + X_{i} \beta + \varepsilon_{i}$, as we would expect a feedback effect. In other words, the spatial lag effects $y_{i}$, but $y_{i}$ also influences its own spatial lag.

However, I am trying to simulate a spatial regression model and I am not finding this result. I would appreciate it if anybody was able to shed light on why this is the case. This is even the case with a very low number of observations and high spatial correlation.

My simulation procedure is as follows. There are 100 observations evenly spaced on a $10 \times 10$ grid. The $W$ weights are given by the rook contiguity, and row standardized. I estimate 4 models. The first two are OLS, with and without a spatial lag term. Unsurprisingly, the coefficients are biased when the spatial lag is excluded. The other models estimate the spatial lag via maximum likelihood and 2SLS. I do not find any substantial difference between the OLS with spatial lag, maximum likelihood, and 2SLS estimators.

The following variables are generated:

$x_1 = 1$, i.e. the intercept

$u_{1i}\sim \mathcal{N}(0,1)$

$u_{2i}\sim \mathcal{N}(0,1)$

$\rho_x = 0.5$, so the independent variables are spatially correlated

$x_{2i} = (1-\rho_{x} W)^{-1} u_{1i}$

$x_{3i} = (1-\rho_{x}W)^{-1} u_{2i}$

$\rho = 0.75$, strong spatial correlation

$b_1 = 1$

$b_2 = 8$

$b_3 = 2$

$\varepsilon_{i}\sim \mathcal{N}(0,1)$

$M=(1- \rho W)^{-1}$

$y_{i} = M b_{1}x_{1i} + M b_{2}x_{2i} + M b_{3}x_{3i} + M \varepsilon_{i}$, the reduced form model.

I also attach the relevant R code.

# some spatial models
library(spdep)

rm(list=ls())
n = 100
data = data.frame(n1=1:n)
# coords
data$lat = rep(1:sqrt(n), sqrt(n))
    data$long = sort(rep(1:sqrt(n), sqrt(n)))
# create W matrix
wt1 = as.matrix(dist(cbind(data$long, data$lat), method = "euclidean", upper=TRUE))
wt1 = ifelse(wt1==1, 1, 0)
diag(wt1) = 0
# row standardize
rs = rowSums(wt1)
wt1 = apply(wt1, 2, function(x) x/rs)
lw1 = mat2listw(wt1, style="W")

rx = 0.5
rho = 0.75
b1 = 1
b2 = 8
b3 = 2

inv1 = invIrW(lw1, rho=rx, method="solve", feasible=NULL)
inv2 = invIrW(lw1, rho=rho, method="solve", feasible=NULL)

sims = 100
beta1results = matrix(NA, ncol=4, nrow=sims)
beta2results = matrix(NA, ncol=4, nrow=sims)
beta3results = matrix(NA, ncol=4, nrow=sims)

rhoresults = matrix(NA, ncol=3, nrow=sims)

for(i in 1:sims){
  u1 = rnorm(n)
  x2 = inv1 %*% u1
  u2 = rnorm(n)
  x3 = inv1 %*% u2 
  e1 = rnorm(n)
  y1 = b1 + b2*x2 + b3*x3 + e1
  y1 = inv2 %*% y1 
  yl = wt1 %*% y1
  data1 = data.frame(y1, x2, x3)

  m1 = coef(lm(y1 ~ x2 + x3))
  m2 = coef(lm(y1 ~ yl + x2 + x3))
  m3 = coef(lagsarlm(y1 ~ x2 + x3, data1, lw1))
  m4 = coef(stsls(y1 ~ x2 + x3, data1, lw1))

  beta1results[i,] = c(m1[1], m2[1], m3[2], m4[2])
  beta2results[i,] = c(m1[2], m2[3], m3[3], m4[3])
  beta3results[i,] = c(m1[3], m2[4], m3[4], m4[4])
  rhoresults[i,] = c(m2[2], m3[1], m4[1])
}

# assess performance
apply(rhoresults, 2, mean) ; apply(rhoresults, 2, sd)
apply(beta1results, 2, mean) ; apply(beta1results, 2, sd)
apply(beta2results, 2, mean) ; apply(beta2results, 2, sd)
apply(beta3results, 2, mean) ; apply(beta3results, 2, sd)

See (your own post?) [here](http://diffuseprior.wordpress.com/2013/08/18/endogenous-spatial-lags-for-the-linear-regression-model). — tchakravarty, Dec 21 '13 at 22:29

Endogeneity in spatially lagged regression model

0 Answers0