The standard convention in Spatial Statistic is that the spatial lag term in a regression model will be biased due to simultaneity. Looking at the following model, it would be difficult to argue with this: $y_{i} = \rho W y_{i} + X_{i} \beta + \varepsilon_{i}$, as we would expect a feedback effect. In other words, the spatial lag effects $y_{i}$, but $y_{i}$ also influences its own spatial lag.
However, I am trying to simulate a spatial regression model and I am not finding this result. I would appreciate it if anybody was able to shed light on why this is the case. This is even the case with a very low number of observations and high spatial correlation.
My simulation procedure is as follows. There are 100 observations evenly spaced on a $10 \times 10$ grid. The $W$ weights are given by the rook contiguity, and row standardized. I estimate 4 models. The first two are OLS, with and without a spatial lag term. Unsurprisingly, the coefficients are biased when the spatial lag is excluded. The other models estimate the spatial lag via maximum likelihood and 2SLS. I do not find any substantial difference between the OLS with spatial lag, maximum likelihood, and 2SLS estimators.
The following variables are generated:
$x_1 = 1$, i.e. the intercept
$u_{1i}\sim \mathcal{N}(0,1)$
$u_{2i}\sim \mathcal{N}(0,1)$
$\rho_x = 0.5$, so the independent variables are spatially correlated
$x_{2i} = (1-\rho_{x} W)^{-1} u_{1i}$
$x_{3i} = (1-\rho_{x}W)^{-1} u_{2i}$
$\rho = 0.75$, strong spatial correlation
$b_1 = 1$
$b_2 = 8$
$b_3 = 2$
$\varepsilon_{i}\sim \mathcal{N}(0,1)$
$M=(1- \rho W)^{-1}$
$y_{i} = M b_{1}x_{1i} + M b_{2}x_{2i} + M b_{3}x_{3i} + M \varepsilon_{i}$, the reduced form model.
I also attach the relevant R code.
# some spatial models
library(spdep)
rm(list=ls())
n = 100
data = data.frame(n1=1:n)
# coords
data$lat = rep(1:sqrt(n), sqrt(n))
data$long = sort(rep(1:sqrt(n), sqrt(n)))
# create W matrix
wt1 = as.matrix(dist(cbind(data$long, data$lat), method = "euclidean", upper=TRUE))
wt1 = ifelse(wt1==1, 1, 0)
diag(wt1) = 0
# row standardize
rs = rowSums(wt1)
wt1 = apply(wt1, 2, function(x) x/rs)
lw1 = mat2listw(wt1, style="W")
rx = 0.5
rho = 0.75
b1 = 1
b2 = 8
b3 = 2
inv1 = invIrW(lw1, rho=rx, method="solve", feasible=NULL)
inv2 = invIrW(lw1, rho=rho, method="solve", feasible=NULL)
sims = 100
beta1results = matrix(NA, ncol=4, nrow=sims)
beta2results = matrix(NA, ncol=4, nrow=sims)
beta3results = matrix(NA, ncol=4, nrow=sims)
rhoresults = matrix(NA, ncol=3, nrow=sims)
for(i in 1:sims){
u1 = rnorm(n)
x2 = inv1 %*% u1
u2 = rnorm(n)
x3 = inv1 %*% u2
e1 = rnorm(n)
y1 = b1 + b2*x2 + b3*x3 + e1
y1 = inv2 %*% y1
yl = wt1 %*% y1
data1 = data.frame(y1, x2, x3)
m1 = coef(lm(y1 ~ x2 + x3))
m2 = coef(lm(y1 ~ yl + x2 + x3))
m3 = coef(lagsarlm(y1 ~ x2 + x3, data1, lw1))
m4 = coef(stsls(y1 ~ x2 + x3, data1, lw1))
beta1results[i,] = c(m1[1], m2[1], m3[2], m4[2])
beta2results[i,] = c(m1[2], m2[3], m3[3], m4[3])
beta3results[i,] = c(m1[3], m2[4], m3[4], m4[4])
rhoresults[i,] = c(m2[2], m3[1], m4[1])
}
# assess performance
apply(rhoresults, 2, mean) ; apply(rhoresults, 2, sd)
apply(beta1results, 2, mean) ; apply(beta1results, 2, sd)
apply(beta2results, 2, mean) ; apply(beta2results, 2, sd)
apply(beta3results, 2, mean) ; apply(beta3results, 2, sd)