
I have fitted two regression models. The first is a multiple linear regression (OLS) $$Y=\beta_0+\beta_1X_1+\cdots+\beta_nX_n+e$$ and I can get its $R^2$. The second is a spatial autoregressive model (SAR) $$Y=\rho WY + \beta_0 + \beta_1 X_1 + \cdots+\beta_nX_n+e,$$ where $W$ is the contiguity matrix and $\rho$ is an unknown parameter. This model is estimated by maximum likelihood, but I cannot calculate its $R^2$ and instead have to use the Nagelkerke pseudo-$R^2$. I've found this: "There is no direct equivalent to the OLS R-squared, these models are fitted by maximum likelihood." from http://r-sig-geo.2731867.n2.nabble.com/How-to-calculate-squared-R-of-spatial-autoregressive-models-td5762576.html, but I'd like to know why I cannot calculate $R^2$ for this model if the formula is just $$R^2=1-\frac{\sum(y_i-\hat{y}_i)^2}{\sum(y_i-\overline{y})^2}.$$
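For reference, the Nagelkerke pseudo-$R^2$ I am referring to is computed from log-likelihoods rather than from residuals. A minimal sketch of one common form in Python; the function below is only my own illustration, and the spatial-regression software may use a slightly different variant:

```python
import numpy as np

def nagelkerke_r2(loglik_model, loglik_null, n):
    """Nagelkerke pseudo-R^2 from the log-likelihood of the fitted model
    and of the intercept-only (null) model, for a sample of size n."""
    # Cox-Snell pseudo-R^2: 1 - (L_0 / L_M)^(2/n), written with log-likelihoods.
    cox_snell = 1.0 - np.exp((2.0 / n) * (loglik_null - loglik_model))
    # Nagelkerke rescales Cox-Snell by its maximum attainable value.
    max_cox_snell = 1.0 - np.exp((2.0 / n) * loglik_null)
    return cox_snell / max_cox_snell
```

So the pseudo-$R^2$ only looks at the two likelihoods, never at $\sum(y_i-\hat y_i)^2$, which already suggests it is not the same quantity as the OLS $R^2$.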

  • "There is no direct equivalent" does not mean "you cannot calculate it." The former needs to be interpreted as a warning about the applicability and interpretation of $R^2$. – whuber Jun 28 '19 at 18:13
  • Alright, but why? I can calculate it for both models, but is the interpretation different, or is there really no equivalent? Do you know a paper or book where I could read more about this? Thanks for your answer. – Alexis Galois Jun 28 '19 at 18:42
  • If you have a likelihood, then you can compute the deviance. For a Gaussian likelihood, $1-\text{deviance}/\text{null deviance}$ is exactly the OLS $R^2$. Compare both formulas to understand if and why they differ. – Firebug Aug 15 '21 at 12:39
  • Fit the model and find its predictions. What is $ \sum_i\Big[ (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) \Big] $? If that sum is not zero (or some tiny number close enough to zero for computer arithmetic), then $R^2$ [loses its usual "proportion of variance explained" interpretation](https://stats.stackexchange.com/questions/551915/interpreting-nonlinear-regression-r2). – Dave Dec 16 '21 at 21:19
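To make the last comment concrete, here is a minimal numeric sketch in plain NumPy. The perturbed coefficient vector just stands in for any fit that does not satisfy the OLS normal equations; it is not an actual SAR fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a small regression problem with an intercept column.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

def r2_report(y, y_hat):
    """Return (1 - SSE/SST, SSR/SST, cross term) for given fitted values."""
    resid = y - y_hat
    sse = np.sum(resid ** 2)
    ssr = np.sum((y_hat - y.mean()) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    cross = np.sum(resid * (y_hat - y.mean()))
    return 1 - sse / sst, ssr / sst, cross

# OLS fit: the normal equations hold, the cross term is (numerically) zero,
# and the two versions of R^2 agree.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(r2_report(y, X @ beta_ols))

# A fit that does NOT satisfy the OLS normal equations (a perturbed
# coefficient vector standing in for, say, an ML fit of a different model):
# the cross term is nonzero and the two formulas no longer agree.
beta_other = beta_ols + np.array([0.3, -0.2, 0.1, 0.0])
print(r2_report(y, X @ beta_other))
```

For the OLS fit the cross term is numerically zero and both $R^2$ formulas coincide; for the perturbed fit they do not, so "proportion of variance explained" no longer applies.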

1 Answer


First, the interpretation of $R^2$ rests on the decomposition formula $$S_T=S_R+S_e,$$ where $S_T=\sum\limits_{i=1}^n(y_i-\bar y)^2$, $S_R=\sum\limits_{i=1}^n(\hat y_i-\bar y)^2$ and $S_e=\sum\limits_{i=1}^n(y_i-\hat y_i)^2$. Writing the multiple linear regression in matrix form as $$Y_{n\times 1}=X_{n\times (p+1)}\beta_{(p+1)\times 1}+\varepsilon_{n\times 1},$$ the proof of this decomposition uses the normal equations satisfied by the OLS estimator $\hat\beta_{OLS}$, $$X^T(Y-X\hat\beta_{OLS})=0_{(p+1)\times 1},$$ as spelled out below.

Second, a maximum likelihood estimator $\hat\beta_{MLE}$ of $\beta$ does not satisfy these normal equations in general; for the linear model it coincides with $\hat\beta_{OLS}$ only when $\varepsilon_{n\times 1}\sim N(0_{n\times 1},\sigma^2 I_n)$. In particular, the fitted values of a SAR model estimated by maximum likelihood do not satisfy them, so $1-S_e/S_T$ can still be computed, but the decomposition, and with it the usual interpretation of $R^2$, breaks down.
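To spell out the step that uses the normal equations: with residual vector $\hat e=Y-X\hat\beta_{OLS}$ and fitted values $\hat Y=X\hat\beta_{OLS}$, the cross term in the expansion of $S_T$ is $$\sum_{i=1}^n(y_i-\hat y_i)(\hat y_i-\bar y)=\hat e^T(\hat Y-\bar y\,\mathbf 1_n)=(X^T\hat e)^T\hat\beta_{OLS}-\bar y\,\mathbf 1_n^T\hat e=0,$$ because $X^T\hat e=0_{(p+1)\times 1}$ and the first column of $X$ is the intercept column $\mathbf 1_n$, so $\mathbf 1_n^T\hat e=0$. Hence $$S_T=\sum_{i=1}^n\big[(y_i-\hat y_i)+(\hat y_i-\bar y)\big]^2=S_e+S_R+2\sum_{i=1}^n(y_i-\hat y_i)(\hat y_i-\bar y)=S_R+S_e.$$ When the fitted values come from a maximum likelihood fit of the SAR model, this cross term need not vanish, which is exactly the check suggested in the comments above.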

alma2004