I used R to fit a linear model by weighted least squares (the errors are heteroskedastic): $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $E(\boldsymbol{\varepsilon}) = 0$, $Cov(\boldsymbol{\varepsilon}) = \sigma^{2}\boldsymbol{V}$, and $\boldsymbol{V}$ is a known diagonal matrix.
set.seed(1)
x = c(-1, 0, 1, 2, 3)
y_true = 2 * x^2 + 3 * x + 1
vars = c(0.5, 1, 1.5, 2, 2.5) # known error variances (diagonal of V)
e = rnorm(length(vars), mean = 0, sd = sqrt(vars)) # heteroskedastic noise
y = y_true + e
weights = 1 / vars # WLS weights: inverse variances
m = lm(y ~ x + I(x^2), weights = weights)
After fitting, R provides the r.squared value:
s = summary(m)
# The R^2 value that lm provides
r2 = s$r.squared # 0.993019
How exactly is this value computed?
I know that for linear models using ordinary (unweighted) least squares, this value is computed as $R^{2} = 1 - \frac{(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}})^T(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}})}{\boldsymbol{y}^{T}\boldsymbol{y} - n (\bar{\boldsymbol{y}})^{2}}$, where $\hat{\boldsymbol{\beta}} = (\boldsymbol{X}^{T}\boldsymbol{X})^{-1}\boldsymbol{X}^{T} \boldsymbol{y}$ is the OLS estimator and $\bar{\boldsymbol{y}}$ is the sample mean.
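As a sanity check, this formula does reproduce summary()$r.squared for an unweighted fit of the same data (a minimal sketch; m_ols and the other names are just illustrative):

# Sanity check: the OLS formula matches R's r.squared for an unweighted fit
m_ols = lm(y ~ x + I(x^2))
X_ols = model.matrix(m_ols)
beta_hat = solve(t(X_ols) %*% X_ols, t(X_ols) %*% y)
res = y - X_ols %*% beta_hat
r2_ols = drop(1 - (t(res) %*% res) / (t(y) %*% y - length(y) * mean(y)^2))
all.equal(r2_ols, summary(m_ols)$r.squared) # TRUE (up to numerical tolerance)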
The heteroskedastic model can be transformed to the OLS case by multiplying both sides by $\boldsymbol{V}^{-1/2}$, yielding
$\tilde{\boldsymbol{y}} = \boldsymbol{V}^{-1/2}\boldsymbol{y} = \boldsymbol{V}^{-1/2}\boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{V}^{-1/2} \boldsymbol{\varepsilon} = \tilde{\boldsymbol{X}}\boldsymbol{\beta} + \tilde{\boldsymbol{\varepsilon}}$.
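This equivalence is easy to check numerically (a short sketch; V_inv_sqrt, y_t, X_t, and fit_t are illustrative names):

# Check: OLS on the transformed data gives the same coefficients as WLS
V_inv_sqrt = diag(1 / sqrt(vars)) # V^(-1/2), elementwise since V is diagonal
y_t = drop(V_inv_sqrt %*% y)
X_t = V_inv_sqrt %*% model.matrix(m)
fit_t = lm(y_t ~ 0 + X_t) # no extra intercept: X_t already contains one
all.equal(unname(coef(fit_t)), unname(coef(m))) # TRUE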
So, as also described by Willett and Singer (see https://gseacademic.harvard.edu/~willetjo/pdf%20files/Willett_Singer_AS88.pdf), I see two obvious options for computing $R^{2}$.
- Computing it for the transformed model as $R^{2} = 1 - \frac{(\tilde{\boldsymbol{y}} - \tilde{\boldsymbol{X}}\hat{\boldsymbol{\beta}}_{*})^T(\tilde{\boldsymbol{y}} - \tilde{\boldsymbol{X}}\hat{\boldsymbol{\beta}}_{*})}{\tilde{\boldsymbol{y}}^{T}\tilde{\boldsymbol{y}} - n (\bar{\tilde{\boldsymbol{y}}})^{2}}$, where $\hat{\boldsymbol{\beta}}_{*} = (\tilde{\boldsymbol{X}}^{T}\tilde{\boldsymbol{X}})^{-1}\tilde{\boldsymbol{X}}^{T}\tilde{\boldsymbol{y}} = (\boldsymbol{X}^{T}\boldsymbol{V}^{-1}\boldsymbol{X})^{-1}\boldsymbol{X}^{T}\boldsymbol{V}^{-1} \boldsymbol{y}$ is the OLS estimator of the transformed model.
X = model.matrix(m)
V = diag(vars)
V_neg_half = sqrt(solve(V)) # V^(-1/2); elementwise sqrt is valid since V is diagonal
y_tilde = V_neg_half %*% y
X_tilde = V_neg_half %*% X
n = length(y)
# OLS on the transformed data
X_tilde_t_X = t(X_tilde) %*% X_tilde
beta_hat_star = solve(X_tilde_t_X) %*% t(X_tilde) %*% y_tilde
residuals = y_tilde - X_tilde %*% beta_hat_star
numerator = t(residuals) %*% residuals
denominator = t(y_tilde) %*% y_tilde - n * mean(y_tilde)^2
r2.1 = drop(1 - numerator / denominator) # 0.989646
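Incidentally, $\hat{\boldsymbol{\beta}}_{*}$ is exactly the estimator that lm uses, and the numerator above is just the weighted residual sum of squares of the original fit, since the transformed residuals are the raw residuals scaled by $\sqrt{w_i}$:

# beta_hat_star matches lm's coefficients, and the transformed RSS
# equals the weighted RSS of the original (untransformed) residuals
all.equal(as.vector(beta_hat_star), unname(coef(m))) # TRUE
all.equal(drop(numerator), sum(weights * residuals(m)^2)) # TRUE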
- Using the WLS estimator with the untransformed data as $R^{2} = 1 - \frac{(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}}_{*})^T(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}}_{*})}{\boldsymbol{y}^{T}\boldsymbol{y} - n (\bar{\boldsymbol{y}})^{2}}$
residuals = y - X %*% beta_hat_star # residuals on the original (untransformed) scale
numerator = t(residuals) %*% residuals
denominator = t(y) %*% y - n * mean(y)^2
r2.2 = drop(1 - numerator / denominator) # 0.9923849
However, neither of those values corresponds to the result that R provides, and the differences in my (artificial) example do not appear to be mere rounding issues.
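Comparing all three values at full precision (using the variables defined above) suggests the gaps are far larger than floating-point noise:

print(c(r_lm = r2, transformed = r2.1, untransformed = r2.2), digits = 10)
# r_lm ~ 0.993019, transformed ~ 0.989646, untransformed ~ 0.9923849:
# differences in the third decimal place, so lm seems to use a different formula.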