Sorry if this is a bit long..
I've been trying to fit models predicting the % of area infested in a field (response between 0 and 100%, total of 61 fields), with four explanatory variables, two factorials (planting
and monitor
) and two covariates (area2peri
and Tmin7_bef
).
I used the betareg()
with link = "cauchit"
(after comparing LL and AIC for the same predictors with different link functions), and tried the different combinations of parameters for both mean and dispersion sub-models. Would like some help understanding the possibilities of using different pseudo-$R^2$ for the different models. I am referring to the table on the UCLA website.
There are a few things that are not clear to me, and looking at betareg
documentation did not solve them..
- The
summary(betareg(...))
provides a pseudo-$R^2$ square that is the "squared correlation of linear predictor and link-transformed response". How is that calculated? and to which type of the pseudo-$R^2$ it relates (Efron's,McFadden's, Cox&Snell etc.)?
I tried calculating the different types myself, as shown in the betareg
documentation p.20, fitted a null (intercept only) and full models and extracted the LogLik for both. However there were several issues:
- The formula suggested for Mcfadden's $R^2$ in
betareg
is the inverse of the one presented in the UCLA website, the null model's LL is the numerator inbetareg
and the denominator in UCLA.. what am I missing?
Here are the results for the different $R^2$, as well as the full model summary:
Call:
betareg(formula = A2to5 ~ planting + area2peri_m + monitor + Tmin7_bef | planting + monitor, data = na.omit(pre_n0),
link = "cauchit")
Standardized weighted residuals 2:
Min 1Q Median 3Q Max
-2.1745 -0.6122 0.1443 0.9129 1.6225
Coefficients (mean model with cauchit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.766418 0.714323 -1.073 0.2833
plantinglate -1.013075 0.409167 -2.476 0.0133 *
area2peri_m -0.013529 0.006384 -2.119 0.0341 *
monitor1 0.712917 0.277825 2.566 0.0103 *
Tmin7_bef 0.111958 0.049065 2.282 0.0225 *
Phi coefficients (precision model with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.9635 0.2577 3.738 0.000185 ***
plantinglate 0.4337 0.3331 1.302 0.192929
monitor1 -0.2295 0.3145 -0.730 0.465674
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Type of estimator: ML (maximum likelihood)
Log-likelihood: 12.55 on 8 Df
Pseudo R-squared: 0.07703
Number of iterations: 31 (BFGS) + 1 (Fisher scoring)
Mfull <-as.vector(logLik(betareg_cauchit)) # 12.54867
Mintercept <- as.vector(logLik(betareg_cauchit_intonly)) #4.207168
n <- betareg_cauchit_intonly$n #61
## McFadden's pseudo-R-squared (explained portion of variance)- according to betreg documentation
1 -(Mintercept/Mfull) #0.6647
## McFadden's pseudo-R-squared - according to UCLA
1 -(Mfull/Mintercept) #-1.9827
## adjusted McFadden's pseudo-R-squared
1-(Mfull-8)/Mintercept #-0.08117
## Cox&Snell (improvment over null model)
1-((Mintercept/Mfull)^(2/n)) #0.0352
max_cox_senll <- 1-Mintercept^(2/n) #-0.0482
## Cragg & Uhler’s (improvment over null model)
(1-((Mintercept/Mfull)^(2/n)))/max_cox_senll #-0.73
Which brings me to:
In all of my calculation attempts, hopefully not erroneous, I did not achieve the summary's pseudo-$R^2$ (relating to the first question) but what is the meaning of a negative value? Is my fit that bad?
Finally, I get that there is no consensus of which type to use, or should one report the pseudo-$R^2$ at all, but how can I really judge if my models is able to explain something in the world?