I was a little surprised by the high McFadden R^2 reported by the "mlogit" R package for this simple model:
f = mFormula(mode ~ log(cost) | 1 | 1)
"mlogit" gives the following output:
Coefficients :
               Estimate Std. Error t-value  Pr(>|t|)
2:(intercept) -4.002328   0.103865 -38.534 < 2.2e-16 ***
3:(intercept) -2.028449   0.021986 -92.260 < 2.2e-16 ***
log(cost)     -1.781669   0.076317 -23.346 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -9138.7
McFadden R^2: 0.62162
Likelihood ratio test : chisq = 30026 (p.value = < 2.22e-16)
To compute the McFadden R^2 "by hand", I first have to estimate the null model, i.e. mFormula(mode ~ 1), for which I obtain:
Coefficients :
               Estimate Std.Error t-value  Pr(>|t|)
(Intercept):2 -2.558537  0.026906 -95.091 < 2.2e-16 ***
(Intercept):3 -2.715562  0.028951 -93.797 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -10212, df = 2
AIC: 20427
If I'm not mistaken, given the model's log-likelihood of -9138.7 and the null model's log-likelihood of -10212, the McFadden R^2 should be 1 - (-9138.7 / -10212) = 0.105, which is very different from the reported 0.62162.
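For reference, here is a minimal base-R sketch of that arithmetic, using only the numbers printed in the two outputs. It also backs out the null log-likelihood implied by the reported likelihood ratio statistic, assuming (this is my assumption, the output does not state it) that chisq = 2 * (LL_model - LL_null). The variable names are mine.

```r
# Values copied from the two outputs shown in this post
ll_model <- -9138.7   # log-likelihood of mode ~ log(cost) | 1 | 1
ll_null  <- -10212    # log-likelihood of my own intercept-only fit
lr_chisq <- 30026     # likelihood ratio statistic reported by mlogit

# McFadden R^2 computed against my own null model
r2_by_hand <- 1 - ll_model / ll_null

# Null log-likelihood implied by the reported LR test,
# ASSUMING chisq = 2 * (ll_model - ll_null_implied)
ll_null_implied <- ll_model - lr_chisq / 2

# McFadden R^2 computed against that implied null log-likelihood
r2_implied <- 1 - ll_model / ll_null_implied

print(c(by_hand = r2_by_hand, implied = r2_implied))
print(ll_null_implied)
```

Under that assumption, the implied null log-likelihood is about -24151.7 and the corresponding R^2 is about 0.6216, which agrees with the value "mlogit" reports. So the two R^2 figures appear to differ only in the null log-likelihood they are measured against.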
Note that my data are aggregated, so I use a weighted logit; the discussion in "How to calculate pseudo R2 when using logistic regression on aggregated data files?" might be relevant here.
Did I miss something, or is the R^2 reported by the "mlogit" package inaccurate for weighted models?