I am trying to replicate what the function dfbetas() does in R. Replicating dfbeta() is not an issue. Here is a set of vectors:
x <- c(0.512, 0.166, -0.142, -0.614, 12.72)
y <- c(0.545, -0.02, -0.137, -0.751, 1.344)
If I fit two regression models as follows:
fit1 <- lm(y ~ x)
fit2 <- lm(y[-5] ~ x[-5])
I see that eliminating the last point results in a very different slope (blue line - steeper):
This is reflected in the change in slopes:
coef(fit1)[2] - coef(fit2)[2]
-0.9754245
which coincides with the slope entry of dfbeta(fit1) for the fifth observation:
   (Intercept)            x
1  0.182291949 -0.011780253
2  0.020129324 -0.001482465
3 -0.006317008  0.000513419
4 -0.207849024  0.019182219
5 -0.032139356 -0.975424544
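The same check can be run for every observation, not only the fifth. This is a minimal sketch, assuming that dfbeta() reports the full-fit coefficients minus the leave-one-out coefficients; the helper name loo_diff is made up for illustration:

```r
x <- c(0.512, 0.166, -0.142, -0.614, 12.72)
y <- c(0.545, -0.02, -0.137, -0.751, 1.344)
fit1 <- lm(y ~ x)

# Hypothetical helper: refit without observation i and return the
# change in coefficients (full fit minus leave-one-out fit).
loo_diff <- function(i) {
  fit_i <- lm(y[-i] ~ x[-i])
  unname(coef(fit1) - coef(fit_i))
}

# One row per deleted observation, columns ordered as in dfbeta(fit1).
manual_dfbeta <- t(sapply(seq_along(x), loo_diff))

# Compare with the built-in result, ignoring row and column names.
all.equal(manual_dfbeta, unname(dfbeta(fit1)))
```

If the assumption holds, the comparison should come back TRUE, and manual_dfbeta[5, 2] should be the -0.9754245 computed by hand.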
Now I want to standardize this change in slope (to obtain dfbetas), so I turn to:
Williams, D. A. (1987) Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics 36, 181–191
which I believe is one of the references cited in the R documentation for the {stats} package. There, the formula for dfbetas is:
$$\mathrm{dfbetas}(i, \mathrm{fit}) = \frac{\hat{b} - \hat{b}_{-i}}{\mathrm{SE}\,\hat{b}_{-i}}$$
This is easy to calculate in R, taking the standard error of the slope from the model fitted without the fifth point:
(coef(fit1)[2] - coef(fit2)[2]) / coef(summary(fit2))[2, "Std. Error"]
yielding -6.79799.
The question is why this does not match the fifth value for the slope in:
dfbetas(fit1)
  (Intercept)            x
1  1.06199661  -0.39123009
2  0.06925319  -0.02907481
3 -0.02165967   0.01003539
4 -1.24491242   0.65495527
5 -0.54223793 -93.81415653
What is the right equation to go from dfbeta to dfbetas?
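One candidate worth checking is a different denominator. The sketch below assumes that stats::dfbetas() scales each entry as $\mathrm{dfbetas}_{ij} = \mathrm{dfbeta}_{ij} / \big(s_{-i}\sqrt{[(X^\top X)^{-1}]_{jj}}\big)$, where $s_{-i}$ is the residual standard deviation with observation $i$ deleted and $X$ is the design matrix of the full fit. The names sigma_del, xtx_inv and manual_dfbetas are illustrative only:

```r
x <- c(0.512, 0.166, -0.142, -0.614, 12.72)
y <- c(0.545, -0.02, -0.137, -0.751, 1.344)
fit1 <- lm(y ~ x)

# Leave-one-out residual standard deviations, one per observation.
sigma_del <- influence(fit1)$sigma

# Unscaled coefficient covariance (X'X)^{-1} from the FULL design matrix.
X <- model.matrix(fit1)
xtx_inv <- solve(crossprod(X))

# Assumed scaling: row i is divided by sigma_del[i], column j by the
# square root of the j-th diagonal element of (X'X)^{-1}.
manual_dfbetas <- dfbeta(fit1) / outer(sigma_del, sqrt(diag(xtx_inv)))

# Compare with the built-in result, then look at the fifth slope entry.
all.equal(manual_dfbetas, dfbetas(fit1), check.attributes = FALSE)
manual_dfbetas[5, "x"]
```

Under this assumption the only difference from the hand calculation is the design matrix: the standard error taken from fit2 uses $(X^\top X)^{-1}$ computed without the fifth point, whereas this scaling keeps all five points. Because the fifth x value is so extreme, the two denominators differ by roughly an order of magnitude, which would account for -6.8 versus -93.8.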