Simulate a Poisson model:
set.seed(1)
predictor <- rnorm(100000, 2.5, 0.5)
# describe(predictor)
lam <- 0.98 * predictor
# describe(lam)
# rpois() is already vectorized over lambda, so no Vectorize() wrapper is needed
response <- rpois(length(lam), lam)
# describe(response)
fit <- glm(response ~ 1, offset=log(predictor), family="poisson")
summary(fit)
Regression:
Call:
glm(formula = response ~ 1, family = "poisson", offset = log(predictor))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.8188  -0.8334  -0.1146   0.5777   4.3871

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.017707   0.002018  -8.773   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 111501  on 99999  degrees of freedom
Residual deviance: 111501  on 99999  degrees of freedom
AIC: 361334

Number of Fisher Scoring iterations: 5
> pchisq(111501, 99999)
[1] 1
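The model above is correctly specified, so it should fit essentially perfectly. I don't understand why pchisq
returns such an extreme value: a lower-tail probability of 1 means an upper-tail p-value of about 0, i.e. the test rejects the fit.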
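To make the check explicit, here is a minimal sketch of the deviance goodness-of-fit test written with the upper tail directly. It assumes a smaller sample (n = 10000, my choice, just to keep it quick) and uses deviance() and df.residual() instead of typing the numbers in by hand. The names n, dev, df, ratio and p_upper are only illustrative.

```r
# Re-simulate a smaller version of the model above (n is an assumption)
set.seed(1)
n <- 10000
predictor <- rnorm(n, 2.5, 0.5)
response <- rpois(n, 0.98 * predictor)

fit <- glm(response ~ 1, offset = log(predictor), family = "poisson")

dev <- deviance(fit)        # residual deviance
df  <- df.residual(fit)     # residual degrees of freedom (n - 1 here)

ratio   <- dev / df                              # the "should be ~ 1" ratio
p_upper <- pchisq(dev, df, lower.tail = FALSE)   # goodness-of-fit p-value

print(c(ratio = ratio, p_upper = p_upper))
```

pchisq() returns the lower tail by default, so a value of 1 there corresponds to a p-value near 0 here. With Poisson means this small the chi-square approximation to the deviance is probably not very accurate, which may be part of why the ratio sits above 1 even though the model is the true one.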
Reference link: When someone says residual deviance/df should ~ 1 for a Poisson model, how approximate is approximate?
Update
I found this http://pj.freefaculty.org/guides/stat/Regression-GLM/GLM2-SigTests/
It suggests using Pearson residuals instead of the deviance for the goodness-of-fit test, and that works much better.
ssr <- sum(residuals(fit, type="pearson")^2)
pchisq(ssr, 99999)
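For a side-by-side view, a rough sketch that computes both statistics on the same fit and divides each by the residual degrees of freedom. Again this assumes a smaller simulated sample (n = 10000 is my choice), and pearson_stat, deviance_stat and p_pearson are just illustrative names.

```r
# Re-simulate a smaller version of the model (n is an assumption)
set.seed(1)
n <- 10000
predictor <- rnorm(n, 2.5, 0.5)
response <- rpois(n, 0.98 * predictor)

fit <- glm(response ~ 1, offset = log(predictor), family = "poisson")
df  <- df.residual(fit)

# Pearson statistic: sum of (y - mu)^2 / mu for a Poisson fit
pearson_stat  <- sum(residuals(fit, type = "pearson")^2)
# Sum of squared deviance residuals, which equals deviance(fit)
deviance_stat <- sum(residuals(fit, type = "deviance")^2)

# Both ratios are compared against 1
print(c(pearson = pearson_stat / df, deviance = deviance_stat / df))

# Upper-tail p-value for the Pearson statistic
p_pearson <- pchisq(pearson_stat, df, lower.tail = FALSE)
print(p_pearson)
```

Under the true model each term (y - mu)^2 / mu has expectation 1, so the Pearson ratio should land close to 1, whereas the deviance ratio need not when the means are small.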