Effectively using coefficients from Poisson regression

Question

This is maybe annoyingly easy for some, but I am completely new to regression.

As an example, I shall use the mtcars data set that ships with R. I am interested in the columns cyl, drat, gear and carb, and will try to model cyl using Poisson regression with an interaction between gear and carb:

> mtcars2<-mtcars
> mtcars2$gear<-as.factor(mtcars2$gear)
> mtcars2$carb<-as.factor(mtcars2$carb)
> mtcars.glm<-glm(cyl~drat + gear + carb + gear:carb, family="poisson", data=mtcars2)
> summary(mtcars.glm)

Gives the following output:

Call:
glm(formula = cyl ~ drat + gear + carb + gear:carb, family = "poisson", 
    data = mtcars2)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.49482  -0.01587   0.00000   0.01705   0.26216  

Coefficients: (7 not defined because of singularities)
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  1.97529    0.99358   1.988   0.0468 *
drat        -0.09497    0.30377  -0.313   0.7546  
gear4       -0.20374    0.44511  -0.458   0.6471  
gear5        0.09532    0.49360   0.193   0.8469  
carb2        0.39226    0.30885   1.270   0.2041  
carb3        0.39570    0.32405   1.221   0.2220  
carb4        0.40960    0.29615   1.383   0.1666  
carb6        0.06493    0.63827   0.102   0.9190  
carb8        0.34502    0.61194   0.564   0.5729  
gear4:carb2 -0.38318    0.47243  -0.811   0.4173  
gear5:carb2 -0.68770    0.55361  -1.242   0.2142  
gear4:carb3       NA         NA      NA       NA  
gear5:carb3       NA         NA      NA       NA  
gear4:carb4 -0.01806    0.44170  -0.041   0.9674  
gear5:carb4       NA         NA      NA       NA  
gear4:carb6       NA         NA      NA       NA  
gear5:carb6       NA         NA      NA       NA  
gear4:carb8       NA         NA      NA       NA  
gear5:carb8       NA         NA      NA       NA  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 16.57428  on 31  degrees of freedom
Residual deviance:  0.42691  on 20  degrees of freedom
AIC: 141.09

Number of Fisher Scoring iterations: 4

Now, shockingly, there seems to be little significance among the coefficients in the output, but for the sake of my question, I do very much hope that we can all be mature about that and for now simply look away.

Let's say that we have another data set with the explanatory variables I used above (drat, gear, carb) and I now want to predict cyl. Let us say the data set looks like this:

drat2<-rnorm(10,4,0.2)
gear2<-c(4,4,4,4,4,5,5,5,5,5)
carb2<-c(2,2,2,2,2,2,2,2,2,2)
data.frame(drat2,gear2,carb2)

Now I would like to predict the outcome from these values, effectively using the coefficients from the summary of the regression above. Is there any effective way to do this? Also, how do you experts see from the following:

    Null deviance: 16.57428  on 31  degrees of freedom
Residual deviance:  0.42691  on 20  degrees of freedom
AIC: 141.09

that my model is bad? Does one compare the residuals with the degrees of freedom, or something?

score 4 · Accepted Answer · edited Apr 13 '17 at 12:44

You can use the predict function in R. It can produce three different types of prediction; see ?predict.glm and check the type argument to find out how they differ. Note that when you fitted your glm model, you converted gear and carb to factors with as.factor. The new data frame you pass to predict should therefore use the same column names (drat, gear, carb) and the same format, like this:

> drat2<-rnorm(10,4,0.2)
> gear2<-c(4,4,4,4,4,5,5,5,5,5)
> carb2<-c(2,2,2,2,2,2,2,2,2,2)
> newdat=data.frame(drat=drat2,gear=gear2,carb=carb2)
> newdat$gear<- as.factor(newdat$gear)
> newdat$carb<- as.factor(newdat$carb)
> 
> prd.1=predict(mtcars.glm,newdata=newdat)
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type ==  :
  prediction from a rank-deficient fit may be misleading
> prd.1
       1        2        3        4        5        6        7        8        9       10 
1.423121 1.415504 1.369435 1.385949 1.404930 1.420861 1.403413 1.389493 1.380887 1.434150 
>

The predictions above are by default on the scale of the linear predictor. There is also a warning. It arises because of the NAs that you see in the summary output or, put another way, because of the interaction term gear:carb you included when fitting. If we refit a model without the interaction, like this:

>  mtcars.glm.2<-glm(cyl~drat + gear + carb, family="poisson", data=mtcars2)

and predict again:

> prd.2=predict(mtcars.glm.2,newdata=newdat)
> prd.2
       1        2        3        4        5        6        7        8        9       10 
1.556609 1.544339 1.470129 1.496730 1.527306 1.657174 1.629067 1.606645 1.592782 1.678580 
>

With this model there is no warning.
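To get predictions on the scale of cyl itself rather than the log scale, and to see how they relate to the fitted coefficients, something like the following sketch may help. It refits the no-interaction model so that it runs on its own; the names fit, newdat, eta, mu and X are only illustrative, and set.seed(1) is an assumption added for reproducibility.

```r
# Refit the model without the interaction (same formula as mtcars.glm.2)
mtcars2 <- mtcars
mtcars2$gear <- as.factor(mtcars2$gear)
mtcars2$carb <- as.factor(mtcars2$carb)
fit <- glm(cyl ~ drat + gear + carb, family = "poisson", data = mtcars2)

# New data with the same column names and factor levels as the training data
set.seed(1)  # assumption: only here to make the random drat values repeatable
newdat <- data.frame(
  drat = rnorm(10, 4, 0.2),
  gear = factor(c(4, 4, 4, 4, 4, 5, 5, 5, 5, 5), levels = levels(mtcars2$gear)),
  carb = factor(rep(2, 10), levels = levels(mtcars2$carb))
)

# Linear predictor (log scale, the default) and expected count (response scale)
eta <- predict(fit, newdata = newdat, type = "link")
mu  <- predict(fit, newdata = newdat, type = "response")

# The same numbers "by hand": design matrix times coefficients, then exp()
X <- model.matrix(delete.response(terms(fit)), data = newdat, xlev = fit$xlevels)
eta_manual <- drop(X %*% coef(fit))
mu_manual  <- exp(eta_manual)

print(cbind(link = eta, response = mu, manual = mu_manual))
```

The manual route only works this directly when no coefficient is NA; for the rank-deficient interaction model, the NA entries in coef() would have to be dropped along with the matching columns of X.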
The residual deviance gives an overall assessment of the model, but you cannot tell whether your model is good or bad just by looking at the residual deviance. In particular, you need to check the assumptions of the glm you fitted (as for any other linear model) and perform model adequacy checks. For example, there does not appear to be any overdispersion problem in the model mtcars.glm. I used the AER package to test this as follows:

> library(AER)
> dispersiontest(mtcars.glm,trafo=1)

        Overdispersion test

data:  mtcars.glm
z = -68.9419, p-value = 1
alternative hypothesis: true alpha is greater than 0
sample estimates:
     alpha 
-0.9871009
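As a rough cross-check that does not need the AER package, one could also compare the Pearson statistic and the residual deviance with the residual degrees of freedom. This is only a sketch of a common rule of thumb, not a formal test; the names fit, phi and dev_ratio are illustrative.

```r
# Refit the interaction model from the question so the snippet runs on its own
mtcars2 <- mtcars
mtcars2$gear <- as.factor(mtcars2$gear)
mtcars2$carb <- as.factor(mtcars2$carb)
fit <- glm(cyl ~ drat + gear + carb + gear:carb,
           family = "poisson", data = mtcars2)

# Pearson-based dispersion estimate: sum of squared Pearson residuals / residual df
phi <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

# Residual deviance per residual degree of freedom
dev_ratio <- deviance(fit) / df.residual(fit)

print(c(pearson_dispersion = phi, deviance_per_df = dev_ratio))
```

For a Poisson model both ratios are expected to be roughly 1. Values well above 1 point to overdispersion, and values far below 1 point to underdispersion, which is also what the negative alpha estimate in the output above suggests.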

See this question as well. But as I said, you need to check the other assumptions too. Here is some more model validation code:

> #model validation plot with the fitted data
> r<-resid(mtcars.glm)
> f<-fitted.values(mtcars.glm)
> par(mfrow=c(1,3))
> plot(r~f,ylab="Residuals",xlab="Fitted values",pch=19)
> hist(r,main="",xlab="Residuals")
> qqnorm(r); qqline(r, lwd=2)

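[Figure: three panels showing residuals against fitted values, a histogram of the residuals, and a normal Q-Q plot of the residuals]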

Thank you very much! Btw, why did we get an error using the interaction terms? And will this predict function automatically use the correct coefficients, including the coefficients for all the interaction terms when I include the gear:carb? — Erosennin, Apr 02 '14 at 21:06
Also, you write "Above predictions are by default on the scale of the linear predictors.". What does that mean? — Erosennin, Apr 03 '14 at 06:31
And amazing plots, thank you for that as well! But unfortunately I don't understand them properly... — Erosennin, Apr 03 '14 at 09:18
You get the warning because of "singularities". See in the output where it says "Coefficients: (7 not defined because of singularities)". The coefficients shown as NA have not been estimated, so R has nothing to work with for those terms when predicting. Whenever you fit a glm, you have a link function. For the Poisson family, the link is the log; in other words, the log of the mean of your response is a linear function of the independent variables. The prediction is by default on the log scale, i.e. $\beta_0+\beta_1X_1+\dots$. If you specify type = "response", it gives you $\exp(\beta_0+\beta_1X_1+\dots)$. — Stat, Apr 03 '14 at 13:48
That is, it gives you predictions on the scale of the response variable. The left-hand plot shows the residuals against the fitted values; a good model should not show any pattern there. The middle and right-hand plots check the normality assumption for the residuals. From the Q-Q plot on the right, your residuals are not normal, so the model is not good. Hope that helps. — Stat, Apr 03 '14 at 13:50
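The singularities discussed in these comments can be traced back to gear and carb combinations that never occur in mtcars, so their interaction coefficients cannot be estimated. A minimal sketch to see this, with illustrative names tab, n_empty and n_na:

```r
# Cross-tabulate the two factors: empty cells are combinations with no cars
tab <- with(mtcars, table(gear, carb))
print(tab)

# Count the combinations that do not occur in the data
n_empty <- sum(tab == 0)

# Refit the interaction model and count coefficients reported as NA
mtcars2 <- mtcars
mtcars2$gear <- as.factor(mtcars2$gear)
mtcars2$carb <- as.factor(mtcars2$carb)
fit <- glm(cyl ~ drat + gear + carb + gear:carb,
           family = "poisson", data = mtcars2)
n_na <- sum(is.na(coef(fit)))

print(c(empty_cells = n_empty, na_coefficients = n_na))
```

With the interaction included, the factor part of the model can only distinguish the gear and carb cells that actually contain data, which is why the number of NA coefficients matches the number of empty cells here.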
