This is maybe annoyingly easy for some, but I am completely new to regression.
As an example, I shall use the data set in R, called mtcars
. I am interested in the columns cyl
,drat
,gear
and carb
, and will try to model cyl
using Poisson regression with interaction between gear
and carb
> mtcars2<-mtcars
> mtcars2$gear<-as.factor(mtcars2$gear)
> mtcars2$carb<-as.factor(mtcars2$carb)
> mtcars.glm<-glm(cyl~drat + gear + carb + gear:carb, family="poisson", data=mtcars2)
> summary(mtcars.glm)
Gives the following output:
Call:
glm(formula = cyl ~ drat + gear + carb + gear:carb, family = "poisson",
data = mtcars2)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.49482 -0.01587 0.00000 0.01705 0.26216
Coefficients: (7 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.97529 0.99358 1.988 0.0468 *
drat -0.09497 0.30377 -0.313 0.7546
gear4 -0.20374 0.44511 -0.458 0.6471
gear5 0.09532 0.49360 0.193 0.8469
carb2 0.39226 0.30885 1.270 0.2041
carb3 0.39570 0.32405 1.221 0.2220
carb4 0.40960 0.29615 1.383 0.1666
carb6 0.06493 0.63827 0.102 0.9190
carb8 0.34502 0.61194 0.564 0.5729
gear4:carb2 -0.38318 0.47243 -0.811 0.4173
gear5:carb2 -0.68770 0.55361 -1.242 0.2142
gear4:carb3 NA NA NA NA
gear5:carb3 NA NA NA NA
gear4:carb4 -0.01806 0.44170 -0.041 0.9674
gear5:carb4 NA NA NA NA
gear4:carb6 NA NA NA NA
gear5:carb6 NA NA NA NA
gear4:carb8 NA NA NA NA
gear5:carb8 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 16.57428 on 31 degrees of freedom
Residual deviance: 0.42691 on 20 degrees of freedom
AIC: 141.09
Number of Fisher Scoring iterations: 4
Now, shockingly, there seem to be little significance with regards to the coefficients in the output, but for the sake of my question, I do very much hope that we can all be mature about that and for now simply look away.
Lets say that we have another data set with the explanatory variables I used above (drat
, gear
, carb
) and I now wanted to predict cyl
. Let us say the data set looks like this:
drat2<-rnorm(10,4,0.2)
gear2<-c(4,4,4,4,4,5,5,5,5,5)
carb2<-c(2,2,2,2,2,2,2,2,2,2)
data.frame(drat2,gear2,carb2)
Now I would like to predict the outcome from these values, effectively using the coefficients from the summary of the regression above. Is there any effective way to do this? Also, how do you experts see from the following:
Null deviance: 16.57428 on 31 degrees of freedom
Residual deviance: 0.42691 on 20 degrees of freedom
AIC: 141.09
that my model is bad? Did one compare the residuals with the degrees of freedom, or something?