I have a question about how I built a GLM and about its results (particularly the deviance). We were given data on heart valve thickness (measure) and possible explanatory variables (age, smoking habits, activity (SPORT), alcohol habits and BMI) and asked to run a number of univariate analyses, which I did as follows:
model1a <- lm(measure ~ AGE, data = Intima2)
summary(model1a)
model2a <- lm(measure ~ packyear, data = Intima2)
summary(model2a)
model3a <- lm(measure ~ SPORT, data = Intima2)
summary(model3a)
model4a <- lm(measure ~ alcohol, data = Intima2)
summary(model4a)
model5a <- lm(measure ~ BMI, data = Intima2)
summary(model5a)
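As an aside, the five fits above could be written more compactly as a loop. This is only a sketch: `univariate_pvalues` is a helper name I made up, and it reports the overall F-test p-value of each model (for a single numeric predictor this is the same as the t-test p-value in the summary; for a factor with more than two levels it tests all levels together).

```r
# Hypothetical helper: fit response ~ <one predictor> for each predictor and
# return the overall F-test p-value of each model.
univariate_pvalues <- function(data, response, predictors) {
  sapply(predictors, function(v) {
    fit <- lm(reformulate(v, response = response), data = data)
    f <- summary(fit)$fstatistic          # named vector: value, numdf, dendf
    unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
  })
}

# Usage with the data from this post (not run here):
# p <- univariate_pvalues(Intima2, "measure",
#                         c("AGE", "packyear", "SPORT", "alcohol", "BMI"))
# names(p)[p < 0.25]   # variables passing the screening threshold
```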
We were then asked to keep all variables whose univariate model returned p<0.25 and then "one by one, test all possible interactions between the selected explanatory variables and the main exposure variable packyear". The only variable above this threshold was SPORT, so I removed it and ran this code:
model1b <- lm(measure ~ packyear*AGE, data = Intima2)
summary(model1b)
model2b <- lm(measure ~ packyear*BMI, data = Intima2)
summary(model2b)
model3b <- lm(measure ~ packyear*alcohol, data = Intima2)
summary(model3b)
I am not 100% confident about this step and wonder whether I made an error. Are there more interactions I should have tested? Also, since alcohol is a categorical variable with 3 levels, should I have written it as factor(alcohol) in either or both of these steps?
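To make the factor question concrete, this is what I think the factor version of the alcohol interaction test would look like. It is a sketch only: it assumes alcohol is stored as numeric codes for the 3 levels, and `interaction_test` is a name I made up. With a factor, the interaction has more than one coefficient, so the comparison is a single F-test between the models with and without the interaction rather than one p-value read off summary().

```r
# Hypothetical helper: F-test for an exposure-by-factor interaction.
interaction_test <- function(data, response, exposure, modifier) {
  data[[modifier]] <- factor(data[[modifier]])   # treat the codes as categories
  main_terms <- c(exposure, modifier)
  int_term   <- paste(exposure, modifier, sep = ":")
  fit_main <- lm(reformulate(main_terms, response = response), data = data)
  fit_int  <- lm(reformulate(c(main_terms, int_term), response = response),
                 data = data)
  anova(fit_main, fit_int)   # row 2 holds the F-test for the interaction
}

# Usage with the data from this post (not run here):
# interaction_test(Intima2, "measure", "packyear", "alcohol")
```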
We were then asked to estimate a model with all interactions and terms that returned p<0.25. The only interaction that gave this was packyear*alcohol so I ran this:
glm1 <- glm(measure ~ AGE + BMI + packyear + alcohol + packyear:alcohol, family = Gamma(link = identity), data = Intima2)
We then had to perform backward model selection, eliminating the term with the highest p-value at each step while making sure that each removal did not make a big difference to the estimate of the coefficient associated with smoking (packyear). My final model and its output were:
glm3 <- glm(measure ~ AGE + BMI + packyear, family = Gamma(link = log), data = Intima2)
summary(glm3)
Call:
glm(formula = measure ~ AGE + BMI + packyear, family = Gamma(link = log),
    data = Intima2)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.30117  -0.08328  -0.01571   0.06458   0.39219

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.1338622  0.0807785 -14.037  < 2e-16 ***
AGE          0.0073254  0.0011212   6.534 2.29e-09 ***
BMI         86.0179269 31.5763694   2.724  0.00754 **
packyear     0.0003447  0.0013512   0.255  0.79914
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Gamma family taken to be 0.01603842)

    Null deviance: 2.6334  on 109  degrees of freedom
Residual deviance: 1.6348  on 106  degrees of freedom
AIC: -282.74

Number of Fisher Scoring iterations: 4
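One way to quantify the "no big difference in the smoking coefficient" check would be the relative change in the packyear estimate between the model before and after a removal. The following is a sketch under assumptions: `coef_change` is a hypothetical helper, the 10% figure in the comment is only a commonly quoted rule of thumb, and the comparison is only meaningful if both fits use the same family and link.

```r
# Hypothetical helper: relative change in one coefficient between two fits.
coef_change <- function(full, reduced, term = "packyear") {
  b_full    <- coef(full)[[term]]
  b_reduced <- coef(reduced)[[term]]
  (b_reduced - b_full) / b_full
}

# Usage with the data from this post (not run here); same link in both fits:
# full    <- glm(measure ~ AGE + BMI + packyear + alcohol,
#                family = Gamma(link = log), data = Intima2)
# reduced <- update(full, . ~ . - alcohol)
# coef_change(full, reduced)   # e.g. flag changes larger than about 10%
```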
Everything looks good to me except the deviance, but I'm not sure what the exceptionally low deviances mean, how to fix them, or whether I even need to?
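For context, my understanding is that the (unscaled) deviance reported for a Gamma GLM is the sum below. It depends only on the ratios of observed to fitted values, so it is small whenever the relative scatter around the fitted values is small. Dividing the residual deviance by the dispersion estimate from the output above gives a value close to the residual degrees of freedom, and a dispersion of about 0.016 corresponds to a coefficient of variation of roughly 0.13:

```latex
D = 2\sum_{i=1}^{n}\left[\frac{y_i-\hat\mu_i}{\hat\mu_i} - \log\frac{y_i}{\hat\mu_i}\right],
\qquad
\frac{D}{\hat\phi} = \frac{1.6348}{0.01603842} \approx 101.9 \quad \text{on } 106 \text{ df},
\qquad
\sqrt{\hat\phi} = \sqrt{0.01603842} \approx 0.13 .
```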
Thanks very much for any help and apologies for the long post!
Cheers,
D