
I am following an example here on using logistic regression in R. However, I need some help interpreting the results. The linked page goes over some of the interpretation, but I need more help understanding goodness of fit for logistic regression and the output that I am given.

For convenience, here is the summary given in the example:

## Call:
## glm(formula = admit ~ gre + gpa + rank, family = "binomial", 
##     data = mydata)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.627  -0.866  -0.639   1.149   2.079  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.98998    1.13995   -3.50  0.00047 ***
## gre          0.00226    0.00109    2.07  0.03847 *  
## gpa          0.80404    0.33182    2.42  0.01539 *  
## rank2       -0.67544    0.31649   -2.13  0.03283 *  
## rank3       -1.34020    0.34531   -3.88  0.00010 ***
## rank4       -1.55146    0.41783   -3.71  0.00020 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 499.98  on 399  degrees of freedom
## Residual deviance: 458.52  on 394  degrees of freedom
## AIC: 470.5
## 
## Number of Fisher Scoring iterations: 4
  1. How well did Logistic Regression fit here?
  2. What exactly are the Deviance Residuals? I believe they are the average residuals per quartile. How do I determine if they are bad/good/statistically significant?
  3. What exactly is the z-value here? Is it the normalized standard deviation from the mean of the Estimate assuming a mean of 0?
  4. What exactly are Signif. codes?

Any help is greatly appreciated! You do not have to answer them all!

CodeKingPlusPlus
    possible duplicate of [Interpretation of R's output for binomial regression](http://stats.stackexchange.com/questions/86351/interpretation-of-rs-output-for-binomial-regression) – gung - Reinstate Monica May 03 '14 at 19:47
  • That link definitely answers some of my questions, but not all. Specifically, I am still unsure about goodness of fit, as the example didn't consider goodness of fit because the data was univariate. – CodeKingPlusPlus May 03 '14 at 19:57
  • Here's a really nice resource for the theory behind the glm function in R: http://people.bath.ac.uk/sw283/mgcv/tampere/glm.pdf. Since you're using family="binomial", I believe deviance is just -2*log(likelihood). You've left the world of Sums of Squares and have wandered into the land of likelihood. Things will feel a little strange, but that resource will walk you through the analogs to ordinary regression. – Ben Ogorek May 04 '14 at 02:17
  • Regarding the idea of goodness of fit, people often recommend so-called $\text{pseudo }R^2$'s. This is somewhat controversial, see: [Which pseudo-R2 measure is the one to report for logistic regression (Cox & Snell or Nagelkerke)?](http://stats.stackexchange.com/q/3559/) – gung - Reinstate Monica May 04 '14 at 03:10

1 Answer

  • Something quite useful is Nagelkerke's $R^2$, which is a generalization of the usual $R^2$ statistic from linear regression. The rms package reports it when you print an `lrm` fit:

     library(rms)
     model <- lrm(admit ~ gre + gpa + rank, data = mydata)
     model  # the printed fit includes Nagelkerke's R^2
    

Also, you could use cross-validation. That means you test for correct predictions in your original data using a criterion (usually: predict 1 if the fitted probability > 0.5) and then compute the classification rate. (Above 80% is usually considered fine, but that depends on the study.)
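A minimal sketch of the classification rate described above. The question's `mydata` is not available here, so this simulates a toy data set of the same shape (the simulated values are assumptions, only the variable names come from the question):

```r
set.seed(1)
n      <- 400
mydata <- data.frame(
  gre  = round(rnorm(n, 580, 115)),
  gpa  = round(runif(n, 2.2, 4.0), 2),
  rank = factor(sample(1:4, n, replace = TRUE))
)
mydata$admit <- rbinom(n, 1, plogis(-4 + 0.002 * mydata$gre + 0.8 * mydata$gpa))

fit  <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
prob <- predict(fit, type = "response")  # fitted probabilities
pred <- as.integer(prob > 0.5)           # classify as 1 if probability > 0.5
rate <- mean(pred == mydata$admit)       # proportion classified correctly
rate
```

For genuine cross-validation you would compute this rate on held-out folds rather than on the data used to fit the model.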

  • Deviance is a generalization of the residual sum of squares from linear regression; differences in deviance between nested models can be used for hypothesis tests (likelihood-ratio tests) in logistic regression.
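For example, the null and residual deviances printed in the question's summary give a likelihood-ratio test of the whole model against the intercept-only model:

```r
# Deviances and degrees of freedom copied from the summary output above
lr_stat <- 499.98 - 458.52  # null deviance minus residual deviance
lr_df   <- 399 - 394        # difference in degrees of freedom
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
p_value  # very small, so the predictors jointly improve on the null model
```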

  • The z-value is the Wald statistic $\hat{B}_{j}/\text{s.e.}(\hat{B}_{j})$, which under the null hypothesis $B_{j}=0$ asymptotically follows a $\mathcal{N}(0,1)$ distribution.
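You can reproduce the gre row of the summary from its printed estimate and standard error (small discrepancies come from the rounding in the printed values):

```r
est <- 0.00226             # Estimate for gre, from the summary
se  <- 0.00109             # Std. Error for gre, from the summary
z   <- est / se            # z value, approx. 2.07 as printed
p   <- 2 * pnorm(-abs(z))  # two-sided p-value, approx. 0.038
c(z = z, p = p)            # compare with z = 2.07, Pr(>|z|) = 0.03847
```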

  • Significance codes are just a guide: e.g. * means the associated p-value is $<0.05$ (and $\geq 0.01$).

Sycorax