I wonder how I can determine whether a variable is significant using Pr(>Chi) and Df. Take the following ANOVA table as an example: I know the variable `type` is significant, but I don't know why.

##        Df  Deviance       Resid.Df       -2*LL      Pr(>Chi)
## NULL   NA        NA             28      103.49            NA
## write   1    16.689             27       86.81     4.403e-05
## rating  1     6.097             26       80.71     1.354e-02
## type    2    14.450             24       66.26     7.280e-04
gung - Reinstate Monica
Learn

2 Answers

5

This actually appears to be an analysis of deviance table, but the principle is the same, and people still call it an 'ANalysis Of VAriance' table. The table presents information about a series of sequentially nested model fits. In the first row is the null model (without any of the variables included). Each subsequent row adds another variable to the model and information about the changes is given. It is more typical to move in the opposite order (i.e., from the 'full' model on down, dropping one variable at a time), but this is inconsequential.

The columns might make more sense if they were presented in a different order. The fourth column contains a measure of goodness of fit ($-2\times\log\ {\rm likelihood}$); bear in mind that lower values imply a better fit, and that the fit will improve (or at least not get worse) upon adding a variable, whether that variable is relevant or not. The second column (`Deviance`) displays the difference between that model's -2*LL and the previous model's. Column three (`Resid.Df`) says how many residual degrees of freedom each model has. The first column (`Df`) lists the degrees of freedom associated with each variable; it is the difference between that model's residual degrees of freedom and the previous model's.

The thing to realize here is that, if the added variable is unrelated to the response, the difference between two nested models' -2*LL (i.e., the deviance) is (asymptotically) distributed as a chi-squared variable with degrees of freedom equal to the difference between the two models' residual degrees of freedom. The probability of seeing a difference in -2*LL that large or larger, given the addition of a variable with that many degrees of freedom, is displayed in the last column (`Pr(>Chi)`). Thus, having stipulated an $\alpha$ / type I error rate you feel you can live with, you can see whether the improvement in model fit upon adding a variable is greater than you would expect by chance alone.
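To make the arithmetic concrete, here is a minimal Python sketch that recomputes `Df`, `Deviance` and `Pr(>Chi)` from the `-2*LL` and `Resid.Df` columns of the table in the question. The helper `chi2_sf` is a name made up for this illustration; it only covers 1 and 2 degrees of freedom, where the chi-squared upper tail has a simple closed form (a statistics library such as SciPy would normally be used for the general case). Because the `-2*LL` values in the table are rounded, the results should agree with the printed columns only up to rounding.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable.

    Illustrative helper: only df = 1 and df = 2 are handled,
    using their closed-form expressions.
    """
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers df = 1 or df = 2")

# (term, -2*LL, residual df), copied from the table in the question
fits = [
    ("NULL", 103.49, 28),
    ("write", 86.81, 27),
    ("rating", 80.71, 26),
    ("type", 66.26, 24),
]

results = {}
# Compare each model with the one fitted just before it
for (_, prev_ll, prev_rdf), (term, ll, rdf) in zip(fits, fits[1:]):
    df = prev_rdf - rdf          # Df column
    deviance = prev_ll - ll      # Deviance column (drop in -2*LL)
    p = chi2_sf(deviance, df)    # Pr(>Chi) column
    results[term] = (df, deviance, p)
    print(f"{term:<7} Df={df}  Deviance={deviance:.3f}  Pr(>Chi)={p:.3e}")
```

Each p-value here is conditional on the variables listed above that row already being in the model, so the numbers describe this particular order of entry.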

gung - Reinstate Monica
  • In this case, if I stipulate α as 0.05, does it mean that the model with all three variables is the best model (with the lowest '-2*LL', i.e., the best goodness of fit)? And if I stipulate α as 0.01, will the best model have only 'write' as an independent variable? – Learn Dec 10 '14 at 17:39
  • @Learn, unfortunately it's not that simple. Each of these tests assumes the variables above it are included in the model. The results you would get if the order were rearranged might differ from these results. Although these tests use deviance instead of sums of squares, you can get the idea by reading my answer here: [How to interpret type I (sequential) ANOVA and MANOVA?](http://stats.stackexchange.com/a/20455/7290) (Also, `type` is significant at the .01 level.) – gung - Reinstate Monica Dec 10 '14 at 17:48
  • Thank you for your detailed answer. Just to be clear, does "-2*LL" for "type" (7.280e-04) indicate the goodness of fit of the model including all the variables above it? Or does it only tell us whether "type" is significant? – Learn Dec 10 '14 at 18:34
  • @Learn, the `-2*LL` for `type` is `66.26`. This refers to the model that includes `type`, `rating` & `write`. The p-value for `type` is `7.280e-04`. It refers to the probability of having an improvement in the goodness of fit of `14.450` or greater, if `type` were not actually related to your response variable. – gung - Reinstate Monica Dec 10 '14 at 18:40
  • May I ask why we need the assumption that `type` is not actually related to the dependent variable (if I understand it correctly)? – Learn Dec 10 '14 at 19:02
  • @Learn, that's what you're testing. If you assume `type` is related to the DV, there is no need to test. – gung - Reinstate Monica Dec 10 '14 at 19:08
2

You would most easily judge significance for one of those variables by checking whether the p-value was $\leq \alpha$, your previously chosen significance level.

This is true of essentially any hypothesis test, not just chi-square tests.

You haven't stated your significance level, so I can't talk about which of those coefficients are significant.
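As a small illustration of that rule, the sketch below compares the `Pr(>Chi)` values from the table in the question against two commonly used significance levels. The levels 0.05 and 0.01 are assumptions chosen for the example, not values stated by the asker, and `significant_terms` is a hypothetical helper name.

```python
# Pr(>Chi) values copied from the table in the question
p_values = {"write": 4.403e-05, "rating": 1.354e-02, "type": 7.280e-04}

def significant_terms(p_values, alpha):
    """Return the terms whose p-value is <= the chosen significance level."""
    return [term for term, p in p_values.items() if p <= alpha]

# The two alpha values below are illustrative assumptions
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {significant_terms(p_values, alpha)}")
```

Whichever level is used, it should be fixed before looking at the p-values.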

Glen_b