I have used the mlogit package and am trying to summarize the results from my model. I also have a question about the reference value, which I will get to in a moment.

redata.full <- mlogit(no.C~ 1| WR+age+age2+BP+noC.1yr, data=redata, reflevel="0", na.action=na.fail)

no.C = number of offspring    
WR = risk
age+age2 = the non-linear relationship that as an individual ages their production decreases
BP = browsing pressure
noC.1yr = number of offspring produced the year before

I recognize that my data are ordinal in nature, but I'm following the methods of other people who have used the reference-based approach rather than ordinal logistic regression. However, I am still shaky on the justification, other than citing the other person and saying "he did it too!" If anyone has a suggestion, I would appreciate it.

My results for this model are:

Call:
mlogit(formula = no.C ~ 1 | WR + age + age2 + BP + noC.1yr, data = redata, 
    na.action = na.fail, reflevel = "0", method = "nr", print.level = 0)

Frequencies of alternatives:
       0        1        2 
0.233766 0.675325 0.090909 

nr method
5 iterations, 0h:0m:0s 
g'(-H)^-1g = 2.16E-07 
gradient close to zero 

Coefficients :
               Estimate Std. Error t-value Pr(>|t|)  
1:(intercept) -0.281226   1.225763 -0.2294  0.81854  
2:(intercept) -0.605312   1.997179 -0.3031  0.76183  
1:WR           0.847273   0.518854  1.6330  0.10248  
2:WR           1.347976   0.689916  1.9538  0.05072 .
1:age          0.314075   0.275486  1.1401  0.25425  
2:age         -0.422368   0.395240 -1.0686  0.28523  
1:age2        -0.018998   0.014446 -1.3151  0.18847  
2:age2         0.022572   0.018949  1.1912  0.23359  
1:BP          -0.143720   0.173585 -0.8280  0.40770  
2:BP          -0.074553   0.331108 -0.2252  0.82185  
1:noC.1yr      0.574304   0.377821  1.5200  0.12850  
2:noC.1yr      1.251673   0.626033  1.9994  0.04557 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -116.6
McFadden R^2:  0.079844 
Likelihood ratio test : chisq = 20.236 (p.value = 0.0271)

exp(cbind(OddsRatio = coef(redata.full), ci))
              OddsRatio      2.5 %    97.5 %
1:(intercept) 0.7548580 0.06831155  8.341351
2:(intercept) 0.5459038 0.01089217 27.360107
1:WR          2.3332750 0.84394900  6.450831
2:WR          3.8496270 0.99577472 14.882511
1:age         1.3689929 0.79782462  2.349065
2:age         0.6554925 0.30209181  1.422317
1:age2        0.9811815 0.95379086  1.009359
2:age2        1.0228284 0.98553735  1.061530
1:BP          0.8661299 0.61634947  1.217136
2:BP          0.9281585 0.48504538  1.776078
1:noC.1yr     1.7758933 0.84686698  3.724076
2:noC.1yr     3.4961862 1.02497823 11.925441
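
As a cross-check, the odds ratios and limits printed above are consistent with exponentiating each estimate together with a Wald interval (estimate plus or minus about 1.96 standard errors). The sketch below redoes that arithmetic for two rows, using numbers copied from the coefficient table. That `ci` was built this way is an assumption, since its definition is not shown.

```r
# Estimates and standard errors copied from the coefficient table above
est <- c("2:WR" = 1.347976, "2:noC.1yr" = 1.251673)
se  <- c("2:WR" = 0.689916, "2:noC.1yr" = 0.626033)

z975 <- qnorm(0.975)  # about 1.96

# Wald interval on the log-odds scale, then exponentiate to the odds-ratio scale
or_table <- exp(cbind(OddsRatio = est,
                      lower = est - z975 * se,
                      upper = est + z975 * se))
print(round(or_table, 3))
```

An interval whose lower limit sits just below 1 (2:WR) corresponds to a p-value just above 0.05, and one just above 1 (2:noC.1yr) to a p-value just below it.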

I would like confirmation of my interpretations: The model is better than a null - obtained from the likelihood ratio test.
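
For reference, the likelihood ratio test and McFadden's R^2 in the summary can be recovered from one another. This is a minimal sketch using the rounded values printed above; it assumes the comparison model is the intercept-only model, so that the test has 10 degrees of freedom (5 covariates times 2 non-reference outcomes).

```r
loglik_full <- -116.6   # log-likelihood of the fitted model, as printed
lr_chisq    <- 20.236   # likelihood ratio statistic, as printed
df          <- 10       # assumed: 5 covariates x 2 non-reference outcomes

# LR = 2 * (loglik_full - loglik_null), so the null log-likelihood is:
loglik_null <- loglik_full - lr_chisq / 2

p_value  <- pchisq(lr_chisq, df = df, lower.tail = FALSE)
mcfadden <- 1 - loglik_full / loglik_null

print(c(loglik_null = loglik_null, p_value = p_value, mcfadden = mcfadden))
```

Both reproduced values agree with the summary to rounding, which supports reading the test as "full model versus intercepts only".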

Question: How do I test how well the model is actually working (i.e., goodness of fit)? A Hosmer-Lemeshow test? I've read warnings that McFadden's pseudo R^2 isn't really applicable to multinomial regressions. I found a Hosmer-Lemeshow test in the ResourceSelection library, and it says my model is NOT doing well at all. Now what?

Interpretation: WR and noC.1yr are the only variables that are coming out as slightly significant. But this is only between the reference value of 0 and production of 2 calves. It is not significantly different between 0 or 1 for these variables.

Question: I've been trying to find where the vignette explains what the t-value is. Is it just a t-test? How would I report the estimate as being significant? "The estimated odds for 2-offspring being produced versus 0 were 3.85 (95% CI = 1.0-14.88) which was significant (t= 1.99, P=0.05)"
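
For what it is worth, the printed "t-value" column is consistent with a Wald statistic, i.e. the estimate divided by its standard error. Reproducing one row by hand suggests the p-value is taken from the standard normal distribution, so it behaves like a z statistic despite the label. This is inferred from the printed numbers only, not from the package documentation.

```r
# 2:noC.1yr row, copied from the coefficient table above
est <- 1.251673
se  <- 0.626033

z <- est / se               # Wald statistic
p <- 2 * pnorm(-abs(z))     # two-sided p-value from the standard normal

print(round(c(z = z, p = p), 4))
```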

Returning to my statement about setting the reference value: when I run this exact same model using my other options of 1 or 2 offspring as the reference, I get completely different results for which variables are significant. If I use 2 as the reference value, then age + WR + noC.1yr are significant. If I use 1, then only age is significant. So, which one should I use? I have read that you want to pick the one most relevant to your hypothesis, but in this case I could motivate any of the 3 levels.
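
A note on the algebra behind this: in a multinomial logit, log(P(2)/P(1)) = log(P(2)/P(0)) - log(P(1)/P(0)), so changing the reference level re-expresses the same fitted model as a different set of contrasts. The sketch below derives the contrasts against reference 1 from the reference-0 estimates printed above. Standard errors for the new contrasts would need the coefficient covariance matrix, which is not shown, so only point estimates are computed.

```r
# Reference-0 estimates copied from the coefficient table above
b1 <- c(WR = 0.847273, age = 0.314075, age2 = -0.018998,
        BP = -0.143720, noC.1yr = 0.574304)   # 1 vs 0
b2 <- c(WR = 1.347976, age = -0.422368, age2 = 0.022572,
        BP = -0.074553, noC.1yr = 1.251673)   # 2 vs 0

# Same model expressed with 1 as the reference level
b2_vs_1 <- b2 - b1   # 2 vs 1
b0_vs_1 <- -b1       # 0 vs 1

print(round(rbind("2 vs 1" = b2_vs_1, "0 vs 1" = b0_vs_1), 4))
```

The age contrast is much larger for 2 versus 1 than for either comparison against 0, which fits the observation that age only shows up when 1 or 2 is the reference.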

Kerry

1 Answer

How about trying:

  • Looking at the residual plots

par(mfrow=c(2,2))
plot(redata.full)

  • Checking the variance inflation factors (vif) to see whether or not there's multicollinearity (since I don't know your data, it's hard to have an idea where it could be; maybe it's age and noC.1yr? Or some other variable)
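
Since variance inflation factors depend only on the predictors and not on the response model, one option is to compute them by hand from auxiliary linear regressions, with VIF = 1 / (1 - R^2) for each predictor regressed on the others. This is a minimal sketch on simulated data: the data frame `sim` and the helper `vif_by_hand` are made up for illustration, and only the column names are borrowed from the question.

```r
set.seed(1)
n <- 150

# Simulated stand-in for the predictors (hypothetical values)
sim <- data.frame(
  WR      = rnorm(n),
  age     = sample(2:20, n, replace = TRUE),
  BP      = rnorm(n),
  noC.1yr = sample(0:2, n, replace = TRUE)
)
sim$age2 <- sim$age^2

# VIF of each column: regress it on all the others and use 1 / (1 - R^2)
vif_by_hand <- function(dat) {
  sapply(names(dat), function(v) {
    others <- setdiff(names(dat), v)
    fit <- lm(reformulate(others, response = v), data = dat)
    1 / (1 - summary(fit)$r.squared)
  })
}

vifs <- vif_by_hand(sim)
print(round(vifs, 2))
```

Large values for age and age2 are expected whenever a variable and its square are both in the model, so the ones worth watching are the remaining predictors.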

I would say this article is a good start: http://www.statmethods.net/stats/rdiagnostics.html. Although it's predominantly about OLS, some of those examples are applicable to multinomial logistic regression. I suspect you might be a little bit confused about your dummy variables (which is absolutely normal!). Is that what you mean by "reference value"? How did you dummy code those [apparently] categorical variables? Did you convert them to factors? What did you code as your baseline?

Jen
  • Yes, reference value is the same as baseline value. You will notice in the R equation output it says `reflevel = "0"`. My baseline is 0. Yes the mlogit package requires the response variable to be a factor. I did correlation analysis of all my variables and none are above 45% (except age and age^2 but that tests for nonlinearity). – Kerry Aug 25 '14 at 06:43