
I am running an ordinal logistic regression in R to assess the effect of three IVs (GDP, number of bilateral agreements, and HDI, all log-transformed) on the diplomatic rank (DV) assigned to each country in a sample of 142 countries. The rank has four ordered levels (coded 1 to 4 in the data, as the intercept labels 1|2, 2|3 and 3|4 in the output show).
The model returns a much larger coefficient and confidence interval for HDI than for the other variables. What is a reasonable threshold for judging whether such an effect is real or an artifact of error?
Link to reproducible data is here: https://1drv.ms/u/s!Au9STQ1wQqQuhKZFKTIHOygjdEx2gw?e=dX7IaJ
R script below:

library(MASS)

csRank <- read.csv(file="CSrank.csv")
csRank$rank <- as.factor(csRank$rank)

# formulas
rankForm <- rank ~ gdp_bi_ln + spRelct_ln + undp_hdim_ln
rankForms <- rank ~ scale(gdp_bi_ln) + scale(spRelct_ln) + scale(undp_hdim_ln)

# regressions
## un-standardized values
polr1 <- polr(rankForm, data=csRank, Hess=T)
summary(polr1)
confint(polr1)

### Odds ratio and Confidence Interval
or1 <- exp(cbind(OR = coef(polr1), confint(polr1)))
or1

## standardized independent variables 
polr1s <- polr(rankForms, data=csRank, Hess=T)
summary(polr1s)
confint(polr1s)

### OR and CI
or1s <- exp(cbind(OR = coef(polr1s), confint(polr1s)))
or1s

And the output:

> summary(polr1)
Call:
polr(formula = rankForm, data = csRank, Hess = T)

Coefficients:
               Value Std. Error t value
gdp_bi_ln     0.2679     0.1286   2.083
spRelct_ln    0.6138     0.1835   3.346
undp_hdim_ln 15.2589     2.3641   6.454

Intercepts:
    Value   Std. Error t value
1|2 -3.7208  0.9740    -3.8200
2|3  0.5994  0.9151     0.6550
3|4  3.4857  1.0479     3.3265

Residual Deviance: 164.6138 
AIC: 176.6138 
(8 observations deleted due to missingness)


> confint(polr1)
Waiting for profiling to be done...
                  2.5 %     97.5 %
gdp_bi_ln     0.0173982  0.5255147
spRelct_ln    0.2661556  0.9897550
undp_hdim_ln 11.0336190 20.3249321


> or1
                       OR        2.5 %       97.5 %
gdp_bi_ln    1.307259e+00     1.017550 1.691329e+00
spRelct_ln   1.847509e+00     1.304938 2.690575e+00
undp_hdim_ln 4.235173e+06 61921.266746 6.714379e+08


> summary(polr1s)
Call:
polr(formula = rankForms, data = csRank, Hess = T)

Coefficients:
                     Value Std. Error t value
scale(gdp_bi_ln)    0.5806     0.2788   2.083
scale(spRelct_ln)   0.8313     0.2485   3.346
scale(undp_hdim_ln) 3.5968     0.5573   6.455

Intercepts:
    Value   Std. Error t value
1|2 -0.7578  0.3173    -2.3885
2|3  3.5624  0.5150     6.9167
3|4  6.4487  0.7827     8.2391

Residual Deviance: 164.6138 
AIC: 176.6138 
(8 observations deleted due to missingness)


> confint(polr1s)
Waiting for profiling to be done...
                         2.5 %   97.5 %
scale(gdp_bi_ln)    0.03770774 1.138963
scale(spRelct_ln)   0.36041268 1.340270
scale(undp_hdim_ln) 2.60079938 4.790910


> or1s
                           OR     2.5 %     97.5 %
scale(gdp_bi_ln)     1.787183  1.038428   3.123527
scale(spRelct_ln)    2.296250  1.433921   3.820076
scale(undp_hdim_ln) 36.482382 13.474505 120.410941
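The two fits above are the same model expressed in different units: the residual deviance (164.6138) is identical, the t values agree up to rounding, and only the scale of the slopes changes. The short sketch below uses the printed coefficients to make that concrete. The ratio of standardized to raw slope recovers the SD that scale() divided by (about 0.24 for undp_hdim_ln), and the HDI odds ratio can be restated for steps of a more interpretable size. It assumes undp_hdim_ln is the natural log of HDI, as the _ln suffix suggests; under that assumption the odds ratio of about 4.2 million refers to a one-unit increase in log(HDI), i.e. multiplying HDI by e (about 2.72), which is a very large step for an index that cannot exceed 1.

```r
# Coefficients copied from summary(polr1) and summary(polr1s) above
b_raw <- c(gdp_bi_ln = 0.2679, spRelct_ln = 0.6138, undp_hdim_ln = 15.2589)
b_std <- c(gdp_bi_ln = 0.5806, spRelct_ln = 0.8313, undp_hdim_ln = 3.5968)

# scale() divides each predictor by its SD, so b_std = b_raw * SD;
# the ratio recovers the SD that scale() used for each predictor
implied_sd <- b_std / b_raw
print(round(implied_sd, 3))

# Odds ratios for different step sizes in log(HDI),
# ASSUMING undp_hdim_ln = log(HDI) with the natural log
b_hdi <- b_raw[["undp_hdim_ln"]]
steps <- c(one_unit     = 1,                            # HDI multiplied by e
           one_sd       = implied_sd[["undp_hdim_ln"]], # what polr1s reports
           tenth        = 0.1,                          # 0.1 on the log scale
           hdi_up_10pct = log(1.1))                     # HDI multiplied by 1.1
or_hdi <- exp(b_hdi * steps)
print(round(or_hdi, 2))
```

In polr's parameterization, exp() of a slope is the factor by which the odds of being in a higher rank rather than a lower one (at any cut) are multiplied per step of the predictor, so the per-SD and per-10% figures describe the same estimate as the 4.2 million, just per smaller step.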

Rafael

Answer


In general, there is no context-free threshold above which a coefficient is 'too large'. Some values may not make sense given what they mean, though. If numbers seem too large for the context, one thing to consider in models with categorical outcomes is separation. If that had happened, you would typically see very large slopes but also very large standard errors, which you don't have here.
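One way to look for separation with a single predictor is to check, at each cut between adjacent ranks, whether the predictor's values for the lower ranks overlap those for the higher ranks. The helper below (separated_cuts is a hypothetical name written for this sketch, not a MASS function) does that check. If every cut comes back separated in the same direction, the predictor sorts the ranks perfectly and the maximum-likelihood slope would diverge; overlap at the cuts is consistent with finite estimates and moderate standard errors. It only looks at one predictor at a time, so it will not detect separation that arises from a combination of predictors.

```r
# separated_cuts() is a hypothetical helper written for this example.
# For each cut between adjacent ranks it reports whether x puts all the
# lower-rank observations strictly below ("increasing") or strictly above
# ("decreasing") all the higher-rank ones, or whether the groups overlap.
separated_cuts <- function(x, rank) {
  keep <- !is.na(x) & !is.na(rank)
  x <- x[keep]
  f <- factor(rank[keep])   # keeps level order, drops unused levels
  lev <- levels(f)
  r <- as.integer(f)        # ranks as 1..K in level order
  K <- length(lev)
  out <- vapply(seq_len(K - 1), function(k) {
    lo <- x[r <= k]
    hi <- x[r > k]
    if (max(lo) < min(hi)) {
      "increasing"
    } else if (max(hi) < min(lo)) {
      "decreasing"
    } else {
      "overlap"
    }
  }, character(1))
  # label cuts the same way polr labels its intercepts
  names(out) <- paste(lev[-K], lev[-1], sep = "|")
  out
}

# Usage with the question's data frame (assumes csRank from the script above):
# separated_cuts(csRank$undp_hdim_ln, csRank$rank)
```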

Beyond that, a basic step with any dataset or model is to plot the data and look at it. Below is a boxplot of undp_hdim_ln by rank. Given how clearly this variable differentiates the ranks, it seems reasonable to me that it would have a steep slope.

[Figure: boxplot of undp_hdim_ln by rank]
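A plot like this can be drawn with boxplot(undp_hdim_ln ~ rank, data = csRank), using the csRank data frame from the question's script. To put a number on how cleanly the ranks separate, the sketch below (quartile_gaps is a hypothetical helper, not part of any package) checks, for each pair of adjacent ranks, whether the lower rank's third quartile lies below the next rank's first quartile, i.e. whether the two boxes fail to overlap. Note that boxplot() draws Tukey's hinges (as in fivenum()), which can differ slightly from the quantile() quartiles used here.

```r
# quartile_gaps() is a hypothetical helper written for this example.
# For each pair of adjacent ranks it returns TRUE when the lower rank's
# 3rd quartile lies below the next rank's 1st quartile (the middle-50%
# boxes of the two ranks do not overlap).
quartile_gaps <- function(x, rank) {
  # 2 x K matrix: rows "25%" and "75%", one column per rank level
  q <- sapply(split(x, rank), quantile, probs = c(0.25, 0.75), na.rm = TRUE)
  k <- ncol(q)
  out <- q["75%", -k] < q["25%", -1]
  names(out) <- paste(colnames(q)[-k], colnames(q)[-1], sep = " vs ")
  out
}

# Usage with the question's data frame (assumes csRank from the script above):
# boxplot(undp_hdim_ln ~ rank, data = csRank)
# quartile_gaps(csRank$undp_hdim_ln, csRank$rank)
```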

gung - Reinstate Monica
  • Thanks for the response, @gung. Could undp_hdim_ln also have a much larger slope than the others simply because of its measurement scale? (Before log transformation, the other variables, GDPbi and spRel, range up to 1000, while HDI is a decimal score from 0.0 to 1.0.) – Rafael Sep 19 '19 at 19:01
  • I'm not sure I follow that, @Rafael. The data all seem to be negative. Did you mean the interval is [-1, 0]? It is certainly true that slopes are keyed to the scales of the constituent variables: if the relationship is the same but X is expressed on a smaller range of values, you get a larger absolute slope (e.g., a distance expressed in kilometers rather than millimeters). But a big part of this is that 75% of the X values for rank 1 are below 75% of the X values for rank 2, and so on. – gung - Reinstate Monica Sep 19 '19 at 19:13
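A small simulation can illustrate the point about scales. This is a sketch on simulated data, not the question's data: the predictor x is a hypothetical stand-in drawn on a narrow negative range like a logged index, and the four-level outcome is generated from a proportional-odds latent-variable model. Fitting polr() to x and to scale(x) gives the same deviance (up to optimizer tolerance), and the standardized slope equals the raw slope multiplied by sd(x), mirroring the two fits in the question. Rescaling a predictor changes the size of its coefficient without changing the evidence for it, which is one reason the per-SD version is easier to compare across predictors.

```r
library(MASS)  # provides polr()

set.seed(42)
n <- 300
# SIMULATED stand-in for a logged index on a narrow negative range
x <- log(runif(n, min = 0.35, max = 0.95))
# Latent-variable form of a proportional-odds model with true slope 8
latent <- 8 * x + rlogis(n)
# Cut the latent variable at its quartiles into four ordered ranks
y <- cut(latent, breaks = quantile(latent, probs = 0:4 / 4),
         include.lowest = TRUE, labels = FALSE)
y <- factor(y, levels = 1:4, ordered = TRUE)

fit_raw <- polr(y ~ x, Hess = TRUE)
fit_std <- polr(y ~ scale(x), Hess = TRUE)

# Same fit, different units: deviances match, and the standardized
# slope equals the raw slope times sd(x)
print(c(raw = fit_raw$deviance, std = fit_std$deviance))
print(c(raw_slope_times_sd = unname(coef(fit_raw)) * sd(x),
        std_slope = unname(coef(fit_std))))
```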