I was trying to reverse engineer the values of an unclear parameter by comparing R
outcomes to Stata
outcomes for an ordinal probit model (link). To my surprise I could not replicate the outcome (fake data set with 2000 observations). To be sure, I decided to replicate the ordinal probit models with a Stata
dataset. As the results (below) show, the differences in output were not a coincidence. I provided all data and code for both Stata and R below
Although I did not expect the results to be completely identical, they are off by quite some margin.
Because of this, I have two questions:
- Is there an
R
package that is able to replicate theStata
results, so that I can reverse engineer the values that I am looking for? - Can anyone shed any sort of light on these differences. What is it about these models that makes these differences so large and what does that say about any result from an ordinal model?
References: https://www.stata.com/manuals14/roprobit.pdf
NOTE: Everyone, please feel encouraged to check if there are any mistakes in my code that might explain the differences
Replication
In Stata, with oprobit
:
use http://www.stata-press.com/data/r14/fullauto
oprobit rep77 foreign length mpg
Iteration 0: log likelihood = -89.895098
Iteration 1: log likelihood = -78.106316
Iteration 2: log likelihood = -78.020086
Iteration 3: log likelihood = -78.020025
Iteration 4: log likelihood = -78.020025
Ordered probit regression Number of obs = 66
LR chi2(3) = 23.75
Prob > chi2 = 0.0000
Log likelihood = -78.020025 Pseudo R2 = 0.1321
------------------------------------------------------------------------------
rep77 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
foreign | 1.704861 .4246796 4.01 0.000 .8725037 2.537217
length | .0468675 .012648 3.71 0.000 .022078 .0716571
mpg | .1304559 .0378628 3.45 0.001 .0562463 .2046656
-------------+----------------------------------------------------------------
/cut1 | 10.1589 3.076754 4.128577 16.18923
/cut2 | 11.21003 3.107527 5.119389 17.30067
/cut3 | 12.54561 3.155233 6.361467 18.72975
/cut4 | 13.98059 3.218793 7.671874 20.28931
------------------------------------------------------------------------------
In R
, with polr
from the MASS
package:
library(MASS)
polr <- polr(as.ordered(rep77) ~ foreign + length + mpg, Hess=TRUE, data=fullauto);summary(polr)
Call:
polr(formula = as.ordered(rep77) ~ foreign + length + mpg, data = fullauto,
Hess = TRUE)
Coefficients:
Value Std. Error t value
foreign 2.89680 0.79106 3.662
length 0.08283 0.02277 3.637
mpg 0.23077 0.07053 3.272
Intercepts:
Value Std. Error t value
1|2 17.9273 5.5637 3.2222
2|3 19.8649 5.6071 3.5428
3|4 22.1031 5.7188 3.8650
4|5 24.6919 5.9037 4.1825
Residual Deviance: 156.5014
AIC: 170.5014
In R
, with vglm
from the VGAM
package:
library(VGAM)
vglm <- vglm(as.ordered(rep77) ~ foreign + length + mpg, family=propodds, data=fullauto);summary(vglm)
Call:
vglm(formula = as.ordered(rep77) ~ foreign + length + mpg, family = propodds,
data = fullauto)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept):1 -17.92738 5.47221 -3.276 0.001053 **
(Intercept):2 -19.86496 5.52892 -3.593 0.000327 ***
(Intercept):3 -22.10321 5.63780 -3.921 8.84e-05 ***
(Intercept):4 -24.69203 5.81757 -4.244 2.19e-05 ***
foreign 2.89684 0.75766 3.823 0.000132 ***
length 0.08283 0.02252 3.677 0.000236 ***
mpg 0.23076 0.06805 3.391 0.000696 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Names of linear predictors: logitlink(P[Y>=2]), logitlink(P[Y>=3]), logitlink(P[Y>=4]), logitlink(P[Y>=5])
Residual deviance: 156.5014 on 257 degrees of freedom
Log-likelihood: -78.2507 on 257 degrees of freedom
Number of Fisher scoring iterations: 6
No Hauck-Donner effect found in any of the estimates
Exponentiated coefficients:
foreign length mpg
18.116799 1.086354 1.259563
In R
, with clm
from the ordinal
package:
clm <- clm(as.ordered(rep77) ~ foreign + length + mpg, data=fullauto)
summary(clm)
formula: as.ordered(rep77) ~ foreign + length + mpg
data: fullauto
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 66 -78.25 170.50 6(0) 5.65e-10 8.9e+07
Coefficients:
Estimate Std. Error z value Pr(>|z|)
foreign 2.89681 0.79064 3.664 0.000248 ***
length 0.08283 0.02272 3.646 0.000267 ***
mpg 0.23077 0.07045 3.275 0.001055 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Threshold coefficients:
Estimate Std. Error z value
1|2 17.927 5.551 3.229
2|3 19.865 5.596 3.550
3|4 22.103 5.709 3.872
4|5 24.692 5.891 4.192
(8 observations deleted due to missingness)
In R
, with orm
from the rms
package:
library(rms)
orm <- orm(as.ordered(rep77) ~ foreign + length + mpg, family=probit, data=fullauto)
dd <- datadist(fullauto)
options(datadist="dd")
summary(orm)
Effects Response : as.ordered(rep77)
Factor Low High Diff. Effect S.E. Lower 0.95 Upper 0.95
foreign 0 1.00 1.00 1.70480 0.42468 0.87249 2.5372
length 170 203.75 33.75 1.58180 0.42687 0.74512 2.4184
mpg 18 24.75 6.75 0.88057 0.25557 0.37966 1.3815
DATA for R
fullauto <- structure(list(make = structure(c(1, 1, 1, 2, 2, 3, 4, 4, 4,
4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8,
9, 10, 10, 11, 11, 12, 12, 12, 13, 14, 14, 14, 14, 14, 14, 15,
15, 15, 15, 15, 15, 15, 16, 17, 17, 17, 17, 17, 18, 18, 18, 18,
18, 18, 19, 20, 21, 21, 21, 22, 22, 22, 22, 23), label = "Make", format.stata = "%8.0g", class = c("haven_labelled",
"vctrs_vctr", "double"), labels = c(AMC = 1, Audi = 2, BMW = 3,
Buick = 4, Cad. = 5, Chev. = 6, Datsun = 7, Dodge = 8, Fiat = 9,
Ford = 10, Honda = 11, Linc. = 12, Mazda = 13, Merc. = 14, Olds = 15,
Peugeot = 16, Plym. = 17, Pont. = 18, Renault = 19, Subaru = 20,
Toyota = 21, VW = 22, Volvo = 23)), model = structure(c(1, 2,
3, 4, 5000, 320, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 200, 210, 510, 810, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 98, 604, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 260), label = "Model", format.stata = "%8.0g", class = c("haven_labelled",
"vctrs_vctr", "double"), labels = c(Concord = 1, Pacer = 2, Spirit = 3,
Fox = 4, Century = 5, Electra = 6, LeSabre = 7, Opel = 8, Regal = 9,
Riviera = 10, Skylark = 11, Deville = 12, Eldrado = 13, Seville = 14,
Chevette = 15, Impala = 16, Malibu = 17, MCarlo = 18, Monza = 19,
Nova = 20, Colt = 21, Diplomat = 22, Magnum = 23, StRegis = 24,
Strada = 25, Fiesta = 26, Mustang = 27, Accord = 28, Civic = 29,
Cntntl = 30, `Mark V` = 31, Vrsills = 32, GLC = 33, Bobcat = 34,
Cougar = 35, `XR-7` = 36, Marquis = 37, Monarch = 38, Zephyr = 39,
Cutlass = 40, CutlSupr = 41, `Delta 88` = 42, Omega = 43, Starfire = 44,
Toronado = 45, Arrow = 46, Champ = 47, Horizon = 48, Sapporo = 49,
Volare = 50, Catalina = 51, Firebird = 52, GranPrix = 53, `Le Mans` = 54,
Phoenix = 55, Sunbird = 56, `Le Car` = 57, Subaru = 58, Celica = 59,
Corolla = 60, Corona = 61, Rabbit = 62, Diesel = 63, Scirocco = 64,
Dasher = 65)), price = structure(c(4099, 4749, 3799, 6295, 9690,
9735, 4816, 7827, 5788, 4453, 5189, 10372, 4082, 11385, 14500,
15906, 3299, 5705, 4504, 5104, 3667, 3955, 6229, 4589, 5079,
8129, 3984, 4010, 5886, 6342, 4296, 4389, 4187, 5799, 4499, 11497,
13594, 13466, 3995, 3829, 5379, 6303, 6165, 4516, 3291, 4733,
5172, 4890, 4181, 4195, 10371, 8814, 12990, 4647, 4425, 4482,
6486, 4060, 5798, 4934, 5222, 4723, 4424, 4172, 3895, 3798, 5899,
3748, 5719, 4697, 5397, 6850, 7140, 11995), label = "Price", format.stata = "%8.0g"),
mpg = structure(c(22, 17, 22, 23, 17, 25, 20, 15, 18, 26,
20, 16, 19, 14, 14, 21, 29, 16, 22, 22, 24, 19, 23, 35, 24,
21, 30, 18, 16, 17, 21, 28, 21, 25, 28, 12, 12, 14, 30, 22,
14, 14, 15, 18, 20, 19, 19, 18, 19, 24, 16, 21, 14, 38, 34,
25, 26, 18, 18, 18, 19, 19, 19, 24, 26, 35, 18, 31, 18, 25,
41, 25, 23, 17), label = "Mileage (mpg)", format.stata = "%8.0g"),
rep78 = structure(c(3, 3, NA, 3, 5, 4, 3, 4, 3, NA, 3, 3,
3, 3, 2, 3, 3, 4, 3, 2, 2, 3, 4, 5, 4, 4, 5, 2, 2, 2, 3,
4, 3, 5, 4, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 4, 3, 1,
3, 4, NA, 3, 5, 3, NA, 2, 4, 1, 3, 3, NA, 2, 3, 5, 5, 5,
5, 4, 5, 4, 4, 5), label = "Repair Record 1978", format.stata = "%9.0g", class = c("haven_labelled",
"vctrs_vctr", "double"), labels = c(Poor = 1, Fair = 2, Average = 3,
Good = 4, Excellent = 5)), rep77 = structure(c(2, 1, NA,
3, 2, 4, 3, 4, 4, NA, 3, 4, 3, 3, 2, 3, 3, 4, 3, 3, 2, 3,
3, 5, 4, 4, 4, 2, 2, 2, 1, NA, 3, 5, 4, 4, 4, 3, 4, 3, 3,
4, 2, NA, 3, 3, 4, 4, 3, 1, 3, 4, NA, 3, 4, NA, NA, 2, 4,
2, 3, 3, NA, 2, 3, 4, 5, 5, 5, 3, 4, 3, 3, 3), label = "Repair Record 1977", format.stata = "%9.0g", class = c("haven_labelled",
"vctrs_vctr", "double"), labels = c(Poor = 1, Fair = 2, Average = 3,
Good = 4, Excellent = 5)), hdroom = structure(c(2.5, 3, 3,
2.5, 3, 2.5, 4.5, 4, 4, 3, 2, 3.5, 3.5, 4, 3.5, 3, 2.5, 4,
3.5, 2, 2, 3.5, 1.5, 2, 2.5, 2.5, 2, 4, 4, 4.5, 2.5, 1.5,
2, 3, 2.5, 3.5, 2.5, 3.5, 3.5, 3, 3.5, 3, 3.5, 3, 3.5, 4.5,
2, 4, 4.5, 2, 3.5, 4, 3.5, 2, 2.5, 4, 1.5, 5, 4, 1.5, 2,
3.5, 3.5, 2, 3, 2.5, 2.5, 3, 2, 3, 3, 2, 2.5, 2.5), label = "Headroom (in.)", format.stata = "%6.1f"),
rseat = structure(c(27.5, 25.5, 18.5, 28, 27, 26, 29, 31.5,
30.5, 24, 28.5, 30, 27, 31.5, 30, 30, 26, 29.5, 28.5, 28.5,
25, 27, 21, 23.5, 22, 27, 24, 29, 29, 28, 26.5, 26, 23, 25.5,
23.5, 30.5, 28.5, 27, 25.5, 25.5, 29.5, 25, 30.5, 27, 29,
28, 28, 29, 27, 25.5, 30, 31.5, 30.5, 21.5, 23, 25, 22, 31,
29, 23.5, 28.5, 28, 27, 25, 23, 25.5, 22, 24.5, 23, 25.5,
25.5, 23.5, 37.5, 29.5), label = "Rear Seat (in.)", format.stata = "%6.1f"),
trunk = structure(c(11, 11, 12, 11, 15, 12, 16, 20, 21, 10,
16, 17, 13, 20, 16, 13, 9, 20, 17, 16, 7, 13, 6, 8, 8, 8,
8, 17, 17, 21, 16, 9, 10, 10, 5, 22, 18, 15, 11, 9, 16, 16,
23, 15, 17, 16, 16, 20, 14, 10, 17, 20, 14, 11, 11, 17, 8,
16, 20, 7, 16, 17, 13, 7, 10, 11, 14, 9, 11, 15, 15, 16,
12, 14), label = "Trunk space (cu. ft.)", format.stata = "%8.0g"),
weight = structure(c(2930, 3350, 2640, 2070, 2830, 2650,
3250, 4080, 3670, 2230, 3280, 3880, 3400, 4330, 3900, 4290,
2110, 3690, 3180, 3220, 2750, 3430, 2370, 2020, 2280, 2750,
2120, 3600, 3600, 3740, 2130, 1800, 2650, 2240, 1760, 4840,
4720, 3830, 1980, 2580, 4060, 4130, 3720, 3370, 2830, 3300,
3310, 3690, 3370, 2730, 4030, 4060, 3420, 3260, 1800, 2200,
2520, 3330, 3700, 3470, 3210, 3200, 3420, 2690, 1830, 2050,
2410, 2200, 2670, 1930, 2040, 1990, 2160, 3170), label = "Weight (lbs.)", format.stata = "%8.0g"),
length = structure(c(186, 173, 168, 174, 189, 177, 196, 222,
218, 170, 200, 207, 200, 221, 204, 204, 163, 212, 193, 200,
179, 197, 170, 165, 170, 184, 163, 206, 206, 220, 161, 147,
179, 172, 149, 233, 230, 201, 154, 169, 221, 217, 212, 198,
195, 198, 198, 218, 200, 180, 206, 220, 192, 170, 157, 165,
182, 201, 214, 198, 201, 199, 203, 179, 142, 164, 174, 165,
175, 155, 155, 156, 172, 193), label = "Length (in.)", format.stata = "%8.0g"),
turn = structure(c(40, 40, 35, 36, 37, 34, 40, 43, 43, 34,
42, 43, 42, 44, 43, 45, 34, 43, 31, 41, 40, 43, 35, 32, 34,
38, 35, 46, 46, 46, 36, 33, 43, 36, 34, 51, 48, 41, 33, 39,
48, 45, 44, 41, 43, 42, 42, 42, 43, 40, 43, 43, 38, 37, 37,
36, 38, 44, 42, 42, 45, 40, 43, 41, 34, 36, 36, 35, 36, 35,
35, 36, 36, 37), label = "Turn Circle (ft.) ", format.stata = "%8.0g"),
displ = structure(c(121, 258, 121, 97, 131, 121, 196, 350,
231, 304, 196, 231, 231, 425, 350, 350, 231, 250, 200, 200,
151, 250, 119, 85, 119, 146, 98, 318, 318, 225, 105, 98,
140, 107, 91, 400, 400, 302, 86, 140, 302, 302, 302, 250,
140, 231, 231, 231, 231, 151, 350, 350, 163, 156, 86, 105,
119, 225, 231, 231, 231, 231, 231, 151, 79, 97, 134, 97,
134, 89, 90, 97, 97, 163), label = "Displacement (cu. in.)", format.stata = "%8.0g"),
gratio = structure(c(3.57999992370605, 2.52999997138977,
3.07999992370605, 3.70000004768372, 3.20000004768372, 3.64000010490417,
2.9300000667572, 2.41000008583069, 2.73000001907349, 2.86999988555908,
2.9300000667572, 2.9300000667572, 3.07999992370605, 2.27999997138977,
2.19000005722046, 2.24000000953674, 2.9300000667572, 2.55999994277954,
2.73000001907349, 2.73000001907349, 2.73000001907349, 2.55999994277954,
3.89000010490417, 3.70000004768372, 3.53999996185303, 3.54999995231628,
3.53999996185303, 2.47000002861023, 2.47000002861023, 2.94000005722046,
3.36999988555908, 3.15000009536743, 3.07999992370605, 3.04999995231628,
3.29999995231628, 2.47000002861023, 2.47000002861023, 2.47000002861023,
3.73000001907349, 2.73000001907349, 2.75, 2.75, 2.25999999046326,
2.4300000667572, 3.07999992370605, 2.9300000667572, 2.9300000667572,
2.73000001907349, 3.07999992370605, 2.73000001907349, 2.41000008583069,
2.41000008583069, 3.57999992370605, 3.04999995231628, 2.97000002861023,
3.36999988555908, 3.53999996185303, 3.23000001907349, 2.73000001907349,
3.07999992370605, 2.9300000667572, 2.9300000667572, 3.07999992370605,
2.73000001907349, 3.72000002861023, 3.80999994277954, 3.05999994277954,
3.21000003814697, 3.04999995231628, 3.77999997138977, 3.77999997138977,
3.77999997138977, 3.74000000953674, 2.98000001907349), label = "Gear Ratio", format.stata = "%6.2f"),
order = structure(c(1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
43, 44, 45, 47, 48, 49, 50, 51, 52, 46, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
73, 74), label = "Original order", format.stata = "%8.0g"),
foreign = structure(c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1,
0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1), label = "Foreign", format.stata = "%8.0g", class = c("haven_labelled",
"vctrs_vctr", "double"), labels = c(Domestic = 0, Foreign = 1
)), wgtd = structure(c(2930, 3350, 2640, NA, NA, NA, 3250,
4080, 3670, 2230, 3280, 3880, 3400, 4330, 3900, 4290, 2110,
3690, 3180, 3220, 2750, 3430, NA, NA, NA, NA, 2120, 3600,
3600, 3740, NA, 1800, 2650, NA, NA, 4840, 4720, 3830, NA,
2580, 4060, 4130, 3720, 3370, 2830, 3300, 3310, 3690, 3370,
2730, 4030, 4060, NA, 3260, 1800, 2200, 2520, 3330, 3700,
3470, 3210, 3200, 3420, 2690, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA), format.stata = "%9.0g"), wgtf = structure(c(NA,
NA, NA, 2070, 2830, 2650, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 2370, 2020, 2280, 2750, NA,
NA, NA, NA, 2130, NA, NA, 2240, 1760, NA, NA, NA, 1980, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3420, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1830, 2050, 2410,
2200, 2670, 1930, 2040, 1990, 2160, 3170), format.stata = "%9.0g")), label = "Automobile Models", row.names = c(NA,
-74L), class = c("tbl_df", "tbl", "data.frame"))