I'm a graduate student working on a thesis that tests for complementarity between two practices, CI and INNO, following the theory of supermodularity. To do so, I computed two new variables, one per practice, each as the mean of 6 other variables. I then split each of them at its median and combined the two splits into four binary variables, one per combination (11, 10, 01, 00): HH, LH, HL, LL.
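A minimal sketch of the construction described above, in Python for illustration only (the analysis itself was run in SPSS and Stata). The function name, the data layout and the toy numbers are hypothetical. It also assumes that "H" means strictly above the median and that the first letter of each label refers to CI; neither convention is stated above.

```python
from statistics import mean, median

def median_split_dummies(ci_items, inno_items):
    """Return one dict of HH/LH/HL/LL dummies per firm.

    ci_items, inno_items: one row of six item scores per firm (hypothetical layout).
    """
    ci = [mean(row) for row in ci_items]      # practice score = mean of its items
    inno = [mean(row) for row in inno_items]
    ci_med, inno_med = median(ci), median(inno)
    out = []
    for c, i in zip(ci, inno):
        # Assumption: "H" = strictly above the median; first letter is CI.
        label = ("H" if c > ci_med else "L") + ("H" if i > inno_med else "L")
        out.append({name: int(name == label) for name in ("HH", "LH", "HL", "LL")})
    return out

# Toy data for four firms (made up, not the thesis data).
ci_items = [[1, 2, 1, 2, 1, 2], [4, 5, 4, 5, 4, 5], [2] * 6, [5] * 6]
inno_items = [[5] * 6, [1] * 6, [2] * 6, [4] * 6]
dummies = median_split_dummies(ci_items, inno_items)
for row in dummies:
    print(row)
```

By construction every firm falls into exactly one of the four categories, so the four dummies sum to one for each observation.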
I then regressed Y (operational performance) on these four binaries, plus two continuous variables and three other dummies as controls, without a constant term.
I used both SPSS and Stata, but obtained two very different outcomes, and I don't understand why. The correlation matrices, on the other hand, are identical in the two packages. Here are all the outcomes:
SPSS
Excluded variables (a, b)
Model    Beta In   t   Sig.   Partial correlation   Collinearity statistics
                                                    Tolerance   VIF   Minimum tolerance
1  LL    .(c)      .   .      .                     .000        .     .000
a. Dependent variable: PERF_medio
b. Linear regression through the origin
c. Predictors in the model: industryT, industryM, industryE, HL, LH, HH, log_age, log_size
Unstandardized coefficients
             B        Std. Error
LH           .188     .110
HL           .156     .115
HH           .467     .102
log_size     .008     .040
log_age     -.003     .039
industryE   3.416     .244
industryM   3.487     .255
industryT   3.551     .246
As you can see, SPSS excludes the LL variable, whereas Stata does not. Moreover, the collinearity diagnostics in SPSS flag a problem, whereas the VIFs computed in Stata after the regression do not.
STATA:
regress PERF_medio LL LH HL HH log_size log_age industryE industryM industryT, noconstant
note: industryM omitted because of collinearity
      Source |       SS       df       MS              Number of obs =     190
-------------+------------------------------           F(  8,   182) = 1253.59
       Model |   2654.3682     8  331.796025           Prob > F      =  0.0000
    Residual |  48.1709704   182  .264675662           R-squared     =  0.9822
-------------+------------------------------           Adj R-squared =  0.9814
       Total |  2702.53917   190  14.2238904           Root MSE      =  .51447

------------------------------------------------------------------------------
  PERF_medio |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LL |   3.486898   .2548016    13.68   0.000     2.984153    3.989643
          LH |   3.675309   .2746033    13.38   0.000     3.133493    4.217124
          HL |   3.643261   .2646596    13.77   0.000     3.121066    4.165457
          HH |   3.954367   .2753474    14.36   0.000     3.411083     4.49765
    log_size |    .008443   .0400578     0.21   0.833    -.0705944    .0874803
     log_age |  -.0027773   .0393044    -0.07   0.944    -.0803283    .0747737
   industryE |  -.0706952   .0935333    -0.76   0.451    -.2552442    .1138538
   industryM |          0  (omitted)
   industryT |   .0639002   .0960992     0.66   0.507    -.1257117     .253512
------------------------------------------------------------------------------
estat vif, uncentered
Variable | VIF 1/VIF
-------------+----------------------
log_size | 42.17 0.023711
HH | 16.90 0.059170
LL | 15.21 0.065753
log_age | 12.70 0.078726
LH | 10.26 0.097499
HL | 8.73 0.114505
industryT | 2.23 0.447811
industryE | 2.21 0.451551
-------------+----------------------
Mean VIF | 13.80
Thus, SPSS excludes LL, but Stata does not. What's wrong? Why are the coefficients so different?
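One arithmetic check can be run directly on the two outputs above: re-express the Stata coefficients relative to LL, and add the LL coefficient back onto the industry dummies. The sketch below does this in Python with the numbers copied from the tables. The dictionary and variable names are mine. Reading the result as "same fit, different dummy dropped" also assumes that the four category dummies and the three industry dummies each sum to one for every firm, which the outputs suggest but do not show explicitly.

```python
# Coefficients copied from the Stata and SPSS outputs above.
stata = {"LL": 3.486898, "LH": 3.675309, "HL": 3.643261, "HH": 3.954367,
         "industryE": -0.0706952, "industryT": 0.0639002}
spss = {"LH": 0.188, "HL": 0.156, "HH": 0.467,
        "industryE": 3.416, "industryM": 3.487, "industryT": 3.551}

# Re-express the Stata fit with LL as the dropped category and industryM kept.
rebased = {
    "LH": stata["LH"] - stata["LL"],
    "HL": stata["HL"] - stata["LL"],
    "HH": stata["HH"] - stata["LL"],
    "industryM": stata["LL"],  # level for LL firms in industry M
    "industryE": stata["LL"] + stata["industryE"],
    "industryT": stata["LL"] + stata["industryT"],
}

# Largest absolute difference between the rebased Stata and the SPSS values.
max_gap = max(abs(rebased[name] - spss[name]) for name in spss)
for name in spss:
    print(f"{name:10s} rebased Stata = {rebased[name]:.3f}   SPSS = {spss[name]:.3f}")
print("largest gap:", round(max_gap, 4))
```

Every rebased Stata coefficient agrees with the SPSS value to within rounding of the three printed decimals, and the log_size and log_age coefficients already match in the raw outputs. This is consistent with the two packages fitting the same model and differing only in which redundant dummy they drop (LL in SPSS, industryM in Stata), though the check by itself does not prove it.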