In my empirical research I use data from two waves of a survey: wave $i$ and wave $j$. My aim is to find out whether the magnitude of the influence of certain explanatory variables on the dependent variable differs between waves. To test this, I run a separate regression for each wave ($i$ and $j$):
\begin{align}
Y_i &= \beta_{1i}X_{1i} + \beta_{2i}X_{2i} \tag{1} \\
Y_j &= \beta_{1j}X_{1j} + \beta_{2j}X_{2j} \tag{2}
\end{align}
Next I test the significance of ($\beta_{1i} - \beta_{1j}$) and ($\beta_{2i} - \beta_{2j}$). I would like to make sure that this is the right way to test the significance of the changes in influence.
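For concreteness, here is a minimal R sketch of what I do (wave_i and wave_j are hypothetical data frames; the z-test is valid only if the two waves are independent samples):

# Separate regression per wave (wave_i and wave_j are hypothetical data frames)
fit_i <- lm(Y ~ X1 + X2, data = wave_i)
fit_j <- lm(Y ~ X1 + X2, data = wave_j)

# z-test for the difference of the X1 coefficients; this assumes the two
# waves are independent, so the covariance of the two estimates is zero
b_i  <- coef(fit_i)["X1"]
b_j  <- coef(fit_j)["X1"]
se_i <- coef(summary(fit_i))["X1", "Std. Error"]
se_j <- coef(summary(fit_j))["X1", "Std. Error"]
z <- (b_i - b_j) / sqrt(se_i^2 + se_j^2)
2 * pnorm(-abs(z))  # two-sided p-value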

- How exactly do you test the significance of the differences? Also, if you're comparing "magnitudes," then wouldn't you want to look at the differences of the absolute values of the coefficients? – whuber Jul 18 '18 at 16:45
- I use a Wald test. The focus is on the significance of the difference of betas, so the absolute value is just a matter of semantics. – yarela Jul 19 '18 at 05:36
- Well, that's important semantics, because it changes the question. I understand a Wald test to be part of a Maximum Likelihood procedure, but you describe *two* separate procedures, which leaves one wondering exactly what you might mean by a Wald test in this context. This would lead us to suppose you conceive of your two regressions as being one ML procedure, but that's possible only by assuming the data in the waves are independent. Is that really the case? – whuber Jul 19 '18 at 12:35
4 Answers
This is straightforward using SEM software, like lavaan, to run two simultaneous regression models with and without constraints. You can use a Wald test to test whether an individual parameter (e.g., the difference in coefficients) differs from 0, or you can do a likelihood ratio test to compare the constrained and unconstrained models.
The unconstrained model:
library(lavaan)

# Unconstrained model: label each coefficient and define the differences
model1 <- '
Yi ~ ai*X1i + bi*X2i
Yj ~ aj*X1j + bj*X2j
Yi ~~ Yj            # let the residuals of the two outcomes correlate
adiff := aj - ai    # difference in the X1 coefficients
bdiff := bj - bi    # difference in the X2 coefficients
'
fit1 <- sem(model1, data = data)
summary(fit1)
The tests for adiff and bdiff are tests of whether the individual parameter estimates differ between the two models. These tests allow the residual variance of each regression to differ. Note that you need to allow the residual variances to correlate, since they probably do if they are of the same outcome at different times.
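If you want a single joint test of both differences at once, lavaan's lavTestWald() can test the equality constraints directly on the unconstrained fit (a sketch reusing the labels defined in model1):

# Joint Wald test that both coefficients are equal across waves
lavTestWald(fit1, constraints = "ai == aj ; bi == bj")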
Next, the constrained model, which holds the coefficients equal between the two models:
# Constrained model: shared labels a and b force the coefficients to be equal
model2 <- '
Yi ~ a*X1i + b*X2i
Yj ~ a*X1j + b*X2j
Yi ~~ Yj
'
fit2 <- sem(model2, data = data)
summary(fit2)
Finally, compare the fit of the two models using a likelihood ratio test.
lavTestLRT(fit2, fit1)
If significant, there is evidence that at least one of the parameters differs between the models. Otherwise, there is not enough evidence that the parameters differ; unconstraining the coefficients does not add much to the fit of the model.

- Thank you! I did use a Wald test. However, a reviewer of the paper recommended using a single regression with an indicator for waves (as mentioned below), so I wanted to make sure that my way is fine. – yarela Jul 18 '18 at 18:58
- There are a few problems with that method. The first is that you wouldn't expect the residual variances to be the same across waves, so you need to account for that. The second is that within person, you would expect outcomes to be correlated, so you need to adjust for that too. If you start using fixed or random effects you end up estimating more parameters than you need and moving further from your original model. I think a Wald test from two simultaneous regression models is your best bet. – Noah Jul 18 '18 at 19:45
Having seen the OP's comment on a response stating a reviewer's preference for regression with dummies, I demonstrate that approach below. Essentially, I repeat what other users have said about using dummies and interactions to control predictor entry in the model. However, we can use generalized least squares, or any multilevel software that allows a model for the variances, to account for the potentially heteroskedastic nature of the responses from different waves. This approach is equivalent to using SEM software.
library(lavaan) # For comparison to regression only
library(nlme)   # For GLS

# Data prep (the example data come from the clubSandwich package)
dat <- clubSandwich::MortalityRates[, c(
  "year", "state", "mrate", "cause", "totpercap", "spiritpercap")]
dat <- dat[dat$cause == "Motor Vehicle" & dat$year %in% c(1970, 1980), ]
dat <- dat[, c("year", "state", "mrate", "totpercap", "spiritpercap")]
head(dat)
year state mrate totpercap spiritpercap
2 1970 1 62.18053 1.38 0.70
42 1980 1 42.21172 1.86 0.74
110 1970 2 61.80470 3.62 2.07
150 1980 2 30.05453 3.76 1.70
218 1970 4 84.74664 2.73 1.05
258 1980 4 68.25127 3.03 0.98
We now have data from each state in 1970 and 1980. mrate will serve as the response variable, with totpercap and spiritpercap as the predictors.
dat$year.f <- factor(dat$year) # Convert year to a factor so it enters the model as dummies
Now we can run our regression models. First, a model where all predictors make year-specific predictions. We use a general correlation structure (corSymm) to permit the responses to correlate, since two responses from the same state in different years are not independent. We permit the response to have different variances by year using the varIdent argument:
(m0 <- gls(
mrate ~ 0 + year.f + year.f:totpercap + year.f:spiritpercap, dat,
corSymm(form = ~ 1 | state), varIdent(form = ~ 1 | year.f),
method = "ML"))
Generalized least squares fit by maximum likelihood
Model: mrate ~ 0 + year.f + year.f:totpercap + year.f:spiritpercap
Data: dat
Log-likelihood: -435.8944
Coefficients:
year.f1970 year.f1980 year.f1970:totpercap year.f1980:totpercap
67.5682105 47.8940439 -0.9308490 12.9756910
year.f1970:spiritpercap year.f1980:spiritpercap
0.9838328 -21.1029073
Correlation Structure: General
Formula: ~1 | state
Parameter estimate(s):
Correlation:
1
2 0.822
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | year.f
Parameter estimates:
1970 1980
1.0000000 0.8592329
Degrees of freedom: 102 total; 96 residual
Residual standard error: 24.84121
The predictors enter the model as interactions with the year dummies. The reported residual standard error is the value for 1970; we multiply it by 0.859 to obtain the residual standard error for 1980.
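To make that back-transformation explicit, it is just arithmetic on the printed output:

sigma_1970 <- 24.84121                # reported residual standard error (1970 stratum)
sigma_1980 <- sigma_1970 * 0.8592329  # about 21.34, the residual SE for 1980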
Next, a model constraining the totpercap coefficient to be the same across waves by dropping its interaction with year:
coef(mtot <- gls(
mrate ~ 0 + year.f + totpercap + year.f:spiritpercap, dat,
corSymm(form = ~ 1 | state), varIdent(form = ~ 1 | year.f),
method = "ML"))
year.f1970 year.f1980 totpercap year.f1970:spiritpercap
56.38870 49.51440 10.64116 -14.55980
year.f1980:spiritpercap
-16.62852
Similarly for spiritpercap:
mspi <- gls(
mrate ~ 0 + year.f + year.f:totpercap + spiritpercap, dat,
corSymm(form = ~ 1 | state), varIdent(form = ~ 1 | year.f),
method = "ML")
Finally, a model where all coefficients are constrained to be the same across years:
coef(mtotspi <- gls(
mrate ~ 0 + year.f + totpercap + spiritpercap, dat,
corSymm(form = ~ 1 | state), varIdent(form = ~ 1 | year.f),
method = "ML"))
year.f1970 year.f1980 totpercap spiritpercap
59.240021 51.043122 7.910692 -11.070475
We can then ask whether we should constrain totpercap, using a model comparison approach:
anova(m0, mtot)
Model df AIC BIC logLik Test L.Ratio p-value
m0 1 9 889.7887 913.4135 -435.8944
mtot 2 8 890.4969 911.4967 -437.2484 1 vs 2 2.708178 0.0998
This permits comparison using information criteria (AIC/BIC) and likelihood ratio tests. For what it's worth, an analysis like this is preferably done with larger sample sizes.
We can do the same for spiritpercap using anova(m0, mspi), and we can ask whether it is okay to constrain both using anova(m0, mtotspi).
This is how a regression-based approach would proceed. It is exactly equivalent to the SEM-based approach. I will run the equivalent SEM models and simply report the log likelihoods for comparison. First, transform the data from long to wide format for SEM:
dat.for.lavaan <- reshape(
dat, idvar = "state", timevar = "year", direction = "wide")
And run the freest SEM model; meanstructure = TRUE is required to estimate the year intercepts, as in the regression:
logLik(
sem("
mrate.1970 ~ totpercap.1970 + spiritpercap.1970
mrate.1980 ~ totpercap.1980 + spiritpercap.1980
mrate.1970 ~~ mrate.1980", dat.for.lavaan, meanstructure = TRUE)
)
'log Lik.' -435.8944 (df=9)
logLik(m0) # equivalent multilevel model
'log Lik.' -435.8944 (df=9)
And all constrained:
logLik(
sem("
mrate.1970 ~ a * totpercap.1970 + b * spiritpercap.1970
mrate.1980 ~ a * totpercap.1980 + b * spiritpercap.1980
mrate.1970 ~~ mrate.1980", dat.for.lavaan, meanstructure = TRUE)
)
'log Lik.' -437.3729 (df=7)
logLik(mtotspi) # equivalent multilevel model
'log Lik.' -437.3729 (df=7)
So the SEM approach and the regression approach are equivalent, and you can compare parameter estimates across them. When estimates differ, it is usually a matter of needing to standardize one of them. For example, the SEM residual covariance between the 1970 and 1980 outcomes, once standardized, is the same as the GLS estimate of the correlation between the responses.
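As a quick check of that last point, here is a sketch that refits the free SEM model, stores the fitted object, and pulls out the standardized residual covariance with lavaan's standardizedSolution(); it should match the GLS correlation estimate of 0.822 printed above:

# Refit the free SEM model and keep the fitted object this time
fit.free <- sem("
  mrate.1970 ~ totpercap.1970 + spiritpercap.1970
  mrate.1980 ~ totpercap.1980 + spiritpercap.1980
  mrate.1970 ~~ mrate.1980", dat.for.lavaan, meanstructure = TRUE)

# The standardized residual covariance should equal the GLS correlation (0.822)
std <- standardizedSolution(fit.free)
subset(std, op == "~~" & lhs == "mrate.1970" & rhs == "mrate.1980")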

You may run a single regression with an indicator variable (say $Z$, equal to 1 for wave 1 and 0 for wave 2) and its interactions with the $X$'s.
Your model now becomes
$$Y=\beta_1X_1+\beta_2X_2+\beta_3Z+\gamma_1X_1Z+\gamma_2X_2Z$$
Significance of $\gamma_i$ now means that the effect of $X_i$ on $Y$ differs significantly between wave 1 and wave 2.
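A minimal sketch of this pooled model in R, assuming a hypothetical stacked data frame pooled with columns Y, X1, X2, and Z (note that lm() adds an intercept by default, and, as a comment below points out, you would likely want heteroskedasticity-robust standard errors):

# Pooled regression: the X:Z interaction terms are the gamma coefficients
fit <- lm(Y ~ X1 + X2 + Z + X1:Z + X2:Z, data = pooled)

# Robust (sandwich) standard errors allow for different residual
# variances in the two waves
library(lmtest)
library(sandwich)
coeftest(fit, vcov. = vcovHC(fit, type = "HC3"))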

- Note that you wouldn't necessarily expect the residual variance to be the same in each wave, so you would need to correct for heteroscedasticity with a robust standard error. – Noah Jul 18 '18 at 18:18
- To be honest, my first idea was to calculate 95% CIs for the $\beta$'s in two separate regressions (one for each wave) and see if they overlap. – Łukasz Deryło Jul 18 '18 at 18:26
- In general that is not a good way of testing whether two parameters differ from each other. It can be an okay heuristic, but strictly speaking it is not a valid test. – Noah Jul 18 '18 at 18:35
- I know that, but it is a hands-on method that overcomes the heteroscedasticity problem you mentioned. – Łukasz Deryło Jul 18 '18 at 18:37
- This heuristic was analyzed a while back at https://stats.stackexchange.com/questions/18215. For instance, the lack of overlap of two independent two-sided symmetric 95% CIs amounts to a rejection of the null (of equal parameter values) at approximately the 0.5% level, not the 5% level. – whuber Jul 19 '18 at 02:17
The usual interpretation would be that a one-unit increase in $X_1$ leads to a $\beta_1$ increase in $Y$. You could extend this to describe the difference in the impact of $X_1$ on $Y$ across the two waves, assuming the explanatory and target variables remain the same. However, I do not understand how you could calculate the significance of this difference from the two regressions alone.
