Joint model with interaction terms vs. separate regressions for a group comparison

Question

After gathering valuable feedback from previous questions and discussions, I have come up with the following question: Suppose that the goal is to detect effect differences across two groups, male vs. female for example. There are two ways to do it:

1. Run two separate regressions, one per group, and use a Wald test of the null hypothesis $H_0$: $b_1-b_2=0$, where $b_1$ is the coefficient of one IV in the male regression and $b_2$ is the coefficient of the same IV in the female regression.
2. Pool the two groups together and run a joint model that includes a gender dummy and an interaction term (IV*genderdummy). The group difference is then judged by the sign of the interaction coefficient and its t-test for significance.

What if $H_0$ is rejected in case (1), i.e. the group difference is significant, but the coefficient of the interaction term in case (2) is statistically insignificant, i.e. the group difference is insignificant? Or vice versa: $H_0$ is not rejected in case (1), and the interaction term is significant in case (2). I have ended up with this outcome several times, and I was wondering which outcome is more reliable and what the reason behind this contradiction is.

The difference between the procedures is that the pooled analysis assumes the same variance for both groups, while the separate analysis assumes different variances. – probabilityislogic, Aug 05 '12 at 05:17
Thanks a lot! Are you aware of any reference discussing the issue of variances when comparing different models? – Bill718, Aug 05 '12 at 14:03
Related: https://stats.stackexchange.com/questions/373890/separate-models-vs-flags-in-the-same-model/373909#373909 – kjetil b halvorsen, Mar 13 '21 at 01:03
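
The variance point raised in the comments can be illustrated with a minimal sketch. It assumes a single IV, simulated data, and made-up slopes and error standard deviations (all hypothetical, only the group sizes 3000 and 500 come from this thread). With one IV, a dummy, and their interaction, the pooled model reproduces the two separate slopes exactly, so the interaction coefficient equals the slope difference; only the standard error differs, because the pooled fit uses one common residual variance.

```python
import math
import random


def fit_line(x, y):
    """Simple OLS of y on x. Returns slope, residual sum of squares, Sxx, n."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, sse, sxx, n


def simulate(n, slope, sigma, rng):
    """Hypothetical data: y = 1 + slope * x + normal noise with sd sigma."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.0 + slope * xi + rng.gauss(0, sigma) for xi in x]
    return x, y


rng = random.Random(0)
xm, ym = simulate(3000, 0.50, 1.0, rng)  # "male" group, assumed small error variance
xf, yf = simulate(500, 0.65, 3.0, rng)   # "female" group, assumed large error variance

b_m, sse_m, sxx_m, n_m = fit_line(xm, ym)
b_f, sse_f, sxx_f, n_f = fit_line(xf, yf)

# Slope difference; identical to the interaction coefficient of the pooled model
diff = b_f - b_m

# (1) Separate regressions: each group keeps its own residual variance
se_sep = math.sqrt(sse_m / (n_m - 2) / sxx_m + sse_f / (n_f - 2) / sxx_f)

# (2) Pooled model with dummy + interaction: one common residual variance
s2_pool = (sse_m + sse_f) / (n_m + n_f - 4)
se_pool = math.sqrt(s2_pool * (1 / sxx_m + 1 / sxx_f))

print("slope difference:", diff)
print("test statistic, separate regressions:", diff / se_sep)
print("test statistic, pooled interaction  :", diff / se_pool)
```

In this setup the noisier group is also the smaller one, so the pooled standard error comes out smaller than the separate-regression one, and the same slope difference looks more significant in the pooled model. Reversing the assumed variances would push the comparison the other way.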

score 7 · Answer 1 · answered Aug 05 '12 at 04:05

The first approach (separate regressions) fully interacts gender with every other covariate in the model: essentially, the effect of each covariate (b2, b3, ..., bn) is allowed to differ by gender. In the second approach, gender is interacted only with your IV. So, assuming you have more covariates than just the IV and gender, this may drive somewhat different results.
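
Written out, the two specifications look roughly as follows. The notation is introduced here for illustration only and is not from the question: $x_1$ is the IV of interest, $x_2, \dots, x_n$ are the other covariates, $D$ is the gender dummy, and $\beta$, $\delta$ are coefficients.

Separate regressions are equivalent, in their point estimates, to a fully interacted pooled model:

$$y = \beta_0 + \delta_0 D + \sum_{k=1}^{n} \beta_k x_k + \sum_{k=1}^{n} \delta_k D x_k + e$$

The joint model described in the question keeps only the interaction with the IV, i.e. it imposes $\delta_2 = \dots = \delta_n = 0$:

$$y = \beta_0 + \delta_0 D + \sum_{k=1}^{n} \beta_k x_k + \delta_1 D x_1 + e$$

In both cases the group difference for the IV is $\delta_1$, but the estimate of $\delta_1$ generally changes once the restrictions on $\delta_2, \dots, \delta_n$ are imposed.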

If you have just the two covariates, there are documented cases where differences in how the Wald test and the likelihood ratio test are computed lead to different answers (see Wikipedia for more).

In my own experience, I try to be guided by theory. If there is a dominant theory that suggests gender would interact with only the IV, but not the other covariates, I would go with the partial interaction.

answered Aug 05 '12 at 04:05

mCorey


Thanks! Yes, actually there are various covariates, not only one IV, I just mentioned one IV in the question for simplicity. The thing is that there isn't a strong theory that could support interaction between gender and certain covariates, it is exploratory analysis, so I need to experiment with many interactions and model fits; the initial model contains 30 predictors... – Bill718 Aug 05 '12 at 13:11
@Bill718 Also the separate models will have a different intercept, while the single model will not, unless you specify gender alone as an additional IV (not just as an interaction). – Robert Kubrick Mar 26 '15 at 17:53

score 5 · Answer 2 · answered Aug 05 '12 at 04:36

Any time two different procedures are used to test a particular hypothesis, there will be different p-values. To say one is significant and the other is not can be just making a black-and-white decision at the 0.05 level. If one test gives a p-value of 0.03 and the other, say, 0.07, I would not call the results contradictory. If you are going to be that strict in thinking about significance, it is easy to have either situation (i) or (ii) arise when borderline significance is the case.
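
To see how close 0.03 and 0.07 really are, here is a small sketch that converts two-sided p-values back into test statistics. It assumes both tests are referred to a standard normal distribution, which is a simplification; the helper name `z_from_two_sided_p` is made up for this example.

```python
from statistics import NormalDist


def z_from_two_sided_p(p):
    """|z| whose two-sided standard normal p-value equals p."""
    return NormalDist().inv_cdf(1 - p / 2)


z_a = z_from_two_sided_p(0.03)     # roughly 2.17
z_b = z_from_two_sided_p(0.07)     # roughly 1.81
z_crit = z_from_two_sided_p(0.05)  # the usual 1.96 cutoff

print("z for p = 0.03:", round(z_a, 3))
print("z for p = 0.07:", round(z_b, 3))
print("gap between the two statistics:", round(z_a - z_b, 3))
```

The two statistics sit on either side of the 0.05 cutoff but differ by only about a third of a standard error, so a modest change in the estimated standard error is enough to move a result across the line.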

As I mentioned in response to the previous question, my preference when looking for an interaction is to do one combined regression.

answered Aug 05 '12 at 04:36

Michael R. Chernick


Yes, it is true that the combined regression seems to perform better, at least in my case, and it is a very flexible method, since one can try different interactions and model fits. I just wanted, out of "statistical" curiosity let's say, to find out the reason behind the somewhat different results. Regarding p-values, I have heard of some people accepting significance only at the a=0.5% level or less. I am more flexible, using the a=1% level, but the big headache comes when the p-values are completely different. – Bill718 Aug 05 '12 at 13:39
I have seen studies, for example, where one IV is very significant when an ordered logit model is employed, while the same IV becomes insignificant when OLS is applied. So, in that case, the explanation of the results can be a bit tricky. Thanks a lot for your comments and feedback! – Bill718 Aug 05 '12 at 13:40
+1, the point about $0.07\approx 0.03$ is an excellent one. – gung - Reinstate Monica May 27 '13 at 16:03

JDav · Answer 3 · 2012-08-05T12:33:08.780

2

In the second case, standard software will report a t-statistic with p-values from Student's t distribution, whereas in the first case there are two options for the Wald test. Under the assumption of normal errors, the Wald statistic follows an exact F (Fisher) distribution (which is equivalent to the t-stat, as that also assumes normal errors). Under asymptotic normality, the Wald statistic follows a chi-square distribution (which is analogous to a t-stat that asymptotically follows a normal distribution). Which distribution are you assuming? Depending on this, your p-values may give you different results.

In textbooks you will find that for two-sided tests of a single parameter, the Student's t and F statistics are equivalent ($F = t^2$).

If your sample is not large, then comparing chi-square and t-stat p-values will certainly yield different results. In that case, assuming an asymptotic distribution would not be reasonable. If your sample is rather small, then assuming normality seems more reasonable, which implies t-stat and F p-values for cases 2 and 1 respectively.
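
The size of this distributional effect can be checked with a rough sketch. It assumes a hypothetical test statistic of t = 2.0 and compares the Student's t p-value with the chi-square(1) p-value of the Wald statistic $W = t^2$, first with 10 degrees of freedom and then with 3496 (3500 pooled observations minus 4 parameters, an assumed setup). The t distribution is integrated numerically because the Python standard library has no t CDF; the function names are made up for this example.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=20000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


def chi2_1_p(wald):
    """P(chi-square with 1 df > wald), same as the two-sided normal p-value."""
    return 2 * (1 - NormalDist().cdf(math.sqrt(wald)))


t_stat = 2.0            # hypothetical test statistic
wald = t_stat ** 2      # Wald statistic for a single restriction

p_chi2 = chi2_1_p(wald)
p_t_small = t_two_sided_p(t_stat, df=10)
p_t_large = t_two_sided_p(t_stat, df=3496)

print("chi-square(1) p-value:", round(p_chi2, 4))
print("t p-value, df = 10   :", round(p_t_small, 4))
print("t p-value, df = 3496 :", round(p_t_large, 4))
```

With 10 degrees of freedom the two p-values land on opposite sides of 0.05, while with a few thousand observations they are practically identical, so the choice of reference distribution only matters in small samples.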

edited Aug 05 '12 at 12:33

answered Aug 05 '12 at 11:34

JDav


Indeed, I have two samples of unequal size: the first has 3000 observations, but the second is relatively small, with 500 observations. And the software reports chi-square when computing Wald statistics. So, it seems that this is the reason for the discrepancy. Both samples are normally distributed though, especially in the case of the large sample. Many thanks! – Bill718 Aug 05 '12 at 13:18
I'm sorry to disappoint you, but unequal subsample sizes are not an issue. Moreover, yours looks like a large sample to me, so both procedures should yield similar results. I noticed that @probabilityislogic made a good point. Using one pooled sample implies equal residual variances, so that may be a source of heterogeneity. I don't know how you are implementing the separate regression procedure, but it's easy to make mistakes if you are calculating the stat yourself. This makes the pooled regression a safe, straightforward approach. – JDav Aug 05 '12 at 13:37
To address the issue of unequal variances across groups (heteroskedasticity), try a White (aka Newey-West, sandwich, or robust if you use Stata) variance estimator. This approach corrects for unknown types of heteroskedasticity. – JDav Aug 05 '12 at 13:43
Oh, ok, I see, actually the observations in the sample come from different regions of a country, so it is very possible I guess that heterogeneity issues exist! – Bill718 Aug 05 '12 at 14:01
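
The robust-variance suggestion in the comments above can be sketched by hand for the simplest case. This assumes a single IV, a pooled model with dummy and interaction, simulated data with made-up parameters, and the basic HC0 form of the White estimator (software defaults often apply a small-sample correction on top of this). For this fully interacted model the HC0 variance of the interaction coefficient is the sum of the two groups' HC0 slope variances, which is what the code uses instead of a library call.

```python
import math
import random


def simulate(n, slope, sigma, rng):
    """Hypothetical data: y = 1 + slope * x + normal noise with sd sigma."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.0 + slope * xi + rng.gauss(0, sigma) for xi in x]
    return x, y


def group_fit(x, y):
    """Simple OLS; returns slope, SSE, Sxx and the HC0 variance of the slope."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    # White/HC0: sum of (centered x * residual)^2 divided by Sxx^2
    var_hc0 = sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2
    return slope, sse, sxx, var_hc0


rng = random.Random(1)
xm, ym = simulate(3000, 0.50, 1.0, rng)  # assumed low-variance group
xf, yf = simulate(500, 0.65, 3.0, rng)   # assumed high-variance group

b_m, sse_m, sxx_m, hc0_m = group_fit(xm, ym)
b_f, sse_f, sxx_f, hc0_f = group_fit(xf, yf)
n_m, n_f = len(xm), len(xf)

# Pooled model, classical SE of the interaction (one common residual variance)
s2_pool = (sse_m + sse_f) / (n_m + n_f - 4)
se_classical = math.sqrt(s2_pool * (1 / sxx_m + 1 / sxx_f))

# Pooled model, White/HC0 robust SE of the interaction
se_robust = math.sqrt(hc0_m + hc0_f)

# Separate regressions, each with its own residual variance
se_separate = math.sqrt(sse_m / (n_m - 2) / sxx_m + sse_f / (n_f - 2) / sxx_f)

print("pooled, classical SE:", se_classical)
print("pooled, robust SE   :", se_robust)
print("separate regressions:", se_separate)
```

Under these assumptions the robust standard error from the pooled model lands close to the separate-regression one, which is why a robust variance estimator tends to make the two procedures agree when the groups have different error variances.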
