
I have the following two regressions: $$y=X_1\beta_1+\varepsilon_1$$ and $$y=X_2\beta_2+\varepsilon_2$$

$X_1$ and $X_2$ are indicator variables and very similar. One expert claims that the two coefficients are the same. I want to know what kind of test I can perform to see whether $\beta_1=\beta_2$.

Michael Hardy
user150086

2 Answers


If the two predictors are orthogonal, an optimal test is to compare the model

$$ y_i \sim \beta_0 + \beta_1 X_1 + \beta_2 X_2 $$

with the submodel

$$ y_i \sim \beta_0 + \beta_1 (X_1 + X_2) $$

using a likelihood ratio test. The null hypothesis is that $\beta_1 = \beta_2$.
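
For concreteness, here is a minimal sketch of that comparison in Python with `statsmodels` (the simulated data and variable names are my own assumptions, not from the question): fit the full model and the constrained submodel, then compare twice the log-likelihood difference to a $\chi^2_1$ distribution.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 500
x1 = rng.integers(0, 2, size=n)   # first indicator variable
x2 = rng.integers(0, 2, size=n)   # second indicator variable
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)  # simulated under H0: beta1 = beta2

# Full model: y ~ 1 + x1 + x2
full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# Submodel under H0 (beta1 = beta2): y ~ 1 + (x1 + x2)
sub = sm.OLS(y, sm.add_constant(x1 + x2)).fit()

# Likelihood ratio statistic; one restriction -> chi-squared with 1 df
lr = 2 * (full.llf - sub.llf)
print("LR =", lr, "p =", stats.chi2.sf(lr, df=1))
```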

  • My main problem is that the two variables are highly correlated, and I want to test whether they are different and pick one of them for my model. – user150086 Feb 21 '17 at 00:50
  • If they're highly correlated, you may just reduce them to a single variable with PCA. There's no need to do what you're asking. – SmallChess Feb 21 '17 at 01:23
  • @StudentT, Not sure there's any point in doing PCA on two binary variables. The combination can only take on four values. – whatisleverage Feb 21 '17 at 02:47
  • @user150086, if they are that closely related, the difference in effect size probably isn't large, so you will have low power to detect a statistically significant difference between their effects no matter what you do. You could just pick the one with the larger effect size, or the one that optimizes some predictive criterion. – whatisleverage Feb 21 '17 at 02:50
  • If you want to talk about a likelihood ratio test, you have to assume something about the distribution of $\varepsilon_i.$ – Michael Hardy Feb 21 '17 at 03:18
  • Yes @MichaelHardy. I assumed we were talking about the normal-errors linear model, where you could use an LRT or an F test. – whatisleverage Feb 21 '17 at 04:31

You are analyzing a single model $y=X\beta+\varepsilon$. Fitting it returns $y=\beta_0+\beta_1 X_1$, with $\beta_0$ being the intercept. To determine the difference between the two models you need to look at both the slope $\beta_1$ and the intercept $\beta_0$. You can use the indicator variable as a regressor by itself and in combination with your other independent variables. If the indicator variable is $d$, set $d=0$ for the observations in the $X_1$ group and $d=1$ for those in the $X_2$ group. The model becomes $y \sim d, X, (X*d)$, and the regression returns $y=\beta_0+\beta_1 X+\beta_2 d+\beta_3 Xd$. When $d=0$, the terms involving $\beta_2$ and $\beta_3$ drop out, so the intercept is $\beta_0$ and the slope is $\beta_1$. When $d=1$, the intercept is $(\beta_0+\beta_2)$ and the slope is $(\beta_1+\beta_3)$. Keep the $\beta$s with significant p-values and you end up with a model that tells you exactly where and how the two groups differ. See Ways of comparing linear regression intercepts and slopes?.
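
A minimal sketch of this dummy-variable setup in Python with the `statsmodels` formula API (the data and variable names here are hypothetical, for illustration only):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
d = np.repeat([0, 1], n // 2)   # indicator: 0 for the X1 group, 1 for the X2 group
x = rng.normal(size=n)          # the shared regressor
y = 1.0 + 2.0 * x + 0.3 * d + 0.1 * x * d + rng.normal(size=n)
df = pd.DataFrame({"y": y, "x": x, "d": d})

# 'y ~ x * d' expands to x + d + x:d, i.e. y = b0 + b1*x + b2*d + b3*x*d
fit = smf.ols("y ~ x * d", data=df).fit()
print(fit.summary())  # inspect the p-values on d and x:d
```

A significant coefficient on `d` indicates the intercepts differ; a significant coefficient on `x:d` indicates the slopes differ.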

LDBerriz