
I have one continuous dependent variable, two categorical independent variables, and one continuous covariate. How can I deal with a violation of homogeneity of regression coefficients, which is an ANCOVA assumption? Do I also need to check this assumption when I enter the interaction between the two independent variables?

Rose Hartman
jack
  • Do you perhaps mean the [assumption of homogeneity of regression coefficients](https://en.wikipedia.org/wiki/Analysis_of_covariance#Assumption_5:_homogeneity_of_regression_slopes)? That one is specific to ANCOVAs (and the interaction you mention is relevant to testing for violations of it), unlike the assumption of [homogeneity of variance](https://stats.stackexchange.com/questions/81914/why-is-homogeneity-of-variance-so-important), which applies to most general linear models. – Rose Hartman May 25 '17 at 03:20
  • @Rose Hartman, thanks for the comment. I mean the interaction between the two independent variables. Based on my understanding, when we enter the interaction of the IVs into the model, having parallel slopes would be meaningless. – jack May 25 '17 at 13:41
  • If you include an interaction then you're no longer running an ANCOVA, but you're still running a meaningful model. It would just be called "multiple regression" instead. Heteroscedasticity refers to the **variance** of the residuals; it has nothing to do with the presence or absence of the interaction. I think you may have asked a different question than you intended (in which case the solution by @MarkWhite isn't relevant to your problem). – Rose Hartman May 25 '17 at 23:25
  • @Rose Hartman, I read several references, such as *Using Multivariate Statistics* by Barbara G. Tabachnick, and according to them analysis of covariance is an extension of analysis of variance where the main effects and interactions are assessed after the effects of some other concomitant variable have been removed. – jack May 26 '17 at 03:01
  • @Rose Hartman, so in all these references, an interaction between two independent variables has been added to a single ANCOVA model. If an ANCOVA model with an interaction is not an ANCOVA anymore, why do all these references still call it ANCOVA? – jack May 26 '17 at 03:05
  • Technically, an ANCOVA is an ANOVA with one additional continuous predictor added (the "covariate"). If an interaction between the covariate and a categorical predictor is also added, it's no longer called an ANCOVA but it's still a reasonable model. Some people are sloppy about the terminology. A model with interactions is also often run, and that model compared to the ANCOVA to test the assumption of homogeneity of regression coeffs, but technically the one including the interactions isn't an ANCOVA. – Rose Hartman May 26 '17 at 18:38
  • Also note that a factorial ANCOVA has two or more categorical predictors and typically includes interaction(s) between them. That's still a kind of ANCOVA. If there are interaction(s) between categorical predictor(s) and the covariate, then it's not called an ANCOVA any more. – Rose Hartman May 26 '17 at 18:39
  • Can you please respond to whether or not you intended to ask about heteroscedasticity? If not, your question needs to be edited and @MarkWhite may wish to edit his answer. – Rose Hartman May 26 '17 at 18:41
  • @Rose Hartman, you mean the interaction between a factor and a covariate, not an interaction between two factors, right? If there is an interaction between a factor and a covariate we cannot use ANCOVA, but when there is an interaction between two factors, this interaction needs to be included in the ANCOVA model, right? My question was about heteroscedasticity of slopes. – jack May 27 '17 at 23:33
  • @Rose Hartman, in ANCOVA the regression slopes need to be parallel, which means there is no interaction between a factor and a covariate. If there is an interaction between a factor and a covariate, then the assumption is violated, so the equality of slopes is an important assumption to check. My question is: when this assumption is violated and using ANCOVA is no longer appropriate, what would be the next step, and what kind of model can be used instead? – jack May 27 '17 at 23:33
  • Please review the definition of [heteroscedasticity](https://en.wikipedia.org/wiki/Heteroscedasticity). There is no such thing as "heteroscedasticity of slopes". I've edited your question to better reflect what I think you're asking --- please revise if I've misunderstood. – Rose Hartman May 29 '17 at 02:23

2 Answers

  1. Ditch the ANCOVA and fit your model in a regression object.

  2. How bad is the heteroscedasticity? Could you include a scatterplot of fitted by residual values? Standard regression may be robust to the heteroscedasticity you have. Your heteroscedasticity could also be coming from a quadratic or cubic trend in your data—it's hard to tell without seeing the plot.

  3. The standard approach is to use robust standard errors, such as the Huber-White standard error. You can fit your model using the lm() function in R. There are good tutorials on getting standard errors that are robust to heteroskedasticity here and here.

  4. Another common approach is weighted least squares (WLS) regression instead of the standard ordinary least squares (OLS). You can set weights using the `weights` argument in the lm() function in R. That approach is discussed here.
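
As a rough illustration of points 3 and 4, here is a minimal base-R sketch. The data are simulated and the variable names (`my_data`, `outcome`, `covariate`, `factor`) are placeholders, not the asker's data. The Huber-White (HC0) covariance is computed by hand so that no extra packages are needed; in practice `vcovHC()` from the sandwich package is a common way to get the same thing. The WLS weights assume the error variance grows with the square of the covariate, which is true here only because the data were simulated that way.

```r
# Simulated data (hypothetical): the error spread grows with the covariate.
set.seed(1)
n <- 200
my_data <- data.frame(
  covariate = runif(n, 1, 10),
  factor    = factor(sample(c("a", "b"), n, replace = TRUE))
)
my_data$outcome <- 2 + 0.5 * my_data$covariate +
  1.5 * (my_data$factor == "b") +
  rnorm(n, sd = 0.3 * my_data$covariate)

# Ordinary least squares fit (the "ANCOVA" as a regression).
ols <- lm(outcome ~ covariate + factor, data = my_data)

# Huber-White (HC0) sandwich covariance, computed by hand:
# (X'X)^-1 [sum of e_i^2 x_i x_i'] (X'X)^-1
X <- model.matrix(ols)
e <- residuals(ols)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)   # each row of X is scaled by its residual
vcov_hc0  <- bread %*% meat %*% bread
robust_se <- sqrt(diag(vcov_hc0))

# Classical and robust standard errors side by side.
print(cbind(classical = sqrt(diag(vcov(ols))), robust = robust_se))

# WLS: weights proportional to the inverse of the assumed error variance.
wls <- lm(outcome ~ covariate + factor, data = my_data,
          weights = 1 / covariate^2)
print(coef(wls))
```

The coefficients from `ols` are unchanged by the robust correction; only the standard errors (and hence the tests) differ. WLS, by contrast, changes the estimates themselves, so it depends more heavily on getting the variance model roughly right.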

Mark White
  • +1 Although I don't think there's any reason to rule out an ANCOVA because of heteroscedasticity, is there? – Rose Hartman May 25 '17 at 03:22
  • I'm not an ANOVA person. I find regression (although it's obviously all part of the general linear model) to be more intuitive and flexible with things like this. Are there robust methods for ANCOVA? – Mark White May 25 '17 at 03:23
  • Yep, definitely. Running it in R, you'd fit it in `lm` just like any regression model, so all of the tools you mention in your answer are available. There's nothing special about ANCOVAs after all --- it's just a regression with one continuous predictor and one categorical and no interaction. I'm not sure why it deserves its own special name, but apparently it does. :) – Rose Hartman May 25 '17 at 03:25
  • @Mark White, thanks so much for the comment. So your suggestion is to use WLS or multivariate regression instead of ANCOVA. Am I right? – jack May 25 '17 at 13:52
  • Yes. They are technically the same thing (as @RoseHartman points out), but depending on the statistical software, they are carried out and perhaps calculated in very different ways (ahem... SPSS) – Mark White May 25 '17 at 13:55
  • Yeah, SPSS has a lot to answer for. The transparency of `lm` vs. the convoluted SPSS menu structure is the #1 reason (in a long list of reasons) that I prefer R for teaching stats. – Rose Hartman May 25 '17 at 23:23
  • Just a heads up that it appears OP didn't mean to ask about heteroskedasticity at all, but rather to ask about the assumption of homogeneity of regression coeffs. – Rose Hartman May 29 '17 at 02:20

An analysis of covariance (ANCOVA) is a model with one continuous covariate and one or more categorical predictors (if there is more than one, then it's a factorial ANCOVA). It is a kind of multiple regression model.

In order to test whether an ANCOVA model is appropriate for your data, one thing you need to check is that the effect of the covariate is the same in each group --- in other words, that an interaction isn't needed to allow differences in the effect of the covariate for different levels of the categorical predictor(s). If that assumption is violated, then you should include the interaction in the model. At that point, it's not called an ANCOVA any more, but it's still a multiple regression model --- so you would just call it multiple regression. Multiple regression is any linear model with a single continuous outcome and more than one predictor (if there's only one, it's called "simple regression"). As I mentioned, ANCOVAs (and ANOVAs, for that matter) are just some common kinds of multiple regression models that get their own names.
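
In equation form, the difference can be sketched as follows for the simplest case of a single two-level factor. The notation is an assumption introduced here for illustration: $y_i$ is the outcome, $x_i$ the covariate, and $g_i$ a 0/1 dummy for the factor.

$$
\begin{aligned}
\text{ANCOVA:} \quad & y_i = \beta_0 + \beta_1 x_i + \beta_2 g_i + \varepsilon_i \\
\text{With interaction:} \quad & y_i = \beta_0 + \beta_1 x_i + \beta_2 g_i + \beta_3 x_i g_i + \varepsilon_i
\end{aligned}
$$

In the second model the slope of the covariate is $\beta_1$ when $g_i = 0$ and $\beta_1 + \beta_3$ when $g_i = 1$, so homogeneity of regression coefficients amounts to $\beta_3 = 0$. With more factor levels or more factors there are several such interaction coefficients, and the assumption is that all of them are zero.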

If you run this in R, you'll use the same function to estimate the model whether or not it includes the interaction (i.e. whether it can be called an ANCOVA, or just "multiple regression").

Here's the ANCOVA version of the model:

    ancova <- lm(outcome ~ covariate + factor, data = my_data)

And here it is with the interaction added, making it not an ANCOVA any more:

    mul_reg <- lm(outcome ~ covariate * factor, data = my_data)

To test whether the interaction is significant, you can test the two models against each other (this is handy if you have more than 2 levels in your categorical predictor(s), since they would then be represented in more than one regression coefficient):

    anova(mul_reg, ancova)

If there is a significant improvement in model fit ($R^2$) when you allow the interaction, then you can say the assumption of homogeneity of regression coefficients is violated. At that point, you would use the `mul_reg` model instead of the `ancova` model to interpret your effects, and you would describe your model as "multiple regression" rather than "an analysis of covariance".
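
For the design in the question (two factors plus a covariate), the same comparison might look like the sketch below. Everything here is simulated and the names `factor_a` and `factor_b` are hypothetical. The covariate slope is deliberately made to differ between the levels of `factor_a`, so the assumption is violated by construction. Note that the factor-by-factor interaction sits in both models; only the covariate-by-factor terms are being tested.

```r
# Simulated data (hypothetical names and effect sizes).
set.seed(42)
n <- 240
my_data <- data.frame(
  covariate = rnorm(n),
  factor_a  = factor(sample(c("a1", "a2"), n, replace = TRUE)),
  factor_b  = factor(sample(c("b1", "b2", "b3"), n, replace = TRUE))
)

# Covariate slope is 1 in a1 and 2 in a2: unequal slopes by construction.
slope <- ifelse(my_data$factor_a == "a2", 2, 1)
my_data$outcome <- slope * my_data$covariate +
  0.5 * (my_data$factor_b == "b2") + rnorm(n)

# Factorial ANCOVA: the two factors interact, the covariate has one common slope.
ancova <- lm(outcome ~ covariate + factor_a * factor_b, data = my_data)

# Same model, but the covariate slope is allowed to differ in every cell.
mul_reg <- lm(outcome ~ covariate * factor_a * factor_b, data = my_data)

# F test of all covariate-by-factor terms at once.
comparison <- anova(ancova, mul_reg)
print(comparison)
p_value <- comparison$`Pr(>F)`[2]
```

A small `p_value` here suggests the slopes are not homogeneous, in which case the effects would be interpreted from `mul_reg`, for example by reporting the covariate slope separately within each group.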

Rose Hartman