So I want to make this equation for example:

y = mu + Strain + Insect + Strain*Insect + BW_final

Of all these variables, Strain and Insect are controlled variables, but BW_final is an independent variable that isn't necessarily controlled. I want my model to include Strain and Insect as variables, with BW_final as a covariate. How do I do that? This is what I have right now:

lm(yield ~ Strain + Insect + Strain:Insect + BW_final)
Thomas Bilach
  • Welcome. The terms "covariate" and "control" are often used interchangeably. Is it the interaction of `Strain:Insect` that is confusing you? Your model *is* controlling for `BW_final`. – Thomas Bilach Aug 21 '20 at 16:16
  • Thanks! I thought covariate had to be separately designated, but apparently it doesn't! Quite new to statistics hehe, had some theory on it but have never actually had to use it a lot so far. – user2296226 Aug 21 '20 at 17:14

2 Answers

“Covariate” is a term we use to discuss the role of a variable in a model, but the model doesn’t know or care what we call it. All the model knows (assuming an OLS regression, which seems safe to assume) are $\hat{\beta} =(X^TX)^{-1}X^Ty$ and the corresponding standard errors and p-values on the parameter estimates.
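To make that concrete, here is a minimal sketch in R that computes $\hat{\beta}$ by hand from the design matrix and compares it with what `lm()` reports. The data are simulated purely for illustration (the factor levels, sample size, and effect sizes are assumptions, not values from the question); the point is only that `BW_final` enters $X$ as one more column, with no special "covariate" flag.

```r
# Simulated, balanced data: every value below is made up for illustration.
set.seed(42)
d <- expand.grid(Strain = c("A", "B"), Insect = c("yes", "no"), rep = 1:10)
d$BW_final <- rnorm(nrow(d), mean = 50, sd = 5)
d$yield    <- 5 + 0.2 * d$BW_final + rnorm(nrow(d))

fit <- lm(yield ~ Strain + Insect + Strain:Insect + BW_final, data = d)

# Design matrix built by lm(): dummy columns for the factors,
# plus BW_final as an ordinary numeric column.
X <- model.matrix(fit)
y <- d$yield

# Closed-form OLS estimate (X'X)^{-1} X'y, solved without an explicit inverse.
beta_hat <- solve(crossprod(X), crossprod(X, y))

colnames(X)
all.equal(as.numeric(beta_hat), unname(coef(fit)))
```

The column names of `X` show `BW_final` sitting alongside the dummy-coded factor terms, and the hand-computed estimates agree with `coef(fit)` up to numerical tolerance.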

It’s then up to you to test the parameters that interest you. If something is a covariate but not the variable of interest, don’t test it. For example, in ANCOVA, the interest is in the categorical variable, not in the covariate. Perhaps you are interested in the effect of drug dose on a particular medical measurement, and you separate the men and women. You would test the drug dose but perhaps not care about the gender indicator variable.
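One way to test only the terms of interest in R, sketched here on simulated data (all values and factor levels are assumptions for illustration), is to compare nested models that both keep `BW_final`. The covariate is adjusted for in each fit but is never itself the subject of the test.

```r
# Simulated, balanced data: values are invented for illustration only.
set.seed(7)
d <- expand.grid(Strain = c("A", "B", "C"), Insect = c("yes", "no"), rep = 1:6)
d$BW_final <- rnorm(nrow(d), mean = 50, sd = 5)
d$yield    <- 10 + 0.3 * d$BW_final + rnorm(nrow(d))

# Both models adjust for BW_final; they differ only in the interaction.
full           <- lm(yield ~ Strain + Insect + Strain:Insect + BW_final, data = d)
no_interaction <- lm(yield ~ Strain + Insect + BW_final, data = d)

# F-test for the Strain:Insect interaction, adjusted for BW_final.
cmp <- anova(no_interaction, full)
print(cmp)
```

The second row of the table carries the F statistic and p-value for the interaction. The same pattern, dropping a different term from the reduced model, tests any other term you care about.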

There are all sorts of issues about whether you should test interactions and if control variables are worth including, but those are issues for regression modeling strategies and experimental design, subjects that are addressed in books, not SE posts.

Dave

The variables Strain, Insect, and BW_final listed inside the lm() call are your covariates. A control variable is routinely referred to as an independent variable. In fact, the terms predictor, input, control, and covariate are often used interchangeably in regression contexts. Because BW_final appears on the right-hand side of the ~, R treats it as a covariate without any further designation. Your equation is doing exactly what you want it to do.
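As a rough sketch of what that call does, the example below fits the model from the question on simulated data (the data frame `d`, its factor levels, and the effect sizes are all made up for illustration). It also shows that R's `Strain * Insect` shorthand expands to the same main effects plus interaction.

```r
# Simulated data standing in for the real experiment; nothing here
# comes from the question except the variable names and the formula.
set.seed(1)
d <- expand.grid(Strain = c("A", "B", "C"), Insect = c("yes", "no"), rep = 1:5)
d$BW_final <- rnorm(nrow(d), mean = 50, sd = 5)
d$yield    <- 10 + 0.3 * d$BW_final + rnorm(nrow(d))

# The formula from the question, with the data passed explicitly.
fit_long  <- lm(yield ~ Strain + Insect + Strain:Insect + BW_final, data = d)

# Equivalent shorthand: Strain * Insect means Strain + Insect + Strain:Insect.
fit_short <- lm(yield ~ Strain * Insect + BW_final, data = d)

summary(fit_long)                                  # BW_final has its own slope row
all.equal(coef(fit_long), coef(fit_short))         # the two fits should match
```

Strain and Insect are dummy-coded because they are factors, while BW_final, being numeric, gets a single slope. That difference in type is the only thing that distinguishes the covariate in the fitted model.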

Thomas Bilach