Reduce subsetting of the dataset?

Question

I collected a lot of trapping data of a certain rodent species. I constructed a model to see what affects the individual's activity. I constructed this LM (linear model):

Activity ~ Sex+Weight+Exploration+Totaltimestrapped

But this dataset has all the individuals in it. I want to know whether weight, for example, is correlated with activity within males, females, adult males, adult females, juvenile males and juvenile females. I therefore subsetted the dataset into one with only males in it, one with only females, etc., and ran the same model again on each of these datasets (without sex, of course).

An interaction between sex and weight is not what I want, because I don’t want to know if heavier males are more active than heavier females. I just want to know if weight is significantly correlated with activity within males or females etc.

This means that I ran a lot of LMs, which I suspect is not correct. Is there a way to construct a single model that reduces the subsetting of the dataset?

It's not wrong to fit separate models if you want to allow all estimates, including that of the error variance, to differ among sexes. But if you want to allow only the slope of activity vs weight to differ among sexes, with common estimates for other slopes, then consider adding weight-sex interaction terms. To see how your model will work, write it out in full for each combination of dummy variables representing a different sex. — Scortchi - Reinstate Monica, Apr 21 '15 at 15:40
You mention " males, females, adult males, adult females, juvenile males and juvenile females" implying that these are categorical not continuous variables (of course gender is discrete). Regression is a kind of analysis of variance (ANOVA) -- perhaps you should be looking at ANOVA analysis. The basic hypothesis is "does the mean of the predictor vary among the different values of the 'treatments' (the discrete values of the predictors)" — John Mark, Apr 21 '15 at 16:09
@JohnMark: Don't think the OP's asking whether activity or weight vary (significantly) across different sexes; but whether there's a (significant) relationship between activity & weight, for each of the sexes considered separately. — Scortchi - Reinstate Monica, Apr 22 '15 at 11:39

score 1 · Answer 1 · edited Apr 13 '17 at 12:44

Suppose you include a term for interaction between sex ($s=0$ for males, $s=1$ for females) & weight ($w$) in your model for expected activity ($\operatorname{E}A$):

$$\operatorname{E}A = \beta_0 + \beta_1 s + \beta_2 w + \beta_3 sw + \beta_4 e + \beta_5 t$$ where the $\beta$s are the coefficients you want to estimate (& $e$ & $t$ are exploration & total times trapped).

For males it simplifies to

$$\operatorname{E}A = \beta_0 + \beta_2 w + \beta_4 e + \beta_5 t$$

For females to

$$\operatorname{E}A = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) w + \beta_4 e + \beta_5 t$$

So the slope of activity vs weight (keeping exploration and total times trapped constant) is $\beta_2$ for males & $\beta_2 + \beta_3$ for females. The other slopes are common to both males & females. The standard error of $\beta_2 + \beta_3$, for confidence intervals or hypothesis tests, can be calculated from the variance–covariance matrix—see here & here.
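As a minimal sketch of that calculation: the variance of $\beta_2 + \beta_3$ is $\operatorname{Var}\beta_2 + \operatorname{Var}\beta_3 + 2\operatorname{Cov}(\beta_2, \beta_3)$, the quadratic form $c^\top V c$ with $c = (1, 1)$. The coefficient estimates and covariance entries below are invented purely for illustration, and the helper name `linear_combination_se` is hypothetical; in practice you would take both from the fitted model.

```python
import math

def linear_combination_se(vcov, weights):
    """Standard error of sum_i weights[i] * beta_hat[i], i.e. sqrt(c' V c)."""
    variance = sum(
        weights[i] * vcov[i][j] * weights[j]
        for i in range(len(weights))
        for j in range(len(weights))
    )
    return math.sqrt(variance)

# Hypothetical estimates and covariance block for (beta_2, beta_3) only;
# these numbers are made up for illustration, not taken from any real fit.
beta_2, beta_3 = 0.50, -0.20
vcov_23 = [
    [0.04, -0.03],   # Var(beta_2),          Cov(beta_2, beta_3)
    [-0.03, 0.09],   # Cov(beta_2, beta_3),  Var(beta_3)
]

# Slope of activity vs weight for females, and its standard error.
female_slope = beta_2 + beta_3
female_slope_se = linear_combination_se(vcov_23, [1.0, 1.0])

# Wald-type statistic for "female slope = 0"; compare it with a t
# distribution on the model's residual degrees of freedom.
t_statistic = female_slope / female_slope_se
```

The male slope needs no such step: its standard error is just the square root of $\operatorname{Var}\beta_2$, which the same helper returns with weights `[1.0, 0.0]`.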

Whether you're considering juvenile males, adult males, &c. as different sexes isn't clear: in any case specifying sex with three dummy variables is equivalent to including an indicator variable $m$ for maturity & its interaction $ms$ with sex.
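
To spell that equivalence out for the intercepts, here is a sketch assuming four groups, with juvenile males as the baseline & $m = 1$ for adults. Those choices are assumptions, as are the symbols $d_{AM}$, $d_{JF}$, $d_{AF}$ (dummies for adult males, juvenile females & adult females), $\gamma$ & $\alpha$. With three dummies the group-specific part of the model is

$$\gamma_0 + \gamma_1 d_{AM} + \gamma_2 d_{JF} + \gamma_3 d_{AF}$$

and with the two indicators & their interaction it is

$$\alpha_0 + \alpha_1 m + \alpha_2 s + \alpha_3 ms$$

Matching the two expressions group by group gives

$$\gamma_0 = \alpha_0, \quad \gamma_1 = \alpha_1, \quad \gamma_2 = \alpha_2, \quad \gamma_3 = \alpha_1 + \alpha_2 + \alpha_3$$

so both parameterizations describe the same four group means. The same correspondence holds for the weight slopes when each term is multiplied by $w$.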
