You have a classification variable in trt
which has 3 groups, a dependent variable blood
, and the rest independent variables. You can set up 2 dummy variables say d1
and d2
which will have values of 0 or 1 as follows:
trt = 1: d1 = 0; d2 = 0
trt = 2: d1 = 1; d2 = 0
trt = 3: d1 = 1; d2 = 1
Which will make your treatment groups unique.
Construct the model including the dummy variables and their interactions with the independent variables as:
blood ~ age sex week d1 d2
and all their interactions
(age sex week)
d1(age sex week) d2(age sex week)
d1(age sex) d2(age sex)
d1(age week) d2(age week)
d1(sex week) d2(sex week)
The resulting model will have an intercept and coefficients for all the dependent variables and their interactions. You will be able to determine significance for each group by dropping the combinations whered1 or d2 = 0
which will give you specific slopes and intercepts for each treatment group and their independent variables and combinations.
See Ways of comparing linear regression interepts and slopes? for a full explanation of how to interpret the results. This works as long as your data set has enough observations to support this extended analysis.
Here is the analysis using your data:
data.in <- read.delim(file = "lead.full.txt", header = T, sep = " ")
# set the dummy variables
# if the treatment group is 1: d1 = 0; d2 = 0
# if the treatment group is 2: d1 = 0; d2 = 1
# if the treatment group is 3: d1 = 1; d2 = 1
data.in$d1 <- ifelse(data.in$trt == 1, 0, 1)
data.in$d2 <- ifelse(data.in$trt == 3, 1, 0)
m1 <- lm(blood ~
age + sex + week +
d1 + age*d1 + sex*d1 + week*d1 +
d2 + age*d2 + sex*d2 + week*d2,
data = data.in)
summary(m1)
Here are the results:
Call:
lm(formula = blood ~ age + sex + week + d1 + age * d1 + sex *
d1 + week * d1 + d2 + age * d2 + sex * d2 + week * d2, data = data.in)
Residuals:
Min 1Q Median 3Q Max
-20.9957 -3.6552 -0.0733 3.6797 25.3043
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.56058 0.95231 25.790 < 2e-16 ***
age 4.55547 0.97582 4.668 3.88e-06 ***
sex 0.86996 0.97200 0.895 0.3712
week -0.30435 0.16034 -1.898 0.0582 .
d1 1.07777 1.41618 0.761 0.4470
d2 -0.32161 1.42394 -0.226 0.8214
age:d1 1.57303 1.34725 1.168 0.2435
sex:d1 -3.46820 1.34729 -2.574 0.0103 *
week:d1 -0.57788 0.22682 -2.548 0.0111 *
age:d2 -1.34909 1.30789 -1.031 0.3028
sex:d2 1.31538 1.31703 0.999 0.3184
week:d2 -0.05158 0.22744 -0.227 0.8207
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.119 on 514 degrees of freedom
Multiple R-squared: 0.2706, Adjusted R-squared: 0.255
F-statistic: 17.34 on 11 and 514 DF, p-value: < 2.2e-16
The (Intercept)
,age
, sex*d1
and week*d1
have p-values less than 0.05
.
To construct the model we consider only the variables with p-values less or than 0.05
.
The model generated is:
blood = (Intercept) + (Estimate[age])*age + (Estimate[sex:d1])*sex:d1 + (Estimate[week:d1])*week:d1
The other coefficients Estimate
are ignored because they are not statistically significant.
Now we analyze each treatment group.
trt = 1: d1 = 0; d2 = 0
.
We do not need to consider any factors that include d1
or d2
as they are 0
.
The model is: blood = (Intercept) + (Estimate[age])*age
trt = 2: d1 = 0; d2 = 1
.
We do not need to consider factors that include d1
as it is 0
. We must consider factors that include d2
which equals 1
. However, nor d2
or any factors that include d2
are significant, so they are not included in the model.
The model is: blood = (Intercept) + (Estimate[age])*age
Which is identical to trt = 1
. So, there is no statistical difference between trt = 1
and trt = 2
.
trt = 3: d1 = 1; d2 = 1
.
We need to consider factors that include d1
or d2
as they are both = 1
. However, d2
is not significant so it is not in the model.
The model is: blood = (Intercept) + (Estimate[age])*age + (Estimate[sex:d1])*sex:d1 + (Estimate[week:d1])*week:d1
Which is different from trt = 1
and trt = 2
.
So trt = 3
is significantly different from trt = 1
and trt = 2
.