linear regression indicator for variables all have different values in R

Question

I have 6 values:

id,
age(=0 if<24;=1>24)
sex(=0 if female;1=male)
week
blood
trt(1,2,3)

Question asks write a model for each group with linear trend in week, age, sex, and all their interactions(up to week* age* sex). Also consume common random effects for intercept and slope of week across 3 groups.

I was thinking pull out data with trt=1 first, and write model with blood~age+week+sex+age:sex+age:week+sex:week+age:sex:week. Is this model right for trt= 1? How do I consider indicator for age and sex inside model? I'm confused that how to consume common random effects for intercept and slope for week? Does it mean I should add (week|id) to my model?? Any hints to write model?

file

LDBerriz · Answer 1 · 2020-04-29T18:05:26.400

You have a classification variable in trt which has 3 groups, a dependent variable blood, and the rest independent variables. You can set up 2 dummy variables say d1 and d2 which will have values of 0 or 1 as follows:

trt = 1: d1 = 0; d2 = 0
trt = 2: d1 = 1; d2 = 0
trt = 3: d1 = 1; d2 = 1

Which will make your treatment groups unique. Construct the model including the dummy variables and their interactions with the independent variables as:

blood ~ age sex week d1 d2

and all their interactions

(age sex week) d1(age sex week) d2(age sex week) d1(age sex) d2(age sex) d1(age week) d2(age week) d1(sex week) d2(sex week)

The resulting model will have an intercept and coefficients for all the dependent variables and their interactions. You will be able to determine significance for each group by dropping the combinations whered1 or d2 = 0 which will give you specific slopes and intercepts for each treatment group and their independent variables and combinations. See Ways of comparing linear regression interepts and slopes? for a full explanation of how to interpret the results. This works as long as your data set has enough observations to support this extended analysis.

Here is the analysis using your data:

data.in <- read.delim(file = "lead.full.txt", header = T, sep = " ")

# set the dummy variables
# if the treatment group is 1: d1 = 0; d2 = 0
# if the treatment group is 2: d1 = 0; d2 = 1
# if the treatment group is 3: d1 = 1; d2 = 1

data.in$d1 <- ifelse(data.in$trt == 1, 0, 1)
data.in$d2 <- ifelse(data.in$trt == 3, 1, 0)

m1 <- lm(blood ~ 
            age + sex + week + 
            d1 + age*d1 + sex*d1 + week*d1 + 
            d2 + age*d2 + sex*d2 + week*d2, 
            data = data.in)

summary(m1)

Here are the results:

Call:
lm(formula = blood ~ age + sex + week + d1 + age * d1 + sex * 
    d1 + week * d1 + d2 + age * d2 + sex * d2 + week * d2, data = data.in)

Residuals:
     Min       1Q   Median       3Q      Max 
-20.9957  -3.6552  -0.0733   3.6797  25.3043 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 24.56058    0.95231  25.790  < 2e-16 ***
age          4.55547    0.97582   4.668 3.88e-06 ***
sex          0.86996    0.97200   0.895   0.3712    
week        -0.30435    0.16034  -1.898   0.0582 .  
d1           1.07777    1.41618   0.761   0.4470    
d2          -0.32161    1.42394  -0.226   0.8214    
age:d1       1.57303    1.34725   1.168   0.2435    
sex:d1      -3.46820    1.34729  -2.574   0.0103 *  
week:d1     -0.57788    0.22682  -2.548   0.0111 *  
age:d2      -1.34909    1.30789  -1.031   0.3028    
sex:d2       1.31538    1.31703   0.999   0.3184    
week:d2     -0.05158    0.22744  -0.227   0.8207    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.119 on 514 degrees of freedom
Multiple R-squared:  0.2706,    Adjusted R-squared:  0.255 
F-statistic: 17.34 on 11 and 514 DF,  p-value: < 2.2e-16

The (Intercept),age, sex*d1 and week*d1 have p-values less than 0.05.

To construct the model we consider only the variables with p-values less or than 0.05.

The model generated is:

blood = (Intercept) + (Estimate[age])*age + (Estimate[sex:d1])*sex:d1 +  (Estimate[week:d1])*week:d1

The other coefficients Estimate are ignored because they are not statistically significant.

Now we analyze each treatment group.

trt = 1: d1 = 0; d2 = 0.

We do not need to consider any factors that include d1 or d2 as they are 0.

The model is: blood = (Intercept) + (Estimate[age])*age

trt = 2: d1 = 0; d2 = 1.

We do not need to consider factors that include d1 as it is 0. We must consider factors that include d2 which equals 1. However, nor d2 or any factors that include d2 are significant, so they are not included in the model.

The model is: blood = (Intercept) + (Estimate[age])*age

Which is identical to trt = 1. So, there is no statistical difference between trt = 1 and trt = 2.

trt = 3: d1 = 1; d2 = 1. We need to consider factors that include d1 or d2 as they are both = 1. However, d2 is not significant so it is not in the model.

The model is: blood = (Intercept) + (Estimate[age])*age + (Estimate[sex:d1])*sex:d1 + (Estimate[week:d1])*week:d1

Which is different from trt = 1 and trt = 2. So trt = 3 is significantly different from trt = 1 and trt = 2.

I think your answer is perfect for one model. But I wanna write a model for each group.(total 3 models). I'm confused about how to include dummy values for each group, and write model with linear trend in week, age, sex, and all their interactions(up to week* age* sex). — melody, Apr 28 '20 at 04:56
If you can write a short script that recreates the data that you want to analyze I can show you a specifically how it is done. Of course it does not have to be your own data just something that has a similar structure. — LDBerriz, Apr 28 '20 at 12:40
Hey, I post image of data. Data includes some missing values. — melody, Apr 28 '20 at 16:52
Hi Melody, See if you can post a file with actual numbers in rows and columns. You can use an Excel worksheet, or a .csv file or just about any text file that has your data in rows and columns. I tried using text recognition to extract the data but is proving to be tedious and inaccurate. To show you what to do, it is necessary to read the data. — LDBerriz, Apr 29 '20 at 12:12
Hi, link: https://github.com/lingling1123/data/blob/master/lead.full.txt — melody, Apr 29 '20 at 14:34

linear regression indicator for variables all have different values in R

1 Answers1