In a study, we used Student's t-test (the data are normally distributed) to compare the mean expression of a protein between 2 groups of patients (n=100). We found that the expression differed significantly between the groups (p-value < 0.005).
One reviewer of our work is asking whether the ages of the patients in the two groups could affect the statistical significance that we found.
Could you please tell me what approach I should use to assess whether the patients' ages are biasing the test?
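To make the reviewer's question concrete, here is a minimal sketch of one common way to check this: fit a linear model with and without age as a covariate and compare the group coefficient. The data below are simulated, and the names `sim`, `expression`, `group`, and `age`, as well as the effect sizes, are purely illustrative assumptions and not our real data.

```r
# Hypothetical simulated data; all names and effect sizes are assumptions
set.seed(1)
n <- 100
sim <- data.frame(
  group = factor(rep(c("A", "B"), each = n)),
  # group B is made slightly older on average, to mimic an age imbalance
  age   = c(rnorm(n, mean = 50, sd = 10), rnorm(n, mean = 55, sd = 10))
)
sim$expression <- 1 + 0.5 * (sim$group == "B") + 0.02 * sim$age + rnorm(2 * n)

# Unadjusted comparison of the two groups
fit_unadj <- lm(expression ~ group, data = sim)

# Age-adjusted comparison: the groupB coefficient is now the
# difference between groups for patients of the same age
fit_adj <- lm(expression ~ group + age, data = sim)

print(coef(summary(fit_unadj)))
print(coef(summary(fit_adj)))
```

If the `groupB` estimate and its p-value change little once `age` is in the model, that would suggest age is not driving the difference, at least under the assumption of a linear age effect.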
Here are some details about the procedure that I am using, especially regarding the comparison between the t.test results and the regression results.
I am using R (the 't.test' and 'glm' functions) for all the computations. I have simplified my dataset, created some artificial data, and removed age from the dataset, because my new question, following the comments above, is: does it make sense to get different results from a t.test and a regression?
#50 random values
x <- rnorm(50)
#60 other random values
y <- rnorm(60)
# perform a t.test
t.test(x,y)
Welch Two Sample t-test
data: x and y
t = 1.956, df = 25.253, p-value = 0.04161
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.01016491 0.39826273
sample estimates:
mean of x mean of y
0.7273823 0.5333334
#format the data
df <- data.frame(y=c(x,y),group=c(rep("x",50),rep("y",60)))
#perform a regression
fit <- glm(y~group,data=df)
#print the results
summary(fit)
Call:
glm(formula = y ~ bc, data = df)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.48892 -0.23710 0.04165 0.22003 0.46359
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.72738 0.09225 7.885 1.37e-08 ***
bcy -0.19405 0.11298 -1.717 0.0969 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As you can see, the t.test is significant but the regression coefficient is not. Does that make sense?
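One thing that may matter here: by default, `t.test` in R runs the Welch test, which does not assume equal variances, whereas the regression assumes a single residual variance for both groups. The sketch below, on simulated data (the seed and the names `dat`, `value`, and `p_reg` are illustrative assumptions), compares the Welch test, the equal-variance Student test, and the regression coefficient.

```r
# Simulated data only; the seed and variable names are assumptions
set.seed(42)
x <- rnorm(50)
y <- rnorm(60)
dat <- data.frame(
  value = c(x, y),
  group = factor(rep(c("x", "y"), times = c(50, 60)))
)

welch  <- t.test(x, y)                    # default: Welch, unequal variances
pooled <- t.test(x, y, var.equal = TRUE)  # classic Student t-test, pooled variance
fit    <- glm(value ~ group, data = dat)  # gaussian family by default

# p-value of the group coefficient in the regression
p_reg <- coef(summary(fit))["groupy", "Pr(>|t|)"]

print(c(welch = welch$p.value, pooled = pooled$p.value, regression = p_reg))
```

With `var.equal = TRUE` the t-test and the regression coefficient test give the same p-value, so a discrepancy between `t.test(x, y)` and `glm` would be expected whenever the two group variances (or sample sizes) differ noticeably.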