I am hoping to find an explanation here for an aspect of linear regression that I have trouble understanding.
Let's take an example of regression with the following variables:
$y:\:$ depression (continuous)
$x:\:$ time (treated as continuous and coded as follows:
- 0=timepoint 1;
- 1=TP 2;
- 2=TP 3;
- 3=TP 4;
- 4=TP 5)
Everywhere I look, the definition of the intercept goes something like this: the intercept is the expected mean value of y when x=0. As I understand it, in this case the intercept should be the mean depression score when time=0. However, this seems not to be the case. When I calculate the mean for timepoint 1 I get 39.65, but the intercept is 39.91 (see below).
As I already stated, for me "mean value of y when x=0" is the same as saying "mean value of depression at timepoint 1 (coded as 0)", so it doesn't make sense to me why the two values differ.
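To spell out the definition I am relying on (the symbols $\beta_0$, $\beta_1$ and $t$ are my own notation, assuming the usual simple linear model):

$$E[\text{depression} \mid \text{Time} = t] = \beta_0 + \beta_1 t, \qquad \text{so} \qquad E[\text{depression} \mid \text{Time} = 0] = \beta_0 .$$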
I also want to mention:
- When I have only 2 timepoints in the variable Time, the intercept is the same as the mean
- When I treat time as a factor, the intercept is the same as the mean
- I've checked with other variables and datasets too
mean(subset(data, Time==0)$depression)
[1] 39.65254
m0 <- lm(depression ~ Time, data = data)
summary(m0)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  39.9158     0.7156  55.781  < 2e-16 ***
Time         -1.6381     0.3029  -5.408 1.05e-07 ***
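For what it's worth, the three observations listed above can be reproduced on made-up data. This is only a sketch: the data frame `sim`, the group means, the sample size and the seed are all invented for illustration and are not my real data.

```r
# Simulated data (hypothetical; NOT the data from the question).
set.seed(1)
n <- 100                                    # observations per timepoint (assumed)
Time <- rep(0:4, each = n)                  # same 0..4 coding as above
true_means <- c(39.5, 39.0, 37.0, 35.5, 33.0)  # invented, not exactly linear
depression <- rnorm(5 * n, mean = true_means[Time + 1], sd = 8)
sim <- data.frame(Time = Time, depression = depression)

# Sample mean at the first timepoint (coded 0)
mean_t0 <- mean(sim$depression[sim$Time == 0])

# 1) Time as continuous, all five timepoints
m_lin <- lm(depression ~ Time, data = sim)

# 2) Time as a factor (reference level is Time == 0)
m_fac <- lm(depression ~ factor(Time), data = sim)

# 3) Time as continuous, but only the first two timepoints
m_two <- lm(depression ~ Time, data = subset(sim, Time <= 1))

# Compare the three intercepts with the sample mean at Time == 0
c(mean_t0    = mean_t0,
  linear     = coef(m_lin)[[1]],
  factor     = coef(m_fac)[[1]],
  two_points = coef(m_two)[[1]])
```

In this sketch the `factor` and `two_points` intercepts match `mean_t0`, while the `linear` intercept does not, which is the same pattern I see in my own data.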