In regression, I thought that adding a predictor unrelated to the criterion would leave R² unchanged. However, in the example below R² increases non-trivially, even though the correlation between response and predictor3 is virtually 0. What's going on?
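The "R² stays the same" intuition holds for the population R², not the sample one. With OLS, adding a column to the design can never lower the sample R², and a predictor that is pure noise usually raises it only slightly (on average roughly (1 − R²)/n). Below is a minimal sketch with made-up data. The names `x`, `y`, `noise`, the effect size and the sample size are arbitrary choices for illustration only:

```r
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
noise <- rnorm(n)   # generated independently of both x and y

r2_small <- summary(lm(y ~ x))$r.squared
r2_big   <- summary(lm(y ~ x + noise))$r.squared

# Nested OLS models: R^2 cannot decrease when a column is added,
# so this difference is >= 0, but for a pure-noise column it is tiny
r2_big - r2_small
```

With 100000 observations, as in the example below, a pure-noise increase would be negligible. Sampling noise alone therefore cannot account for the jump reported there.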
More generally, can someone tell me what happens to R² and the betas under these conditions:
1. Adding a predictor unrelated to y and to all of the other predictors (I assume R² and the betas remain unchanged?)
2. Adding a predictor unrelated to y but related to the other predictors (apparently R² goes up and the other predictors' betas remain unchanged?)
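Both cases can be checked without sampling noise. Compute the standardized betas and R² directly from a correlation matrix, using β = R_xx⁻¹ r_xy and R² = r_xyᵀ β. This is an illustrative sketch. `std_fit` is a hypothetical helper. Case 1 appends a variable uncorrelated with everything. Case 2 uses the question's correlations for predictor3 (0 with the response, .3 with predictor1 and predictor2):

```r
# Standardized betas and R^2 for regressing variable 1 on the variables in `idx`
# of a correlation matrix R (population values, no sampling noise)
std_fit <- function(R, idx) {
  rxy  <- R[idx, 1]
  beta <- solve(R[idx, idx], rxy)
  list(beta = beta, r2 = sum(beta * rxy))
}

# response, predictor1, predictor2
base <- matrix(c(1,  .8, .2,
                 .8, 1,  .7,
                 .2, .7, 1), nrow = 3, byrow = TRUE)

# Case 1: new variable uncorrelated with y and with both predictors
case1 <- rbind(cbind(base, 0), c(0, 0, 0, 1))

# Case 2: like predictor3, uncorrelated with y but r = .3 with both predictors
case2 <- case1
case2[4, 2:3] <- 0.3
case2[2:3, 4] <- 0.3

fit0 <- std_fit(base, 2:3)
fit1 <- std_fit(case1, 2:4)
fit2 <- std_fit(case2, 2:4)

round(fit0$beta, 3)                      # 1.294 -0.706
round(fit1$beta, 3)                      # 1.294 -0.706  0.000
round(fit2$beta, 3)                      # 1.329 -0.671 -0.197
round(c(fit0$r2, fit1$r2, fit2$r2), 3)   # 0.894 0.894 0.929
```

In case 1 neither R² nor the existing betas change. In case 2 R² rises and the existing betas shift. Partialling predictor3 out removes variance in predictor1 and predictor2 that is unrelated to the response, which is the classic suppressor pattern.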
R code:
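```r
# Target correlation matrix (order: response, predictor1, predictor2, predictor3)
R <- matrix(c(1,   .80, .2, 0,
              .80, 1,   .7, .3,
              .2,  .7,  1,  .3,
              0,   .3,  .3, 1), nrow = 4)
U <- t(chol(R))          # lower-triangular Cholesky factor: U %*% t(U) equals R
nvars <- nrow(U)
numobs <- 100000
set.seed(1)
# Independent standard normals: one row per variable, one column per observation
random.normal <- matrix(rnorm(nvars * numobs, 0, 1), nrow = nvars, ncol = numobs)
X <- U %*% random.normal # draws whose population correlation matrix is R
newX <- t(X)             # observations in rows
raw <- as.data.frame(newX)
names(raw) <- c("response", "predictor1", "predictor2", "predictor3")
cor(raw)                 # sample correlations, close to R
lm1 <- lm(response ~ predictor1 + predictor2, data = raw)
lm2 <- lm(response ~ predictor1 + predictor2 + predictor3, data = raw)
summary(lm1)
summary(lm2)
```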
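To tie the simulation to the algebra, this sketch rebuilds the question's data with the same matrix and seed, so it produces the same draws. It pulls R² from `summary.lm` via `$r.squared`. It then compares the sample increment with the population increment given by the squared semipartial correlation, ΔR² = (r_y3 − r_3Xᵀ R_XX⁻¹ r_Xy)² / (1 − R²_3·X), where X = {predictor1, predictor2}. The names `Rxx`, `num` and `den` are illustrative:

```r
# Rebuild the simulated data from the question (same matrix, same seed)
R <- matrix(c(1,  .8, .2,  0,
              .8, 1,  .7, .3,
              .2, .7, 1,  .3,
              0,  .3, .3, 1), nrow = 4, byrow = TRUE)
set.seed(1)
Z <- matrix(rnorm(4 * 1e5), nrow = 4)
raw <- as.data.frame(t(t(chol(R)) %*% Z))
names(raw) <- c("response", "predictor1", "predictor2", "predictor3")

lm1 <- lm(response ~ predictor1 + predictor2, data = raw)
lm2 <- lm(response ~ predictor1 + predictor2 + predictor3, data = raw)

# Sample R^2 of each model and the increment from adding predictor3
r2 <- c(lm1 = summary(lm1)$r.squared, lm2 = summary(lm2)$r.squared)
r2
diff(r2)

# Population increment from the correlations alone
Rxx <- R[2:3, 2:3]
num <- (R[1, 4] - R[4, 2:3] %*% solve(Rxx, R[2:3, 1]))^2
den <- 1 - R[4, 2:3] %*% solve(Rxx, R[2:3, 4])
drop(num / den)   # about 0.035
```

Because r_y3 = 0, the numerator reduces to the squared covariance between predictor3 and lm1's fitted values. predictor3 is uncorrelated with the response itself, but not with the part of it that predictor1 and predictor2 predict. With n = 100000, the sample and population increments should agree closely.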