I am performing a regression on a large dataset that is fairly noisy. The line I am running in R is:
lmfit <- lm(predictVariable ~ dataSet[,1:10])
so I have 10 explanatory variables. The results are somewhat strange:

- significant t-statistics for all of the coefficients
- $R^2 = 0.25$
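For reproducibility, here is a self-contained sketch of what I am doing, with simulated stand-ins for my data (the matrix, the true coefficients, and the noise level below are invented for illustration; my real dataSet is larger):

set.seed(42)
n <- 1000
dataSet <- matrix(rnorm(n * 10), nrow = n,
                  dimnames = list(NULL, paste0("x", 1:10)))
beta <- runif(10, 0.5, 1.5)                       # made-up true coefficients
predictVariable <- drop(dataSet %*% beta) + rnorm(n, sd = 5.5)  # fairly noisy
lmfit <- lm(predictVariable ~ dataSet[, 1:10])
summary(lmfit)  # significant coefficients but modest R^2, much like my real fit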
But when I then run a regression of the actual values on the fitted values:
run1 <- lm(predictVariable ~ lmfit$fitted.values)
The coefficient on the fitted values is 1.0001 and the intercept is ~0. But when I run the regression the other way around:
run2 <- lm(lmfit$fitted.values ~ predictVariable)
The coefficient is now 0.26 and the intercept is again ~0. Why are the coefficients in these two regressions so different? Does this mean my explanatory variables are collinear, or that my fit is poor? When I plot the fitted values against the actual values, I see many actual values close to 0 whose fitted values fall both above and below them. Can anyone help me understand these regression results?
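For what it's worth, the same pattern reproduces if I rerun those two regressions on the simulated data above:

coef(lm(predictVariable ~ lmfit$fitted.values))   # slope essentially 1, intercept ~0
coef(lm(lmfit$fitted.values ~ predictVariable))   # slope well below 1, intercept ~0
summary(lmfit)$r.squared   # the 0.26 slope in my real run is suspiciously close to my R^2 of 0.25

plot(predictVariable, lmfit$fitted.values)   # the fitted-vs-actual plot I describe above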