How can I explain that a variable that is positively correlated with the target has a negative influence on the target in my model?
I have built a linear model in R, with 30 variables and 60 observations.
First step: feature selection (using 10-fold CV and searching for the minimum MSE).
Second step: build the LM with the best number of features.
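library(leaps)  # provides regsubsets()
reg.best <- regsubsets(target ~ ., data = data, nvmax = 30)
coef(reg.best, 17)  # 17 was the subset size with the lowest CV error
best_vars <- names(coef(reg.best, 17))[-1]  # drop the intercept; assumes all predictors are numeric
model <- lm(reformulate(best_vars, response = "target"), data = data)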
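The cross-validation from the first step is not shown above. A minimal sketch of how the subset size could be chosen by 10-fold CV follows. It runs on simulated data: the data frame `sim`, the random fold assignment, and the reduced number of predictors (8 instead of 30, to keep the exhaustive search fast) are assumptions for illustration, not the actual setup.

```r
library(leaps)

set.seed(42)
n <- 60   # same number of observations as in the question
p <- 8    # hypothetical: fewer predictors than the real 30

# Simulated predictors X1..X8 and a target that depends on two of them
sim <- as.data.frame(matrix(rnorm(n * p), n, p))
names(sim) <- paste0("X", 1:p)
sim$target <- 2 * sim$X1 - sim$X2 + rnorm(n)

k <- 10
folds <- sample(rep(1:k, length.out = n))  # random fold label per row
cv_err <- matrix(NA, k, p)                 # rows: folds, columns: subset sizes

for (j in 1:k) {
  train <- sim[folds != j, ]
  test  <- sim[folds == j, ]
  fit <- regsubsets(target ~ ., data = train, nvmax = p)
  test_mat <- model.matrix(target ~ ., data = test)
  for (size in 1:p) {
    beta <- coef(fit, id = size)  # intercept plus the selected variables
    pred <- test_mat[, names(beta), drop = FALSE] %*% beta
    cv_err[j, size] <- mean((test$target - pred)^2)
  }
}

mean_mse  <- colMeans(cv_err)     # average test MSE per subset size
best_size <- which.min(mean_mse)  # size with the minimum CV MSE
```

Note that the variables picked within each fold can differ from fold to fold; the CV only chooses the size, and the final subset of that size is then taken from a fit on the full data.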
If I use the 'best' features as predictors, the result looks good (small p-values, all variables significant, and an adjusted R2 of 0.95).
The thing that I can't explain to the business:
cor(data)
gives a correlation between X1 and the target (Y): 0.82(!)
summary(model)
    Estimate          Std. Error       t value  Pr(>|t|)
X1  -0.9100893836144  0.1611999070127  -5.646   0.000001044953 ***
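The same pattern can be reproduced on simulated data, which may help when discussing it. This is only a sketch: the variables `x1` and `x2`, their coefficients, and the noise levels are made up, and it assumes that X1 is strongly correlated with at least one other predictor in the model, which has not been checked on the real data.

```r
set.seed(1)
n  <- 60
x2 <- rnorm(n)
x1 <- x2 + rnorm(n, sd = 0.3)               # x1 is strongly correlated with x2
y  <- 3 * x2 - 1 * x1 + rnorm(n, sd = 0.3)  # true effect of x1 is negative
sim <- data.frame(y, x1, x2)

# Marginal view: x1 on its own moves together with y
marginal_cor <- cor(sim$x1, sim$y)

# Conditional view: effect of x1 with x2 held fixed
partial_coef <- coef(lm(y ~ x1 + x2, data = sim))["x1"]

marginal_cor   # positive
partial_coef   # negative
```

Here `cor()` describes x1 alone, while the `lm()` coefficient describes x1 with the other predictors held constant, so the two can legitimately have opposite signs.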
How can I explain that a variable that is positively correlated with the target has a negative influence on the target in my model?