
Here is what I did:

1) I divided the mtcars dataset into a training set (80%) and a validation set (20%).

2) I built a simple linear model predicting mileage (mpg) based on displacement (disp).

3) I built a multi-linear model predicting mileage (mpg) based on displacement, horsepower and weight (disp + hp + wt).

Here is the R code I used:

```r
set.seed(123)
# Randomly pick 80% of the 32 rows (floor(25.6) = 25) for training; the rest is validation
trainingRowIndex <- sample(seq_len(nrow(mtcars)), floor(0.8 * nrow(mtcars)))
trainingSet   <- mtcars[trainingRowIndex, ]
validationSet <- mtcars[-trainingRowIndex, ]

# Build simple linear model (disp only)
lmMtcars <- lm(mpg ~ disp, data = trainingSet)
summary(lmMtcars)

# Build multi-linear model (disp + hp + wt)
mlmMtcars <- lm(mpg ~ disp + hp + wt, data = trainingSet)
summary(mlmMtcars)
```
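
To compare the two fits directly, here is a small sketch that pulls the disp row out of each coefficient table (the helper name `getDispStats` is hypothetical). It re-creates the same split so it runs on its own; the exact values depend on which rows land in the training set. The column to watch is the standard error: it typically grows once the correlated predictors hp and wt are added, which shrinks the t statistic and raises the p-value.

```r
set.seed(123)
trainingRowIndex <- sample(seq_len(nrow(mtcars)), floor(0.8 * nrow(mtcars)))
trainingSet <- mtcars[trainingRowIndex, ]

lmMtcars  <- lm(mpg ~ disp, data = trainingSet)
mlmMtcars <- lm(mpg ~ disp + hp + wt, data = trainingSet)

# Hypothetical helper: the "disp" row of a fitted model's coefficient table
getDispStats <- function(fit) {
  summary(fit)$coefficients["disp", c("Estimate", "Std. Error", "t value", "Pr(>|t|)")]
}

# One row per model, columns = estimate, SE, t, p
comparison <- rbind(simple = getDispStats(lmMtcars),
                    multi  = getDispStats(mlmMtcars))
print(signif(comparison, 3))
```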

In the simple linear model, disp has a p-value of 2.33e-07 (i.e., it is a highly significant predictor). However, in the multi-linear model, disp has a p-value of 0.87220 (i.e., it is not significant). I would expect disp to be a fairly good predictor of mpg, because the two are strongly (negatively) correlated (r = -0.8475514).
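
The correlation of -0.8475514 describes disp and mpg on their own, whereas the multi-linear model asks what disp adds once hp and wt are already in the model. A minimal sketch of that difference, assuming the full built-in mtcars data rather than the training split (so the numbers will differ somewhat from the summaries above; variable names are illustrative): first the correlations among all predictors, then the Frisch-Waugh-Lovell (added-variable) view, in which the disp coefficient of the full model equals the slope you get after removing from both mpg and disp the part that hp and wt already explain.

```r
# Uses the built-in mtcars data (datasets package), all 32 rows.
vars <- c("mpg", "disp", "hp", "wt")

# Pairwise correlations: disp is strongly correlated with mpg,
# but also with the other predictors hp and wt.
corMatrix <- cor(mtcars[, vars])
print(round(corMatrix, 3))

# Frisch-Waugh-Lovell / added-variable idea:
# residualize both mpg and disp on the other predictors (hp, wt)...
resMpg  <- resid(lm(mpg ~ hp + wt, data = mtcars))
resDisp <- resid(lm(disp ~ hp + wt, data = mtcars))

# ...then the slope of resMpg on resDisp equals the disp coefficient
# of the full multiple regression.
partialSlope <- unname(coef(lm(resMpg ~ resDisp))["resDisp"])
fullFit      <- lm(mpg ~ disp + hp + wt, data = mtcars)
fullSlope    <- unname(coef(fullFit)["disp"])

cat("slope from residual regression:", partialSlope, "\n")
cat("disp coefficient in full model:", fullSlope, "\n")

# Partial correlation of mpg and disp, given hp and wt
cat("partial correlation:", cor(resMpg, resDisp), "\n")
```

A partial correlation near zero is consistent with the large p-value: once hp and wt are in the model, disp appears to carry little additional information about mpg, even though its marginal correlation with mpg is strong.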

Why is disp not significant in the multi-linear model? Is there something wrong with my code, or am I missing something?

Thanks,

Cornelius
  • Basically, multicollinearity: see, e.g., https://stats.stackexchange.com/questions/138426/my-p-values-increase-when-adding-variables-is-the-model-still-valid – Ben Bolker Jul 10 '18 at 21:36
  • You had one variable, which was "significant." You added others, and now your original predictor is "insignificant." One would assume that adding the new predictors made the first one insignificant, which means that they can model almost all of the variance the original predictor could, and then some. This happens when you have high collinearity (or multicollinearity, when several covariates are involved). This is a well-discussed topic on this site; I recommend reading about Variance Inflation Factors (VIF). The `vif()` function in the car package computes them for a fitted model. – ERT Jul 10 '18 at 21:44
  • This was asked simultaneously on StackOverflow, and then migrated to CV: https://stats.stackexchange.com/questions/355487/r-different-predictor-p-value-in-linear-and-multi-linear-models – Ben Bolker Jul 10 '18 at 22:32
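
The VIF mentioned in the comments can also be computed by hand, which makes the link to collinearity explicit: for each predictor, VIF = 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. A minimal base-R sketch, assuming the full mtcars data (the function name `vifManual` is hypothetical); for a model with only numeric predictors like this one, `car::vif(lm(mpg ~ disp + hp + wt, data = mtcars))` should report the same values.

```r
# Hypothetical helper: variance inflation factor for each predictor,
# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others.
vifManual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # e.g. disp ~ hp + wt when p == "disp"
    fit <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(fit)$r.squared)
  })
}

vifs <- vifManual(mtcars, c("disp", "hp", "wt"))
print(round(vifs, 2))
```

disp should come out with the largest VIF of the three, since it is the predictor best explained by hp and wt. A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity.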
