
I have been trying to improve a multiple regression model (to reduce RMSE further) and have found evidence of heteroscedasticity for 2 of the variables. I have found 2 options for reducing heteroscedasticity in linear regression: the first is to take the square root of Y (the response variable), and the second is to use a Box-Cox transformation (as per https://www.r-bloggers.com/how-to-detect-heteroscedasticity-and-rectify-it/). All the examples I have found use a single input variable, and I'm trying to figure out how this plays out in multiple linear regression. When I analyse each of the 3 predictor-response relationships individually, only 2 show evidence of heteroscedasticity. How do I apply a transformation to only 2 out of 3? For now I am experimenting with, e.g., applying the square root of y to the entire multiple regression.

In R, below, I've created a new column, ysqrt, which is the square root of the output variable y, and fitted the model on it (using the caret library):

  library(caret)
  model2 <- train(ysqrt ~ x1 + x2 + x3, data = trainX, method = "lm", trControl = trainControl(method = "cv", number = 10))

This just doesn't seem right though.

Thanks in advance.

Michael R. Chernick
user637251

1 Answer


What about using White robust standard errors?

https://en.wikipedia.org/wiki/Heteroscedasticity-consistent_standard_errors

Take a look at this to apply this method with R:

Replicating Stata's "robust" option in R
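
As a rough illustration of the suggestion above, here is a minimal sketch using the sandwich and lmtest packages. The data are simulated purely for illustration, and the names x1, x2, x3 and y are stand-ins for your own columns. The "HC1" type is chosen because it is the small-sample correction that Stata's "robust" option uses; other HC types may suit your data better.

```r
# Minimal sketch: White/HC robust standard errors for a multiple regression.
# Assumption: simulated data; x1, x2, x3, y are hypothetical stand-ins.
library(sandwich)  # vcovHC()
library(lmtest)    # coeftest(), bptest()

set.seed(1)
n  <- 500
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
x3 <- rnorm(n)
# The error spread grows with x1, so the errors are heteroscedastic by construction
y  <- 2 + 1.5 * x1 - 0.8 * x2 + 0.5 * x3 + rnorm(n, sd = 0.5 * x1)
dat <- data.frame(y, x1, x2, x3)

# Ordinary least squares fit with all three predictors
fit <- lm(y ~ x1 + x2 + x3, data = dat)

# Breusch-Pagan test applied to the whole model, not one predictor at a time
bp <- bptest(fit)
print(bp)

# Classical standard errors versus heteroscedasticity-consistent (HC1) ones
se_classical <- sqrt(diag(vcov(fit)))
vc_robust    <- vcovHC(fit, type = "HC1")
se_robust    <- sqrt(diag(vc_robust))
print(cbind(se_classical, se_robust))

# Coefficient table with tests based on the robust covariance matrix
robust_table <- coeftest(fit, vcov. = vc_robust)
print(robust_table)
```

Note that robust standard errors change only the inference (standard errors, t-statistics, p-values). The coefficient estimates and the fitted values stay the same, so this approach will not by itself change the RMSE.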