I am working on a few regression analyses (both simple and multiple), and in some cases the residuals are non-normal to varying degrees. As I understand it, the Gauss-Markov theorem states that normality of the residuals is not required for the coefficient point estimates to be unbiased, i.e., I can trust that the regression summary gives me the BLUE coefficient estimates. However, the standard errors may be biased, and the corresponding t- and p-values may therefore be incorrect.
A previous question (How to obtain p-values of coefficients from bootstrap regression?) asked whether p-values can be calculated from bootstrapped coefficients and their CIs. My question is slightly different: if I bootstrap the coefficients and then use the bootstrapped mean and the bootstrapped SE to re-calculate the t- and p-values, is that a sound approach?
Here is the code I use, in R:
# fit the linear model
mod <- lm(Y ~ X, data = dataset)
summary(mod)

# function that refits the model on a resampled data set
# and returns its coefficients
mod.bootstrap <- function(data, indices) {
  d <- data[indices, ]
  mod <- lm(Y ~ X, data = d)
  return(coef(mod))
}

# set seed for reproducibility
set.seed(1234)

# run the bootstrap (R = 2000 resamples) using the boot package
library(boot)
mod.boot <- boot(data = dataset, statistic = mod.bootstrap, R = 2000)
# now here is how I re-calculate t- and p-values for the slope
# (column 2 of mod.boot$t holds the bootstrapped slope estimates)
bootmean <- mean(mod.boot$t[, 2])
booter <- sd(mod.boot$t[, 2])  # was sd(boot.t[,2]), which referenced a non-existent object
tval <- bootmean / booter
p <- 2 * pt(-abs(tval), df = mod$df.residual)
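As a sanity check on the recomputed p-value, a percentile CI can be pulled from the same boot object (a sketch; `index = 2` assumes the slope is the second coefficient, as above):

```r
# percentile bootstrap 95% CI for the slope, from the same boot object
ci <- boot.ci(mod.boot, type = "perc", index = 2)
ci$percent[4:5]  # lower and upper limits of the percentile interval
```

If this interval excludes 0, that agrees qualitatively with a small recomputed p-value; if it straddles 0, the p-value should be large.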
The new t- and p-values come out fairly close to those in the lm summary, albeit a bit more conservative, as expected. Is this something I could report? I am very new to this, so I can't really tell whether my logic is valid.
Edit: Clarified a bit.