Setting aside how to operationalize outliers, or whether removing them is useful at all, suppose the dependent and independent variables are all standardized in the main regression specification (centered and divided by their standard deviations). Should the scaling happen before or after outlier removal?
Specifically, will the p-values of the coefficients be affected at all by this decision?
Here is a simulation of what I'm talking about:
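meeting_count <- c(.01, .02, .01, .05, .03, .025)
# `revenue` is a numeric vector whose definition is not shown here
# Case 1: scale using all observations, then keep the first four
# (in R, x[0:4] returns elements 1 to 4, because index 0 is silently dropped)
revenue_scaled <- scale(revenue)
summary(lm(revenue_scaled[0:4] ~ meeting_count[0:4]))
# Case 2: keep the first four observations, then scale
revenue_post_scaled <- scale(revenue[0:4])
summary(lm(revenue_post_scaled ~ meeting_count[0:4]))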
Perhaps this is just dumb luck, but the slope's t value and p-value come out essentially the same in both cases. Here are the summary outputs:
> summary(lm(revenue_post_scaled ~ meeting_count[0:4]))
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)          -1.025      0.526   -1.95     0.19
meeting_count[0:4]   45.572     18.893    2.41     0.14
> summary(lm(revenue_scaled[0:4] ~ meeting_count[0:4]))
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)        -0.42611    0.00536  -79.57  0.00016 ***
meeting_count[0:4]  0.46401    0.19237    2.41  0.13734
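
A minimal sketch for checking whether the matching slope t value is a coincidence. It assumes hypothetical revenue values: the vector `y` below is invented for illustration and is not the data behind the output above, and the names `keep`, `x_kept`, `y_pre`, `y_post`, `fit_pre` and `fit_post` are likewise only illustrative.

```r
# Hypothetical data: y is made up for illustration only.
x <- c(.01, .02, .01, .05, .03, .025)  # meeting_count from the question
y <- c(10, 14, 9, 30, 21, 95)          # invented revenue values
keep <- 1:4                            # observations retained after outlier removal

x_kept <- x[keep]
y_pre  <- as.numeric(scale(y))[keep]   # scale on all rows, then drop outliers
y_post <- as.numeric(scale(y[keep]))   # drop outliers, then scale

# Coefficient tables (Estimate, Std. Error, t value, Pr(>|t|))
fit_pre  <- summary(lm(y_pre  ~ x_kept))$coefficients
fit_post <- summary(lm(y_post ~ x_kept))$coefficients

# Compare the slope row and the intercept row across the two orderings
print(all.equal(fit_pre["x_kept", "t value"],  fit_post["x_kept", "t value"]))
print(all.equal(fit_pre["x_kept", "Pr(>|t|)"], fit_post["x_kept", "Pr(>|t|)"]))
print(fit_pre["(Intercept)", ])
print(fit_post["(Intercept)", ])
```

Both versions of the response are the same four values shifted by a constant and divided by a positive constant, so the slope estimate and its standard error are rescaled by the same factor. In this sketch, that leaves the slope's t value and p-value unchanged, while the intercept row differs because the two orderings subtract different means.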