Suppose I have a linear model Y = AX, and I estimate A from observed data. I know that correlation among my independent variables X will increase the uncertainty in the estimated coefficients A. How do I quantify that uncertainty?
To illustrate my question with a concrete example, I set up a dummy system in which y = a0 x0 + a1 x1 + e. The independent variables x0 and x1 are normally distributed with mean 50 and standard deviation 10, and the noise e is normally distributed with mean 0 and standard deviation 10. I generated 1000 sample points, modeled the system as y = a0 x0 + a1 x1, and solved for a0 and a1. Repeating this 10,000 times, I found

a0 = 1.000 ± 0.023
a1 = 1.000 ± 0.023

where ± is the standard deviation of the estimates across the 10,000 fits.
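In case it is useful, here is a minimal sketch of that simulation (NumPy). It assumes true coefficients a0 = a1 = 1 and an ordinary least-squares fit with no intercept; the seed and variable names are just illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n_points, n_trials = 1000, 10_000

    estimates = []
    for _ in range(n_trials):
        # Independent predictors and noise, as described above.
        x0 = rng.normal(50, 10, n_points)
        x1 = rng.normal(50, 10, n_points)
        e = rng.normal(0, 10, n_points)
        y = x0 + x1 + e  # assumed true coefficients: a0 = a1 = 1

        # Least-squares fit of y = a0*x0 + a1*x1 (no intercept).
        X = np.column_stack([x0, x1])
        a, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(a)

    estimates = np.array(estimates)
    print(estimates.mean(axis=0))  # roughly [1.000, 1.000]
    print(estimates.std(axis=0))   # roughly [0.023, 0.023]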
Then I repeated the experiment, but this time I engineered the data set so that x0 and x1 were highly correlated, with an r-squared of 0.9 between them. This time I found

a0 = 1.000 ± 0.100
a1 = 1.000 ± 0.100
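The only change in the second run is how x0 and x1 are generated. One way to hit the target correlation, assuming "r-squared of 0.9" means the squared Pearson correlation between the two predictors (so r = sqrt(0.9) ≈ 0.949), is to draw them jointly normal:

    import numpy as np

    rng = np.random.default_rng(0)
    n_points = 1000
    r = np.sqrt(0.9)  # correlation whose square is the quoted r-squared of 0.9

    # Jointly normal (x0, x1), each with mean 50 and sd 10, correlation r.
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], n_points)
    x0, x1 = 50 + 10 * z[:, 0], 50 + 10 * z[:, 1]

    print(np.corrcoef(x0, x1)[0, 1] ** 2)  # roughly 0.9

Feeding these predictors into the same fitting loop as above should reproduce the ±0.100 spread.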
Clearly, the correlation among the independent variables led to a substantial increase, roughly a factor of four, in the standard deviation of the coefficient estimates.
My question is: given a known correlation among my independent variables, how can I estimate the resulting uncertainty in my model coefficients? Or, to relate the question to my example, how could I have used knowledge of the 0.9 r-squared between x0 and x1 to predict that the uncertainty in a0 and a1 would increase from 0.023 to 0.100?