
Let's say you have $N$ random variables $Y_i$, where $Y_i = \beta_i X + \epsilon_i$. The $X$ values are the same for all $Y_i$, but the error terms $\epsilon_i$ have different variances. I estimate each $\beta_i$ with OLS to obtain $\beta_i^{est}$, each with standard error $SE_i$.

Now I want to estimate the weighted sum of the $Y_i$ at some new value of the independent variable, $X^{new}$: $\sum_i{w_iY_i}=\left(\sum_i{w_i\beta_i^{est}}\right) X^{new}$. What is the confidence interval around $\sum_i{w_i\beta_i^{est}}$?

rinspy
  • It seems that if each $\beta_i$ is the same and the error terms have the same variance, then the higher $N$ is, the smaller the confidence interval around the weighted sum should be. From some simulations, it seems like it should be $\sqrt{\sum_i{w^2_i SE^2_i}}$, but I am not sure exactly how to prove it. – rinspy Aug 20 '18 at 15:40
    It's easy to prove. That's just the formula for the standard error of a linear combination of random variables, following directly from basic properties of covariance. Of course the result isn't actually a confidence interval yet: you still have to multiply it by a suitable factor to create upper and lower limits. However, we're dancing around the question of why one wouldn't just regress $\sum w_iY_i$ against $X$ and get the answer directly, in a more useful form, in a way that accommodates possible correlations among the $\epsilon_i.$ – whuber Aug 20 '18 at 16:19
  • But of course: $$var(aX + bY) = \frac{\sum_i{(aX_i+bY_i-a\mu_x-b\mu_y)^2}}{N} = \frac{\sum_i{(a(X_i - \mu_x) +b(Y_i-\mu_y))^2}}{N} = a^2var(X) + b^2var(Y) + 2ab\,cov(X, Y)$$ Not sure why I didn't see it before! You are right about regressing the sum directly to take into account correlations among the error terms; it may make my actual problem more computationally intensive, but I should try it out (see the sketch after these comments). If you write it up as an answer I will gladly accept it. – rinspy Aug 21 '18 at 08:45
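
For what it's worth, here is a minimal numerical sketch (in Python, with made-up values for $N$, the sample size, the weights, and the error variances) of the two routes discussed in the comments: combining the individual standard errors as $\sqrt{\sum_i w_i^2 SE_i^2}$, and regressing $\sum_i w_i Y_i$ on $X$ directly. With independent errors the two standard errors should agree closely; with correlated errors only the direct regression accounts for the covariance terms.

```python
# Sketch only: illustrative numbers, no-intercept OLS done by hand with numpy.
import numpy as np

rng = np.random.default_rng(0)
n_obs, N = 200, 5                          # sample size and number of Y_i (assumed)
x = rng.uniform(1.0, 10.0, size=n_obs)     # shared X values
beta = rng.normal(2.0, 0.5, size=N)        # true slopes (assumed)
sigma = rng.uniform(0.5, 3.0, size=N)      # different error SDs (assumed, independent)
w = rng.uniform(0.0, 1.0, size=N)          # weights (assumed)

eps = rng.normal(0.0, sigma, size=(n_obs, N))   # column i has standard deviation sigma[i]
Y = x[:, None] * beta + eps                     # Y[:, i] = beta_i * x + eps_i

sxx = np.sum(x * x)

def ols_slope_se(y):
    """OLS through the origin: slope estimate and its standard error."""
    b = np.sum(x * y) / sxx
    resid = y - b * x
    s2 = np.sum(resid**2) / (len(y) - 1)        # residual variance (one parameter fitted)
    return b, np.sqrt(s2 / sxx)

beta_hat = np.empty(N)
se = np.empty(N)
for i in range(N):
    beta_hat[i], se[i] = ols_slope_se(Y[:, i])

# Route 1: combine the individual standard errors (valid for independent errors).
se_formula = np.sqrt(np.sum(w**2 * se**2))

# Route 2 (whuber's suggestion): regress the weighted sum of the responses directly.
b_direct, se_direct = ols_slope_se(Y @ w)

print("sum of w_i * beta_i^est:", np.sum(w * beta_hat), " direct slope:", b_direct)
print("SE via formula:", se_formula, "  SE via direct regression:", se_direct)

# Large-sample 95% interval around sum_i w_i beta_i^est (normal approximation):
est = float(np.sum(w * beta_hat))
print("approx. 95% CI:", (est - 1.96 * se_direct, est + 1.96 * se_direct))
```

The point estimates agree exactly because OLS is linear in the response; the two standard errors differ only through the empirical cross-correlations of the residuals, which is why the direct regression is the safer route when the $\epsilon_i$ may be correlated.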

1 Answer


As per @whuber, "It is easy to prove. That's just the formula for the standard error of a linear combination of random variables, following directly from basic properties of covariance. Of course the result isn't actually a confidence interval yet: you still have to multiply it by a suitable factor to create upper and lower limits."

Indeed:

$$var(aX + bY) = \frac{\sum_i{(aX_i+bY_i-a\mu_x-b\mu_y)^2}}{N} = \frac{\sum_i{(a(X_i - \mu_x) +b(Y_i-\mu_y))^2}}{N} = a^2var(X) + b^2var(Y) + 2ab\,cov(X, Y)$$
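
For completeness, a short numerical sketch of this identity (arbitrary choices of $a$, $b$, and the joint distribution of $X$ and $Y$; the sample version of the identity holds exactly, so the two sides match up to floating-point error):

```python
# Numerical check of var(aX + bY) = a^2 var(X) + b^2 var(Y) + 2ab cov(X, Y),
# using population-style (divide-by-N) moments to match the derivation above.
import numpy as np

rng = np.random.default_rng(1)
a, b = 0.7, -1.3                                   # arbitrary coefficients
cov_matrix = np.array([[2.0, 0.8],
                       [0.8, 1.5]])                # arbitrary covariance of (X, Y)
X, Y = rng.multivariate_normal([0.0, 0.0], cov_matrix, size=100_000).T

lhs = np.var(a * X + b * Y)                        # np.var uses 1/N by default
rhs = (a**2 * np.var(X) + b**2 * np.var(Y)
       + 2 * a * b * np.cov(X, Y, bias=True)[0, 1])
print(lhs, rhs)                                    # equal up to floating-point error
```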

rinspy