
Let's say you have $N$ random variables $Y_i$, where $Y_i = \beta_i X + \epsilon_i$. The $X$ values are the same for all $Y_i$, but the error terms $\epsilon_i$ have different variances. I estimate each $\beta_i$ with OLS to obtain $\beta_i^{est}$, each with standard error $SE_i$.

Now I want to estimate the weighted sum of the $Y_i$ at some new value of the independent variable, $X^{new}$: $\sum_i{w_iY_i}=\left(\sum_i{w_i\beta_i^{est}}\right) X^{new}$. What is the confidence interval around $\sum_i{w_i\beta_i^{est}}$?

rinspy
  • It seems that if each $\beta_i$ is the same and the error terms have the same variance, then the higher $N$ is, the smaller the confidence interval around the weighted sum should be. From some simulations, it seems like it should be $\sqrt{\sum_i{w^2_i SE^2_i}}$, but I am not sure exactly how to prove it. – rinspy Aug 20 '18 at 15:40
    It's easy to prove. That's just the formula for the standard error of a linear combination of random variables, following directly from basic properties of covariance. Of course the result isn't actually a confidence interval yet: you still have to multiply it by a suitable factor to create upper and lower limits. However, we're dancing around the question of why one wouldn't just regress $\sum w_iY_i$ against $X$ and get the answer directly, in a more useful form, in a way that accommodates possible correlations among the $\epsilon_i.$ – whuber Aug 20 '18 at 16:19
  • But of course: $$var(aX + bY) = \frac{\sum_i{(aX_i+bY_i-a\mu_x-b\mu_y)^2}}{N} = \frac{\sum_i{(a(X_i - \mu_x) +b(Y_i-\mu_y))^2}}{N} = a^2var(X) + b^2var(Y) + 2ab\,cov(X, Y)$$ Not sure why I didn't see it before! You are right about regressing the sum directly to take into account correlations among the error terms; it may make my actual problem more computationally intensive, but I should try it out (see the sketch after these comments). If you write it up as an answer I will gladly accept it. – rinspy Aug 21 '18 at 08:45
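
For what it's worth, here is a minimal numerical sketch (in Python, with made-up values for $N$, the sample size, the weights, and the error variances) of the two routes discussed in the comments: combining the individual standard errors as $\sqrt{\sum_i w_i^2 SE_i^2}$, and regressing $\sum_i w_i Y_i$ on $X$ directly. With independent errors the two standard errors should agree closely; with correlated errors only the direct regression accounts for the covariance terms.

```python
# Sketch only: illustrative numbers, no-intercept OLS done by hand with numpy.
import numpy as np

rng = np.random.default_rng(0)
n_obs, N = 200, 5                          # sample size and number of Y_i (assumed)
x = rng.uniform(1.0, 10.0, size=n_obs)     # shared X values
beta = rng.normal(2.0, 0.5, size=N)        # true slopes (assumed)
sigma = rng.uniform(0.5, 3.0, size=N)      # different error SDs (assumed, independent)
w = rng.uniform(0.0, 1.0, size=N)          # weights (assumed)

eps = rng.normal(0.0, sigma, size=(n_obs, N))   # column i has standard deviation sigma[i]
Y = x[:, None] * beta + eps                     # Y[:, i] = beta_i * x + eps_i

sxx = np.sum(x * x)

def ols_slope_se(y):
    """OLS through the origin: slope estimate and its standard error."""
    b = np.sum(x * y) / sxx
    resid = y - b * x
    s2 = np.sum(resid**2) / (len(y) - 1)        # residual variance (one parameter fitted)
    return b, np.sqrt(s2 / sxx)

beta_hat = np.empty(N)
se = np.empty(N)
for i in range(N):
    beta_hat[i], se[i] = ols_slope_se(Y[:, i])

# Route 1: combine the individual standard errors (valid for independent errors).
se_formula = np.sqrt(np.sum(w**2 * se**2))

# Route 2 (whuber's suggestion): regress the weighted sum of the responses directly.
b_direct, se_direct = ols_slope_se(Y @ w)

print("sum of w_i * beta_i^est:", np.sum(w * beta_hat), " direct slope:", b_direct)
print("SE via formula:", se_formula, "  SE via direct regression:", se_direct)

# Large-sample 95% interval around sum_i w_i beta_i^est (normal approximation):
est = float(np.sum(w * beta_hat))
print("approx. 95% CI:", (est - 1.96 * se_direct, est + 1.96 * se_direct))
```

The point estimates agree exactly because OLS is linear in the response; the two standard errors differ only through the empirical cross-correlations of the residuals, which is why the direct regression is the safer route when the $\epsilon_i$ may be correlated.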

1 Answer


As per @whuber, "It is easy to prove. That's just the formula for the standard error of a linear combination of random variables, following directly from basic properties of covariance. Of course the result isn't actually a confidence interval yet: you still have to multiply it by a suitable factor to create upper and lower limits."

Indeed:

$$var(aX + bY) = \frac{\sum_i{(aX_i+bY_i-a\mu_x-b\mu_y)^2}}{N} = \frac{\sum_i{(a(X_i - \mu_x) +b(Y_i-\mu_y))^2}}{N} = a^2var(X) + b^2var(Y) + 2ab\,cov(X, Y)$$
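
For completeness, a short numerical sketch of this identity (arbitrary choices of $a$, $b$, and the joint distribution of $X$ and $Y$; the sample version of the identity holds exactly, so the two sides match up to floating-point error):

```python
# Numerical check of var(aX + bY) = a^2 var(X) + b^2 var(Y) + 2ab cov(X, Y),
# using population-style (divide-by-N) moments to match the derivation above.
import numpy as np

rng = np.random.default_rng(1)
a, b = 0.7, -1.3                                   # arbitrary coefficients
cov_matrix = np.array([[2.0, 0.8],
                       [0.8, 1.5]])                # arbitrary covariance of (X, Y)
X, Y = rng.multivariate_normal([0.0, 0.0], cov_matrix, size=100_000).T

lhs = np.var(a * X + b * Y)                        # np.var uses 1/N by default
rhs = (a**2 * np.var(X) + b**2 * np.var(Y)
       + 2 * a * b * np.cov(X, Y, bias=True)[0, 1])
print(lhs, rhs)                                    # equal up to floating-point error
```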

rinspy