
I am doing some work to iteratively improve the mean performance of a process.

For each iteration $i$ I gather data on the performance $X$. At each iteration I have $10000$ samples, which give a sample mean $\bar{x}_i$ and a sample variance $s_i^2$. I only have these summary statistics, not the original data.

I want to test whether the means are improving over time, but a simple linear regression of $\bar{x}_i$ on $i$ seems crude because it ignores the information I have about the variability of each sample mean.

What would be a more sensible way to include the sample variances in my regression model?

Hugh
  • You might want to use weighted least squares regression. Weighting each sample mean by $1/s_i^2$ is the proper way of factoring in knowledge of the relative uncertainties of the sample means. There are other CV posts on this issue; you should search for weighted least squares to get an idea. For a standard code implementation (though slightly off topic), you can use the `metafor` package if you are using R (see the code sketch after these comments). – jwimberley Feb 06 '17 at 15:06
  • I found the posts I was thinking of (making this a possible duplicate): http://stats.stackexchange.com/questions/235693/linear-model-where-the-data-has-uncertainty-using-r (with an answer from myself) and http://stats.stackexchange.com/questions/113987/lm-weights-and-the-standard-error – jwimberley Feb 06 '17 at 15:08
  • Thank you @jwimberley, weighting by $1/s^2$ is a lot more elegant than the solution I had in mind (simulating data from the sample statistics). As an extension, if I had unequal sample sizes, would it be correct to weight by $n_i/s_i^2$? That might account for the accuracy of the sample statistics. – Hugh Feb 06 '17 at 17:26
  • Yes, it would. I actually assumed this is what you meant by $s_i$, i.e. that these were the sample standard errors of the mean, so I'm glad you caught that. – jwimberley Feb 06 '17 at 17:28
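
To make the suggestion in the comments concrete, below is a minimal sketch of the weighted least squares idea in R. Nothing here comes from the thread itself: the vectors `xbar`, `s2`, `n`, and `iters` stand in for the per-iteration sample means, sample variances, sample sizes, and iteration index (simulated here so the snippet runs), and the `metafor` line is only one possible formulation of the same model.

```r
## Minimal sketch (assumed example, not from the thread): weighted least
## squares of the iteration means on the iteration index, with weights
## n_i / s_i^2, i.e. the inverse squared standard errors of the means.
set.seed(1)
iters <- 1:20
n     <- rep(10000, length(iters))                   # equal sample sizes, as in the question
mu    <- 5 + 0.02 * iters                            # a slowly improving true mean (simulated)
xbar  <- rnorm(length(iters), mu, sqrt(4 / n))       # simulated per-iteration sample means
s2    <- 4 * rchisq(length(iters), n - 1) / (n - 1)  # simulated per-iteration sample variances

## With equal n_i this is proportional to weighting by 1 / s_i^2.
fit <- lm(xbar ~ iters, weights = n / s2)
summary(fit)   # the coefficient on `iters` estimates the per-iteration improvement

## One possible equivalent as a fixed-effects meta-regression with metafor:
## library(metafor)
## rma(yi = xbar, vi = s2 / n, mods = ~ iters, method = "FE")
```

Note that `lm()` treats the weights as relative, so only the ratios $n_i/s_i^2$ matter, whereas the meta-analytic formulation treats $s_i^2/n_i$ as the known sampling variance of each $\bar{x}_i$; both give the same slope estimate, but the standard errors are computed differently.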

0 Answers