I am writing some MapReduce code to compute an ordinary least squares (OLS) fit from a sample of data. I'd like to include standard errors, but I am running into a problem calculating the variance of the noise. According to my reference, the variance is calculated by:

$$\hat\sigma^2 = \frac{1}{n-p}\sum_{i=1}^{n}\left(y_i - x_i^T\hat\beta\right)^2$$

where $n$ is the number of observations and $p$ is the number of regressors.

However, I don't compute $\hat\beta$ until the end. Thus, this estimate requires the entire sample set, which would defeat the point of MapReduce. If I wanted the standard error of the mean of a sample, for example, I could simply use an online variance calculation and aggregate the value until the end. Are there any alternatives for the OLS case where I could somehow aggregate a value throughout the job and use it at the end?
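
To make the mean case concrete, here is a minimal sketch of the kind of aggregation I have in mind. It assumes each mapper summarises its partition as a `(count, mean, M2)` triple using Welford's update, and the reducer combines triples with the pairwise formula of Chan et al. The function names (`map_partition`, `merge`, `standard_error_of_mean`) and the toy partitions are purely illustrative and not tied to any MapReduce framework.

```python
from functools import reduce


def map_partition(values):
    """Map step: summarise one partition as (count, mean, M2) via Welford's update.

    M2 is the running sum of squared deviations from the current mean.
    """
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def merge(a, b):
    """Reduce step: combine two partial summaries (Chan et al. pairwise formula)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def standard_error_of_mean(summary):
    """Final step: sample variance is M2 / (n - 1); the standard error is sqrt(variance / n)."""
    n, _, m2 = summary
    variance = m2 / (n - 1)
    return (variance / n) ** 0.5


# Hypothetical data split across three mappers.
partitions = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
summary = reduce(merge, map(map_partition, partitions))
print(standard_error_of_mean(summary))
```

Each partition is read once and only the small triple is passed to the reducer. That is the property I'd like to reproduce for the OLS noise variance.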