I am not entirely sure how. I think you could work it through. Basically, propagation of error is the total derivative of your fit equation, with covariances, example. When you do your fit routine, with what ever method is appropriate to your problem, just get make sure to use one that gives you the standard deviations of all of the parameters. Then you can solve for any combination of parameters' variances that you want to. I did that adaptively in the Appendix section of this paper. That is, I not only solved the error propagation for my parameter of interest, but I optimized for it.
For your problem, for the total error of each of the dependent variables, you have a contribution from each of the error propagation terms. Now the added confusion of using a "population model," depends on exactly how you obtain it, and after that I am lost in what you are doing, exactly. However, I do not think that it changes the rules for error propagation, just what the errors themselves are.
I am going to take a guess as to what an average model is, and mention that if that means that we fit models separately, then average the results, that one might start with a mixture model, and fit that, as following that one might get a better fit, and, then calculation of $R^2$ is routine. For example, $s (p b_1 e^{-b_1 t} +(1-p) b_2 e^{-b_2 t})$ is a mixture model scaled by $s$ times a unit CDF($\infty$) consisting of the sum of two independent exponential distributions, that is one of the most commonly used pharamcokinetic models. If you are doing autocorrelation or some other averaging, please let me know and I will change my suggestion to match it, if I can.
Another possibility is to noise reduce the data using adaptive B-splines or locally adaptive Pixon. What should be done depends on what the purpose of obtaining an average is.