If I understand his point correctly, in answer to a previous question @StéphaneLaurent highlighted that the population variance explained in a linear regression (i.e., $\rho^2$) depends on whether you treat the predictors as fixed or random. From what I can tell, the literature refers to this distinction by different names, including fixed score versus random score regression (e.g., Smithson, 2001) and the "fixed-x assumption" (e.g., Aldrich, 2000).
Thus, in a fixed score regression there is a single $\rho^2$, which we might denote $\rho^2_f$. In a random score regression the $n \times p$ predictor data $X$, where $n$ is the sample size and $p$ is the number of predictors, is assumed to be drawn from a $p$-dimensional distribution. Thus, in the random score regression, there is a $\rho^2$ given $n$ and the sampled predictor values, which we can denote $\rho^2_i$. Finally, there is the variance explained were an infinite amount of data sampled both from the predictors and the outcome variable, which I'll denote $\rho^2_a$.
I assume that as sample size increases in a random score regression, the sampled predictor values will more closely match the underlying predictor distribution. As such, the variance of $\rho^2_i$ across different samples should get smaller. Presumably there is also a point where the variance of $\rho^2_i$ gets sufficiently small that, for practical purposes, the distinction between fixed score regression and random score regression becomes unimportant.
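To make this intuition concrete, here is a minimal simulation sketch of what I have in mind. It is only an illustration under assumptions of my own: the predictors are multivariate normal, the errors have variance `sigma2`, and $\rho^2_i$ is taken to be $\beta^\top S_X \beta / (\beta^\top S_X \beta + \sigma^2)$, where $S_X$ is the covariance matrix of the sampled predictor values. The coefficient vector, covariance matrix, and function names are all hypothetical choices, not taken from any of the references.

```python
import numpy as np

def conditional_rho2(X, beta, sigma2):
    """Variance explained when the sampled X is treated as fixed (assumed definition)."""
    Xc = X - X.mean(axis=0)               # centre the sampled predictors
    signal = np.mean((Xc @ beta) ** 2)    # beta' S_X beta, with divisor n
    return signal / (signal + sigma2)

def simulate_rho2_spread(n, beta, Sigma, sigma2, reps=2000, seed=0):
    """Mean and SD of rho^2_i over repeated draws of n predictor rows."""
    rng = np.random.default_rng(seed)
    p = len(beta)
    values = np.empty(reps)
    for r in range(reps):
        X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
        values[r] = conditional_rho2(X, beta, sigma2)
    return values.mean(), values.std(ddof=1)

# Hypothetical population: two correlated predictors, unit error variance.
beta = np.array([0.5, 0.3])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
sigma2 = 1.0

# Variance explained with infinite data (rho^2_a in the notation above).
signal_a = beta @ Sigma @ beta
rho2_a = signal_a / (signal_a + sigma2)

results = {n: simulate_rho2_spread(n, beta, Sigma, sigma2)
           for n in (20, 100, 500, 2500)}

print(f"rho2_a = {rho2_a:.4f}")
for n, (mean_i, sd_i) in results.items():
    print(f"n = {n:5d}  mean rho2_i = {mean_i:.4f}  sd rho2_i = {sd_i:.4f}")
```

Under these assumptions the standard deviation of $\rho^2_i$ should shrink as $n$ grows, with its mean approaching $\rho^2_a$. The sketch only looks at the population quantity conditional on the sampled predictors; it does not address the sampling variability of an estimate such as $R^2$.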
I also assume that confidence intervals around $\rho^2_a$ will be wider than those around $\rho^2_f$, because random score models carry an additional source of variability. Thus, I'm curious about how researchers interested in random score regression estimate this additional source of variability. I'm also interested in what sample size is required before the distinction is no longer practically important.
Questions
- How does sample size relate to the importance of the distinction between fixed score and random score regression?
- Is there a sample size at which the variance in $\rho^2$ estimation differs minimally across fixed score and random score regression?
- Are there any methods for estimating the additional source of variance related to random score regression in estimating $\rho^2$?
- Is there any published research on these topics?
References
- Aldrich, J. (2000). The origins of fixed X regression.
- Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61(4), 605-632.