I'm taking a class on survey sampling and I'm having trouble understanding the R implementation of simple random sampling (SRS). Please look at this piece of code:
```r
library(survey)
data(api)
N <- nrow(apipop)

# SRS design with a finite population correction
srs_design <- svydesign(id = ~1, fpc = ~fpc, data = apisrs)

# Estimated means of the scores and of their squares
a <- svymean(~api00 + api99 + I(api00^2) + I(api99^2), srs_design)
a
#         mean     SE
# api00 656.59 9.2497
# api99 624.68 9.5003

svycontrast(a, quote(api00 - api99))
#          nlcon     SE
# contrast  31.9 2.0905
```
I have no idea how the standard error of the estimate of `api00 - api99` is computed. In other words, I don't know which quantity `svycontrast(a, quote(api00 - api99))` is estimating: $\hat{\overline{api00}} - \hat{\overline{api99}}$ or $\hat{\overline{api00 - api99}}$?
Here is my effort in answering this question:
After some searching, I came up with an idea: `svycontrast` constructs a new random variable before computing the value. Say there are two random variables of interest, $X$ and $Y$. To estimate the mean of $X - Y$, it constructs a new random variable $Z = X - Y$ with $z_i = x_i - y_i$ for each observation, and then estimates the mean of $Z$ by averaging the $z_i$, which form an SRS sample of $Z$.
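One way I can think of to test this idea is to build the difference explicitly and compare it with the contrast. This is only a sketch: it assumes `svymean` accepts a derived variable wrapped in `I()` in its formula, in the same way as the `I(api00^2)` term above, and the names `means`, `contrast_est`, and `diff_est` are just my own labels.

```r
library(survey)
data(api)

# Same SRS design as in the code above
srs_design <- svydesign(id = ~1, fpc = ~fpc, data = apisrs)

# Contrast of the two estimated means
means <- svymean(~api00 + api99, srs_design)
contrast_est <- svycontrast(means, quote(api00 - api99))

# Estimated mean of the per-school difference z_i = api00_i - api99_i
diff_est <- svymean(~I(api00 - api99), srs_design)

print(contrast_est)
print(diff_est)
```

If the two calls agree in both the estimate and the standard error, the idea is at least consistent with the linear case.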
But if I'm right, why do the following two expressions give different results? Averaging $z_i = x_i^2 - y_i^2$ should give the same estimate either way, but the results differ.
```r
svycontrast(a, quote(api00^2 - api99^2))
#          nlcon     SE
# contrast 40872 2672.7

svycontrast(a, quote(`I(api00^2)` - `I(api99^2)`))
#          nlcon     SE
# contrast 39906 2589.1
```
In this case, which formula is being estimated?
- $\hat{\overline{api00}^2} - \hat{\overline{api99}^2}$, or
- $\hat{\overline{api00^2}} - \hat{\overline{api99^2}}$, or
- $\hat{\overline{api00^2 - api99^2}}$, or
- something else?
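The point estimates of the first three candidates can be computed by hand and compared with the two `nlcon` values. The sketch below assumes that, because this SRS design gives every school the same weight, the weighted means from `svymean` reduce to plain sample means. The variable names are my own, and it only checks point estimates, not standard errors.

```r
library(survey)
data(api)

x <- apisrs$api00
y <- apisrs$api99

# Assumption: equal weights under SRS, so plain means stand in for svymean

# Candidate 1: difference of the squared sample means
cand_sq_of_means <- mean(x)^2 - mean(y)^2

# Candidate 2: difference of the sample means of the squares
cand_means_of_sq <- mean(x^2) - mean(y^2)

# Candidate 3: sample mean of the per-school difference of squares
cand_mean_of_diff <- mean(x^2 - y^2)

print(c(sq_of_means  = cand_sq_of_means,
        means_of_sq  = cand_means_of_sq,
        mean_of_diff = cand_mean_of_diff))
```

By linearity of the mean, candidates 2 and 3 always give the same point estimate, so this comparison can only separate candidate 1 from the other two.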
To sum up, my question is: how does `svycontrast(stat, contrasts, ...)` estimate contrasts and their standard errors, and which formula is it estimating?