My dataset contains a set of samples from a set of normal RVs. Each RV is normally distributed with equal variances and varying means. However, I have only two samples from each RV.
How can I estimate the variance in this case?
These data can be described by two variables: a categorical variable $x$ that identifies each random variable, and $Y$, which gives an observation in the sample. Thus, in a tabular rendering of your dataset you would see two columns, one identifying the sample ($x$) and another for the result ($Y$), with two rows for each sample.
Your model allows the mean $\mu$ to vary with $x$:
$$Y(x) \sim \operatorname{Normal}(\mu(x), \sigma^2).$$
Equivalently,
$$Y(x) = \mu(x) + \varepsilon(x)$$
where the $\varepsilon(x)$ are independent and identically distributed Normal$(0,\sigma^2)$ variables. This is the standard regression setting.
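A minimal sketch of that regression setting, assuming NumPy: fit one indicator column per value of $x$ by ordinary least squares, then divide the residual sum of squares by the residual degrees of freedom (observations minus fitted means). The function name `ls_variance_estimate` and the simulated data (500 groups, $\sigma = 1$) are hypothetical illustrations, not part of the original answer.

```python
import numpy as np

def ls_variance_estimate(groups, y):
    """Residual-variance estimate from an OLS fit of y on one indicator
    column per group (hypothetical helper, one-way ANOVA design)."""
    groups = np.asarray(groups)
    y = np.asarray(y, dtype=float)
    levels, codes = np.unique(groups, return_inverse=True)
    n_obs, n_params = len(y), len(levels)
    # Design matrix: column j is 1 where the observation belongs to group j.
    X = np.zeros((n_obs, n_params))
    X[np.arange(n_obs), codes] = 1.0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    # Residual degrees of freedom: observations minus fitted group means.
    return rss / (n_obs - n_params)

# Hypothetical data: two draws per group, group means differ, sigma = 1.
rng = np.random.default_rng(0)
n = 500
mu = rng.uniform(-10, 10, size=n)
groups = np.repeat(np.arange(n), 2)
y = np.repeat(mu, 2) + rng.normal(0.0, 1.0, size=2 * n)
print(ls_variance_estimate(groups, y))  # should be close to 1.0
```

The fitted coefficients are just the per-group averages, so the group means never need to be known in advance.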
Label one observation of each random variable $x$ as $y_1(x)$ and the other as $y_2(x)$ (the choice is arbitrary). With $n$ random variables, the (unbiased) least squares estimate of $\sigma^2$ is
$$\hat\sigma^2 = \frac{1}{2n}\sum_{x} (y_1(x) - y_2(x))^2.$$
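One way to see where the divisor $2n$ comes from, with $n$ distinct values of $x$: the least-squares estimate of $\mu(x)$ is the pair average $\bar y(x)$, and each residual is half the pair difference,

$$\bar y(x) = \tfrac12\big(y_1(x)+y_2(x)\big), \qquad y_1(x)-\bar y(x) = -\big(y_2(x)-\bar y(x)\big) = \tfrac12\big(y_1(x)-y_2(x)\big),$$

so the residual sum of squares is

$$\mathrm{RSS} = \sum_x \Big[\big(y_1(x)-\bar y(x)\big)^2 + \big(y_2(x)-\bar y(x)\big)^2\Big] = \frac12\sum_x \big(y_1(x)-y_2(x)\big)^2.$$

The fit uses $2n$ observations to estimate $n$ means, leaving $2n-n=n$ residual degrees of freedom, hence

$$\hat\sigma^2 = \frac{\mathrm{RSS}}{n} = \frac{1}{2n}\sum_x \big(y_1(x)-y_2(x)\big)^2.$$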
In retrospect this is obvious: the $n$ differences $y_1(x)-y_2(x)$ are independent with Normal$(0,2\sigma^2)$ distributions, so each squared difference has expectation $2\sigma^2$.
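A quick simulation can check the unbiasedness claim, assuming NumPy. The helper names `pair_difference_variance` and `simulate`, the number of pairs, $\sigma$, and the seed are arbitrary illustrative choices; the means are drawn at random over a wide range to emphasize that they cancel in the differences.

```python
import numpy as np

def pair_difference_variance(y1, y2):
    """sigma^2 estimate from n independent pairs: sum((y1 - y2)**2) / (2n)."""
    d = np.asarray(y1, dtype=float) - np.asarray(y2, dtype=float)
    return np.sum(d ** 2) / (2 * d.size)

def simulate(n_pairs, sigma, n_reps, seed=0):
    """Average the estimate over many simulated datasets with arbitrary means."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_reps)
    for r in range(n_reps):
        mu = rng.uniform(-100, 100, size=n_pairs)  # means vary freely with x
        y1 = mu + rng.normal(0.0, sigma, size=n_pairs)
        y2 = mu + rng.normal(0.0, sigma, size=n_pairs)
        estimates[r] = pair_difference_variance(y1, y2)
    return estimates.mean()

print(simulate(n_pairs=5, sigma=2.0, n_reps=20000))  # should be near sigma**2 = 4.0
```

Even with only five pairs per dataset the average estimate sits near $\sigma^2$, although any single estimate is noisy because it rests on only $n$ degrees of freedom.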
Recall that (population) variance is a measure of variability around a (population) mean.
Your dataset contains two samples from each of several RVs, whose variances you know to be equal.
We must first estimate each RV's mean, and then use those means to estimate the variance.
The problem is, we have to 'spend' some observations to estimate the mean and then 'spend' further observations to estimate the sample variance.
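At minimum, you need one point to calculate an average and at least one more point to estimate the squared deviation of each point from that average. With exactly two points per RV, each pair spends one observation on its mean and leaves one degree of freedom for the variance.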
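The "spending" idea can be made concrete with a pooled within-group variance: each group's mean uses up one degree of freedom, so the total squared deviation is divided by (total observations minus number of groups). This is a sketch assuming NumPy and data supplied as a list of per-RV samples; the name `pooled_variance` and the three example pairs are hypothetical.

```python
import numpy as np

def pooled_variance(samples):
    """Pooled within-group variance: squared deviations from each group's
    own mean, divided by (total observations - number of groups)."""
    ss = 0.0
    n_obs = 0
    for s in samples:
        s = np.asarray(s, dtype=float)
        ss += np.sum((s - s.mean()) ** 2)  # deviations from this group's own mean
        n_obs += s.size
    # One degree of freedom is spent on each group's mean.
    return ss / (n_obs - len(samples))

pairs = [(1.0, 3.0), (10.0, 14.0), (-5.0, -4.0)]
print(pooled_variance(pairs))  # → 3.5
```

For pairs this matches $\frac{1}{2n}\sum_x (y_1(x)-y_2(x))^2$: here $(4+16+1)/6 = 3.5$. The same function also handles groups of unequal size.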