I computed a mean $\mu_1$ and standard deviation $\sigma_1$ from an initial batch of samples drawn from an unknown distribution, and then discarded those samples. I then drew a new batch, computed its mean $\mu_2$ and standard deviation $\sigma_2$, and kept this batch. How can I get the standard deviation for the two batches combined?

My idea is $(\sigma_1+\sigma_2)/2$ but I think this result is biased.

Another idea is to set $\mu_3=(\mu_1+\mu_2)/2$ and reconstruct the previous samples as an array in which every item equals $\mu_1$. I would then combine this array with the new samples and use $\mu_3$ to compute $\sigma_3$.

Update: to clarify my question

I want to estimate the mean and variance of an unknown distribution. Because of memory cost, I can only hold one batch of samples at a time: I sample a batch, compute its mean and variance, and then drop the samples. I then sample a new batch from the same distribution and compute its mean and variance. Before dropping the new batch, I want to estimate the mean and variance of the distribution from the current batch's mean and variance together with those from the previous step.

GoingMyWay
  • Do you have sample size? $\Sigma$ is just std or variance-covariance matrix? – user158565 Aug 08 '19 at 04:00
  • @user158565 I have the sample size. $\Sigma$ is std. – GoingMyWay Aug 08 '19 at 05:09
  • Perhaps I don't understand what you're trying to do, but why can't you just find $\bar{x}$ and $s^2$ for the second sample? – Dave Aug 08 '19 at 14:42
  • Isn't the "new std" $\sigma_2$ by definition? If you dropped samples, surely that means you don't want to include them in your estimates, right? If all you want to do is *combine* the two sample sets, then your question is answered at https://stats.stackexchange.com/questions/51622, https://stats.stackexchange.com/questions/43159, https://stats.stackexchange.com/questions/30495, and other places. – whuber Aug 08 '19 at 15:07
  • @Dave I have to combine the first and the second round results. – GoingMyWay Aug 09 '19 at 06:33
  • @whuber Hi I updated my question. – GoingMyWay Aug 09 '19 at 06:42

1 Answer

Let $\bar X$ and $S$ be the sample mean and sample standard deviation. (In statistics, $\mu$ and $\sigma$ are used for population parameters.)

Your problem can be solved using two formulas:

$$\bar X = \frac {\sum_{i=1}^n X_i}n$$ $$S=\sqrt{\frac {\sum_{i=1}^nX_i^2-\frac {(\sum_{i=1}^nX_i)^2} n}{n-1}}$$

Applying these formulas to each sample's $n$, $\bar X$ and $S$, you can recover that sample's sums: $\sum_{i=1}^n X_i = n\bar X$ and $\sum_{i=1}^n X_i^2 = (n-1)S^2 + n\bar X^2$. Adding the two samples' sums together gives $\sum X_i$ and $\sum X_i^2$ over the combined data, with $n = n_1 + n_2$. Applying the two formulas again to these combined sums gives the combined $\bar X$ and $S$, which is what you want.
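The procedure can be sketched in Python as follows. This is only an illustration: the function names `sums_from_stats` and `combine_stats` are made up for this example, and it assumes each standard deviation was computed with the $n-1$ denominator, as in the formula for $S$.

```python
import math


def sums_from_stats(n, mean, std):
    """Recover sum(x) and sum(x^2) from n, the sample mean, and the
    sample std (assumed to use the n-1 denominator)."""
    total = n * mean
    total_sq = (n - 1) * std ** 2 + n * mean ** 2
    return total, total_sq


def combine_stats(n1, mean1, std1, n2, mean2, std2):
    """Combine two (n, mean, std) summaries into one for the pooled data."""
    t1, q1 = sums_from_stats(n1, mean1, std1)
    t2, q2 = sums_from_stats(n2, mean2, std2)

    n = n1 + n2
    total = t1 + t2
    total_sq = q1 + q2

    mean = total / n
    var = (total_sq - total ** 2 / n) / (n - 1)
    # Guard against a tiny negative value caused by floating-point rounding.
    return n, mean, math.sqrt(max(var, 0.0))


# Example: batch 1 is [1, 2, 3], batch 2 is [4, 5, 6, 7].
n, mean, std = combine_stats(3, 2.0, 1.0, 4, 5.5, math.sqrt(5 / 3))
print(n, mean, std)
```

The result `(n, mean, std)` can be passed back in as the "previous" summary when the next batch arrives, so only three numbers need to be kept between batches. Note that the sum-of-squares formula can lose precision when the mean is large relative to the standard deviation.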

user158565