
Suppose we have two independent normal random variables $A \sim \mathcal{N}(\mu_a, {\sigma_a}^2)$ and $B \sim \mathcal{N}(\mu_b, {\sigma_b}^2)$.

Suppose that we have a sample $X$, which is drawn from $A$.

Now, suppose that we replace $p\%$ of the values in $X$ with data from $B$, and call the result $X'$.

What will be the change in mean and standard deviation going from $X$ to $X'$?

Tom Hosker
  • How do you select the $p\%$ of values? Do they lie in some predetermined region, or satisfy some equation, or perhaps are they random? And are you asking about *expected* changes in mean and SD or *actual* changes in a dataset? – whuber Jun 05 '19 at 15:21
  • If you select the $p$% at random, then you have a [mixture distribution](https://en.wikipedia.org/wiki/Mixture_distribution). – BruceET Jun 05 '19 at 16:10
  • @BruceET Almost, but not quite: in a sample of $n$ iid observations from a mixture distribution, the proportion of values from $B$ will be Binomial$(n,p)$ rather than exactly $p.$ This will affect the variance. – whuber Jun 05 '19 at 16:21
  • @whuber "How do you select the $p\%$ values?" Suppose the data in $X$ comes in the form: [$d_1$, $d_2$, ...]. We chop off the last so many data-points, and replace them with data drawn (in a random fashion) from $B$. The order of the data in $X$ ought not to matter. So I suppose, in effect, we're selecting the data to be replaced at random? – Tom Hosker Jun 06 '19 at 17:03
  • Yes: if the order does not matter, that's tantamount to randomly replacing a fixed number of the values. But are you asking about the *expected changes* in these *random variables* or are you asking about the *actual changes* in mean and SD resulting from the actual changes made to the *data*? – whuber Jun 06 '19 at 17:17
  • @whuber "And are you asking about expected changes in mean and SD or actual changes in a dataset?" I'd like a confidence interval for the new mean and standard deviation, if that's possible. – Tom Hosker Jun 06 '19 at 17:22
  • @whuber Sorry. I ought to clarify a bit more. I want to know what the mean and SD of $X'$ are, where $X'$ is the modified *sample*. – Tom Hosker Jun 06 '19 at 17:24
  • Two applications of the general technique described at https://stats.stackexchange.com/a/51927/919 will do it: view the process in terms of three disjoint sets of data: the common set; the set removed; and the set added. – whuber Jun 06 '19 at 18:11
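
The three-set view in the last comment can be sketched numerically. The following is only an illustration under stated assumptions, not the method from the linked answer verbatim: the helper names (`summary`, `combine`, `remove`), the parameter values, and the sample size are all hypothetical, and the pooling rule used is the standard identity for merging the count, mean, and sum of squared deviations of two disjoint groups. The sketch removes the last $k$ values of $X$ (the set removed), then adds $k$ draws from $B$ (the set added), and checks the result against a direct computation on $X'$.

```python
import numpy as np

def summary(x):
    """Return (count, mean, sum of squared deviations from the mean)."""
    x = np.asarray(x, dtype=float)
    return len(x), x.mean(), ((x - x.mean()) ** 2).sum()

def combine(n1, m1, ss1, n2, m2, ss2):
    """Pool the summaries of two disjoint groups."""
    n = n1 + n2
    delta = m2 - m1
    mean = m1 + delta * n2 / n
    ss = ss1 + ss2 + delta ** 2 * n1 * n2 / n
    return n, mean, ss

def remove(n, m, ss, n2, m2, ss2):
    """Inverse of combine: take group 2 back out of a pooled group."""
    n1 = n - n2
    m1 = (n * m - n2 * m2) / n1
    delta = m2 - m1
    ss1 = ss - ss2 - delta ** 2 * n1 * n2 / n
    return n1, m1, ss1

# Hypothetical parameters, chosen only for illustration.
mu_a, sigma_a = 0.0, 1.0
mu_b, sigma_b = 3.0, 2.0
n, p = 1000, 20                      # sample size and percentage replaced
k = round(n * p / 100)               # fixed number of replaced values

rng = np.random.default_rng(0)
x = rng.normal(mu_a, sigma_a, n)     # X, drawn from A
removed = x[-k:]                     # the set removed
added = rng.normal(mu_b, sigma_b, k) # the set added, drawn from B
x_new = np.concatenate([x[:-k], added])  # X'

# First application: X minus the removed set gives the common set.
common = remove(*summary(x), *summary(removed))
# Second application: common set plus the added set gives X'.
n_new, new_mean, new_ss = combine(*common, *summary(added))
new_sd = np.sqrt(new_ss / (n_new - 1))   # sample SD (ddof=1)

print("change in mean:", new_mean - x.mean())
print("change in SD:  ", new_sd - x.std(ddof=1))
print("matches direct computation:",
      np.isclose(new_mean, x_new.mean()),
      np.isclose(new_sd, x_new.std(ddof=1)))
```

Because only counts, means, and sums of squares enter the update, the same two steps work when just the summary statistics of the three sets are known rather than the raw data.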

0 Answers