I do not have access to the original data set.
AB (overall set): mean = 149.41, sd = 89.13, N = 2284
B (subset): mean = 110.98, sd = 73.53, N = 917
I need to determine the mean and the standard deviation (or variance) of the set A, which together with B makes up AB.
Given the means and sizes N of AB and B, the mean of A follows from the difference of the group totals divided by the number of remaining items:
mean(A) = (149.41*2284 - 110.98*917)/(2284 - 917) = 175.19
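The same arithmetic as a small R helper, shown only as a sketch. The name `meanOfComplement` is hypothetical, and the function assumes that B is a subset of AB, so that N_A = N_AB - N_B.

```r
# Hypothetical helper: mean of the complement A = AB \ B from summary statistics.
# Assumes B is a subset of AB, so n_A = n_AB - n_B.
meanOfComplement <- function(nAB, mAB, nB, mB) {
  # total of AB minus total of B, divided by the number of items left in A
  (nAB * mAB - nB * mB) / (nAB - nB)
}

mA <- meanOfComplement(nAB = 2284, mAB = 149.41, nB = 917, mB = 110.98)
round(mA, 2)  # 175.19
```

Because only totals are involved, this step is exact; any error in the result comes from the rounding of the published means.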
Is it possible to determine the standard deviation (or variance) of A = AB - B from these limited data?
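Update: @WHuber pointed me to @Ben's post https://stats.stackexchange.com/a/384951/70282, which gives the variance of the union of two groups with sizes $n_1, n_2$, means $m_1, m_2$ and standard deviations $s_1, s_2$:
$$s^2 = \frac{1}{n_1+n_2-1}\left[(n_1-1)s_1^2 + (n_2-1)s_2^2 + \frac{n_1 n_2}{n_1+n_2}(m_1-m_2)^2\right]$$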
I converted that to R, tested it and indeed it works.
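```r
# SD of the union of two groups, from each group's size n, mean m and SD s
pooledSD <- function(n1, n2, m1, m2, s1, s2) {
  sqrt(1 / (n1 + n2 - 1) *
       ((n1 - 1) * s1^2 + (n2 - 1) * s2^2 + (n1 * n2) / (n1 + n2) * (m1 - m2)^2))
}
```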
Testing this on synthetic data confirms that it gives the SD of the union of the two sets.
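A minimal sketch of such a check, assuming normally distributed synthetic data. The sample sizes mirror the question, but the means, SDs and seed are arbitrary choices. The SD of the union computed from the two groups' summaries with `pooledSD` is compared against `sd()` applied to the concatenated data.

```r
# pooledSD as above: SD of the union of two groups from their summaries
pooledSD <- function(n1, n2, m1, m2, s1, s2) {
  sqrt(1 / (n1 + n2 - 1) *
       ((n1 - 1) * s1^2 + (n2 - 1) * s2^2 + (n1 * n2) / (n1 + n2) * (m1 - m2)^2))
}

set.seed(1)                            # arbitrary seed, for reproducibility only
x <- rnorm(1367, mean = 175, sd = 90)  # synthetic stand-in for A
y <- rnorm(917,  mean = 111, sd = 74)  # synthetic stand-in for B

fromSummaries <- pooledSD(length(x), length(y), mean(x), mean(y), sd(x), sd(y))
direct        <- sd(c(x, y))
all.equal(fromSummaries, direct)  # TRUE: the identity is exact up to floating point
```

The identity holds for any data, not just normal samples, because it only rearranges sums of squares using the same n - 1 denominator that `sd()` uses.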
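Taking group 1 to be A (with $n_1 = N_{AB} - N_B$ and $m_1$ = mean(A) from above), group 2 to be B, and $s$ to be the SD of AB, and solving for $s_1^2$, I get:
$$s_1^2 = \frac{(n_1+n_2-1)s^2 - (n_2-1)s_2^2 - \frac{n_1 n_2}{n_1+n_2}(m_1-m_2)^2}{n_1-1}$$
I tested this and it works!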
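A sketch of the rearranged formula as an R function, `sdOfComplement` (a hypothetical name), applied to the summaries from the question and then checked on synthetic data. Since the published summaries are rounded to two decimals, the value for A is only approximate. The guard against a negative sum of squares is an added assumption: inconsistent or heavily rounded inputs could otherwise yield `NaN`.

```r
# Hypothetical helper: SD of the complement A = AB \ B, obtained by solving
# the pooled-variance identity for s_A^2. Assumes B is a subset of AB.
sdOfComplement <- function(nAB, mAB, sAB, nB, mB, sB) {
  nA <- nAB - nB
  mA <- (nAB * mAB - nB * mB) / nA                  # mean of A from group totals
  ssA <- (nAB - 1) * sAB^2 - (nB - 1) * sB^2 -
         (nA * nB) / nAB * (mA - mB)^2              # (n_A - 1) * s_A^2
  if (ssA < 0) stop("inconsistent summaries: implied variance of A is negative")
  sqrt(ssA / (nA - 1))
}

# Summaries from the question
sA <- sdOfComplement(nAB = 2284, mAB = 149.41, sAB = 89.13,
                     nB = 917, mB = 110.98, sB = 73.53)
sA  # roughly 89.4

# Round-trip check on synthetic (arbitrary, skewed) data
set.seed(2)
a  <- rexp(500, rate = 1 / 150)
b  <- rexp(300, rate = 1 / 100)
ab <- c(a, b)
recovered <- sdOfComplement(length(ab), mean(ab), sd(ab), length(b), mean(b), sd(b))
all.equal(recovered, sd(a))  # TRUE
```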
P.S. I appreciate the additional background that Ben gives below.