I have data in the "dat" data frame and I am looking to report the weighted mean along with some information on the variation of that mean.
As a toy example, the data are in the "value" column and the weight of each data point is in the "weight" column:

    dat = data.frame(value = c(1,2,3,4,5,6,7,8,9), weight = c(200,2,3,4,5,6,7,8,9))
    dat
The weighted mean and sd are 1.98 and 2.28:

    library(Hmisc)
    mu = wtd.mean(dat$value, dat$weight)
    sd = sqrt(wtd.var(dat$value, dat$weight))
    mu
    sd
    > mu
    [1] 1.983607
    > sd
    [1] 2.280653
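For reference, the two Hmisc results above can be reproduced in base R. This is a minimal sketch that assumes the weights are treated as frequency (repeat-count) weights, which is what the default `normwt = FALSE` setting of `wtd.var` implies. The names `x`, `w`, `mu_manual`, `var_manual`, and `sd_manual` are introduced here for illustration only.

```r
# Toy data from the question
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
w <- c(200, 2, 3, 4, 5, 6, 7, 8, 9)

# Weighted mean: sum of weight * value divided by the total weight
mu_manual <- sum(w * x) / sum(w)

# Weighted variance with a (sum(w) - 1) denominator
# (assumption: weights act as frequency weights)
var_manual <- sum(w * (x - mu_manual)^2) / (sum(w) - 1)
sd_manual <- sqrt(var_manual)

mu_manual
sd_manual
```

Under this assumption the effective sample size in the variance is the total weight (244), not the number of rows (9).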
The weighted 95% confidence interval is 0.49 to 3.47:

    upperConfidenceInterval = mu + 1.96*(sd/sqrt(9))
    lowerConfidenceInterval = mu - 1.96*(sd/sqrt(9))
    upperConfidenceInterval
    lowerConfidenceInterval
    > upperConfidenceInterval
    [1] 3.473633
    > lowerConfidenceInterval
    [1] 0.4935802
BUT the data in this toy example are not normally distributed, and neither are the data in my real data set.
**SO when it comes to providing information on the variation of the data, do the weighted sd and confidence interval make sense? OR can I use Chebyshev's inequality with k = 2, as computed below, to say that at least 75% of the distribution is between 0.46 and 3.5?**

    upperConfidenceInterval = mu + 2*(sd/sqrt(9))
    lowerConfidenceInterval = mu - 2*(sd/sqrt(9))
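To make the arithmetic behind that question explicit: Chebyshev's inequality says that, for any distribution with finite variance, at least 1 - 1/k^2 of the probability mass lies within k standard deviations of the mean, which gives 75% for k = 2. The sketch below hard-codes the weighted mean and sd reported earlier. Because the inequality is stated in terms of the standard deviation of the variable itself, the code prints both the sd-based bounds and the sd/sqrt(9) bounds used in the question, so the two can be compared. The variable names are illustrative.

```r
mu_w <- 1.983607   # weighted mean reported in the question
sd_w <- 2.280653   # weighted sd reported in the question
n <- 9             # number of rows in the toy data
k <- 2

# Chebyshev lower bound on the share of the distribution within k sd
coverage <- 1 - 1 / k^2

# Bounds using the sd itself
cheb_sd <- c(lower = mu_w - k * sd_w, upper = mu_w + k * sd_w)

# Bounds using sd / sqrt(n), as in the question
cheb_se <- c(lower = mu_w - k * sd_w / sqrt(n),
             upper = mu_w + k * sd_w / sqrt(n))

coverage
cheb_sd
cheb_se
```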
Since the data are not normal, if I don't use Chebyshev's inequality, can I use the 1st and 3rd quartiles to give a measure of spread?
Some say to report the 1st and 3rd quartiles. The 1st and 3rd quartiles of the UNWEIGHTED data are 3 and 7, but the mean of the WEIGHTED data was 1.98, which is outside that range, so using the UNWEIGHTED 1st and 3rd quartiles doesn't seem to make sense:

    quantile(dat$value)[2] # 1st quartile
    quantile(dat$value)[4] # 3rd quartile
The WEIGHTED 1st and 3rd quartiles are 0.07 and 0.26, and again the weighted mean is not between the WEIGHTED quartiles:

    quantile( dat$value *(dat$weight)/sum(dat$weight) )[2] # 1st quartile
    quantile( dat$value *(dat$weight)/sum(dat$weight) )[4] # 3rd quartile
    > quantile( dat$value *(dat$weight)/sum(dat$weight) )[2] # 1st quartile
    25%
    0.06557377
    > quantile( dat$value *(dat$weight)/sum(dat$weight) )[4] # 3rd quartile
    75%
    0.2622951
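One caveat on the computation above: `dat$value * dat$weight / sum(dat$weight)` gives each row's contribution to the weighted mean, so its quantiles are not quantiles of the weighted distribution of `value`. A minimal sketch of one common definition, the inverse of the weighted empirical CDF, is shown below. `weighted_quantile` is a hypothetical helper written for this example, not a library function; Hmisc also offers `wtd.quantile`, which may interpolate differently.

```r
# Hypothetical helper: smallest value whose cumulative weight share reaches p
weighted_quantile <- function(x, w, probs) {
  ord <- order(x)                    # sort values and carry weights along
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)    # weighted empirical CDF at each value
  sapply(probs, function(p) x[which(cum_share >= p)[1]])
}

# Toy data from the question
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
w <- c(200, 2, 3, 4, 5, 6, 7, 8, 9)

q <- weighted_quantile(x, w, c(0.25, 0.75))
q
```

With the toy data, the value 1 carries 200 of the 244 total weight units, so both weighted quartiles come out as 1 under this definition.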
Since the quantiles don't make sense, I am thinking of using the weighted standard deviation and Chebyshev's inequality to say that at least 75% of the distribution is between 0.46 and 3.5. Do you agree?