
These are the values (one row per group, matching the groups listed below):

18.2  17.6
18.9  18.6
20  19.7  19.8  19.6
21.4  20.6  21
21.9  22.1  22.2  22.2  21.6
23.1  22.6  23.2  23  22.9  23.2  23.2  23  22.5  22.5
23.8  24.1  23.8  23.6  24.3  23.7  24  24.2  24.4  23.8  23.5
24.7  24.9  25  25.4  25.1  25.3  25.3  25.2  25.2  24.7  24.6  24.5  24.5
25.6  26  25.6  25.6  25.6  25.7  25.6  26.3  26.3  26.2  26  25.9  26.2  26
27  27.2  27  27.2  26.7  27  26.6  27.3  26.8  26.6  26.9  27  26.5
28.2  27.7  28.1  27.9  27.6  27.7  28.1  28.2  27.5
28.6  29.1  28.7  29.1
30.3  29.6  29.7  29.5
30.9  31  31  30.5
32.3  31.6


These are the groups:

 - 17.5–18.5
 - 18.5–19.5
 - 19.5–20.5
 - 20.5–21.5
 - 21.5–22.5
 - 22.5–23.5
 - 23.5–24.5
 - 24.5–25.5
 - 25.5–26.5
 - 26.5–27.5
 - 27.5–28.5
 - 28.5–29.5
 - 29.5–30.5
 - 30.5–31.5
 - 31.5–32.5

I'm supposed to calculate the standard deviation and the expected value ... How is this different from having only 15 values that are not divided into groups? If I only have 15 values, I know how to calculate the stand.deviation and the exp.value: I use the well-known formulas:

Exp.value: $\frac{1}{n}\sum_i x_i$, where $x_i$ = all the 15 values

Stand.deviation: $\frac{1}{n-1}\sum_i (x_i - u)^2$, where $x_i$ = all the 15 values and $u$ = exp.value
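
For concreteness, here is a minimal Python sketch of those two formulas (the names are just illustrative; the square root is taken at the end so that the second function returns a standard deviation rather than a variance):

```python
import math

def exp_value(values):
    """Expected value (arithmetic mean): (1/n) * sum(x_i)."""
    return sum(values) / len(values)

def stand_deviation(values):
    """Sample standard deviation: sqrt((1/(n-1)) * sum((x_i - u)^2)), where u is the mean."""
    u = exp_value(values)
    return math.sqrt(sum((x - u) ** 2 for x in values) / (len(values) - 1))
```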
David LeBauer
user1111261
    is this homework? What is the rationale behind the grouping? What is the application / context? – David LeBauer May 25 '12 at 20:20
    (1) Note that the "Stand.deviation" formula actually is for an *estimator* of a *variance* and that both formulas incorrectly refer to 15 values, whereas 100 values are in this dataset. (The use of "exp.value" in place of "mean" suggests these data should be treated as a *population,* whence the `n-1` probably should be replaced by `n`.) (2) You ask about the difference between treating these data as 100 values and 15 values. Why don't you just compute the formulas you think should apply and compare what you get? – whuber May 25 '12 at 20:37
  • I have no idea what they mean by that... Presumably, the exp.value is going to be calculated the same way as in the case of 15 values. However, the stand.deviation should probably be calculated for each group separately, and then the overall stand.deviation obtained from those per-group values. [The results are: exp.value: 25.37 and stand.deviation: 3.103] – user1111261 May 25 '12 at 20:39
    My guess is that they are referring to grouped means and grouped standard deviations. A question like that came up once recently, either here or on the mathematics site. The grouped mean is obtained by taking a weighted average of the bin midpoints, where the weight is the number of cases falling in the bin. The same goes for the variance, except the squared differences are obtained by squaring the difference between the bin midpoint and the weighted (grouped) mean obtained previously. I don't see where the normal distribution assumption comes into play. – Michael R. Chernick May 25 '12 at 21:13
  • You're probably thinking of http://stats.stackexchange.com/questions/18797/, @Michael. A related question is http://stats.stackexchange.com/questions/10433/. – whuber May 25 '12 at 21:15
  • @whuber Those are both appropriate references but not the question I saw. I will look for my reference. – Michael R. Chernick May 25 '12 at 21:23
    If you have the actual values, why not ignore the categories and just calculate the mean and standard deviation? Using categories makes the calculation less precise. Or is this a homework exercise to help you understand that? – Joel W. May 25 '12 at 21:58
  • @JoelW That seems like the right thing to do, but the question was phrased in a way that made me think he was looking for something else, based on the reference he was reading. He could do it either way with his data, but your suggestion would give a more accurate answer. – Michael R. Chernick May 25 '12 at 23:14
    @whuber my source was math.stackexchange.com/questions/148629/calculating-expected-value-standard-deviation-when-i-have-frequencies-and-intervals-in-percentage or math.stackexchange.com/questions/148629 for short – Michael R. Chernick May 25 '12 at 23:16
  • I think I finally understood the difference the grouping makes. The expected value is not calculated as an arithmetic mean of the values but I have to find the midpoint of each group and then the weighted average of all the midpoints will give me the expected value. However, the calculation of the stand.deviation is no different... – user1111261 May 26 '12 at 09:08

1 Answer


The statement is confusing, but I guess:

Given the 100 sample values, you know the common (though not the only) way to estimate the mean and standard deviation:

$$ \bar{x}= \frac{1}{100} \sum_{i=1}^{100} x_i=25.175 , \hspace{2cm} \hat{\sigma} = \sqrt{\frac{1}{99} \sum_{i=1}^{100} (x_i - \bar{x})^2 } = 3.47529 $$

But suppose, instead, that you have grouped your data (for example, to compute a histogram), assuming intervals centered at the representative points $(18, 19, 20, \cdots, 31, 32)$. That means that, for example, the data point $x_i = 29.6$ will be counted as belonging to the interval $(29.5, 30.5)$ (representative value = 30), and that is the only information you retain.
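
Under this grouping (intervals centered at the integers, with a boundary value such as 22.5 assigned to the upper interval, as in the question's data), the binning step is just rounding to the nearest representative point; a minimal Python sketch:

```python
import math

def representative(x):
    """Centre of the interval that x falls into, e.g. 29.6 -> 30.
    A boundary value such as 22.5 goes to the upper interval (-> 23)."""
    return math.floor(x + 0.5)
```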

Then, the "new" (approximate) mean and standard deviation are computed only from the counts of how many values fall in each group, which is equivalent to computing the above but with each value replaced by its representative value (the center of its interval):

$$ \bar{x}= \frac{1}{100} \sum_{i=1}^{100} y_i= \frac{1}{100} \sum_{g=1}^{15} n_g \, y_g = 25.37 $$

$$ \hat{\sigma} = \sqrt{\frac{1}{99} \sum_{g=1}^{15} n_g (y_g - \bar{x})^2 } = 3.1031 $$

Here, $n_g$ is the number of values that fall in each interval, and $y_g$ is the central point of that interval.
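
A rough Python sketch of this grouped calculation (the counts used below are my own tally of the rows of data in the question; they sum to 100):

```python
import math

# Interval midpoints y_g and the number of values n_g that fall in each interval
# (tallied from the rows of data in the question).
midpoints = list(range(18, 33))                    # 18, 19, ..., 32
counts    = [2, 2, 4, 3, 5, 10, 11, 13, 14, 13, 9, 4, 4, 4, 2]

n = sum(counts)                                    # 100

# Grouped mean: weighted average of the interval midpoints.
grouped_mean = sum(n_g * y_g for n_g, y_g in zip(counts, midpoints)) / n

# Grouped standard deviation: same formula as before, with each x_i replaced
# by the midpoint of its interval.
grouped_sd = math.sqrt(
    sum(n_g * (y_g - grouped_mean) ** 2 for n_g, y_g in zip(counts, midpoints)) / (n - 1)
)

print(grouped_mean, grouped_sd)                    # roughly 25.37 and 3.10
```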

Of course, this is less precise, because you are losing information - it would hardly make sense to compute this instead of the former, but I guess this is an exercise to understand the difference.

leonbloy