I'm trying to put together descriptive statistics for a set of differently sized groups of students. For each group, let's say I know the number of students who are left-handed. I'd like to formalize what it means for a group to have a "higher than average" percentage of left-handedness.
What's a statistically sound way to define mean and standard deviation for this type of data?
Let's say the groups are A, B, and C (with 30, 50, and 100 students), and the percentages of left-handedness are 5%, 25%, and 55%. The groups represent the schools the students attend, and I'd like descriptive data about each school's tendency toward left-handedness (e.g. school C is 1.5 standard deviations above the mean in its rate of left-handedness). If I combine all the groups, I can calculate the overall rate of left-handedness, but how do I then get something like a standard deviation across the groups?
Would it make sense to do a weighted average and weighted std calculation (using student group size for the weighting)? Or does it make more sense to take each percentage as its own data point and do a non-weighted mean and std? Is the overall rate of left-handedness (across all student groups) in this case the same thing as a weighted average?
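To make the two options concrete, here is a minimal sketch of what I have in mind, using the toy numbers for schools A, B, and C. The function names are my own, and I'm assuming a population-style (divide by total weight) weighted standard deviation, which may not be the right choice. Note that 5% of 30 is 1.5 students, so the toy numbers are illustrative only.

```python
import math

# Toy data from the example: group sizes and left-handedness rates.
sizes = [30, 50, 100]
rates = [0.05, 0.25, 0.55]


def weighted_mean(values, weights):
    """Mean of values, with each value weighted by its group size."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_std(values, weights):
    """Population-style weighted standard deviation (an assumed definition)."""
    m = weighted_mean(values, weights)
    var = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)
    return math.sqrt(var)


# Option 1: weight each school's rate by its number of students.
w_mean = weighted_mean(rates, sizes)
w_std = weighted_std(rates, sizes)

# Option 2: treat each school's rate as one equally weighted data point.
u_mean = sum(rates) / len(rates)
u_std = math.sqrt(sum((r - u_mean) ** 2 for r in rates) / len(rates))

# Overall rate after pooling every student into a single group.
left_handed = [n * r for n, r in zip(sizes, rates)]
overall_rate = sum(left_handed) / sum(sizes)

print(f"weighted:   mean={w_mean:.4f}, std={w_std:.4f}")
print(f"unweighted: mean={u_mean:.4f}, std={u_std:.4f}")
print(f"overall pooled rate: {overall_rate:.4f}")
```

With these numbers the pooled rate and the size-weighted mean come out identical (about 0.383), while the unweighted mean is about 0.283, so the two approaches would place school C at a different number of standard deviations from "average".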
All I'm trying to do at this point is describe the data in a statistically sound way. Any pointers to resources I can read up on to get a better understanding would also be appreciated. The real data has ~600 groups, with each group varying in size from 2 to 1500. Also, how can I derive a threshold for which groups are "too small" to consider and leave them out of the overall descriptive calculations?
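To illustrate why the small groups worry me, here is a rough sketch of how noisy an observed rate is at different group sizes, using the usual binomial standard error sqrt(p(1-p)/n). The rate `p` is just the pooled rate from the toy example, and the list of sizes is hypothetical; neither comes from the real data.

```python
import math

# Assumed underlying rate: the pooled rate from the toy example (69 of 180).
p = 69 / 180


def binomial_se(p, n):
    """Standard error of an observed proportion in a group of n students."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical group sizes spanning the range in the real data (2 to 1500).
group_sizes = [2, 10, 30, 100, 500, 1500]
standard_errors = [binomial_se(p, n) for n in group_sizes]

for n, se in zip(group_sizes, standard_errors):
    print(f"n={n:5d}  standard error of rate = {se:.3f}")
```

A group of 2 has a standard error of roughly 0.34 on its rate, versus roughly 0.013 for a group of 1500, which is why I suspect some size cutoff or size-aware treatment is needed.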