I have a list of employees at a company. I want to show some comparison of some demographics (say, %female) of those who have left the company and those who have stayed with the company. However, I would like to properly quantify the error. I'm running into a confusion in doing this calculation:

  • On one hand, I have the EXACT percentage of everything within the company (demographics, active/separated employees), so it seems like this is descriptive statistics
  • On the other hand, there definitely seems to be an inherent error on these percentages. If one female leaves some department of size 5, this is far more variable than if a female leaves some department of size 10000. This is what I would like to quantify as errors to these percentages. However, all descriptions of how to calculate a 'Difference Between Proportions', say, this one, refer to samples from a population which would involve an inferential statistic.
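For reference, the 'Difference Between Proportions' calculation that such descriptions usually give looks roughly like the following. This is only a sketch of the standard unpooled formula: the symbols $\hat{p}_1, \hat{p}_2$ (the observed proportions, e.g. %female among separated and active employees) and $n_1, n_2$ (the two group sizes) are introduced here for illustration, and the formula assumes two independent random samples, which is exactly the assumption in doubt.

```latex
\mathrm{SE}(\hat{p}_1 - \hat{p}_2)
  = \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}
```

Both terms shrink as the group sizes grow, which matches the intuition that a group of 5 is far noisier than a group of 10000.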

So am I looking at this the wrong way? Do I really have some greater population that includes all possible employees of the company, and the 'sample' is what was selected with the current employees? Or is this really a descriptive calculation, with the variance of these percentages calculated a different way? Thanks. (And sorry for the newbie question, but I can't find any answers to this.)

Joe
  • For your first question, I would read these responses: http://stats.stackexchange.com/questions/2628/statistical-inference-when-the-sample-is-the-population – mandata May 19 '15 at 17:44
  • You seem to be conflating the ideas of standard deviation and error, or do you mean 'standard error'? – mandata May 19 '15 at 17:46
  • Thank you for your help mandata. Sorry I don't have a great deal of statistical training, so my terminology may be incorrect. I am referring to the notion I described above. If it is true that having one female join or leave a department with 10 people would radically change a descriptive percentage of females compared to a department with 10000 people, then there is some 'error' which describes this 'variance'. I'm not sure what it's called, but this is what I'm referring to. But it seems based on your links that the proper way to think about it is a 'sample' of some larger population. – Joe May 19 '15 at 20:15

1 Answer

A department that belongs to a larger organization is not a sample.

Samples must be random, so that results extracted from the sample can be extrapolated to the population. This sample (all the individuals in a department) is not random, so it can only produce biased inference.

Statistics derived from different departments are not entirely comparable because the department is a factor in the dynamics of the behaviour of individuals within the department.

Smaller groups will have larger variance. This is an artifact of statistics. There is more uncertainty, and this suggests error in the process, but it is generally referred to as greater variability. It is not called error. You can adjust the variance of groups of different sizes to compare them, or test whether their variance is significantly larger or smaller (as long as one group is not too small). So the difference in variability is generally not a big deal.
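To make the "smaller groups have larger variance" point concrete, here is a minimal sketch. It assumes each group's proportion can be treated as a binomial proportion, which is a modelling assumption and not something established above; the helper name `proportion_se` and the 40% figure are made up for illustration.

```python
import math

def proportion_se(p, n):
    """Binomial standard error of a proportion p in a group of size n.

    Assumes the n members behave like independent draws with the same
    probability p, which may not hold for a real department.
    """
    return math.sqrt(p * (1.0 - p) / n)

if __name__ == "__main__":
    p = 0.4  # hypothetical share of women, chosen only for illustration
    for n in (5, 10, 10000):
        se = proportion_se(p, n)
        print(f"n={n:>5}: standard error = {se:.4f}")
```

Under this assumption the variability scales as 1/sqrt(n), so the same observed percentage carries far more uncertainty in a department of 5 than in one of 10000.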

mandata
  • I'm not trying to extrapolate from the department to the organization, I'm trying to extrapolate from employees in a department to the population of candidates that could, have, or will join the department. To what degree is this random? Well there certainly is SOME randomness. It's just blind luck for some people why they apply to an organization. But, no, it certainly isn't coin-flip random. – Joe May 20 '15 at 14:55
  • But I'm completely fine if it's not and standard inferential statistics is not the way to go; I just want to quantify this 'variability'. So then the question is, how do I quantify this variance? Would you have a good reference? I can't really find anything. – Joe May 20 '15 at 14:55