5

When I run a test (say, 10 trials) and want the standard deviation of all 10 trials, I am unsure whether to use the sample or the population standard deviation. My initial thought is the sample standard deviation: I could always run 10 more trials and get more data points, so I never have the complete population.

A lot of the examples I see online involve student grades or finance applications (which I never deal with), and I am having trouble finding a concrete answer for my case: it is always possible to run more tests and collect more data points, but the standard deviation is computed from all the data points currently in hand.

pnd1987

2 Answers

4

The two forms of standard deviation are relevant to two different types of variability. One is the variability of values within a set of numbers and one is an estimate of the variability of a population from which a sample of numbers has been drawn.

The population standard deviation is relevant where the numbers that you have in hand are the entire population, and the sample standard deviation is relevant where the numbers are a sample of a much larger population.

For any given set of numbers (that are not all identical), the sample standard deviation is larger than the population standard deviation because it divides by $n-1$ rather than $n$, which allows for the extra uncertainty that results from sampling. See this question for a bit more information: Intuitive explanation for dividing by $n-1$ when calculating standard deviation?

For an example, the population standard deviation of 1,2,3,4,5 is about 1.41 and the sample standard deviation is about 1.58.
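The two numbers above can be reproduced with Python's standard library. This is only a minimal sketch: `statistics.pstdev` divides by $n$ (population form) and `statistics.stdev` divides by $n-1$ (sample form); the variable names are just for illustration.

```python
import statistics

data = [1, 2, 3, 4, 5]

# Population SD: sqrt(sum((x - mean)^2) / n)
pop_sd = statistics.pstdev(data)

# Sample SD: sqrt(sum((x - mean)^2) / (n - 1))
sample_sd = statistics.stdev(data)

print(round(pop_sd, 2), round(sample_sd, 2))  # 1.41 1.58
```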

Michael Lew
  • 1
    I guess I am still a bit confused. Say I wanted to determine the hardness of a certain material and got 5 data points (1,2,3,4,5): would I use population or sample? Nothing is stopping me from running an additional 5 tests and getting more data points (6,7,8,9,10). – pnd1987 Aug 30 '20 at 22:34
  • 2
    Are you intending to use your standard deviation as an estimate of the variability of the material hardness in general, or just the variability in your measurements? If it's the former then you have a sample and want to estimate standard deviation of the notional population that is the material in general. If it's the latter then you should use the population standard deviation. The fact that you say that there is nothing stopping you from obtaining more values means that you are sampling: use the sample standard deviation. – Michael Lew Aug 30 '20 at 23:18
0

My question is similar to pnd1987's question. I wish to use a standard deviation to appraise the repeatability of a measurement. Suppose I'm measuring one stable thing over and over. A perfect measuring instrument (with a perfect operator) would give the same number every time. Instead there is variation, and let's assume it follows a normal distribution about the mean.

We'd like to appraise the measurement repeatability by the SD of that normal distribution. But we take just N measurements at a time, and hope the SD of those N can estimate the SD of the normal distribution. As N increases, sampleSD and populationSD both converge to the distribution's SD, but for small N, like 5, we get only weak estimates of the distribution's SD. PopulationSD gives an obviously worse estimate than sampleSD, because when N=1 populationSD gives the ridiculous value 0, while sampleSD is correctly indeterminate. However, sampleSD does not correctly estimate the distribution's SD either. That is, if we measure N times and take the sampleSD, then measure another N times and take the sampleSD, over and over, and average all the sampleSDs, that average does not converge to the distribution's SD. For N=5, it converges to around 0.94× the distribution's SD. (There must be a little theorem here.) SampleSD doesn't quite do what it is said to do.
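The 0.94 figure can be checked numerically. The following is a rough sketch, assuming normally distributed measurements: the closed-form factor is the one usually tabulated as $c_4$ in quality-control references, and the function names `c4` and `mean_sample_sd`, the seed, and the repeat count are made up for this illustration.

```python
import math
import random
import statistics

def c4(n):
    """Expected value of the sample SD divided by the true SD, for normal data."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def mean_sample_sd(n, repeats, sigma=1.0, seed=0):
    """Average the sample SD (n - 1 divisor) over many batches of n normal draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(repeats):
        batch = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += statistics.stdev(batch)
    return total / repeats

print(c4(5))                     # closed form, about 0.94
print(mean_sample_sd(5, 20000))  # simulated average, should land close to c4(5)
```

Dividing an observed sampleSD by `c4(N)` gives an estimate whose average does converge to the distribution's SD, under the normality assumption.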

If the measurement variation is normally distributed, then it would be very nice to know the distribution's SD. For example, we can then determine how many measurements to take in order to tolerate the variation. Averages of N measurements are also normally distributed, but with a standard deviation 1/sqrt(N) times the original distribution's.
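As a small illustration of the 1/sqrt(N) rule, here is a sketch of how one might pick N once the distribution's SD is known. The helper names `sd_of_mean` and `measurements_needed` and the example numbers are hypothetical, not taken from any particular instrument.

```python
import math

def sd_of_mean(sigma, n):
    """SD of the average of n independent measurements, each with SD sigma."""
    return sigma / math.sqrt(n)

def measurements_needed(sigma, target_sd):
    """Smallest n for which the SD of the average is at most target_sd."""
    return math.ceil((sigma / target_sd) ** 2)

# Example: single measurements scatter with SD 2.0 and we want
# the averaged result to scatter with SD no more than 0.5.
print(sd_of_mean(2.0, 4))             # 1.0
print(measurements_needed(2.0, 0.5))  # 16
```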

Note added: the theorem is not so little -- Cochran's Theorem

dcouzin
  • 1
    Welcome to CV. This does not really answer the question. Please submit a new question (and make a reference to this question if needed) when you are in this situation. The site works better this way and you increase your chance to get an answer. Thank you. – Pitouille Nov 17 '21 at 05:28