
When reading scientific articles, I see different approaches to summarising continuous variables when describing the study population.

For instance, if I want to describe the mean age of the study population, I can display it in combination with the standard deviation, or with its 95% CI. Although these are obviously related, my question is which of those makes the most sense. The way I see it, the SD provides information regarding the specific study population, whereas the 95% CI tells me how well my sample mean 'matches' the mean of the total population. Therefore, if I want to describe the study population, the SD seems to be the best option to me. However, if the mean age is the outcome variable of my study, I can imagine using the 95% CI.

Any thoughts on this?

Joep_S
  • Related (but not duplicate) to http://stats.stackexchange.com/questions/32318/difference-between-standard-error-and-standard-deviation. – Michael M Jun 10 '16 at 09:31
  • The SD is just a measure of dispersion. A 95% confidence interval is probability-based and gives a range (limits) for the estimate of the mean in the general population (probably a dynamic population). There seems to be some confusion here. The SD is an observed value; we use it, via the standard error, to estimate a range for the mean of the general population. –  Jun 10 '16 at 09:36

1 Answer


It depends on what information you want. The CI describes the population you sampled from, whereas the standard deviation can describe both the sample and the population (if we take the sample standard deviation $s$ to be a good estimator $\hat\sigma$ of the true population value $\sigma$).

Looking at your question, you seem to be bumping into one of those times when casual vocabulary and statistical vocabulary trample on each other's toes. What you call the "specific study population" is your sample, and the "total population" is simply the population.

With that terminological note, using a confidence interval to describe your sample isn't appropriate: the sample is finite and "complete", and descriptive statistics (standard deviation, etc.) are what you use on completely sampled groups. Inferential statistics such as the CI should only be used to make statements about the "incomplete", i.e. the population from which you drew your sample. In this sense, the CI doesn't describe your sample at all, but rather the population you drew your sample from. (Somewhat more precisely, but still simplifying a bit, the CI is an interval computed from the sample using a procedure that, in at least 95% of random samples from any population, will include the population's mean. [Thanks @whuber!] But make sure to look at the comments below for some "fine print" and discussion on the definition of the CI.) You correctly got at this intuition in your question, even if you stumbled a bit on the vocabulary.
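To make the "procedure" reading of the CI concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the population is normal with a made-up mean and SD, the function names `ci_covers` and `coverage` are hypothetical, and the interval uses the large-sample normal approximation (a $t$-based interval would be slightly wider for small samples). It draws many random samples and counts how often the interval computed from each one contains the population mean.

```python
import math
import random
import statistics


def ci_covers(sample, true_mean, z):
    """Return True if the normal-approximation CI from `sample` contains `true_mean`."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error of the mean
    return mean - z * se <= true_mean <= mean + z * se


def coverage(n_sims=2000, n=200, true_mean=40.0, true_sd=12.0, seed=1):
    """Fraction of simulated samples whose 95% CI contains the population mean.

    The population parameters are hypothetical values picked for the demo.
    """
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        hits += ci_covers(sample, true_mean, z)
    return hits / n_sims


rate = coverage()
print(f"Share of intervals containing the population mean: {rate:.3f}")
```

Under these assumptions the printed share should land close to 0.95. The 95% is a property of the procedure across repeated samples, not a statement about any single computed interval.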

In terms of practical advice, it really depends on what you want to do. If you just want to claim that your sample was well-balanced / representative / whatever, use descriptive statistics on your sample. If you want to make inferences about certain parameters in the broader population, use inferential statistics.
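As a rough sketch of the two options side by side, the snippet below computes both summaries for one sample. The ages are invented, the helper name `describe_mean` is hypothetical, and the interval again uses the normal approximation, which is somewhat too narrow for a sample this small; a $t$ quantile would be the usual choice there.

```python
import math
import statistics


def describe_mean(values, confidence=0.95):
    """Summarise a sample: mean, sample SD, standard error, and an approximate CI."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # descriptive: spread of the observed values
    se = sd / math.sqrt(n)          # inferential: uncertainty of the mean estimate
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return {"n": n, "mean": mean, "sd": sd, "se": se,
            "ci": (mean - z * se, mean + z * se)}


ages = [34, 45, 29, 52, 41, 38, 47, 33, 50, 36]  # hypothetical sample of ages
summary = describe_mean(ages)

# Describing the sample: mean with SD.
print(f"mean (SD): {summary['mean']:.1f} ({summary['sd']:.1f})")
# Making a statement about the population mean: mean with approximate 95% CI.
low, high = summary["ci"]
print(f"mean [95% CI]: {summary['mean']:.1f} [{low:.1f}, {high:.1f}]")
```

Note that the SD stays roughly the same as the sample grows, while the CI narrows, because the standard error divides the SD by $\sqrt{n}$.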

Bottom line: it depends on whether you want to describe your sample or use your sample to make statements about the general population.

Livius
  • Plus, it makes no sense to talk about a *confidence* interval for observed data. A 95% quantile is a 95% quantile, not a CI. [See here.](http://stats.stackexchange.com/a/217377/1352) – Stephan Kolassa Jun 10 '16 at 09:57
  • This is quite well written. I stumbled, though, at the characterization of CI, because it's too easily misinterpreted: the CI one computes from a sample definitely is *not* the "range of values you should get if you were to repeat your sampling procedure many times." See http://stats.stackexchange.com/questions/26450/why-does-a-95-ci-not-imply-a-95-chance-of-containing-the-mean for more. – whuber Jun 10 '16 at 12:54
  • @whuber I was aware of that issue and couldn't really think of a good way to explain the real meaning of CI in a sidebar (hence my 'but still simplifying a bit'). The common interpretation "95% chance of containing the true value" doesn't jibe with the frequentist interpretation (as thoroughly discussed in that other question). I was trying to get at the inherent link to repetition (frequencies) in frequentism. Do you have a suggestion for getting at that aspect of the construction? The inverted hypothesis test is a nice, short explanation but less obviously connected to the repetition bit. – Livius Jun 10 '16 at 13:07
  • @whuber Or maybe the ideal solution is to just remove that sentence entirely and not even try to explain the subtleties of frequentist probability and confidence here? – Livius Jun 10 '16 at 13:09
  • It's tricky. Perhaps, to emphasize the idea that the CI is describing the population, you could characterize (say) a 95% CI as *an interval computed from the sample using a procedure that, in at least 95% of random samples from any population, will include the population's mean.* (Although that's not a simplification, it's incomplete in two ways: first, there are some possible populations for which the interval will contain their mean *at most* 95% of the time. Second, "any population" really means only populations *whose distributions are among those assumed by the CI procedure*.) – whuber Jun 10 '16 at 13:34