6

I was just involved in a Q&A where someone was asked to run a statistical test to show that algorithm A is better than two other algorithms. However, he has only 4 data points. Does it really make sense to run a statistical test on 4 points? Where is the limit? At three?

To clarify, I understand that 12 numbers are reported, but to me they look more like either 4 three-dimensional data points or 3 four-dimensional data points.

In their answers, the authors introduce assumptions about the underlying distributions in order to effectively increase the number of data points, computing means of four numbers along the way; or they perform t-tests on pairs of algorithms (comparing 8 numbers in total for each pair), again making unfounded assumptions about the underlying distributions.

How reliable is this process when you don't know the underlying distribution and, lacking sufficient data, cannot hope to infer or validate it? Isn't it fairer to simply say that there isn't much you can do with so few data points?
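For concreteness, here is a minimal sketch of the kind of pairwise t-test at issue (the scores are invented; the normality it assumes is exactly what cannot be checked with 4 points per group):

```python
# Hypothetical sketch of the kind of test being debated: a two-sample
# t-test on two groups of 4 scores (one score per dataset, per algorithm).
# All numbers below are invented for illustration only.
from scipy import stats

scores_a = [0.82, 0.79, 0.85, 0.81]  # algorithm A on 4 datasets
scores_b = [0.76, 0.74, 0.80, 0.75]  # algorithm B on the same 4 datasets

# Welch's t-test assumes each group is roughly normal -- an assumption
# that cannot be meaningfully checked with only 4 observations per group.
t_stat, p_value = stats.ttest_ind(scores_a, scores_b, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```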

iliasfl
  • 2,514
  • 17
  • 30
  • FYI, even though your question might stand on its own, I looked at the link you provided and it doesn't look like the person has only 3 data points... – Patrick Coulombe Jan 24 '14 at 00:04
  • Thanks, indeed it is 4 datasets, not 3. I don't think it changes much about the question though... – iliasfl Jan 24 '14 at 00:08
  • 4 datasets averaged over, for each algorithm = at least 12 data points... he has at least 4 data points PER GROUP, with 3 groups. – Patrick Coulombe Jan 24 '14 at 00:08
  • It is a matter of point of view. Either 4 three-dimensional data points, or 3 four-dimensional data points. – iliasfl Jan 24 '14 at 00:12
  • 3
    He doesn't have 4 data points, he has 12. And the question has answers at the link. So, what is new about your question? If nothing, then I think this should be closed. – Peter Flom Jan 24 '14 at 00:12
  • I just saw the answer by Greg. It basically introduces some assumptions about the underlying distributions in order to effectively increase the number of data points, and in the process computes means of 4 real numbers, etc. In Marc's answer every t-test uses 8 numbers, not 12, again with assumptions about the underlying distribution. For me, my question stands; maybe the answer is that 8 or 12 data points are more than enough. – iliasfl Jan 24 '14 at 00:34
  • 2
    You can certainly do a t-test with 2 groups of 4. – Peter Flom Jan 24 '14 at 01:01
  • Indeed, one of my answers here illustrates an example in which a t-test compares 3 observations with a single observation, which clearly suggests it's quite possible to do a two-sample t-test with 3 observations *total* (reducing the sample size in the larger group by 1). Indeed, I've seen a one-sample test with a single observation. So the title question has a trivial answer ("yes, obviously"). In tiny samples (i) you tend to rely more heavily on the assumptions, and (ii) power is very low, so you need either huge effects or very small variances. – Glen_b Jan 24 '14 at 01:12
  • Here's [the post I mentioned](http://stats.stackexchange.com/questions/44475/is-there-a-statistical-test-to-compare-two-samples-of-size-1-and-3/) – Glen_b Jan 24 '14 at 01:14
  • Thanks for your comments. I am covered by the phrase "rely more heavily on the assumptions", which in the real world brings us to how you can validate those assumptions, which brings us back to my original question. I didn't check the whole thing but that guy got a p-value of 0.97... Anyway, I'm not sure about the downvotes on my question. I didn't mean to offend anyone, thanks. – iliasfl Jan 24 '14 at 02:03
  • 1
    You can do an honest-to-goodness test with *one* value (and modest, realistic assumptions): http://stats.stackexchange.com/a/1836. BTW, downvotes are not intended to reflect offense or annoyance, but rather to indicate (as the hovertext indicates) that a question is ill-posed or badly researched. I think there's a decent question here but as it stands readers have to wade through the comments to understand what you are asking: could you please edit it? Note, too, that the last paragraph may be perceived as a misplaced rant (which it is) and ought to be deleted. – whuber Jan 24 '14 at 16:58
  • Thanks for the reference. I did some heavy edits based on comments here and removed sentiment. – iliasfl Jan 24 '14 at 21:52

3 Answers

14

I have a friend who used to work for the US defense department (a long time ago, in the Cold War era) and was once asked to answer a question using a single data point. When he insisted that he needed more data, he was told that the person who had provided that data point had been caught and executed for espionage shortly after providing it, so there would be no more data coming. That is when my friend started to learn about Bayesian statistics.

I also remember seeing an article several years ago, possibly in *The American Statistician*, possibly in *Chance*, that derived a way to compute a confidence interval for the mean based on a single data point (the 95% interval from a value of x was something like -x to 3x), if you were willing to make certain assumptions (and with only 1 point, the usual diagnostics were of no help).

So, yes, you can do valid statistics with very small sample sizes, but you will tend to have low power/precision, large-sample properties will not help you, and violations of any assumptions will have a potentially much larger impact.
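As a rough illustration of the precision point (a sketch, assuming normal data, not part of the story above): the half-width of a t-based 95% confidence interval, measured in units of the sample standard deviation, explodes as the sample size shrinks.

```python
# Sketch: the half-width of a t-based 95% confidence interval for a mean,
# in units of the sample standard deviation s, as the sample size shrinks.
from math import sqrt
from scipy import stats

for n in (30, 10, 4, 3, 2):
    t_crit = stats.t.ppf(0.975, df=n - 1)  # two-sided 95% critical value
    half_width = t_crit / sqrt(n)          # CI is mean +/- half_width * s
    print(f"n={n:2d}: mean +/- {half_width:.2f} * s")
# n=2 gives mean +/- ~8.98 * s -- technically valid, but nearly useless.
```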

Greg Snow
  • 46,563
  • 2
  • 90
  • 159
  • 6
    I believe you may be talking about [this paper](http://www.jstor.org/discover/10.2307/2684348) - D. Edelman (1990) "A Confidence Interval for the Center of an Unknown Unimodal Distribution Based on a Sample of Size 1", *The American Statistician*, Vol. 44, No. 4, pp. 285-287 – Glen_b Jan 24 '14 at 06:21
  • That is indeed interesting. Thanks for sharing the story and the paper. – Underminer Jan 24 '14 at 21:50
  • 2
    @Glen_b I provided a more recent (2001) reference to TAS in my answer at http://stats.stackexchange.com/a/1836/919. It references the Edelman paper (see the bottom right of the first page). – whuber Jan 24 '14 at 22:14
  • Thanks @whuber - that would be a better fit to 'several years ago'. A very readable paper, too. – Glen_b Jan 25 '14 at 00:05
  • I think the one referenced by @whuber is the one I was remembering. – Greg Snow Jan 25 '14 at 01:45
  • Great stuff! Very useful information provided here by Greg, Glen, and whuber. – Graeme Walsh Jan 25 '14 at 02:14
2

Short answer: yes, but your results will usually be useless.

Long answer: Statistics often involves forming some kind of inference about underlying parameters based on data, with constraints on the probability of a False Positive and/or a False Negative. In a typical test, e.g. testing whether a sample came from a given distribution, we put an upper bound (called alpha) on the probability of a Type I error (False Positive), mostly for two reasons:

  • In practice, that is the only kind of error you can put a bound on, because of the nature of your null hypothesis
  • False Positives are usually considered worse than False Negatives (a corollary of Occam's Razor)

Holding alpha constant, beta (the probability of a False Negative) is generally larger for smaller data sets. And when beta is large, your overall probability of producing a Positive is very small, so your test will almost always return Negative, which is not very different from just accepting your Null Hypothesis from the get-go. In this situation we say the statistical test has low power.
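A small simulation sketch makes this concrete (the one-standard-deviation effect size, group size of 4, and normality are all assumptions chosen for illustration):

```python
# Hypothetical sketch: Monte Carlo estimate of the power of a two-sample
# t-test with only 4 observations per group. The true effect (one standard
# deviation) and all other settings are assumptions chosen for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, effect, alpha, trials = 4, 1.0, 0.05, 10_000
rejections = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect, 1.0, n)  # true difference: 1 standard deviation
    _, p = stats.ttest_ind(a, b, equal_var=True)
    if p < alpha:
        rejections += 1
print(f"estimated power: {rejections / trials:.2f}")  # roughly 0.2
```

Even with a true effect as large as one standard deviation, the test detects it only about a fifth of the time; most runs return Negative regardless.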

1

I helped with a geological project where the researchers had a single data point, accompanied by a very reliable uncertainty bound. They were interested in testing a geological model (a set of differential equations describing the evolution of tectonic plates) that made a very specific prediction for the value of that single datum. Given its uncertainty distribution, we could straightforwardly calculate a p-value under the assumption that the model is true, and reject that null hypothesis convincingly. So, in that case, I would argue that we successfully 'did statistics' with a single data point (and its uncertainty).
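A minimal sketch of that kind of single-datum test (all numbers are hypothetical; it assumes the measurement error is normal with a known standard deviation, in the spirit of the 'very reliable uncertainty bound'):

```python
# Hypothetical sketch of a single-datum test: compare one measurement
# (with known normal uncertainty) against a model's point prediction.
# All numbers are invented for illustration.
from scipy import stats

predicted = 12.0  # value the model predicts for the single datum
observed = 15.5   # the one measured data point
sigma = 0.8       # known measurement standard deviation

z = (observed - predicted) / sigma
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value under the model
print(f"z = {z:.2f}, p = {p_value:.2g}")  # tiny p => reject the model
```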

gregory_britten
  • 1,253
  • 9
  • 15
  • 1
    Was this result published somewhere? It would be useful to be able to give doubters an example that such cases can legitimately occur in research. – Glen_b Jun 13 '18 at 04:21