2

What test can be used to obtain a $p$-value when both sets of data in a comparison have a standard deviation of 0? For example, Set 1 has values of 1, 1, 1 and Set 2 has values of 5, 5, 5.

Or is it OK to just say that, since the standard deviation is 0 for each set and no value appears in both sets, the $p$-value is <0.05?

Ben Bolker
Darryl
    What kind of distribution/experiment does your data come from? Sometimes the mean and deviation are related, e.g. Poisson distribution. – Sextus Empiricus Apr 02 '18 at 13:21

2 Answers

4
  1. I think it may make sense to conduct a statistical test on this kind of data, but you haven't given much context to know what could be done.

  2. You definitely cannot conclude that $p < 0.05$ just because there is no variance in the samples. One problem is that to reach a p-value, you need to define a null hypothesis, and it's not clear from your question that you have done so. (What kind of equivalence would you be looking for? Means, medians, stochastic equality?) A second problem is that you still need to take the sample size into account. Imagine the edge case where you have one observation for each sample. Can you jump to $p < 0.05$ in this case?

  3. One case where you might get data like those in your example is when there are two candidates for a job, say, and you have three independent ratings for each, on a discrete 1 to 5 scale, like a Likert scale. In this case, we can treat the responses as ordered categories and conduct a Cochran-Armitage test. The following does this in R, using functions from a couple of different packages.

Another option may be certain permutation tests.

if(!require(coin)){install.packages("coin")}
if(!require(multiCA)){install.packages("multiCA")}

Input =(
"Rating      1 2 3 4 5
Set
Set1         3 0 0 0 0
Set2         0 0 0 0 3
")

Table = as.table(read.ftable(textConnection(Input)))

library(coin)

chisq_test(Table,
           scores = list("Rating" = c(-2, -1, 0, 1, 2)))

   ### Asymptotic Linear-by-Linear Association Test
   ###
   ### data:  Rating (ordered) by Set (Set1, Set2)
   ### Z = -2.4495, p-value = 0.01431
   ### alternative hypothesis: two.sided

library(multiCA)

multiCA.test(Table)

   ### Multinomial Cochran-Armitage trend test
   ### 
   ### data:  Table
   ### W = 6, df.Set = 1, p-value = 0.01431
   ### alternative hypothesis: true slope for outcomes 1:nrow(x) is not equal to 0
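
The permutation-test option mentioned above, and the earlier point about sample size, can be sketched in base R. This is only an illustration under stated assumptions: the helper name `perm_pvalue` is hypothetical, the statistic (absolute difference in means) is one choice among many, and the null hypothesis is taken to be that the group labels are exchangeable.

```r
# Exact two-sided permutation test on the absolute difference in means.
# Assumption: under the null hypothesis the group labels are exchangeable.
perm_pvalue <- function(x, y) {
  pooled <- c(x, y)
  observed <- abs(mean(x) - mean(y))
  # Each column is one way of picking which pooled values form the first group.
  idx <- combn(length(pooled), length(x))
  diffs <- apply(idx, 2, function(i) abs(mean(pooled[i]) - mean(pooled[-i])))
  # Share of relabelings at least as extreme as the observed split
  # (small tolerance guards against floating-point ties).
  mean(diffs >= observed - 1e-12)
}

perm_pvalue(c(1, 1, 1), c(5, 5, 5))  # 2 of choose(6, 3) = 20 relabelings: 0.1
perm_pvalue(1, 5)                    # one observation per sample: 1
perm_pvalue(rep(1, 4), rep(5, 4))    # 2 of choose(8, 4) = 70 relabelings
```

With three observations per group the smallest attainable p-value for this test is 0.1, so zero variance within each sample does not by itself imply $p < 0.05$.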
kjetil b halvorsen
Sal Mangiafico
    or more easily a two-sample Wilcoxon/Mann-Whitney test: `wilcox.test(x=c(1,1,1),y=c(5,5,5))` – Ben Bolker Apr 02 '18 at 16:01
  • Yes, and another point to make is that the *p* value will change depending on the test selected. For example, with Cochran-Armitage I got a *p* value of 0.014, with Mann-Whitney 0.047, and with Mood's median test 0.10, while attempting to use a t-test or ordinal regression failed. And of course the interpretation of the results of each of these tests varies. – Sal Mangiafico Sep 21 '19 at 16:54
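
The Mann-Whitney figure quoted in the comments can be checked in base R. This is a minimal sketch; `exact = FALSE` and `correct = TRUE` are set explicitly here as an assumption about the settings in effect, since with every value tied inside its group the test relies on a normal approximation with continuity correction.

```r
# Two-sample Wilcoxon / Mann-Whitney test on the example data.
# exact = FALSE: use the normal approximation (ties rule out the usual exact test).
# correct = TRUE: apply the continuity correction.
mw <- wilcox.test(x = c(1, 1, 1), y = c(5, 5, 5),
                  exact = FALSE, correct = TRUE)

mw$statistic  # W = 0: no value in x exceeds any value in y
mw$p.value    # approximately 0.047
```

The result differs from the Cochran-Armitage p-value because the two tests use different statistics and different approximations to the null distribution.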
-1

The short answer is that you cannot do a statistical test on this type of data. The reason is that there is no way to measure chance variation.

I don't remember where I heard this analogy, but this may be a case where you need to ask yourself whether a statistical inference is necessary. The analogy is: what is the $P$-value to assess that chickens have fewer legs than cows? Of course, barring any anomalies, your data sets will be {2,2,2,...,2} and {4,4,4,...,4} (I'm not even sure a sample size for either needs to be provided).

Hope this helps.

Gregg H
    (1) Tests are possible even without ways to "measure chance variation." (2) *Of course* one can test such data: it comes down to what distributional assumptions will be made. Even with no assumptions at all it's possible to construct reasonably powerful tests (such as permutation tests). – whuber Apr 02 '18 at 16:01
  • Please apply this to the chicken/cow example. How does one meaningfully interpret a permutation of the data? – Gregg H Apr 02 '18 at 16:07
  • @GreggH Not the greatest assumption given what we know about legs, but you could say that from a naive approach, legs are discrete and countable, and so given no other a priori information we could assume counts of legs are Poisson distributed. Based on just that assumption, you would conclude that you need to observe some number of chicken and cow legs to reject a null hypothesis that leg numbers are drawn from a Poisson distribution of equal mean for chickens and cows. You'd also likely conclude that the Poisson distribution was a poor choice, but it would have been a more conservative one. – Bryan Krause Apr 02 '18 at 16:20
  • I think I may be able to articulate my reservations slightly better. I am unconvinced of @whuber's first assertion. Even in part two, with the suggestion to use permutations, there is an introduction of what might be expected by chance. (And my first follow-up comment was about context, but I'm not sure that is relevant.) If there is no assessment of what happens by chance, what exactly is the foundation of statistical inference (parametric or not)? – Gregg H Apr 02 '18 at 16:50
  • See https://stats.stackexchange.com/a/1836/919 for #1. For #2, chance typically enters explicitly through random sampling or as a modeling assumption. By rejecting the possibility of all statistical testing you are effectively asserting that these data could not have arisen in such ways, but that conclusion is not generally true. – whuber Apr 02 '18 at 17:23
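
The Poisson argument from the comments above can be made concrete with base R's `poisson.test`, which compares two Poisson rates exactly. This is a sketch under that comment's deliberately naive assumption that leg counts are Poisson distributed; the helper name `legs_test` and the sample sizes are hypothetical.

```r
# Assumption (from the comment, not a realistic model): leg counts are Poisson.
# Null hypothesis: chickens and cows have the same mean number of legs.
legs_test <- function(n_animals) {
  chicken_legs <- 2 * n_animals  # every chicken observed with 2 legs
  cow_legs     <- 4 * n_animals  # every cow observed with 4 legs
  # Exact comparison of two Poisson rates; T gives the number of animals
  # observed in each group.
  poisson.test(c(chicken_legs, cow_legs),
               T = c(n_animals, n_animals))$p.value
}

legs_test(3)   # 3 animals per group: not significant at the 0.05 level
legs_test(10)  # 10 animals per group: significant at the 0.05 level
```

Under this model the data contain no within-group variation at all, yet the p-value still depends on how many animals were observed, because chance enters through the modeling assumption.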