
Suppose we have $n$ samples $s_1,...,s_n$ from an unknown real-valued distribution $D$. We are interested in a statistic to test if $D$ is symmetric around zero. (In my application, $n$ is only about 50, so I'm interested in the non-asymptotic regime.)

If $D$ is symmetric and $s\sim D$, then $\Pr(s>0)=\Pr(s<0)$, so by comparing the count of positive versus negative $s_i$, we get a simple statistic for a zero median, which is necessary for symmetry. In the hopes of a stronger test, I was going to use a two-sample Kolmogorov-Smirnov test to compare $s_1,...,s_n$ with $-s_1,...,-s_n$. But then it belatedly struck me that perhaps I wasn't the first person to think about this problem :)

Any reference suggestions or (ideally) actual test statistics would be very much appreciated.

Bill Bradley
  • You don't have a test for symmetry, you have a test for whether the median equals 0. The following link https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2907265/ (On Bootstrap Tests of Symmetry About an Unknown Median) contains a few tests of symmetry, it may be of value to you. – jbowman Feb 26 '18 at 01:49
  • @jbowman Good point about the median; I've corrected the above. And thank you for that reference, which I'll read. I notice that the Kolmogorov-Smirnov test is Section 2.3 of that reference, which suggests that it's somewhat standard. – Bill Bradley Feb 26 '18 at 02:25
  • Another good reference is Miao, Gel, Gastwirth 2006: https://www.researchgate.net/publication/239329925_A_New_Test_of_Symmetry_about_an_Unknown_Median – lzstat Nov 01 '19 at 06:21

2 Answers


If you only look at the counts of positive and negative values then you lose a ton of information, so your test will not be powerful. That also fails to test symmetry fully, since it cannot distinguish symmetric distributions from non-symmetric distributions with equal probabilities of positive and negative outcomes.
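To illustrate that blind spot, here is a minimal sketch. The distribution `r.asym` is a made-up example (not from the question): it has median zero but is clearly asymmetric, with a short negative tail and a long positive tail. The helper name `sign.test.pvalue` is also just for illustration. Because the sign counts are still Binomial$(n, 1/2)$, the sign test should reject at roughly (at most) the nominal rate, however asymmetric the data are.

```r
# Hypothetical asymmetric distribution with median 0 (assumption for illustration):
# negative half is -Uniform(0, 1), positive half is Exponential with mean 5
r.asym <- function(n) {
  positive <- runif(n) < 0.5
  ifelse(positive, rexp(n, rate = 1/5), -runif(n))
}

# Sign test: compare the count of positive values with Binomial(n, 1/2)
sign.test.pvalue <- function(x) {
  x <- x[x != 0]   # drop exact zeros, which carry no sign information
  binom.test(sum(x > 0), length(x), p = 0.5)$p.value
}

# Estimate how often the sign test rejects at the 5% level
set.seed(1)
n <- 50
M <- 2000
pvals <- replicate(M, sign.test.pvalue(r.asym(n)))
reject.rate <- mean(pvals < 0.05)
reject.rate
```

The rejection rate stays near the nominal level, so the sign test has essentially no power against this kind of asymmetry.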

Consequently your latter idea (for a two-sample Kolmogorov-Smirnov test comparing $\mathbf{s}$ and $-\mathbf{s}$) shows that you are thinking in the right direction. Whilst this is a good place to start, the specific test you are proposing has serious problems due to the fact that (a) it essentially "double-counts" the data; and (b) the two data vectors are not independent of each other. Below I will show that this leads to a non-uniform p-value under the null hypothesis. To do this, let's first program your proposed test in R. (I've added a few bells and whistles using the general methods set out here.)

symmetry.test <- function(x, median = 0, exact = NULL, ...) {
  
  #Get data information
  DATA.NAME <- deparse(substitute(x))
  xx <- x-median
  n <- length(x)
  
  #Implement KS test
  TEST <- ks.test(x = -xx, y = xx, alternative = 'two.sided', exact = exact, ...)
  TEST$method      <- 'Symmetry test using KS test for magnitudes of sub-samples'
  TEST$alternative <- paste0('Sampling distribution is not symmetric around ', 
                             round(median, 4))
  TEST$data.name   <- paste0('Sample vector ', DATA.NAME, ' containing ', n, ' values')
  
  TEST }

Now, let's try implementing this on a symmetric distribution (e.g., the standard normal distribution) and look at the resulting distribution of the p-value. I will do this by simulating $M=10^6$ random samples of size $n=100$ and computing the p-value of your test. As you can see from the histogram below, the p-value for the test is not uniform in this case.

#Set parameters for simulation
n <- 100
M <- 10^6

#Simulate p-values from the test (with symmetric distribution)
set.seed(1)
PVALS <- rep(0, M)
for (i in 1:M) {
  DATA <- rnorm(n)
  TEST <- symmetry.test(DATA)
  PVALS[i] <- TEST$p.value }

#Show histogram of p-values
hist(PVALS, xlim = c(0,1), breaks = (0:100)/100, freq = FALSE, col = 'blue',
     main = 'Simulation of p-value from proposed test', 
     xlab = 'p-value', ylab = 'Density')

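[Figure: histogram of the simulated p-values from the proposed test (x-axis: p-value, y-axis: density), showing a non-uniform distribution]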

So, unfortunately, your proposed test doesn't really work. It might be possible to salvage it by modifying the test somehow to take account of the deviations from standard assumptions. As a starting point, I'd suggest that it might be better to do a two-sample test comparing the vectors $\mathbf{s}_-$ and $\mathbf{s}_+$ defined by $s_i^- \equiv \max(0, -s_i)$ and $s_i^+ \equiv \max(0, s_i)$. That would ameliorate the problem of "double-counting" the data and it would greatly lessen the statistical dependence between the two vectors. Since these data vectors are "censored", you would need some non-parametric test that can handle censored data (the standard KS test is not built for this case). If you were to develop the test in that direction, you might be able to build one that gives a uniform p-value for symmetric distributions.
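A different route to calibrating the same KS-type statistic is a sign-flip randomization. Note that this is not the censored two-sample approach sketched above, and it is offered only as an illustration under stated assumptions. The idea is that if the distribution is symmetric about the hypothesised centre, then given the magnitudes $|s_i|$ the signs are independent fair coin flips, so the null distribution of the statistic can be simulated by randomly re-assigning signs. The function name `signflip.symmetry.test` and its arguments are hypothetical.

```r
# Sign-flip randomization test for symmetry about a known centre (a sketch).
# Assumption: under H0, the signs of the centred values are independent
# fair coin flips given their magnitudes.
signflip.symmetry.test <- function(x, center = 0, B = 2000) {
  xx <- x - center

  # KS distance between the empirical CDFs of z and -z,
  # evaluated at all jump points of the two step functions
  ks.stat <- function(z) {
    grid <- sort(c(z, -z))
    max(abs(ecdf(z)(grid) - ecdf(-z)(grid)))
  }

  observed <- ks.stat(xx)
  a <- abs(xx)

  # Null distribution: keep the magnitudes, redraw the signs at random
  null.stats <- replicate(B, {
    signs <- sample(c(-1, 1), length(a), replace = TRUE)
    ks.stat(a * signs)
  })

  # Monte Carlo p-value (small tolerance guards against rounding in ties)
  p.value <- (1 + sum(null.stats >= observed - 1e-12)) / (B + 1)
  list(statistic = observed, p.value = p.value)
}

# Usage on a symmetric sample of the size mentioned in the question
set.seed(1)
signflip.symmetry.test(rnorm(50))
```

Because the p-value comes from re-randomizing the signs of the observed data, it does not rely on the independence assumption that the two-sample KS test violates here. It is discrete, so it can only be approximately uniform under the null.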

Of course, as a final observation, you could just use one of the standard symmetry tests in the statistical literature for the case where the median is unknown. That is the standard case of interest in the literature, and there are many existing tests that have been developed. There is a slight loss of power from failing to use the known median in your problem, but it should not make too much difference once you have a decent amount of data.

Ben

Sort the sample in increasing order and, separately, in decreasing order; compute the correlation coefficient between the two sorted vectors (it is always negative), add 1, and divide the result by 2. The final value is 0 if and only if the sample is symmetric. The maximal value is 1/2. No assumption about the value of the mean or the median is needed. There are tables of p-values for various sample sizes under the assumption of normality or of uniformity: https://arxiv.org/abs/2005.09960 (references to papers included). Hope it helps.
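The recipe above can be sketched in a few lines of R. The function name `symmetry.index` is hypothetical, and the sketch only computes the index; it does not reproduce the p-value tables from the linked paper.

```r
# Symmetry index: (1 + correlation between the ascending and
# descending sorted sample) / 2. Hypothetical helper name.
symmetry.index <- function(x) {
  r <- cor(sort(x), sort(x, decreasing = TRUE))
  (1 + r) / 2
}

symmetry.index(c(-2, -1, 0, 1, 2))   # sample symmetric about 0
symmetry.index(c(3, 4, 5, 6, 7))     # sample symmetric about 5, not about 0
symmetry.index(c(0, 0, 1))           # asymmetric sample
```

The first two samples give an index of 0 (up to rounding) and the third gives 0.25. The index is unchanged by shifting the data, so it measures symmetry about some centre rather than symmetry about zero specifically.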

Petitjean
  • This is a nice approach to test for *symmetry,* but it doesn't test for *symmetry about $0,$* which is what the question asks. – whuber Sep 20 '21 at 13:30
  • If the symmetry test is passed, accept that median=mean, then test the mean against zero: https://en.wikipedia.org/wiki/Student's_t-test#One-sample_t-test – Petitjean Sep 21 '21 at 15:19
  • You have thus conducted two tests with two p-values: how do you combine them into a single p-value? – whuber Sep 21 '21 at 15:47
  • Yes there are two tests (two p-values, not to be combined). It is informative: in the case it is concluded that the distribution is not symmetric around zero, it tells you if this is because (a) the distribution is symmetric and the mean is not zero, or (b) the distribution is not symmetric and the mean is zero, or (c) the distribution is not symmetric and the mean is not zero. – Petitjean Sep 22 '21 at 16:22
  • Since the question asks for a *single* test of symmetry, you *must* combine the two results you obtain into a single, valid p-value. – whuber Sep 22 '21 at 17:39
  • Let the reader decide. – Petitjean Sep 23 '21 at 18:12