Given the issues with normality testing (Is normality testing 'essentially useless'?), in particular for large sample sizes, I was wondering whether there are any feasible "approximate" normality tests.
Typically, in Null Hypothesis testing, we test a hypothesis of the form $X = Y$, i.e. we ask whether the observed data provide any evidence that the two distributions differ.
$$ (1) \qquad \boxed{H_0\colon X=Y} $$
This has the effect that, in practice, tests for normality almost always reject for large sample sizes. So, instead, I am interested in testing a Null Hypothesis of the form
$$ (2) \qquad \boxed{H_0\colon d(X, Y) < ε}$$
where $d$ is a suitable statistical divergence such as the Kullback–Leibler divergence or Wasserstein metric.
In particular, I am interested in automatically testing whether a large sample $X = (x_i)_{i=1:N}$ in $\mathbb R^d$ comes from a distribution close to a standard normal distribution. Here, the allowed closeness is defined by the design parameter $ε$. For example, we may need this to account for numerical errors and other noise (PRNGs not producing perfectly normal numbers, etc.).
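For concreteness, here is a rough idea of the kind of quantity that (2) would compare against $ε$, shown for a one-dimensional sample: a moment-based Gaussian approximation of the KL divergence to $\mathcal N(0,1)$. This only looks at the first two moments and is just meant to illustrate what "$d(X, \mathcal N(0,1)) < ε$" could mean numerically.

import numpy as np
from scipy.stats import norm

x = norm.rvs(size=10_000)                        # sample to be checked
m, s2 = x.mean(), x.var()
kl_gauss = 0.5 * (s2 + m**2 - 1.0 - np.log(s2))  # KL(N(m, s2) || N(0, 1))
print(kl_gauss)                                  # a test of (2) would reject only if this is credibly above ε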
Question: What are some numerically feasible tests I could use?
Background: Creating a unit test for neural network architectures at initialization.
There is good empirical and theoretical evidence that, in order to have stable training of deep neural networks, it is necessary that these networks satisfy a normalization condition at initialization; that is, we want that if $x∼\mathcal N(0,1)$ then $f(x, θ_0)∼\mathcal N(0,1)$, and this should hold layer- or block-wise. My goal is to build an automated probabilistic unit test that takes some neural network architecture as input and tests whether this condition holds approximately. Towards this goal I want to implement a function
def test_approx_normal(x: list[float], tol: float) -> bool:
that returns False if and only if the Null Hypothesis, namely that the sample comes from some distribution close to a standard normal distribution, is rejected, and True otherwise.
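A naive sketch of what I have in mind for a one-dimensional sample (the bootstrap construction and the choice of the 1-Wasserstein distance are placeholders, not a validated procedure):

import numpy as np
from scipy.stats import norm

def test_approx_normal(x: list[float], tol: float,
                       n_boot: int = 1000, alpha: float = 0.05) -> bool:
    """Return False iff H0: W1(sample distribution, N(0,1)) < tol is rejected."""
    x = np.asarray(x, dtype=float)
    n = x.size
    grid = norm.ppf((np.arange(n) + 0.5) / n)   # quantile discretization of N(0,1)

    def w1(sample):
        # 1-Wasserstein distance between two equally weighted point clouds
        return np.mean(np.abs(np.sort(sample) - grid))

    rng = np.random.default_rng()
    boot = np.array([w1(rng.choice(x, size=n, replace=True)) for _ in range(n_boot)])
    lower = np.quantile(boot, alpha)            # rough bootstrap lower confidence bound on the distance
    return bool(lower <= tol)                   # False (reject) only if the distance is credibly above tol

The idea is that the test only rejects when the estimated distance exceeds the tolerance by more than its sampling uncertainty, rather than rejecting for any detectable deviation from exact normality.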
The standard Kolmogorov–Smirnov test does not seem suitable for this task; at the very least, the following example produces $p$-values that are all over the place.
import numpy as np
from scipy.stats import kstest, norm as normal

for n in (10, 100, 1000, 10_000):
    x = normal.rvs(size=(n,))                 # i.i.d. standard normal sample
    A = normal.rvs(size=(n, n)) / np.sqrt(n)  # random "layer" with variance-preserving scaling
    y = A @ x                                 # output should again be approximately standard normal
    print(n, kstest(x, normal.cdf), kstest(y, normal.cdf), sep="\n")
This is not surprising, because the test is not "numerically robust": it tests for exact equality of distributions as in (1) instead of approximate equality as in (2). However, we can clearly see (e.g. by plotting the samples against the standard normal density) that the distribution is approximately standard normal:
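For reference, a minimal visual check along these lines (my own plotting sketch, reusing the same construction as in the snippet above; the plotting details are incidental):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm as normal

n = 1000
x = normal.rvs(size=(n,))
A = normal.rvs(size=(n, n)) / np.sqrt(n)
y = A @ x                                 # same construction as in the KS example above

grid = np.linspace(-4, 4, 200)
plt.hist(y, bins=40, density=True, alpha=0.5, label="sample y")
plt.plot(grid, normal.pdf(grid), label="N(0,1) density")
plt.legend()
plt.show()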