
There are many ways to test whether data come from a normal distribution, such as those in this post.

But the question is: how do I test whether data follow a half-normal distribution?

I know the KS test, but as the post above states: "The KS test is well-known but it has not much power. It can be used for other distribution than the normal."

So is there a more powerful way? (I considered copying the data, negating the copy, and testing the combined sample for a full normal distribution, but this seems a little problematic.)

Jiaqi Ding
  • Why bother with a test? Use a smoother, then superimpose the density curve you believe the distribution follows. A picture is worth a thousand tests. – AdamO Jun 14 '21 at 16:11
  • To follow up on @AdamO's suggestion, I want to suggest to you the notion that model selection is a *relative* procedure, not an absolute one. We have no way of knowing whether data "truly" follow a given distribution. What we really care about is whether the data follow a distribution close enough for a given application. A null hypothesis test cannot possibly answer this for you; in fact, null hypothesis testing tells us very little, on the whole. – user3716267 Jun 14 '21 at 16:19
  • Thanks, guys! I totally understand this, and my density curve clearly shows a half-normal distribution. But I am curious whether there is a general method to test this, because for the normal distribution we have Shapiro-Wilk. – Jiaqi Ding Jun 15 '21 at 02:31
  • The response to 'How do i get more power?' is ... *against which kinds of alternatives*? – Glen_b Jul 30 '21 at 04:49

1 Answer


Disclaimer: I am no statistician

However, since I am facing the same problem, let me tell you how I am currently trying to tackle it. My method does not verify that data is distributed half-normally, but only flags cases where this is most likely not the case. Note, however, that my method might well be fundamentally flawed.

Working Principle

The nice thing about a half-normal distribution is that it is determined by only one parameter, the standard deviation $\sigma_N$ of the underlying normal. Source [1] gives equations that relate the mean $\mu_H$, standard deviation $\sigma_H$, quantiles etc. of the half-normal to $\sigma_N$. It also gives closed-form equations to obtain $\sigma_N$ from $\mu_H$ or $\sigma_H$. For example, we see that $$ \sigma_N = \mu_H\sqrt{\frac{\pi}{2}}. $$ This relationship allows us to predict all other characteristics of the distribution just from $\mu_H$. This is something we can leverage: for example, we can use it to predict how likely it is to obtain the measured standard deviation $\sigma_H$ given the mean $\mu_H$.
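As a quick numerical illustration of these relationships, here is a minimal sketch that recovers $\sigma_N$ from a simulated half-normal sample in two ways: from the mean, and from the standard deviation via the standard half-normal identity $\sigma_H = \sigma_N\sqrt{1 - 2/\pi}$. The helper names `sigma_n_from_mean` and `sigma_n_from_std`, the seed, and the value `true_sigma_n = 2.0` are assumptions made only for this example.

```python
import numpy as np

def sigma_n_from_mean(mean_h: float) -> float:
    """Scale of the underlying normal implied by the half-normal mean."""
    return mean_h * np.sqrt(np.pi / 2)

def sigma_n_from_std(std_h: float) -> float:
    """Scale of the underlying normal implied by the half-normal standard deviation."""
    return std_h / np.sqrt(1 - 2 / np.pi)

rng = np.random.default_rng(0)   # fixed seed, only for reproducibility
true_sigma_n = 2.0               # assumed value for this illustration
y = np.abs(rng.normal(0.0, true_sigma_n, size=200_000))

est_from_mean = sigma_n_from_mean(y.mean())
est_from_std = sigma_n_from_std(y.std())
print(est_from_mean, est_from_std)  # both should land close to true_sigma_n
```

If the data were not half-normal, the two estimates would generally disagree, which is the intuition behind the methods that follow.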

Methods

Method 1: Variance Test (computational)

Since I was not able to find an analytic expression for the distribution of sample variances of a half-normal distribution (cf. this question), I tried to solve it using a small Python script:

import numpy as np

def calc_p_var(mean_h: float, n: int, std_h: float, N: int = 100000) -> float:
    """
    Estimates the probability of measuring a sample standard deviation of at
    most std_h, given half-normal mean mean_h and sample size n.
    """
    # scale of the underlying normal implied by the half-normal mean
    std_n = mean_h * np.sqrt(np.pi / 2)
    # simulate N half-normal samples of size n and their standard deviations
    X = np.abs(std_n * np.random.randn(N, n))
    stds_exp = np.std(X, axis=1)
    # empirical lower-tail probability P(std <= std_h);
    # values close to 0 or 1 flag a suspicious sample
    return float(np.mean(stds_exp <= std_h))

Method 2: $L_2$-Test (analytical)

Based on this idea of Xi'an, we can also use the squared $L_2$ norm, $\Vert X\Vert_2^2:=\sum_i X_i^2$, as a test statistic. The reasoning behind this is as follows: if $X_1, \dots, X_n \sim \mathcal{N}(0, \sigma_N^2)$ are independent and we define $Y_i:=\vert X_i\vert$, then $\Vert X \Vert_2^2 = \Vert Y \Vert_2^2$. This implies that $$ \frac{\Vert Y \Vert_2^2}{\sigma_N^2} = \frac{\Vert X \Vert_2^2}{\sigma_N^2} \sim \chi^2_n $$ [2, 3]. This means that we have an analytic expression for the distribution of $\Vert Y \Vert_2^2/\sigma_N^2$, which we can use as a test statistic. Substituting $\sigma_N = \mu_H\sqrt{\pi/2}$, we have $$ \frac{2\Vert Y \Vert_2^2}{\mu_H^2\pi} \sim \chi^2_n. $$ Note that this holds exactly only for the true mean $\mu_H$; once $\mu_H$ is replaced by the sample mean, the $\chi^2_n$ distribution is no longer exact.
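The chi-square claim can be checked by simulation when $\sigma_N$ is known. The following is a minimal sketch under that assumption; the values of `sigma_n`, `n`, `reps`, and the seed are arbitrary choices for the illustration.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(1)   # fixed seed, only for reproducibility
sigma_n = 1.5                    # assumed known scale of the underlying normal
n = 20                           # sample size
reps = 20_000                    # number of simulated samples

# Each row is one half-normal sample of size n.
y = np.abs(rng.normal(0.0, sigma_n, size=(reps, n)))
stat = np.sum(y**2, axis=1) / sigma_n**2

# A chi-square distribution with n degrees of freedom has mean n and variance 2n.
print(stat.mean(), stat.var())

# Compare the simulated statistics with the chi2_n distribution.
ks = scipy.stats.kstest(stat, "chi2", args=(n,))
print(ks.pvalue)
```

The sample mean and variance of `stat` should be close to `n` and `2 * n` respectively.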

import numpy as np
import scipy.stats

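def calc_p_l2(Y: np.ndarray) -> float:
    """
    Returns the chi-square CDF (n degrees of freedom) evaluated at the test
    statistic. Values close to 0 or 1 suggest that Y is not half-normal.
    """
    Y = np.asarray(Y, dtype=float).ravel()
    # variance of the underlying normal implied by the sample mean
    var_n = np.mean(Y)**2 * np.pi / 2
    n = Y.size
    chi = np.sum(Y**2) / var_n
    return float(scipy.stats.chi2.cdf(chi, n))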
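One caveat with the function above: `var_n` is computed from the sample mean, so the statistic is not exactly chi-square distributed. A possible workaround, which swaps in a Monte Carlo calibration for the chi-square reference and is not part of the method above, is to simulate the statistic under a half-normal null. The sketch below assumes that approach; the names `l2_stat` and `mc_p_value`, the default `reps`, and the seeds are hypothetical choices for illustration.

```python
import numpy as np

def l2_stat(y: np.ndarray) -> float:
    """Plug-in statistic: sum(y^2) divided by the variance implied by the sample mean."""
    y = np.asarray(y, dtype=float).ravel()
    var_n = y.mean() ** 2 * np.pi / 2
    return float(np.sum(y**2) / var_n)

def mc_p_value(y: np.ndarray, reps: int = 10_000, seed: int = 0) -> float:
    """Two-sided Monte Carlo p-value of l2_stat under a simulated half-normal null."""
    y = np.asarray(y, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    observed = l2_stat(y)
    # The statistic does not change when y is rescaled,
    # so simulating the null with unit scale is sufficient.
    sims = np.abs(rng.standard_normal((reps, y.size)))
    null = np.sum(sims**2, axis=1) / (sims.mean(axis=1) ** 2 * np.pi / 2)
    # Add-one correction keeps the estimated p-value strictly positive.
    lower = (np.sum(null <= observed) + 1) / (reps + 1)
    upper = (np.sum(null >= observed) + 1) / (reps + 1)
    return float(min(1.0, 2 * min(lower, upper)))

# Usage: one half-normal sample and one exponential sample (both simulated).
rng = np.random.default_rng(42)
print(mc_p_value(np.abs(rng.normal(0.0, 3.0, 200))))
print(mc_p_value(rng.exponential(3.0, 200)))
```

Because the null distribution is simulated with the same plug-in step that is applied to the data, the resulting p-value accounts for estimating the scale from the sample.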

I would be really glad if I could get some qualified feedback on this method.

check