How to test if data follows a half-normal distribution?

Question

There are many ways to test whether data come from a normal distribution, as discussed in this post.

But how can one test whether data follow a half-normal distribution?

I know the KS test, but as the post above states: "The KS test is well-known but it has not much power. It can be used for other distribution than the normal."

So is there a more powerful way? (I considered mirroring the data, i.e. appending a negated copy of every value, and testing the result for a full normal distribution, but this seems a little problematic.)
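For reference, a minimal sketch of the KS baseline against a half-normal, assuming SciPy is available. The synthetic sample, the seed, and the variable names are illustrative only. The scale is estimated from the same data, so the reported p-value tends to be conservative.

```python
import numpy as np
from scipy import stats

# Synthetic half-normal sample (illustrative; replace with real data)
rng = np.random.default_rng(0)
y = np.abs(rng.normal(0.0, 2.0, size=500))

# Maximum-likelihood estimate of the scale with the location fixed at 0
sigma_hat = np.sqrt(np.mean(y**2))

# One-sample KS test against halfnorm(loc=0, scale=sigma_hat)
result = stats.kstest(y, "halfnorm", args=(0.0, sigma_hat))
ks_stat, ks_p = result.statistic, result.pvalue
print(ks_stat, ks_p)
```

A small `ks_p` would flag a departure from the half-normal shape; a large one says only that no departure was detected.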

Why bother with a test? Use a smoother, then superimpose the density curve you believe the distribution follows. A picture is worth a thousand tests. — AdamO, Jun 14 '21 at 16:11
To follow up on @AdamO's suggestion, I want to suggest to you the notion that model selection is a *relative* procedure, not an absolute one. We have no way of knowing whether data "truly" follow a given distribution. What we really care about is whether the data follow a distribution close enough for a given application. A null hypothesis test cannot possibly answer this for you; in fact, null hypothesis testing tells us very little, on the whole. — user3716267, Jun 14 '21 at 16:19
Thanks, guys! I totally understand this, and my density curve clearly shows a half-normal distribution. But I am curious whether there is a general method to test for it, because for the normal distribution we have Shapiro-Wilk. — Jiaqi Ding, Jun 15 '21 at 02:31
The response to 'How do i get more power?' is ... *against which kinds of alternatives*? — Glen_b, Jul 30 '21 at 04:49

check · Answer 1 · 2021-07-30T12:51:01.913

Disclaimer: I am no statistician

However, since I am facing the same problem, let me tell you how I am currently trying to tackle it. My method does not verify that data is distributed half-normally, but only flags cases where this is most likely not the case. Note, however, that my method might well be fundamentally flawed.

Working Principle

The nice thing about a half-normal distribution is that it is determined by only one parameter, the standard deviation of the underlying normal, $\sigma_N$. Source [1] gives equations relating the mean $\mu_H$, standard deviation $\sigma_H$, quantiles, etc. of the half-normal to $\sigma_N$. It also gives closed-form equations to obtain $\sigma_N$ from $\mu_H$ or $\sigma_H$. For example, we see that $$ \sigma_N = \mu_H\sqrt{\frac{\pi}{2}}. $$ This relationship lets us predict all other properties of the distribution from $\mu_H$ alone. This is something we can leverage: for example, we can use it to compute how likely it is to obtain the measured standard deviation $\sigma_H$ given the mean $\mu_H$.
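As a quick sanity check of these relations, here is a small sketch using `scipy.stats.halfnorm`. The value of `sigma_n` is arbitrary and chosen only for illustration.

```python
import numpy as np
from scipy import stats

sigma_n = 1.5  # illustrative standard deviation of the underlying normal
hn = stats.halfnorm(loc=0.0, scale=sigma_n)

mu_h = hn.mean()    # equals sigma_n * sqrt(2 / pi)
sigma_h = hn.std()  # equals sigma_n * sqrt(1 - 2 / pi)

# Recover sigma_n from either moment of the half-normal
sigma_from_mean = mu_h * np.sqrt(np.pi / 2)
sigma_from_std = sigma_h / np.sqrt(1 - 2 / np.pi)

# The ratio sigma_h / mu_h does not depend on sigma_n: sqrt(pi/2 - 1)
ratio = sigma_h / mu_h
print(sigma_from_mean, sigma_from_std, ratio)
```

Because the ratio $\sigma_H/\mu_H$ is a fixed constant for every half-normal, comparing the sample standard deviation with the value implied by the sample mean is a natural check.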

Methods

Method 1: Variance Test (computational)

Since I was not able to find an analytic expression for the distribution of sample variances of a half-normal distribution (cf. this question), I tried to solve it numerically with a small Python script:

import numpy as np

def calc_p_var(mean_h: float, n: int, std_h: float, N: int = 100000) -> float:
    """
    Estimates the probability of observing a sample standard deviation of at
    most std_h, given the half-normal mean mean_h and the sample size n.
    """
    # standard deviation of the underlying normal implied by the mean
    std_n = mean_h * np.sqrt(np.pi / 2)
    # simulate N half-normal samples of size n and their standard deviations
    X = np.abs(std_n * np.random.randn(N, n))
    stds_exp = np.std(X, axis=1)
    # empirical CDF of the simulated standard deviations, evaluated at std_h
    return float(np.mean(stds_exp <= std_h))

Method 2: $L_2$-Test (analytical)

Based on this idea of Xi'an, we can also use the squared $L_2$ norm, $\Vert X\Vert_2^2:=\sum_i X_i^2$, as a test statistic. The reasoning behind this is as follows: if $X_1, \dots, X_n$ are independent with $X_i \sim \mathcal{N}(0, \sigma_N^2)$ and we define $Y_i:=\vert X_i\vert$, then $\Vert Y \Vert_2^2 = \Vert X \Vert_2^2$. This implies that $$ \frac{\Vert Y \Vert_2^2}{\sigma_N^2} = \frac{\Vert X \Vert_2^2}{\sigma_N^2} \sim \chi^2_n $$ [2, 3]. This means that we have an analytic expression for the distribution of $\Vert Y \Vert_2^2/\sigma_N^2$, which we can use as a test statistic. Substituting $\sigma_N^2 = \mu_H^2\pi/2$, we have that $$ \frac{2\Vert Y \Vert_2^2}{\mu_H^2\pi} \sim \chi^2_n. $$
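Spelled out, the chi-squared step relies only on independence and on the fact that taking absolute values does not change the squares. The standardized variables $Z_i$ below are introduced here purely for illustration:

$$ Y_i^2 = \vert X_i \vert^2 = X_i^2, \qquad Z_i := \frac{X_i}{\sigma_N} \sim \mathcal{N}(0, 1) \quad\Longrightarrow\quad \frac{\Vert Y \Vert_2^2}{\sigma_N^2} = \sum_{i=1}^{n} Z_i^2 \sim \chi^2_n. $$

In particular, $\mathbb{E}\left[\Vert Y \Vert_2^2\right] = n\,\sigma_N^2$, so under the half-normal model the statistic should be close to $n$.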

import numpy as np
import scipy.stats

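def calc_p_l2(Y: np.ndarray) -> float:
    """
    Returns the chi-squared CDF evaluated at the L2 test statistic.
    Values close to 0 or 1 flag samples that are unlikely under a half-normal.
    The variance is estimated from the sample mean, so the chi-squared
    reference distribution is only approximate.
    """
    Y = np.asarray(Y).flatten()
    # variance of the underlying normal implied by the sample mean
    var_n = np.mean(Y)**2 * np.pi / 2
    n = Y.size
    chi = np.sum(Y**2) / var_n
    return scipy.stats.chi2.cdf(chi, n)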
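Since the statistic above divides by the sample mean of the same data, it need not follow $\chi^2_n$ exactly. One possible workaround, sketched here under that assumption, is to simulate its null distribution directly. The statistic does not depend on the scale, so simulating with $\sigma_N = 1$ is enough. The helper names `l2_statistic` and `mc_p_value` are hypothetical and not part of the method above.

```python
import numpy as np

def l2_statistic(y: np.ndarray) -> float:
    """Plug-in statistic 2 * sum(y^2) / (pi * mean(y)^2); scale-invariant."""
    y = np.asarray(y, dtype=float).ravel()
    return 2.0 * np.sum(y**2) / (np.pi * np.mean(y) ** 2)

def mc_p_value(y: np.ndarray, n_sim: int = 2000, seed: int = 0) -> float:
    """Two-sided Monte Carlo p-value under a half-normal null (hypothetical helper)."""
    y = np.asarray(y, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    # Simulate n_sim half-normal samples of the same size with sigma_N = 1
    sims = np.abs(rng.standard_normal((n_sim, y.size)))
    null_stats = 2.0 * np.sum(sims**2, axis=1) / (np.pi * np.mean(sims, axis=1) ** 2)
    t_obs = l2_statistic(y)
    lower = np.mean(null_stats <= t_obs)
    upper = np.mean(null_stats >= t_obs)
    return float(min(1.0, 2.0 * min(lower, upper)))

# Illustrative data: one half-normal sample and one exponential sample
rng = np.random.default_rng(1)
p_half = mc_p_value(np.abs(rng.standard_normal(500)))
p_expo = mc_p_value(rng.exponential(1.0, 500))
print(p_half, p_expo)
```

A small p-value flags a sample whose ratio of second moment to squared mean is unusual for a half-normal. Like the original method, this can only reject; it cannot confirm the distribution.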

I would be really glad if I could get some qualified feedback on this method.
