
I am testing a data set of coefficients, J. It looks like this:

[plot of the J coefficients]

I want to test this set for normality using the K-S test, but I'm not sure how the parameters should be set here. I wrote:

from numpy.random import seed
from scipy.stats import kstest

seed(1234)
# normality test (against a standard normal, N(0, 1), by default)
result = kstest(J_ridge.flatten(), 'norm', alternative='greater')
>> KstestResult(statistic=0.49068097451020654, pvalue=0.0)

But I don't know what alternative to use or other parameters needed in the K-S test.

Apprentice
    Hi there, welcome to the site! It's worth thinking about why exactly you need a test of normality. You could take a look at the answers here for more context: https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless – Nayef Aug 31 '21 at 12:41

1 Answer


Perhaps a more useful test of normality, if $\mu$ and $\sigma$ are unknown, would be the Shapiro-Wilk test.

The null hypothesis of the Kolmogorov-Smirnov test is that the population from which data are sampled has a specific normal distribution (with specified mean $\mu$ and standard deviation $\sigma$). Consequently, if you used a K-S test, you would need to estimate $\mu \approx \bar X = \frac{1}{n}\sum_{i=1}^n X_i$ and $\sigma^2 \approx S^2= \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2,$ but you would then have to allow for that estimation when determining the P-value of the K-S test.
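To see the issue concretely in SciPy (which the question uses), the sketch below plugs sample estimates of $\mu$ and $\sigma$ into `kstest` — with simulated data standing in for the real coefficients. The reported p-value is then biased upward, because the fitted normal hugs the sample; a Lilliefors-type correction would be needed for a valid test.

```python
import numpy as np
from scipy.stats import kstest

# Simulated stand-in for the real coefficients (assumption: any normal sample).
rng = np.random.default_rng(1234)
x = rng.normal(loc=100, scale=10, size=500)

# Naive plug-in: estimate mu and sigma from the same sample being tested.
mu_hat, sigma_hat = x.mean(), x.std(ddof=1)
res = kstest(x, 'norm', args=(mu_hat, sigma_hat))

# The statistic is small and the p-value large -- but this p-value is not
# valid as-is: the null distribution of the K-S statistic changes when the
# parameters are estimated from the data (Lilliefors' problem).
print(res.statistic, res.pvalue)
```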

By contrast, the null hypothesis of the Shapiro-Wilk test is that the population from which data are randomly sampled is some normal distribution (with unspecified parameters). Another advantage is that the S-W test has better power (is more likely to detect actual non-normality) for a given sample size.
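Since the question is in Python, note that this test is available directly as `scipy.stats.shapiro`. A minimal sketch on simulated normal data (the data here are placeholders, not the question's coefficients):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1234)
x = rng.normal(loc=100, scale=10, size=500)

# H0: the data come from *some* normal distribution (parameters unspecified).
stat, p = shapiro(x)
print(stat, p)   # a large p-value means no evidence against normality
```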

Example in R:

Sample of size $n=500$ from $\mathsf{Norm}(\mu=100, \sigma=10):$

set.seed(831)
x = rnorm(500, 100, 10)
summary(x);  length(x);  sd(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  73.63   94.31  100.08  100.30  106.79  127.85 
[1] 500        # sample size
[1] 9.575485   # sample standard deviation

The null hypothesis is not rejected at the 5% level; P-value 0.674. So data $X$ are 'consistent with normal'.

shapiro.test(x)

        Shapiro-Wilk normality test

data:  x
W = 0.99754, p-value = 0.674

Linear transformation of $X$ changes parameters, but $Y$ still passes the normality test (P-value unchanged).

y = .5*x + 2
summary(y);  length(y);  sd(y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  38.81   49.16   52.04   52.15   55.39   65.93 
[1] 500
[1] 4.787742
shapiro.test(y)$p.val
[1] 0.6739557
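The same invariance can be checked with SciPy (a sketch on simulated data): the S-W statistic depends only on the standardized sample, so a positive linear transformation leaves it unchanged up to floating-point error.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(831)
x = rng.normal(100, 10, 500)
y = 0.5 * x + 2              # linear transformation, as in the R example

w_x, p_x = shapiro(x)
w_y, p_y = shapiro(y)
# W is location/scale invariant, so both tests agree to rounding error
print(p_x, p_y)
```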

A nonlinear transformation destroys normality: the S-W test rejects normality for $W = X^2$ (P-value below 5%).

w = x^2
summary(w);  length(w);  sd(w)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   5421    8895   10016   10151   11404   16347 
[1] 500
[1] 1935.915
shapiro.test(w)$p.value
[1] 0.001056042
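A SciPy analogue of the squaring example, again on simulated data. The sample size is increased here as an assumption of this sketch: the skewness induced by squaring a $\mathsf{Norm}(100, 10)$ variable is mild (roughly 0.3), so a larger $n$ gives a clear-cut rejection.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(831)
x = rng.normal(100, 10, 4000)
w = x**2                     # nonlinear transformation: right-skews the data

stat, p = shapiro(w)
print(stat, p)   # small p-value: normality rejected
```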

Graphical displays:

The histogram of $W$ is right-skewed, not normal.

[Figure: histogram of W]

The normal quantile plot of $W$ (right panel) is distinctly nonlinear.

[Figure: normal quantile plots]

BruceET
  • (+1 but a couple of quibbles). (1) It's straightforward to allow for the parameters' being estimated in the K-S test - see [A naive question about the Kolmogorov Smirnov test](https://stats.stackexchange.com/q/110272/17230) & [How can one compute Lilliefors' test for arbitrary distributions?](https://stats.stackexchange.com/q/237779/17230) (2) You can only say a test's more powerful than another against specified alternative hypotheses to the null. The S-W test is more powerful than the K-S test against a range of alternatives commonly ... – Scortchi - Reinstate Monica Jan 03 '22 at 19:54
    ... thought to be of particular interest, not against all alternatives - see [Is Shapiro–Wilk the best normality test? Why might it be better than other tests like Anderson-Darling?](https://stats.stackexchange.com/q/90697/17230) – Scortchi - Reinstate Monica Jan 03 '22 at 19:55