
I am not sure if this type of question is appropriate, but I am hoping for some feedback in any case.

I am trying to figure out how to create a distribution table myself. The table I am looking for is based on Fisher's test, as given in Fuller's Introduction to Statistical Time Series (1996); see Table 7.1.2 on p. 364 of that book.

The $\xi$ distribution is given by the following (taken from p. 363 in same book):

$$ P\{m^{-1} \xi>g\} = \sum_{j=1}^k\left(-1\right)^{j-1}\left(\begin{matrix} m \\ j \end{matrix}\right)\left(1-jg\right)^{m-1} $$
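As a quick sanity check on the formula (I take $k$ to be the largest integer with $1 - kg \geq 0$, i.e. $k = \lfloor 1/g \rfloor$; that is my reading, since $k$ is not defined in the excerpt above): for $m = 2$ only the $j = 1, 2$ terms can be nonzero, and for $0 < g < \tfrac{1}{2}$

$$ P\{2^{-1}\xi > g\} = \binom{2}{1}(1-g) - \binom{2}{2}(1-2g) = 1, $$

while for $\tfrac{1}{2} \leq g < 1$ only the $j = 1$ term remains, giving $2(1-g)$. The first case matches the intuition that the larger of two ordinates always makes up at least half of their total, if I understand $\xi/m$ correctly.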

The reason I want to create the table myself is that I am testing on sample sizes larger than 1000, and the distribution table in the book is capped at 1000. I have searched Google Scholar and Wikipedia, but I have not been able to find the distribution tabulated anywhere except in the book mentioned above.

Can anybody demonstrate how to create such a distribution table in a simple spreadsheet?

Thanks

Sha

1 Answer


A quick Google search shows that A. A. Nowroozi has computed a table for $g$ (the table you referred to is a table of $m\times g$) up to $m=2000$, and M. Shimshoni has computed a similar table for $m\leq 3000$. See *Table for Fisher's Test of Significance in Harmonic Analysis* and *On Fisher's Test of Significance in Harmonic Analysis*.

If you intend to make a table yourself, you can do so with a root-finding algorithm. Here is a snippet of R code that demonstrates how to make a table for $m\leq5000$ with the bisection method (to simplify the work you can use the built-in uniroot function; a sketch using it follows the table below):

# upper tail probability P(xi / m > g) from the alternating-sum formula
f = function(g, m) {
  if (g >= 1) {
    return(0)
  } else if (g <= 0) {
    return(1)
  }

  # only j with 1 - j * g >= 0 contribute, i.e. j = 1, ..., floor(1 / g)
  k = floor(1 / g)
  # ditch high order terms to reduce computation
  if (k > 150) {
    k = 150
  }
  xi = sapply(1:k, function(j) {(-1)^(j - 1) * choose(m, j) * (1 - j * g)^(m - 1)})

  return(sum(xi))
}

Bisection = function(m, p) {
  # solve f(g, m) = p for g by bisection on (0, 1)
  u = 1
  l = 0
  x = (u + l) / 2
  y = f(x, m) - p

  # set accuracy
  while (abs(y) >= 1e-08) {
    if (sign(y) == sign(f(u, m) - p)) {
      u = x
    } else {
      l = x
    }

    x = (u + l) / 2
    y = f(x, m) - p
  }

  return(x)
}

# tabulate m * g_p for p = 0.10, 0.05, 0.01 (the book's table is in terms of m * g)
m = seq(500, 5000, by = 100)
xi1 = m * sapply(m, Bisection, p = 0.1)
xi2 = m * sapply(m, Bisection, p = 0.05)
xi3 = m * sapply(m, Bisection, p = 0.01)
xi = cbind(xi1, xi2, xi3)
colnames(xi) = c("0.10", "0.05", "0.01")
rownames(xi) = m
round(xi, 3)

and here is the resulting table:

       0.10   0.05   0.01
500   8.418  9.123 10.721
600   8.606  9.313 10.916
700   8.764  9.473 11.079
800   8.901  9.612 11.220
900   9.022  9.733 11.344
1000  9.130  9.842 11.454
1100  9.227  9.939 11.553
1200  9.316 10.029 11.644
1300  9.397 10.111 11.727
1400  9.473 10.186 11.803
1500  9.543 10.257 11.875
1600  9.608 10.323 11.941
1700  9.670 10.384 12.003
1800  9.728 10.443 12.062
1900  9.783 10.498 12.118
2000  9.834 10.550 12.170
2100  9.884 10.599 12.220
2200  9.931 10.647 12.268
2300  9.976 10.692 12.313
2400 10.019 10.735 12.357
2500 10.060 10.776 12.399
2600 10.100 10.816 12.439
2700 10.138 10.854 12.477
2800 10.175 10.891 12.514
2900 10.210 10.927 12.550
3000 10.244 10.961 12.585
3100 10.278 10.994 12.618
3200 10.310 11.026 12.650
3300 10.341 11.058 12.681
3400 10.371 11.088 12.712
3500 10.400 11.117 12.741
3600 10.428 11.146 12.770
3700 10.456 11.173 12.798
3800 10.483 11.200 12.825
3900 10.509 11.226 12.851
4000 10.535 11.252 12.877
4100 10.560 11.277 12.902
4200 10.584 11.301 12.926
4300 10.607 11.325 12.950
4400 10.631 11.348 12.973
4500 10.653 11.371 12.996
4600 10.675 11.393 13.019
4700 10.697 11.415 13.040
4800 10.718 11.436 13.062
4900 10.739 11.457 13.082
5000 10.759 11.477 13.103
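
For completeness, here is a minimal sketch of the same table built with the built-in uniroot mentioned above. It assumes the f defined earlier; the FisherQuantile wrapper (and its name) is mine, not from Fuller:

FisherQuantile = function(m, p, tol = 1e-08) {
  # solve f(g, m) = p for g on (0, 1); f(0) = 1 and f(1) = 0, so the endpoints
  # bracket the root for any 0 < p < 1 (note: tol is on g here, whereas the
  # bisection above stops on |f - p|)
  uniroot(function(g) f(g, m) - p, interval = c(0, 1), tol = tol)$root
}

m = seq(500, 5000, by = 100)
xi = sapply(c(0.10, 0.05, 0.01), function(p) m * sapply(m, FisherQuantile, p = p))
colnames(xi) = c("0.10", "0.05", "0.01")
rownames(xi) = m
round(xi, 3)

This should reproduce the bisection table above up to the tolerances involved.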
Francis
    (1) Note that this is an *inverse* distribution table for the upper tail (and therefore solves a harder problem--but perhaps of lesser statistical relevance). (2) However, it achieves less than the intended accuracy due to cancellation in the alternating sums. There is little problem in the upper tail (which likely is the intended application), but you can see it crop up in the first differences of the 0.01 column. Accuracy in the lower tail will be destroyed for such large values of $m$. (This can be verified by using exact arithmetic for rational $g$ in a symbolic algebra program.) – whuber Mar 24 '16 at 17:58
  • Francis, thanks for your very elaborate and extensive answer - this is more than I had hoped for but exactly what I was looking for. So thanks very much. The R code is really nice - I did a few tests and, as you pointed out, it is computationally heavy. Regarding the uniroot function, I will dive into that at some point. In any case, thanks for the help. – Sha Mar 25 '16 at 01:16
  • Whuber, I agree about the sensitivity at such large numbers. However, in my case I combined the result of a Fisher test with a Kolmogorov-Smirnov test to say something about various wave functions salted with Gaussian noise. For the tests I did, the accuracy was quite good even up to the 20% level (not sure why anyone would want to use 20% in real life, but still) with 3000 sample points. – Sha Mar 25 '16 at 01:25
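
To put a rough number on the cancellation whuber describes, here is a small base-R diagnostic (the cancellation helper is my own, not from the thread): the ratio of the largest term to the final sum indicates roughly how many significant digits are lost in double precision. It does not replace the exact-arithmetic check whuber suggests, but it shows where the sum becomes untrustworthy.

cancellation = function(g, m) {
  # digits lost ~ log10(largest |term| / |sum|) in the alternating series
  k = min(floor(1 / g), 150)  # same truncation as f() above
  j = 1:k
  terms = (-1)^(j - 1) * choose(m, j) * (1 - j * g)^(m - 1)
  log10(max(abs(terms)) / abs(sum(terms)))
}

# compare an upper-tail point, e.g. cancellation(12.585 / 3000, 3000), with a
# lower-tail point such as cancellation(0.001, 3000)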