0

I am trying to test whether data from a sample I have follows a t distribution with n degrees of freedom for a given n.

I am looking for something more powerful/recent than the Kolmogorov-Smirnov test. It need not be a test for t-distributions specifically; anything geared toward unimodal and symmetric distributions would do.

Any references/Matlab code would be very helpful.

Thanks a lot.

PS: My sample has thousands of points so I don't need anything involving bootstrapping and so on.

  • 1. Your title and the body of your post ask for very different things. Do you want a test for a $t_n$ or a test for a unimodal symmetric distribution? A beta(2,2) distribution and a double exponential distribution are both unimodal and symmetric, but neither is like any $t_n$. Please amend your question. – Glen_b Jan 12 '15 at 00:05
  • 2. No test will tell you that your data *are* from some distribution. With large samples, good tests are highly likely to tell you they are *not* -- even when the distribution is very very close to the one you're testing against (since when do real data *exactly* follow simple models?). 3. If you do persist with goodness of fit tests, do you have any particular kinds of alternatives you seek power against? – Glen_b Jan 12 '15 at 00:08
  • 4. I understand seeking more power, but why would *recency* be relevant? 5. The arguments that establish the useful properties of bootstrapping are asymptotic. Bootstrapping is quite well suited to large samples and may often fail to achieve good properties in small samples. – Glen_b Jan 12 '15 at 00:15
  • 6. Is this a standard t, or are there unspecified location and scale parameters? – Glen_b Jan 12 '15 at 00:21
  • Hi Glen_b, yes, I am testing goodness of fit against a standard t distribution. I thought it might be too much to hope for a test geared precisely for the t-distribution, so I was hoping for something geared for unimodal symmetric distributions. – senanindya Jan 20 '15 at 06:01
  • Anyway, Anderson-Darling looks quite promising. I'll go with that. – senanindya Jan 20 '15 at 06:02
  • There are tests specifically for symmetry and one-sided tests for number of modes. – Glen_b Jan 20 '15 at 06:30

2 Answers

1

Some statements pending further clarification:

1) Power is a property of a specific alternative -- or in the case of power functions, a specific collection of alternatives. You should not necessarily expect to have a goodness of fit test that has great power against all alternatives -- some tests have more power against some kinds of alternatives, others against other kinds; knowing something about the alternatives that most interest you will help identify a test which suits the kind of power properties you seek. The more specific you can be about alternatives of interest, the better the chances of finding a test with good power, generally speaking.

2) A test for a completely specified distribution which has good properties against a wide variety of alternatives that people tend to find interesting is the Anderson-Darling test. In particular, it is more sensitive to differences of distribution in the tails (at the expense of being slightly less sensitive in the middle) than the Kolmogorov-Smirnov test, and it performs extremely well in power studies at the normal; I would expect it to have similarly good power in this case.

3) If you seek a test where location and scale are unspecified, besides adapting the Anderson-Darling to that situation, you might also consider something like the Shapiro-Francia test in the normal case -- a test based on the correlation $r$ between the order statistics and $t_n$-quantiles (in particular, I'd suggest using $n(1-r^2)$ as the statistic).

4) You might be able to base a test off the likelihood itself, Fisher style.
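To make points 2) and 3) concrete, here is a minimal sketch in plain Python (not MATLAB, and not anyone's reference implementation). It computes the Anderson-Darling statistic for a fully specified null, $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln u_{(i)} + \ln\left(1-u_{(n+1-i)}\right)\right]$ with $u_{(i)} = F_0(x_{(i)})$, and the correlation-type statistic $n(1-r^2)$, where I read that $n$ as the sample size rather than the degrees of freedom. The assumptions are mine: the null is a $t$ with 2 degrees of freedom (chosen only because its CDF and quantile function have closed forms), the plotting positions are $(i-1/2)/n$, and the function names are made up for illustration.

```python
import math


def t2_cdf(x):
    """CDF of Student's t with 2 degrees of freedom (closed form)."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))


def t2_quantile(p):
    """Inverse of t2_cdf, valid for 0 < p < 1."""
    return (2.0 * p - 1.0) / math.sqrt(2.0 * p * (1.0 - p))


def anderson_darling(sample, cdf, eps=1e-12):
    """A^2 for a completely specified null distribution with CDF `cdf`."""
    x = sorted(sample)
    n = len(x)
    # Clamp away from 0 and 1 so the logarithms stay finite (a practical guard).
    u = [min(max(cdf(v), eps), 1.0 - eps) for v in x]
    total = 0.0
    for i in range(n):
        # i is 0-based here, so (2i + 1) plays the role of (2i - 1) in the formula.
        total += (2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
    return -n - total / n


def correlation_statistic(sample, quantile):
    """n * (1 - r^2), with r the correlation of order statistics and null quantiles."""
    x = sorted(sample)
    n = len(x)
    # Plotting positions (i - 1/2)/n are one common choice, assumed here.
    q = [quantile((i + 0.5) / n) for i in range(n)]
    mx = sum(x) / n
    mq = sum(q) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sqq = sum((b - mq) ** 2 for b in q)
    sxq = sum((a - mx) * (b - mq) for a, b in zip(x, q))
    return n * (1.0 - sxq * sxq / (sxx * sqq))


# Toy data: an idealised t_2 sample, then the same values shifted and rescaled.
ideal = [t2_quantile((i + 0.5) / 200) for i in range(200)]
moved = [1.0 + 3.0 * v for v in ideal]

print(anderson_darling(ideal, t2_cdf), anderson_darling(moved, t2_cdf))
print(correlation_statistic(ideal, t2_quantile),
      correlation_statistic(moved, t2_quantile))
```

The Anderson-Darling value reacts to the change in location and scale, while the correlation statistic does not, which is the point of 3). For the fully specified case the commonly tabulated asymptotic 5% critical value of $A^2$ is roughly 2.49; for $n(1-r^2)$ I am not aware of a simple table, so critical values would probably have to come from simulating under the null.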

Glen_b
1

Like all decent statistical and scientific software packages, MATLAB contains functionality for fitting distributions and testing goodness-of-fit (GoF). As far as I understand, when the probability density function of the data's distribution is unknown, the empirical procedure for GoF testing has two steps.

The first step is to fit a distribution to the data, for which you can use either MATLAB's built-in functions or external ones (your own or contributed by the community): http://blogs.mathworks.com/pick/2012/02/10/finding-the-best. The second step is to select and perform an appropriate GoF test, as discussed below.

More details and some nice MATLAB code examples can be found in this relevant discussion here on CV. In addition to the Kolmogorov-Smirnov, Anderson-Darling, Shapiro-Wilk, Shapiro-Francia and Lilliefors GoF tests mentioned in that discussion, I think it is worth considering a chi-square GoF test. Along with many others, the chi-square GoF test is included in the MATLAB Statistics Toolbox: http://www.mathworks.com/help/stats/distribution-tests.html. However, if you don't have access to the Statistics Toolbox, you can use either the example code from the above-mentioned CV discussion or the code of the chi2test() function from this course notes document (see pages 7-13).
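For readers who want to see what a chi-square GoF test does under the hood, here is a rough sketch in plain Python. It is not the chi2test() function from the course notes and not the Statistics Toolbox routine. The assumptions are mine: the null is a fully specified $t$ with 2 degrees of freedom (picked because its CDF and quantile function have closed forms), there are five equal-probability bins, and the p-value helper only handles an even number of degrees of freedom, where the chi-square survival function has a simple closed form. All function names are hypothetical.

```python
import bisect
import math


def t2_cdf(x):
    """CDF of Student's t with 2 degrees of freedom (closed form)."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))


def t2_quantile(p):
    """Inverse of t2_cdf, valid for 0 < p < 1."""
    return (2.0 * p - 1.0) / math.sqrt(2.0 * p * (1.0 - p))


def chi_square_gof(sample, cdf, cuts):
    """Pearson statistic and df for bins separated by the interior points `cuts`."""
    cuts = sorted(cuts)
    n = len(sample)
    # Null probability of each bin, from CDF differences at the cut points.
    F = [0.0] + [cdf(c) for c in cuts] + [1.0]
    expected = [n * (F[j + 1] - F[j]) for j in range(len(cuts) + 1)]
    observed = [0] * len(expected)
    for v in sample:
        # A value equal to a cut point is counted in the bin above it.
        observed[bisect.bisect_right(cuts, v)] += 1
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = bins - 1 because no parameters were estimated from the data.
    return stat, len(expected) - 1


def chi2_sf_even(x, df):
    """P(chi-square_df > x); closed form that is valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Five equal-probability bins under the null, and toy data shifted by 1.
cuts = [t2_quantile(p) for p in (0.2, 0.4, 0.6, 0.8)]
sample = [1.0 + t2_quantile((i + 0.5) / 200) for i in range(200)]

stat, df = chi_square_gof(sample, t2_cdf, cuts)
print(stat, df, chi2_sf_even(stat, df))
```

If location or scale were estimated from the data first, the degrees of freedom would need to be reduced accordingly, and the choice of bins affects the power of the test.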

In regard to the statistical power of various GoF tests, I ran across an interesting research paper comparing the statistical power of most of the above-mentioned GoF tests (with the unfortunate exception of the chi-square GoF test). It showed that Shapiro-Wilk is the most powerful test, followed by Anderson-Darling, Lilliefors and Kolmogorov-Smirnov, in that order.

Aleksandr Blekh