
I'm looking for a nonparametric test to check whether my data are Pareto distributed, but I couldn't find a proper reference for one. I'm using R for the simulations, so a built-in R test would be acceptable for the moment. However, I'd prefer to write my own R functions, so I would greatly appreciate a good reference describing a nonparametric statistical test of whether data are Pareto distributed.

  • Your question appears to be based on a mistaken premise; you don't generally need to know the parameter *values* to conduct a parametric test. – Glen_b Apr 06 '19 at 01:55
  • Thanks for pointing that out. I'll edit the question. – Dovini Jayasinghe Apr 08 '19 at 04:31
  • The edit simply replaces the original problem with a new one; you certainly don't need to know population parameters to estimate them (that's why you'd estimate them). – Glen_b Apr 08 '19 at 04:39

1 Answer


First let us note a few simplifications that will make life easier:

(a) if $X_i$ are iid Pareto $(c,\alpha)$, with $\alpha$ being the index and $c$ the left bound, then $Y_i=\log(X_i)$ are shifted exponential $(\log(c),\alpha)$ (where $\alpha$ is rate, not scale);

(b) If $Y_i$, $i=1,2,\ldots,n$ are shifted exponential, then $Z_k=Y_k-Y_{(1)}$, $k=1,\ldots,n-1$ (dropping the $Z$-value that would be exactly 0 and relabelling the rest) are iid exponential($\alpha$); we may call that smaller sample size $m=n-1$. That is, if we take logs and subtract the smallest observation from the rest, we have reduced the problem to a simple test of one-parameter exponentiality with unknown rate.
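Steps (a) and (b) can be sketched in a few lines. This is only an illustration in Python using the standard library (the same steps translate directly to R); the function name `pareto_to_exponential` and the simulated parameters `c = 2`, `alpha = 3` are made up for the example.

```python
import math
import random

def pareto_to_exponential(x):
    """Map a Pareto sample of size n to what should be an exponential sample of size n - 1."""
    y = [math.log(v) for v in x]        # step (a): logs give a shifted exponential
    y_min = min(y)
    z = sorted(v - y_min for v in y)    # step (b): subtract the smallest log-value
    return z[1:]                        # drop the zero contributed by the minimum itself

# Hypothetical data: Pareto(c = 2, alpha = 3) generated by inverse-CDF sampling
rng = random.Random(1)
c, alpha = 2.0, 3.0
x = [c * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(200)]

z = pareto_to_exponential(x)
print(len(z), sum(z) / len(z))  # m = n - 1 values; the mean should be near 1/alpha
```

The sketch assumes continuous data with no ties at the minimum; with tied minima only one of the zeros is dropped.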

So let us focus on tests for this simpler problem.

There are numerous tests for exponentiality with unknown parameter; indeed it's an issue addressed in several posts already on this site. [You should be careful not to simply apply a test for a fully specified distribution to this situation.]

For example, Lilliefors (1969)[1] constructed one such, based on the Kolmogorov-Smirnov test with estimated parameter. We needn't restrict ourselves to the sample sizes covered there, since the test is trivial to simulate at whatever sample size we wish; we have the advantage over Lilliefors of having access to far more computing power on a cheap laptop than he could in the 60s.
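A minimal sketch of simulating such a test, assuming the values passed in are the $Z$ values after the transformation described earlier. The function names, the default of 2000 simulations and the example data are illustrative choices, not taken from Lilliefors' paper. Because the statistic is unchanged when the data are rescaled, the null distribution can be simulated from a unit-rate exponential.

```python
import math
import random

def ks_exp_stat(z):
    """KS distance between the sample and an exponential with rate 1/mean(z)."""
    m = len(z)
    mean = sum(z) / m
    d = 0.0
    for i, v in enumerate(sorted(z), start=1):
        f = 1.0 - math.exp(-v / mean)           # fitted exponential CDF
        d = max(d, i / m - f, f - (i - 1) / m)  # check both sides of each ECDF step
    return d

def lilliefors_exp_pvalue(z, n_sim=2000, seed=0):
    """Monte Carlo p-value; the rate is re-estimated in every simulated sample."""
    rng = random.Random(seed)
    m = len(z)
    d_obs = ks_exp_stat(z)
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.expovariate(1.0) for _ in range(m)]
        if ks_exp_stat(sim) >= d_obs:
            exceed += 1
    return d_obs, (exceed + 1) / (n_sim + 1)

# Hypothetical data: one exponential sample, one clearly non-exponential sample
rng = random.Random(42)
z_exp = [rng.expovariate(3.0) for _ in range(100)]
z_unif = [rng.uniform(1.0, 2.0) for _ in range(100)]
d_exp, p_exp = lilliefors_exp_pvalue(z_exp)
d_unif, p_unif = lilliefors_exp_pvalue(z_unif)
print(d_exp, p_exp)
print(d_unif, p_unif)
```

Re-estimating the rate inside each simulated sample is the essential step; comparing the observed statistic with ordinary fully specified Kolmogorov-Smirnov tables would make the test conservative.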

This is not automatically the best choice; indeed you probably want to focus on the tail, in which case a modified version of the Anderson-Darling test (to account for parameter estimation) might be a better choice, or perhaps one based on a similar idea to the Shapiro-Francia test. These alternatives are briefly discussed in the first link below.
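The same simulation approach works for a tail-sensitive statistic. The sketch below uses the standard Anderson-Darling $A^2$ formula with the exponential rate estimated from the data and simulates its null distribution directly, rather than using any particular published modification or table. Function names and the example data are illustrative assumptions.

```python
import math
import random

def ad_exp_stat(z):
    """Anderson-Darling A^2 against an exponential with rate 1/mean(z)."""
    m = len(z)
    mean = sum(z) / m
    s = sorted(v / mean for v in z)  # rescale so the fitted rate is 1
    total = 0.0
    for i in range(1, m + 1):
        # log F(z_(i)), clamped to avoid log(0) if an observation is exactly 0
        log_f = math.log(max(-math.expm1(-s[i - 1]), 1e-300))
        log_surv = -s[m - i]         # log(1 - F(z_(m+1-i))) for the unit exponential
        total += (2 * i - 1) * (log_f + log_surv)
    return -m - total / m

def ad_exp_pvalue(z, n_sim=2000, seed=0):
    """Monte Carlo p-value with the rate re-estimated in every simulated sample."""
    rng = random.Random(seed)
    m = len(z)
    a_obs = ad_exp_stat(z)
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.expovariate(1.0) for _ in range(m)]
        if ad_exp_stat(sim) >= a_obs:
            exceed += 1
    return a_obs, (exceed + 1) / (n_sim + 1)

# Hypothetical data: one exponential sample, one clearly non-exponential sample
rng = random.Random(7)
z_exp = [rng.expovariate(3.0) for _ in range(100)]
z_unif = [rng.uniform(1.0, 2.0) for _ in range(100)]
a_exp, p_exp = ad_exp_pvalue(z_exp)
a_unif, p_unif = ad_exp_pvalue(z_unif)
print(a_exp, p_exp)
print(a_unif, p_unif)
```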

Numerous other tests (and perhaps more importantly, diagnostic displays) are noted in answers and comments in "How do I check if my data fits an exponential distribution", and some useful points are made in "What are the standard statistical tests to see if data follows exponential or normal distributions?".


An issue worth keeping in mind is that failure to reject a Pareto with a goodness of fit test doesn't indicate that the data were drawn from a Pareto distribution, nor even that some other simple two-parameter family might not fit even better. If alternatives are realistic possibilities, you should consider their plausibility as well; this will help avoid the common-but-mistaken tendency to believe the model you started with simply because you lack evidence against it.


However, we should beware -- it's rather easy to answer the wrong question with goodness of fit tests and be led astray. Some of the points made in Is normality testing 'essentially useless'? (particularly ones made by Harvey Motulsky, which I directly link to) apply just as well to this situation, mutatis mutandis.

[1]: Lilliefors, H. (1969), "On the Kolmogorov–Smirnov test for the exponential distribution with mean unknown", Journal of the American Statistical Association, Vol. 64, pp. 387–389.

Glen_b