Problem with Pareto distribution and R

Question

I am trying to test this property of the Pareto distribution. Let $f(x)$ be the Pareto density

$$ f(x)=\alpha \frac{x_m^\alpha}{x^{\alpha+1}} $$

so the cdf is

$$ CDF(x)=\int_{x_m}^{x}\alpha \frac{x_m^\alpha}{t^{\alpha+1}}dt=1-\frac{x_m^\alpha}{x^\alpha} $$

then the probability that $X>x$ is

$$ P(X>x)=1-CDF(x)=\frac{x_m^\alpha}{x^\alpha} $$

and so we have

$$ \frac{P(X>x)}{f(x)}=\frac{x}{\alpha} $$
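
Written out, the division behind this identity uses only the density and the tail probability given above:

$$ \frac{x_m^\alpha / x^\alpha}{\alpha\, x_m^\alpha / x^{\alpha+1}}=\frac{x^{\alpha+1}}{\alpha\, x^\alpha}=\frac{x}{\alpha} $$

so the ratio should plot as a straight line through the origin with slope $1/\alpha$.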

Now I am trying to test it with R.

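 library(PtProcess)
 dd<-rpareto(10000,1.5,0.01)   # 10000 draws, exponent 1.5, minimum 0.01
 cdf<-ecdf(dd)                 # empirical cdf of the sample
 df<-density(dd)               # kernel density estimate on a regular grid
 ff<-(1-cdf(df$x))/df$y        # estimated (1 - CDF) / density on that grid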

If I plot ff

 plot(df$x,ff)

I do not obtain the expected straight line. I guess that this is due to the way density() and ecdf() work. I need this form of the test (an a posteriori evaluation of the density and the cdf) in order to perform the same test on a sample of data of unknown origin. I guess that I need a way to bin the ecdf() function, in the same way that hist() is the binned version of density().

So my question is:

Is there a binned equivalent of ecdf(), in the way that hist() is the binned counterpart of density()?
Or can I simulate ecdf() with hist()?

@emanuele, there are probably many points in your estimated density that are close to 0, which may cause numerically unstable results (I noticed this when pasting your code). Beyond that, I don't have much insight into the problem. – Macro, Jun 22 '12 at 17:57
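
The effect described in this comment can be checked directly. A minimal sketch, assuming a sample simulated in base R by inverse transform instead of PtProcess's rpareto() (the seed and the 1e-3 cutoff are arbitrary choices):

```r
# Assumption: stand-in sample with exponent 1.5 and minimum 0.01, drawn by
# inverse transform so that no extra package is needed.
set.seed(1)
dd <- 0.01 * runif(10000)^(-1 / 1.5)

df <- density(dd)

length(df$x)                      # size of the default evaluation grid
min(df$x) < min(dd)               # the grid starts below the smallest observation
near_zero <- mean(df$y < 1e-3)    # share of grid points with a near-zero estimate
near_zero
```

By default density() evaluates the estimate on 512 equally spaced points that extend three bandwidths beyond the data range, so for a heavy-tailed sample much of the grid sits where there are hardly any observations, and dividing by df$y there is unstable.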

score 6 · Accepted Answer · answered Jun 22 '12 at 18:45

By using ecdf and density, you're not actually doing the Pareto calculations; you're using sample-based estimates that, by their non-parametric nature, are not guaranteed to (read: are not going to) have the desired property.

Try the following:

library(PtProcess)                 # provides dpareto() and ppareto()
x <- seq(0.1,10,by=0.1)
fx <- dpareto(x, 1.5, 0.05)        # exact density
Fx <- ppareto(x, 1.5, 0.05)        # exact cdf
plot((1-Fx)/fx ~ x)

You'll get a nice straight line when (1-Fx)/fx is plotted against x.
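
For a sample-based version of the same plot, one possibility is to take the density from hist() on log-spaced bins and the survival function from the empirical cdf at the bin midpoints. The following is a rough sketch under stated assumptions, not a definitive method: the sample is simulated in base R by inverse transform rather than with rpareto(), and alpha, xm, the seed, the 30 bins and the minimum count of 50 per bin are all illustrative choices.

```r
# Sketch only: every numeric setting below is an illustrative assumption.
set.seed(42)
alpha <- 1.5
xm <- 0.01
n <- 10000
# Inverse transform: if U ~ Uniform(0, 1), xm * U^(-1/alpha) is Pareto(alpha, xm).
dd <- xm * runif(n)^(-1 / alpha)

# Log-spaced breaks; the top edge is padded so max(dd) falls inside the last bin.
breaks <- exp(seq(log(xm), log(max(dd) * 1.001), length.out = 31))
h <- hist(dd, breaks = breaks, plot = FALSE)

# hist() as a binned ecdf(): cumulative counts give the ECDF at each right bin edge.
ecdf_right <- cumsum(h$counts) / n

# Survival at the geometric midpoint of each bin, density as the bin average.
mid <- sqrt(head(breaks, -1) * tail(breaks, -1))
surv_mid <- 1 - ecdf(dd)(mid)
ratio <- surv_mid / h$density

# Keep only bins with enough observations for a stable density estimate.
ok <- h$counts >= 50
plot(mid[ok], ratio[ok], xlab = "x", ylab = "(1 - F(x)) / f(x)")
abline(0, 1 / alpha)                      # theoretical line x / alpha

slope_hat <- median(ratio[ok] / mid[ok])  # rough estimate of 1 / alpha
slope_hat
```

The retained points should scatter around the line x/alpha. The sparsely populated tail bins are dropped here because their density estimates are too noisy to be informative, which is the same instability that affects the density()-based version.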

jbowman

Good, but I need that form in order to perform the same test on a sample of data of unknown origin. – emanuele Jun 22 '12 at 23:00
Since the property characterizes the Pareto distribution, i.e., no other distribution has that property, you could just use a goodness of fit test on the data. That's fully equivalent to testing for that property, since $(1-F(x))/f(x) = x/\alpha \iff x \sim \text{Pareto}$. Not sure how you'd test for the property directly, w/o going through the Pareto, though. – jbowman Jun 22 '12 at 23:13
Actually the $\alpha$-stable distributions share the same tail behaviour. I would like to do it this way because I think it is the best way to get a straightforward measure of the $\alpha$ parameter in a generic $\alpha$-stable distribution. – emanuele Jun 22 '12 at 23:27
The $\alpha$-stable distributions only share that behavior asymptotically as $x \to \infty$, I'm afraid, see https://eldorado.tu-dortmund.de/bitstream/2003/5219/1/47_02.pdf for example, also Johnson, Balakrishnan & Kotz, Continuous Univariate Dist'ns Vol. 1, pp. 603-604 (sorry, Amazon's "look inside" doesn't let you look inside those pages.) – jbowman Jun 22 '12 at 23:42
Yes, I know. Where is the problem? You have to look at the tails of the distribution, if that is possible, of course. – emanuele Jun 23 '12 at 07:23
It means that for smallish $x$, whatever "smallish" means, you won't see a linear relationship, just something that approaches one in the tails. Maybe the "almost linear" part of the tail is the upper 0.1 percentile, and the data set has almost no data out there, so you won't actually see anything that looks linear. It depends on the distribution and the sample size, of course. – jbowman Jun 23 '12 at 14:40
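
The goodness-of-fit route suggested in the comments could look roughly like the sketch below. It is only an illustration under assumptions: the sample is simulated in base R, $x_m$ is taken as the sample minimum, $\alpha$ is estimated by maximum likelihood, and pareto_cdf() is a helper defined here, not a function from PtProcess.

```r
# Assumption: stand-in sample with exponent 1.5 and minimum 0.01.
set.seed(1)
dd <- 0.01 * runif(10000)^(-1 / 1.5)

# Fit the Pareto: x_m as the sample minimum, alpha by maximum likelihood.
xm_hat <- min(dd)
alpha_hat <- length(dd) / sum(log(dd / xm_hat))

# Pareto cdf written out by hand (hypothetical helper for this sketch).
pareto_cdf <- function(q, alpha, xm) ifelse(q < xm, 0, 1 - (xm / q)^alpha)

# Kolmogorov-Smirnov test of the sample against the fitted cdf.
ks <- ks.test(dd, function(q) pareto_cdf(q, alpha_hat, xm_hat))
ks$statistic
ks$p.value
```

Because the parameters are estimated from the same data, the reported p-value is only approximate. To look at the tail alone, as discussed for the $\alpha$-stable case, the same steps can be applied to the observations above a chosen threshold, with the threshold playing the role of $x_m$.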