Discrete Pareto Distribution vs Zipf Distribution and Power Law vs Zipf Law

Question

I need a simple but clear idea of the Discrete Pareto distribution vs the Zipf distribution, and of a power law vs Zipf's law: are they similar, and how do they relate to each other? The Wikipedia definitions do not address my issue. If a graphical explanation is possible, that would be clearer.

Please give a definition for what you intend by "discrete Pareto" — Glen_b, Apr 09 '19 at 02:11
Pareto is a continuous distribution. The discrete Pareto referred to here is the same Pareto curve, but over a set of discrete data points rather than continuous values. — Dovini Jayasinghe, Apr 09 '19 at 04:55
Well, yes, but since there's more than one plausible way to get from a continuous to a discrete variable - with somewhat different results, you need to explain *exactly* how you intend that to be done. What is the specific pmf of the discrete Pareto that you mean? — Glen_b, Apr 09 '19 at 17:17
https://math.stackexchange.com/questions/374878/discrete-pareto-distribution I was referring to this — Dovini Jayasinghe, Apr 10 '19 at 04:40
So just a [zeta distribution](https://en.wikipedia.org/wiki/Zeta_distribution) vs a [Zipf](https://en.wikipedia.org/wiki/Zipf%27s_law)? — Glen_b, Apr 10 '19 at 04:45
Okay now you've lost me. You're dropping the part about "discrete" Pareto now? — Glen_b, Apr 10 '19 at 17:57
Sorry if I puzzled you. I wonder how the "discrete" Pareto, Zipf and zeta differ from each other (or are they the same?). — Dovini Jayasinghe, Apr 11 '19 at 06:09
There is a lot of confusion, as these topics have not clearly sunk into my head. — Dovini Jayasinghe, Apr 11 '19 at 06:10
We have just circled back to my earlier points. It's not clear what you mean by "discrete Pareto" since there's more than one possible way to discretize a Pareto that are not completely identical. When I asked you to say what you mean by 'discrete Pareto' you pointed me to something that *specified* that the discrete Pareto was the zeta. Do you intend for that to be the case or not? If that is the case, why ask for the difference between two things you have defined to be the same? If not, then what do you now mean instead of what you said before? What are we to make of that inconsistency? — Glen_b, Apr 11 '19 at 06:12
If you want to know the difference between zeta and Zipf, *ask that* (but beware; I believe that's already answered on site). If you want to know *how there's more than one way to discretize a distribution*, then ask that (but again, I think that might already be covered). If you want to ask something else, you're going to need to be clearer about what you need to know; "I am confused about things, unconfuse me" is not clear enough, and we would have to speculate about what's in your head. — Glen_b, Apr 11 '19 at 06:15
I understand, but a clearly defined question is half the solution. I have difficulty clarifying further, since this is the particular question running through my head. I'm not clear on these things. From your comment I learnt that there are different versions of the discrete Pareto; then what is Zipf? What is zeta? Are they among the list of discrete Paretos? Or is there no such association? I'm skeptical because different sites take different paths. For example, the following link says Zeta = Zipf https://www.statisticshowto.datasciencecentral.com/zeta-distribution-zipf/ — Dovini Jayasinghe, Apr 11 '19 at 06:30
What I need here is a very simple "picture" to differentiate these terms clearly, nothing technically mathematical. That's why I mentioned, "If graphical explanation is possible, would be clearer." Thanks — Dovini Jayasinghe, Apr 11 '19 at 06:32
You're right that a clearly defined question is half the solution -- that's certainly the case, but defining your question is your task not ours (the stackexchange network places that task in the asker's hands). Did you read the links on the zeta and zipf I provided (in the hope it would help you to clarify your question)? — Glen_b, Apr 11 '19 at 06:34
I will reopen but I am concerned that as soon as someone tries to answer it, either the question will change (since you're not clear about what you want to know) or it will turn into a series of followup questions in comments. I fear the question lacks sufficient [search and research](https://stats.stackexchange.com/help/how-to-ask) at the required level -- i.e. beyond reading the top hit or two on the simplest searches. (Further, avoiding a certain amount of mathematics is likely a forlorn hope, since pmfs are inherently mathematical) — Glen_b, Apr 11 '19 at 06:38
Alright, let me think and read further and try whether I could make this question better. Thanks a lot for your attempt on this. — Dovini Jayasinghe, Apr 11 '19 at 07:01
On the other hand the discussions in comments have served to help clarify more what kinds of things you don't know so it might serve to help someone arrive at an answer that is useful. If that happens, then ideally the question would ask something that the answer was a direct answer to (to the benefit of later searches). — Glen_b, Apr 11 '19 at 07:25

Glen_b · Accepted Answer · 2019-04-11T10:28:08.070

[On the relationship between the Zipf and the zeta distributions, the Wikipedia definitions absolutely address your main question. It's possible that you didn't understand what was there.]

I'm going to use Wikipedia's definitions of these distributions; its references are explicit, so we at least know where they're coming from.

Let us start with a zeta distribution. This has a pmf proportional to $x^{-s}$ for $x=1,2,3,\ldots$, with parameter $s>1$ (needed for the probabilities to have a finite sum). We see that this is akin to the Pareto in that its pmf has essentially the same power-law form as the Pareto density. (Why power-law? Because it is in the form of a constant times a power of $x$.)
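To make the zeta pmf concrete, here is a minimal Python sketch using only the standard library. The names `riemann_zeta` and `zeta_pmf` are hypothetical helpers introduced for illustration; the normalizing constant $\zeta(s)$ is approximated by a partial sum plus an Euler-Maclaurin tail correction rather than taken from a library (SciPy's `scipy.special.zeta` would be the usual choice in practice).

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def riemann_zeta(s: float, n_terms: int = 1000) -> float:
    """Approximate zeta(s) = sum_{k>=1} k**-s for s > 1.

    Sums k = 1 .. n_terms - 1 exactly, then adds the leading Euler-Maclaurin
    estimate of the remaining tail sum_{k >= n_terms} k**-s.
    """
    if s <= 1:
        raise ValueError("zeta(s) diverges for s <= 1")
    n = n_terms
    head = math.fsum(k ** -s for k in range(1, n))
    tail = n ** (1 - s) / (s - 1) + 0.5 * n ** -s + s * n ** (-s - 1) / 12
    return head + tail


def zeta_pmf(x: int, s: float) -> float:
    """P(X = x) for the zeta distribution: x**-s / zeta(s) on x = 1, 2, 3, ..."""
    if x < 1:
        return 0.0
    return x ** -s / riemann_zeta(s)


if __name__ == "__main__":
    for x in range(1, 6):
        print(x, round(zeta_pmf(x, 2.0), 4))
```

For $s=2$ the constant is $\zeta(2)=\pi^2/6$, so $P(X=1)=6/\pi^2\approx 0.608$. As a naming caution, SciPy calls this same distribution `scipy.stats.zipf`, which is one reason the two names get mixed up.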

On this basis it's a candidate for a discrete equivalent of a Pareto. On the other hand, the correspondence is less direct if we're focused on the survival function of the Pareto, $1-F(x) = S(x) \propto x^{-\alpha}$; that's also in power-law form, but the survival function of the zeta is not (though a power law is an increasingly good approximation to it in the tail).
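A rough numerical check of that last remark, again as a standard-library Python sketch with hypothetical helper names. It assumes $S(x)=P(X\ge x)$ and compares the zeta survival function with the pure power law $x^{-(s-1)}/\big((s-1)\zeta(s)\big)$ obtained by replacing the tail sum with an integral; note that the implied survival-function index is $\alpha=s-1$, not $s$.

```python
import math


def tail_sum(start: int, s: float, n_terms: int = 1000) -> float:
    """Approximate sum_{k >= start} k**-s for s > 1.

    The first n_terms terms are summed exactly; the remainder is estimated
    with the leading Euler-Maclaurin terms.
    """
    end = start + n_terms
    head = math.fsum(k ** -s for k in range(start, end))
    tail = end ** (1 - s) / (s - 1) + 0.5 * end ** -s + s * end ** (-s - 1) / 12
    return head + tail


def zeta_survival(x: int, s: float) -> float:
    """S(x) = P(X >= x) for X ~ zeta(s), x = 1, 2, 3, ..."""
    return tail_sum(max(x, 1), s) / tail_sum(1, s)


def power_law_approx(x: int, s: float) -> float:
    """Pure power law x**-(s-1) / ((s-1) * zeta(s)), from replacing the tail sum by an integral."""
    return x ** (1 - s) / ((s - 1) * tail_sum(1, s))


if __name__ == "__main__":
    s = 2.0
    for x in (1, 2, 5, 10, 100, 1000):
        ratio = zeta_survival(x, s) / power_law_approx(x, s)
        print(f"x={x:>5}  S(x)={zeta_survival(x, s):.6f}  ratio={ratio:.4f}")
```

For $s=2$ the ratio equals $\zeta(2)\approx 1.64$ at $x=1$ and then shrinks toward 1 roughly like $1+1/(2x)$, so the power law is only an approximation, but an increasingly good one in the tail.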

Let us then discuss the Zipf. It has the same form as the zeta, but it is defined over a finite range rather than a semi-infinite one: it only assigns probability to $x=1,2,\ldots,N$, and this alters the normalizing constant of the pmf. Consequently, for a given index (negative power) $s$, the probability associated with each outcome up to $N$ must be higher (because no probability is associated with any outcome $>N$).

It's a (right-) truncated zeta and would be a candidate for the discrete equivalent of a right-truncated Pareto.
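The truncation can be sketched the same way (standard-library Python; `harmonic` and `zipf_pmf` are hypothetical names). The normalizer is the generalized harmonic number $H_{N,s}=\sum_{k=1}^{N}k^{-s}$ instead of $\zeta(s)$, so each Zipf probability is the zeta probability scaled up by $\zeta(s)/H_{N,s}>1$, which is exactly the zeta pmf conditioned on $X\le N$.

```python
import math


def harmonic(n: int, s: float) -> float:
    """Generalized harmonic number H_{N,s} = sum_{k=1}^{N} k**-s (the Zipf normalizer)."""
    return math.fsum(k ** -s for k in range(1, n + 1))


def zipf_pmf(x: int, s: float, n: int) -> float:
    """P(X = x) for Zipf(s, N): x**-s / H_{N,s} on x = 1..N, zero elsewhere."""
    if not 1 <= x <= n:
        return 0.0
    return x ** -s / harmonic(n, s)


if __name__ == "__main__":
    s, n = 2.0, 10
    zeta_2 = math.pi ** 2 / 6  # zeta(2) in closed form
    # Each Zipf probability is the zeta probability times zeta(s) / H_{N,s} > 1,
    # i.e. the zeta pmf renormalized after discarding the mass above N.
    print("scale factor:", zeta_2 / harmonic(n, s))
    print("total Zipf mass:", math.fsum(zipf_pmf(x, s, n) for x in range(1, n + 1)))
```

Because $H_{N,s}$ is a finite sum, the Zipf pmf is also well defined for $s\le 1$, where the zeta is not. Recent SciPy versions provide this finite version as `scipy.stats.zipfian`.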

In terms of pictures/graphs, the Wikipedia articles already give log-log graphs of the pmfs for both zeta and Zipf, and the effect of the truncation is obvious in those graphs. I see little point in reproducing them; if they didn't help you there, they could be no more help here. However, I will put in a plot of an example of an unlogged pmf for each:

The left plot shows the first 16 probabilities in a zeta(2) distribution; in fact they continue off to the right without limit. The right plot is a corresponding Zipf truncated at 10; the open circles are the zeta values from the left plot. You can see in the first probability or two that the open circles are lower (because the filled-circle Zipf probabilities are pushed up by the omission of the right tail).
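The values in those two plots can be tabulated with the sketch below (standard-library Python). It assumes $s=2$, truncation at $N=10$ and 16 zeta probabilities, as in the description; the constant and function names are hypothetical, and the rows could be handed to any plotting library to redraw the two panels.

```python
import math

S = 2.0       # index of the zeta(2) example
N_ZIPF = 10   # truncation point of the Zipf example
N_SHOW = 16   # number of zeta probabilities in the left plot

ZETA_S = math.pi ** 2 / 6                                # zeta(2) in closed form
H_N = math.fsum(k ** -S for k in range(1, N_ZIPF + 1))   # Zipf normalizer H_{10,2}


def figure_values():
    """Return (x, zeta pmf, Zipf pmf or None) rows for x = 1..N_SHOW."""
    rows = []
    for x in range(1, N_SHOW + 1):
        zeta_p = x ** -S / ZETA_S
        zipf_p = x ** -S / H_N if x <= N_ZIPF else None  # Zipf has no mass above N
        rows.append((x, zeta_p, zipf_p))
    return rows


if __name__ == "__main__":
    for x, zeta_p, zipf_p in figure_values():
        zipf_txt = f"{zipf_p:.4f}" if zipf_p is not None else "   -  "
        print(f"{x:>2}  zeta(2): {zeta_p:.4f}   Zipf(N=10): {zipf_txt}")
```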

Now let us consider what could be meant by a "discrete Pareto". In this case we must take some property of the Pareto which we try to preserve when we move to a discrete distribution.

We saw an example in the zeta, where what was preserved in moving to a pmf was the basic form of the pdf: proportional to a (negative) power of the argument. However, many people focus on the survival function when defining what makes for a power law, and in that case defining a discrete Pareto directly in terms of $S(x)\propto x^{-\alpha}\,,\: x=1,2,3,\ldots$ would give a different pmf from the zeta.
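Here is a sketch of that survival-function construction next to the zeta (standard-library Python, hypothetical helper names). It assumes $S(x)=P(X\ge x)=x^{-\alpha}$ with $S(1)=1$, so $P(X=x)=S(x)-S(x+1)=x^{-\alpha}-(x+1)^{-\alpha}$, and compares it with a zeta whose index $s=\alpha+1$ gives the same tail decay.

```python
import math


def survival_pareto_pmf(x: int, alpha: float) -> float:
    """Discrete Pareto defined through its survival function.

    Assumes S(x) = P(X >= x) = x**-alpha on x = 1, 2, 3, ... (so S(1) = 1);
    then P(X = x) = S(x) - S(x + 1).
    """
    if x < 1:
        return 0.0
    return x ** -alpha - (x + 1) ** -alpha


def zeta_pmf(x: int, s: float, n_terms: int = 1000) -> float:
    """Zeta pmf x**-s / zeta(s), with zeta(s) from a partial sum + Euler-Maclaurin tail."""
    if x < 1:
        return 0.0
    n = n_terms
    zeta_s = math.fsum(k ** -s for k in range(1, n)) + (
        n ** (1 - s) / (s - 1) + 0.5 * n ** -s + s * n ** (-s - 1) / 12
    )
    return x ** -s / zeta_s


if __name__ == "__main__":
    alpha = 1.0  # survival-function index; the matching zeta index is s = alpha + 1
    for x in range(1, 6):
        print(x, round(survival_pareto_pmf(x, alpha), 4), round(zeta_pmf(x, alpha + 1), 4))
```

With $\alpha=1$ the survival-based pmf is $1/\big(x(x+1)\big)$, giving $P(X=1)=0.5$, while the zeta(2) gives $6/\pi^2\approx 0.608$; both pmfs decay like $x^{-2}$, but the probabilities are not the same.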

Note further that the zeta is always defined on the positive integers, while the Pareto has its left-limit as a parameter, so there's a whole class of potential discrete Paretos that have different left-limits.
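One way to carry a left limit over to the zeta form is sketched below, assuming a pmf proportional to $x^{-s}$ on $x=m,m+1,\ldots$ for an integer $m\ge 1$, normalized by the Hurwitz zeta function $\zeta(s,m)=\sum_{k\ge m}k^{-s}$. This is only one hypothetical member of that class; `hurwitz_zeta` and `shifted_zeta_pmf` are illustrative names, and the normalizer uses a partial-sum plus Euler-Maclaurin approximation as a stand-in for a library routine.

```python
import math


def hurwitz_zeta(s: float, m: int, n_terms: int = 1000) -> float:
    """Approximate sum_{k >= m} k**-s for s > 1 and integer m >= 1."""
    if s <= 1:
        raise ValueError("the series diverges for s <= 1")
    end = m + n_terms
    head = math.fsum(k ** -s for k in range(m, end))
    tail = end ** (1 - s) / (s - 1) + 0.5 * end ** -s + s * end ** (-s - 1) / 12
    return head + tail


def shifted_zeta_pmf(x: int, s: float, m: int) -> float:
    """Hypothetical zeta-type pmf with left limit m: x**-s / zeta(s, m) on x = m, m+1, ..."""
    if x < m:
        return 0.0
    return x ** -s / hurwitz_zeta(s, m)


if __name__ == "__main__":
    for m in (1, 5):
        probs = [shifted_zeta_pmf(x, 2.0, m) for x in range(m, m + 4)]
        print(f"m={m}:", [round(p, 4) for p in probs])
```

Changing $m$ changes the shape, not just the location: $P(X=m+1)/P(X=m)=\big(m/(m+1)\big)^s$, which is $0.25$ for $m=1,\ s=2$ but about $0.69$ for $m=5,\ s=2$.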

Indeed, rather than preserve the functional form in either the pdf or the survival function one might proceed by directly associating a section of the density in $(u,u+1]$ with the probability mass at $\lfloor u+1\rfloor$ -- i.e. by specifying how to 'round' the probability in the interval (in a general sense), you will get yet another discrete Pareto (or rather, an entire class of them).
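A sketch of that interval-based approach (standard-library Python). The offset `c` is a hypothetical parameter introduced here to make the "rounding" rule explicit: integer $k$ receives the continuous Pareto probability in $(k-1+c,\,k+c]$. With $c=0$ this is the $(u,u+1]\to u+1$ rule described above; other values of `c` give other members of the class.

```python
def pareto_survival(y: float, x_m: float, alpha: float) -> float:
    """Continuous Pareto(x_m, alpha) survival function P(Y > y)."""
    if y < x_m:
        return 1.0
    return (x_m / y) ** alpha


def binned_pareto_pmf(k: int, x_m: float, alpha: float, c: float = 0.0) -> float:
    """Probability given to integer k: the Pareto mass in the interval (k - 1 + c, k + c].

    c is a hypothetical rounding-offset parameter in [0, 1]:
      c = 0   -> (u, u + 1] goes to u + 1 (round up)
      c = 0.5 -> round to the nearest integer
      c = 1   -> (k, k + 1] goes to k (round down)
    """
    return pareto_survival(k - 1 + c, x_m, alpha) - pareto_survival(k + c, x_m, alpha)


if __name__ == "__main__":
    for c in (0.0, 0.5, 1.0):
        probs = [binned_pareto_pmf(k, x_m=1.0, alpha=1.0, c=c) for k in range(1, 6)]
        print(f"c={c}:", [round(p, 4) for p in probs])
```

For $x_m=1,\ \alpha=1$: $c=0$ gives $P(X=1)=0$ and $P(X=2)=0.5$; $c=0.5$ gives $P(X=1)=1/3$; and $c=1$ gives $P(X=k)=k^{-\alpha}-(k+1)^{-\alpha}$, which coincides with the survival-function definition $S(x)\propto x^{-\alpha}$ mentioned above.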

All of these (and likely more besides) are entirely valid candidates for being called a "discrete Pareto". It's up to the user to figure out what properties they need in such an object and choose appropriately.

On the use of the word "Law":

"Law" in science is a pretty general thing (usually referring to some specific hypothesis, but sometimes to a commonly used model or observed regularity), but in relation to probability distributions, it's more specifically a reference to a functional form (or class of functional forms) for the distribution (whether expressed as a pdf/pmf, as a cdf, as a survival function or in some other way). For example you may see reference to "the normal law" or "the Poisson law", and generally the intent there is to refer specifically to the distribution.

So which are we dealing with here?

Zipf's law and the Pareto principle are both "scientific laws" in the "observed regular behavior" sense (at least up to the usual approximation involved in scientific models in general):

https://en.wikipedia.org/wiki/Empirical_statistical_laws#Examples

But at the same time, those observed laws lead directly to the distributional models, so they're also sometimes a direct reference to the distributional form in the sense of a probability law.

