Knowing the sum, the n(), and the bound parameters of a truncated-Pareto distributed variable, how I identify the alpha (shape) parameter?

Question

I know that there would be a fancy command on R to do the estimation of $\alpha$ given the inputs, but I am also curious about the relationship between $\alpha$ to $\mathbb{E}(\sum (x))$, given $n(x)$, $min(x)$ and $max(x)$.

Also, given that observed $\hat{max(x)} \leq max(x)$, is there a way to estimate $\alpha$ without bias? I know that in applied science, Pareto estimation is actually very hard, but I don't know about truncated Pareto.

I accept suggestion on the best way to estimate it in R. I know truncated/bounded Paretos are in VGAM, but I don't know the commands to model Pareto distribution. Is it correct to use the package fitdistrplus?

There is no stable sum for $\alpha\leq 1$ for the Pareto distribution type I. There is a stable sum for $\alpha>1$. If you want a measure that is always stable, use the harmonic mean for that distribution. — Carl, Feb 05 '22 at 03:17
@Carl: All the same, a *truncated* Pareto distribution will have a finite expectation for any value of $\alpha$. For inference on $\alpha$, truncation notwithstanding, the geometric mean is a minimal sufficient statistic; the harmonic mean isn't. (And if only the sample sum is available then neither are calculable from it.) — Scortchi - Reinstate Monica, Feb 08 '22 at 19:16
@Scortchi-ReinstateMonica I found the question to be ambiguous because all Pareto distributions of types I through IV are left truncated power functions. It would help to specify "right truncated" if that is indeed the case. — Carl, Feb 08 '22 at 20:01
@Carl: Could be clearer, but the upper bound is mentioned explicitly. — Scortchi - Reinstate Monica, Feb 08 '22 at 20:17
@Scortchi-ReinstateMonica [For the case of the Pareto type I distribution](https://www.r-bloggers.com/2011/09/the-long-tail-of-the-pareto-distribution/), the reciprocal transformation converts it into the extremely well-behaved beta distribution, and the harmonic mean has the following simple expression: Harmonic mean = $k(1 + a^{-1})$ — Carl, Feb 08 '22 at 22:49
@Scortchi-ReinstateMonica Also, see https://stats.stackexchange.com/a/271126/99274. — Carl, Feb 09 '22 at 01:22
@Carl: That's nice but doesn't make the harmonic mean the best choice of statistic on which to base an estimator of $\alpha$. — Scortchi - Reinstate Monica, Feb 09 '22 at 01:34
Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/133977/discussion-between-carl-and-scortchi-reinstate-monica). — Carl, Feb 09 '22 at 01:40
Suggested reading, von Hippel et al. arxiv.org/pdf/1402.4061 — Carl, Feb 09 '22 at 23:48

Scortchi - Reinstate Monica · Answer 1 · 2022-02-16T09:56:53.843

First, it's well known that $\sum_{i=1}^n \log x_i$ is complete & minimal sufficient for the shape parameter $\alpha$ of an untruncated Pareto distribution—& truncation doesn't affect the completeness or sufficiency of a statistic (Smith, 1957). Consequently (1) you ought to be basing an estimator on the sum of logged observations (or the sample geometric mean, or an equivalent), & (2) you'll be lucky to find an estimator based on the sum of observations (or the sample arithmetic mean, or an equivalent), either derived in papers or textbooks, or implemented in software libraries.

If for some reason all you have to work with is $\sum_{i=1}^n x_i$, then you could try a method-of-moments approach to estimating $\alpha$. Given a Pareto distribution with density

$$ f(x)= \frac{\alpha \beta^\alpha}{x^{\alpha+1}} $$

its expectation when truncated above at $\tau$ is

$$ \begin{align} \operatorname{E} X &= \frac{\int^\tau x f(x)\ \mathrm{d} x}{\int^\tau f(x) \ \operatorname{d} x} \\ &= \frac{\int_\beta^\tau x^{-\alpha}\ \mathrm{d}x} {\int_\beta^\tau x^{-(\alpha+1)}\ \mathrm{d}x}\\ &= \frac{\alpha}{1-\alpha}\cdot\frac{[x^{1-\alpha}]^\tau_\beta}{[x^\alpha]^\tau_\beta}\\ &= \frac{\alpha}{1-\alpha}\cdot\frac{\tau^{1-\alpha} - \beta^{1-\alpha}}{\tau^{-\alpha}-\beta^{-\alpha}}\\ &= \frac{\alpha\beta}{\alpha-1}\cdot\frac{1 - \left(\frac{\beta}{\tau}\right)^{\alpha-1}}{1-\left(\frac{\beta}{\tau}\right)^{\alpha}}\\ \end{align} $$

So plug in the sample mean $\bar x$ for $\operatorname{E} X$, & $\hat\alpha$ for $\alpha$, & find the root for $\hat\alpha$ numerically. (You could bootstrap to get confidence interval/standard error/bias estimates.)

# function to generate Pareto variates
rpareto <- function(n, alpha, beta) beta/(1-runif(n))^(1/alpha)

# function to calculate expectation
EX <- Vectorize(function(alpha, beta, tau, epsilon = 1e-7){
  if (abs(alpha-1) > epsilon){
    (alpha*beta/(alpha-1)) * (1 - (beta/tau)^(alpha-1)) / (1 - (beta/tau)^alpha)
  }
  else {
    alpha*beta*log(tau/beta) / (1 - (beta/tau)^alpha)
  }
})

# function that you'll find the root of
f <- function(alpha, x.bar, beta, tau, epsilon = 1e-7) {
  x.bar - EX(alpha, beta, tau, epsilon = 1e-7)
}

alpha_0 <- 2 # true value of alpha
beta <- 3
tau <- 9

# make up some data
set.seed(1234)
x <- rpareto(200, alpha_0, beta)
x <- x[x < tau]
length(x)
plot(ecdf(x), verticals=TRUE, pch="")

# find sample mean
(x.bar <- mean(x))

# find roughly where alpha.hat lies
alpha <- seq(1, 3, by=0.1)
plot(alpha, EX(alpha, beta, tau), type="l", ylab="EX")
abline(h = x.bar, lty=2)

# calculate alpha.hat
(alpha.hat <- uniroot(f, x.bar=x.bar, beta = beta, tau = tau, lower = 2, upper = 3)$root)

Note $$\lim_{\alpha \rightarrow 1} \frac{1-\left(\frac{\beta}{\tau}\right)^{\alpha-1}}{\alpha-1} = \log \frac{\tau}{\beta}$$

for when $\alpha \approx 1$ in numerical calculations.

Smith (1957), "A Note on Truncation and Sufficient Statistics", Ann. Math. Statist., 28, 1

Knowing the sum, the n(), and the bound parameters of a truncated-Pareto distributed variable, how I identify the alpha (shape) parameter?

1 Answers1