
I was trying to generalize the triangular distribution over $[0,1]$ to get a distribution that has the same unimodal structure and the same density of $0$ at the bounds, but where the spread of the distribution can be controlled using a parameter. So I came up with a distribution that is defined by

$f_{c,k}(x) = \begin{cases} \frac{(k+1)x^k}{c^k} \text{ for } x \leq c\\ \frac{(k+1)(1-x)^k}{(1-c)^k} \text{ for } x > c \end{cases}$

with $c\in[0,1]$ defining the mode and $k\in \mathbb{R}_{>0}$ defining the spread of the values.

If I did not mess up the integrals, this should actually be a valid distribution.
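A quick numerical sanity check of the normalization can be sketched in Python. The helper names `pdf`, `integrate`, and `total_mass` are illustrative, the midpoint rule is just one convenient choice, and the sketch assumes $0<c<1$ so that neither branch divides by zero.

```python
def pdf(x, c, k):
    """Density f_{c,k}(x) from the question; assumes 0 < c < 1 and k > 0."""
    if x < 0.0 or x > 1.0:
        return 0.0
    if x <= c:
        return (k + 1) * (x / c) ** k
    return (k + 1) * ((1 - x) / (1 - c)) ** k


def integrate(g, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


def total_mass(c, k):
    # Split at the mode so the kink at x = c falls on a panel boundary.
    left = integrate(lambda x: pdf(x, c, k), 0.0, c)
    right = integrate(lambda x: pdf(x, c, k), c, 1.0)
    return left + right
```

For any admissible pair, for example `total_mass(0.3, 2.0)` or `total_mass(0.3, 0.5)`, the result should be very close to 1, which agrees with the exact integrals: the left piece contributes $c$ and the right piece contributes $1-c$.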

It is clear that the mode of this distribution is $c$, and I was also able to get an expression for the mean (I don't have my notes with me right now). The variance should also be defined, but I haven't worked it out yet.
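One way to work out the mean, assuming $0<c<1$ and the density above, is to split the integral at $c$ and substitute $y=1-x$ on the right-hand piece:

$$\begin{aligned} E[X] &= \int_0^c \frac{(k+1)x^{k+1}}{c^k}\,dx + \int_0^{1-c} \frac{(k+1)(1-y)y^{k}}{(1-c)^k}\,dy \\ &= \frac{k+1}{k+2}\,c^2 + (1-c) - \frac{k+1}{k+2}\,(1-c)^2 \\ &= \frac{1+kc}{k+2}. \end{aligned}$$

At $k=1$ this reduces to $(1+c)/3,$ the mean of the triangular distribution on $[0,1]$ with mode $c,$ and as $k\to\infty$ it tends to the mode $c.$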

Is this distribution known and does it have a name? I couldn't find anything about it on either Google or Wikipedia.

EDIT:

Here are some plots to indicate what this distribution looks like.

At $k=1$ this is just the triangular distribution:

[Figure: plot of the density for $k=1$]

If $k>1$ then the flanks of the distribution become steeper, therefore centering the values:

[Figure: plot of the density for $k>1$]

If $k<1$ then the flanks of the distribution become wider, therefore allowing for a larger variance of the values:

[Figure: plot of the density for $k<1$]

LiKao
  • @NickCox I don't see why there should be a jump in this density. In the examples I plotted, I also did not observe any jumps, but maybe I am missing something. If I plug in $x=c$, then $x^k$ and $(1-x)^k$ cancel, so I am left with $(k+1)$ in both cases. There is a jump in the derivative, though. But this comes from this being a generalization of the triangular distribution, which has the same undefined derivative at the mode. If I didn't mess up, this should just be a triangular distribution with steeper/less steep flanks due to the $k$ parameter. – LiKao Oct 05 '20 at 12:12
  • I think you're right. Sorry for the red herring. Cusps still seem unphysical to me without a rationale (and I'm aware of the Laplace or double exponential). Still don't know a name. – Nick Cox Oct 05 '20 at 12:56
  • @NickCox You are completely right that this has no real physical rationale. The main reason I came up with this atrocity is that I need a distribution that is defined on $[0,1]$, has zero density at the boundary, is unimodal, is parametrizable in terms of the mode (or mean), and has some kind of precision parameter controlling the spread. I'd much rather use some kind of re-parametrized beta distribution in fact, but I can't work out how to reparametrize it so that the parameters remain in the permissible range for the given constraints. – LiKao Oct 05 '20 at 13:22
  • The beta can be parameterised in terms of the mean and another parameter. It's true that fitting a beta might point to a distribution with positive density at the boundary. This might not tell you more than you already know: https://stats.stackexchange.com/questions/12232/calculating-the-parameters-of-a-beta-distribution-using-the-mean-and-variance – Nick Cox Oct 05 '20 at 13:32

1 Answer


You can create a huge number of such distributional families by following the process I described recently at https://stats.stackexchange.com/a/490160/919.

Begin with any non-negative bounded integrable function $f$ on $[0,1]$ that (i) has a unique maximum and (ii) vanishes at the endpoints. For any $n\ge 1$ define $$f_n(t)=c_n\exp(n\log f(t)) = c_n f(t)^{n}$$ (setting $f_n(t)=0$ wherever $f(t)=0$) where $c_n$ makes the integral of $f_n$ equal to $1:$ it always exists under the assumptions. You may confirm that the variance of the distribution with PDF $f_n$ decreases to $0$ as $n$ grows, thereby controlling the spread.
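As a rough numerical illustration of this construction, the following Python sketch raises a base function to the power $n$, normalizes it, and computes the variance with a midpoint rule. The names `power_family_variance` and `bump` are hypothetical, and `bump` is just one arbitrary base function that satisfies conditions (i) and (ii).

```python
def power_family_variance(f, n, steps=20_000):
    """Variance of the density proportional to f(t)**n on [0, 1].

    Uses a midpoint rule, so f is never evaluated at the endpoints.
    """
    h = 1.0 / steps
    ts = [(i + 0.5) * h for i in range(steps)]
    ws = [f(t) ** n for t in ts]      # unnormalized f_n at the midpoints
    mass = sum(ws)                    # proportional to 1 / c_n
    mean = sum(t * w for t, w in zip(ts, ws)) / mass
    second = sum(t * t * w for t, w in zip(ts, ws)) / mass
    return second - mean * mean


def bump(t):
    # Hypothetical base function: unique maximum at 1/2, zero at 0 and 1.
    return t * (1.0 - t)


# Variances for increasing n; they should shrink as n grows.
variances = [power_family_variance(bump, n) for n in (1, 2, 4, 8)]
```

With this particular `bump`, $f_1$ is the Beta$(2,2)$ density, so the first entry of `variances` should be close to $1/20.$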

For instance, you could let $f$ be any of the functions graphed in the question, thereby extending each of them to a one-dimensional family of distributions whose variances you can control.


As an example, pick $a \gt 0$ and $b\gt 0$ and define $f(t;a,b)=t^a(1-t)^b.$ Evidently

$$f_n(t;a,b) = c_n\,t^{an}(1-t)^{bn} = c_n t^{\alpha(n)-1}(1-t)^{\beta(n)-1}$$

where $\alpha(n) = 1+an$ and $\beta(n)=1+bn.$ This is the PDF of the Beta$(\alpha(n),\beta(n))$ distribution. Its variance is

$$\begin{aligned} \sigma^2_n(a,b) &= \frac{\alpha(n)\beta(n)}{(\alpha(n)+\beta(n))^2(\alpha(n)+\beta(n)+1)} \\ &= \frac{(1+an)(1+bn)}{(2+(a+b)n)^2(3+(a+b)n)} \\ &= \left(\frac{9ab}{(a+b)^2}-2\right)\frac{1}{(a+b)n+3} \\&\quad- \frac{(a-b)^2}{(a+b)^2}\frac{1}{((a+b)n+2)^2}\\&\quad+ 2\frac{(a-b)^2}{(a+b)^2}\frac{1}{(a+b)n+2} \end{aligned}$$

The last expression makes this family of rational functions of $n$ straightforward to analyze. Because its poles are located at $-2/(a+b)$ and $-3/(a+b),$ both negative, it is finite for every $n\ge 1$ and decreases asymptotically to zero as $n$ grows: the first and last terms are of order $1/n$ and together behave like $ab/((a+b)^3n),$ while the middle term is of order $1/n^2.$ We have recovered all the Beta distributions with zero densities at the endpoints.
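The partial-fraction form can be checked against the standard Beta variance with exact rational arithmetic. This is only a sketch: the function names are made up for illustration, and the chosen values of $a,$ $b,$ and $n$ are arbitrary.

```python
from fractions import Fraction


def variance_direct(a, b, n):
    """Beta(1 + a n, 1 + b n) variance from the standard formula."""
    alpha, beta = 1 + a * n, 1 + b * n
    s = alpha + beta
    return alpha * beta / (s ** 2 * (s + 1))


def variance_partial_fractions(a, b, n):
    """The same variance written as the three-term partial-fraction sum."""
    s = a + b
    r = (a - b) ** 2 / s ** 2
    return ((9 * a * b / s ** 2 - 2) / (s * n + 3)
            - r / (s * n + 2) ** 2
            + 2 * r / (s * n + 2))


# Arbitrary exact test values; Fractions keep the comparison free of rounding.
a, b, n = Fraction(2), Fraction(5), Fraction(3)
agree = variance_direct(a, b, n) == variance_partial_fractions(a, b, n)
```

Passing `Fraction` values keeps every intermediate result exact, so `agree` compares the two expressions without floating-point tolerance.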

A particularly simple example arises when we require $a=b,$ for then the last two terms vanish and

$$\sigma_n^2(a,a) = \frac{1}{4} \frac{1}{3 + 2an}$$

shows that you can find such a distribution with any variance $0 \lt \sigma^2 \lt 1/12$ by choosing positive $a$ and $n\ge 1$ so that $an = (1/(4\sigma^2) - 3)/2.$
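As a small worked example of this recipe, the sketch below solves for the product $an$ given a target variance and then confirms the result with the standard Beta variance formula. The function names and the target value $0.02$ are illustrative assumptions.

```python
def symmetric_an_for_variance(sigma2):
    """Return the product a*n that gives variance sigma2 when a = b."""
    if not 0.0 < sigma2 < 1.0 / 12.0:
        raise ValueError("variance must lie strictly between 0 and 1/12")
    return (1.0 / (4.0 * sigma2) - 3.0) / 2.0


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


an = symmetric_an_for_variance(0.02)   # illustrative target variance
shape = 1.0 + an                        # alpha(n) = beta(n) = 1 + a*n
check = beta_variance(shape, shape)     # should reproduce the target
```

Only the product $an$ matters here, so any split with $a>0$ and $n\ge 1$ that yields the same product gives the same distribution.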

whuber
  • I am not sure I fully understand the answer. You define $f_n(t)=c_n \exp(n \log f(t))$. Isn't that the same as just $f_n(t)=c_n (f(t))^n$, or am I missing something? In that case, that would be the same process I used to generalize the triangular distribution, i.e. if this is given by $f$, then the other distributions are also just $f_n$. However, I used $n>0$ instead of $n\geq 1$, so I am extending it in both directions, more variance and less variance. But it seems that is the same process; you just proved that it works in general, whereas I was only interested in the triangular. – LiKao Oct 06 '20 at 10:15
  • Also I think the variance of the beta is incorrect (or the wikipedia entry is incorrect). I think it should be $\frac{\alpha(n)\beta(n)}{(\alpha(n)+\beta(n))^2(\alpha(n)+\beta(n)+1)}$. I am trying to figure out what the simplifications look like using the correct expression. I tried a similar approach before, but got stuck somewhere, but maybe this can lead to a different direction somewhere along the line (I didn't try including the $n$ parameter before). – LiKao Oct 06 '20 at 10:30
  • Maybe this works: The family of distributions with $f_n(t)=c_n t^{an}(1-t)^{bn}$ are zero at $\{0,1\}$ for all $a,b,n>0$ (not just $n>1$). These functions correspond to $Beta(\alpha(n),\beta(n))$ PDFs. The mode is at $c=\frac{a}{a+b}$, which we can solve by letting $a=c$ and $b=1-c$. So we have all the distributions parametrized as $Beta(1+cn,1+(1-c)n)$, which are all the unimodal beta distributions parametrized with the mode and one additional parameter $n$ controlling the spread. This seems like it should work, and it doesn't have the nasty spike of the distribution I sketched out above. – LiKao Oct 06 '20 at 11:34
  • Sorry, there's an initial typo of "-" in place of "+" for the variance, but that didn't affect any of the subsequent calculations. My definition of $f_n$ was carefully made to avoid any possible misunderstanding of what the power $n$ might be. Your process of generalizing the triangular distribution is indeed a special case of this construction. – whuber Oct 06 '20 at 12:49
  • BTW, in your case because the integrals are elementary to evaluate, it's straightforward to compute any moment of any of your distributions (which is an advantage). Represent them as mixtures of truncated power distributions, for instance. – whuber Oct 06 '20 at 17:36
  • I think the error with "+1" vs. "-1" is carried through to the rest of the calculation. Taking $n=1$ and $a=b=1$ the $f_1(t;1,1)$ is equivalent to the $Beta(2,2)$ distribution. According to your calculation, this should have a Variance of $\sigma^2=1/12$. However, 1/12 is the variance of the $Beta(1,1)$ (uniform) and not the $Beta(2,2)$ distribution. The final result should come out to $\sigma^2_1(1,1)=1/20$ instead. – LiKao Oct 07 '20 at 08:06
  • @LiKao Yes, thank you: I will fix that. – whuber Oct 07 '20 at 11:49
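
The mode-based parametrization proposed in the comments, $Beta(1+cn,1+(1-c)n),$ can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes $0<c<1$ and $n>0.$

```python
def beta_from_mode(c, n):
    """Shape parameters (alpha, beta) of Beta(1 + c n, 1 + (1 - c) n)."""
    if not 0.0 < c < 1.0 or n <= 0.0:
        raise ValueError("need 0 < c < 1 and n > 0")
    return 1.0 + c * n, 1.0 + (1.0 - c) * n


def beta_mode(alpha, beta):
    """Mode of Beta(alpha, beta); valid when alpha > 1 and beta > 1."""
    return (alpha - 1.0) / (alpha + beta - 2.0)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


# Illustrative values: mode 0.3 with two different spread parameters.
wide = beta_from_mode(0.3, 10.0)
narrow = beta_from_mode(0.3, 20.0)
```

Both `beta_mode(*wide)` and `beta_mode(*narrow)` return the requested mode, while the larger $n$ gives the smaller variance.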