I have a set of proportion data between 0 and 1, working in R.

Let's say that I have fitted a mixture of a beta and a degenerate distribution. This means that I write the distribution function as something like this:

$F(x) = pF_0(x) + (1-p)F_{\theta}(x)$

where $F_0$ is the degenerate distribution at 0 and $F_{\theta}$ is the beta distribution with parameters $\theta = (a,b)$. $p$ is the mixing proportion, i.e. the weight of the point mass at 0.

Suppose I have estimated the three parameters $p,a,b$. I would then like to derive quantiles (the 95% quantile, for example), as one can do after fitting a plain beta distribution $\beta(a,b)$ (with the fitdistrplus package, for instance).

How can I do this? Is it even meaningful for this kind of mixture distribution?

I think it is (though I'm not sure), because I'm using it as a zero-inflated beta distribution, and having several zero values should shift the quantile accordingly (I want the zeros taken into account, as they carry important information).

kjetil b halvorsen
ThanhKim
  • This question is asked and answered generally at https://stats.stackexchange.com/questions/411647. My post in that thread includes a software solution. – whuber Oct 08 '21 at 21:12

2 Answers

I assume by $F_0$ you mean the CDF of a point mass at 0, that is, $F_0(x) = 1\{x \ge 0\}$.

Properly defined, the quantile function makes sense for any distribution. The $q$th quantile of $F$ is the smallest value of $x$ such that $F(x) \ge q$.

Beta is a continuous distribution whose CDF starts at zero at $x=0$ and goes to one at $x=1$. The other CDF is zero for $x < 0$ and jumps to 1 at $x=0$. Plotting $F$, you will see that the mixture CDF is zero for $x < 0$, jumps to $p$ at $x=0$ and gradually and continuously increases to one at $x=1$. (I am too lazy to make a plot, but I am sure you can make the plot yourself.)
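For anyone who wants to check this picture numerically, here is a minimal base R sketch of the mixture CDF. The function name `pmix` and the values of `p`, `a`, `b` are illustrative assumptions, not estimates from any data set.

```r
# Mixture CDF: point mass at 0 with weight p, Beta(a, b) with weight 1 - p.
# pmix and the parameter values below are hypothetical, for illustration only.
pmix <- function(x, p, a, b) {
  # (x >= 0) is the CDF of the point mass at 0;
  # pbeta() returns 0 for x <= 0 and 1 for x >= 1.
  p * (x >= 0) + (1 - p) * pbeta(x, a, b)
}

p <- 0.3; a <- 2; b <- 5
x <- c(-0.1, 0, 0.5, 1)
cdf_values <- pmix(x, p, a, b)
print(cdf_values)

# Uncomment to draw the jump at 0 followed by the continuous increase:
# curve(pmix(x, p, a, b), from = -0.2, to = 1.2, n = 1001, ylab = "F(x)")
```

The first value is 0 (left of the jump), the second equals `p` (the jump at 0), and the last is 1.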

Once you have this picture in mind, it is clear that the $q$th quantile of your mixture distribution equals $0$ if $q \le p$ and equals the $\frac{q-p}{1-p}$th quantile of the beta distribution if $q > p$, that is, $$ F^{-1}(q) = \begin{cases} 0 & q \le p\\ F_\theta^{-1}\Big( \frac{q-p}{1-p}\Big) & q > p \end{cases}, \quad \forall q \in [0,1] $$ where $F^{-1}$ is the generalized inverse (i.e., just the quantile function of $F$).
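The formula translates almost directly into base R using `qbeta`. This is a sketch under assumptions: the name `qmix` and the parameter values are made up for illustration, and it assumes $p < 1$ so that the rescaling is defined.

```r
# Quantile function of the mixture: weight p at 0, weight 1 - p on Beta(a, b).
# qmix and the parameter values are hypothetical, for illustration only.
qmix <- function(q, p, a, b) {
  stopifnot(all(q >= 0 & q <= 1), p >= 0, p < 1)
  out <- numeric(length(q))      # stays 0 wherever q <= p
  above <- q > p                 # these levels fall in the beta part
  out[above] <- qbeta((q[above] - p) / (1 - p), a, b)
  out
}

p <- 0.3; a <- 2; b <- 5
q_low <- qmix(0.2, p, a, b)      # level below p, so the quantile is 0
q95 <- qmix(0.95, p, a, b)       # level above p, so it comes from the beta part

# Sanity check: plugging q95 back into the mixture CDF should return 0.95
check <- p + (1 - p) * pbeta(q95, a, b)
print(c(q_low = q_low, q95 = q95, check = check))
```

Because the zeros take up a fraction $p$ of the probability, the 95% quantile of the mixture is a lower quantile of the beta part, so it is smaller than `qbeta(0.95, a, b)`.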

passerby51

OK, so I finally arrived at this procedure to find the quantile:

  • First, I estimate the parameters of the distribution fitted to my data, using the gamlss function from the gamlss package:

    library(gamlss)
    g0 <- gamlss(datas ~ 1, family = BEINF)

I'm using BEINF because my data can in fact contain both exact zeros and exact ones.

  • The gamlss fit then gives me 4 parameters, $\mu,\sigma,\nu,\tau$.

The parameters $\mu,\sigma$ play the role of $a,b$: in the gamlss parameterization of the beta distribution, $a = \mu(1-\sigma^2)/\sigma^2$ and $b = (1-\mu)(1-\sigma^2)/\sigma^2$. The parameters $\nu,\tau$ determine the mixing probabilities:

$p_0= \frac{\nu}{1+\nu+\tau}$, the probability of observing a 0

$p_1= \frac{\tau}{1+\nu+\tau}$, the probability of observing a 1

  • Next, I generate a sample from the fitted mixture distribution, using the rBEINF function with the parameters I just found:

    y0 <- rBEINF(1e6, mu = fitted(g0, "mu")[[1]], sigma = fitted(g0, "sigma")[[1]], nu = fitted(g0, "nu")[[1]], tau = fitted(g0, "tau")[[1]])

  • Finally, I sort this sample and take the value at position "95% of the sample size" as the 0.95-quantile. In this case, this is the 950 000th value.

    AA <- sort(y0)
    q95 <- AA[950000]

Note that the precision of the value depends on the size of the sample chosen, but from $10^5$ draws upward the difference is negligible.
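The simulated value can be cross-checked against a closed-form quantile. The sketch below uses base R only, so it runs without gamlss. The parameter values and the names `q_infl`, `exact_q95`, `sim_q95` are illustrative assumptions, and the mapping from $(\mu,\sigma)$ to $(a,b)$ is the gamlss beta parameterization as I understand it, so it is worth checking against the gamlss.dist documentation. As far as I know, gamlss.dist also provides a `qBEINF` function that should return this quantile directly.

```r
# Hypothetical parameter values, standing in for the fitted mu, sigma, nu, tau
mu <- 0.4; sigma <- 0.3; nu <- 0.5; tau <- 0.02

# Mixing probabilities for the point masses at 0 and at 1
p0 <- nu / (1 + nu + tau)
p1 <- tau / (1 + nu + tau)

# Assumed mapping from (mu, sigma) to the beta shape parameters (a, b)
a <- mu * (1 - sigma^2) / sigma^2
b <- (1 - mu) * (1 - sigma^2) / sigma^2

# Quantile function: mass p0 at 0, mass p1 at 1, Beta(a, b) in between
q_infl <- function(q, p0, p1, a, b) {
  out <- numeric(length(q))                  # stays 0 wherever q <= p0
  mid <- q > p0 & q <= 1 - p1                # levels inside the beta part
  out[mid] <- qbeta((q[mid] - p0) / (1 - p0 - p1), a, b)
  out[q > 1 - p1] <- 1                       # levels inside the mass at 1
  out
}
exact_q95 <- q_infl(0.95, p0, p1, a, b)

# Simulation analogue of the procedure above, using runif/rbeta instead of rBEINF
set.seed(1)
n <- 1e5
u <- runif(n)
y <- ifelse(u < p0, 0, ifelse(u > 1 - p1, 1, rbeta(n, a, b)))
sim_q95 <- sort(y)[ceiling(0.95 * n)]

print(c(exact = exact_q95, simulated = sim_q95))
```

The two values should agree up to Monte Carlo error, which shrinks as the sample size grows; the closed form avoids that error entirely.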

ThanhKim