Percentiles of mixture distribution: negative values?

Question

I am trying to grasp what is the meaning of getting unexpected negative values for some percentiles of a mixture distribution. Let the ~~distribution function~~ pdf be:

\begin{equation} f(x) = (1-p)\cdot \delta(x) + p\cdot \lambda e^{-\lambda\,x} \cdot H(x) \end{equation} where $p$ is the probability of the value of $x$ being modeled as an exponential function and $(1-p)$ the probability of it being equal to zero. Also, $\textit{delta}$ and the $\textit{Heaviside step function}$ are the indicator functions of the intended supports.

If we integrate $f(x)$ from 0 to C, where C is the value of the $n_{th}$ percentile, we can express C as:

\begin{equation} C = \frac{\ln\left(\frac{1-n_{th}}{p}\right)}{-\lambda} \end{equation}

My question is: provided that the expression for C is correct, $C < 0$ whenever $(1-n_{th}) > p$, because the logarithm is then positive and it is divided by $-\lambda$.

Probably I am not properly deriving the expression of C, especially regarding the integration of the $\textit{delta}$ function. If that is not the case, what is the meaning of a negative-valued percentile when the support of both distribution functions is greater than or equal to zero?

There are two issues here. First, $f$ is not a distribution function. Do you perhaps mean to use it as a *density* function? Second, the expression for $C$ clearly is not correct, because for any $C \gt 0$, the integral of $f$ must include an added $1-p$: that's what "$\delta(x)$" means. Generally, it should be obvious that obtaining a negative value for the percentile of a distribution with non-negative support simply means you made a mistake in calculation. — whuber, Oct 05 '17 at 20:54
Hi, thanks for your answer whuber. I think that the expression for $C$ already includes that $(1−p)$ you mention. If I am not wrong, an expanded expression for the integral of $f$ between $0$ and $C$ would be: $n_{thPerc} = (1−p) + p\cdot\lambda [\frac{e^{-\lambda x}}{-\lambda}]^{C}_{0}$, which is the same as $n_{thPerc} = (1-p) - p\cdot(e^{-\lambda C} - 1)$. Then, two $p$'s cancel out and you get to the expression I mentioned unless I am too tired to do simple maths. And by "for any $C > 0$" you mean that when $C < 0$ I should just say $C = 0$? — Gabriel, Oct 06 '17 at 01:00
These aren't so "simple maths": you seem to have lost a factor of $H(x)$ in the integration of $\delta$. This will prevent you from doing the simplification you have outlined--it's correct only for positive numbers. Draw a picture! — whuber, Oct 06 '17 at 14:12
So my mistake is actually in $\int^{\infty}_{-\infty} f(x)$, but I do not really see why I cannot simplify the 2nd part to $\int^{C}_{0}p \lambda e^{-\lambda\,x}$ taking into account the support and assuming $H(x)$ is 1 from 0 to C. If this is correct only for positive numbers, the only thing I can think of is: $n_{thPerc} = (1-p) - p\cdot(e^{-\lambda C} - 1)\cdot H(C)$ and, thus, $C = 0$ when I get $C < 0$ using the outlined simplification. I'm really sorry to bother others with these dummy questions, but I'm stuck with this and I appreciate any hint. Thanks for your time whuber. — Gabriel, Oct 06 '17 at 15:41

whuber · Accepted Answer · 2017-10-08T16:44:34.817

The meaning of "$\delta$" in $ f(x) = (1-p)\cdot \delta(x) + p\cdot \lambda e^{-\lambda\,x} \cdot H(x) $, as a "generalized function," is that it is a quantity that, when integrated against any continuous "test function" $g$ with compact support, yields $g(0)$. (This differs from the indicator of zero, which when integrated against any test function yields only zero.) In particular,

$$\int_{-\infty}^x \delta(x)dx = \lim_{y\to x^{+}}\lim_{a\to-\infty}\int_a^y 1\delta(x)dx = \left\{\matrix{0 & x \lt 0 \\ 1 & x \ge 0}\right. = H(x).$$

(The right-hand limit as $y$ decreases to $x$ was taken in order to assure the right continuity of $F$. The technical problem it addresses concerns the fact that when $x=0$ we're trying to integrate a function equal to $1$ for non-positive $x$ and zero for positive $x$ and, unfortunately, that is not continuous at $0$. For any other $x\ne 0$, the limit over $y$ is superfluous.)

Through the usual rules of integration $f$ determines the distribution function

$$\eqalign{ F(x) &= \int_{-\infty}^x f(x) dx = (1-p)\int_{-\infty}^x \delta(x)dx + p\lambda\int_{-\infty}^x e^{-\lambda x}H(x) dx\\ &=(1-p)H(x) + p\lambda \int_0^{\max(0,x)} e^{-\lambda x}dx\\ &=(1-p)H(x) + p \left(1 - e^{-\lambda\max(0,x)}\right). }$$
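As a rough illustration of this distribution function, here is a minimal Python sketch. The function name `mixture_cdf` and the parameter values $p = 0.5$, $\lambda = 1$ are assumptions chosen only for the example; it simply evaluates $(1-p)H(x) + p\left(1 - e^{-\lambda\max(0,x)}\right)$ with $H(0)=1$.

```python
import math


def mixture_cdf(x: float, p: float, lam: float) -> float:
    """CDF of a point mass (1 - p) at zero mixed with p * Exponential(lam).

    Hypothetical helper for illustration; uses the convention H(0) = 1.
    """
    if x < 0:
        # H(x) = 0 and max(0, x) = 0, so both terms vanish.
        return 0.0
    # For x >= 0: (1 - p) * H(x) + p * (1 - exp(-lam * x)).
    return (1.0 - p) + p * (1.0 - math.exp(-lam * x))


# Illustrative parameter values (assumed, not prescribed by the derivation).
p, lam = 0.5, 1.0
for x in (-1.0, 0.0, 0.5, 5.0):
    print(x, mixture_cdf(x, p, lam))
```

The jump of size $1-p$ at $x=0$ shows up as the difference between the values printed for a negative argument and for zero.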

Given a number $0\lt \alpha\le 1$, the solution to $F(x)=\alpha$ is obtained by considering whether $\alpha \lt 1-p$ or $\alpha \ge 1-p$, as suggested by this generic graph of $F$ (the thick blue curve with a jump at zero):

Obviously, zero ought to be the $\alpha$ percentile for $F$ whenever $0\le \alpha\lt 1-p$. Since $$F(0)=(1-p)H(0) +p (1 - e^0) = 1-p \gt \alpha$$ and $$F(x)=0 \le \alpha$$ for all $x\lt 0$, $x=0$ indeed satisfies the requirements to be an $\alpha$ quantile. For $\alpha \ge 1-p$, the equation

$$\alpha = F(x) = (1-p) + p(1 - e^{-\lambda x}) = 1 - p e^{-\lambda x}$$

has the unique solution

$$F^{-1}(\alpha) = x = -\frac{1}{\lambda}\log\left(\frac{1-\alpha}{p}\right) \ge 0$$

as given in the question.

In no case, with positive $\alpha$, is there a solution to $F(x)=\alpha$ for which $x$ is negative.
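The two cases can be sketched in Python as follows. This is only an illustration under assumed names (`mixture_quantile`) and assumed parameter values; it returns zero when $\alpha \le 1-p$ and the logarithmic expression otherwise, so the result is never negative.

```python
import math


def mixture_quantile(alpha: float, p: float, lam: float) -> float:
    """Quantile of a point mass (1 - p) at zero mixed with p * Exponential(lam).

    Hypothetical helper for illustration only.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if alpha <= 1.0 - p:
        # The jump of F at zero covers every level up to 1 - p.
        return 0.0
    # Solve alpha = 1 - p * exp(-lam * x) for x; max() guards against
    # floating-point rounding producing a negative zero or tiny negative value.
    return max(0.0, -math.log((1.0 - alpha) / p) / lam)


# Illustrative values (assumed): lambda = 1, p = 0.5.
print(mixture_quantile(0.30, p=0.5, lam=1.0))  # below 1 - p, so 0.0
print(mixture_quantile(0.75, p=0.5, lam=1.0))  # log(2), about 0.693
```

Dropping the case distinction and applying the logarithmic formula to $\alpha \lt 1-p$ is what produces the negative values asked about in the question.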

Thank you very much whuber. Now it's completely clear. I feel I am a complete newbie when handling these "singular" points and I tend to overlook the importance of being rigorous with notation which leads me to make mistakes... Thanks again for the time you took in writing this thorough answer :) — Gabriel, Oct 06 '17 at 18:53
They can be tricky. For additional examples of working with singular densities, please visit https://stats.stackexchange.com/questions/16509 and https://stats.stackexchange.com/questions/73623. — whuber, Oct 06 '17 at 19:04

jjet · Answer 2 · 2017-10-08T15:33:33.690


It looks like your distribution function is incorrect. First of all, you have to be careful when defining a mixed-density. It's natural to give a definition for an arbitrary probability mass function or density function. But when your random variable is mixed, there is no easy way to express that. The best solution is to define the cumulative distribution function as that's a well-defined quantity for any arbitrary random variable. In your case, the CDF would have the form: $$ F(x) = (1 - p) I(x \ge 0) + p (1 - \exp(-\lambda x)) $$ Note that this function isn't differentiable at zero so it doesn't correspond to the expression you gave. However, you can use this function to derive the associated quantile function. It is given by $$ F^{-1}(y) = -\dfrac 1 \lambda \ln\bigg(\dfrac {1 - y} p\bigg) I(y \ge p)$$ If the quantile of interest, $y$, is less than $p$, then the quantile function is equal to zero.

*Correction: The quantile function above is incorrect due to the indicator function. The correct formula is $$ F^{-1}(y) = -\dfrac 1 \lambda \ln\bigg(\dfrac {1 - y} p\bigg) I(y \ge 1 - p)$$ Also, the formula for the CDF above requires implicit multiplication by an indicator function as given below: $$ F(x) = [(1 - p) + p (1 - \exp(-\lambda x))] I(x \ge 0) $$
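One hedged way to check a candidate quantile formula such as the corrected one is to compare it with empirical quantiles of simulated draws. The sketch below uses only the Python standard library; the helper names, the seed, the sample size, and the parameter values $\lambda = 1$, $p = 0.5$ are assumptions made for the illustration.

```python
import math
import random


def sample_mixture(rng: random.Random, p: float, lam: float) -> float:
    # With probability 1 - p return exactly zero, otherwise an Exponential(lam) draw.
    return rng.expovariate(lam) if rng.random() < p else 0.0


def quantile_formula(y: float, p: float, lam: float) -> float:
    # Zero up to level 1 - p, inverse of the exponential part above it.
    # (At y = 1 - p the logarithm is zero, so both branches agree there.)
    return -math.log((1.0 - y) / p) / lam if y > 1.0 - p else 0.0


rng = random.Random(0)  # fixed seed so the run is repeatable
p, lam, n = 0.5, 1.0, 200_000
draws = sorted(sample_mixture(rng, p, lam) for _ in range(n))

for y in (0.3, 0.75, 0.9):
    empirical = draws[int(y * n)]  # simple order-statistic estimate
    print(y, empirical, quantile_formula(y, p, lam))
```

The empirical and formula values should agree up to sampling noise, and neither should be negative.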

edited Oct 08 '17 at 15:33

answered Oct 05 '17 at 19:32


Your derivation of the quantile function is incorrect; to see this, note that the probability that $x=0$ is $1-p$, and therefore that for any quantile $y \leq 1-p$, the quantile function equals 0, not for any quantile $y \leq p$ as you state. – jbowman Oct 05 '17 at 20:32
@jbowman, you are right. However, the formula was incorrect due to a minor typo in the indicator function. I have corrected the formula. – jjet Oct 08 '17 at 14:35
It was more than a minor typo, because the expression for $F$ is incorrect. – whuber Oct 08 '17 at 14:36
How is it incorrect? I suppose I should've multiplied the indicator by both sides but at first glance, it seemed pedantic to do that. On second thought, that may actually make more sense to do. But either way, the final quantile function is correct. I noticed that the quantile function you derived seems to be lacking a negative sign. – jjet Oct 08 '17 at 14:42
What happens when you plug negative values $x$ into your expression for $F$? – whuber Oct 08 '17 at 15:22
I just stated that the exponential CDF requires multiplication by the indicator function $I(x \ge 0)$. If we grant that implicit multiplication, then we are left with 0 which is the desired result. Now, let's consider a simple case using your solution for the quantile function. If $\lambda=1$, $p=.5$ and $\alpha=.75$, then $\alpha \ge 1 - p$ as required and $F^{-1}(\alpha)=-.693$. The result is negative which cannot be the case. You needed to include the negative sign to obtain the correct result. – jjet Oct 08 '17 at 15:30
My answer very clearly breaks the formula for $F^{-1}$ into two cases according to the value of $\alpha$; you have ignored that. Your first formula obviously is wrong when $x$ is very negative: it produces exponentially larger negative values for $F$. I'm quite confident in the solution I presented because the graph that accompanies it was generated from *exactly* the formula I gave. I invite you to subject your formulas to a comparable test--that should reveal where you go wrong. – whuber Oct 08 '17 at 15:38
I have not ignored the two cases. I am referring to the second case in which $\alpha \ge 1-p$. In this case, I have shown that by plugging in prespecified (valid) parameters, we end up with a solution that is clearly incorrect. It is frustrating to me that you won't acknowledge that. – jjet Oct 08 '17 at 15:46
You are correct: I failed to include a negative sign in the formula (which I will fix; thank you for bringing it to my attention). However, your insistence that this is "clearly incorrect," rather than the simple typographical omission that it is, belies a certain unwillingness to understand my method of solution, to verify your own solution, or to recognize where your solution is fundamentally erroneous--and is not a mere typographical error. This was pointed out in the very first comment by @jbowman. The basic problem is that you only *present* a formula for $F$ without deriving it. – whuber Oct 08 '17 at 16:34
@whuber the irony here is truly mindboggling. You originally wrote that I had more than a minor typo. You likely came to that conclusion because my solution differed from yours. But yours had a typo too. And it was your original unwillingness to acknowledge that typo that led me to say that your solution was clearly incorrect. – jjet Oct 08 '17 at 17:09
I am sorry you feel that way. You might not fully understand what this site is about or how it works. We are far from a "simple message board." We take seriously the characterization you will find in the [site tour](https://stats.stackexchange.com/tour): "we are working together to build a library of detailed answers to every question about statistics ... **posting good questions and answers.**" Part of a good answer is *explanation*. In the present case, you offer a solution by *fiat*: "here it is, believe me it's right." We expect better answers; I'm trying to help you construct them. – whuber Oct 08 '17 at 18:32
