
Suppose there is a disease whose likelihood of occurring increases with age. But this increase is not linear, in fact it is at a certain age (e.g. 50) that the risk starts growing rapidly. But at some older age (e.g. 80) it doesn't matter that much whether a patient is a year or two older because they are already in the risky age range. What might be a good probability distribution to model this problem?

If I were to draw the probability distribution that I have in mind it would have an S-shaped curve similar to that of the Sigmoid function, but with a truncated domain. Is there such a probability distribution? Or perhaps, there may be a better way to represent this problem.

To clarify, I am looking for a PDF shaped like the sigmoid function, not a CDF.

Further clarification: I understand that the area under the graph of a sigmoid function is infinite. But if the domain of the sigmoid function is limited, the area will be finite.

I will try to explain the problem starting from the data:

I am given the following data: for some disease D, p% of patients with that disease are above the age of y, i.e. for a random patient who has disease D: $$P(his\_age \geq y | D) = p/100$$ Additionally, I know that the risk of having disease D increases with age. What I want to do is estimate $P(D|age=x)$, where x is some valid age. Using Bayes' theorem I know that: $$P(D|age=x)= {P(age=x|D)P(D)\over P(age=x)}$$ I happen to know $P(D)$ and $P(age=x)$, so all that I need is to estimate $P(age=x|D)$. To do so, I thought that this distribution might look like a sigmoid function with its domain limited to the possible age range.

Solution:

The current solution that I'm going with involves modifying the sigmoid function to fit my needs. The modifications are as follows: firstly, I need to be able to move the inflection point; secondly, I need to make the area under the graph finite by setting lower and upper limits on the domain; thirdly, I need to be able to change the curvature of the sigmoid to get a more gradual change; and fourthly, I need to normalize the function so that the area under the graph equals 1.

To do so I ended up with the following function (i is inflection point location, s is scale to control the curvature, l is lower limit, and u is upper limit):

$$f(x,i,s,l,u)= \begin{cases} 0, & x \notin [l, u] \\ {1\over 1+e^{-{x-i\over s}}} \times {1 \over s}, & \text{otherwise} \end{cases}$$

Then I integrated it from the lower limit l: $$f'(x,i,s,l,u)= \begin{cases} 0, & x \leq l \\ \ln (e^{u \over s} + e^{i \over s}) - \ln (e^{l \over s} + e^{i \over s}), & x \geq u\\ \ln (e^{x \over s} + e^{i \over s}) - \ln (e^{l \over s} + e^{i \over s}), & \text{otherwise} \end{cases} $$

Then the total area is: $$t=f'(u,i,s,l,u)$$

So the final function is: $$P(age=x|D)={f'(x+1,i,s,l,u) - {f'(x,i,s,l,u)}\over t}$$
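A minimal Python sketch of the construction above, offered as an illustration rather than a vetted implementation. The function names `softplus`, `area` and `p_age_given_d` and the parameter values (`i=50`, `s=5`, `l=0`, `u=100`) are assumptions chosen for the example. The area is measured from the lower limit `l`, so the value of the antiderivative at `l` is subtracted; it uses the identity $\ln(e^{x/s}+e^{i/s}) = i/s + \ln(1+e^{(x-i)/s})$, in which the constant $i/s$ cancels in every difference.

```python
import math

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def area(x, i, s, l, u):
    """Integral of (1/s) * sigmoid((t - i) / s) for t from l to x, with x clamped to [l, u]."""
    x = min(max(x, l), u)
    return softplus((x - i) / s) - softplus((l - i) / s)

def p_age_given_d(x, i, s, l, u):
    """P(age = x | D) for an integer age x: normalized area over [x, x + 1]."""
    total = area(u, i, s, l, u)
    return (area(x + 1, i, s, l, u) - area(x, i, s, l, u)) / total

# Hypothetical parameters: inflection at 50, scale 5, ages 0 to 100.
probs = [p_age_given_d(x, 50, 5, 0, 100) for x in range(100)]
print(sum(probs))            # should be very close to 1
print(probs[40], probs[60])  # the probability rises through the inflection point
```

With these values the yearly probabilities barely change between ages 90 and 91, which matches the plateau described in the question.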

I'm not sure if this solution is completely correct (I might be way off), but that's where I got with this problem. One thing I need to look at is how to make use of this probability that was given to me $P(his\_age \geq y | D)$. Perhaps I could use the law of total probability together with a uniform distribution and the Sigmoid shaped one above.

Omar
  • so what you want is a distribution that gives an S-shaped hazard function? – carlo Sep 30 '19 at 16:20
  • The "S-shaped curve" to which you refer doesn't sound like a probability distribution at all: it sounds like some kind of "risk function" or "conditional distribution" in which a probability varies by age. Ordinarily one doesn't just invent such a function: one uses *data,* preferably supported by some underlying *scientific theory,* to indicate useful functions. – whuber Sep 30 '19 at 16:20
  • @carlo I'm not sure what a hazard function is. But basically I am looking for a probability, such that given some disease D, P(age=x | D). – Omar Sep 30 '19 at 16:38
  • @whuber The problem is I don't have such data. So I wanted to logically come up with a distribution that may not be too far off. – Omar Sep 30 '19 at 16:38
  • What do you hope to accomplish if you have no data? – whuber Sep 30 '19 at 16:46
  • Please keep in mind that if $f$ is a PDF, then $\int_0^\infty f(x)\, dx = 1 \Rightarrow \underset{x\rightarrow \infty}{\lim} f(x) = 0$, whereas $\underset{x\rightarrow \infty}{\lim} {\rm sigmoid}(x) = 1$. – quester Sep 30 '19 at 16:56
  • Also remember that the y-axis for a PDF is not a probability. You couldn't look at an age on the x-axis, then go to the curve above that point and see its height and interpret its height as the probability of disease. That's what you could do with logistic regression, which does indeed use the logistic **CDF** for its sigmoid curve, but that's not what a PDF does. – Noah Sep 30 '19 at 20:12
  • I am voting to close this question because what you ask for, 'a sigmoid probability distribution', is unclear and will attract various answers using different interpretations, which will make the issue even less clear. It seems like your underlying problem is not what you ask. Please explain your problem from the bottom up (what data do you have, what problem do you want to tackle) and then we may help to shape this into a question. – Sextus Empiricus Oct 01 '19 at 07:53
  • I imagine that you are looking at some sort of inhomogeneous Poisson process where the rate of getting the disease (when not yet sick) is some function of age $\lambda(t)$, and then the probability of not getting sick before some age is a function expressed as an integral $$P(t)_\text{no disease} = e^{-\int_0^t \lambda(s)\,ds}$$ See more in [this question about waiting time](https://stats.stackexchange.com/a/354574/164061) – Sextus Empiricus Oct 01 '19 at 08:13

3 Answers


Normal distributions have such cumulative distribution functions (CDFs). Here are the density function (left) and CDF (right) for the distribution $\mathsf{Norm}(\mu=50, \sigma=7).$

[Figure: density function (left) and CDF (right) of $\mathsf{Norm}(\mu=50, \sigma=7)$]

Some (but not all) other types of distributions have CDFs of somewhat similar shapes. If you want further examples or explanations, please leave a comment.


Addendum per comments: Following @Omar's suggestion, truncate the above normal distribution to the interval $(-\infty, 50)$ and double the PDF on that support. But notice that the CDF doesn't have a 'sigmoid' shape. Also, please note the objection in @whuber's comments. This quest might be more successful, not to mention useful, if you had an application in mind.

[Figure: the truncated normal distribution described in the addendum]
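The truncation in the addendum can be checked with a short Python sketch. The helper names below are made up for this illustration, and it assumes the same $\mathsf{Norm}(\mu=50, \sigma=7)$ cut at its mean, so exactly half of the mass is kept and the density doubles.

```python
import math

MU, SIGMA, CUT = 50.0, 7.0, 50.0  # cut point equals the mean

def normal_pdf(x, mu=MU, sigma=SIGMA):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=MU, sigma=SIGMA):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def truncated_pdf(x):
    # Dividing by the retained mass (1/2 here) doubles the density below the cut.
    return normal_pdf(x) / normal_cdf(CUT) if x < CUT else 0.0

def truncated_cdf(x):
    # Rises up to the cut point, then stays at 1.
    return min(normal_cdf(x) / normal_cdf(CUT), 1.0)

print(truncated_pdf(45.0) / normal_pdf(45.0))  # ratio of truncated to original density
print(truncated_cdf(43.0), truncated_cdf(50.0), truncated_cdf(60.0))
```

Because the truncated density keeps increasing right up to 50, the CDF is convex on the support and then flat, which is why it does not look sigmoid.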

BruceET
  • If possible I want a PDF that has the shape of Sigmoid (this would mean that the PDF is bounded) not a PDF whose CDF has that shape. – Omar Sep 30 '19 at 16:31
  • Remember that the area under a PDF has to be 1, so the PDF curve cannot remain 'high' at the right. – BruceET Sep 30 '19 at 16:34
  • if the PDF is truncated by setting a certain domain of possible inputs, where if the input was out of that domain the function returns 0 then may be possible. – Omar Sep 30 '19 at 16:41
  • OK, I'll push this just one step further. See addendum to answer implementing the truncation idea. Also, objections in other comments. – BruceET Sep 30 '19 at 23:10

It seems like what you want is a cumulative hazard function with that sigmoid shape; at least that is what "But this increase is not linear, in fact it is at a certain age (e.g. 50) that the risk starts growing rapidly. But at some older age (e.g. 80) it doesn't matter that much whether a patient is a year or two older because he is already in the risky age range" indicates. Wikipedia has a good explanation.

Following this, the cumulative hazard function is $$\Lambda(t)=-\ln S(t) $$ where $S(t)$ is the survival (or tail) function $S(t)=1-F(t)$ and $F$ is the cumulative distribution function. So just try to propose a sigmoid function as a model for $\Lambda(t)$ and solve the resulting equation. If you try the cumulative normal distribution function as your sigmoid, the equation should be solvable explicitly. The details are left as an exercise.
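As a rough sketch of that exercise, assume $\Lambda(t) = c\,\Phi\big((t-\mu)/\sigma\big)$ with $\Phi$ the standard normal CDF. Solving $\Lambda(t) = -\ln S(t)$ gives $S(t) = e^{-\Lambda(t)}$ and $F(t) = 1 - e^{-\Lambda(t)}$. The constants `c=0.5`, `mu=65` and `sigma=10` below are hypothetical, picked only to make the example concrete.

```python
import math

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cumulative_hazard(t, c=0.5, mu=65.0, sigma=10.0):
    # Assumed sigmoid model: a scaled normal CDF (all three constants are hypothetical).
    return c * std_normal_cdf((t - mu) / sigma)

def survival(t, **params):
    # S(t) = exp(-Lambda(t)), from Lambda(t) = -ln S(t).
    return math.exp(-cumulative_hazard(t, **params))

def cdf(t, **params):
    # F(t) = 1 - S(t): probability of having developed the disease by age t.
    return 1.0 - survival(t, **params)

for age in (40, 65, 90):
    print(age, cdf(age))
```

One consequence of a bounded cumulative hazard: $F(t)$ levels off at $1 - e^{-c}$ rather than 1, so under this model a fraction $e^{-c}$ of people never develop the disease.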

You will find a lot of information in books about reliability theory, for instance this one or this one.

kjetil b halvorsen
  • Thank you. I'm not sure if a hazard function will solve my issue. But as it was brought up multiple times, I will look into it. – Omar Sep 30 '19 at 17:21

Technically, if we treat the sigmoid $S(x)$ as a CDF, then $\frac{d\,S(x)}{dx} = \frac{e^{-x}}{(1+e^{-x})^2}$ is the PDF of the logistic distribution: en.wikipedia.org/wiki/Logistic_distribution // the name was found thanks to @grand_chat
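A quick numerical check of that derivative in Python, as a sketch only; the helper names and the step size `h` are choices made for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_pdf(x):
    # Closed form of the derivative of the sigmoid.
    return math.exp(-x) / (1.0 + math.exp(-x)) ** 2

def numeric_derivative(f, x, h=1e-5):
    # Central difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-3.0, 0.0, 2.0):
    print(x, numeric_derivative(sigmoid, x), logistic_pdf(x))
```

The two columns should agree closely, and the density is bell-shaped with its peak of 0.25 at $x = 0$, so it is the CDF that is sigmoid here, not the PDF.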

Also keep in mind that many common continuous distributions have a CDF that looks like a sigmoid, for example:

https://en.wikipedia.org/wiki/Weibull_distribution

https://en.wikipedia.org/wiki/Normal_distribution

https://en.wikipedia.org/wiki/Erlang_distribution

quester
  • The name of the distribution having a sigmoid CDF is, unsurprisingly, the logistic distribution! https://en.wikipedia.org/wiki/Logistic_distribution – grand_chat Sep 30 '19 at 16:48
  • To clarify I'm looking for a PDF like Sigmoid function not a CDF. I understand that the CDF of some distributions looks like a Sigmoid function, but their PDFs do not. – Omar Sep 30 '19 at 16:50
  • @Omar You can't get a PDF that looks like that because the total area under the PDF must equal 1. With a sigmoid function, the total area is infinite. – Chechy Levas Sep 30 '19 at 16:52
  • @ChechyLevas but limiting the domain will make the area finite. – Omar Sep 30 '19 at 17:23
  • @Omar, in that case you already have your answer. Take a sigmoid function, limit the domain as you suggest, scale it so that the total area is 1. As I am sure you are aware, any function that integrates to 1 can be used as a PDF. In this case, it almost certainly does not have a name. – Chechy Levas Sep 30 '19 at 17:26
  • Not all continuous distributions have 'sigmoid' CDFs: Two immediate counterexamples are $\mathsf{Unif}(0,1)$ and the one in the Addendum to my Answer. – BruceET Sep 30 '19 at 23:07