What distribution to use to model time before a train arrives?

Question

I'm trying to model some data on train arrival times. I'd like to use a distribution that captures "the longer I wait, the more likely the train is going to show up". It seems like such a distribution should look like a CDF, so that P(train shows up | waited 60 minutes) is close to 1. What distribution is appropriate to use here?

If you wait 25 hours and there has been no train, I suspect the chance of a train turning up in the next minute may be close to $0$ as it is quite possible that the line has been closed temporarily or permanently — Henry, Jul 05 '18 at 11:32
@Henry, this depends entirely on your beliefs about the prior probabilities. For instance, the least used railway station in Britain, https://www.theguardian.com/uk-news/2016/dec/09/brief-encounter-at-britains-least-used-railway-station-shippea-hill , has gaps between arrivals of more than one day (on Sundays there is no service). — Sextus Empiricus, Jul 05 '18 at 15:01
@MartijnWeterings - perhaps thanks to journalists, Shippea Hill saw a 1200% increase in usage and did not even make [the ten least-used stations the following year](https://www.globalrailnews.com/2017/12/01/these-are-the-10-least-used-railway-stations-in-great-britain/), some of which, such as Teesside Airport, have one train a week in one direction — Henry, Jul 05 '18 at 16:59

score 17 · Accepted Answer · edited Jun 11 '20 at 14:32

Multiplication of two probabilities

The probability of the first arrival occurring between $t$ and $t+dt$ (the waiting time) is the product of

the probability of an arrival between $t$ and $t+dt$ (which can be related to the arrival rate $s(t)$ at time $t$, namely $s(t)\,dt$),
and the probability of no arrival before time $t$ (otherwise it would not be the first).

This latter term follows the recursion:

$$P(n=0,t+dt) = (1-s(t)dt) P(n=0,t)$$

or

$$\frac{\partial P(n=0,t)}{\partial t} = -s(t) P(n=0,t) $$

which, with the initial condition $P(n=0,0) = 1$, gives:

$$P(n=0,t) = e^{-\int_0^t s(\tau)\, d\tau}$$

and the probability density of the waiting time is:

$$f(t) = s(t)\, e^{-\int_0^t s(\tau)\, d\tau}$$
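A minimal numerical sketch of the two steps above, assuming a made-up rate $s(t) = 0.5 + 0.1\,t$ (per minute) purely for illustration. It iterates the recursion $P(n=0,t+dt) = (1-s(t)\,dt)\,P(n=0,t)$, compares it with the closed form $e^{-\int_0^t s(\tau)\,d\tau}$, and checks that $f(t) = s(t)\,P(n=0,t)$ integrates to about 1. The function names are hypothetical.

```python
import math

# Hypothetical arrival rate s(t), for illustration only: it grows linearly
# with the time already waited.
def s(t):
    return 0.5 + 0.1 * t

def cumulative_rate(t):
    """integral_0^t s(tau) dtau for the s above."""
    return 0.5 * t + 0.05 * t ** 2

def survival_by_recursion(t_end, dt):
    """Iterate P(n=0, t+dt) = (1 - s(t) dt) P(n=0, t) from P(n=0, 0) = 1."""
    p = 1.0
    for i in range(round(t_end / dt)):
        p *= 1.0 - s(i * dt) * dt
    return p

def survival(t):
    """Closed form P(n=0, t) = exp(-integral_0^t s)."""
    return math.exp(-cumulative_rate(t))

def density(t):
    """Waiting-time pdf f(t) = s(t) * P(n=0, t)."""
    return s(t) * survival(t)

# The recursion approaches the closed form as dt shrinks.
for dt in (0.1, 0.01, 0.001):
    print(dt, survival_by_recursion(3.0, dt), survival(3.0))

# f integrates to (almost) 1 because the cumulative rate grows without bound.
dt = 0.001
grid = [i * dt for i in range(20_001)]                  # 0 .. 20 minutes
vals = [density(t) for t in grid]
total = dt * (sum(vals) - 0.5 * (vals[0] + vals[-1]))   # trapezoidal rule
print(total)
```

Any other rate function can be swapped in for `s`; only `cumulative_rate` has to be updated to match it.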

Derivation of cumulative distribution.

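Alternatively, you could start from the probability of fewer than one arrival (that is, no arrival) by time $t$, which is the cumulative distribution $F$ of the number of arrivals $n$ evaluated at $n=0$:

$$P(n<1 \vert t) = F(n=0 \vert t)$$

The probability of the first arrival between time $t$ and $t+dt$ is then $f(t)\,dt$, where the density is minus the derivative of this term: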

$$f_{\text{arrival time}}(t) = - \frac{d}{d t} F(n=0 \vert t)$$

This approach is, for instance, useful for deriving the gamma distribution as the waiting time for the $n$-th arrival in a Poisson process (see the linked question "Waiting time of Poisson process follows gamma distribution").
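A small check of that gamma case, as a sketch: for a Poisson process with rate $\lambda$, the $n$-th arrival comes after $t$ exactly when there are fewer than $n$ arrivals in $[0, t]$, so its density should be $-\frac{d}{dt}\sum_{k=0}^{n-1} e^{-\lambda t}(\lambda t)^k/k!$. The code estimates that derivative by central differences and compares it with the gamma (Erlang) density $\lambda^n t^{n-1} e^{-\lambda t}/(n-1)!$. The rate $\lambda = 2$, the shape $n = 3$ and the helper names are arbitrary choices for illustration.

```python
import math

def prob_fewer_than(n, lam, t):
    """P(fewer than n arrivals in [0, t]) for a Poisson process with rate lam."""
    return sum(math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)
               for k in range(n))

def gamma_pdf(n, lam, t):
    """Gamma (Erlang) density with integer shape n and rate lam."""
    return lam ** n * t ** (n - 1) * math.exp(-lam * t) / math.factorial(n - 1)

def minus_derivative(n, lam, t, h=1e-5):
    """Central-difference estimate of -d/dt P(fewer than n arrivals by t)."""
    return -(prob_fewer_than(n, lam, t + h) - prob_fewer_than(n, lam, t - h)) / (2 * h)

for t in (0.5, 1.0, 3.0):
    print(t, minus_derivative(3, 2.0, t), gamma_pdf(3, 2.0, t))
```

With $n = 1$ the sum reduces to $e^{-\lambda t}$, which gives back the exponential waiting time of the first arrival.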

Two examples

You might relate this to the waiting paradox (see the question "Please explain the waiting paradox").

Exponential distribution: If the arrivals are random, as in a Poisson process, then $s(t) = \lambda$ is constant. The probability of the next arrival is independent of how long you have already waited without one (if you roll a fair die many times without getting a six, the next roll is no more likely to be a six; see the gambler's fallacy). You get the exponential distribution, and the pdf of the waiting time is: $$f(t) = \lambda e^{-\lambda t}$$
Constant (uniform) distribution: If the arrivals occur at regular intervals (such as trains arriving according to a fixed schedule), then the probability of an arrival increases the longer a person has already been waiting. Say a train is supposed to arrive every $T$ minutes and you turn up at a random moment; then the arrival rate after already waiting $t$ minutes is $s(t) = 1/(T-t)$ for $0 \le t < T$, and the pdf of the waiting time is: $$f(t)= \frac{1}{T-t}\, e^{-\int_0^t \frac{1}{T-\tau}\, d\tau} = \frac{1}{T-t}\cdot\frac{T-t}{T} = \frac{1}{T}$$ which makes sense, since every time between $0$ and $T$ should be equally likely to be the first arrival.
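A side-by-side sketch of the two examples, assuming a hypothetical headway of $T = 10$ minutes and a Poisson rate $\lambda = 1/T$ with the same mean gap between trains. It prints the rate $s(t)$ and the density $f(t)$ for both cases: the Poisson rate stays flat, while the schedule rate $1/(T-t)$ climbs even though its density stays at $1/T$. A short Monte Carlo run (using Python's `random.expovariate`) then checks memorylessness in the Poisson case; the waited time `w` and extra time `x` are arbitrary.

```python
import math
import random

T = 10.0          # hypothetical headway of a fixed schedule, in minutes
lam = 1.0 / T     # hypothetical Poisson rate with the same mean gap

# Case 1: Poisson arrivals -> constant rate s(t) = lam, exponential waiting time.
def exp_hazard(t):
    return lam

def exp_density(t):
    return lam * math.exp(-lam * t)

# Case 2: fixed schedule, passenger turns up at a uniformly random moment
# -> s(t) = 1/(T - t) for 0 <= t < T, and the waiting time is uniform on [0, T).
def sched_hazard(t):
    return 1.0 / (T - t)

def sched_density(t):
    cumulative = math.log(T / (T - t))   # integral_0^t 1/(T - tau) dtau
    return sched_hazard(t) * math.exp(-cumulative)

for t in (0.0, 2.5, 5.0, 9.0):
    print(f"t={t:4.1f}  exp: s={exp_hazard(t):.3f} f={exp_density(t):.3f}  "
          f"schedule: s={sched_hazard(t):.3f} f={sched_density(t):.3f}")

# Monte Carlo check of memorylessness in case 1: having already waited w minutes
# does not change the chance of waiting at least x more minutes.
rng = random.Random(1)
samples = [rng.expovariate(lam) for _ in range(200_000)]
w, x = 15.0, 5.0
waited = [t for t in samples if t > w]
frac_conditional = sum(t > w + x for t in waited) / len(waited)
frac_unconditional = sum(t > x for t in samples) / len(samples)
print(frac_conditional, frac_unconditional)   # both estimate exp(-lam * x)
```

The increasing `sched_hazard` is the behaviour the question asks about: the longer you have waited, the more likely the train is to show up in the next minute.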

So it is this second case, in which the probability of an arrival increases the longer a person has already been waiting, that relates to your question.

It might need some adjustments depending on your situation. With more information, the probability $s(t)\,dt$ for a train to arrive at a given moment might be a more complex function.

score 7 · Answer 2 · answered Jul 05 '18 at 06:20


The classical distribution to model waiting times is the exponential distribution.

The exponential distribution occurs naturally when describing the lengths of the inter-arrival times in a homogeneous Poisson process.
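A quick simulation sketch of that statement, assuming a hypothetical rate of 0.2 arrivals per minute. Given the number of arrivals in a window, a homogeneous Poisson process scatters them independently and uniformly. Here the count is simply fixed at its expected value (a simplification), and the gaps between sorted arrival times are then compared with an exponential distribution of mean $1/\lambda$.

```python
import math
import random

rng = random.Random(42)
lam = 0.2              # hypothetical rate: 0.2 arrivals per minute
horizon = 100_000.0    # minutes simulated

# Conditional on the count, arrivals of a homogeneous Poisson process are
# i.i.d. uniform on the window; the count is fixed at lam * horizon here.
n = int(lam * horizon)
times = sorted(rng.uniform(0.0, horizon) for _ in range(n))
gaps = [b - a for a, b in zip(times, times[1:])]

mean_gap = sum(gaps) / len(gaps)
frac_long = sum(g > 1 / lam for g in gaps) / len(gaps)
print(mean_gap, 1 / lam)          # mean gap vs exponential mean 1/lam
print(frac_long, math.exp(-1))    # P(gap > 1/lam) vs exponential value exp(-1)
```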


Stephan Kolassa



Yes, but I daresay a Poisson process is not a good model for a train network. – leftaroundabout Jul 05 '18 at 12:05
