4

I am learning about arrival times with constraints. I've constructed a statistical problem based on a metaphor I heard about business processes. I think better understanding how to solve the problem below will help me tackle similar problems.

Imagine a finite collection of $n$ hikers start up a narrow trail of length $l$ (in meters please) that does not allow for passing, thus the order of the hikers does not change. Each hiker's speed if they were alone would be a folded normal distribution with their own mean $\mu_i$ and standard deviation $\sigma_i$, but their speed is truncated when a hiker is ahead of them due to the non-passing rule.

Question: What is distribution of the arrival time of the group in the sense of the elapsed time from when the first hiker starts up the trail to the time that the last hiker arrives at the end of the trail?


My thoughts on approaching the solution.

The first hiker is unconstrained while every other hiker's arrival time depends on the hikers in front of them. So it makes sense to first consider the first hiker's arrival time which I think will be distributed by an inverse folded normal distribution. Since a speed of zero is sometimes possible, the case $\frac{l}{0}$ will have to be considered. Perhaps by arguing it has measure zero.

Every hiker's speed after the first will be dependent on those hikers ahead of them. This dependence aside, they'll each have an arrival time distribution related to the inverse folded normal distribution. Perhaps the best way to tackle this is with the chain rule of probability where we consider the speeds that are compatible with the "no passing" constraint. Since we are imposing a total order on the hikers' positions, we can focus on keeping the speed of a given hiker consistent with only the hiker directly ahead of them while recursively applying the chain rule.

If my approach is valid, then it is just a matter of figuring out how to set up the appropriate formalisms.


Update

Assume that each hiker starts at an arbitrary time at the start of the trail with an arbitrary speed. For distribution purposes, these initial conditions have a degenerate distribution.

DifferentialPleiometry
  • 2,274
  • 1
  • 11
  • 27
  • 3
    The last hiker cannot arrive any sooner than the slowest hiker. If all hikers start at the same point (essentially), any starting behind the slowest will remain bunched up behind and will arrive (essentially) at the same time. Thus, if "arrival time of the group" means "first time at which all have arrived," this is just an overly complicated way to ask how to compute the distribution of the minimum of $n$ potentially different folded Normal distributions. That requires numerical integration. It doesn't require any special "formalisms." Would this be a correct analysis of your problem? – whuber Dec 22 '21 at 17:37
  • @whuber Just taking the minimum of the individual arrival times doesn't sound quite right. I think that would be right if the hikers could pass each other. Let's try to clarify. The last hiker must arrive no sooner than the slowest hiker, the last hiker's arrival is dependent on those ahead of them because they cannot pass, and the last hiker is not necessarily the slowest hiker. Does that help you? – DifferentialPleiometry Dec 22 '21 at 17:57
  • @whuber By *appropriate formalisms* I was thinking about setting up integrals (if they apply). The distinction of special vs non-special isn't clear to me. – DifferentialPleiometry Dec 22 '21 at 18:00
  • @whuber Perhaps providing a numerical integration might (and code) might clarify what is being asked. – DifferentialPleiometry Dec 22 '21 at 18:02
  • @whuber I think I should further clarify. If the hikers were allowed to pass each other it would make sense to use the minimum speed of the hikers to further compute the arrival time of the group. The hikers cannot pass each other in this case. – DifferentialPleiometry Dec 22 '21 at 18:13
  • In conventional approach, one could leverage Erlang/Poisson distribution with simulations: See https://stats.stackexchange.com/questions/368986/poisson-process-and-queuing-system – msuzen Dec 22 '21 at 18:14
  • @MehmetSüzen You're suggesting that the group arrival time can be constructed from the inter-arrival times of the hikers? On that face of it that sounds promising. – DifferentialPleiometry Dec 22 '21 at 18:15
  • Indeed. It sounds like a queuing problem after all. – msuzen Dec 22 '21 at 18:16
  • @MehmetSüzen I agree. I will track down some introductory reading on [queuing theory](https://en.wikipedia.org/wiki/Queueing_theory). – DifferentialPleiometry Dec 22 '21 at 18:18
  • @MehmetSüzen There will be some issues I need to iron out. For example, the [erlang distribution](https://en.wikipedia.org/wiki/Erlang_distribution) appears to assume that the underlying exponential distributions are independent. I cannot assume independence, but I suspect there are valuable lessons to learn from these examples along the way. – DifferentialPleiometry Dec 22 '21 at 18:20
  • 1
    You seem not to have considered my argument. It assumes, as you state, that hikers cannot pass each other. The faster ones bunch up behind the slowest and all of those arrive together at the finish. That's all this comes down to. – whuber Dec 22 '21 at 18:29
  • @whuber I have certainly considered what you've stated above, but I wouldn't rule out that I've misunderstood it. That bunching you mention is an interesting possibility. Is that guaranteed to happen? I'm imagining that arriving at the same time in a bunch depends on the speed dispersion parameters of the hikers and the length of the hiking trail. What do you think? – DifferentialPleiometry Dec 22 '21 at 18:46
  • 2
    You seem to be thinking of a situation where the hikers all start at different points or at different times--but you haven't provided any information or assumptions about this. When the hikers are all at the same point at some time, then *of course* the bunching happens, even if it is just a "bunch" consisting of the slowest person alone. That's just a restatement of "they can't pass each other." (Anyone who has ever been caught in slow traffic knows this intuitively!) – whuber Dec 22 '21 at 18:52
  • 1
    @whuber That is an excellent point about initial conditions! Indeed, I seem to be assuming that the hikers differ in their starting position or time. I will update with an edit once I have clarified my own thinking on which case I am interested in. Thanks! – DifferentialPleiometry Dec 22 '21 at 18:58
  • 1
    It might simplify your thinking to reformulate the problem in this way: *Given the initial positions, what is the distribution of the times required to reach the end of the trail if hikers are not prevented from passing each other?* Then observe that anyone who encounters someone in front necessarily moves faster and therefore would have completed the hike sooner than their blocker. Once again, the solution devolves to finding the distribution of the maximum of those original times to completion, computed as if they were independent. – whuber Dec 22 '21 at 21:23
  • Re the latest edit: I believe you are still overcomplicating your problem. *Solve the problem for unconstrained hiking,* and then argue that this gives the solution for the problem where hikers cannot pass one another. For your problem to be solvable you *must* specify the joint distributions of the starting times, starting locations, and speeds. You could simplify this greatly by just giving the distributions of the times needed to reach the endpoint without interference from other hikers. – whuber Dec 23 '21 at 16:58
  • @whuber I have updated the initial conditions. How does this argument you have in mind go for starting with the unconstrained hiking case and showing that it holds for the constrained order case? – DifferentialPleiometry Dec 23 '21 at 17:12

1 Answers1

4

Let the speed of hiker $i$ be modeled by a random variable $S_i$ and let their start position be a distance $P_i$ from the end of the hike, where the hikers form a set $\mathcal H$ and $i\in\mathcal H.$ Their position at time $t$ would therefore be $P_i - t S_i,$ assuming they were unimpeded, and the time needed to arrive at the goal unimpeded would be $A_i=P_i / S_i.$

By time $t\ge 0,$ a hiker will have encountered anyone whose initial position was ahead of them but whose computed unimpeded position is now behind them. So, define

$$\mathcal{B}(t; \mathcal H, i) = \{j \in \mathcal H \mid P_j \le P_i\ \&\ P_j - tS_j \ge P_i - tS_i\}.$$

This is the subset of hikers whom $i$ has encountered through time $t.$ This bunch of hikers will reach the goal together at the last arrival time of everyone in it. That is, hiker $i$'s arrival time now is projected to be

$$A_i(t;\mathcal H) = \max\left(A_j \mid j \in \mathcal{B}(t;\mathcal H, i)\right).$$

Obviously none of these exceeds the largest of the $A_j.$ However, it is equally obvious that at least one of them always equals the largest of the $A_j:$ namely, the hiker with the latest unimpeded arrival time. Thus,

The problem is to find the distribution of the maximum of the $A_i, i\in\mathcal H.$

That has a standard solution derivable from basic concepts. In the simplest case, where the $A_i$ are $n$ independent and identically distributed variables with common distribution $F,$ the distribution function of the maximum is $F^n.$ In general, though, this distribution is messy and requires numerical integration.

To see this result intuitively, ponder a space-time diagram of a hike, where position along the trail is plotted vertically (with the goal at top) and time is indicated horizontally, flowing to the right. In this example five hikers $\mathcal H = \{a,b,c,d,e\}$ begin at the positions indicated by the triangles at the left at speeds indicated by the slopes of the rays emanating to their right, arriving at the goal at times $A_i$ indicated by the circles on the goal line at the top. Their unimpeded progress is indicated by the solid and dotted continuations, one color per hiker.

Figure

The true progress of the hikers is indicated by the solid line segments, colored according to the speed of the slowest hiker in each group. So, $e$ starts behind all hikers but quickly encounters the slower $d,$ forming a bunch of two hikers moving at $d$'s pace. A little while later the fast-moving $b$ runs into slower $a,$ forming another pair of hikers moving at $a$'s pace. This latter pair reaches the goal a little before hiker $c,$ who encountered nobody along the way. Finally, the $e,d$ pair cross the line exactly at the time $A_d$ that was anticipated for $d$ at the outset.

The analysis with which this answer began simply amounts to pointing out that $d$ cannot arrive any later than originally anticipated. After all, if they are ever delayed en route, then they will arrive with one or more hikers whose original arrival time was even later then $A_d,$ contradicting the characterization of $d$ as the hiker with the largest value of $A_d.$

R code to create the example.

f <- function(arrivals, speeds) {
  #
  # Compute an impeded hiker's position at time `time`.
  # (Negative positions have yet to reach the goal.)
  #
  path <- Vectorize(function(time, arrival, speed) {
    x <- speeds * (time - arrivals)   # Positions of all hikers
    x.0 <- speed * (time - arrival)   # This hiker's position
    crossed <- which(x <= x.0 & arrivals * speeds <= arrival * speed)
    min(x[crossed])                   # *Furthest* from the goal
  }, "time")
  #
  # Plot the unimpeded and impeded paths, respectively.
  #
  plot.path. <- function(arrival, speed, ...) curve(speed * (x - arrival), add = TRUE, ...)
  plot.path <- function(arrival, speed, ...) curve(path(x, arrival, speed), add = TRUE, ...)
  #
  # Plot and decorate the space-time diagram for these hikers.
  #
  colors <- hsv(seq(0, 3/4, length.out=length(arrivals)), 1, .8, alpha=2/3)
  plot(c(0, 1.1) * max(arrivals), c(-1, 0.1) * max(arrivals * speeds), type="n", bty="n",
       xlab="Time", ylab="Position", xaxt="n", yaxt="n",
       main="Space-Time Diagram of the Hikers")
  abline(h=0)                                                              # Goal line
  mtext("Goal", side=2, at=0, line=1)                                      # "Goal" label
  invisible(mapply(plot.path., arrivals, speeds, col=colors, lty=3, lwd=2))# Dotted paths
  invisible(mapply(plot.path, arrivals, speeds, col=colors, lty=1, lwd=2)) # Solid paths
  points(arrivals, rep(0, length(arrivals)), pch=21, bg=colors)            # Arrival points
  points(rep(0,length(arrivals)), -arrivals*speeds, pch=24, bg=colors)     # Departure points
  mtext(names(arrivals), side=2, at=-arrivals*speeds, line=0)              # Path labels
}
#
# Create sample data.
#
# set.seed(17)
arrivals <- signif(sort(rgamma(5, 20)), 3)
speeds <- signif(abs(rnorm(length(arrivals), 1, 3/4)) + 0.25, 3)
# # Put the last arriver near the middle at the start.
# i <- which.max(arrivals)
# speeds[i] <- 1/mean(1/speeds[-i])
names(arrivals) <- names(speeds) <- head(letters, length(arrivals))[rank(arrivals*speeds)]
f(arrivals, speeds)
whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • The spacetime diagram has different positions at time zero, whereas the question asks for the same initial position at different times. However, the main point of illustration and the analysis is equivalent. – DifferentialPleiometry Dec 23 '21 at 19:48
  • I apologize for not commenting on that difference. "The same initial position at different times" is nearly equivalent to "different positions at a given time." The only difference is that in the former, some collisions detected in my diagram would not have occurred. That changes neither the analysis nor the solution. – whuber Dec 23 '21 at 21:51
  • @whuber: could you please disclose also the R script (and graph related as well). Many thanks. – Maximilian Dec 25 '21 at 08:27
  • 1
    @Max The `R` code I wrote to create this example merely implements the algorithm described here. See the function `path` within `f` in the code. – whuber Dec 25 '21 at 16:51