9

There is an interesting puzzle in YouTube video Can you solve The Frog Problem?. I'll try to give an equivalent formulation here.

A frog is on one side of the pond and wants to get on the other side. There are $n$ lily leaves ahead in a line, the $n$-th leave laying on the other end of the pond and being the destination. Whatever the position the frog is at any time, it will only go ahead and the probability to land on one of the leaves left in front of it (including the destination) is uniform. So, for example, if there are 10 leaves ahead, there is a probability of $\frac{1}{10}$ that it will land on any of them.

What is the expected value for the number of jumps it will take the frog to arrive to the destination leaf? The answer cannot be a recursive expression.

I think I have a solution, I'll report it as an answer below.

polettix
  • 506
  • 4
  • 11

4 Answers4

3

We will call $J_n$ the expected value for the jumps when there are $n$ leaves ahead. We set also $J_0 = 0$, which is consistent with the fact that if the frog has no leaf ahead it needs to do exactly $0$ jumps to arrive at destination.

We will name/number the leaves according to their distance from the destination. So the destination will be leaf $0$, the one immediately before $1$ and so on up to leaf $n-1$ that is the one in front of the frog. There are in total $n$ leaves and the probability to jump on any of them with one leap is $\frac{1}{n}$ according to the puzzle indications.

When the frog takes this first leap, it will land on leaf $k$, with $k \in \{0, ... n-1\}$ and, from that point, the expected value of remaining leaps will be $J_k$. Considering that these events are mutually exclusive, we get the following:

$$J_n = \sum_{k=0}^{n-1}\frac{1}{n}(1 + J_k)$$

where the $1$ represents the first leap to reach position $k$. As there are $n$ elements in the sum, it can be rearranged as:

$$J_n = 1 + \frac{1}{n} \sum_{k=0}^{n-1}J_k$$

which is indeed a bit too recursive. With simple arithmetics we can further rearrange it as follows:

$$n(J_n - 1) = \sum_{k=0}^{n-1}J_k$$

This relation is generic and can be also rewritten with $n-1$ instead of $n$:

$$(n-1)(J_{n-1} - 1) = \sum_{k=0}^{n-2}J_k$$

Subtracting the two relations we get:

$$n(J_n - 1) - (n-1)(J_{n-1} - 1) = \sum_{k=0}^{n-1}J_k - \sum_{k=0}^{n-2}J_k = J_{n-1}$$

that is:

$$n(J_n - 1) = (n-1)(J_{n-1} - 1) + J_{n-1} = nJ_{n-1} - (n-1)$$ $$J_n - 1 = J_{n-1} - \frac{n-1}{n}$$ $$J_n = J_{n-1} + \frac{1}{n}$$

Still recursive, but at least the $n$-th element is expressed in terms of $n-1$-th element only.

Now considering that $J_0 = 0$ the relation above can be collapsed to:

$$J_n = \sum_{k = 1}^{n}\frac{1}{k}$$

which is the answer to the puzzle.

polettix
  • 506
  • 4
  • 11
  • 2
    You can also reason more shorter and intuitively: the frog with $n$ leaves ahead of him has the same options in front of him as the frog with $n-1$ leaves ahead *plus* the option that he first needs to make an extra step (land on the first leave in front of him which the frog in front of him does not have) with probability $1/n$. Thus $J_n = J_{n-1} + 1/n $ can be reasoned directly. – Sextus Empiricus Sep 14 '19 at 07:15
  • 1
    And now do the same puzzle when the frog can also stay in the same place. – Sextus Empiricus Sep 14 '19 at 07:17
  • 1
    When the frog can also go one step backwards then you have $$J_n = J_{n-1} *(n-1)/(n+1) + (J_{n}+1)/(n+1) + (J_{n+1}+1)/(n+1)$$ leading to $$J_{n+1} = n J_n - (n-1) J_{n-1}$$ – Sextus Empiricus Sep 14 '19 at 07:35
  • @MartijnWeterings I can only say wow, if you would turn the comment(s) into an answer it will be my accepted one! – polettix Sep 14 '19 at 08:23
  • It is only a small comment that allows you to elliminate that term $\sum J_k $ at once instead of the 'subtracting the two relations' that you do. Your answer is already a sufficient answer. That's why I added it as comment. – Sextus Empiricus Sep 14 '19 at 08:35
  • You should check out my new question on the same topic. – Sextus Empiricus Sep 14 '19 at 08:36
  • There is an interresting twist to this question. Instead of stating 'the answer cannot *be* a recursive relation' we can also ask 'the derivation of the answer cannot *use* a recursive relation'. We can solve it without the use of a recursive relation. We can plot the probability where the frog is after 1 step (a uniform distribution) after 2 steps (a triangular distribution) after 3 steps (a quadratic function for the distribution) after 4 steps (cubic) and so on. And then use the probability to be in the final leave at 1...10 steps. I will post that as an answer later this weekend. – Sextus Empiricus Sep 14 '19 at 10:11
  • I start to feel like I opened Pandora's vase... – polettix Sep 14 '19 at 14:01
  • 1
    The final equation should have $k$ in the denom instead of $n$ – L. Scott Johnson Sep 16 '19 at 17:52
  • @L.ScottJohnson thanks for the corrections! – polettix Sep 16 '19 at 19:36
  • Could you indicate how this answer is not recursive? *By definition,* the summation is given by $J_1=1$ and $J_{n} = 1/n + J_{n-1}.$ Yes, it's a simple recursion and its solution even has a name (these are the [harmonic numbers](https://en.wikipedia.org/wiki/Harmonic_number)), but that doesn't obviate its recursive nature. – whuber Sep 16 '19 at 19:52
  • @whuber the answer uses a recurrence relation, but the final answer can be expressed more elegantly as a sum. So, you can see at once the expression for $J_n$ and do not have to compute it by going manually through computing first all $J_1, J_2, ... J_{n-1} $. – Sextus Empiricus Sep 16 '19 at 20:08
  • @Martijn My point is that this sum, because it has a variable endpoint, ultimately has a recursive definition: it is a recursion disguised by the familiar summation notation. In this sense the request in the question to express the answer non-recursively isn't really satisfied: we have only the illusion of satisfying it. If you want to do better than that, you will have to show how to compute that sum without having to add $O(n)$ terms (which you can indeed do, *e.g.* with the digamma function). – whuber Sep 16 '19 at 20:52
  • 3
    @whuber the fact that the final expression requires an iterative process taking $O(n)$ does not necessarily make it recurrence relation (e.g. in the article you linked, the recurrence relation is explicitly indicated as $H_{n+1} = H_n + \frac1{n+1}$). You might just as well decide to rearrange the terms during the sum as you see fit, while this would not be possible by leveraging the recurrence relation alone. – polettix Sep 16 '19 at 21:16
  • The digamma function is not disguise? Where is the boundary? To me a sum (individual terms), is not closed, but different from an expression like a recurrence relation (related terms). The latter requires that you either come up with a solution for the equation or compute every term in succession. The harmonic number is a solution to the equation. It just happens that that solution, the harmonic number, happens to coincide with every term in the recurrence relation. The switch between a recurrence relation and it's solution as a series is, imo, a meaningful difference and not just disguise. – Sextus Empiricus Sep 16 '19 at 21:24
  • The question stated in the youtube video asks 'what is the explicit formula, (not the recurrence relation)?'. It becomes a bit arguing over semantics and it is ambiguous what type of mathematical equation an explicit formula may refer to. But I guess that the intention of the question is to not end with recurrence relations like $J_n = 1 + \frac{1}{n} \sum_{k=0}^{n-1}J_k$ but at least turn it into the simpler first order recurrence relation $J_n = J_{n-1} + \frac{1}{n}$. – Sextus Empiricus Sep 16 '19 at 21:41
  • Polettix and @Martijn, I am only pointing out that whatever you might think of summation, when you look at its underlying mathematical definition it is recursive. There is a large literature on this in mathematics and computer science. The summation is not a "solution" but is only one conventional way of expressing the recursion. If you think it's an "explicit formula," then I invite you to compute it for, say, $n=10^{20},$ to see just how non-explicit it really is, and compare that to the digamma calculation. – whuber Sep 17 '19 at 11:41
  • @whuber Is there a closed form expression for the digamma function that can be used to perform the computation? Or is it an approximation? – Sextus Empiricus Sep 17 '19 at 14:35
  • @Martijn It all depends on what you mean by "closed" and "approximation." For instance, how do you compute exponentials, logarithms, or trig functions? Apart from some special values, all such calculations are carried out numerically using power series, continued fractions, numerical searches, or whatever. One advantage is that often such procedures (a) can be made arbitrarily accurate while (b) incurring computational costs that depend only on the accuracy and not on the magnitude of the argument. The digamma function has this property; a summation of arbitrary length does not. – whuber Sep 17 '19 at 15:27
  • @whuber I agree that closed form is arbitrary. It depends on the accepted set of functions. You could always define a new function to "close" the equation. I believe the issue for this question is more to not end up with a recurrence relation (that you do not write the expactation value for one case in terms of the expectation values of other cases). Indeed the summation formula that one will end up with has a recursive nature and as $n$ gets bigger you need more terms, but it remains a closed form algebraic expression. The end result is a finite number of known terms and expressions. – Sextus Empiricus Sep 17 '19 at 16:16
  • @Martijn There I believe your characterization may be a little misleading, because although the number of terms is finite, it can be arbitrarily large: this not only violates the stated spirit of the question, but also has important implications for computation and analysis. I'm suggesting there is value in recognizing this fact both conceptually and practically. As far as the present thread is concerned, one might argue that the implicit recursion in the sum is simpler than the recursion initially used to obtain it; regardless, banning "recursive expressions" looks artificial and useless. – whuber Sep 17 '19 at 17:30
  • I agree that the usefulness of a closed expression when it contains arbitrarily long terms is questionable. And an expression in terms of special functions or a limit might be more useful when it comes to computation. However, to me the spirit seems to be that one finds an explicit formula, and it was stressed that this should not be a recurrence relation. To me that 'explicit' means that one should rewrite any result containing multiple unknown terms such that you end up with a single, isolated, unknown term (explicit contrasting with implicit). $\sum_1^n 1/k$ is not implicit, it is not wrong – Sextus Empiricus Sep 17 '19 at 18:05
3

This is an interesting problem, and polettix gives the solution for the immediate problem of finding the expected number of jumps. I'm going to try looking at the broader issue of the distribution of the time it takes to get to the last lily-pad. This broader analysis allows us to find the probabilities of any state, and any of the moments of the distribution.

This analysis can be framed as a problem of finding the distribution of the "hitting time" for the absorbing state of a discrete Markov chain. It is relatively simple to program this Markov chain in statistical software and extract the resulting distribution of the hitting times, thus giving a complete solution to the Frog Problem.


Setting up the problem as a Markov chain: To set up the problem, we use states $x = 0,1,2,...,n$, where state $x=0$ is the frog on the river-bank, and the remaining states are for the frog being on lily-pads $1,...,n$ respectively. We let $\{ X_t | t =0,1,2,3,... \}$ be the random process in the problem, with the frog being at lily-pad $X_t$ immediately after jump $t$. The process is a strictly monotone discrete Markov chain with the $(n+1) \times (n+1)$ transition probability matrix:

$$\mathbf{P} = \begin{bmatrix} 0 & 1/n & 1/n & \cdots & 1/n & 1/n & 1/n \\ 0 & 0 & 1/(n-1) & \cdots & 1/(n-1) & 1/(n-1) & 1/(n-1) \\ 0 & 0 & 0 & \cdots & 1/(n-2) & 1/(n-2) & 1/(n-2) \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1/2 & 1/2 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 1 \\ \end{bmatrix}.$$

The number of jumps to the last lily-pad is the hitting time for state $n$, which is:

$$T \equiv \min \{ t \in \mathbb{N} | X_t = n \}.$$

Our goal will be to find the probability mass function for the random variable $T$, which provides a full solution to the frog problem (i.e., it fully describes the behaviour of the number of jumps to the last lily-pad).


Finding the probability mass function: Since the frog progresses by at least one lily-pad at each jump, it takes at most $n$ jumps to reach the last lily-pad, so we must have $1 \leqslant T \leqslant n$. The cumulative distribution function for this time is:

$$F_T(t) = \mathbb{P}(T \leqslant t) = \mathbb{P}(X_t = n) = [\mathbf{P}^t]_{0,n}.$$

Thus, the probability mass function for the time is:

$$p_T(t) = \begin{cases} 1/n & & \text{for } t = 1, \\[6pt] [\mathbf{P}^t]_{0,n} - [\mathbf{P}^{t-1}]_{0,n} & & \text{for } t > 1. \\[6pt] \end{cases}$$

This mass function fully describes the distribution of the time taken for the frog to reach the last lily-pad, and so it can be considered to be a complete solution to the Frog problem. To facilitate computation, we can program this distribution in R as the dfrog function. This is a vectorised function that generates values from the probability mass function for an argument vector T and parameter n.

dfrog <- function(n, T = 1:n) {

#Create transition probability matrix
P <- matrix(0, nrow = n+1, ncol = n+1);
for (i in 1:n) { 
for (j in i:n) { 
    P[i, j+1] <- 1/(n+1-i);  } }
P[n+1, n+1] <- 1;

#Generate CDF and PMF vectors
PP  <- P;
CDF <- rep(0, n);
for (t in 1:n) {   
    CDF[t] <- PP[1, n+1];
    PP <- PP %*% P; }
PMF <- diff(c(0, CDF));

#Generate output vector
OUT <- rep(0, length(T));
for (i in 1:length(T)) { OUT[i] <- PMF[T[i]]; }

OUT; }

We can use this function to generate and plot the probability mass function. The plot below shows the distribution of the number of jumps when there are $n=20$ lily-pads. As can be seen, the frog will usually take 3-4 jumps to reach the last lily-pad in this case.

enter image description here

#Load ggplot and set theme
library(ggplot2);
THEME <- theme(plot.title    = element_text(hjust = 0.5, size = 14, face = 'bold'),
               plot.subtitle = element_text(hjust = 0.5, face = 'bold'));

#Plot the PMF
n    <- 20;
DATA <- data.frame(Jumps = 1:n, Probability = dfrog(n));
ggplot(aes(x = Jumps, y = Probability), data = DATA) + 
    geom_bar(stat = 'identity', fill = 'darkgreen') +
    THEME +
    ggtitle('PMF of number of jumps to last lily-pad') +
    labs(subtitle = paste0('(Frog problem with n = ', n, ' lily-pads)'));
Ben
  • 91,027
  • 3
  • 150
  • 376
3

Instead of using the recursive relation for the expected number $J_n = J_{n-1} + \frac{1}{n}$ we could also try a more mechanistic approach by computing every path that the frog can take and the distribution of the probability of the position of the frog after $k$ jumps.

This can be quickly computed using a Markov chain.

# stochastic Matrix
M <- pracma::Toeplitz(c(0,rep(1,10)),rep(0,11)) / c(1,1:10) 
M[1,1] <- 1                                               

# positions of frogs after k steps
V <- c(rep(0,10),1)
Vm <- sapply(0:10, FUN = function(k) V %*% (M %^% k))

# mean number of steps by computing 1-F(0)
sum(1-Vm[1,])

giving $2.928968$

The mass distribution, $p(x,k)$, for the probability to be at distance $x$ from the 'finish-leaf' in the $k$-th step would look like the following:

enter image description here


This method has one downside. It is not very easy to derive the final charming result that the expectation value for the number of steps is equal to the n-th harmonic number $\sum_{k=1}^n 1/k$.

In the comments I suggested that these distributions $p(x,k)$ would be like polynomial functions. However that is wrong. It is more complicated.

The distribution follows the relation:

$$p(x,k) = \sum_{y=x+1}^N \frac{p(y,k-1)}{j}$$

where $p(x,k)$ is a sum of the probabilities for the position of the frog in the $(k-1)$-th step, and $N$ is the number of leaves (generalizing from $N=10$). To start this relation we use $p(N,0)=1$.

This could be expanded as

$$p(x,k) = \frac{1}{N} \sum_{l_1=x+1}^{N-k} \sum_{l_2=l_1+1}^{N-k+1} ... \sum_{l_k=l_{k-1}+1}^{N-1} \frac{1}{l_1 \cdot l_2 \cdot ... l_k}$$

which is some sort of generalization of the Harmonic number.

You could describe it more compact as

$$p(x,k) = \frac{1}{N} \sum_{S \in S_{k,[x,...,N-1]}} \prod_{a \in S} \frac{1}{a}$$

where the sum is over all k-subsets $S$ in $S_{k,[x,...,N-1]}$, the set of all k-subset of $[x,...,N-1]$, and the product is over all the numbers $a$ in the subset $S$. For instance a subset $\lbrace 3,5,7 \rbrace$ would represent that the frog jumped from position 10 to 7 to 5 and to 3. The probability for the frog to follow this path is $\frac{1}{10 \cdot 7 \cdot 5 \cdot 3}$.

I am not sure yet how to continue from here to obtain the final result... I imagine you could use some recursive relation.

Ben
  • 91,027
  • 3
  • 150
  • 376
Sextus Empiricus
  • 43,080
  • 1
  • 72
  • 161
2

Like Martijn Weterings, I tried the "compute all possibilities" approach.

At the start, the frog has $n$ choices each with $\frac{1}{n}$ probability. After that, the choices remaining depend on the initial choice. But the set of the probabilities of the remaining steps is easy enough to see: it's the reciprocals of the Power Set on $\{1,...,n-1\}$.

That is, for $n=3$, the probabilities of each step are (in reciprocal):

{ 3 } -- one jump of 3
{ 3, 1 } -- jump of 2 (with a 1/3 probability) then a jump of 1 (with 1/1 probability)
{ 3, 2 } -- 1 then 2
{ 3, 2, 1 } -- 1 then 1 then 1

The expected value of these is just the size of the set divided by the product of the elements of the set.

Since each set always starts with $n$, we move it out of the summation.

The expected number of jumps to get across to the nth leaf is:

$$ \frac{1}{n} \sum_{\textbf{x}\in{\mathbb{P}(\{1,...,n-1\})}} \frac{|\textbf{x}|+1}{\prod \textbf{x}} $$

I'm not sure what approach can be used to simply this form into the $$\sum_{k=1}^n \frac{1}{k}$$ form, but the equivalence of the two checks for the $n$ I've tried (2,3,10,20)