Computationally, it's easy to see that they're equivalent. If your data are $y_1, y_2, y_3$ and your states are $x_1, x_2, x_3$, and if you resample at the beginning of each step (except the first step, because you can't), then your algorithm looks like S(RS)(RS), where S stands for sample and R stands for resample.
If you follow the convention where resampling comes at the end of every step, it goes like this: (SR)(SR)(S-). You don't resample at the very end, because it would only add noise (resampling is only beneficial for future observations, and there are none left).
So they're the same if you just re-arrange the parentheses. This is just a guess, but the first convention is probably easier to implement, because you don't need an extra `if` condition to skip resampling on the last iteration. I usually follow the second convention, myself.
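To make the equivalence concrete, here is a minimal bootstrap particle filter sketch on a toy model I'm assuming purely for illustration ($x_t = 0.9\,x_{t-1} + \epsilon_t$, $y_t = x_t + \eta_t$, both noises standard normal). Note that the loop body is the same under either convention; only the mental grouping of the operations changes:

```python
# Minimal bootstrap particle filter on an assumed toy model:
#   x_t = 0.9 * x_{t-1} + N(0, 1),   y_t = x_t + N(0, 1).
# Read the run as S(RS)(RS)... or as (SR)(SR)...(S-): same operations,
# different parenthesization. No resampling after the final reweight.
import numpy as np

rng = np.random.default_rng(0)

def normalized_weights(particles, y):
    # Unnormalized observation density N(y | x, 1), then normalize.
    w = np.exp(-0.5 * (y - particles) ** 2)
    return w / w.sum()

def particle_filter(ys, n=1000):
    particles = rng.normal(size=n)                  # S: sample at time 1
    weights = normalized_weights(particles, ys[0])  # weight with y_1
    means = [np.sum(weights * particles)]
    for y in ys[1:]:
        idx = rng.choice(n, size=n, p=weights)                 # R: resample
        particles = 0.9 * particles[idx] + rng.normal(size=n)  # S: mutate
        weights = normalized_weights(particles, y)             # reweight
        means.append(np.sum(weights * particles))
    return np.array(means)  # approximate filtering means E[x_t | y_{1:t}]
```

On simulated data from the same model, the returned filtering means track the latent state; the final iteration ends with a reweight and no resample, matching the trailing (S-).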
What matters is which observations you use to adjust the weights. For example, in chapter 9.4 of this book, they separately describe the sequential importance sampling with resampling (SISR) algorithm and something they call the "IID sampling" algorithm. They are compared on the basis of when resampling is performed: the first samples and then resamples, while the second resamples first. However, what's important is that the second algorithm reweights at time $t$ using the observation from time $t+1$, prior to sampling the time-$t+1$ particle. SISR, on the other hand, samples forward from time $t$ to $t+1$, and only afterwards resamples with weights that use the observation at time $t+1$. It turns out SISR has greater asymptotic variance than IID sampling.
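The timing difference can be sketched for one step of each scheme. I'm again assuming the toy linear-Gaussian model $x_{t+1} = 0.9\,x_t + \epsilon$, $y_{t+1} = x_{t+1} + \eta$, because it makes the predictive likelihood $p(y_{t+1} \mid x_t)$ and the exact conditional available in closed form; in a general model those are intractable, and the lookahead step would have to be approximated (as the auxiliary particle filter does):

```python
# One step of each scheme on an assumed toy model:
#   x_{t+1} = 0.9 * x_t + N(0, 1),   y_{t+1} = x_{t+1} + N(0, 1).
import numpy as np

rng = np.random.default_rng(1)

def sisr_step(particles, y_next):
    # SISR: mutate first, then reweight/resample using y_{t+1}.
    prop = 0.9 * particles + rng.normal(size=len(particles))
    w = np.exp(-0.5 * (y_next - prop) ** 2)  # N(y | x, 1), unnormalized
    w /= w.sum()
    return prop[rng.choice(len(prop), size=len(prop), p=w)]

def lookahead_step(particles, y_next):
    # "IID sampling" flavor: reweight the time-t particles by the
    # predictive likelihood p(y_{t+1} | x_t) = N(y; 0.9 x_t, 2)
    # BEFORE sampling the time-(t+1) particle.
    w = np.exp(-0.25 * (y_next - 0.9 * particles) ** 2)
    w /= w.sum()
    sel = particles[rng.choice(len(particles), size=len(particles), p=w)]
    # Then draw from the exact conditional (the locally optimal proposal):
    #   p(x_{t+1} | x_t, y_{t+1}) = N((0.9 x_t + y_{t+1}) / 2, 1/2).
    return (0.9 * sel + y_next) / 2 + rng.normal(scale=np.sqrt(0.5),
                                                 size=len(sel))
```

Both steps produce a resampled time-$t+1$ particle cloud; the difference is only whether the observation $y_{t+1}$ informs the weights before or after the particles are moved.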
This is the motivation behind proposal-distribution selection in this entire class of algorithms: taking future observations into account is generally a good idea, whether you use them to resample prior to mutating, or to help choose a proposal distribution (e.g. using the locally optimal proposal, or approximating it with the auxiliary particle filter).
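For reference (using notation I'm introducing here: transition density $f(x_t \mid x_{t-1})$ and observation density $g(y_t \mid x_t)$), the locally optimal proposal mentioned above is the one-step conditional

$$ q^{\text{opt}}(x_t \mid x_{t-1}, y_t) = p(x_t \mid x_{t-1}, y_t) = \frac{g(y_t \mid x_t)\, f(x_t \mid x_{t-1})}{\int g(y_t \mid x)\, f(x \mid x_{t-1})\, \mathrm{d}x}, $$

whose incremental importance weight is the denominator $p(y_t \mid x_{t-1})$. Because that weight does not depend on the newly sampled particle, this choice minimizes the one-step variance of the weights; when the integral is intractable, the auxiliary particle filter approximates exactly this kind of lookahead.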