In this blog post, figure 4 shows that the principal components of a random walk are sinusoidal, with the frequency increasing as the eigenvalue decreases. Is there an intuitive way of understanding this?

If I generate two independent random walks with a mean of zero, why is there a covariance (or correlation) between these trajectories in the first place?

McLawrence

1 Answer

I actually recently wrote a paper on this subject which will appear at NIPS 2018: https://arxiv.org/abs/1806.08805

My collaborator and I proved that, in the limit of an infinite number of dimensions, the projection of a random walk onto any PCA component is a sinusoid. You are welcome to read the paper for the proof, but perhaps I can attempt a slightly more intuitive explanation.

The random walk is a translation-invariant process: no matter how many steps you take and how far away you get from the origin, the process that determines the next step is exactly the same. (This is in contrast to an Ornstein-Uhlenbeck process, for example, which is a random walk in a quadratic well; there, the process that determines your next step depends on your distance from the origin.)

Now, the eigenfunctions of any translation-invariant operator are Fourier modes. Why is this? Well, in order to be translation invariant, the eigenfunctions need to be periodic, and in order to form an orthogonal basis, they need to be mutually orthogonal. The set of functions that satisfies both properties is the Fourier basis.
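
To see this happen numerically, here is a minimal sketch (my own construction, not code from the paper): it builds a long random walk in many dimensions, runs PCA on the centered trajectory, and checks how closely the projection onto the $k$-th component matches a cosine with $k$ half-periods over the walk.

```python
# Minimal sketch: PCA of a high-dimensional random walk trajectory.
# The projection onto the k-th principal component should look like
# cos(k * pi * t / T), i.e. a cosine with k half-periods.
import numpy as np

rng = np.random.default_rng(0)
T, d = 2000, 500                              # number of steps, number of dimensions
walk = np.cumsum(rng.standard_normal((T, d)), axis=0)
walk -= walk.mean(axis=0)                     # center each coordinate over time

# PCA via SVD of the centered trajectory; rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(walk, full_matrices=False)
projections = walk @ Vt.T                     # projection of the walk onto each PC

t = np.arange(T)
for k in range(1, 4):
    cosine = np.cos(k * np.pi * (t + 0.5) / T)
    corr = abs(np.corrcoef(projections[:, k - 1], cosine)[0, 1])
    print(f"PC {k}: |correlation with cos({k} pi t / T)| = {corr:.3f}")
```

The printed correlations should be close to 1 for the leading components, and the match improves as the number of dimensions grows, which is the infinite-dimensional limit treated in the paper.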

As for your second question about two independent random walks seeming to be correlated, this is just an illusion. Although the projection of each random walk's trajectory onto its first PCA component will be a cosine, the direction of that first PCA component will be completely random. As an analogy, if you do two random walks of 10^6 steps each, you can be quite sure that both will end up at a distance of about 1000 from the origin. But the two walks had nothing to do with each other, and the direction each walk happened to go will be different each time.
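
Here is a second short sketch (again my own construction) of that point: each walk's projection onto its own first component is close to a cosine, yet the two first-component directions themselves are essentially unrelated, as measured by their dot product (for independent random directions in $d$ dimensions this is of order $1/\sqrt{d}$).

```python
# Two independent random walks: both look like a cosine along their own first PC,
# but the two first-PC directions have (almost) nothing to do with each other.
import numpy as np

rng = np.random.default_rng(1)
T, d = 2000, 500
t = np.arange(T)
cosine = np.cos(np.pi * (t + 0.5) / T)        # expected shape of the PC1 projection

pc1_directions = []
for walk_id in (1, 2):
    walk = np.cumsum(rng.standard_normal((T, d)), axis=0)
    walk -= walk.mean(axis=0)
    _, _, Vt = np.linalg.svd(walk, full_matrices=False)
    projection = walk @ Vt[0]                 # projection onto this walk's first PC
    corr = abs(np.corrcoef(projection, cosine)[0, 1])
    print(f"walk {walk_id}: |correlation of PC1 projection with a cosine| = {corr:.3f}")
    pc1_directions.append(Vt[0])

# Independent walks, so their PC1 directions are close to orthogonal in high d.
print(f"dot product of the two PC1 directions: {np.dot(*pc1_directions):+.3f}")
```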

J. O'Brien Antognini
  • +1. Very nice! Congratulations on having a paper accepted to NIPS. Actually your paper is more interesting than all six papers I had to review for NIPS this year :) Regarding intuition, it's worth noting that Fourier modes appear every time one does PCA of some roughly translation-invariant smooth data. One well-known example is PCA of natural images (e.g. ImageNet). But it happens quite frequently and I recall quite some confusion about it in some fields; see e.g. https://www.nature.com/articles/ng.139 for discussion. – amoeba Sep 19 '18 at 20:57
  • BTW, this has come up previously, e.g. here https://stats.stackexchange.com/questions/90484. I see that there is my answer there from several years ago, and it is very inadequate. I should edit it. Meanwhile, perhaps you'll be interested in answering that question too. – amoeba Sep 19 '18 at 20:58
  • Another related thread is https://stats.stackexchange.com/questions/255548. – amoeba Sep 19 '18 at 21:03
  • @amoeba is the idea that in a RW of high dimension, most dimensions will be approximately zero, and only a few dimensions will be large – thus, a linear combination of these few dimensions captures most of the variance? – swedishfished Aug 31 '20 at 04:04
  • @amoeba In this sense, isn't it a bit misleading to say each RW is explained by only the first few PC values (suggesting that RWs are low dimensional), since one had no way of knowing in advance which of these few dimensions the RW would take? See e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5913756/ – swedishfished Aug 31 '20 at 04:07
  • @j-obrien-antognini Nice paper! Regarding the math in your paper, I'm not clear how you derive Eq. (20), esp. the first term ($2+2\gamma+\gamma^2$). Would it be possible to elaborate? Also, (i) how do you derive that $\mathbf{M}^{-1}$ is a matrix with $1$s along the main diagonal and $-\gamma$s on the subdiagonal and (ii) is there some reference/citation/explanation for Eqs. (17) and (18)? Thanks for your help in advance! – vbip Oct 29 '21 at 06:20
  • @j-obrien-antognini I'm asking because for Eq. (20), I am getting an additional $\gamma^2$ term ($2+2\gamma+2\gamma^2$) in some diagonal entries. – vbip Oct 29 '21 at 07:19
  • Hi @vbip! I believe you have caught a typo. I think you are correct that it should be $2(1 + \gamma + \gamma^2)$. You'll notice in the following equation (21) that the first term in the inverse of the eigenvalue is $2(1 + \gamma + \gamma^2)$, which only makes sense if the previous equation is missing that last factor of 2 on the $\gamma^2$. I went ahead and redid the math and got $2(1 + \gamma + \gamma^2)$ like you. – J. O'Brien Antognini Nov 06 '21 at 20:14
  • As for your other questions, I don't remember explicitly deriving $\textbf{M}^{-1}$, but if you try multiplying $\textbf{M}\textbf{M}^{-1}$ for a 6x6 matrix, I think you can convince yourself that it works. Because $\textbf{M}^{-1}$ is banded Toeplitz, you're only ever considering the effect of neighboring elements of $\textbf{M}$ on each other: you're basically always doing $\gamma^k + \gamma^{k-1} (-\gamma) = 0$, except along the diagonal (or above it, where everything is zero). (A small numerical version of this check is sketched below these comments.) – J. O'Brien Antognini Nov 06 '21 at 20:19
  • @J.O'BrienAntognini Thank you for the answer! I hope this doesn't change the rest of the proof, but it is good to know that I was not doing something stupid while trying to redo your derivations :) Also, I was wondering if you could provide some reference/citation/explanation for Eq. (17) and (18)? – vbip Nov 07 '21 at 16:08
  • No, I don't believe any of the rest of the paper would be affected. For Eqs. 17 & 18, you could check out p. 293 of Deep Learning by Goodfellow et al.: https://www.deeplearningbook.org/contents/optimization.html Alternatively, eqs. 1 & 2 of Sutskever et al.: http://www.cs.toronto.edu/~hinton/absps/momentum.pdf – J. O'Brien Antognini Nov 07 '21 at 17:20
  • @J.O'BrienAntognini What's not clear to me is whether the $\mathbf{v}_t$ in Eqs. (17) and (18) are the non-normalized eigenvectors defined on page 3? If they are, how are they related to the momentum update equations? Thank you again for the answers! – vbip Nov 08 '21 at 19:55
  • Sorry for the delay! The $\bf{v}_t$ in Eqs. 17 & 18 are unrelated to the eigenvectors defined previously. The $\bf{v}_t$ are simply a "velocity" term in the momentum update equation: $\bf{v}_t$ is initialized to 0 at time 0 and then its value at later times is governed by Eq. 17. – J. O'Brien Antognini Nov 28 '21 at 02:21
  • @J.O'BrienAntognini Thank you for the clarification! I was having a hard time relating the $\mathbf{v_t}$ 's to the eigenvectors and this was helpful. – vbip Nov 28 '21 at 03:10
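
For readers following the exchange about $\mathbf{M}^{-1}$ above, here is a small numerical version of the 6x6 check suggested in the comments. It assumes, based on the discussion rather than the paper itself, that $\mathbf{M}$ is lower-triangular Toeplitz with entries $\gamma^{i-j}$ for $i \ge j$; under that assumption, the bidiagonal matrix with $1$s on the diagonal and $-\gamma$ on the subdiagonal is indeed its inverse, because each off-diagonal product reduces to $\gamma^k + \gamma^{k-1}(-\gamma) = 0$.

```python
# Numerical check of the banded inverse described in the comments above.
# Assumption (taken from the discussion, not copied from the paper):
# M[i, j] = gamma^(i - j) for i >= j, and 0 above the diagonal.
import numpy as np

gamma, n = 0.9, 6
i, j = np.indices((n, n))
M = np.tril(gamma ** (i - j).astype(float))      # lower-triangular Toeplitz with powers of gamma
M_inv = np.eye(n) - gamma * np.eye(n, k=-1)      # 1s on the diagonal, -gamma on the subdiagonal

# Off the diagonal every entry cancels as gamma^k - gamma * gamma^(k-1) = 0.
print(np.allclose(M @ M_inv, np.eye(n)))         # True
print(np.allclose(M_inv @ M, np.eye(n)))         # True (the inverse works from both sides)
```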