Explanation on Google’s PageRank is Webpages as Eigenvectors

Question

Help understand what is the matrix A and the vector x discussed below.

Mathematics for Machine Learning Example 4.9

Google uses the eigenvector corresponding to the maximal eigenvalue of a matrix A to determine the rank of a page for search. The idea for the PageRank algorithm, developed at Stanford University by Larry Page and Sergey Brin in 1996, was that the importance of any web page can be ap- proximated by the importance of pages that link to it. For this, they write down all web sites as a huge directed graph that shows which page links to which. PageRank computes the weight (importance) x i > 0 of a web site a i by counting the number of pages pointing to a i . Moreover, PageR- ank takes into account the importance of the web sites that link to a i . The navigation behavior of a user is then modeled by a transition matrix A of this graph that tells us with what (click) probability somebody will end up on a different web site. The matrix A has the property that for any ini- tial rank/importance vector x of a web site the sequence $x, Ax, A^2x$, . . . converges to a vector $x*$ . This vector is called the PageRank and satisfies Ax∗ = x∗ , i.e., it is an eigenvector (with corresponding eigenvalue 1 ) of A . After normalizing $x*$ , such that ||$x*$|| = 1 , we can interpret the entries as probabilities. More details and different perspectives on PageRank can be found in the original technical report (Page et al., 1999).

I suppose A is a linking from a source web page FROM which has a link(s) to another page TO. Is x like a one-hot-encoding of a web page to identify a specific page, then $Ax$ will result in multiple pages? Then the next step $A(Ax)$ will be multiple concurrent steps?

A web page is an eigenvector means that the matrix A has as many eigenvectors as the web pages existing in the world?

It sounds like they're treating page navigation as a markov process. You go to one page and randomly click a link to the next. If you do this for a long time, you converge to a distribution of what page you're on. This distribution is represented by the eigenvector with largest eigenvalue. So this eigenvector has an entry for each page. So the entry with the largest value would be the highest ranked page because it's the one you're most likely to be visiting in the long run. Watch this https://www.youtube.com/watch?v=lGGDIGizcQ0&pbjreload=101 — MathFoliage, Feb 05 '21 at 06:51

MathFoliage · Answer 1 · 2021-02-10T02:54:48.913

If I'm interpreting the excerpt you included correctly, $A$ is a Markov matrix. The wiki article for this calls it a Stochastic matrix: https://en.wikipedia.org/wiki/Stochastic_matrix

Each entry $A_{ij}$ contains the probability of transitioning (clicking) from state (page) $i$ to state $j$.

Because you must transition to some state (this includes remaining in the same state), the values in each column of $A$ must sum to 1, since each column represents a probability distribution.

Yes, if $x$ is a one-hot encoding of a web page you start on, then $Ax$ will extract the corresponding column of $A$, and that will represent the probability distribution of where you'll be after randomly clicking once. Then $A(Ax)$ after randomly clicking again, and so on.

In most cases, $A^nx$ will converge to $x^*$, the eigenvector corresponding to the largest eigenvalue of $A$.

But this doesn't sound quite right

"After normalizing ∗ , such that ||∗|| = 1, we can interpret the entries as probabilities"

The vector $\left[\sqrt2/2, \sqrt2/2\right]^T$ has length 1 but does not represent a probability distribution. Maybe they meant that the vector should be normalized in the sense of dividing the entries by the total. But if $x$ has this property, then so will $Ax$, so there wouldn't be a need if you started out with a valid probability distribution.

"A web page is an eigenvector means that the matrix A has as many eigenvectors as the web pages existing in the world?"

If $A$ is diagonalizable, which AFAIK it usually will be, then there will be as many eigenvectors as web pages. However, I don't know of an interpretation associating to each web page a particular eigenvector one-to-one.

It's more like, there will be as many stationary distributions as there are web pages. But if the multiplicity of the top eigenvalue is 1, which should be typical, then only one unique stationary distribution will be stable.

So you take that dominant eigenvector that represents the dominant stationary distribution, and choose to interpret the probability of each page as a ranking of importance. If I randomly click between pages for a long time, then the page I visit most often in the long run (the page I'm most probably currently on at any one given time) is the highest ranked.

this Gilbert Strang lecture largely covers all this: https://www.youtube.com/watch?v=lGGDIGizcQ0&pbjreload=101

Explanation on Google’s PageRank is Webpages as Eigenvectors

1 Answers1