In this post we’ll discuss the PageRank algorithm. The problem it tries to solve is ranking web pages based on their relevance. The general idea is that links to a page represent a good indicator to its relevance. It models the entire web as a giant directed graph, where links represent arcs, and then perform random walks following trough the links and assigns a score to each page based of the probability of landing at it.

This algorithm was developed by Sergey Brian and Larry Page during their PhD in Stanford. Besides the obvious reference to ranking of pages, PageRank also plays with the last name of Larry [1].

### Theory

The basic modeling of the web as a graph is simple. The set of vertices is composed by each web page (document), and an edge if the page corresponding to has a link the page corresponding to . Let be the number of nodes.

The rank of each node represents the probability of being at a node at a given time if we are to perform a random walk in the graph. We can start at any node with equal probability, so the rank of each node starts as .

At each node, we can change to any one the adjacent nodes with equal probability, so this graph is essentially a Markov chain. We can represent the rank of a given node at time t through the following equation:

(1)

Where is the out-degree of vertex .

One limitation of this simple approach is that nodes that do not have outbound links will “drain” all the probability, because once we arrive at such nodes, there’s no way out, so as time passes by, the probability of getting trapped in such nodes goes to 1.

A simple solution is to add a “teleport” probability, in which at each node we can teleport to any other node in the graph with a small probability. We control the chance of a teleport happening by the parameter . More formally, the page rank equation is now the following:

(2)

#### Matricial form

Let be the adjacency matrix corresponding to the set of arcs , that is, is a matrix, and if, and only if, . Let be normalized its values normalized by the sum of each row. That is,

We can then rewrite (1) as:

(3)

and (2) as:

(4)

where corresponds to a column vector with 1’s. Let’s simplify even further. Since for each iteration t, Pr is a probability distribution (that is, ), then we can rewrite (2) as:

(5)

We can now define

where is a matrix with all ones. With that in mind, we can re-write (5) simply as:

(6)

In the next section we’ll show has some interesting properties that guarantees the page rank converges to a final value. Using that information, we’ll be able to iterate until .

#### Properties of the transition matrix

**is stochastic**. A matrix is said (row or column) stochastic if the sum of each column or row is 1. For a given column of there are entries summing to 1 and which are multiplied by and also entries that sum to and multiplied by .

**is irreducible**. A matrix is reducible if it can be transformed into a upper-diagonal matrix by rows/columns permutation. In the context of Markov chains, the transition matrix is irreducible if there is a non-zero probability of transitioning (even if in more than one step) from any state to any other state [3].

Alternatively, we can show a matrix is not reducible by the following procedure [2]: Construct a directed graph where iff . Then, is irreducible iff is strongly connected. Because we’re adding the teleport links, we can show is a complete graph and clearly strongly connected.

**is aperiodic**. If a matrix is such that , we say it has period . A matrix with maximum period 1 is said aperiodic. We can compute the period of a matrix by constructing the same directed graph as above. If is irreducible, then the period is the greatest common divisor of the length of the closest path for a given node. Since by adding the teleport links we also add self-arcs (i.e. there’s a chance of staying in the same node), there’s always a path of length 1, so the GCD is always 1, and thus is the period.

**is positive recurrent**. Let be the probability a node will be first visited at time if we start from the state . Let be the expected value of for a given and .

We say a matrix is *recurrent* if for all , that is, we’re sure every node will be eventually visited no matter where we start. A matrix is *positive-recurrent* if , that is, the expected number of steps for a random walk to come back to the state if started is not infinite.

If a Markov chain has a finite set of states (i.e. the vertices of the graph) and its transition matrix is irreducible, then we can show this transition matrix is also positive recurrent [4].

With these properties, we're ready to throw in the **Fundamental theorem of Markov Chains** [5]:

For any irreducible, aperiodic, positive-recurrent Markov chain, there’s a unique stationary distribution such that

This also means the final rank is an eigenvector with a corresponding eigenvalue , since

A similar but more general result is Perron Frobenius theorem.

### Conclusion

I’ve been watching the Mining Massive Data Sets course in Coursera. It hasn’t finished yet, but I found the idea of the PageRank pretty interesting from the mathematical point of view. I didn’t do any actual coding here, but I’m planning to revisit this subject when I learn more about the Map reduce paradigm.

### References

[1] Wikipedia: Page Rank Theorem

[2] Wikipedia: Perron Frobenius Theorem

[3] Wikipedia: Irreducibility (mathematics)

[4] Mathematics Stack Exchange: Irreducible, finite Markov chains are positive recurrent

[5] The Fundamental Theorem of Markov Chains: Aaron Plavnick