The PageRank algorithm

In this post we’ll discuss the PageRank algorithm. The problem it tries to solve is ranking web pages based on their relevance. The general idea is that links to a page represent a good indicator to its relevance. It models the entire web as a giant directed graph, where links represent arcs, and then perform random walks following trough the links and assigns a score to each page based of the probability of landing at it.

This algorithm was developed by Sergey Brian and Larry Page during their PhD in Stanford. Besides the obvious reference to ranking of pages, PageRank also plays with the last name of Larry [1].



The basic modeling of the web as a graph G = (V, A) is simple. The set of vertices V is composed by each web page (document), and an edge (u, v) \in A if the page corresponding to u has a link the page corresponding to v. Let N = |V| be the number of nodes.

The rank of each node represents the probability of being at a node at a given time if we are to perform a random walk in the graph. We can start at any node with equal probability, so the rank of each node starts as rank(i) = 1/N.

At each node, we can change to any one the adjacent nodes with equal probability, so this graph is essentially a Markov chain. We can represent the rank of a given node v at time t through the following equation:

(1) Pr(v,t) = \sum_{(u, v) \in V} \dfrac{Pr(u, t-1)}{d(u)}

Where d(u) is the out-degree of vertex u.

One limitation of this simple approach is that nodes that do not have outbound links will “drain” all the probability, because once we arrive at such nodes, there’s no way out, so as time passes by, the probability of getting trapped in such nodes goes to 1.

A simple solution is to add a “teleport” probability, in which at each node we can teleport to any other node in the graph with a small probability. We control the chance of a teleport happening by the parameter \beta. More formally, the page rank equation is now the following:

(2) Pr(v,t) = \beta \left(\sum_{(u, v) \in V} \dfrac{Pr(u, t-1)}{d(u)} \right) + \dfrac{(1 - \beta)}{N}

Matricial form

Let M be the adjacency matrix corresponding to the set of arcs A, that is, M is a N \times N matrix, and M_{(i, j)} = 1 if, and only if, (i, j) \in A. Let M' be M normalized its values normalized by the sum of each row. That is,

M'_{(i, j)} = \dfrac{M_{(i, j)}}{d(i)}

We can then rewrite (1) as:

(3) Pr(t) = {M'}^T Pr(t - 1)

and (2) as:

(4) Pr(t) = \beta {M'}^T Pr(t - 1) + \dfrac{(1 - \beta)}{N} \vec{1}

where \vec{1} corresponds to a column vector with 1’s. Let’s simplify even further. Since for each iteration t, Pr is a probability distribution (that is, \sum_{v \in V} Pr(v, t) = 1), then we can rewrite (2) as:

(5) Pr(v,t) = \beta \left(\sum_{(u, v) \in V} \dfrac{Pr(u, t-1)}{d(u)} \right) +
\dfrac{(1 - \beta)}{N} \left(\sum_{v \in V} Pr(v, t-1) \right)

We can now define \hat{M} = \beta {M'}^T + \dfrac{(1 - \beta) E}{N}

where E is a N \times N matrix with all ones. With that in mind, we can re-write (5) simply as:

(6) Pr(t) = \hat{M} Pr(t-1)

In the next section we’ll show \hat{M} has some interesting properties that guarantees the page rank converges to a final value. Using that information, we’ll be able to iterate until |Pr(t) - Pr(t - 1)| < \epsilon.

Properties of the transition matrix

\hat{M} is stochastic. A matrix is said (row or column) stochastic if the sum of each column or row is 1. For a given column of \hat{M} there are d(i) entries summing to 1 and which are multiplied by \beta and also N entries that sum to N and multiplied by \frac{(1 - \beta)}{N}.

\hat{M} is irreducible. A matrix is reducible if it can be transformed into a upper-diagonal matrix by rows/columns permutation. In the context of Markov chains, the transition matrix is irreducible if there is a non-zero probability of transitioning (even if in more than one step) from any state to any other state [3].

Alternatively, we can show a matrix M is not reducible by the following procedure [2]: Construct a directed graph G(M) where (i,j) \in A iff M_{(i,j)} > 0. Then, M is irreducible iff G(M) is strongly connected. Because we’re adding the teleport links, we can show G(\hat{M}) is a complete graph and clearly strongly connected.

\hat{M} is aperiodic. If a matrix is such that M^{p} = M, we say it has period p. A matrix with maximum period 1 is said aperiodic. We can compute the period of a matrix M by constructing the same directed graph as above. If M is irreducible, then the period is the greatest common divisor of the length of the closest path for a given node. Since by adding the teleport links we also add self-arcs (i.e. there’s a chance of staying in the same node), there’s always a path of length 1, so the GCD is always 1, and thus is the period.

\hat{M} is positive recurrent. Let f_{ij}^{(m)} be the probability a node j will be first visited at time m if we start from the state i. Let \mu_{ij} be the expected value of m for a given i and j.

We say a matrix is recurrent if \sum_{m=1}^{\infty} f_{ij}^{m} = 1 for all i, j, that is, we’re sure every node will be eventually visited no matter where we start. A matrix is positive-recurrent if \mu_{ii} < \infty, that is, the expected number of steps for a random walk to come back to the state if started is not infinite.

If a Markov chain has a finite set of states (i.e. the vertices of the graph) and its transition matrix is irreducible, then we can show this transition matrix is also positive recurrent [4].

With these properties, we're ready to throw in the Fundamental theorem of Markov Chains [5]:

For any irreducible, aperiodic, positive-recurrent Markov chain, there’s a unique stationary distribution \pi such that

\pi^{(t + 1)} = \pi^{(t)}

This also means the final rank is an eigenvector with a corresponding eigenvalue \lambda = 1, since

\lambda Pr =  \hat{M} Pr

A similar but more general result is Perron Frobenius theorem.


I’ve been watching the Mining Massive Data Sets course in Coursera. It hasn’t finished yet, but I found the idea of the PageRank pretty interesting from the mathematical point of view. I didn’t do any actual coding here, but I’m planning to revisit this subject when I learn more about the Map reduce paradigm.


[1] Wikipedia: Page Rank Theorem
[2] Wikipedia: Perron Frobenius Theorem
[3] Wikipedia: Irreducibility (mathematics)
[4] Mathematics Stack Exchange: Irreducible, finite Markov chains are positive recurrent
[5] The Fundamental Theorem of Markov Chains: Aaron Plavnick

Lawler and an Introduction to Matroids


Eugene Lawler was an American computer scientist, professor of UC Berkeley and was one of the founders of the field of Combinatorial Optimization.

Lawler made important contributions on branch and bound algorithms, dynamic programming, and was also the first one to observe that matroid intersection could be done in polynomial time.

He also proved that two of Karp’s 21 NP-Complete problems, The Directed Hamiltonian Cycle and 3-Dimensional Matching were NP-Complete.

This is the first post in our series of Matroids in Combinatorial Optimization context. They will be mainly based on the Matroids and the Greedy Algorithm chapter from Combinatorial Optimization – Networks and Matroids, by Eugene Lawler.


Hassler Whitney developed Matroids in 1935 in the context of algebraic theory and it was further applied by Jack Edmonds in the context of combinatorial optimization.

Matroids are a structure that allows us to solve problems by always taking local optimal steps and by doing that there’s the guarantee we’ll reach the global optimum. These types of algorithms are known as greedy algorithms.

First we’ll define Matroids and then give some examples of problems to modeled as matroids. Next, we’ll introduce weighted matroids and describe a generic algorithm to solve them and how such algorithm applied to matroids in graphs is actually the famous Kruskal algorithm.


Let E be a set of elements and \cal I a family of subsets of E (family here has the meaning of a set of elements that share some properties and it’s also clearer than using ‘set of subsets’). Let M = (E, \cal I) and consider the following properties:

1) \emptyset \in \cal I and if a subset I \in \cal I, then all subsets of I belong to \cal I as well.

2) If I_p \in {\cal I} with |I_p| = p and I_{p+1} \in {\cal I} with |I_{p+1}| = p+1, then there is an element e \in I_{p+1} \setminus I_{p} such that I + e \in {\cal I}. (Henceforth, by abuse of notation, when we say I + e we mean I \cup \{e\}).

If M satisfies both (1) and (2), we say that M is a matroid.

Terminology. An independent set is any set I belonging to the family \cal I. If there’s no other set containing I in \cal I we say it’s a maximal independent set. Note that by property (2), all maximal independent sets have the same size.

The rank of a set E, denoted by r(E) is the largest independent subset of E. A minimal dependent set is called circuit.

Special types of matroids

We can model some structures as matroids and take advantage of matroids properties to find solutions to problems involving these structures. In this section we’ll present a few examples of such modelings.

The modelling process consists in defining the set E, the family \cal I (usually by defining the property that the subsets of E must have to be in there) and then proving that \cal I satisfies (1) and (2).

Matric Matroid.

A matric matroid is a matroid in the context of matrices. Given a matrix A with the set of columns as E, if \cal I is the family of sets containing only linear independent columns, then M = (E, {\cal I}) is a matric matroid.

Proof. It’s easy to see that any subset of a linear independent (LI) set of columns is also LI, so (1) is straightforward. For (2), we need to prove that given subsets I_{p+1} and I_{p} of p+1 and p LI columns, respectively, then there must be a column c from I_{p+1} \setminus I_{p} such that I_{p} + c is still LI. Well, if it’s not the case, it means that each column in I_{p+1} can be written as linear combination of the p LI columns from I_{p}, which contradicts the fact that I_{p+1} is LI.

Graphic Matroid.

A graphic matroid is a matroid in the context of graphs. Given a graph G = (V, E), if \cal I is the family of arcs that do not form a cycle, then M = (E, {\cal I}) is a graphic matroid.

Proof. It’s possible to show that subsets of edges that are cycle free, correspond to LI columns in the incidence matrix of G, so in this case we can also view M as a matric matroid of the incidence matrix of G.

Matching Matroid.

A matching matroid is a matroid in the context of matchings in graphs.

Let G = (V, E) be an undirected graph and let \cal I be the family of subsets of nodes I \subseteq V such that there’s a matching in G that covers all nodes in I (we say that an node in I is covered if there’s an edge in the matching incident to it, even if the other end of the edge is not in I). Then M = (V, {\cal I}) is a matching matroid.

Sketch of the Proof. It’s easy to verify that \cal I satisfies (1). The proof of why it satisfies (2) consists in getting the symmetric difference between matchings covering p+1 and p vertices respectively and showing that in that difference there will be at least one alternating path such that if we change the alternation will increase the number of matched vertices.

Weighted Matroids

Let M = (E, {\cal I}) be a matroid and w a weight function for elements in E. The problem we want to solve is to find the independent set I \in {\cal I} that has maximum weight, where the weight of a set is the sum of the weight of its elements.

Let’s assume the elements on each independent set are listed in the non-increasing order of weight, so given sets I_1 and I_2, we list them as

I_1 = \{a_1, a_2, \cdots, a_m\} and I_2 = \{b_1, b_2, \cdots, b_n\}

such that w(a_1) \ge w(a_2) \ge \cdots w(a_m) and w(b_1) \ge w(b_2) \ge \cdots w(b_n)

We say I_1 is lexicographically greater than I_2 if for the first position their components differ, say at index k, a_k > b_k or in case I_2 listing is a prefix of I_1 listing (that is, the same way we sort strings).

A set that is not lexicographically less than any other set is said to be lexicographically maximum. We can show such set is also a maximum independent set, because otherwise, by property (2), we can always add more elements to it, making it lexicographically greater.

The following Theorem from Rado and Edmonds states an important property of weighted matroids:

Theorem 1. Let M = (E, {\cal I}) be a matroid. Then

3) For any negative weight function on E, a lexicographically maximum set in \cal I is also the set with the maximum weight.

Conversely, given E and \cal I, if (1) and (3) are satisfied, M = (E, {\cal I}) is a matroid.

We say that a set B \in {\cal I} is Gale optimal if all its components are not less than the corresponding components of any other set I \in {\cal I} when these components are listed in non-increasing order. More formally, there exists a one-to-one mapping h:I \rightarrow B such that w(e) \le w(h(e)) for all e \in I.

Note that a Gale optimal set is clearly a lexicographically maximum and thus has optimal weight.

Given that, Gale’s theorem provides another a stronger result than Theorem 1 regarding weighted matroids:

Theorem 2. Let M = (E, {\cal I}) be a matroid. Then

4) For any weight function of elements on E, there exists a Gale optimal set B \in {\cal I}.

Conversely, given E and \cal I, if (1) and (4) are satisfied, M = (E, {\cal I}) is a matroid.

Weighted Matroid Greedy Algorithm. Property (4) allows to use a simple greedy algorithm to find the set of the largest weight in \cal I.

In the first step, we look for the single-element set I = \{e_0\} in \cal I with the largest weight. By property (2) and (4), we can show that the Gale optimal set contains e_0. Next, we look for the largest element e_1 such that \{e_0, e_1\} \in {\cal I}. Again, we can show that such elements are contained in the Gale optimal set. We repeat this until we get a maximum independent set, which is also the Gale optimal set.

More generically, we have the following algorithm:

Let S be our current solution, which starts as the empty set. At each step, we look for the element e with maximum weight not in S such that S + e belongs to \cal I.

The Maximum Spanning Tree problem and the Kruskal Algorithm.

In the maximum spanning tree problem we are given a connected, undirected graph G = (V,E) with non-negative weights w on V and we want to find a spanning tree (subset of E that forms a tree) with the lowest cost.

Recall that a tree is a connected graph without cycles. The graphic matroid represents cycle-free subsets of arcs (or a forest) and thus the maximum independent set of a connected graph is a tree. If we assign non-negative weights to arcs, we can find the optimal Gale independent set which corresponds to the maximum spanning tree.

The Kruskal algorithm is then basically an application of the weighted matroid greedy algorithm for graphic matroids: It first sorts the edges by non-increasing order of weight and then adds an edge to the current solution if it doesn’t form a cycle.


In this post we learned the basics of matroids and weighted matroids. In future posts we’ll study Matroid Intersection, Matroid Partition and other types of matroids like Oriented Matroids.


[1] Wikipedia – Eugene Lawler
[2] Combinatorial Optimization – Networks and Matroids, Eugene Lawler – Chapter: Matroids and the Greedy Algorithm

Totally Unimodular Matrix Recognition

Paul Seymour is an english mathematician, graduated from Oxford. He is currently teaching at Princeton.

His research area concentrates on discrete mathematics, where he obtained important results including in Regular Matroids, Totally Unimodular Matrices and the Four Color Theorem, being awarded the Fulkerson Prize four times.

In this post we’ll present one of his results regarding decomposition of totally unimodular matrices, which are the key piece in deciding whether a given matrix is TU.

This is the third post about Totally Unimodular (TU) matrices. We introduced them here and described a special case called Network matrices here.

Although it’s not true that all TU matrices are network matrices, Seymour’s theorem basically says that all TU matrices are some kind of combination of network matrices and the following matrices:

(1) \left[ \begin{array}{rrrrr}   1 & -1 & 0 & 0 & -1\\  -1 & 1 & -1 & 0 & 0\\  0 & -1 & 1 & -1 & 0\\  0 & 0 & -1 & 1 & -1\\  -1 & 0 & 0 & -1 & 1\\  \end{array} \right]

(2) \left[ \begin{array}{rrrrr}   1 & 1 & 1 & 1 & 1\\  1 & 1 & 1 & 0 & 0\\  1 & 0 & 1 & 1 & 0\\  1 & 0 & 0 & 1 & 1\\  1 & 1 & 0 & 0 & 1\\  \end{array} \right]

It’s possible to show that TU matrices are closed under the following operations:

(i) permuting rows or columns
(ii) taking the transpose
(iii) multiplying a row or column by -1
(iv) pivoting, that is, transforming

\left[ \begin{array}{cc}   \xi & c\\  b & D\\  \end{array} \right]


\left[ \begin{array}{cc}   -\xi & -\xi c\\  \xi b & D - \xi b c \\  \end{array} \right]

where \xi is a scalar and c and b are a row and a column vector of the appropriate sizes, respecitvely.

(v) adding a row or column with at most one non-zero entry
(vi) repeating a row or a column

Also, given TU matrices A and B, the following operations preserve total unimodularity:

(vii) A \oplus_1 B := \left[ \begin{array}{rr}   A & 0\\  0 & B\\  \end{array} \right]

(viii) \left[ \begin{array}{rr}   A & a\\   \end{array} \right] \oplus_2   \left[ \begin{array}{r}   b\\  B\\   \end{array} \right] :=  \left[ \begin{array}{rr}   A & ab\\  0 & B\\  \end{array} \right]

(ix) \left[ \begin{array}{rrr}   A & a & a\\  c & 0 & 1\\   \end{array} \right]     \oplus_3    \left[ \begin{array}{rrr}   1 & 0 & b\\  d & d & B\\   \end{array} \right]    :=    \left[ \begin{array}{rr}   A & ab\\  dc & B\\  \end{array} \right]

We can now state Seymour’s theorem:

Theorem 1. (Seymour’s decomposition theorem for totally unimodular matrices). A matrix A is totally unimodular if and only if A arises from network matrices and the matrices (1) and (2) by applying the operations (i) to (ix). Here, the operations (vii) to (ix) are only applied if for A and B, the number of rows and columns added is at least 4.

Recognizing Total Unimodularity

The algorithm for recognizing whether a given matrix M is TU consists in finding whether it is a network matrix or one the matrix (1) or (2).

1. If any of the entries of M is not in \{-1, 0, 1\}, then M is not TU.

2. Remove all rows and columns with one or less non-zero entries (Property (v)).

3. Remove repeated rows and columns (Property (vi)).

4. Test if M or its transpose M^T is a network matrix or (1) or (2), possibly permuting and multiplying rows and columns by -1 (Properties (i), (ii) and (iii)). If yes, then the matrix is TU.

At this point of the algorithm, we still don’t know whether our matrix is TU. We’ll now check whether M can be decomposed as follows:

(3) M = \left[ \begin{array}{rr}   A & B\\  C & D\\   \end{array} \right]

such that \mbox{rank}(B) + \mbox{rank}(C) \le 2 and for both A and D, the number of rows plus the number of columns is greater or equal than 4.

Cunningham and Edmonds stated it’s possible to find such a decomposition for a matrix M or conclude no such decomposition exists in polynomial time using the matroid intersection algorithm.

Going back to the algorithm:

5. If M has cannot be decomposed as (3), then it’s not TU.

6. If it can, then we break into cases depending on the values of \mbox{rank}(B) and \mbox{rank}(C). Given that \mbox{rank}(B) + \mbox{rank}(C) \le 2, we have the 6 possible combinations:

\begin{array}{l|c|c}   \mbox{Case} & \mbox{rank}(B) & \mbox{rank}(C)\\  \hline  1 & 0 & 0 \\  2 & 1 & 0 \\  3 & 0 & 1 \\  4 & 1 & 1 \\  5 & 2 & 0 \\  6 & 0 & 2 \\  \end{array}

Case 1:

A matrix with rank zero is the zero matrix, so we have

M = \left[ \begin{array}{rr}   A & 0\\  0 & D\\   \end{array} \right]

By Property (vii), if A and D are TU, M is TU. Since A and D are submatrices of M, the converse holds, so it suffices to recursively verify that A and D are TU.

Case 2:

Since B has rank 1, it has all duplicate rows/columns or rows/columns with all zeroes. Thus, we can write it as B = fg, where f is a column vector and g is a row vector. In this form, we can write M as the result of the 3-sum operation:

\left[ \begin{array}{rr}   A & f\\   \end{array} \right] \oplus_2   \left[ \begin{array}{r}   g\\  D\\   \end{array} \right] :=  \left[ \begin{array}{rr}   A & fg\\  0 & D\\  \end{array} \right]

From Property (viii), if \left[ \begin{array}{rr}   A & f\\   \end{array} \right] and \left[ \begin{array}{r}   g\\  D\\   \end{array} \right] are TU, M is TU. Also, since both are submatrices of M, if one of them is not TU, then M can’t be TU either, so we can test \left[ \begin{array}{rr}   A & f\\   \end{array} \right] and \left[ \begin{array}{r}   g\\  D\\   \end{array} \right] recursively.

Case 3: Analogous to Case 2.

Case 4:

In this case, both B and C have all duplicate rows/columns or rows/columns with all zeroes. We can re-arrange the rows and columns in such a way that M looks like this:

\left[ \begin{array}{rrrr}   A_1 & A_2 & 0 & 0\\  A_3 & A_4 & 1 & 0\\  0 & 1 & D_1 & D_2\\  0 & 0 & D_3 & D_4\\  \end{array} \right]


A := \left[ \begin{array}{rr}   A_1 & A_2\\  A_3 & A_4\\  \end{array} \right]


D := \left[ \begin{array}{rr}   D_1 & D_2\\  D_3 & D_4\\  \end{array} \right]

We define a bipartite graph G_A with one partition being the set of rows of A and the other partition the set of columns of A. There’s an edge between two vertices r and c if the entry (r,c) is non-zero in A.

Let R be the set of vertices of the row partition that intercepts A_4 and K those of the column partition that intercepts A_4. If there’s no path between R and K, then we can reduce to Cases 2 and 3, so without loss of generality, consider the shortest path \prod_A between R and K.

Since R and K are in different partitions, \prod_A has odd length \delta. Let

\xi_A = \left\{ \begin{array}{rr}   1 & \mbox{if } \delta \equiv 1\, (\mbox{mod } 4)\\  -1 & \mbox{if } \delta \equiv 3\, (\mbox{mod } 4)\\  \end{array} \right.

and \xi_D defined analogously.

(4) \left[ \begin{array}{rrr}   A_1 & A_2 & 0\\  A_3 & A_4 & 0\\  0 & 1 & \xi_D  \end{array} \right]

(5) \left[ \begin{array}{rrr}   \xi_A & 1 & 0\\  1 & D_1 & D_2\\  0 & D_3 & D_4  \end{array} \right]

The algorithm says that M is TU if and only if (4) and (5) are TU.

Case 5: Can be reduced to Case 4 by pivoting (Property iv).

Case 6: Analogous to Case 5.

Complexity and Implementation

In [2], Truemper presents a O(n + m)^3 algorithm for the total unimodularity testing and in [3] he Walter and Truemper provides an implementation in C++ of a simplified O(n + m)^5 version.


The matroid intersection algorithm is used in one of the steps of the algorithm. I’ve heard about this algorithm before, but never studied it, so this will the next subject I’m researching about in the series of posts on Combinatorial Optimization.


[1] Theory of Linear and Integer Programming – A. Schrijver (Chapter 20)
[2] A decomposition theory for matroids. V. Testing of matrix total unimodularity – K. Truemper
[3] Implementation of a Unimodularity Test – M. Walter and K. Truemper

Totally Unimodular Matrices

An unimodular matrix is a square matrix with integer entries such that its determinant is either -1, 0 or 1. A matrix is said totally unimodular (TU for short) if all its square submatrices are unimodular.

Sometime ago, we said that problems such as the minimum path, maximum flow and minimum cost max flow can be modeled using linear programming with the interesting property that the optimal solutions are always integer.

In that post, we also said that it was because the coefficient matrix of the constraints of the formulations are totally unimodular. More formally, we have the following theorem:

Theorem 1. Let A be a totally unimodular matrix and b be an integer vector. Then, the polyhedra P = \{x | Ax \le b\} has integer vertices.

In this post we’ll present some properties of TU matrices and discuss about two simple examples.


Let A be a totally unimodular matrix. Then we have the following properties:

  1. Its transpose, A^{T}, is TU.
  2. The matrix obtained by appending the identity matrix to A, that is [A, I], is TU
  3. Any submatrix of A is TU
  4. The matrix obtained by multiplying any row of A by -1 is TU
  5. The matrix obtained by duplicating any row of A is TU

Using this properties we can get some Corollaries from Theorem 1.

Since [A, -I] is TU, we have the following

Corollary 1. The polytope P = \{x | Ax \le b; x \ge 0 \} has integer vertices.

Also, since [A^T, -A^T, I, -I] is TU,

Corollary 2. The dual of P = \{c^Tx | Ax \le b; x \ge 0 \}, namely Q = \{b^Ty | A^Ty \ge c; y \ge 0\} has also integer vertices.


1. Bipartite Graphs

Let G = (V, E) be an undirected graph and M the incidence matrix of G. That is, a binary matrix where each line corresponds to a vertex v and each column to an edge e. We have M_{v,e} = 1 if v is an endpoint of e or M_{v,e} = 0 otherwise. Then, we have the following result:

Theorem 2. The incidence matrix of a graph G is totally unimodular if and only if, G is bipartite.

This result can be used to derive the König-Egerváry theorem, stating that the maximum cardinality matching and the minimum vertex cover have the same value bipartite graphs.

The maximum cardinality can be modeled as integer linear programming:

\max \sum_{e \in E} y_e

\begin{array}{llclr}    & \sum_{e = (u, v)} y_e & \le & 1 & \forall v \in V\\   & y_e & \in & \{0, 1\} & \forall e \in E  \end{array}

And its dual is the minimum vertex cover:

\min \sum_{v \in V} x_v

\begin{array}{llclr}    & x_u + x_v  & \ge & 1 & \forall (u,v) \in E\\   & x_v  & \le & \{0, 1\} & \forall v \in V  \end{array}

It’s not hard to see that if M is the incidence matrix of the graph, then the problems can be stated as

(1) \max \{1y | My \le 1; y \mbox{ binary} \} and

(2) \min \{x1 | xM \ge 1; x \mbox{ binary} \}

If the graph is bipartite, we can use Theorem 2 and the strong duality for linear programs to conclude that (1) = (2).

2. Directed Graphs

Let D = (V, A) a directed graph, and M be the incidence matrix of D. That is, a matrix where each line corresponds to a vertex and each column to an arc. For each arc e = (u, v), we have M_{u, e} = -1 and M_{v, e} = 1 and 0 otherwise. For directed graphs, we have a even stronger result:

Theorem 3. The incidence matrix of a directed graph D is totally modular.

Consider a network represented by D and with capacities represented by c : A \rightarrow \mathbb{R}_{+}. For each directed edge (ij) \in A, let x_{ij} be the flow in this edge.

If M is the incidence matrix of D, then Mx = 0 corresponds to

\begin{array}{llclr}   (3) & \sum_{(iv) \in A} x_{iv} & = & \sum_{(vi) \in A} x_{vi} & \forall v \in V\\  \end{array}

which are the flow conservation constraints. Since M is totally unimodular, then if c is integer, it’s possible to show that the polytope \{x| 0 \le x \le c; Mx = 0\} has integral vertices. This polytope also represents the constraints of the max circulation problem.

Now, we can use these observations to prove that the following LP formulation for the max-flow problem has an optimal integer solution:

\max \sum_{(si) \in A} x_{si}

Subject to:

\begin{array}{llclr}   & \sum_{(iv) \in A} x_{iv} & = & \sum_{(vi) \in A} x_{vi} & \forall v \in V \setminus \{s, t\}\\  0 \le & x_{ij} & \le & c_{ij} & \forall (ij) \in A\\  \end{array}

We can see that the constraints matrix of the above formulation is a submatrix of the max circulation problem and by Property 3, it’s also TU, which in turn means the corresponding polytope has integral vertices.


In this post, we introduced the concept of total unimodular matrices and presented two simple examples: the incidence matrix of a bipartite graph and the incidence matrix of a directed graph.

Here’s a cool chart of common polynomial time solvable problems organized by their generality [2].

In future posts, we’ll keep exploring this subject by studying other examples and properties of TU matrices.


[1] Theory of Linear and Integer Programming – A. Schrijver
[2] Marco Chiarandini – Scheduling, Timetabling and Routing DM204 (Lecture Notes)
[3] Ted Ralphs – Integer Programming IE418 (Lecture Notes)