Protein Design

We previously learned about the problem of predicting the folding of a protein, that is, given a chain of amino acids, find its final 3D structure. This time we’re interested in the reverse problem, that is, given a 3D structure, find some chain of amino-acid that would lead to that structure once fully folded. This problem is called Protein Design.

In this post we’ll focus on mathematical models for this problem, studying its computational complexity and discuss possible solutions.

Mathematical Modeling

The 3D structure of a protein can be divided into two parts: the backbone and the side chains. In the model proposed by Piece and Winfree [3], we assume the backbone is rigid and that we’ll try to find the amino-acids for the side chains such that it minimizes some energy function.

This means we’re not really trying to predict the whole chain of amino acids, but a subset of those amino that will end up on the side chains.

The amino acids on the side-chain can have a specific orientation [2, 3], known as rotamer, which in turn can be represented by a single value, its dihedral angle. We can define some arbitrary order for these amino acids and label them with an index, which we call position.

At each position there are multiple rotamers possible and we need to select them to minimize the overall energy function. More formally, for each position i, let R_i be the set of rotamers available and r_i \in R_i the chosen rotamer.

The model assumes the cost function of the structure is pairwise decomposable, meaning that we can account for the interaction of each pair independently when calculating the total energy:

Screenshot from 2019-09-20 15-46-22

Where E(r_i, r_j) is the energy cost of the interaction between positions i and j, assuming rotamers r_i and r_j respectivelly. The definition of E can be based on molecular dynamics such as AMBER.

In [3], the authors call this optimization problem PRODES (PROtein DESign).


Pierce and Winfree [3] prove that PRODES is NP-Hard. We’ll provide an informal idea of the proof here.

First we need to prove that the decision version of PRODES is NP-Hard. The decision version of PRODES is, given a value K, determine whether there is a set of rotamers such that the energy cost is less or equal K. We’ll call it PRODESd. We can then prove that PRODESd is NP-complete by showing that it belongs to the NP complexity class and by reducing, in polynomial time, some known NP-complete problem to it.

We claim that this problem is in NP because given an instance for the problem we can verify in polynomial time whether it is a solution (i.e. returns true), since we just need to evaluate the cost function and verify whether it’s less than K.

We can reduce the 3-SAT problem, known to be NP-complete, to PRODESd. The idea is that we can map every instance of the 3-SAT to an instance of PRODESd and show that the result is “true” for that instance of 3-SAT if and only if the result is “true” for the mapped instance of PRODESd.

Let’s start by formalizing PRODESd as a graph problem, with an example shown in the picture below:

Screenshot from 2019-09-29 16-19-31

a) has 3 positions with their sets of rotamers. b) each rotamer becomes a vertex grouped into sets. We pick exactly one vertex per set and try to minimize the total cost of the edges associated with the selected vertices. Image source: [3]

Now, given a 3-SAT instance, we create a vertex set for each clause C_i (containing a vertex for each literal), and a vertex set for each variable x_i (containing vertices T and F). For each literal x_i we add an edge of weight 1 to the vertex F of the set corresponding to variable x_i. Conversely, for each negated literal, \bar x_i, we add an edge of weight 1 to the vertex T. All other edges have weight 0.

For example, the instance (x_1 + \bar x_2 + x_3) \cdot ( \bar x_1 + x_2 + x_3) \cdot (\bar x_3) yields the following graph where only edges of non-zero weight are shown:

Screenshot from 2019-09-29 16-41-00

Source: [3]

We claim that this 3-SAT instance is satisfiable if and only if the PRODESd is true for K=0. The idea is the following: for each vertex set corresponding to a variable we pick either T or F, which corresponds to assigning true or false to the variable. For each vertex set corresponding to a clause we pick a literal that will evaluate to true (and hence make the clause true). It’s possible to show that if 3-SAT is satisfiable, there’s a solution for PRODESd that avoids any edge with non-zero weight.

Integer Linear Programming

Now that we know that PRODES is unlikely to have an efficient algorithm to solve it, we can attempt to obtain exact solutions using integer linear programming model. Let’s start with some definitions:

Screenshot from 2019-09-29 18-43-51

We can define our variables as:

Screenshot from 2019-09-29 18-44-27.png

The object function of our model becomes:

Screenshot from 2019-09-29 18-45-05

Finally, the constraints are:

Screenshot from 2019-09-29 18-39-34

Equation (1) says we should pick exactly one rotamer for each position. Constraints (2) and (3) enforce that x_{i, j} = 1 if and only if r_i = r_j = 1.

Note: the LaTeX used to generate the images above are available here.


The study of protein prediction led me to protein design, which is a much simpler problem, even though from the computational complexity perspective it’s still an intractable problem.

The model we studied is very simple and makes a lot of assumptions, but it’s interesting as a theoretical computer science problem. Still I’m not sure how useful it is in practice.


[1] Wikipedia – Protein design
[2] Wikipedia – Rotamer
[3] Protein Design is NP-hard – Niles A. Pierce, Erik Winfree

US as an hexagonal map

In this post we’ll study a way to visualize maps in a hexagonal grid, in which each entity have uniform area. We’ll then model that as a mathematical problem.

One challenge in displaying data in maps is that larger area countries or states tend to get more attention than smaller ones, even when economically or population-wise, the smaller state is more relevant (e.g. New Jersey vs. Alaska). One idea is to normalize the areas of all the states by using symbols such as squares. Recently I ran into a NPR map that used hexagons and it looked very neat, so I decided to try building it in D3 and perform some analysis.

Below is the result of plotting the state populations (log scale):

US Hexmap: Population (log scale)

US Hexmap: Population (log scale)

One important property of visualizing data in maps is familiarity of the location (you can easily find specific states because you remember where they are) and also adjacency patterns can provide insights. For example, if we plot a measure as a choropleth map and see that the West coast is colored differently from the Midwest, then we gain an insight we wouldn’t have by looking at a column chart for example.

Because of this, ideally the homogeneous area maps should preserve adjacencies as much as possible. With that in mind, we can come up with a similarity score. Let X be the set of pairs of states that share a border in the actual US map. Now, let Y be the set of pairs of states that share a border in the hexagonal map (that is, two hexagons sharing a side). The similarity score is the size of their symmetric difference and we can normalize by the size of the original:

(|X - Y| + |Y - X|) / |X|

The lower the score the better. In an ideal case, the borders sets would match perfectly for a score of 0.

The size of the symmetric difference between the two sets seems like a good measure for similarity, but I’m not sure about the normalization factor. I initially picked the size of the union of X and Y, but this wouldn’t let us model this problem as a linear programming model as we’ll see next. The problem with using the size of X is that the score could theoretically be larger than 1, but it’s trivial to place the hexagons in the grid in such a way that none of them are touching and thus Y is empty, so we can assume the score is between 0 and 1.

Hexgrid coordinates convention

Hexgrid coordinates convention

The score from the NPR maps is 0.67.

An Optimization Problem

Let’s generalize and formalize the problem above as follows: given a graph G = (V,E), and another graph H = (V_H, E_H) representing our grid, find the induced subgraph of H, I = (V_I, E_I), such that there’s bijection f: V \rightarrow V_I and the size of the symmetric difference of f(E) and E_I is minimized (f(E) is an abuse of notation, but it means applying the bijection to each vertex in the set of edges E).

To make it clearer, let’s apply the definition above to the original problem. G represents the adjacency of states in the original map. V is the set of states and E is the set of pairs of states that share a border. H is the hexagonal grid. V_H is the set of all hexagons and E_H is the set of pairs of hexagons that are adjacent. We want to find a subset of the hexagons where we can place each of the states (hence the bijection from states to hexagons) and if two hexagons are in the grid, and we place two states there, we consider the states to be adjacent, hence the need for an induced graph, so the adjacency in the grid is preserved.

Is this general case an NP-hard problem? We can reduce the Graph Isomorphism problem to this one. It consists in deciding whether two graphs A and B are isomorphic. If we set G = A and H = B, then A and B are isomorphic if and only if I = H and the symmetric difference of f(E) and E_I is 0. The problem is that it’s not known whether Graph Isomorphism belongs to NP-Complete.

What if G is planar (which is the case for maps)? I haven’t given much thought about it, but I decided to come up with an integer programming model nevertheless.

An Integer Linear Programming Model

Note: the model uses the original grid analogy instead of the more general problem so that the constraints are easier to understand.

Boolean algebra as linear constraints

Before we start, we need to recall how to model logical constraints (AND, OR and EQUAL) using linear constraints. Let a and b be binary variables. We want x to be the result of logical operators applied to a and b.

For AND, we can do (x = 1 if and only if a = 1 and b = 1)

x \le a
x \le b
x \ge a + b - 1

For OR, we can do (x = 0 if and only if a = 0 and b = 0)

x \ge a
x \ge b
x \le a + b

For EQUAL, we can do (x = 1 if and only if a = b)

x \le 1 - (a - b)
x \le 1 - (b - a)
x \ge a + b - 1
x \ge -(a + b - 1)

We can introduce a notation and assume these constraints can be generated by a function. For example, if we say
x = \mbox{EQ}(a, b), we’re talking about the four constraints we defined above for modeling EQUAL. This is discussed in [2].


Let G be the set of pairs (x,y) representing the grid positions. Let P be the set of pieces p that have to be placed in the grid. Let N(x,y) be the set of pairs (x',y') that are adjacent to (x, y) in the grid.

Let A_{v1, v2} represent whether v1 and v2 are adjacent to each other in the dataset.

“Physical” constraints

Let b_{x,y,s} be a binary variable, and equals 1 if and only if state s is placed position (x, y).

1) A piece has to be placed in exactly one spot in the grid:

\sum_{(x,y) \in G} b_{x,y,p} = 1 for all p \in P

2) A spot can only be occupied by at most one state:

\sum_s b_{x,y,s} \le 1 for all (x,y) \in G

Adjacency constraints

Let a_{p1, p2, x, y} be a binary variable and equals 1 if and only if piece p1 is placed in (x, y) and adjacent to p2 in the grid.

3) a_{p1, p2, x, y} has to be 0 if p1 is not in (x,y) or p2 is not adjacent to any of (x,y) neighbors:

a_{p1, p2, x, y} = \mbox{AND}(\sum_{(x', y') \in N(x, y)} b_{x', y', p2}, b_{x,y,p})

We have that a_{p1, p2, x, y} is 1 if and only if p1 is in (x,y) and p2 is adjacent to any of (x,y) neighbors.

Finally, we can model the adjacency between two pieces p1 and p2. Let a_{p1, p2} be a binary variable and equals 1 if and only if p1 and p2 are adjacent in the grid:

a_{p1, p2} = \sum_{(x,y) in G} a_{p1, p2, x, y}

Symmetric difference constraints

Let y_{p1, p2} be a binary variable and equals to 1 if and only if a_{p1, p2} \ne A_{p1, p2}.

4) y_{p1, p2} \ge a_{p1, p2} - A_{p1, p2}
5) y_{p1, p2} \ge A_{p1, p2} - a_{p1, p2}

Objective function

The sum of all y‘s is the size of the symmetric difference:

\min \sum_{p1, p2 \in P} y_{p1, p2}.

Practical concerns

This model can be quite big. For our US map example, we have |P| = 50 and we need to estimate the size of the grid. A 50×50 grid is enough for any type of arrangement. The problem is that the number of variables a_{p1, p2, x, y} is |P|^2|G| = 50^4 which is not practical.

We can also solve the problem for individual connected components in the original graph and it’s trivial construct the optimal solution from each optimal sub-solution. This doesn’t help much in our example, since only Hawaii and Alaska are disconnected, so we have |P| = 48. The grid could also be reduced. It’s very unlikely that an optimal solution would be a straight line. In the NPR map, the grid is 8×12. Sounds like doubling these dimensions would give the optimal solution enough room, so |G| = 8*12*4 = 384.

We can also assume states are orderer and we only have variables a_{p1, p2, x, y} for p1 < p2, so the number of a_{p1, p2, x, y} is about 450K. Still too large, unfortunately.

Another important optimization we can do in our case because we're working with a grid is to define the adjacency for x and y independently and combine them afterwards.

Refined adjacency constraints

Instead of working with b_{x,y,s} we use X_{x, s}, and equals 1 if and only if state s is placed position (x, y) for any y and Y_{y, s}, which equals 1 iff state s is placed position (x, y) for any x. The physical constraints are analogous to the previous model:

6) A piece has to be placed in exactly one spot in the grid:

\sum_{x \in G} X_{x,p} = 1 for all p \in P
\sum_{y \in G} Y_{y,p} = 1 for all p \in P

7) A spot can only be occupied by at most one state:

\sum_s X_{xs} \le 1 for all x \in G
\sum_s Y_{y,s} \le 1 for all y \in G

In a hexagonal grid, if we have the piece p1 in position (x,y), it will be adjacent to another piece p2 if and only if p2 is in one of these six positions: 1: (x-1, y), 2: (x+1, y), 3: (x-1, y-1), 4: (x, y-1), 5: (x-1, y+1) or 6: (x, y+1). We can define two adjacency categories: Type I, which happens when p1.y - p2.y = 0 and |p1.x - p2.x| = 1 (cases 1 and 2); and Type II, which is when |p1.y - p2.y| = 1 and p1.x - p2.x \le 0 (cases 3, 4, 5 and 6).

Let’s define Y_{d=0, p1, p2, y} = 1 iff p1.y - p2.y = 0 for a given y. Similarly we define X_{|d|=1, p1, p2, x} = 1 iff |p1.x - p2.x| = 1, Y_{|d|=1, p1, p2, y} = 1 iff |p1.y - p2.y| = 1 and finally X_{d \ge 0, p1, p2, x} = 1 iff p1.x - p2.x \ge 0.

8) We can have the following constraints do model the variables we just defined:

Y_{d=0, p1, p2, y} = \mbox{EQ}(Y_{y, p_1}, Y_{y, p2})
X_{|d|=1, p1, p2, x} = \mbox{EQ}(X_{x, p1}, X_{x-1, p2} + X_{x+1, p2})
Y_{|d|=1, p1, p2, y} = \mbox{EQ}(Y_{y, p1}, Y_{y-1, p2} + Y_{y+1, p2})
X_{d \ge 0, p1, p2, x} = \mbox{EQ}(X_{x, p1}, X_{x, p2} + X_{x+1, p2})

9) Let Y_{d=0, p1, p2} = 1 iff p1.x - p2.y = 0 for any y. We can define analogous variables for the other cases:

Y_{d=0, p1, p2} = \sum_{y} Y_{d=0, p1, p2, y}
X_{|d|=1, p1, p2} = \sum_{x} X_{d=0, p1, p2, x}
Y_{|d|=1, p1, p2} = \sum_{y} Y_{d=0, p1, p2, y}
X_{d \ge 0, p1, p2} = \sum_{x} Y_{d \ge 0, p1, p2, x}

10) Let T'_{p1, p2} = 1 iff p1 and p2 have the Type I adjacency and T''_{p1, p2} = 1 iff p1 and p2 have Type II adjacency:

T'_{p1, p2} = \mbox{AND}(Y_{d=0, p1, p2}, X_{|d|=1, p1, p2})
T''_{p1, p2} = \mbox{AND}(Y_{|d|=1, p1, p2}, X_{d \ge 0, p1, p2})

11) Finally, we say that p1 and p2 are adjacency iff either Type I or Type II occurs:

a_{p1, p2} = \mbox{OR}(T'_{p1, p2}, T''_{p1, p2})

The model for adjacency became much more complicated but we were able to reduce the number of adjacency variables are now roughly O(|P|^2 \sqrt{|G|}). The number of non-zero entries in the right hand size of (which represents the size of the sparse matrix) is roughly 11M, dominated by the type (8) constraints. I’m still not confident this model will run, so I’ll punt on implementing it for now.


In this post we explored a different way to visualize the US states map. I was mainly exploring the idea of how good of an approximation this layout is and a natural question was how to model this as an optimization problem. Turns out that if we model it using graphs, the problem definition is pretty simple and happens to be a more general version of the Graph Isomorphism problem.

I struggled with coming up with an integer programming model and couldn’t find one with a manageable size, but it was a fun exercise!

World Map?

One cannot help wondering if we can display the countries in a hexagonal map. I’m planning to explore this idea in a future post. The main challenge is that the US states areas are more uniform than the countries. For example, the largest state (Alaska) is 430 times larger than the smallest (Rhode Island). While the largest country (Russia) is almost 40,000,000 bigger than the smallest (Vatican City).

Also, the layout of the US map was devised by someone from NPR and they did a manual process. I’m wondering if we can come up with a simple heuristic to place the hexagons and then perform manual adjustments.


[1] NPR Visuals Team – Let’s Tesselate: Hexagons For Tile Grid Maps
[2] Computer Science: Express boolean logic operations in zero-one integer linear programming (ILP)
[3] SOM – Creating hexagonal heatmaps with D3.js
[4] Github – d3/d3-hexbin

Data sources

[5] US State Borders
[6] Wikipedia – Population of US states and territories
[7] Tool to download Wikipedia tables as CSV
[8] List of US states and territories by area

Lawler and an Introduction to Matroids


Eugene Lawler was an American computer scientist, professor of UC Berkeley and was one of the founders of the field of Combinatorial Optimization.

Lawler made important contributions on branch and bound algorithms, dynamic programming, and was also the first one to observe that matroid intersection could be done in polynomial time.

He also proved that two of Karp’s 21 NP-Complete problems, The Directed Hamiltonian Cycle and 3-Dimensional Matching were NP-Complete.

This is the first post in our series of Matroids in Combinatorial Optimization context. They will be mainly based on the Matroids and the Greedy Algorithm chapter from Combinatorial Optimization – Networks and Matroids, by Eugene Lawler.


Hassler Whitney developed Matroids in 1935 in the context of algebraic theory and it was further applied by Jack Edmonds in the context of combinatorial optimization.

Matroids are a structure that allows us to solve problems by always taking local optimal steps and by doing that there’s the guarantee we’ll reach the global optimum. These types of algorithms are known as greedy algorithms.

First we’ll define Matroids and then give some examples of problems to modeled as matroids. Next, we’ll introduce weighted matroids and describe a generic algorithm to solve them and how such algorithm applied to matroids in graphs is actually the famous Kruskal algorithm.


Let E be a set of elements and \cal I a family of subsets of E (family here has the meaning of a set of elements that share some properties and it’s also clearer than using ‘set of subsets’). Let M = (E, \cal I) and consider the following properties:

1) \emptyset \in \cal I and if a subset I \in \cal I, then all subsets of I belong to \cal I as well.

2) If I_p \in {\cal I} with |I_p| = p and I_{p+1} \in {\cal I} with |I_{p+1}| = p+1, then there is an element e \in I_{p+1} \setminus I_{p} such that I + e \in {\cal I}. (Henceforth, by abuse of notation, when we say I + e we mean I \cup \{e\}).

If M satisfies both (1) and (2), we say that M is a matroid.

Terminology. An independent set is any set I belonging to the family \cal I. If there’s no other set containing I in \cal I we say it’s a maximal independent set. Note that by property (2), all maximal independent sets have the same size.

The rank of a set E, denoted by r(E) is the largest independent subset of E. A minimal dependent set is called circuit.

Special types of matroids

We can model some structures as matroids and take advantage of matroids properties to find solutions to problems involving these structures. In this section we’ll present a few examples of such modelings.

The modelling process consists in defining the set E, the family \cal I (usually by defining the property that the subsets of E must have to be in there) and then proving that \cal I satisfies (1) and (2).

Matric Matroid.

A matric matroid is a matroid in the context of matrices. Given a matrix A with the set of columns as E, if \cal I is the family of sets containing only linear independent columns, then M = (E, {\cal I}) is a matric matroid.

Proof. It’s easy to see that any subset of a linear independent (LI) set of columns is also LI, so (1) is straightforward. For (2), we need to prove that given subsets I_{p+1} and I_{p} of p+1 and p LI columns, respectively, then there must be a column c from I_{p+1} \setminus I_{p} such that I_{p} + c is still LI. Well, if it’s not the case, it means that each column in I_{p+1} can be written as linear combination of the p LI columns from I_{p}, which contradicts the fact that I_{p+1} is LI.

Graphic Matroid.

A graphic matroid is a matroid in the context of graphs. Given a graph G = (V, E), if \cal I is the family of arcs that do not form a cycle, then M = (E, {\cal I}) is a graphic matroid.

Proof. It’s possible to show that subsets of edges that are cycle free, correspond to LI columns in the incidence matrix of G, so in this case we can also view M as a matric matroid of the incidence matrix of G.

Matching Matroid.

A matching matroid is a matroid in the context of matchings in graphs.

Let G = (V, E) be an undirected graph and let \cal I be the family of subsets of nodes I \subseteq V such that there’s a matching in G that covers all nodes in I (we say that an node in I is covered if there’s an edge in the matching incident to it, even if the other end of the edge is not in I). Then M = (V, {\cal I}) is a matching matroid.

Sketch of the Proof. It’s easy to verify that \cal I satisfies (1). The proof of why it satisfies (2) consists in getting the symmetric difference between matchings covering p+1 and p vertices respectively and showing that in that difference there will be at least one alternating path such that if we change the alternation will increase the number of matched vertices.

Weighted Matroids

Let M = (E, {\cal I}) be a matroid and w a weight function for elements in E. The problem we want to solve is to find the independent set I \in {\cal I} that has maximum weight, where the weight of a set is the sum of the weight of its elements.

Let’s assume the elements on each independent set are listed in the non-increasing order of weight, so given sets I_1 and I_2, we list them as

I_1 = \{a_1, a_2, \cdots, a_m\} and I_2 = \{b_1, b_2, \cdots, b_n\}

such that w(a_1) \ge w(a_2) \ge \cdots w(a_m) and w(b_1) \ge w(b_2) \ge \cdots w(b_n)

We say I_1 is lexicographically greater than I_2 if for the first position their components differ, say at index k, a_k > b_k or in case I_2 listing is a prefix of I_1 listing (that is, the same way we sort strings).

A set that is not lexicographically less than any other set is said to be lexicographically maximum. We can show such set is also a maximum independent set, because otherwise, by property (2), we can always add more elements to it, making it lexicographically greater.

The following Theorem from Rado and Edmonds states an important property of weighted matroids:

Theorem 1. Let M = (E, {\cal I}) be a matroid. Then

3) For any negative weight function on E, a lexicographically maximum set in \cal I is also the set with the maximum weight.

Conversely, given E and \cal I, if (1) and (3) are satisfied, M = (E, {\cal I}) is a matroid.

We say that a set B \in {\cal I} is Gale optimal if all its components are not less than the corresponding components of any other set I \in {\cal I} when these components are listed in non-increasing order. More formally, there exists a one-to-one mapping h:I \rightarrow B such that w(e) \le w(h(e)) for all e \in I.

Note that a Gale optimal set is clearly a lexicographically maximum and thus has optimal weight.

Given that, Gale’s theorem provides another a stronger result than Theorem 1 regarding weighted matroids:

Theorem 2. Let M = (E, {\cal I}) be a matroid. Then

4) For any weight function of elements on E, there exists a Gale optimal set B \in {\cal I}.

Conversely, given E and \cal I, if (1) and (4) are satisfied, M = (E, {\cal I}) is a matroid.

Weighted Matroid Greedy Algorithm. Property (4) allows to use a simple greedy algorithm to find the set of the largest weight in \cal I.

In the first step, we look for the single-element set I = \{e_0\} in \cal I with the largest weight. By property (2) and (4), we can show that the Gale optimal set contains e_0. Next, we look for the largest element e_1 such that \{e_0, e_1\} \in {\cal I}. Again, we can show that such elements are contained in the Gale optimal set. We repeat this until we get a maximum independent set, which is also the Gale optimal set.

More generically, we have the following algorithm:

Let S be our current solution, which starts as the empty set. At each step, we look for the element e with maximum weight not in S such that S + e belongs to \cal I.

The Maximum Spanning Tree problem and the Kruskal Algorithm.

In the maximum spanning tree problem we are given a connected, undirected graph G = (V,E) with non-negative weights w on V and we want to find a spanning tree (subset of E that forms a tree) with the lowest cost.

Recall that a tree is a connected graph without cycles. The graphic matroid represents cycle-free subsets of arcs (or a forest) and thus the maximum independent set of a connected graph is a tree. If we assign non-negative weights to arcs, we can find the optimal Gale independent set which corresponds to the maximum spanning tree.

The Kruskal algorithm is then basically an application of the weighted matroid greedy algorithm for graphic matroids: It first sorts the edges by non-increasing order of weight and then adds an edge to the current solution if it doesn’t form a cycle.


In this post we learned the basics of matroids and weighted matroids. In future posts we’ll study Matroid Intersection, Matroid Partition and other types of matroids like Oriented Matroids.


[1] Wikipedia – Eugene Lawler
[2] Combinatorial Optimization – Networks and Matroids, Eugene Lawler – Chapter: Matroids and the Greedy Algorithm

Totally Unimodular Matrix Recognition

Paul Seymour is an english mathematician, graduated from Oxford. He is currently teaching at Princeton.

His research area concentrates on discrete mathematics, where he obtained important results including in Regular Matroids, Totally Unimodular Matrices and the Four Color Theorem, being awarded the Fulkerson Prize four times.

In this post we’ll present one of his results regarding decomposition of totally unimodular matrices, which are the key piece in deciding whether a given matrix is TU.

This is the third post about Totally Unimodular (TU) matrices. We introduced them here and described a special case called Network matrices here.

Although it’s not true that all TU matrices are network matrices, Seymour’s theorem basically says that all TU matrices are some kind of combination of network matrices and the following matrices:

(1) \left[ \begin{array}{rrrrr}   1 & -1 & 0 & 0 & -1\\  -1 & 1 & -1 & 0 & 0\\  0 & -1 & 1 & -1 & 0\\  0 & 0 & -1 & 1 & -1\\  -1 & 0 & 0 & -1 & 1\\  \end{array} \right]

(2) \left[ \begin{array}{rrrrr}   1 & 1 & 1 & 1 & 1\\  1 & 1 & 1 & 0 & 0\\  1 & 0 & 1 & 1 & 0\\  1 & 0 & 0 & 1 & 1\\  1 & 1 & 0 & 0 & 1\\  \end{array} \right]

It’s possible to show that TU matrices are closed under the following operations:

(i) permuting rows or columns
(ii) taking the transpose
(iii) multiplying a row or column by -1
(iv) pivoting, that is, transforming

\left[ \begin{array}{cc}   \xi & c\\  b & D\\  \end{array} \right]


\left[ \begin{array}{cc}   -\xi & -\xi c\\  \xi b & D - \xi b c \\  \end{array} \right]

where \xi is a scalar and c and b are a row and a column vector of the appropriate sizes, respecitvely.

(v) adding a row or column with at most one non-zero entry
(vi) repeating a row or a column

Also, given TU matrices A and B, the following operations preserve total unimodularity:

(vii) A \oplus_1 B := \left[ \begin{array}{rr}   A & 0\\  0 & B\\  \end{array} \right]

(viii) \left[ \begin{array}{rr}   A & a\\   \end{array} \right] \oplus_2   \left[ \begin{array}{r}   b\\  B\\   \end{array} \right] :=  \left[ \begin{array}{rr}   A & ab\\  0 & B\\  \end{array} \right]

(ix) \left[ \begin{array}{rrr}   A & a & a\\  c & 0 & 1\\   \end{array} \right]     \oplus_3    \left[ \begin{array}{rrr}   1 & 0 & b\\  d & d & B\\   \end{array} \right]    :=    \left[ \begin{array}{rr}   A & ab\\  dc & B\\  \end{array} \right]

We can now state Seymour’s theorem:

Theorem 1. (Seymour’s decomposition theorem for totally unimodular matrices). A matrix A is totally unimodular if and only if A arises from network matrices and the matrices (1) and (2) by applying the operations (i) to (ix). Here, the operations (vii) to (ix) are only applied if for A and B, the number of rows and columns added is at least 4.

Recognizing Total Unimodularity

The algorithm for recognizing whether a given matrix M is TU consists in finding whether it is a network matrix or one the matrix (1) or (2).

1. If any of the entries of M is not in \{-1, 0, 1\}, then M is not TU.

2. Remove all rows and columns with one or less non-zero entries (Property (v)).

3. Remove repeated rows and columns (Property (vi)).

4. Test if M or its transpose M^T is a network matrix or (1) or (2), possibly permuting and multiplying rows and columns by -1 (Properties (i), (ii) and (iii)). If yes, then the matrix is TU.

At this point of the algorithm, we still don’t know whether our matrix is TU. We’ll now check whether M can be decomposed as follows:

(3) M = \left[ \begin{array}{rr}   A & B\\  C & D\\   \end{array} \right]

such that \mbox{rank}(B) + \mbox{rank}(C) \le 2 and for both A and D, the number of rows plus the number of columns is greater or equal than 4.

Cunningham and Edmonds stated it’s possible to find such a decomposition for a matrix M or conclude no such decomposition exists in polynomial time using the matroid intersection algorithm.

Going back to the algorithm:

5. If M has cannot be decomposed as (3), then it’s not TU.

6. If it can, then we break into cases depending on the values of \mbox{rank}(B) and \mbox{rank}(C). Given that \mbox{rank}(B) + \mbox{rank}(C) \le 2, we have the 6 possible combinations:

\begin{array}{l|c|c}   \mbox{Case} & \mbox{rank}(B) & \mbox{rank}(C)\\  \hline  1 & 0 & 0 \\  2 & 1 & 0 \\  3 & 0 & 1 \\  4 & 1 & 1 \\  5 & 2 & 0 \\  6 & 0 & 2 \\  \end{array}

Case 1:

A matrix with rank zero is the zero matrix, so we have

M = \left[ \begin{array}{rr}   A & 0\\  0 & D\\   \end{array} \right]

By Property (vii), if A and D are TU, M is TU. Since A and D are submatrices of M, the converse holds, so it suffices to recursively verify that A and D are TU.

Case 2:

Since B has rank 1, it has all duplicate rows/columns or rows/columns with all zeroes. Thus, we can write it as B = fg, where f is a column vector and g is a row vector. In this form, we can write M as the result of the 3-sum operation:

\left[ \begin{array}{rr}   A & f\\   \end{array} \right] \oplus_2   \left[ \begin{array}{r}   g\\  D\\   \end{array} \right] :=  \left[ \begin{array}{rr}   A & fg\\  0 & D\\  \end{array} \right]

From Property (viii), if \left[ \begin{array}{rr}   A & f\\   \end{array} \right] and \left[ \begin{array}{r}   g\\  D\\   \end{array} \right] are TU, M is TU. Also, since both are submatrices of M, if one of them is not TU, then M can’t be TU either, so we can test \left[ \begin{array}{rr}   A & f\\   \end{array} \right] and \left[ \begin{array}{r}   g\\  D\\   \end{array} \right] recursively.

Case 3: Analogous to Case 2.

Case 4:

In this case, both B and C have all duplicate rows/columns or rows/columns with all zeroes. We can re-arrange the rows and columns in such a way that M looks like this:

\left[ \begin{array}{rrrr}   A_1 & A_2 & 0 & 0\\  A_3 & A_4 & 1 & 0\\  0 & 1 & D_1 & D_2\\  0 & 0 & D_3 & D_4\\  \end{array} \right]


A := \left[ \begin{array}{rr}   A_1 & A_2\\  A_3 & A_4\\  \end{array} \right]


D := \left[ \begin{array}{rr}   D_1 & D_2\\  D_3 & D_4\\  \end{array} \right]

We define a bipartite graph G_A with one partition being the set of rows of A and the other partition the set of columns of A. There’s an edge between two vertices r and c if the entry (r,c) is non-zero in A.

Let R be the set of vertices of the row partition that intercepts A_4 and K those of the column partition that intercepts A_4. If there’s no path between R and K, then we can reduce to Cases 2 and 3, so without loss of generality, consider the shortest path \prod_A between R and K.

Since R and K are in different partitions, \prod_A has odd length \delta. Let

\xi_A = \left\{ \begin{array}{rr}   1 & \mbox{if } \delta \equiv 1\, (\mbox{mod } 4)\\  -1 & \mbox{if } \delta \equiv 3\, (\mbox{mod } 4)\\  \end{array} \right.

and \xi_D defined analogously.

(4) \left[ \begin{array}{rrr}   A_1 & A_2 & 0\\  A_3 & A_4 & 0\\  0 & 1 & \xi_D  \end{array} \right]

(5) \left[ \begin{array}{rrr}   \xi_A & 1 & 0\\  1 & D_1 & D_2\\  0 & D_3 & D_4  \end{array} \right]

The algorithm says that M is TU if and only if (4) and (5) are TU.

Case 5: Can be reduced to Case 4 by pivoting (Property iv).

Case 6: Analogous to Case 5.

Complexity and Implementation

In [2], Truemper presents a O(n + m)^3 algorithm for the total unimodularity testing and in [3] he Walter and Truemper provides an implementation in C++ of a simplified O(n + m)^5 version.


The matroid intersection algorithm is used in one of the steps of the algorithm. I’ve heard about this algorithm before, but never studied it, so this will the next subject I’m researching about in the series of posts on Combinatorial Optimization.


[1] Theory of Linear and Integer Programming – A. Schrijver (Chapter 20)
[2] A decomposition theory for matroids. V. Testing of matrix total unimodularity – K. Truemper
[3] Implementation of a Unimodularity Test – M. Walter and K. Truemper

Network Matrices


In this post we’re going to write about a special case of the \{0, \pm 1\} matrix, which is called network matrix. They play an important role in our study of totally unimodular matrices.

We’ll first define a network matrix and provide an example. We’ll later show some properties and then describe a polynomial-time algorithm to recognize network matrices.


Network matrices were first defined by William Tutte (1965) [1]. The definition is the following:

Let D = (V, A) be a directed graph and T = (V, A_0) be a directed tree on the vertex set V. Let M be the matrix A_0 \times A-matrix defined by a = (v, w) \in A and a' \in A_0.

M_{a',a} = \left\{ \begin{array}{ll}   +1 & \mbox{if the path } v\mbox{-}w \in T \mbox{ passes through}\\     & a \mbox{ in the same direction.}\\  -1 & \mbox{if the path } v\mbox{-}w \in T \mbox{ passes through}\\     & a \mbox{ in the opposite direction}.\\  0  & \mbox{if does not pass through } a \mbox{ at all.}  \end{array} \right.

Matrices with this structure are called network matrices.

Example. In Figure 1 we see a sample digraph G(V,A) and a given directed tree T on the vertex set V. The corresponding network matrix M represented by G and T is given on Table 1.


Figure 1. Directed graph G(V,A) and a directed tree T on V

For each arc (u,v) \in A, we have a column in M representing the path from u to v in T. The signs indicate the direction of the edges in the path.

Table 1. Network Matrix of G and T

Table 1. Network Matrix of G and T


Let M be a network matrix, represented by the digraph G and the directed tree T.

1) Multiplying a row of M by -1 is equivalent to inverting the direction of the corresponding edge in T.

2) Multiplying a column of M by -1 is the same as inverting the direction of the corresponding edge in G.

3) Deleting a row of M corresponding to edge (i, j) of T, is the same as shrinking vertex i and j into a single vertex.

4) Deleting a column of M is equivalent to removing the corresponding edge from G.

Combining properties (3) and (4), we have that:

5) A submatrix of a network matrix is a network matrix as well.

The following theorem is the most important property about Network Matrices, that we’ll explore in the forthcoming posts about Integer Programming Theory:

Theorem 1. Networks matrices are Totally Unimodular.

Recognizing Network Matrices

There is a polynomial-time algorithm for deciding whether a matrix M is a network matrix. Without loss of generality we restrict ourselves to \{0, \pm 1\} matrices, since all network matrices are of this type and we can easily find out whether a matrix has such property.

We now divide it in two cases. Case 1: for all columns of M, it has at most two non-zero entries; Case 2: the remaining types, that is, M has at least one column with three or more non-zero entries.

Case 1. Let’s describe the algorithm for the first case. Let M be a m \times n, \{0,\pm 1\} matrix with at most 2 non-zero entries.

If M has, for each column, at most one entry 1 and at most on entry -1, we can show it’s a network matrix by constructing a digraph G and a tree T such that their corresponding network matrix is M. We first build a direct star T with m vertices and with all edges pointing towards the center vertex v^*. We now build the digraph G with the same vertex set. For each column in M that has an entry +1 and -1 for rows (u,v^*) and (w,v^*) we add an edge (u,w). For columns having a single entry for row (u, v^*), we add the (u,v^*) to G if it’s +1 or (v^*,u) if it’s -1.

The problem is that even in the restrictive Case 1, we may have both entries with the same signal. From property (1) though, we can multiply some rows by -1 and try to reach the case above. The question is then: can we split the rows into sets R_1 and R_2, in such a way that if we multiply rows in R_2 by -1, we have at most one entry +1 and at most one entry -1 for all columns?

This question can be answered by a neat reduction to problem of deciding wether a graph is bipartite. Let G_R be a graph with vertex set corresponding to each row of M plus some artificial vertices. We add an edge (i, j) if the corresponding rows have the same signal for some column. We add edges (i, v_{ij}^*) and (j, v_{ij}^*) if the have different signs. We then try to split this graph into two partitions R_1 and R_2.

The idea is that if such a partitioning exists, vertex with different signs will be in the same partition because they share a common vertex and vertex with the same signs must be on different partitions. If we now multiply all rows in R_2 by -1, we’ll have our property.

Conversely, if no partition exists, it’s possible to show (see Example 1, from this post) that such matrix is not TU. Since by Theorem 1 all network matrices are TU, we conclude that this matrix is also not a network matrix. Summarizing we have that

Observation 1. The matrix M from the Case 1 is a network matrix if and only if its graph G_R define as above is bipartite.

Case 2. Now let’s concentrate on the more general case, where the matrix has at least one column with 3 or more non-zero entries.

For each row i of M, we define a graph G_i with vertex set \{1, \cdots, m\} \setminus \{i\}. There’s an edge between j and k if exists any column in M such that it has non-zero entries for j and k, but 0 for i. We have the following observation:

Observation 2. If M is a network matrix, there exists a row i for which G_i is disconnected.

The basic idea in understanding this observation is that since there is at least one column in M with three non-entries, there must be a path in T that connects two vertices in G with length at least 3. There is some edge i from the middle of this path that is not the first nor the last edge, henceforth denoted as j and k. If we remove this edge, we will split it into two components. It’s then easy to see that any path in the tree that has j and k needs to go through i. In turn, this means that there is no edge between the corresponding vertices in G_i.

From Observation 2, we can conclude that if a given matrix has G_i for all possible columns i, then M is not a network matrix.

So we can now suppose our candidate matrix has a disconnected G_1 (we can assume i = 1 without loss of generality. Let C_1, \cdots, C_p be the connected components of G_1.

We define

* W := as the set of column indexes for which the first row of M has non-zero entries;

* W_i := W \cap the set of column indexes for which the i-th row of M has non-zero entries;

* U_k := \bigcup \{W_i | i\in C_k\}

Now, we build a graph H with vertex set \{C_1, \cdots, C_p\}, with an edge (C_k, C_\ell) if the following conditions are met:

* \exists i \in C_k: U_\ell \not \subseteq W_i and U_\ell \cap W_i \neq \emptyset

* \exists j \in C_\ell: U_k \not \subseteq W_j and U_k \cap W_j \neq \emptyset

and let M_k be the submatrix formed by the rows of M corresponding to vertices in C_k.

We now can state the following Theorem:

Theorem 2. M is a Network Matrix if and only if: H is bipartite and M_k is a network matrix for k=1, \cdots, p.


Theorem 3. The algorithm above to detect whether a given matrix M is a network matrix has polynomial-time complexity.

1. Check if has only entries in \{0, \pm 1\}

2. Test if has at most two non-zero entries for each column (Case 1 or Case 2)

3. Case 1: Construct the graph G_R and check whether it is bipartite.

4. Case 2: For each column i, construct graph G_i and check if it is not connected. In this case, we build the graph H and build each of the submatrices M_k.

If we assume the matrix is m \times n, (1) and (2) can be performed in linear size of the matrix, O(mn). We can construct G_R in O(mn) as well and test for bipartiteness in linear time on the number of vertices plus the edges of G_R, which are both O(n + m), so (3) is bounded by O(mn) too.

For step (4), we need to generate the graph G_i. Its edges are bounded by mn and we can decide it’s connected in linear time of its size. Doing this for all n columns we a total complexity of O(mn^2).

For the recursive complexity, we can relax our complexity for Steps 1 to 4 to be O(n^tm^t) for a fixed constant t \ge 2. By induction, we assume our total complexity for a given level of the recursion is O(n^{t+1}m^t).

The matrix M_k has m_k rows and n columns, can be solved, by induction, in O(m_k^{t+1}n^t). Summing up all the complexities we have:

O(m^tn^t) + O(m_1^{t+1}n^t) + O(m_2^{t+1}n^t) + \cdots + O(m_p^{t+1}n^t) which is O(m^{t+1}n^t)

The base is when we do steps 1 to 4 without building any submatrices, which we saw it is O(m^tn^t).


The main points to be taken from this post is that network matrices are totally unimodular and that they can be recognized in polynomial-time.

We provided explanations about the two observations, but we left out the proofs of the Theorems 1 and 2, which are quite long and complicated, and thus out of the scope of the post. Nevertheless, they can be found on [1].


[1] Theory of Linear and Integer Programming – A. Schrijver (Chapter 19)

Flood it! An exact approach

Flood-it is a game created by LabPixies, which was recently aquired by Google.

The game consists of a n \times n board with random colors cells. Let’s call the top-left cell a seed cell and the region connected to the seed cell and with the same color as it, the flooded region. At each round, the player chooses a color for the flooded region which may flood adjacent regions, expanding the flooded region. The final objective is to flood all the board.

Figure 1. Flooding proccess [1]

In the original game, there are three different sizes of boards: 14, 21 or 28. The number of colors is always 6.

A paper from Clifford, Jalsenius, Montanaro and Sach [1], presents theoretical results regarding the general version of this problem, where the size of the board and the number of colors can be unbounded.

In this post we’ll highlight the main results of their paper and present an integer linear programming approach to solve it exactly.


For C \ge 3, the game is shown to be NP-hard even if one is allowed to start flooding from an arbitrary cell. The proof given in [1] is based on a reduction from the shortest common supersequence problem (SCS for short).

Greedy approaches

There are two greedy approaches one might try:

1. Pick the color that maximizes the number of cells covered.

2. The most frequent color in the perimeter of the current region.

In Figure 2, we have an instance where this strategies can be arbitrarily bad [1]. They use n colors while the optimal solution is 3.

Figure 2: A 10×10 board for which the greedy algorithms perform badly [1]

Approximation algorithms

Surprisingly, the following naïve strategy:

Cycle through all colors until the board is colored.

gives a solution with a value within a factor of the optimal value.

More specifically, if L is the optimal number of movements and C the number of available colors, then this algorithm solves the problem with no more than C L movements. To see why, let \{c_1, c_2, \cdots, c_L \} be the color sequence of the optimal solution. In the worst case, each cycle covers at least one color in the sequence. Thus, this algorithm is C approximated.

This can be refined to C-1 observing that the cycle does not need to have the current color of the flooded region.

The authors improve this bound with a randomized algorithm that achieves a \frac{2c}{3} expected factor.

They also give a lower bound, proving that if the number of colors is arbitrary, no polynomial time approximated algorithm with a constant factor can exist unless P=NP.


An upper bound for the number of movements required for solving any n \times n is given by the following theorem:

Theorem 1: There exists a polynomial time algorithm for Flood-It which can flood any n x n board with C colors in at most 2n + (\sqrt{2C})n + C moves.

Conversely, we can set an lower bound for a n \times n board:

Theorem 2: For 2 \le C \le n^2, there exists an n \times n board with (up to) c colors which requires at least (\sqrt{C - 1})n/2 - C/2 moves to flood.

That is, we can’t expect an algorithm to perform much better than the one from Theorem 1 for arbitrary boards.

Integer Linear Programming

Let C be the number of colors and k an upper bound for the optimal solution. A component is a connected region of pixels with same color, considering 4-adjacency. Two components i and j are adjacenct if there exists at least a pixel in i adjacent to a pixel in j.

We denote by N(i) the set of components adjacent to component i. Let S be the set of components and m = |S|. Furthermore, let c_{i} be the color of component i.

We define the binary variable x_{ct} that is 1 if color c = 1, \cdots, C is chosen at iteration t = 1, \cdots, k or 0 otherwise. We also define the binary variable y_{it} that is 1 if component i = 1, \cdots, m is filled in some iteration t or 0 otherwise.

For simplicity, we’ll assume that the component of the seed cell has index i = 1.

(1) Objective function:

\displaystyle \min \sum_{c=1}^{C} \sum_{t=1}^{k} x_{ct}

(2) For each component i,

\displaystyle \sum_{t=1}^{k} y_{it} = 1

(3) For each iteration t,

\displaystyle \sum_{c=1}^{C} x_{ct} \le 1

(4) For each component i and iteration t,

\displaystyle y_{it} \le \sum_{j \in N(i)} \sum_{t' = 1}^{t - 1} y_{jt'}

(5) For each component i and iteration t

y_{it} \le x_{c_it}

(6) For each component i = 2, \cdots, k,

y_{i,0} = 0

(7) For the component of seed cell,

y_{1,0} = 1

(8) For all component i and iteration t,

y_{it} \in \{0, 1\}

(9) For all color c and iteration t,

x_{ct} \in \{0, 1\}

Constraint (2) states that we fill a component exactly in one iteration. Constraint (3) says that at each iteration we pick at most one color. Constraint (4) allow us to fill a component only if any of its adjacent components has been already filled (which means it currently has the same color as the seed pixel) and if its color is the same as the chosen color (5) .

Finally, constraints (6)-(9) state the variables are binary and each component starts unfilled except the component of the seed cell.

Computational Experiments

We’ve implemented this model in C++ using the COIN-OR Cbc library. The code is available, as always, on github.

The first task was obtaining “real” instances. I don’t know whether the colors of Flood it! boards are uniformly chosen. Thus, I preferred to take some print screens from the original game and do some image processing to convert it matrices with integer representing colors.

Figure 3: A 14×14 Flood it! puzzle

Unfortunately the model is too big if we use k as the number of components, m. The number of constraints (4) is then O(m^3) and for the instance in Figure 3, m = 127.

In order to reduce its size, we tried two tricks.

First, we changed our model a little bit to decrease the number of contraints (4). Now, y_{it} may also be 1 if component i was covered some iteration before t. Thus, we can rewrite (4) as,

(4′) For each component i,

\displaystyle y_{it} \le \sum_{j \in N(i)} y_{j(t-1)}

But now we need to allow a component to be “covered” if it was covered in a previous iteration, even if the current color does not match. Thus, (5) becomes:

(5′) For each component i and iteration t

y_{it} \le x_{c_it} + y_{i(t-1)}

We also need to change (2) equality to \ge inequalities. Now, note that there are O(m^2) constraints (4′).

The second trick is based on the assumption that in general, the optimal number of movements is much less than the number of components. Thus, solve the model for increasing values of k starting with the number of colors until we find a feasible solution.

Even with these changes, we were not able to solve the instance in Figure 3. The 5×5 instance obtained from the first rows and cols of the matrix is solved in 2 minutes using 9 colors.

0 0 1 2 2
3 4 0 0 1
0 0 0 2 4
1 3 3 3 4
3 0 4 0 2


1 2 0 1 3 0 3 4 2

For the 6×6 instance obtained the same way, the solver does not find the optimal solution in an hour.


In this post we gave a brief summary of Clifford et al. paper and then presented a integer linear programming approach to solve instances exactly.

As our experimental results showed, even for the smallest boards (14 x 14) we’re not able to obtain optimal solutions in feasible time.

As future work, we may try devising alternative models or find additional inequalities. Another possibility is to solve this same model using a commercial solver like CPLEX.


[1] Clifford R., Jalsenius M., Montanaro A. and Sach B. – The Complexity of Flood Filling Games (

Totally Unimodular Matrices

An unimodular matrix is a square matrix with integer entries such that its determinant is either -1, 0 or 1. A matrix is said totally unimodular (TU for short) if all its square submatrices are unimodular.

Sometime ago, we said that problems such as the minimum path, maximum flow and minimum cost max flow can be modeled using linear programming with the interesting property that the optimal solutions are always integer.

In that post, we also said that it was because the coefficient matrix of the constraints of the formulations are totally unimodular. More formally, we have the following theorem:

Theorem 1. Let A be a totally unimodular matrix and b be an integer vector. Then, the polyhedra P = \{x | Ax \le b\} has integer vertices.

In this post we’ll present some properties of TU matrices and discuss about two simple examples.


Let A be a totally unimodular matrix. Then we have the following properties:

  1. Its transpose, A^{T}, is TU.
  2. The matrix obtained by appending the identity matrix to A, that is [A, I], is TU
  3. Any submatrix of A is TU
  4. The matrix obtained by multiplying any row of A by -1 is TU
  5. The matrix obtained by duplicating any row of A is TU

Using this properties we can get some Corollaries from Theorem 1.

Since [A, -I] is TU, we have the following

Corollary 1. The polytope P = \{x | Ax \le b; x \ge 0 \} has integer vertices.

Also, since [A^T, -A^T, I, -I] is TU,

Corollary 2. The dual of P = \{c^Tx | Ax \le b; x \ge 0 \}, namely Q = \{b^Ty | A^Ty \ge c; y \ge 0\} has also integer vertices.


1. Bipartite Graphs

Let G = (V, E) be an undirected graph and M the incidence matrix of G. That is, a binary matrix where each line corresponds to a vertex v and each column to an edge e. We have M_{v,e} = 1 if v is an endpoint of e or M_{v,e} = 0 otherwise. Then, we have the following result:

Theorem 2. The incidence matrix of a graph G is totally unimodular if and only if, G is bipartite.

This result can be used to derive the König-Egerváry theorem, stating that the maximum cardinality matching and the minimum vertex cover have the same value bipartite graphs.

The maximum cardinality can be modeled as integer linear programming:

\max \sum_{e \in E} y_e

\begin{array}{llclr}    & \sum_{e = (u, v)} y_e & \le & 1 & \forall v \in V\\   & y_e & \in & \{0, 1\} & \forall e \in E  \end{array}

And its dual is the minimum vertex cover:

\min \sum_{v \in V} x_v

\begin{array}{llclr}    & x_u + x_v  & \ge & 1 & \forall (u,v) \in E\\   & x_v  & \le & \{0, 1\} & \forall v \in V  \end{array}

It’s not hard to see that if M is the incidence matrix of the graph, then the problems can be stated as

(1) \max \{1y | My \le 1; y \mbox{ binary} \} and

(2) \min \{x1 | xM \ge 1; x \mbox{ binary} \}

If the graph is bipartite, we can use Theorem 2 and the strong duality for linear programs to conclude that (1) = (2).

2. Directed Graphs

Let D = (V, A) a directed graph, and M be the incidence matrix of D. That is, a matrix where each line corresponds to a vertex and each column to an arc. For each arc e = (u, v), we have M_{u, e} = -1 and M_{v, e} = 1 and 0 otherwise. For directed graphs, we have a even stronger result:

Theorem 3. The incidence matrix of a directed graph D is totally modular.

Consider a network represented by D and with capacities represented by c : A \rightarrow \mathbb{R}_{+}. For each directed edge (ij) \in A, let x_{ij} be the flow in this edge.

If M is the incidence matrix of D, then Mx = 0 corresponds to

\begin{array}{llclr}   (3) & \sum_{(iv) \in A} x_{iv} & = & \sum_{(vi) \in A} x_{vi} & \forall v \in V\\  \end{array}

which are the flow conservation constraints. Since M is totally unimodular, then if c is integer, it’s possible to show that the polytope \{x| 0 \le x \le c; Mx = 0\} has integral vertices. This polytope also represents the constraints of the max circulation problem.

Now, we can use these observations to prove that the following LP formulation for the max-flow problem has an optimal integer solution:

\max \sum_{(si) \in A} x_{si}

Subject to:

\begin{array}{llclr}   & \sum_{(iv) \in A} x_{iv} & = & \sum_{(vi) \in A} x_{vi} & \forall v \in V \setminus \{s, t\}\\  0 \le & x_{ij} & \le & c_{ij} & \forall (ij) \in A\\  \end{array}

We can see that the constraints matrix of the above formulation is a submatrix of the max circulation problem and by Property 3, it’s also TU, which in turn means the corresponding polytope has integral vertices.


In this post, we introduced the concept of total unimodular matrices and presented two simple examples: the incidence matrix of a bipartite graph and the incidence matrix of a directed graph.

Here’s a cool chart of common polynomial time solvable problems organized by their generality [2].

In future posts, we’ll keep exploring this subject by studying other examples and properties of TU matrices.


[1] Theory of Linear and Integer Programming – A. Schrijver
[2] Marco Chiarandini – Scheduling, Timetabling and Routing DM204 (Lecture Notes)
[3] Ted Ralphs – Integer Programming IE418 (Lecture Notes)