# Flood it! An exact approach

Flood-it is a game created by LabPixies, which was recently aquired by Google.

The game consists of a $n \times n$ board with random colors cells. Let’s call the top-left cell a seed cell and the region connected to the seed cell and with the same color as it, the flooded region. At each round, the player chooses a color for the flooded region which may flood adjacent regions, expanding the flooded region. The final objective is to flood all the board.

Figure 1. Flooding proccess [1]

In the original game, there are three different sizes of boards: 14, 21 or 28. The number of colors is always 6.

A paper from Clifford, Jalsenius, Montanaro and Sach [1], presents theoretical results regarding the general version of this problem, where the size of the board and the number of colors can be unbounded.

In this post we’ll highlight the main results of their paper and present an integer linear programming approach to solve it exactly.

### NP-Hardness

For $C \ge 3$, the game is shown to be NP-hard even if one is allowed to start flooding from an arbitrary cell. The proof given in [1] is based on a reduction from the shortest common supersequence problem (SCS for short).

### Greedy approaches

There are two greedy approaches one might try:

1. Pick the color that maximizes the number of cells covered.

2. The most frequent color in the perimeter of the current region.

In Figure 2, we have an instance where this strategies can be arbitrarily bad [1]. They use $n$ colors while the optimal solution is 3.

Figure 2: A 10×10 board for which the greedy algorithms perform badly [1]

### Approximation algorithms

Surprisingly, the following naïve strategy:

Cycle through all colors until the board is colored.

gives a solution with a value within a factor of the optimal value.

More specifically, if $L$ is the optimal number of movements and $C$ the number of available colors, then this algorithm solves the problem with no more than $C L$ movements. To see why, let $\{c_1, c_2, \cdots, c_L \}$ be the color sequence of the optimal solution. In the worst case, each cycle covers at least one color in the sequence. Thus, this algorithm is $C$ approximated.

This can be refined to $C-1$ observing that the cycle does not need to have the current color of the flooded region.

The authors improve this bound with a randomized algorithm that achieves a $\frac{2c}{3}$ expected factor.

They also give a lower bound, proving that if the number of colors is arbitrary, no polynomial time approximated algorithm with a constant factor can exist unless P=NP.

### Bounds

An upper bound for the number of movements required for solving any $n \times n$ is given by the following theorem:

Theorem 1: There exists a polynomial time algorithm for Flood-It which can flood any n x n board with C colors in at most $2n + (\sqrt{2C})n + C$ moves.

Conversely, we can set an lower bound for a $n \times n$ board:

Theorem 2: For $2 \le C \le n^2$, there exists an $n \times n$ board with (up to) $c$ colors which requires at least $(\sqrt{C - 1})n/2 - C/2$ moves to flood.

That is, we can’t expect an algorithm to perform much better than the one from Theorem 1 for arbitrary boards.

### Integer Linear Programming

Let $C$ be the number of colors and $k$ an upper bound for the optimal solution. A component is a connected region of pixels with same color, considering 4-adjacency. Two components $i$ and $j$ are adjacenct if there exists at least a pixel in $i$ adjacent to a pixel in $j$.

We denote by $N(i)$ the set of components adjacent to component $i$. Let $S$ be the set of components and $m = |S|$. Furthermore, let $c_{i}$ be the color of component $i$.

We define the binary variable $x_{ct}$ that is 1 if color $c = 1, \cdots, C$ is chosen at iteration $t = 1, \cdots, k$ or 0 otherwise. We also define the binary variable $y_{it}$ that is 1 if component $i = 1, \cdots, m$ is filled in some iteration $t$ or 0 otherwise.

For simplicity, we’ll assume that the component of the seed cell has index $i = 1$.

(1) Objective function:

$\displaystyle \min \sum_{c=1}^{C} \sum_{t=1}^{k} x_{ct}$

(2) For each component $i$,

$\displaystyle \sum_{t=1}^{k} y_{it} = 1$

(3) For each iteration $t$,

$\displaystyle \sum_{c=1}^{C} x_{ct} \le 1$

(4) For each component $i$ and iteration $t$,

$\displaystyle y_{it} \le \sum_{j \in N(i)} \sum_{t' = 1}^{t - 1} y_{jt'}$

(5) For each component $i$ and iteration $t$

$y_{it} \le x_{c_it}$

(6) For each component $i = 2, \cdots, k$,

$y_{i,0} = 0$

(7) For the component of seed cell,

$y_{1,0} = 1$

(8) For all component $i$ and iteration $t$,

$y_{it} \in \{0, 1\}$

(9) For all color $c$ and iteration $t$,

$x_{ct} \in \{0, 1\}$

Constraint (2) states that we fill a component exactly in one iteration. Constraint (3) says that at each iteration we pick at most one color. Constraint (4) allow us to fill a component only if any of its adjacent components has been already filled (which means it currently has the same color as the seed pixel) and if its color is the same as the chosen color (5) .

Finally, constraints (6)-(9) state the variables are binary and each component starts unfilled except the component of the seed cell.

### Computational Experiments

We’ve implemented this model in C++ using the COIN-OR Cbc library. The code is available, as always, on github.

The first task was obtaining “real” instances. I don’t know whether the colors of Flood it! boards are uniformly chosen. Thus, I preferred to take some print screens from the original game and do some image processing to convert it matrices with integer representing colors.

Figure 3: A 14×14 Flood it! puzzle

Unfortunately the model is too big if we use $k$ as the number of components, $m$. The number of constraints (4) is then $O(m^3)$ and for the instance in Figure 3, $m = 127$.

In order to reduce its size, we tried two tricks.

First, we changed our model a little bit to decrease the number of contraints (4). Now, $y_{it}$ may also be 1 if component $i$ was covered some iteration before $t$. Thus, we can rewrite (4) as,

(4′) For each component $i$,

$\displaystyle y_{it} \le \sum_{j \in N(i)} y_{j(t-1)}$

But now we need to allow a component to be “covered” if it was covered in a previous iteration, even if the current color does not match. Thus, (5) becomes:

(5′) For each component $i$ and iteration $t$

$y_{it} \le x_{c_it} + y_{i(t-1)}$

We also need to change (2) equality to $\ge$ inequalities. Now, note that there are $O(m^2)$ constraints (4′).

The second trick is based on the assumption that in general, the optimal number of movements is much less than the number of components. Thus, solve the model for increasing values of $k$ starting with the number of colors until we find a feasible solution.

Even with these changes, we were not able to solve the instance in Figure 3. The 5×5 instance obtained from the first rows and cols of the matrix is solved in 2 minutes using 9 colors.

0 0 1 2 2 3 4 0 0 1 0 0 0 2 4 1 3 3 3 4 3 0 4 0 2 

Solution:

1 2 0 1 3 0 3 4 2

For the 6×6 instance obtained the same way, the solver does not find the optimal solution in an hour.

### Conclusion

In this post we gave a brief summary of Clifford et al. paper and then presented a integer linear programming approach to solve instances exactly.

As our experimental results showed, even for the smallest boards (14 x 14) we’re not able to obtain optimal solutions in feasible time.

As future work, we may try devising alternative models or find additional inequalities. Another possibility is to solve this same model using a commercial solver like CPLEX.

### References

[1] Clifford R., Jalsenius M., Montanaro A. and Sach B. – The Complexity of Flood Filling Games (arxiv.org)

## 5 thoughts on “Flood it! An exact approach”

1. Using your connected components, we can find a minimum number of moves to flood any component. Then, look at the components which are a maximum distance away. We might clear one of these components after the maximum distance number of moves, and need an additional move to clear each additional component. If all colors are not cleared, consider components that are maximum distance minus one moves away. Again, we may clear one of the uncleared colors making the maximum distance – 1 move, and we will eventually need another move for each additional uncleared color at that distance. We can keep backing up to earlier distances until we have accounted for every color being cleared.

This lower bound is fairly reasonable. Using this bound and the heuristic of always making a move that clears a color when possible, we can find optimal moves for 14×14 board positions searching on the order of 10,000 boards. For a 14×14 board we may typically estimate a lower bound of 17 to 19 when the actual solution requires 19 to 23 moves.

For example, for the board given in the Greedy Algorithm picture, the blue area can be reached in one move, as can the upper horizontal red stripe. All other areas can then be reached in two moves. There are two colors reachable in 2 moves, requiring at least 3 moves total; the 3rd color may be cleared on the 1st move.

So, in this example, the lower bound is, in fact, the optimal number of moves.

2. For the example you give (assuming I transcribed it correctly, I get:

An initial lower bound of 16.
45 nodes scanned to prove 16 moves is not possible;
336 nodes to disprove 17 moves;
2,556 nodes to disprove 18 moves;
2,914 nodes to find a solution.

solution (Pink, Green, Yellow, Violet, Blue, Red)
[‘p’, ‘g’, ‘p’, ‘y’, ‘v’, ‘g’, ‘y’, ‘g’, ‘v’, ‘p’, ‘r’, ‘v’, ‘b’, ‘r’, ‘b’, ‘v’, ‘p’, ‘y’, ‘g’]

3. Hi,

Thanks for you comment. I understood that the most distant component represents a lower bound, but I’m having trouble to see why it’s true that we can increase this lower bound by coloring the remaining components.

I’ll try to incorporate this lower bound in the algorithm. Did you implement these ideas to get the solution for the 14×14 board?

4. Yes, I implemented the above ideas to solve 14×14.

Define a “bubble” to be a group of connected cells all of the same color. We agree tat the maximum distance in which a bubble might be reachable is a lower bound. If there are multiple bubbles of different colors at that maximum distance, we can only clear one of those colors on the turn we move to the maximum distance. We need an additional turn to clear each remaining color.

So consider a simplified version of the example board above from the Flood-it is NP paper:
OORR
OORR
BBOO
BBRR

The four upper-right R cells are can be flooded in one move. The four lower-left B cells are also distance one away. The bubble of two oranges and the bubble of two reds are each distance two away. We can’t clear both of those colors on the second move, so we will need at least three moves to solve this board.

If the next to last distance contains multiple colors that aren’t contained in the last distance, we need additional moves to clear these extra colors as well.

It’s not that we’re “coloring the remaining components”. Rather, we are counting the colors that need to be cleared.

When we are playing the game ourselves, near the end of the game, we frequently get to a point where one color is blocking many bubbles of other colors. We flood with that color, and suddenly we have multiple colors that can each be removed in one move.

Tonight I implemented your idea of screen capturing the board position and extracting a machine readable version of the flood-it board. Worked great, thanks. So I’m now working on solving a 21×21 board, which is going kinda slow. ;-) My python program doesn’t process very many nodes per second.

So, I need a better estimate of the lower bound. When manually working through a board to find optimal solutions, it appears that sometimes a bubble can be in the shadow of another bubble of the same color. We may be able to use this to prove that a lower bound is too low and needs to be increased.

The basic idea is, for each bubble, to pick the turn on which that bubble will be flooded. By forcing a certain color to be flooded on a certain turn, any other color reachable on that turn must wait to be flooded until the next turn. Using these constraints we can re-create the distance map. If the lower bound derived from the new distance map is worse than the lower bound we re trying to disprove, we cannot flood the bubble on the turn that we tried.

If we can show that there is no turn on which some bubble can be chosen, then we’ve proven that the candidate lower bound is too low. If we can show that there are multiple bubbles that can only be chosen on the same turn, or if there are three or more bubbles that can only be chosen on a pair of turns…

The running time of this algorithm is going to be number of bubbles times some fraction of the lower bound, or perhaps log of the lower bound. This may be a significant chunk of time and may only be useful to get an initial lower bound. Ideally, this approach tells us enough about when bubbles must be taken that it can be used to guide the normal search…

In addition to the heuristic of always choosing a move that clears a color and ignoring other choices, it looks like there is a heuristic that tells us when a move is going to be useless and can be ignored. If we make a move that does not convert any bubbles at distance two into bubbles of distance one, and the move does not clear a color, the move was useless. We can make the move at a later time to more effect. (On the other hand, if our lower bound is any good, after we make the useless move, the lower bound estimate for the new board position will tell us it was a useless move…)

Anyway, I hope I’m not trashing your blog too badly while thinking through aspects of this puzzle…