Amortization and Persistence via Lazy Evaluation

In this chapter Okasaki addresses the problem of combining amortized analysis with persistence: traditional amortized analysis assumes in-place modification, while persistent data structures make (partial) copies, so an expensive operation may be repeated from a shared version. The intuition is that lazy evaluation, which comes with memoization and avoids recomputation, solves this problem.

He adapts the Banker’s and Physicist’s methods to work with lazily evaluated operations and applies them to a few structures, including Binomial Heaps, Queues and Lazy Pairing Heaps. In this post we’ll only cover the Queue examples, using both methods.

We’ll first introduce some concepts and terminology, then present a queue implementation using lazy evaluation that allows us to analyze it under persistence. Following that, we’ll explain the Banker’s and Physicist’s methods and prove that the implementations of push/pop have an efficient amortized cost.

Persistence as a DAG

An execution trace is a DAG where nodes represent operations (e.g. updates to a data structure) and an edge from node v to node v' indicates that the operation corresponding to v' uses the output of the one corresponding to v.

For example, if we have this set of operations:

let a = push 0 newEmpty
let b = push 1 a
let c = pop b
let d = push 2 b
let e = append c d
let f = pop c
let g = push 3 d 

The corresponding execution graph is:

Execution Graph

The logical history of an operation v is the set of all operations it depends on (directly or indirectly, and including itself). Equivalently, in terms of the DAG, it’s the set of nodes that have a directed path to v.

A logical future of an operation v is any directed path from v to a terminal node.

A framework for analyzing lazy evaluated data structures

We need to introduce a few more concepts and terminology. Note: we’ll use suspension and lazy operation interchangeably; a suspension can be evaluated (forced).

The unshared cost of an operation is the time it would take to execute it assuming every lazy expression involved had already been evaluated and memoized before, so any such expression counts as O(1).

The shared cost of an operation is the time it would take to execute (force) all the suspensions created (but not evaluated) by the operation.

The complete cost is the sum of the shared and unshared costs. Alternatively, the complete cost of an operation is the time it would take to execute the operation if lazy evaluation were replaced by strict evaluation. To see why, first note that the unshared costs have to be paid regardless of laziness. Since we’re assuming no laziness, the operation has to pay the cost associated with the suspensions it creates, which corresponds to the shared costs. Note that under this assumption we wouldn’t need to account for the cost of forcing suspensions created by previous operations, because in theory they have already been evaluated.

When talking about a sequence of operations, we can break down the shared costs into two types: realized and unrealized. The realized costs are the shared costs of suspensions that were actually forced by some operation in the sequence. For example, say operations A and B are in the sequence, A creates a suspension, and B later forces it: the cost for B to force it is included in the realized cost. The unrealized costs are the shared costs of suspensions that were created but never evaluated within the sequence. The total actual cost of a sequence of operations is the sum of the realized costs and the unshared costs.

Throughout a sequence of operations, we keep track of the accumulated debt, which starts at 0. Whenever an operation is performed, we add its shared cost to the accumulated debt. For each operation, we can decide how much of this debt to pay off. When the debt of a suspension is fully paid off, we can force it. The amortized cost of an operation is its unshared cost plus the amount of debt it paid off (note that it does not include the realized cost). As long as we always pay off the debt of a suspension before it’s forced, the total amortized cost will be an upper bound on the total actual cost.
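To make the accounting concrete, here’s a small worked example (the numbers are made up): suppose operation A has unshared cost 1 and creates a suspension with shared cost 5, and A decides to pay off 2 units of debt. A’s amortized cost is 1 + 2 = 3 and the accumulated debt becomes 5 – 2 = 3. If the next operation B (unshared cost 1) pays the remaining 3 units, its amortized cost is 1 + 3 = 4, and only then may the suspension be forced. The realized cost of 5 was fully prepaid, so the total amortized cost 3 + 4 = 7 is an upper bound on the total actual cost 1 + 1 + 5 = 7.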

This framework simplifies the analysis of the case when a suspension is used more than once by assuming that its debt was paid off within the logical history of whichever operation forced it, so we can always analyze a linear sequence of operations and not worry about branching. This might cause the debt to be paid multiple times, but that only overestimates the amortized cost.

The author uses the term discharge a debit as a synonym for pay off debt. I find the latter easier to grasp, so I’ll stick with it throughout this post.

Let’s introduce an example first and then proceed with the explanation of the Physicist’s method and the corresponding analysis of the example.

The Stream Queue

To allow efficient operations on a queue in the presence of persistence, we can make some of the operations lazy. Recall that in a previous post we defined a queue using two lists. To avoid immediate computation, a natural replacement for lists is their lazy version, the stream data structure, which we also covered in a previous post.

For the list-based queue, the invariant was that if the front list is empty, then the rear list must be empty as well. For the stream queue, we have a tighter constraint: the front stream must always be at least as long as the rear stream. This constraint is necessary for the analysis.

The definition of the stream queue is the following:

We store the lengths of the streams explicitly for efficiency.

We’ll be using the Stream developed in the previous chapter, so we’ll refer to the module Stream2 to avoid ambiguity with the standard Stream module.

Inserting an element at the end of the queue is straightforward, since the rear stream represents the end of the queue and is stored reversed.

The problem is that inserting at the rear can violate the queue’s invariant. check() restores the invariant, potentially by reversing rear and concatenating it to front.

Removing an element from the queue requires us to evaluate the first element of the front stream. Again, the invariant can be violated in this case, so we need to invoke check() once more.
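Putting the pieces together, here is a minimal self-contained sketch of the stream queue. The type and function names here are my own guesses; the post’s actual code (linked below) uses the Stream2 module:

```ocaml
(* A lazy stream: each cell is wrapped in a suspension. *)
type 'a stream_cell = Nil | Cons of 'a * 'a stream
and 'a stream = 'a stream_cell Lazy.t

type 'a queue = {
  front : 'a stream;
  front_len : int;  (* lengths stored explicitly for O(1) size checks *)
  rear : 'a stream;
  rear_len : int;
}

let empty = { front = lazy Nil; front_len = 0; rear = lazy Nil; rear_len = 0 }

(* Incremental lazy concatenation: forcing one cell does O(1) work. *)
let rec concat (a : 'a stream) (b : 'a stream) : 'a stream =
  lazy (match Lazy.force a with
        | Nil -> Lazy.force b
        | Cons (x, rest) -> Cons (x, concat rest b))

(* Monolithic lazy reversal: nothing happens until the result is
   forced, and then the whole O(n) reversal runs at once. *)
let reverse (s : 'a stream) : 'a stream =
  let rec go acc s =
    match Lazy.force s with
    | Nil -> acc
    | Cons (x, rest) -> go (lazy (Cons (x, acc))) rest
  in
  lazy (Lazy.force (go (lazy Nil) s))

(* Restores the invariant |front| >= |rear| by lazily moving the
   (reversed) rear to the end of the front. *)
let check q =
  if q.rear_len <= q.front_len then q
  else
    { front = concat q.front (reverse q.rear);
      front_len = q.front_len + q.rear_len;
      rear = lazy Nil;
      rear_len = 0 }

let push x q =
  check { q with rear = lazy (Cons (x, q.rear)); rear_len = q.rear_len + 1 }

let pop q =
  match Lazy.force q.front with
  | Nil -> None
  | Cons (x, rest) ->
      Some (x, check { q with front = rest; front_len = q.front_len - 1 })
```

Note how the rotation in check only builds suspensions: the reversal’s O(|rear|) work is deferred until the corresponding cell of the front stream is forced.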

The complete code for the stream queue is on Github.

Analysis using the Banker’s Method

The idea of the Banker’s Method is basically to define an invariant for the accumulated debt and a strategy for paying it off (that is, to decide how much debt each operation pays). Then we show that whenever we need to force a suspension, the invariant guarantees that its debt has been paid off. One property of the Banker’s method is that it allows associating the debt with specific locations of the data structure. This is particularly interesting for streams, because they contain multiple (nested) suspensions, so we might force parts of the structure before we’ve paid the debt associated with the entire structure.

By inspection, we can see that the unshared cost of both push and pop is O(1). It’s obvious in the case of push. For pop, check could in theory take O(m), where m is the size of the queue, but since Stream2.concat() and Stream2.reverse() are both lazy (and hence memoized), their costs are not included in the unshared cost.

To show that the amortized cost of both operations is O(1), we can show that paying off O(1) debt at each operation is enough to pay for the suspension before it is forced. For the queue, we also need to associate the debt with parts of the data structure, so that we could force the suspension of only some parts of it (for example, on the stream we can evaluate only the head, not necessarily the entire structure).

We now define an invariant that should be respected by all operations. Let d(i) be the debt at the i-th node of the front stream, and D(i) = \sum_{j=0}^{i} d(j) the accumulated debt up to node i. The invariant is:

D(i) \le \min(2i, |f| - |r|)

This constraint allows us to evaluate the head at any time, because D(0) = 0, meaning its debt has been paid off. The second term in min() guarantees that if |f| = |r| the entire stream can be evaluated, because D(i) = 0 for all i.

The author then proves that paying off one unit of debt in push() and two units in pop() is enough to keep the debt within the constraint.

Queue with suspended lists

Because the Physicist’s method cannot assign costs to specific parts of the data structure, it doesn’t matter whether the structure can be partially forced (like streams) or is monolithic. With that in mind, we can come up with a simpler implementation of the queue by working with suspended lists instead of streams. Only the front list has to be suspended, because the cost we want to avoid, reversing the rear list and concatenating it to the front list, happens on the front list.

On the other hand, we don’t want to force the suspended front list every time we perform a peek or pop, so we also keep an evaluated version of the front list.

The structure keeps a suspended (lazy) front list, an evaluated version of the front list, the rear list, and the list lengths. The invariants we want to enforce are that the front list is never smaller than the rear list, and that the evaluated version of the front list is never empty while the lazy version still has some elements.

The push and pop operations are similar to the other versions of the queue, but since they change the structure, we might need to adjust it to conform to the invariants.
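A self-contained sketch follows; the field and function names are my guesses based on the description, not the post’s actual code (linked below):

```ocaml
type 'a queue = {
  forced_front : 'a list;       (* evaluated prefix of the front list *)
  lazy_front : 'a list Lazy.t;  (* suspended front list *)
  front_len : int;
  rear : 'a list;               (* stored in reverse order, strict *)
  rear_len : int;
}

let empty =
  { forced_front = []; lazy_front = lazy []; front_len = 0;
    rear = []; rear_len = 0 }

(* Second invariant: forced_front may only be empty if the whole
   front is empty. *)
let check_front q =
  match q.forced_front with
  | [] -> { q with forced_front = Lazy.force q.lazy_front }
  | _ -> q

(* First invariant: |front| >= |rear|. When violated, force the old
   (already paid-off) front and suspend the expensive rotation. *)
let check q =
  if q.rear_len <= q.front_len then check_front q
  else
    let f = Lazy.force q.lazy_front in
    check_front
      { forced_front = f;
        lazy_front = lazy (f @ List.rev q.rear);
        front_len = q.front_len + q.rear_len;
        rear = [];
        rear_len = 0 }

let push x q =
  check { q with rear = x :: q.rear; rear_len = q.rear_len + 1 }

let pop q =
  match q.forced_front with
  | [] -> None
  | x :: rest ->
      Some (x, check { q with
                       forced_front = rest;
                       lazy_front = lazy (List.tl (Lazy.force q.lazy_front));
                       front_len = q.front_len - 1 })

(* Thanks to the second invariant, peek never forces lazy_front. *)
let peek q =
  match q.forced_front with
  | [] -> None
  | x :: _ -> Some x
```

Unlike the stream queue, the suspension here is monolithic: forcing lazy_front evaluates the whole reversal and concatenation at once, which is exactly why the Physicist’s method suffices for the analysis.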

Finally, because of our second invariant, peeking at the queue is straightforward.

The complete code for the suspended queue is on Github.

Analysis using the Physicist’s Method

We’ve seen the Physicist’s Method in a previous post, where we ignored the persistence of the data structures. Here we adapt the method to work with debits instead of credits. To avoid confusion, we’ll use \Psi to represent the potential function: \Psi(i) represents the accumulated debt of the structure at step i. At each operation we may decide to pay off some debt, which is then included in the amortized cost. We have that \Psi(i) - \Psi(i - 1) is the increase in debt after operation i. Remember that the shared cost of an operation corresponds to the increase in debt if we don’t pay any of it. Thus, the amount of debt paid off by operation i is s(i) - (\Psi(i) - \Psi(i - 1)), where s(i) is the shared cost of operation i. Let u(i) and c(i) be the unshared and complete costs of operation i. Given that, by definition, c(i) = u(i) + s(i), we can express the amortized cost (unshared cost plus debt paid) as:

a(i) = c(i) - (\Psi(i) - \Psi(i - 1))

To analyze the suspended queue we need to assign values to the potentials such that by the time we need to evaluate a suspension the potential on the structure is 0 (that is, the debt has been paid off). For the suspended queues we’ll use the following potential function:

\Psi(q) = \min(2|w|, |f| - |r|)

Where w is the forcedFront, f is lazyFront and r is rear.

We now claim that the amortized cost of push is at most 2. If we push an element and it doesn’t cause a rotation (i.e. doesn’t violate |f| \ge |r|), then |r| increases by 1 and the potential decreases by 1. No shared cost is incurred, and the unshared cost, inserting an element at the beginning of rear, is 1, hence the amortized cost for this case is 1 – (-1) = 2. If it does cause a rotation, then it must be that after the insertion |f| = m and |r| = m + 1. After the rotation we have |f| = 2m + 1 and |r| = 0, but w hasn’t changed and cannot be larger than the original f, so the potential function is at most 2m. The reversal of r costs m + 1 and concatenating to a list of size x costs x (discussed previously); adding the cost of initially appending the element to rear, the complete cost is 2m + 2. No suspensions were created, so the amortized cost is 2m + 2 - (2m) = 2.

Our next claim is that the amortized cost of pop is at most 4. Again, if pop doesn’t cause a rotation, |w| decreases by 1, so the potential is reduced by 2. The unshared cost is 1, removing an element from w, and the shared cost, also 1, comes from the suspension that lazily removes the head of lazyFront. The amortized cost is 2 – (-2) = 4. Note that if |w| = 0, we’ll evaluate f, but the idea is that its debt has been paid off already. If the pop operation causes a rotation, the analysis is similar to the push case, except that the complete cost must also account for the shared cost of lazily removing the head of lazyFront, so it’s 2m + 3, for an amortized cost of 3.

Note that when the suspensions are evaluated, the potential is 0, either when |f| = |r| or |w| = 0.


In this post we covered a simple data structure, the queue, modified it to be lazily evaluated, and showed, with theory, that it allows for efficient amortized costs. The math for proving the costs is not complicated. The hardest part for me was getting the intuition for how the analysis works. The analogy with debt is very useful.

Meta: since the wordpress code plugin doesn’t support syntax highlighting for OCaml, I’m experimenting with the Gist plugin. Other advantages are that Gists allow comments and have version control!

Notes on Coursera’s Functional Programming Principles in Scala


I’ve just concluded the Functional Programming Principles in Scala class offered by École Polytechnique Fédérale de Lausanne through Coursera. The lecturer is Martin Odersky, the creator of the language. The course is very easy and lasts 4 weeks.

In this post I’ll write down my notes from a perspective of someone who knows basic Java and functional programming, that is, this will cover mostly syntax.

The first thing we need to do is install the Java compiler version 1.8.* by downloading the latest SDK. Then we download sbt, the Scala Build Tool, a framework that handles compiling and testing (similar to Java’s Ant and Maven) and also offers a console (REPL). On a mac we can get sbt via brew (0.13.8):

brew install sbt

The recommended project organization is similar to Maven’s, but with separate directories for Java and Scala:
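The layout follows the standard sbt convention (a sketch, not taken from the course materials):

```
project-root/
  build.sbt
  src/
    main/
      java/
      scala/
      resources/
    test/
      java/
      scala/
      resources/
  target/
```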


The target subdirectory is for the output of generated files.

The build definition file is named build.sbt and is located at the top level directory. For the sake of this introduction, we’ll use two tools to test and run the code: sbt‘s console and a basic skeleton for running code in a file (for larger programs). The skeleton is a scala file located in /src/main/scala/Example.scala:

object Example extends App {
  println("hello world");

  // We'll add example code down here
}

and in sbt we can just do:

> run
hello world

Don’t worry about object and App. We’ll cover them later. Like in JavaScript, semi-colons to end expressions are optional and there’s no consensus on whether to always use them or only in the rare cases they’re actually needed. In this post, I’ll omit them for simplicity though in practice my personal preference is to always use semi-colons.


There are 3 ways to define a variable: def, val and var. The difference between val and var is simple: a variable declared using val cannot be reassigned, whereas one declared with var can (note that if a val variable points to a mutable object, the object itself can still be modified). We should use val whenever possible because it makes it easier to reason about code.

For val/var vs def, the difference is more subtle. Let’s first try these examples using sbt‘s console tool:

// Using def
scala> def xdef = util.Random.nextInt
scala> xdef
Int = 387633735
scala> xdef
Int = 305498057

// Using val
scala> val xval = util.Random.nextInt
scala> xval
Int = 1203502093
scala> xval
Int = 1203502093

We can see that for xdef, the value returned by xdef is different every time, while for xval it’s the same. For val, the code in the assignment is executed only once, while for def, the code is executed every time it’s called. We can think of def as an alias to the RHS. def can also be used to define functions as we’ll see next.


Functions can be defined using def and the parameter type is written after the param itself and the return type is written after the parameters block. A simple example:

def add(a: Int, b: Int): Int = a + b

We can invoke the function as:

val result = add(3, 4)

Curly braces are optional for one-liners, but we need them for multi-line functions:

def add(a: Int, b: Int): Int = {
  val a2 = a
  val b2 = b
  a2 + b2
}

The last line is always returned as value, so we don’t use a return keyword even for multi-line functions. We can also define nested functions, which is convenient when we don’t want to define auxiliary functions. For example, if we want to implement a tail-recursive factorial function:

import scala.annotation.tailrec

def factorial(n: Int) = {
  @tailrec def factorialAux(n: Int, f: Int): Int = {
    if (n == 0) f
    else factorialAux(n - 1, f * n)
  }
  factorialAux(n, 1)
}

In the code above, factorialAux() is defined inside factorial(), so it’s not visible to other functions. We also made use of the if/else construct to handle the base case, and because the auxiliary function is tail recursive, we can annotate it with @tailrec to hint to the compiler that it can be optimized (avoiding a stack frame per recursive call).


We can also have lambda/anonymous functions. In the code below we define squarer:

def squarer = (x: Int) => x * x
println(List(1, 2, 3) map squarer)

In the code above we’re also creating a List and using its map() method. In Scala, methods can be invoked without the . and, if there’s only one argument, parentheses can be omitted.

There’s an even shorter version in which underscores are used instead of variables. The parameters are assigned to the underscore in order. For example:

(a, b) => a + b

Can be written as

_ + _

This form is only valid when we’re passing it directly as argument, like below, but it cannot be assigned to a variable like regular lambdas:

println(List(1, 2, 3).reduce(_ + _))

Partial application

Scala supports partial application but, differently from languages like OCaml and Haskell, this is something the function definition must be explicit about. We do that by grouping arguments in parenthesis, for example:

def sum(a: Int)(b: Int): Int = a + b
def increment = sum(1) _

In the code above we can do a partial application on the first argument of sum(), but we need to use _ for the remaining arguments.
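For instance, reusing the definitions above, increment behaves like a normal one-argument function (the object name here is made up):

```scala
object PartialApp extends App {
  def sum(a: Int)(b: Int): Int = a + b
  // Partially apply the first argument; the trailing underscore
  // eta-expands the remaining group into a function Int => Int
  def increment = sum(1) _
  println(increment(41)) // prints 42
}
```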

Var arg

Functions support a variable length list of arguments:

def varArgFun(args: Int*) = println(args)
varArgFun(1, 3, 4)


Fun fact: Odersky was one of the people responsible for adding generics to Java.

Before describing generics, let’s discuss the type hierarchy in Scala. The main types are depicted in the image below. In particular, we have Any, which is the supertype of all types, and Nothing, which is a subtype of all types. We also have the division between value types (AnyVal) and reference types (AnyRef). In the diagram we can also see dashed arrows, which mean the types are not strictly subtypes but can be coerced to the other (for example, Ints can be converted to Longs).

Type Hierarchy (click for the full-size version)

Class Hierarchy (click for the full-size version)

In Scala generics are represented between [] (as opposed to the <> in Java). We can define functions that take a generic parameter as follows:

def singleton[T](a: T): T = a

From the invocation side, we can be explicit about which type we’re binding the generic T to, or let the compiler figure it out.

// Explicit Int type
println(singleton[Int](3)) // 3
// Explicit double type - casts 3 to double
println(singleton[Double](3)) // 3.0
// Implicit
println(singleton("str")) // str

We can also use constraints to limit which types we can provide. We can either constrain the types to be subtypes of a particular type ([T <: SuperType]), a supertype of a particular type ([T >: Subtype]) or both ([T >: Subtype <: Supertype]).

For example:

def identityIterable[T <: Iterable[Int]](a: T): T = a

This is slightly different from

def identityIterable(a: Iterable[Int]): Iterable[Int] = a

We can pass a subtype of Iterable, such as List[Int], to both versions, but the first version’s return type is List[Int] while the second’s is Iterable[Int].

Covariance and contra-variance

This question arises often when talking about generics. If Derived is a subclass of Base, is List[Derived] a subclass of List[Base]? Is it different for Arrays? We can try both examples:

def takesBaseList(a: List[Base]) = Nil
def takesDerivedList(b: List[Derived]) = takesBaseList(b)

// Compile error
def takesBaseArray(a: Array[Base]) = Nil
def takesDerivedArray(b: Array[Derived]) = takesBaseArray(b)

We’ll see the code compiles for Lists but not for Arrays. This means that List[Derived] is a subtype of List[Base], while Array[Derived] is not a subtype of Array[Base]. Because of this property, we say that List[T] is covariant on the type T, while Array[T] is not. The key difference is that Lists are immutable. The problem arises if takesBaseArray does the following:

class Base {}
class Derived extends Base {}
class AnotherDerived extends Base {}

def takesBaseArray(a: Array[Base]) = {
  // Mutates the array in place
  a(0) = new AnotherDerived()
}

Now, if we could pass an Array[Derived] as argument to takesBaseArray, because arrays are mutable, the array would end up containing an AnotherDerived element, even though its type is Array[Derived].

We can specify whether a class is covariant on type T, by adding a plus sign, for example,

class List[+T] ...

In general terms, a class C is covariant on a generic type T if, given a type Td which is a subtype of type Tb, then C[Td] is a subtype of C[Tb]. The contravariant property is when the implication is reversed, that is, if Td is a subtype of type Tb, then C[Tb] is a subtype of C[Td].

It’s less common for a class to be contravariant on a type. An interesting example presented in the videos is modeling a function that takes a single argument of type Tk and returns a value of type Tv. We can model this as a class:

class MyFunction[Tk, Tv](f: (Tk) => Tv) {
  def apply(x: Tk): Tv = f(x)
}

Now say we define a higher order function that takes our function type and simply calls apply, and that it expects a function type that takes a Derived and returns a Base. Could we pass a function with a different type to higherOrderFunction?

def higherOrderFunction(f: MyFunction[Derived, Base]) = {
  val x = f.apply(new Derived())
}

Because of the parameter type, we know higherOrderFunction will only provide values of type Derived (or subtypes) to f, so we can pass any function that takes a supertype of Derived. On the other hand, we know that higherOrderFunction() will only use methods from x (i.e. the return value of f) defined in Base, so we can pass any function that returns any subtype of Base. In other words, we could pass a function of type MyFunction[Base, Derived] to higherOrderFunction(). In the general case, MyFunction is covariant on its return type but contravariant on its input type, so we can modify our definition to:

class MyFunction[-Tk, +Tv](f: (Tk) => Tv) {
  def apply(x: Tk): Tv = f(x)
}

This part is a bit confusing to understand, but having a concrete example helped me.

Implicit parameters

Now that we’ve seen generics, we can go back to functions and talk about implicit parameters. If a function declares a parameter with the implicit modifier, for example

def foo[T](e: T)(implicit ord: Ordering[T]) = {...}

then the caller may omit that argument and the compiler will look for an implicit value of a matching type in scope.

A natural example for using implicit arguments is sorting. We’ll wrap List’s sortWith in a function, listSort, so that we can test passing an implicit comparator. See the example below with comments:

// First, we define the interface for the comparator
abstract class Comparator[T] {
  def compare(v1: T, v2: T): Boolean
}

// Because implicit cannot be applied to top-level objects, we'll
// define them within our sample App
object Example extends App {

 // Now we implement the comparators for Int and String. They
 // have to contain the 'implicit' modifier
 implicit object IntComparator extends Comparator[Int] {
   def compare(v1: Int, v2: Int): Boolean = v1 < v2;
 }
 implicit object StringComparator extends Comparator[String] {
   def compare(v1: String, v2: String): Boolean = v1 < v2;
 }

 // We have to define the implicit parameter in its own argument
 // group so the caller can omit it.
 def listSort[T](l: List[T])(implicit cmp: Comparator[T]) =
   l.sortWith(cmp.compare)

 // Testing with a list of Ints
 println(listSort(List(3, 2, 1)))
 // Testing with a list of Strings
 println(listSort(List("a", "c", "b")))
 // Error: no implicit Comparator[List[Int]] in scope, so this
 // line fails to compile
 // println(listSort(List(List(1), List(1, 3), List(1, 2, 3))))
}

A few things on the code to highlight:

* We have to define the implicit parameter in its own (curried) argument group
* We have to implement the implicit objects within another class/object
* If a type doesn’t have an implicit implementation, compilation fails.
* If a type has two or more implicit implementations, compilation fails due to ambiguity. We can see this by implementing another comparator for strings:

object Example extends App {
  // A second implicit Comparator[String] (the name here is made up)
  // makes implicit resolution ambiguous, so compilation fails
  implicit object AnotherStringComparator extends Comparator[String] {
    def compare(v1: String, v2: String): Boolean = v1 > v2
  }
  // ...
}

[2] has more information about implicit.


At this point we’ve seen classes already, but in this section we’ll talk about their specific features and syntax. Below is a minimal example of a class representing an integer. It covers constructors, public/private methods, overriding methods from ancestors and defining operators.

class MyNumber(x: Int) { // <- implicit constructor
  private val privateMember = x;

  // Explicit constructor
  def this() = this(0)

  // Methods are public by default
  def publicMethod() = 1

  // Private method
  private def privateMethod() = 1

  // Overriding methods
  override def toString = privateMember.toString

  // Handy operator (e.g. instead of sub)
  def - (that: MyNumber) =
    new MyNumber(privateMember - that.privateMember)

  // Unary operators: Convention
  // 'this' is optional but can be used when
  // accessing a member/method
  def unary_- = new MyNumber(-this.privateMember)
}

Abstract classes

Abstract classes are defined by adding the abstract modifier. We don’t need to add any modifier to abstract methods; we just leave them “un-implemented”. Abstract classes can provide method implementations too. Here’s an example:

abstract class Base {
  def toBeImplemented(v: Int): Int;

  def implemented(v: Int): Int = v + 1;
}


Scala doesn’t have interfaces, but it has traits. They look very similar to abstract classes in that they cannot be instantiated and can have abstract methods. The main difference is that a class can extend or mix in multiple traits. For example, say we have three traits:

trait TraitA {
  def abstractMethod(x: Int): Int
  def methodA(): String = "a";
}
trait TraitB {
  def methodB(): String = "b";
}
trait TraitC {
  def methodC(): String = "c";
}

A class can extend a trait as it would extend an abstract class, but additional traits have to use the with construct:

class UseTraits extends TraitA with TraitB with TraitC {
  // Needs to be implemented
  def abstractMethod(x: Int): Int = x + 1
}

If any of the traits have methods with colliding signatures, a compilation error happens. Note that traits basically enable multiple inheritance, something that is not allowed in Java. It’s a feature that can be abused, but I do like to use it in some rare cases.

Objects and static methods

An object is basically a singleton (or a utils/helper class). Objects cannot be explicitly instantiated, and their methods are equivalent to static methods in Java:

object MyObj {
  def get(): String = "value";
}


Interestingly, in Scala regular classes cannot have static methods. Either all methods are static (object) or none are (class). The workaround is that an object and a class can have the same name (a companion object), so in practice we can keep the static methods in the object and the non-static ones in the class:

class MyEntity {
  def nonStatic(): String = "non-static";
}
object MyEntity {
  def static(): String = "static"
}

I actually like this separation. Public static methods often don’t belong to a class, but are rather helpers or utils. By forcing them out of the class, Scala encourages placing them in an object, which might have a more general scope. For example, we could have a class User with a function called titleize(), which would return the name with the first letter capitalized.

class User {
  def getName(): String = User.titleize(name)
  // NOTE: This is invalid syntax. It's just for
  // demonstration purposes.
  def [static] titleize(name: String): String = ...
}

If we’re forced to move it to a different class, we can take a step up and put it into a helper class called StringUtils or NameUtils, which sits at a higher level of abstraction and hence is more reusable.


Pattern Matching

In Scala we can do pattern matching, similar to OCaml‘s. The only caveat is that we need to explicitly add the case modifier to a class for it to be matched against.

trait Expr
case class Number(n: Int) extends Expr
case class Sum(e1: Expr, e2: Expr) extends Expr

Now we can match a generic Expr to either Number or Sum, including destructuring the constructor arguments:

def show(e: Expr): String = e match {
  case Number(x) => x.toString
  case Sum(x, y) => show(x) + " + " + show(y)
}

We can also have destructuring assignments for some structures such as tuples and lists:

val (first, second) = (10, 11);
println(first, second);
val (x::xs) = List(1, 2, 3)
println(x, xs)


In this post we covered several aspects of the Scala language, which were introduced during the Coursera class, with some extra research here and there to understand the features better. We ended up not covering the data structures, because the post was getting too long.

This course is very easy for people with prior experience in Java and functional programming. It’s interesting to learn about specific language designs and the associated tradeoffs.

I’m looking forward to completing the other Scala classes in the series.

Further Reading

* Code Commit – Scala for Java Refugees (part 1 | part 2 | part 3)


[1] Stack Overflow – What is the difference between “def” and “val” to define a function
[2] Scala Documentation – Implicit Parameters

Domestic server using Raspberry Pi

There are tons of tutorials on setting up a domestic server using a Raspberry Pi. I’ll add one more to the mix by describing my experience and lessons learned in creating a simple server.

Raspberry Pi


Before starting, let’s first introduce some concepts and terminology. If you already know the basics of IP (Internet Protocol), feel free to skip to the next section (Hardware).


The scenario we’re using as an example is a typical home setup, in which we have a bunch of devices that access the internet through a router. The connections between these devices and the router form a private network.

IP Address. The router is assigned a public IP address by the ISP (Internet Service Provider, e.g. Comcast, AT&T, Verizon, etc.). This IP usually changes from time to time, so it’s also called a dynamic IP.

An IP (IPv4) address is a 32-bit integer, often written as 4 groups separated by dots, for example The creators of the IP address didn’t envision such explosive growth of the internet, and we’re now running out of IPv4 addresses. With that in mind, a new version, called IPv6, was designed, which uses 128 bits. IPv6 is not fully deployed yet, and for the purposes of this tutorial we’ll assume IPv4 throughout.

Public and Private IP Addresses. Because of the shortage of IPv4 addresses, we don’t have the luxury of assigning a unique IP address to every possible device that exists. To work around that, only the router has a unique IP address (the public IP). The devices in the local network are assigned what we call private IPs. While private IP addresses must be unique within one local network, they don’t have to be unique across multiple local networks (for example, my laptop might have the same private IP address as yours, but my router will have a different public IP address than yours).

To avoid confusion and routing problems, the sets of public and private IP addresses are disjoint. Private IPs must fall within one of these 3 ranges: (10.0.0.0 to 10.255.255.255), (172.16.0.0 to 172.31.255.255) and (192.168.0.0 to 192.168.255.255).

It’s up to the router to assign the IP addresses to the devices in the local area network.

Let’s analyze two use cases: a computer acting as a client and another where it acts as a host.

Client. A computer in the private network mostly acts as a client to some remote server, for example, when we open a URL using a browser (aka user agent) and receive back some HTML page.

Behind the scenes, first we need to resolve the URL to an actual IP address (the address of the host that will serve the HTML page). The operating system will request this by talking to a DNS server, which has a mapping from domains to IPs.

Then it executes an HTTP (or HTTPS) request to that remote address, using port 80 (or 443). Since we need to receive the response back, we also need to send the source’s IP. The problem is that the sender’s IP is a private IP and is not supposed to be used externally. Instead, the router assigns a random port, associates that port with the private IP address, and sends its own public IP address plus this random port as the source. When it receives the response back, it can route it to the right computer. Because it translates internal IPs to an external IP and vice-versa, we also say the router is a NAT (Network Address Translation) gateway.

Server. A less common scenario, and the one we explore in this tutorial, is when a computer in our private network serves as a host that external agents can talk to.

In this case we need to register a domain to get a user-friendly URL that maps to our external IP. We also need to instruct the router how to route the request to the computer acting as host. Since the external agent doesn’t know about the internals of our network, only about the external IP address, we manually need to tell the router what to do, and we do this via port forwarding. When a request to our external IP is made with a specific port, we’ll have a rule which tells the router to forward the request to a specific private IP address.

Wikipedia has much more information on this subject. With this brief introduction, we’re ready to start the tutorial:


I got the Raspberry Pi Model B+ (back in 2014), but as of January 2017, there’s already the Raspberry Pi 3 Model B with much better specs. For the record, my Pi has the following specs:

* CPU: 700MHz Broadcom BCM2835, ARM architecture
* RAM: 512 MB SDRAM @ 400MHz
* 10/100 Ethernet RJ45 on-board network

As peripherals:

* Wi-fi USB card: Edimax 150Mbps
* SD card for storage: Samsung EVO 32GB of space and 48MB/s transfer.
* Zebra Case (see photo below)
* A micro USB power adapter (5V/2000mA)

All these totalled around ~$100.


OS. I decided to try the Raspbian OS, which is a fork of Debian (wheezy) adapted for the Raspberry Pi. We first download it and write the image to the SD card.

We can then insert the card into the Raspberry Pi and connect a monitor/keyboard/mouse. The boot process will ask us to fill in some information and should be straightforward. The wi-fi adapter worked out of the box.

SSH. After installation, running the GUI felt very sluggish, so I decided to do everything through SSH. Instructables has a very detailed guide for enabling it through the UI. After changing the password as instructed in the guide, I got the internal IP of the Pi using

hostname -I

I can then connect through:

ssh pi@<IP>

We can then install everything through command line.

Text editor and webserver. I installed my favorite editor and the server nginx (we could have used Apache’s HTTP server alternatively).

sudo apt-get update
sudo apt-get install emacs nginx

To test if the server is installed properly, we can run it out of the box:

sudo /etc/init.d/nginx start

If you put the Pi’s IP on the browser address bar in your laptop you should be able to see the default page served.

Server Configuration

We can make some changes in the default nginx config to ease development.

We can edit the configuration file /etc/nginx/sites-available/default (it has to be edited as sudo). The first thing I changed was to have it serve files from my home folder instead of /usr/share/nginx/www:

server {
    root /home/pi/www;

Since I’m using it for private purposes, I turned HTTP authentication on. First we need to register a login entry in a .htpasswd file using the htpasswd application. This can be obtained on Debian via

sudo apt-get install apache2-utils

Then we run:

sudo htpasswd -c /etc/nginx/.htpasswd <USER>

replacing with the username you’ll provide at login. When running the command above, you’ll be prompted to provide the password. The user name and an encrypted password will be saved to /etc/nginx/.htpasswd. Now we can configure nginx to use credentials from that file to perform the access check:

server {
    auth_basic "Private Site";
    auth_basic_user_file /etc/nginx/.htpasswd;

We can now add a custom HTML file to /home/pi/www (or whatever path you put in the nginx config), such as /home/pi/www/index.html

<html>
  <head><title>Pi's webpage</title></head>
  <body>Hello world</body>
</html>

Restart the server and reload the page, and you should get the new custom page!

sudo /etc/init.d/nginx restart

In a future post we’ll see how to work with a Node.js server, but this is as far as we’ll go in this first tutorial.

Network Configuration

Static Internal IP. To make sure the internal IP of the Pi doesn’t keep changing, you might need to configure your router. My router is a MediaLink 300N, which automatically stores a table mapping MAC addresses (a unique identifier for your network hardware) to internal IPs, so I don’t have to do anything.

Static External IP. The remaining problem is your external IP. Unless you have asked for a static IP, chances are that your ISP (mine is Comcast) will change the external IP from time to time, so you don’t have much control over that.

Dynamic DNS. To solve that, first we need to get a domain (I registered a new one via Google Domains). You can configure it to point to a specific IP (your external IP), and this mapping will be stored in a DNS. The problem, as we said, is that your external IP might change, so we need to update the mapping periodically.

We don’t want to do this manually, so we can use a system like ddclient which runs a daemon on your server machine (the Pi in our case) that will periodically check the external IP and update the DNS entry with the new IP in case it has changed.

To install we can simply do

sudo apt-get install ddclient

We then need to configure it so it knows where to go to update the entry. The file lives in /etc/ddclient.conf (it needs to be edited as sudo). The configuration will depend on your domain provider. For Google Domains it will look like:
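Roughly the following sketch (the login/password pair is generated in the Google Domains dashboard; the exact fields vary with the ddclient version, so treat this as illustrative rather than definitive):

```
ssl=yes
protocol=dyndns2
use=web
server=domains.google.com
login=<generated username>
password=<generated password>
<your domain>
```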


There’s a good tutorial for how to setup ddclient using Google domains.

To run the daemon, we can do

sudo ddclient -debug

Port Forwarding. When we access a URL, the browser implicitly assumes port 80 (for HTTP) when routing that domain to the actual IP. We need to tell our router to forward requests on that port to a specific internal IP (otherwise how does it know whether a request should go to your laptop or to the Pi?). Most routers offer a way to perform this mapping, which is also known as port forwarding.

For my MediaLink router it’s under Advanced Settings > Virtual Server > Port Range Forwarding.

NOTE: There’s one current issue I haven’t been able to figure out. The port forwarding only seems to work if I access it from outside of my local network, that is, through my phone network or via some VPN. It might be some issue with MediaLink.


In this post we learned some details of the Internet Protocol and learned how to configure a Raspberry Pi to act as a server in a domestic network.


[1] How-To Geek – How to Forward Ports on your Router.
[2] Wikipedia – IP address
[3] Server Fault – How does the HTTP GET method work in relation to DNS protocol?
[4] Wikipedia – Classful network
[5] Page to test if a specific port is open and being forwarded
[6] Setting up a nginx server on a Raspberry Pi

2016 in Review

This is a meta-post to review what happened in 2016.

This year I improved my knowledge on Web Development, learning more about HTTPS, JavaScript development, Web workers. I read a book about human-computer interaction, The Design of Everyday Things.

From my last year’s resolutions, I finished reading Code Complete. I’ve started learning about OCaml and started reading Purely Functional Data Structures by Okasaki, while implementing the data structures introduced in this book in OCaml. I’m still only 1/4 of the way in, so I’ll keep it in my 2017 goals.

I’ve only managed to try out one data visualization project, the hex map, and I’ll continue exploring this area next year.

I missed enrolling in any Coursera classes and learning about either Scala or Spark, unfortunately.


The end of the year is a good time to look back and remember all the things I’ve done besides work and the technical blog.


In 2016 I was lucky to have travelled a lot. In the beginning of the year, I did a road trip around Arizona, where I visited some beautiful National Parks and monuments, and a Biosphere, which inspired a blog post!

1. Arizona: Saguaro NP; 2. Chiricahua Monument; 3. Blue Mesas at Petrified Forest NP

I’ve also been back to Brazil, which included a short trip to Caldas Novas in Goiás, where many resorts and hot springs are located.

Then I had an opportunity to work for a month in Tel Aviv, Israel. During my free time I visited Jerusalem, the Dead Sea and Masada National Park, and Eilat. I also visited the magnificent Petra, the ancient city carved on stone, in Jordan. This was the most fantastic and memorable trip of the year.

1. Yehuda Market in Jerusalem; 2. ruins in Petra, Jordan; 3. a beach in Tel Aviv, Israel

Later in the year I’ve been to Yellowstone and Grand Teton National Parks, and also stopped by some museums in Salt Lake City on the way.

1. Prismatic springs; 2. Yellowstone falls; 3. Grand Teton NP

Shorter trips included Austin TX, Seattle WA, Mammoth Lakes CA and Las Vegas NV. It was a pretty intense year in terms of travelling, including 2 new countries and 5 new US states. I’m very grateful for being able to see these places and I hope 2017 will also be full of exploration.


I read some really good books in 2016. Harari’s Sapiens: A Brief History of Humankind and Dawkins’ The Selfish Gene were my favorite science books. Empires of the Indus is a great book on the history of civilization around the Indus river.

Farewell to Manzanar is a touching biography of Jeanne Wakatsuki focusing on internment camps Japanese Americans were sent to, in particular Manzanar, during the Second World War. I had a chance to visit the historical site that exists there today.

I haven’t read much fiction but Steinbeck’s Of Mice and Men and Morgenstern’s The Night Circus were entertaining.

As for biographies, my favorite is Einstein‘s by Walter Isaacson. I had enjoyed his work on Steve Jobs and he didn’t disappoint here. Einstein is one of the few scientists I also admire as a person.


I’m not a big movie watcher, but this year I made a point of watching well known classic movies and it was rewarding. I’ve seen: A Clockwork Orange, Empire of the Sun, The Godfather, Schindler’s List, 2001: A Space Odyssey, Seven Samurai, Akira, My Neighbor Totoro and Gandhi.

The Blog in 2016

The most popular post was the Introduction to the Parsec Library, with 2k visits. No post from this year became popular; all of them got fewer than 50 visits :( Overall the blog had 7.5k visitors.

I kept the resolution to post once a month. I missed April, but I made up for it in November, so I wrote a total of 12 posts (excluding the meta post). The blog completed 4 years with 51 posts.

And we now have a new domain, :)

Resolutions for 2017

I missed many resolutions from last year, so I’ll carry some over, which includes finishing Okasaki’s Purely Functional Data Structures and learning about Scala and Spark. I’d like to write an iPhone app next year too and continue exploring ideas around data visualization.

Amortization Analysis

In Chapter 5 of Purely Functional Data Structures, Okasaki introduces the amortization analysis as preparation for the complexity analysis of functional data structures.

The idea of amortized analysis is to bound the average cost per operation over a sequence of operations; this average can be much smaller than the worst-case cost of an individual operation.

In this post we’ll perform the analysis of 2 data structures and show that their amortized running time is much better than their worst case. There are 2 methods presented in the book, the Banker’s Method and the Physicist’s Method. I found the Physicist’s easier to work with, so I’ll only use that. Also, I find it easier to understand concepts through examples, so we’ll first introduce each data structure and then present its analysis.

Queue with lists

An easy way to implement the queue data structure is by using two lists, one representing the beginning (left) and the other the end (right). Inserting/removing an element at the beginning of a list in OCaml is O(1), while doing it at the end is O(n). For this reason, we store the end of the queue reversed in the right list. (In imperative languages using arrays the opposite is true: insertions/removals at the end are O(1) and at the beginning O(n), so there it’s the left part that would be stored reversed.)


Queue using lists representing [1, 2, 3, 4, 5, 6]

Insertions are performed by prepending an element to the right list. Removing from the front of the queue removes the first element of the left list. When the left list is empty, we reverse the contents of the right list and move them to the left.

We maintain an invariant: the left list is only empty if the right list is empty as well. That requires a slight change to the removal, where we perform the reversal and move of the right list when the left list has only one element and is about to become empty.

I’ve implemented this idea in OCaml.
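The idea above can be sketched as follows (a minimal version with names of my choosing, not necessarily the ones in the actual implementation):

```ocaml
(* A queue is a pair (left, right); right is stored reversed. *)
type 'a queue = 'a list * 'a list

let empty = ([], [])

(* Re-establish the invariant: left may only be empty if right is too. *)
let check (q: 'a queue): 'a queue = match q with
  | ([], right) -> (List.rev right, [])
  | _ -> q

(* Insert at the end of the queue: cons onto the reversed right list, O(1). *)
let push (x: 'a) ((left, right): 'a queue): 'a queue =
  check (left, x :: right)

(* Remove from the front: O(1), except when the left list empties out and
   the right list must be reversed, which is O(n). *)
let pop ((left, right): 'a queue): ('a * 'a queue) option = match left with
  | [] -> None
  | x :: xs -> Some (x, check (xs, right))
```

For example, `pop (push 2 (push 1 empty))` returns the element 1 together with the queue still holding 2.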

Analysis – The Physicist’s Method

If we only look at the worst case, we will get that removing from a queue is an O(n) operation. We can improve this by looking at the amortized cost. One way to perform this analysis is via the Physicist’s method, in which we define a potential (or cost) function \Phi(d) for the data structure d. In the queue case, we can define it to be the size of the right list.

The amortized cost of step i, a_i is defined by

a_i = t_i + \Phi(d_i) - \Phi(d_{i-1})

where t_i is the actual cost (in number of operations) of step i.

The intuition is that some costly operations, such as reversing the list, which takes O(n) operations, can only happen after O(n) insertions have happened, so the average cost per operation is O(1). Translating this to the potential analogy, each insertion operation increases the potential of the data structure by 1, so by the time a reversal happens, the right list is emptied and that causes a loss of potential of n, which had been accumulated and offsets the actual cost t_i of the operation.

We can get the accumulated amortized cost of the operations up to a specific step by summing them, and we’ll get

A_i = T_i + \Phi(d_i) - \Phi(d_0)

If we choose \Phi(d) such that \Phi(d_i) \ge \Phi(d_0), for all i, then A_i is an upper bound of T_i. Since we assume d_0 is the empty queue, \Phi(d_0) = 0 and we have \Phi(d_i) \ge 0, this property is satisfied.
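As a rough concrete check (using the implementation sketched earlier, where the reversal happens during a pop): a push costs t_i = 1 and raises the potential by 1, while a pop that triggers the reversal of a right list of size n costs t_i = n + 1 and drops the potential from n to 0:

```latex
a_{\text{push}} = 1 + (\Phi(d_{i-1}) + 1) - \Phi(d_{i-1}) = 2
\qquad
a_{\text{pop with reversal}} = (n + 1) + 0 - n = 1
```

Both are O(1), as claimed.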

One important remark is that this analysis is only applicable to ephemeral (non-persistent) data structures, which is not the case for purely functional ones in OCaml. We present the OCaml code as an exercise; the correct analysis will be done in later chapters.

Splay Trees

Splay Tree is a binary search tree. The insertion of an element e consists of splitting the current tree into 2 subtrees: one with all elements less than or equal to e, which we’ll call small, and another with all elements larger than e, which we’ll call big. We then make e the root of the tree, with small as the left subtree and big as the right subtree.



The partition operation is O(height of tree). The intuition is: when building the small subtree, imagine we’re analyzing a node x. If x > e, then x and all elements in its right subtree are also > e, so none of them belong to small and we can discard the right subtree, looking only at the left subtree. A symmetric idea applies when building the big subtree.

In an ideal scenario, the tree will be well balanced in such a way that its height is O(log n), and so is the insertion operation. But that’s not always the case, so we introduce a balancing strategy: whenever we follow 2 lefts in a row when searching for the big subtree, we perform a right rotation. Similarly, when following 2 rights in a row when searching for the small subtree, we perform a left rotation.

These re-balancings are enough to make partition, and hence insertion, run in O(log n) amortized time, as we shall see in the next section.



The operation for finding the minimum element is trivial. To delete the minimum element, we also perform rotations. Since we only traverse left paths, we only have to do right rotations.

I’ve implemented these ideas in OCaml.
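A sketch of partition and the heap operations in OCaml (my own transcription, with integer elements for simplicity; names are mine):

```ocaml
type tree = Leaf | Node of tree * int * tree

(* partition pivot t returns (small, big): the elements <= pivot and the
   elements > pivot. The "two in a row" cases perform the rotations
   described above. *)
let rec partition (pivot: int) (t: tree): tree * tree =
  match t with
  | Leaf -> (Leaf, Leaf)
  | Node (l, x, r) ->
    if x <= pivot then
      (match r with
       | Leaf -> (t, Leaf)
       | Node (rl, y, rr) ->
         if y <= pivot then
           (* two rights in a row while building small: left rotation *)
           let (small, big) = partition pivot rr in
           (Node (Node (l, x, rl), y, small), big)
         else
           let (small, big) = partition pivot rl in
           (Node (l, x, small), Node (big, y, rr)))
    else
      (match l with
       | Leaf -> (Leaf, t)
       | Node (ll, y, lr) ->
         if y <= pivot then
           let (small, big) = partition pivot lr in
           (Node (ll, y, small), Node (big, x, r))
         else
           (* two lefts in a row while building big: right rotation *)
           let (small, big) = partition pivot ll in
           (small, Node (big, y, Node (lr, x, r))))

let insert (x: int) (t: tree): tree =
  let (small, big) = partition x t in
  Node (small, x, big)

let rec find_min (t: tree): int =
  match t with
  | Leaf -> raise Not_found
  | Node (Leaf, x, _) -> x
  | Node (l, _, _) -> find_min l

(* Delete the minimum by rotating along the left spine. *)
let rec delete_min (t: tree): tree =
  match t with
  | Leaf -> raise Not_found
  | Node (Leaf, _, r) -> r
  | Node (Node (Leaf, _, m), y, r) -> Node (m, y, r)
  | Node (Node (l, x, m), y, r) -> Node (delete_min l, x, Node (m, y, r))
```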

Amortized Analysis of Splay Trees

We first define a potential function. Given a tree t and a node x \in t, let t(x) be the subtree of t rooted at x. Let \phi(t(x)) = \log(|t(x)| + 1) be the potential of node x, where |t(x)| is the number of nodes of the subtree rooted at x, and let the potential of a tree t, \Phi(t), be the sum of the potentials of all nodes of t, that is,

(1) \Phi(t) = \sum_{x \in t} \phi(t(x)).

The amortized cost of calling partition() is the actual cost of the partition, plus the potential of the resulting trees (small and big), minus the potential of the original tree. Let’s denote small(t) as s, and big(t) as b.

(2) \mathcal{A}(t) = \mathcal{T}(t) + \Phi(s) + \Phi(b) - \Phi(t)

We can prove that \mathcal{A}(t) \le 1 + 2 \log(|t|) (see appendix), which means the amortized cost of the partition operation is O(\log n). It implies that insertion also has an amortized cost of O(\log n).

The same idea can be used to prove the amortized costs of the deletion of the minimum element. Okasaki claims that splay trees are one of the fastest implementations for heaps when persistence is not needed.


I found amortized analysis very hard to grasp. My feeling is that the amortized analysis is too abstracted from the real problem, making it not very intuitive. It’s also not clear how one picks the potential function and whether there’s a choice of potential that might result in a better than the “actual” amortized cost.


As we can see in code, there are four cases to consider: left-left, left-right, right-left and right-right. The left-* and right-* are symmetric, so let’s focus on the left-left and left-right cases. For the left-left case, suppose we perform the following partition:

Left-left case of partition()

Left-left case of partition()

We start with (2)

(3) \mathcal{A}(t) = \mathcal{T}(t) + \Phi(s) + \Phi(b) - \Phi(t)

Since \mathcal{T} corresponds to the number of recursive calls to partition, \mathcal{T}(t) = 1 + \mathcal{T}(u), so

(4) = 1 + \mathcal{T}(u) + \Phi(s) + \Phi(b) - \Phi(t)

The recursive call to u yields (s’, b’), so we have the equivalence, \mathcal{A}(u) = \mathcal{T}(u) + \Phi(s') + \Phi(b') - \Phi(u),

(5) = 1 + \mathcal{A}(u) - \Phi(s') - \Phi(b') + \Phi(u) + \Phi(s) + \Phi(b) - \Phi(t)

Since s = s’,

(6) = 1 + \mathcal{A}(u) - \Phi(b') + \Phi(u) + \Phi(b) - \Phi(t)

and \Phi(b) - \Phi(b') = \phi(b(x)) + \phi(b(y)) + \Phi(c) + \Phi(d), and \Phi(t) - \Phi(u) = \phi(t(x)) + \phi(t(y)) + \Phi(c) + \Phi(d), we can cancel out some terms and simplify the equation above to

(7) = 1 + \mathcal{A}(u) + \phi(b(x)) + \phi(b(y)) - \phi(t(x)) - \phi(t(y))

by induction, \mathcal{A}(u) \le 1 + 2 \phi(u),

(8) \le 2 + 2\phi(u) + \phi(b(x)) + \phi(b(y)) - \phi(t(x)) - \phi(t(y))

Since u is a subtree of t(y), \phi(u) < \phi(t(y)), and since the nodes in b(y) are a subset of those in t(x), \phi(b(y)) \le \phi(t(x)), so

(9) < 2 + \phi(u) + \phi(b(x))

It’s possible to show that, given x, y, z such that y + z \le x, then 1 + \log y + \log z < 2 \log x. We have that |t| = |u| + |c| + |d| + 2, while |b(x)| = |c| + |d| + 1, thus |u| + |b(x)| < |t|. By the relation above, 1 + \log (|u|) + \log (|b(x)|) < 2 \log (|t|), that is, \phi(u) + \phi(b(x)) + 1 < 2 \phi(t), which combined with (9) yields

(10) \mathcal{A}(t) < 1 + 2 \phi(t).

It’s very subtle, but in this last step we used the assumption of the rotation, in particular in |b(x)| = |c| + |d| + 1; if we hadn’t performed the rotation, we would have |b(x)| = |b'| + |c| + |d| + 2, and |u| + |b(x)| < |t| wouldn’t hold!

For the left-right case, we can do a similar analysis, except that now we’ll apply the recursion to c, and the resulting partition will look different, as depicted below:

Left-right case of partition()

Left-right case of partition()

In this case, (8) will become

(11) \le 2 + 2\phi(c) + \phi(b(x)) + \phi(b(y)) - \phi(t(x)) - \phi(t(y))

We then use that \phi(c) < \phi(t(y)) and \phi(t(y)) < \phi(t(x)), which allows us to cancel out some terms to get

(12) < 2 + \phi(b(x)) + \phi(b(y))

Similarly to what we did before, we can show that |b(x)| + |b(y)| = |s| + |b| = |t|, since the partitions of t are mutually exclusive. We can then use the result from the other case and get (10).


[1] Wikipedia – Amortized Analysis

Streams and Lazy Evaluation in OCaml

In this short post we’ll discuss lazy evaluation in OCaml and study a data structure called Stream. It’s based mainly on chapter 4 of Purely Functional Data Structures and it’s part of a series of study notes on that book.

Lazy Evaluation

Lazy evaluation is a property in which an expression is not evaluated immediately (suspended) and, once it’s evaluated the first time, the result is cached (memoized) for subsequent calls. Functional languages like Haskell are lazily evaluated, but not OCaml, which is eagerly evaluated. Because the results are memoized, expressions that are lazily evaluated must always return the same value given the same inputs. In Haskell this is easy to enforce because functions are pure, that is, they do not rely on side effects.


Lazy? Source: Flickr – Brian Gratwicke

In the book the author defines a notation for lazy evaluation:

datatype a susp = $ of a

In OCaml, we can work with lazily evaluated expressions through the Lazy module. The definition of a suspension is similar:

type 'a t = 'a lazy_t

and we can use the lazy construct. Let’s define a simple expensive function, a naive Fibonacci, which runs in O(2^n) time:

let rec fibo n =
  if n <= 1 then 1
  else (fibo (n - 1)) + (fibo (n - 2))

We can create a lazy evaluated version of it:

let lazy_fibo n = lazy (fibo n);;

We can see that by assigning it to a variable, it doesn’t cause the function to be executed:

let r = lazy_fibo 42;;

The author defines a matching operator ($) that causes a lazy expression to be evaluated, but I couldn’t find a corresponding operator in OCaml. Nevertheless, the Lazy module has the force() function, which does exactly that:

Lazy.force r;; (* It might take a while! *)

Note that if we execute the same expression again, the second time it returns much faster, because of the memoization.

We are now ready to introduce the stream data structure.


A stream is a lazy version of a linked list. Recall that a linked list is composed of nodes which point to the next node, or equivalently, to the remaining of the list. The usual definition of a linked list is:

type 'a node = Nil | Node of 'a * 'a node

If we want to be explicit that a node is actually pointing to a sublist, we could use an intermediate type, listType:

type 'a node = Nil | Node of 'a * 'a list
and 'a list = 'a node

Note that node and list are mutually recursive (they depend on each other), so we have to define them together by using the and construct.

In a stream the pointer to the remaining of the list is lazily evaluated, so the type is:

type 'a streamCell = Nil | StreamCell of 'a * 'a stream
and 'a stream = ('a streamCell) Lazy.t

With this basic structure we can implement many of the list functions for streams.


Let’s start with the concat operator (++):

let rec (++) (streamA: 'a stream) (streamB: 'a stream): ('a stream) =
  let computedStreamA = Lazy.force streamA in
  match computedStreamA with
    | Nil -> streamB
    | StreamCell (elem, rest) -> lazy (StreamCell (elem, rest ++ streamB))

Note that it never evaluates streamB and it only evaluates the first cell of streamA.

To help us test, we can define a function to convert from a list:

let rec fromList (l: 'a list): ('a stream) = match l with
  | [] -> lazy Nil
  | x :: xs -> lazy (StreamCell (x, fromList xs))

and a function that forces the evaluation of the entire stream, essentially converting it back to a list:

let rec toList (stream: 'a stream): ('a list) =
  let computedStream = Lazy.force stream in
  match computedStream with
    | Nil -> []
    | StreamCell (elem, rest) -> elem :: (toList rest)


The take(n) function returns the first n elements from a stream. Like the concat function, only the first node of the stream is evaluated. The recursive call is suspended.

let rec take (n: int) (stream: 'a stream) : ('a stream) =
  if n = 0 then lazy Nil
  else
    let computedStream = Lazy.force stream in
    match computedStream with
      | Nil -> lazy Nil
      | StreamCell (elem, rest) -> lazy (StreamCell (elem, (take (n - 1) rest)))


The drop(n) function removes the first n elements from a stream and returns the result. In this case, we need to evaluate all the n recursive calls:

let rec drop (n: int) (stream: 'a stream): ('a stream) =
  if n = 0 then stream
  else
    let computedStream = Lazy.force stream in
    match computedStream with
      | Nil -> lazy Nil
      | StreamCell (_, rest) -> drop (n - 1) rest

take and drop look very similar, but one is lazy while the other is not. That’s because the head of the stream is not suspended, but the tail is. In the drop case, we need to find the (n+1)-th element, which will be the new head of the stream. In the take case, we’re not changing the head, and since the tail is suspended, it can wait.
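One payoff of this design is that streams can be infinite: take still terminates because it never forces more cells than asked for. A self-contained sketch (with its own minimal definitions, so the names differ slightly from the code in this post):

```ocaml
type 'a cell = Nil | Cell of 'a * 'a stream
and 'a stream = ('a cell) Lazy.t

(* take never forces more than n cells, so it works on infinite streams. *)
let rec take (n: int) (s: 'a stream): 'a stream =
  if n = 0 then lazy Nil
  else
    lazy (match Lazy.force s with
      | Nil -> Nil
      | Cell (x, rest) -> Cell (x, take (n - 1) rest))

(* Force the whole stream into a list; only safe on finite streams. *)
let rec to_list (s: 'a stream): 'a list =
  match Lazy.force s with
  | Nil -> []
  | Cell (x, rest) -> x :: to_list rest

(* The infinite stream n, n+1, n+2, ... *)
let rec nats (n: int): int stream = lazy (Cell (n, nats (n + 1)))

let () = assert (to_list (take 3 (nats 0)) = [0; 1; 2])
```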


The reverse function reverses the order of the elements in a stream. In this case it’s more obvious that, since we’re changing the location of the head, it must be eagerly evaluated.

let reverse (stream: 'a stream): ('a stream) =
  let rec reverse' = fun oldStream newStream ->
    let computedStream = Lazy.force oldStream in
    match computedStream with
      | Nil -> newStream
      | StreamCell (elem, rest) -> reverse' rest  (lazy (StreamCell (elem, newStream)))
  in reverse' stream (lazy Nil)


In this post we saw that OCaml is not lazily evaluated, but we can rely on the Lazy module to accomplish laziness. We also learned a new data structure, the stream, which is recursively lazily evaluated; operations like concat and take play well with laziness, while others like drop and reverse do not.

The full implementation with comments is available on github.


[1] OCaml Module Lazy
[2] cyocum – Mutually Recursive Types
[3] Implementing lazy from scratch in OCaml

US as an hexagonal map

In this post we’ll study a way to visualize maps in a hexagonal grid, in which each entity has uniform area. We’ll then model that as a mathematical problem.

One challenge in displaying data in maps is that larger area countries or states tend to get more attention than smaller ones, even when, economically or population-wise, the smaller state is more relevant (e.g. New Jersey vs. Alaska). One idea is to normalize the areas of all the states by using symbols such as squares. Recently I ran into an NPR map that used hexagons and it looked very neat, so I decided to try building it in D3 and perform some analysis.

Below is the result of plotting the state populations (log scale):

US Hexmap: Population (log scale)

US Hexmap: Population (log scale)

One important property of visualizing data in maps is familiarity of the location (you can easily find specific states because you remember where they are) and also adjacency patterns can provide insights. For example, if we plot a measure as a choropleth map and see that the West coast is colored differently from the Midwest, then we gain an insight we wouldn’t have by looking at a column chart for example.

Because of this, ideally the homogeneous area maps should preserve adjacencies as much as possible. With that in mind, we can come up with a similarity score. Let X be the set of pairs of states that share a border in the actual US map. Now, let Y be the set of pairs of states that share a border in the hexagonal map (that is, two hexagons sharing a side). The similarity score is the size of their symmetric difference and we can normalize by the size of the original:

(|X - Y| + |Y - X|) / |X|

The lower the score the better. In an ideal case, the borders sets would match perfectly for a score of 0.

The size of the symmetric difference between the two sets seems like a good measure of similarity, but I’m not sure about the normalization factor. I initially picked the size of the union of X and Y, but this wouldn’t let us model the problem as a linear program, as we’ll see next. The problem with using the size of X is that the score could theoretically be larger than 1; but it’s trivial to place the hexagons in the grid in such a way that none of them are touching, making Y empty, so we can assume an optimal score is between 0 and 1.
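The score itself is straightforward to compute. A small sketch in OCaml (my own, assuming each border is stored as a canonically ordered pair of state names so that equality works directly):

```ocaml
(* Similarity score between the real border set x and the hex-map border
   set y: (|X - Y| + |Y - X|) / |X|. Borders are (state, state) pairs,
   assumed to be stored in a canonical order (e.g. alphabetical). *)
let score (x: (string * string) list) (y: (string * string) list): float =
  let diff a b = List.filter (fun e -> not (List.mem e b)) a in
  float_of_int (List.length (diff x y) + List.length (diff y x))
    /. float_of_int (List.length x)
```

For example, if the real map has borders CA–NV and CA–OR but the hex map only preserves CA–NV, the score is (1 + 0) / 2 = 0.5.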

Hexgrid coordinates convention

The score of the NPR map is 0.67.

An Optimization Problem

Let’s generalize and formalize the problem above as follows: given a graph G = (V,E), and another graph H = (V_H, E_H) representing our grid, find the induced subgraph of H, I = (V_I, E_I), such that there’s a bijection f: V \rightarrow V_I and the size of the symmetric difference of f(E) and E_I is minimized (f(E) is an abuse of notation, but it means applying the bijection to each vertex in the set of edges E).

To make it clearer, let’s apply the definition above to the original problem. G represents the adjacency of states in the original map. V is the set of states and E is the set of pairs of states that share a border. H is the hexagonal grid: V_H is the set of all hexagons and E_H is the set of pairs of hexagons that are adjacent. We want to find a subset of the hexagons where we can place each of the states (hence the bijection from states to hexagons), and if two adjacent hexagons in the grid both have states placed on them, we consider those states to be adjacent, hence the need for an induced subgraph, so that the adjacency in the grid is preserved.

Is this general case an NP-hard problem? We can reduce the Graph Isomorphism problem to it. That problem consists of deciding whether two graphs A and B are isomorphic. If we set G = A and H = B, then A and B are isomorphic if and only if I = H and the symmetric difference of f(E) and E_I is 0. The caveat is that it’s not known whether Graph Isomorphism is NP-Complete.

What if G is planar (which is the case for maps)? I haven’t given much thought about it, but I decided to come up with an integer programming model nevertheless.

An Integer Linear Programming Model

Note: the model uses the original grid analogy instead of the more general problem so that the constraints are easier to understand.

Boolean algebra as linear constraints

Before we start, we need to recall how to model logical constraints (AND, OR and EQUAL) using linear constraints. Let a and b be binary variables. We want x to be the result of logical operators applied to a and b.

For AND, we can do (x = 1 if and only if a = 1 and b = 1)

x \le a
x \le b
x \ge a + b - 1

For OR, we can do (x = 0 if and only if a = 0 and b = 0)

x \ge a
x \ge b
x \le a + b

For EQUAL, we can do (x = 1 if and only if a = b)

x \le 1 - (a - b)
x \le 1 - (b - a)
x \ge a + b - 1
x \ge -(a + b - 1)
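These three encodings can be checked exhaustively over all binary assignments. Below is a small Python sketch (function names are illustrative, not from the post) where each `*_ok` function tests whether a candidate x satisfies the corresponding linear constraints:

```python
from itertools import product

def and_ok(x, a, b):
    # x <= a, x <= b, x >= a + b - 1
    return x <= a and x <= b and x >= a + b - 1

def or_ok(x, a, b):
    # x >= a, x >= b, x <= a + b
    return x >= a and x >= b and x <= a + b

def eq_ok(x, a, b):
    # x <= 1 - (a - b), x <= 1 - (b - a), x >= a + b - 1, x >= -(a + b - 1)
    return (x <= 1 - (a - b) and x <= 1 - (b - a)
            and x >= a + b - 1 and x >= 1 - (a + b))

for a, b in product((0, 1), repeat=2):
    # each encoding admits exactly the intended value of x and rejects the other
    assert [x for x in (0, 1) if and_ok(x, a, b)] == [a & b]
    assert [x for x in (0, 1) if or_ok(x, a, b)] == [a | b]
    assert [x for x in (0, 1) if eq_ok(x, a, b)] == [1 if a == b else 0]

print("all encodings verified")
```

For each of the four assignments of (a, b), exactly one value of x is feasible, which is what lets us treat x as a defined variable rather than a free one.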

We can introduce a notation and assume these constraints can be generated by a function. For example, when we write x = \mbox{EQ}(a, b), we mean the four constraints defined above for modeling EQUAL. This is discussed in [2].


Let G be the set of pairs (x,y) representing the grid positions. Let P be the set of pieces p that have to be placed in the grid. Let N(x,y) be the set of pairs (x',y') that are adjacent to (x, y) in the grid.

Let A_{p1, p2} represent whether pieces p1 and p2 are adjacent to each other in the original dataset.

“Physical” constraints

Let b_{x,y,s} be a binary variable that equals 1 if and only if state s is placed at position (x, y).

1) A piece has to be placed in exactly one spot in the grid:

\sum_{(x,y) \in G} b_{x,y,p} = 1 for all p \in P

2) A spot can be occupied by at most one state:

\sum_s b_{x,y,s} \le 1 for all (x,y) \in G
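A minimal sketch (not the author's code) of how constraints (1) and (2) could be generated for a solver: each constraint row is a dict of coefficients over variables b[(x, y, p)], a sense, and a right-hand side. The function name and row format are illustrative assumptions.

```python
def physical_constraints(grid, pieces):
    rows = []
    for p in pieces:
        # (1) piece p is placed in exactly one spot: sum over all cells == 1
        rows.append(({(x, y, p): 1 for (x, y) in grid}, "==", 1))
    for (x, y) in grid:
        # (2) spot (x, y) holds at most one piece: sum over all pieces <= 1
        rows.append(({(x, y, p): 1 for p in pieces}, "<=", 1))
    return rows

grid = [(x, y) for x in range(2) for y in range(2)]
rows = physical_constraints(grid, ["AZ", "NM"])
print(len(rows))  # 2 piece rows + 4 spot rows = 6
```

This illustrates the model's shape: the number of rows is |P| + |G|, while the number of b variables is |P| \cdot |G|.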

Adjacency constraints

Let a_{p1, p2, x, y} be a binary variable that equals 1 if and only if piece p1 is placed at (x, y) and is adjacent to p2 in the grid.

3) a_{p1, p2, x, y} has to be 0 if p1 is not at (x, y) or p2 is not placed on any neighbor of (x, y):

a_{p1, p2, x, y} = \mbox{AND}(\sum_{(x', y') \in N(x, y)} b_{x', y', p2}, b_{x,y,p1})

We then have that a_{p1, p2, x, y} is 1 if and only if p1 is at (x, y) and p2 is on one of (x, y)’s neighbors. Note that the first argument of the AND is a sum of binary variables, but since p2 occupies exactly one cell, that sum is itself 0 or 1, so the linearization from the previous section still applies.

Finally, we can model the adjacency between two pieces p1 and p2. Let a_{p1, p2} be a binary variable that equals 1 if and only if p1 and p2 are adjacent in the grid:

a_{p1, p2} = \sum_{(x,y) \in G} a_{p1, p2, x, y}
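We can sanity-check this definition by evaluating it for a fixed placement, where the AND of two known binary values is just their minimum, and comparing against a direct neighborhood lookup. A sketch under assumed names (the toy placement and grid size are illustrative):

```python
def N(x, y):
    # the hexagonal neighborhood used later in the post
    return {(x - 1, y), (x + 1, y), (x - 1, y - 1), (x, y - 1),
            (x - 1, y + 1), (x, y + 1)}

place = {"p1": (2, 2), "p2": (3, 2), "p3": (0, 0)}  # piece -> grid cell

def b(x, y, p):
    return 1 if place[p] == (x, y) else 0

def a(p1, p2, cells):
    # a_{p1,p2} = sum over cells of AND(p2 on a neighbor of (x,y), p1 at (x,y));
    # with fixed 0/1 values, AND is min
    return sum(min(sum(b(nx, ny, p2) for (nx, ny) in N(x, y)), b(x, y, p1))
               for (x, y) in cells)

cells = [(x, y) for x in range(5) for y in range(5)]
assert a("p1", "p2", cells) == 1   # (3, 2) is the east neighbor of (2, 2)
assert a("p1", "p3", cells) == 0   # (0, 0) is far from (2, 2)
print("formula matches direct adjacency")
```

Since p1 occupies exactly one cell, at most one term of the sum is non-zero, so a_{p1, p2} is indeed binary.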

Symmetric difference constraints

Let y_{p1, p2} be a binary variable intended to equal 1 if and only if a_{p1, p2} \ne A_{p1, p2}.

4) y_{p1, p2} \ge a_{p1, p2} - A_{p1, p2}
5) y_{p1, p2} \ge A_{p1, p2} - a_{p1, p2}

These only force y_{p1, p2} \ge |a_{p1, p2} - A_{p1, p2}|, but since we’ll minimize the sum of the y’s, equality holds at the optimum.

Objective function

The sum of all the y’s is the size of the symmetric difference:

\min \sum_{p1, p2 \in P} y_{p1, p2}.
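To see the objective in action without a solver, here is a brute-force stand-in on a toy instance: enumerate all placements of three pieces on a 2×2 grid and take the minimum symmetric difference against a target adjacency. For brevity the toy uses plain 4-adjacency instead of the hexagonal neighborhood; names and data are illustrative.

```python
from itertools import permutations

cells = [(x, y) for x in range(2) for y in range(2)]
pieces = ["a", "b", "c"]
target = {("a", "b"), ("b", "c")}  # A_{p1,p2}: a-b and b-c should touch

def adjacency(place):
    # realized adjacency of a placement, as a set of ordered pairs p1 < p2
    pairs = set()
    for p1 in pieces:
        for p2 in pieces:
            if p1 < p2:
                (x1, y1), (x2, y2) = place[p1], place[p2]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    pairs.add((p1, p2))
    return pairs

best = min(len(adjacency(dict(zip(pieces, chosen))) ^ target)
           for chosen in permutations(cells, len(pieces)))
print(best)  # an L-shaped placement realizes both adjacencies -> 0
```

The ILP searches this same space implicitly; the brute force is only viable because the toy has 24 placements.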

Practical concerns

This model can get quite big. For our US map example, we have |P| = 50 and we need to pick the size of the grid; a 50×50 grid is enough for any type of arrangement. The problem is that the number of variables a_{p1, p2, x, y} is |P|^2|G| = 50^4, which is not practical.

We can also solve the problem separately for each connected component of the original graph, since it’s trivial to construct the optimal solution from the optimal sub-solutions. This doesn’t help much in our example: only Hawaii and Alaska are disconnected, so we’re left with |P| = 48. The grid can also be reduced, since it’s very unlikely that an optimal solution is a straight line. In the NPR map, the grid is 8×12. Doubling these dimensions should give the optimal solution enough room, so |G| = 16 \times 24 = 384.

We can also assume the states are ordered and only keep variables a_{p1, p2, x, y} for p1 < p2, so the number of such variables is about 450K. Still too large, unfortunately.
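The counts above are quick back-of-the-envelope arithmetic, which we can double-check:

```python
# Full model: 50 pieces on a 50x50 grid
P_full, G_full = 50, 50 * 50
a_vars_full = P_full ** 2 * G_full
print(a_vars_full)  # 6,250,000 = 50^4

# Reduced model: drop AK/HI, double the 8x12 NPR grid
P, G = 48, (8 * 2) * (12 * 2)
print(G)  # 384

# Ordered pairs p1 < p2 only
a_vars = P * (P - 1) // 2 * G
print(a_vars)  # 433,152 -- close to the ~450K estimate above
```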

Another important optimization, possible in our case because we're working with a grid, is to define the adjacency along x and y independently and combine the two afterwards.

Refined adjacency constraints

Instead of working with b_{x,y,s}, we use X_{x, s}, which equals 1 if and only if state s is placed at position (x, y) for some y, and Y_{y, s}, which equals 1 if and only if state s is placed at position (x, y) for some x. The physical constraints are analogous to the previous model’s:

6) A piece has to be placed in exactly one spot in the grid:

\sum_{x \in G} X_{x,p} = 1 for all p \in P
\sum_{y \in G} Y_{y,p} = 1 for all p \in P

7) A spot can only be occupied by at most one state:

\sum_s X_{x,s} \le 1 for all x \in G
\sum_s Y_{y,s} \le 1 for all y \in G

In a hexagonal grid, if we have the piece p1 in position (x,y), it will be adjacent to another piece p2 if and only if p2 is in one of these six positions: 1: (x-1, y), 2: (x+1, y), 3: (x-1, y-1), 4: (x, y-1), 5: (x-1, y+1) or 6: (x, y+1). We can define two adjacency categories: Type I, which happens when p1.y - p2.y = 0 and |p1.x - p2.x| = 1 (cases 1 and 2); and Type II, which is when |p1.y - p2.y| = 1 and 0 \le p1.x - p2.x \le 1 (cases 3, 4, 5 and 6).
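A quick sketch classifying the six listed offsets, written relative to p1 (i.e. as (p2.x - p1.x, p2.y - p1.y)), confirms the two categories:

```python
# The six neighbor positions from the text, as offsets of p2 relative to p1
offsets = [(-1, 0), (1, 0), (-1, -1), (0, -1), (-1, 1), (0, 1)]

type1 = [d for d in offsets if d[1] == 0]       # same row
type2 = [d for d in offsets if abs(d[1]) == 1]  # adjacent row

assert all(abs(dx) == 1 for dx, _ in type1)
# in the adjacent rows, p2's column is always x-1 or x,
# i.e. 0 <= p1.x - p2.x <= 1
assert all(dx in (-1, 0) for dx, _ in type2)
print(len(type1), len(type2))  # 2 4
```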

Let’s define Y_{d=0, p1, p2, y} = 1 iff p1 and p2 are both in row y (so that p1.y - p2.y = 0). Similarly, we define X_{|d|=1, p1, p2, x} = 1 iff p1 is in column x and p2 is in column x-1 or x+1, Y_{|d|=1, p1, p2, y} = 1 iff p1 is in row y and p2 is in row y-1 or y+1, and finally X_{d \ge 0, p1, p2, x} = 1 iff p1 is in column x and p2 is in column x-1 or x (so that 0 \le p1.x - p2.x \le 1).

8) We can use the following constraints to model the variables we just defined:

Y_{d=0, p1, p2, y} = \mbox{AND}(Y_{y, p_1}, Y_{y, p2})
X_{|d|=1, p1, p2, x} = \mbox{AND}(X_{x, p1}, X_{x-1, p2} + X_{x+1, p2})
Y_{|d|=1, p1, p2, y} = \mbox{AND}(Y_{y, p1}, Y_{y-1, p2} + Y_{y+1, p2})
X_{d \ge 0, p1, p2, x} = \mbox{AND}(X_{x, p1}, X_{x, p2} + X_{x-1, p2})

As before, the sums in the second arguments are at most 1 because each piece occupies exactly one column (or row), so the AND linearization applies. Note that we need AND rather than EQ here: EQ would also be 1 at every row or column that neither piece occupies, and the sums in (9) below would no longer be binary.

9) Let Y_{d=0, p1, p2} = 1 iff p1.y - p2.y = 0. We can define analogous variables for the other cases:

Y_{d=0, p1, p2} = \sum_{y} Y_{d=0, p1, p2, y}
X_{|d|=1, p1, p2} = \sum_{x} X_{|d|=1, p1, p2, x}
Y_{|d|=1, p1, p2} = \sum_{y} Y_{|d|=1, p1, p2, y}
X_{d \ge 0, p1, p2} = \sum_{x} X_{d \ge 0, p1, p2, x}

10) Let T'_{p1, p2} = 1 iff p1 and p2 have Type I adjacency and T''_{p1, p2} = 1 iff p1 and p2 have Type II adjacency:

T'_{p1, p2} = \mbox{AND}(Y_{d=0, p1, p2}, X_{|d|=1, p1, p2})
T''_{p1, p2} = \mbox{AND}(Y_{|d|=1, p1, p2}, X_{d \ge 0, p1, p2})

11) Finally, we say that p1 and p2 are adjacent iff either Type I or Type II adjacency occurs:

a_{p1, p2} = \mbox{OR}(T'_{p1, p2}, T''_{p1, p2})
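We can check that this Type I / Type II decomposition reproduces the neighborhood N(x, y) listed earlier, by comparing the two for every pair of cells in a small grid. A sketch (grid size is arbitrary):

```python
def N(x, y):
    # the six neighbor positions listed in the text
    return {(x - 1, y), (x + 1, y), (x - 1, y - 1), (x, y - 1),
            (x - 1, y + 1), (x, y + 1)}

cells = [(x, y) for x in range(4) for y in range(4)]
for (x1, y1) in cells:       # p1's cell
    for (x2, y2) in cells:   # p2's cell
        type1 = (y1 == y2) and abs(x1 - x2) == 1
        type2 = abs(y1 - y2) == 1 and (x1 - x2) in (0, 1)
        assert (type1 or type2) == ((x2, y2) in N(x1, y1))
print("decomposition matches N")
```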

The model for adjacency became much more complicated, but we were able to reduce the number of adjacency variables to roughly O(|P|^2 \sqrt{|G|}). The number of non-zero entries in the constraint matrix (which determines the size of its sparse representation) is roughly 11M, dominated by the type (8) constraints. I’m still not confident this model will run, so I’ll punt on implementing it for now.


In this post we explored a different way to visualize the map of US states. I was mainly curious how good an approximation this layout is, and a natural question was how to model it as an optimization problem. It turns out that if we model it using graphs, the problem definition is pretty simple, and it happens to be a more general version of the Graph Isomorphism problem.

I struggled with coming up with an integer programming model and couldn’t find one with a manageable size, but it was a fun exercise!

World Map?

One cannot help wondering whether we can display the countries of the world in a hexagonal map too. I’m planning to explore this idea in a future post. The main challenge is that the areas of the US states are more uniform than those of countries. For example, the largest state (Alaska) is about 430 times larger than the smallest (Rhode Island), while the largest country (Russia) is almost 40,000,000 times bigger than the smallest (Vatican City).

Also, the layout of the US map was devised manually by someone at NPR. I wonder if we can come up with a simple heuristic to place the hexagons and then perform manual adjustments.


[1] NPR Visuals Team – Let’s Tesselate: Hexagons For Tile Grid Maps
[2] Computer Science: Express boolean logic operations in zero-one integer linear programming (ILP)
[3] SOM – Creating hexagonal heatmaps with D3.js
[4] Github – d3/d3-hexbin

Data sources

[5] US State Borders
[6] Wikipedia – Population of US states and territories
[7] Tool to download Wikipedia tables as CSV
[8] List of US states and territories by area