Polymorphic Recursion in OCaml

In Chapter 10 of Purely Functional Data Structures, Okasaki describes recursive types that are non-uniform. In this post we’ll learn more about these types, how to implement them in OCaml and see an example by studying the implementation of Random Access Binary Lists using such a construct.

Uniform recursive type

As an example of a uniform recursive data structure, consider a simple list

Cons(1, Cons(2, Cons(3, Nil)))

Which has a recursive type, for example

type 'a myList = Nil | Cons of 'a * 'a myList

Each element of the list is either Nil (terminal) or it has a value of a polymorphic type 'a, followed by a recursive list whose elements also have type 'a.

Non-uniform recursive type

Now, what if the type of the recursive list is not the same as that of the current list? Then we have a non-uniform polymorphic recursive type, for example:

type 'a seq = Nil | Cons of 'a * ('a * 'a) seq

We’ll name this a sequence. In an int seq, the value in the first node has type int, the value in the second node has type int * int, the third has type (int * int) * (int * int), and so on. This structure is equivalent to a complete binary tree, where the i-th element of seq represents the i-th level of the tree.

An example of value with this type is:

Cons (1, Cons ((2, 3), Cons (((4, 5), (6, 7)), Nil)))

We need a special syntax to type recursive functions that take recursive non-uniform types, because the type of the function called recursively might be a different polymorphic type than the caller. OCaml by default tries to infer the generic types of the function and bind them to specific instances [2]. For example, in

let f: 'a list -> 'a list = fun x -> 13 :: x

OCaml will assume 'a is int and will compile fine. We can see this by pasting that code into the utop REPL.

utop # let f: 'a list -> 'a list = fun x -> 13 :: x;;
val f : int list -> int list = <fun>

The function will then not be polymorphic anymore. To prevent OCaml from auto-binding specific type instances, we can use a special syntax introduced in OCaml 3.12 [3]

utop # let f3: 'a. 'a list -> 'a list = fun x -> 13 :: x;;

This time we’ll get a compilation error:

Error: This definition has type int list -> int list which is less general than 'a. 'a list -> 'a list

The important thing is that this allows us to bind the recursive calls with different types. According to the Jane Street Tech Blog [3]:

Note that a polymorphic type annotation holds inside the body of a recursive definition as well as outside, allowing what is known as polymorphic recursion, where a recursive call is made at some non-trivial instantiation of the polymorphic type.

So for example, we can write this function to calculate the size of a sequence:
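The original listing is not reproduced here, so what follows is a minimal sketch of such a function, assuming the Cons-based seq type defined above. The explicit 'a. annotation is what lets the recursive call run at type ('a * 'a) seq; without it the definition should be rejected by the type checker.

```ocaml
type 'a seq = Nil | Cons of 'a * ('a * 'a) seq

(* Elements of [rest] are pairs, so each of them counts twice. *)
let rec size : 'a. 'a seq -> int = function
  | Nil -> 0
  | Cons (_, rest) -> 1 + 2 * size rest
```

For the example value shown earlier, with three levels, size returns 7.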

The problem with this structure is that it can only represent lists of size in the form of 2^k - 1. To work around that, we allow some items to not hold any elements at all, so that each item corresponds to a digit in the binary representation of the size of the list.

type 'a seq = Nil | Zero of ('a * 'a) seq | One of 'a * ('a * 'a) seq

For example, we can now represent a list of 10 elements, as

Zero(One((1, 2), Zero(One((((3, 4), (5, 6)), ((7, 8), (9, 10))), Nil))))


Sequence binary random access list

We can use a sequence to implement a binary random access list in a concise way.


Inserting an element at the beginning is analogous to incrementing the binary number, that is, starting from the least significant digit, if it’s a zero, we make it one, if it’s one we make it a 0, and carry over a 1, by adding it to the next digit.

The carry over process is simple in this structure. Because the type of an item following an item of type 'a is 'a * 'a, to merge the element to be added with the existing element, we simply make a tuple and pass it to the recursive call.
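As a sketch of the insertion described above (the name push is a placeholder, not necessarily the one used in the original code), the carry is just a pair handed to the recursive call:

```ocaml
type 'a seq = Nil | Zero of ('a * 'a) seq | One of 'a * ('a * 'a) seq

(* Increment the "binary number": a Zero digit becomes One, while a One
   digit becomes Zero and carries the pair (x, y) to the next level. *)
let rec push : 'a. 'a -> 'a seq -> 'a seq = fun x s ->
  match s with
  | Nil -> One (x, Nil)
  | Zero rest -> One (x, rest)
  | One (y, rest) -> Zero (push (x, y) rest)
```

Pushing 1 and then 0 onto the empty sequence yields Zero (One ((0, 1), Nil)), the representation of a list of size 2.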

Head and Tail

Removing or retrieving the first element is analogous to decrementing a binary number. If the digit is one, we make it zero and return the element and the resulting list. If it’s a zero, we make a recursive call to get the next available element. However, since the returned element is of type 'a * 'a and our return type is 'a, we return only the first value of the pair and keep the second one as the new digit at the current position.

Implementing head and tail using popAux is now trivial
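A possible shape for popAux, head and tail, written as a sketch under the assumption that popAux returns both the first element and the remaining sequence (the Empty exception is also an assumption):

```ocaml
type 'a seq = Nil | Zero of ('a * 'a) seq | One of 'a * ('a * 'a) seq

exception Empty

(* Decrement the "binary number": a One digit gives up its element, while a
   Zero digit borrows a pair from the next level and keeps the second half. *)
let rec popAux : 'a. 'a seq -> 'a * 'a seq = function
  | Nil -> raise Empty
  | One (x, Nil) -> (x, Nil)
  | One (x, rest) -> (x, Zero rest)
  | Zero rest ->
    let (x, y), rest' = popAux rest in
    (x, One (y, rest'))

let head s = fst (popAux s)
let tail s = snd (popAux s)
```

The One (x, Nil) case avoids leaving a trailing Zero at the most significant position.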


Finding an element can be done by transforming the problem into smaller instances.

It helps to look at some simple examples. Let’s consider 3 cases.

Case 1. If we had a single element, we either return it if the index is 0, or throw if it’s larger than that.

0: (0)

Case 2. If we have 2 elements,

0: ()
1: (0, 1)

Notice that when we go from the first level to the second, all items “doubled” in size, so we can “transform” this to the single element case by treating pairs as a single element, but since the index has individual elements granularity, we need to transform it by halving it. We reduced it to Case 1.

If our initial index was either 0 or 1, it’s now 0, and we found our element in level 1.

1: (0)

The problem is that we need to return a single element at level 0, not a pair, so we need to undo the transformation. We can use the parity of the original index to decide which side of the pair to return. If it’s even, we return the first element, otherwise the second one.

Case 3. If we have 3 elements,
0: (0)
1: (1, 2)

and our index is larger than 0, we move to the next level but we need to account for the level we’re skipping, so the transformation of index would look like:
0: ()
1: (0)

which is reduced to Case 2.

These 3 cases can be used to find elements in lists with more than 3 elements. For example, say we have 10 elements and want to find the element at position 6:

0: ()
1: (0, 1)
2: ()
3: (((2, 3), (4, 5)), ((6, 7), (8, 9)))

Step 1. This is Case 2. We need to transform this by treating pairs as elements and halving the index:

1': (0)
2': ()
3': ((1, 2), (3, 4))

Note how this reduced the problem to finding the element at position 3 of a list with size 5. Step 2. We are now in Case 3, where we need to skip the current element:

1': ()
2': ()
3': (((0), (1)), ((2), (3)))

Our index is now 2. Step 3. we go one level down

2: ()
3: (0, 1)

With an index of 1. Step 4. Finally, we halve it once again and we finally found the right level that contains our index.

3: (0)

We now need to recurse back to find exactly which element on that level to pick. On Step 4, we can see our index 1 was on the right side of the pair in level 3, so we pick the right side, that is, ((6, 7), (8, 9)).

On Step 3, our index 2 was on the left side of the innermost pair, that is (6, 7). On Step 2, we skipped the current element but didn’t change levels, so there’s no need to choose an element from the pair. Finally, on Step 1, the index 6 was on the left side of the innermost pair, which should return the element with a corresponding index 6.

In general, we can tell which side of the innermost pair to pick by observing that the indexes are ordered from left to right in a given level. And because every level has an even number of elements, we can assume that the first index in the level – or the first element in the first pair – is even. Hence the parity of the index is sufficient to determine which side of the pair to pick.

With this algorithm in mind, the lookup function is quite compact:
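The following is a sketch of what such a lookup could look like, assuming the Zero/One seq type from above and 0-based indexes. The error handling via invalid_arg is an assumption made for illustration.

```ocaml
type 'a seq = Nil | Zero of ('a * 'a) seq | One of 'a * ('a * 'a) seq

let rec lookup : 'a. int -> 'a seq -> 'a = fun i s ->
  if i < 0 then invalid_arg "lookup: negative index";
  match s with
  | Nil -> invalid_arg "lookup: index out of bounds"
  | One (x, rest) ->
    (* Case 3: skip the current element without changing levels. *)
    if i = 0 then x else lookup (i - 1) (Zero rest)
  | Zero rest ->
    (* Case 2: halve the index, then use its parity to pick a side. *)
    let x, y = lookup (i / 2) rest in
    if i mod 2 = 0 then x else y
```

On the 10-element example holding the values 1 to 10, looking up index 6 follows exactly the four steps traced above and returns 7.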

The update follows a similar idea as the lookup, but the problem is that we need to return the updated level when returning from the recursion. That is, we need to update the level before returning.

To accomplish that, we can pass a callback, the updater, that encodes which pair we would pick at each level. We start with a function that simply returns the element to be updated

(fun _ -> element)

Then, at each level we create a new updater, which applies the previous updater on the left or right side of the pair, depending on the parity of the index:

When we finally find the level that has our index, we can apply the function, which has the effect of “narrowing” down the elements from a given level to a single element, replacing the value at the target index and then returning the updated elements when returning from the recursion.

After applying the updater, we return the updated level recursively.
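A sketch of the update along these lines is shown below. The names fupdate and push are assumptions; the updater for the next level is built from the current one based on the parity of the index, as described above.

```ocaml
type 'a seq = Nil | Zero of ('a * 'a) seq | One of 'a * ('a * 'a) seq

let rec push : 'a. 'a -> 'a seq -> 'a seq = fun x s ->
  match s with
  | Nil -> One (x, Nil)
  | Zero rest -> One (x, rest)
  | One (y, rest) -> Zero (push (x, y) rest)

let rec fupdate : 'a. ('a -> 'a) -> int -> 'a seq -> 'a seq = fun f i s ->
  match s with
  | Nil -> invalid_arg "update: index out of bounds"
  | One (x, rest) ->
    if i = 0 then One (f x, rest)
    else push x (fupdate f (i - 1) (Zero rest))
  | Zero rest ->
    (* New updater: apply the previous one to one side of the pair. *)
    let f' (x, y) = if i mod 2 = 0 then (f x, y) else (x, f y) in
    Zero (fupdate f' (i / 2) rest)

let update i y s =
  if i < 0 then invalid_arg "update: negative index"
  else fupdate (fun _ -> y) i s
```

In the One case, the recursive call returns a sequence starting with Zero, so pushing x back simply restores the One digit.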

Structural Decomposition

Okasaki introduces this implementation in the context of Structural Decomposition, which is a technique for creating data structures from incomplete ones. In this example, the raw implementation of the sequence can only represent lists of size 2^k - 1, but by modeling each node in the sequence as a zero or a one, with zero not holding any elements, we can work around the size restriction.


The implementation of binary random access lists using sequences is very neat, but very hard to understand.

One of the major selling points of shorter code is that it tends to contain fewer bugs and also fewer corner cases. On the other hand, if the code is also harder to read and understand, it might be harder to spot bugs.

This post helped me understand a bit more about OCaml’s type system. Digging a little also led me to interesting topics such as Parametric Polymorphism [4] and Existential vs. Universally quantified types [5].


[1] Purely Functional Data Structures – Chris Okasaki
[2] Jane Street – Ensuring that a function is polymorphic
[3] Ensuring that a function is polymorphic in OCaml 3.12
[4] Wikipedia – Parametric polymorphism
[5] Existential vs. Universally quantified types in Haskell


Numerical Representations as inspiration for Data Structures


In this chapter Okasaki describes a technique for developing efficient data structures through analogies with numerical representations, in particular binary and its variations.

We’ve seen this pattern arise with Binomial Heaps in the past. Here the author presents the technique in its general form and applies it to another data structure, binary random access lists.

Binary Random Access Lists

These lists allow efficient insertion at/removal from the beginning, and also access and update at a particular index.

The simple version of this structure consists in distributing the elements in complete binary leaf trees. A complete binary leaf tree (CBLF) is one that stores elements only at the leaves, so a tree with height i has 2^(i+1)-1 nodes, but only 2^i elements.

Consider an array of size n, and let Bn be the binary representation of n. If the i-th digit of Bn is 1, then we have a tree containing 2^i leaves. We then distribute the elements into these trees, starting with the least significant digit (i.e. the smallest tree) and traversing each tree in pre-order.

For example, an array of elements (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) has 11 elements, which is 1011 in binary. So we have one tree with a single leaf (1), a tree with 2 leaves (2, 3) and another containing 8 leaves (4, 5, 6, 7, 8, 9, 10, 11).

We use a dense representation of the binary number, by having explicit elements for the 0 digit. Here’s a possible signature for this implementation:

Inserting a new element consists in adding a new tree with a single leaf. If a tree already exists for a given size, they have to be merged into a tree of the next index. Merging two CBLFs of the same size is straightforward. We just need to make them children of a new root. Since elements are stored in pre-order, the tree being inserted or coming from carry over should be the left child.

Looping back to our example, if we want to insert the element 100, we first insert a tree with a single leaf (100). Since the least significant digit already has an element, we need to merge them into a new tree containing (100, 1) and try to insert at the next position. A conflict will arise with (2, 3), so we again merge them into (100, 1, 2, 3) and try the next position. We finally succeed in inserting at position 2, for a new list containing trees like (100, 1, 2, 3) and (4, 5, 6, 7, 8, 9, 10, 11).

The complexity of inserting an element is O(log n) in the worst case, which requires merging trees for all digits (e.g. if Bn = 111...111). Merging two trees is O(1).
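As a sketch of the dense representation and the insertion just described (the type and function names here are assumptions, loosely following the book), each internal node caches the number of leaves below it so that merging stays O(1):

```ocaml
type 'a tree = Leaf of 'a | Node of int * 'a tree * 'a tree
type 'a digit = Zero | One of 'a tree

let size = function Leaf _ -> 1 | Node (n, _, _) -> n

(* Merge two trees of the same size under a new root. *)
let link t1 t2 = Node (size t1 + size t2, t1, t2)

(* Binary increment: an occupied position triggers a merge and a carry. *)
let rec insert_tree t = function
  | [] -> [One t]
  | Zero :: rest -> One t :: rest
  | One t' :: rest -> Zero :: insert_tree (link t t') rest

let cons x digits = insert_tree (Leaf x) digits
```

The tree being carried goes on the left of link, which keeps the inserted element first in pre-order, as in the (100, 1, 2, 3) example above.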

Removing the first element is analogous to decrementing the number, borrowing from the next digit if the current digit is 0.

Searching for an index consists in first finding the tree containing the index and then searching within the tree. More specifically, because the elements are stored in order from the smallest tree to the largest, we can find the right tree just by inspecting the number of elements in each tree until we find the one whose range includes the desired index. Within a tree, elements are stored in pre-order, so we can find the right index in O(height) of the tree.

After finding the right index, returning the element at that index is trivial. Updating the element at a given index requires rebuilding the tree when returning from the recursive calls.
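The search can be sketched as below, under the same assumed types as a dense digit list with size-annotated trees. Raising Not_found for an invalid index is an arbitrary choice for this example.

```ocaml
type 'a tree = Leaf of 'a | Node of int * 'a tree * 'a tree
type 'a digit = Zero | One of 'a tree

let size = function Leaf _ -> 1 | Node (n, _, _) -> n

(* Within a tree, the left half holds the first n / 2 elements. *)
let rec lookup_tree i = function
  | Leaf x -> if i = 0 then x else raise Not_found
  | Node (n, left, right) ->
    if i < n / 2 then lookup_tree i left
    else lookup_tree (i - n / 2) right

(* Skip whole trees until the index falls inside one of them. *)
let rec lookup i = function
  | [] -> raise Not_found
  | Zero :: rest -> lookup i rest
  | One t :: rest ->
    if i < size t then lookup_tree i t else lookup (i - size t) rest
```

An update would follow the same path and rebuild the visited nodes on the way back.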

Okasaki then proposes a few different numbering systems that allow insertion/removal in O(1) time. Here we’ll only discuss the less obvious but more elegant one, using skew binary numbers.


Skew Binary Random Access Lists

A skew binary number representation supports the digits 0, 1 and 2.

The weight of the i-th digit is 2^(i+1) - 1. In its canonical form, it only allows the least significant non-zero digit to be 2.



Decimal and Skewed Binary

It’s possible to show this number system offers a unique representation for decimal numbers. See the Appendix for a sketch of the proof and an algorithm for converting decimals to skewed binary numbers.

Incrementing a number follows these rules:

  • If there’s a digit 2 in the number, turn it into 0 and increment the next digit. By definition that is either 0 or 1, so we can safely increment it without having to continue carrying over.
  • Otherwise the least significant digit is either 0 or 1, and it can be incremented without carry overs.

The advantage of this number system is that increments (similarly, decrements) never carry over more than once, so the complexity is O(1), as opposed to the worst-case O(log n) for regular binary numbers.
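To make the increment rule concrete, here is a small sketch that assumes a sparse representation: a number is the list of weights of its non-zero digits in increasing order, with a digit 2 written as the same weight twice. The function name inc is an assumption.

```ocaml
(* Only the lowest non-zero digit can be 2, so it is enough to compare the
   first two weights. Two equal weights w plus the increment give 2w + 1,
   which is the weight of the next position. *)
let inc = function
  | w1 :: w2 :: rest when w1 = w2 -> (1 + w1 + w2) :: rest
  | ws -> 1 :: ws
```

Starting from the empty list (zero), repeated increments give [1], [1; 1], [3], [1; 3], [1; 1; 3], [3; 3], [7], each step doing a constant amount of work.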

A skew binary random access list can be implemented using this idea. We use a sparse representation (that is, not including 0s). Each digit one with position i corresponds to a tree with (2^(i+1) - 1) elements, in this case a complete binary tree with height i+1. A digit 2 is represented by two consecutive trees with the same weight.

Adding a new element to the beginning of the list is analogous to incrementing the number, which we saw can be done in O(1). Converting a digit 0 to 1 or 1 to 2, is a matter of prepending a tree to a list. To convert a 2 to 0 and increment the next position, we need to merge two trees representing it with the element to be inserted. Because each tree is traversed in pre-order, we make the element the root of the tree.

Elements are inserted in pre-order in each tree, so when searching for an index, we can first find the right tree by looking at the tree sizes and within a tree we can do a “binary search” in O(height) of the tree.
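The insertion and search just described can be sketched as follows, assuming a list of (weight, tree) pairs with elements stored at every node. All names are placeholders, and Not_found is used for invalid indexes only for the sake of the example.

```ocaml
type 'a tree = Leaf of 'a | Node of 'a * 'a tree * 'a tree

(* Prepend: merge the two first trees if they have the same weight (digit 2),
   making the new element the root; otherwise add a singleton tree. *)
let cons x = function
  | (w1, t1) :: (w2, t2) :: rest when w1 = w2 ->
    (1 + w1 + w2, Node (x, t1, t2)) :: rest
  | ts -> (1, Leaf x) :: ts

(* Pre-order search: index 0 is the root, then the left subtree of weight
   w / 2, then the right subtree. *)
let rec lookup_tree w i = function
  | Leaf x -> if i = 0 then x else raise Not_found
  | Node (x, left, right) ->
    if i = 0 then x
    else
      let half = w / 2 in
      if i <= half then lookup_tree half (i - 1) left
      else lookup_tree half (i - 1 - half) right

let rec lookup i = function
  | [] -> raise Not_found
  | (w, t) :: rest ->
    if i < w then lookup_tree w i t else lookup (i - w) rest
```

Prepending the values 6 down to 0 produces a single tree of weight 7, and lookup i then returns i for every valid index.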

Binomial Heaps

In this chapter, this technique is also applied to improve the worst case runtime of insertion of binomial heaps. The implementation, named Skewed Binomial Heap, is on github.


This chapter demonstrated that binary representations are a useful analogy to come up with data structures and algorithms, because they’re simple. This simplicity can lead to inefficient running times, though. Representations such as skewed binary numbers can improve the worst case of some operations with the trade-off of less intuitive and more complex implementations.

Appendix A – Proof

Sketch of the proof. First, it’s obvious that two different decimals cannot map to the same binary representation. Otherwise the same equation with the same weights would result in different values. We need to show that two binary representations do not map to the same decimal.

Suppose they do, and let them be B1 and B2. Let k be the largest position where these numbers have a different digit. Without loss of generality, suppose that B1[k] > B2[k].

First case. Suppose that B1[k] = 1, B2[k] = 0 and B2 doesn’t have any digit 2. B1 is then at least M + 2^{k+1} - 1, while B2 is at most M + \sum_{i = 1}^{k} (2^{i} - 1), which is M + 2^{k + 1} - k - 2 (M corresponds to the total weight of digits in positions > k). This implies that B2 < B1, a contradiction.

Second case. Suppose that B1[k] = 1, but now B2 does have a digit 2 at position j. It has to be that j < k. Since only zeros follow it, we can write B2’s upper bound as

M + \sum_{i = j + 1}^{k} (2^{i} - 1) + 2^{j + 1} - 1

Since 2(2^{j + 1} - 1) < 2^{j + 2} - 1, we have

\sum_{i = j + 1}^{k} (2^{i} - 1) + 2^{j + 1} - 1 < \sum_{i = j + 2}^{k} (2^{i} - 1) + 2^{j + 2} - 1

We can continue this argument until we get that B2 is less than M + 2(2^{k} - 1) which is less than M + 2^{k + 1} - 1, B1.

Third case. Finally, suppose we have B1' such that B1'[k] = 2, and B2'[k] = 1. We can subtract 2^{k+1} - 1 from both and reduce to the previous case. ▢

Appendix B – Conversion algorithm

Converting from a decimal representation to a binary one is straightforward, but it’s more involved to do so for skewed binary numbers.

Suppose we allow leading zeros and have all the numbers with k digits. For example, if k=2, we have 00, 01, 02, 10, 11, 12 and 20. We can construct the numbers with k+1 digits by prefixing each of them with either 0 or 1, and adding the number formed by a 2 followed by k zeros. For k=3, we have 000, 001, 002, 010, 011, 012, 020, 100, 101, 102, 110, 111, 112, 120 and finally 200.

More generally, we can see there are 2^(k+1) - 1 numbers with k digits. We can construct the numbers with k+1 digits by prefixing 0 or 1 and then adding an extra number which starts with 2 and is followed by k zeros, for a total of 2^(k+1) - 1 + 2^(k+1) - 1 + 1 = 2^(k + 2) - 1. So this invariant holds by induction on k, and it is true for the base case, since for k = 1 there are 3 numbers (0, 1 and 2).

This gives us a method to construct the skewed number representation if we know the number of its digits, say k. If the number is among the first 2^(k) - 1 numbers, that is, between 0 and 2^k - 2, we know it starts with a 0. If it’s among the next 2^(k) - 1, that is, between 2^k - 1 and 2^(k+1) - 3, we know it starts with a 1. If it’s the next one, exactly 2^(k+1) - 2, we know it starts with a 2.

We can continue recursively by subtracting the corresponding weight for this digit until k = 0. We can find out how many digits a number n has (if we’re to exclude leading zeros) by finding the smallest k such that 2^(k+1)-1 is greater than n. For example, for 8, the smallest k is 3, since 2^(k+1)-1 = 15, and 2^(k)-1 = 7.

The Python code below uses these ideas to find the skewed binary number representation in O(log n):

One might ask: why not OCaml? My excuse is that I already have a number theory repository in Python, so it seemed like a nice addition. Converting this to functional code, in particular OCaml is easy.
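For reference, a possible OCaml rendering of the same idea is sketched below. It is not a translation of the Python code, and the names weight, num_digits and to_skew are assumptions; digits are returned from most to least significant.

```ocaml
(* Weight of the digit at position i, where 0 is the least significant. *)
let weight i = (1 lsl (i + 1)) - 1

(* Smallest k such that 2^(k+1) - 1 is greater than n. *)
let num_digits n =
  let rec go k = if weight k > n then k else go (k + 1) in
  go 0

let to_skew n =
  if n < 0 then invalid_arg "to_skew: negative number";
  (* At position pos we always have n <= 2 * weight pos, so the digit
     n / weight pos is 0, 1 or 2. *)
  let rec go pos n acc =
    if pos < 0 then List.rev acc
    else
      let w = weight pos in
      let d = n / w in
      go (pos - 1) (n - (d * w)) (d :: acc)
  in
  go (num_digits n - 1) n []
```

For instance, to_skew 8 gives [1; 0; 1] (7 + 1) and to_skew 6 gives [2; 0] (2 * 3); zero is represented by the empty list in this sketch.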

This algorithm requires an additional O(log n) memory, while the conversion to a binary number can be done with constant extra memory. My intuition is that this is possible because the weights for the binary numbers are powers of the same number, 2^k, unlike the skewed numbers’ weights. Is it possible to work around this?

Lazy Rebuilding


In this chapter Okasaki starts with a common technique data structures use to achieve efficient complexity times. A classic example is balanced binary search trees, which, on one hand, need to be balanced to avoid degenerate cases (think of a binary tree which can degenerate to a path), but on the other hand, balancing is usually too expensive to perform at every operation.

The tradeoff is to allow a certain degree of “imbalance”, such that the big-O complexity of operations does not degenerate, and to let us perform re-balances only every so often, in a way that leads to an efficient amortized complexity. Structures such as AVL trees and Red-black trees make use of this technique.

The general technique is named batched rebuilding. The main issue with it, though, is that, as we saw, amortized analysis does not play well with persistent data structures.

To address that problem, a technique was developed by Mark Overmars, called global rebuilding.

Global Rebuilding

The basic idea is to keep two copies of a structure in parallel: on one of them we perform incremental updates (write), and the other is used for read operations. After a few operations, we copy the write version into the read-only version. The idea is that the incremental updates are cheap, but might put the structure in a state that is not suitable for reading. This allows for distributing the costs across operations such that each operation has efficient worst-case bounds.

The major downside of this technique is the complexity and corner cases, because we also need to account for changes to the structure that render the write copy of the structure invalid.

We’ll now describe yet another implementation of queues that uses this technique, developed by Robert Hood and Robert Melville. Similar to other queue implementations, this structure maintains front and rear lists, the latter in reverse order. It also has the invariant that the rear list cannot be larger than the front.

When that invariant is violated we must perform a rotation, which consists in reversing the rear list and appending it to the end of the front list. We cannot perform this operation piecemeal, because we only have efficient access to the first element of a list. The operation we can do partially is concatenating the reverse of a list to the beginning of another list, that is, given xs and ys, we can do ~xs + ys, where ~ represents the reverse of a list. Note we can also reverse a list, that is xs to ~xs, piecemeal by having ys = [].

Now, to achieve our goal which is xs + ~ys, we reverse both xs and ys, then concatenate the reverse of ~xs to ~ys:

1) Start with xs and ys
2) Reverse both: ~xs and ~ys
3) Then (~(~xs)) + ~ys which is xs + ~ys
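The three steps can be sketched with a single primitive, here called rev_onto (a hypothetical name), which computes ~xs + ys one element at a time:

```ocaml
(* rev_onto xs ys = ~xs + ys, moving one element per recursive call. *)
let rec rev_onto xs ys =
  match xs with
  | [] -> ys
  | x :: rest -> rev_onto rest (x :: ys)

(* xs + ~ys, obtained by reversing both lists and then reversing ~xs
   onto ~ys. *)
let rotate xs ys =
  let rev_xs = rev_onto xs [] in
  let rev_ys = rev_onto ys [] in
  rev_onto rev_xs rev_ys
```

With front [1; 2] and rear [5; 4; 3], rotate returns [1; 2; 3; 4; 5]. This version runs all at once; the state machine below is what spreads the same work over several operations.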

We can perform these operations one step at a time by using a “state machine”. We first start with a state Reversing(xs, ys), which will reverse xs and ys at the same time into ~xs and ~ys, at which point we switch to the state Appending(~xs, ~ys), which will concatenate the reverse of ~xs onto ~ys, so that we get to Done(xs + ~ys).

The problem of keeping a separate state is that it might not be accurate by the time it’s done. In particular, if we remove an element from the front, the Appending step can be shortcut. We can keep a count of how many of the elements in ~xs of Appending are still present. If, by the time we’re appending the number of present elements goes to 0, the last elements of ~xs (that is, the first elements of xs) have been removed, so they do not need to be transferred to ~ys.

A possible implementation of the states is:

The Idle is just a placeholder to make sure the state machine can keep going to the next state even when we’re not currently performing a rotation. The state transition definition is given by:

One important detail here is that Appending ({validCount = 0; front; rear}) must appear before the other matching rule for Appending, because the more general rule also matches this case.

When we remove an element from the front of the queue we need to update the number of valid elements in the front of the rotation state:
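The listings for the states, the transition function and the invalidation step are not reproduced here, so the following is only a sketch of how they could fit together. The constructor and field names for Appending follow the text; the fields frontRev and rearRev of Reversing and the exact layout of the records are assumptions.

```ocaml
type 'a rotation_state =
  | Idle
  | Reversing of {
      validCount : int;
      front : 'a list; frontRev : 'a list;
      rear : 'a list; rearRev : 'a list;
    }
  | Appending of { validCount : int; front : 'a list; rear : 'a list }
  | Done of 'a list

(* One step of the rotation. In Appending, [front] holds ~xs and [rear]
   holds the result built so far. *)
let nextStep = function
  | Reversing { validCount; front = x :: f; frontRev; rear = y :: r; rearRev } ->
    Reversing { validCount = validCount + 1; front = f;
                frontRev = x :: frontRev; rear = r; rearRev = y :: rearRev }
  | Reversing { validCount; front = []; frontRev; rear = [y]; rearRev } ->
    Appending { validCount; front = frontRev; rear = y :: rearRev }
  | Appending { validCount = 0; rear; _ } -> Done rear
  | Appending { validCount; front = x :: f; rear } ->
    Appending { validCount = validCount - 1; front = f; rear = x :: rear }
  | state -> state

(* Called when an element is removed from the front of the queue. *)
let invalidate = function
  | Reversing { validCount; front; frontRev; rear; rearRev } ->
    Reversing { validCount = validCount - 1; front; frontRev; rear; rearRev }
  | Appending { validCount = 0; rear = _ :: rest; _ } -> Done rest
  | Appending { validCount; front; rear } ->
    Appending { validCount = validCount - 1; front; rear }
  | state -> state
```

Starting from Reversing with front [1; 2] and rear [5; 4; 3], six calls to nextStep reach Done [1; 2; 3; 4; 5]. Calling invalidate once during the Appending phase makes the machine stop one element early, leaving the removed element out of the result.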

Similar to the Real-time queues, we can guarantee constant time worst case for all the operations if we execute the state machine twice for each operation. The check() function verifies if we need a rotation and also executes the nextStep() function twice.

The other operations are then straightforward. The only thing is that pop() must call the invalidate() function because it’s decreasing the size of front:

The full, up-to-date implementation is on github.

Lazy Rebuilding

As we can see, the Hood-Melville implementation is quite involved. Okasaki argues that making use of lazy evaluation and memoization simplifies the implementation. We can indeed see that the implementation of Real-time queues is much cleaner.

One simplification is that we don’t have to sync the write copy with the read copy. We just need to make sure to execute items from the schedule by the time they are needed. Memoization will take care of the rest.

Another advantage in this case is that reversing and concatenating lazily evaluated lists does not require an intermediate reversal of the front, which means that we can remove the front element without the need to invalidate any items in the schedule.

The author provided more examples of lazy rebuilding for deques (double-ended queues) by first presenting an amortized version using the Banker’s method and then a lazy rebuilding version. The Banker’s deque and Real-time deque are also on my github.


In this chapter the technique behind Real-time queues was formalized. I really enjoy the organization of the book, in which the author presents a data structure and from there extracts a generic pattern or technique that can be applied to other cases.

When I was studying data structures I don’t recall learning about general techniques underlying existing data structures, such as batched rebuilding for AVL trees and Red-black trees. That would have been interesting.

Monitoring System using Raspberry Pi

A few months ago, I described how to use a Raspberry Pi to set up a domestic server. In this post we’ll see how to take snapshots using the Pi and a camera, so combining it with the server, we can create a simple monitoring system accessible from a phone.


Node.js server

First thing we have to do is to run another server, separate from nginx to handle the requests. nginx is a very basic server that can’t do much else, but it’s capable of forwarding requests/responses.

I chose Node.js because I’m familiar with JavaScript and haven’t used much Node.js, so it was a good learning opportunity.

To install it, first we need to find out the architecture of our Pi’s processor. Raspbian has the command arch, which returned armv6l in my case. This tells us which version to pick from the Node.js site. We can install it manually by downloading and unpacking it:

cd ~/Downloads
wget https://nodejs.org/dist/v6.9.2/node-v6.9.2-linux-armv6l.tar.xz
tar xf node-v6.9.2-linux-armv6l.tar.xz

We can put it in a local directory (no need for sudo). We move it to a convenient place:

mv ~/Downloads/node-v6.9.2-linux-armv6l ~/workspace/node

and then make sure that path is being looked for when running executables:

export PATH="$HOME/workspace/node/bin:$PATH"

Now we need to tell nginx to forward requests to another IP address, which our Node.js server will be listening to, by adding this to `/etc/nginx/sites-available/default`:

Finally we create a very simple Node.js application that listens on localhost, port 3000, and serves an HTML file as response:


I got the Raspberry PI 5MP Camera Board Module to take the still photos. It’s an official product from Raspberry Pi, so it comes with some nice integrations.

In particular, I was interested in the raspistill utility.

It has a few parameters and image transformation options, and we can specify an output file to which an image will be saved in JPEG format.

raspistill -w 600 -h 400 -rot 180 -o images/image.jpg

Putting things together

The basic idea of this system is that whenever we hit our webpage, it will take a snapshot using the camera and display the image.

Because it takes some time for a photo to be taken, we do it asynchronously. In the meantime it shows the latest snapshot it had on file.

To avoid overloading the Pi we debounce snapshot commands and only take a new snapshot after at least a minute since the previous one. I wrote a bash script to handle that:

Now we just need our Node.js app to call this script asynchronously:


This simple project was inspired by a friend, who used a Raspberry Pi + camera to monitor his apartment during his vacation. I copied the idea with the intention to learn about the process and put the Pi I bought a couple of years back to use.

I haven’t actively used this system, but it was fun to work on it and I’m now looking forward to working on other “home” projects.

Eliminating Amortization


In this chapter Okasaki presents techniques to convert persistent data structures with amortized bounds to worst-case bounds. A few reasons to want worst-case guarantees include:

  • Real-time systems, where an expensive operation can cause the system to miss a hard deadline, even if it’s efficient when amortized over time.
  • Parallel systems, in which we want to avoid one processor taking more time than the others if it happens to execute the expensive operation.

The intuition is to make the theoretical amortized analysis part of the actual implementation. In the amortized analysis, we saw, either through the Banker’s method or the Physicist’s method, the idea of paying debt ahead of time so that, by the time an expensive operation takes place, we could show its cost had already been distributed throughout previous operations.

To eliminate the amortization, we can literally pay off these costs when running the algorithm. We do it through a structure called schedule, which contains the unevaluated operations on the data structure. Whenever we perform an operation, we evaluate one or more items of the schedule. Due to memoization, by the time we actually need the evaluated structure, it will be evaluated.

Often the amortized analysis can dictate how the execution of the schedule is performed. For example, if the analysis tells us to pay off 2 units of debt on every action, that can be translated to executing 2 items from the schedule on every action.

The main challenge in this conversion is to modify the data structure in such a way that it can be partially executed.

We’ll discuss an example using queues and then binomial heaps.

Real-time Queues

For the queue structure, the costly operation is the rotation that takes place when we need to combine the rear with the front. It’s a monolithic operation, so we need to change that.

Let’s start by defining the structure. It’s similar to the stream queue, except that we don’t need the rear to be a stream and we have a schedule field, which is also a stream. The idea is that whenever a rotation happens, it will be “scheduled” to be executed little by little.

The rotation() function is the most complicated part of the structure. The invariant is that we only call this function when |rear| = |front| + 1.

We construct a stream with the elements of rear reversed, newSchedule, and on the return step of the recursion we append the elements of front to that stream.

Note that this is performed lazily, so a call to rotate only executes one step.

Now we have the execution of the schedule. It serves two purposes. One is to execute an item from the schedule and the other is to trigger a rotation when the schedule is empty.

The schedule being empty is a proxy for |rear| = |front| + 1. Why? Because when the schedule is populated, it has the same size as front (they’re both assigned the same stream), and rear is empty. Whenever we insert an element, increasing the size of rear by one, or remove an element, reducing the size of front by one, we decrease the difference |front| - |rear| by 1, and the size of the schedule is decreased by 1 as well.

Implementation-wise, maybe it would be clearer if we checked for the empty schedule outside and only called exec() if it was not empty.

With these in place, the usual operations for queue are straightforward. The ones that mutate the queue, push() and pop(), need to make sure to call exec().
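Putting the pieces of this section together, a real-time queue could be sketched as below. This uses OCaml's built-in Lazy module for the streams; the record layout, the Empty exception and the choice of having pop return both the element and the new queue are assumptions, not necessarily what the original code does.

```ocaml
type 'a stream = 'a cell Lazy.t
and 'a cell = Nil | Cons of 'a * 'a stream

type 'a queue = { front : 'a stream; rear : 'a list; schedule : 'a stream }

exception Empty

let empty = { front = lazy Nil; rear = []; schedule = lazy Nil }

(* rotate front rear acc represents front + ~rear + acc. It must only be
   called when |rear| = |front| + 1. The recursive call sits inside [lazy],
   so each call performs a single step. *)
let rec rotate front rear acc =
  match Lazy.force front, rear with
  | Nil, y :: _ -> lazy (Cons (y, acc))
  | Cons (x, front'), y :: rear' ->
    lazy (Cons (x, rotate front' rear' (lazy (Cons (y, acc)))))
  | _, [] -> invalid_arg "rotate: rear must be longer than front"

(* Force one item of the schedule, or start a rotation if it is empty. *)
let exec q =
  match Lazy.force q.schedule with
  | Cons (_, rest) -> { q with schedule = rest }
  | Nil ->
    let f = rotate q.front q.rear (lazy Nil) in
    { front = f; rear = []; schedule = f }

let push x q = exec { q with rear = x :: q.rear }

let pop q =
  match Lazy.force q.front with
  | Nil -> raise Empty
  | Cons (x, front') -> (x, exec { q with front = front' })
```

Because front and schedule share the same suspensions, forcing an item through the schedule also evaluates the corresponding cell of front.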

Scheduled Binomial Heaps

We discussed Binomial Heaps in a previous post. The idea is that a binomial heap is a list of binomial trees, and a binomial tree is defined recursively based on its rank.

The heap is represented by a binary number (as a list of digits). If the k-th digit of the number, counting from 0 at the least significant digit, is 1, it indicates the existence of a binomial tree of rank k (containing 2^k elements). A heap with n elements has a unique representation, which is the binary representation of n.

For example, if n = 10, then the binary representation of the heap is 1010, meaning it contains a binomial tree of rank 1 (2 elements) and one of rank 3 (8 elements), holding 10 elements in total.
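To make the correspondence concrete, here is a small sketch that lists the ranks of the trees in a heap of n elements. It assumes ranks are counted from 0, so that a tree of rank k holds 2^k elements (the same convention under which a single element is a tree of rank 0). The name ranks_of_size is hypothetical:

```ocaml
(* Ranks of the binomial trees in a heap with n elements: the positions
   of the 1 bits of n, counting from 0 at the least significant bit. *)
let ranks_of_size (n : int) : int list =
  let rec go n k acc =
    if n = 0 then List.rev acc
    else go (n lsr 1) (k + 1) (if n land 1 = 1 then k :: acc else acc)
  in
  go n 0 []

(* Total number of elements held by trees of the given ranks. *)
let size_of_ranks (ranks : int list) : int =
  List.fold_left (fun total k -> total + (1 lsl k)) 0 ranks
```

For n = 10 the sketch returns the ranks 1 and 3, whose trees hold 2 + 8 = 10 elements.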

Inserting an element into the heap is analogous to incrementing a binary number by 1. We first create a binomial tree with rank 0 containing that element, and try to insert it into the beginning of the digits list. If the position we want to insert is occupied, we need to merge the trees, to generate a tree with a higher rank, which is analogous to the carry over operation when adding two binary numbers.

The schedule is a list of all the numbers generated when any operation is performed.

The structure for the heap is then the following:

As we discussed above, the insertion is analogous to incrementing the binary number. But because the digits are a stream, we need to deal with lazy evaluation:

Executing the schedule consists in evaluating one digit from the first number on the schedule. The key is to avoid evaluating the part of the number that already has been evaluated. It’s possible to show this happens when we find the first digit one. The intuition is that the trailing zeros from the current number were 1’s before the increment, so there was a mutation (linking) while the remaining digits were not modified by that operation.

For example, if we have the number 101011, an increment causes it to become 101100. The two trailing zeros in 101100 correspond to a linking of the binomial tree.
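The increment can be sketched on a plain (non-lazy) list of digits, stored least significant digit first. This ignores the trees and the laziness and only shows the carry pattern; the helper names are made up for the illustration:

```ocaml
(* Increment a binary number stored least significant digit first.
   Every 1 we pass over becomes 0 (a carry, which corresponds to a link
   in the heap); the first 0 becomes 1 and the rest is left untouched. *)
let rec increment = function
  | [] -> [1]
  | 0 :: rest -> 1 :: rest
  | _ :: rest -> 0 :: increment rest

(* Conversions to and from the usual most-significant-first notation. *)
let digits_of_string (s : string) : int list =
  List.rev
    (List.init (String.length s) (fun i -> Char.code s.[i] - Char.code '0'))

let string_of_digits (ds : int list) : string =
  String.concat "" (List.rev_map string_of_int ds)
```

With these, string_of_digits (increment (digits_of_string "101011")) evaluates to "101100", and the digits after the first 1 of the result are shared with the input list.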

Finally, inserting a new element consists in creating a binomial tree of rank 0, inserting it, and then executing the schedule.

The full code is available on github.

Lessons learned

Because we now have several different implementations of queues, I wanted to implement tests that were applicable to any queue implementing an “interface”.

One way to do that is to put the interface in a separate module, IQueue.ml and have each implementation require it as their module type:

To make the test work with any module implementing the IQueue interface, we can use a functor (a module parameterized by another module).
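A minimal sketch of both ideas. The operations listed in IQueue, the toy ListQueue implementation and the MakeTests functor are assumptions made for the illustration; the real interface may expose different functions:

```ocaml
(* Hypothetical interface shared by all queue implementations. *)
module type IQueue = sig
  type 'a t
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> 'a t
  val peek : 'a t -> 'a option
end

(* A toy implementation (two lists), sealed with the interface. *)
module ListQueue : IQueue = struct
  type 'a t = 'a list * 'a list

  let empty = ([], [])

  (* Keep the front non-empty whenever the queue is non-empty. *)
  let normalize = function
    | ([], rear) -> (List.rev rear, [])
    | q -> q

  let push x (front, rear) = normalize (front, x :: rear)

  let pop = function
    | ([], _) -> ([], [])
    | (_ :: front, rear) -> normalize (front, rear)

  let peek = function
    | ([], _) -> None
    | (x :: _, _) -> Some x
end

(* The functor: tests written once against the interface. *)
module MakeTests (Q : IQueue) = struct
  let drain q =
    let rec go q acc =
      match Q.peek q with
      | None -> List.rev acc
      | Some x -> go (Q.pop q) (x :: acc)
    in
    go q []

  let test_fifo () =
    let q = List.fold_left (fun q x -> Q.push x q) Q.empty [1; 2; 3] in
    assert (drain q = [1; 2; 3])
end

module ListQueueTests = MakeTests (ListQueue)
let () = ListQueueTests.test_fifo ()
```

Testing another implementation then only requires one more application of MakeTests.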

Another thing I learned was how to match lazy patterns. Matching a lazily evaluated variable with the keyword lazy forces the evaluation, so we can use a cleaner syntax, for example:
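A small sketch contrasting the two styles, assuming a stream defined as a lazy cons list (the type and function names are hypothetical):

```ocaml
type 'a stream = 'a cell Lazy.t
and 'a cell = Nil | Cons of 'a * 'a stream

(* Forcing explicitly before matching. *)
let head_forced (s : 'a stream) : 'a option =
  match Lazy.force s with
  | Nil -> None
  | Cons (x, _) -> Some x

(* Same function using lazy patterns: the pattern itself forces s. *)
let head (s : 'a stream) : 'a option =
  match s with
  | lazy Nil -> None
  | lazy (Cons (x, _)) -> Some x
```

Both functions force only the first cell of the stream; the tail stays suspended.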

Another syntactic construct that helps with legibility is the record. The examples in Okasaki’s book use tuples for most composite structures, but I prefer the more verbose and explicit records.

Finally, another lesson learned is that adding an insertion operation to the Stream API is likely wrong. The idea of the stream is that it avoids evaluating its tail, so the whole block has to be lazily evaluated.

For example, in the queue implementation, in the first block, we will evaluate all the rotation and then make the result lazy, which is not what we want.
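The pitfall can be reproduced in isolation. OCaml evaluates function arguments before the call, so passing a computation to a helper that wraps it in lazy comes too late. In the sketch below, expensive stands in for the rotation and wrap for an insert-style helper; both names are hypothetical:

```ocaml
(* Counts how many times the "expensive" computation actually runs. *)
let calls = ref 0
let expensive () = incr calls; 42

(* Insert-style helper: by the time it is called, v is already a value. *)
let wrap (v : int) : int Lazy.t = lazy v

(* The argument is evaluated first, then the result is made lazy. *)
let eager = wrap (expensive ())
let after_eager = !calls

(* The whole expression is suspended: nothing runs yet. *)
let delayed = lazy (expensive ())
let after_delayed = !calls

(* The computation runs only when the suspension is forced. *)
let forced = Lazy.force delayed
let after_force = !calls
```

The counter goes up as soon as eager is defined, stays the same when delayed is defined, and goes up again only when delayed is forced.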


Eliminating evaluation is a very interesting technique, but it has limited application in practice. It complicates the implementation and, as I learned, it can be tricky to spot bugs (for example, the one in which we’re evaluating the rotate() function) that would only show up if we profile the data structure.

Notes on JavaScript Interpreters

In a previous post, Notes on how browsers work, we studied the high-level architecture of a browser, specifically the Rendering Engine. We used the following diagram,

In this post we’ll focus on the JavaScript interpreter part. Different browsers use different implementations of the interpreters. Firefox uses SpiderMonkey, Chrome uses V8, Safari uses JavaScriptCore, Microsoft Edge uses Chakra, to name a few. V8 is also used as a standalone interpreter, most notably by Node.js.

These interpreters usually comply with one of the versions of ECMAScript, which is a standardization effort of the JavaScript language. ECMA-262 is the document with the specification. As happens with other languages, after their inception design flaws are identified and new development needs arise, so the language spec is always evolving. For that reason, there are a few versions of ECMAScript. Most browsers support the 5th edition, also known as ES5.

There’s already the 7th edition (as of 2016), but it takes time for browsers to catch up. Because of that, JavaScript programs that are capable of translating newer specs into ES5 were created, such as Babel. The advantage of using this technique, also known as transpiling, is that developers can use newer versions of the spec, such as ES6, without depending on browser support. Disadvantages include the extra complexity of adding a new step to the deployment process, and harder debugging, since it’s hard to map errors that happen in the transformed code back to the original source.

Section 8.4 of the ECMA-262 describes the execution model of JavaScript:

A Job is an abstract operation that initiates an ECMAScript computation when no other ECMAScript computation is currently in progress. A Job abstract operation may be defined to accept an arbitrary set of job parameters.

Execution of a Job can be initiated only when there is no running execution context and the execution context stack is empty. A PendingJob is a request for the future execution of a Job. A PendingJob is an internal Record whose fields are specified in Table 25. Once execution of a Job is initiated, the Job always executes to completion. No other Job may be initiated until the currently running Job completes. However, the currently running Job or external events may cause the enqueuing of additional PendingJobs that may be initiated sometime after completion of the currently running Job.

MDN’s Concurrency model and Event Loop [2] describes this spec in a more friendly way. As in other programming environments such as C and Java, we have two types of memory available: the heap and the stack. The heap is the general purpose memory and the stack is where we keep track of scopes for example when doing function calls.

In JavaScript we also have the message queue, and for each message in the queue there is an associated function to be executed.

JavaScript Execution Model

The event loop

The execution model is also called the event loop because of the high-level working of this model:

When the stack is empty, the runtime processes the first message from the queue. While executing the corresponding function, it adds an initial scope to the stack. The function is guaranteed to execute “atomically”, that is, until it returns. The execution might cause new messages to be enqueued, for example if we call

The callback passed as the first argument to setTimeout() will be enqueued, not stacked. On the other hand, if we have a recursive function, such as a factorial:

The recursive calls should all go to the same stack, so they will be executed to completion, that is until the stack is empty. The calls provided to setTimeout() will get enqueued and be executed only after the value of the factorial gets printed.

In fact, using setTimeout() is a common trick to control the execution order of functions. For example, say that function A calls B and B calls another function C. If B calls C normally, then C will have to finish before B returns, as in the sample code below:

In case we want to finish executing A first, then B can call C using setTimeout(C, 0), because then it will be enqueued, and both A and B will finish before a new message is processed from the queue, as in the code below:
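The ordering can also be modeled outside the browser. The sketch below is a toy model written in OCaml, not JavaScript and not how any engine implements the event loop: messages are functions stored in a FIFO queue, and set_timeout is a hypothetical stand-in for setTimeout(f, 0) that only enqueues its argument.

```ocaml
(* The message queue: each message is a function to be executed. *)
let queue : (unit -> unit) Queue.t = Queue.create ()

(* Execution trace, most recent entry first. *)
let log : string list ref = ref []
let record s = log := s :: !log

(* Stand-in for setTimeout(f, 0): enqueue f instead of calling it. *)
let set_timeout (f : unit -> unit) : unit = Queue.add f queue

let c () = record "C"

(* B schedules C instead of calling it directly. *)
let b () =
  set_timeout c;
  record "B done"

let a () =
  b ();
  record "A done"

(* Process one message at a time; each one runs to completion before
   the next message is taken from the queue. *)
let run_event_loop () =
  while not (Queue.is_empty queue) do
    let message = Queue.pop queue in
    message ()
  done

let () =
  set_timeout a;
  run_event_loop ()
```

After the loop finishes, List.rev !log is ["B done"; "A done"; "C"]: C runs only after both A and B have returned.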

Web Workers

We discussed Web Workers in a previous post, in which we mentioned that it’s a separate thread that shares no memory with the main thread. In fact, according to [2], it has its own stack, heap and queue. It doesn’t violate the execution model of the main thread because communications must be done via a publisher/subscriber API, which means communicating between two threads is subject to the queueing.


Beyond the spec, each JavaScript engine is free to implement features the way it wants. V8’s architecture is described in their wiki page [3], so we can study some of its key components.

Fast Property Access

In JavaScript, structures like Object can be mutated (properties added and removed) at runtime. Implementing them as hashtables can lead to performance problems when accessing properties of these structures. Compare that to compiled languages like Java, in which instances of a class can be allocated with all their members in a single chunk of memory, and accessing a property of an object consists in adding an offset to the object’s pointer.

V8 optimizes the Object allocation by creating hidden classes. It makes use of the fact that properties are mutated in the same pattern. For example in

In this case, we always insert the property x and then y. V8 starts with an empty class C0 when the object is first created. When x is assigned, it creates another class C1 with property x that points to C0. When y is assigned, it creates yet another class C2 with property y that points to C1.

When the Point constructor is finished, the object will be an instance of C2. It has to instantiate 3 objects: one from C0, then C1, then C2.

Accessing property y is a single offset operation, while accessing property x is two offset operations, but still faster than a table lookup. Other instances of Point will share the same class C2. If for some reason we have a point with only x set, then it will be an instance of class C1. This sharing of structures resembles the persistent data structures that we studied previously.

Another interesting property of this method is that it doesn’t need to know in advance whether the code has indeed a pattern in mutating objects. It’s sort of a JIT compilation.

Dynamic Machine Code Generation

According to [3]:

V8 compiles JavaScript source code directly into machine code when it is first executed. There are no intermediate byte codes, no interpreter.

Efficient Garbage Collection

We wrote about V8’s memory management in a previous post.


In this post we revisited a bit of the history of JavaScript and ECMAScript. We also studied the event loop model that is part of the spec and finally saw some aspects of one of the JavaScript engines, V8.


  • [1] ECMAScript 2016 – Language Specification
  • [2] MDN: Concurrency model and Event Loop
  • [3] V8 – Design Elements

Largest sets of subsequences in OCaml

I’ve noticed that there is this set of words in English that look very similar: tough, though, through, thought, thorough and trough. Except for thought, they have one property in common: they’re all subsequences of thorough. It made me wonder if there are interesting sets of words that are subsequences of other words.

Word cloud made with Jason Davies’ tool (https://www.jasondavies.com/wordcloud/)

This post is an attempt to answer a more general question: given a list of words, what is the largest set of these words such that they’re subsequences of a given word?

A word A is a subsequence of a word B if A can be obtained by removing zero or more characters from B. For example, “ac” is a subsequence of “abc”, so is “bc” and even “abc”, but not “ba” nor “aa”.

A simple algorithm to determine if a word A is a subsequence of another word B is to start with 2 pointers at the beginning of each word, pA and pB. We move pB forward until pA and pB point to the same character. In that case we move both pointers forward. A is a subsequence of B if and only if pA reaches the end of A, that is, every character of A gets matched before we run out of characters in B. We could then iterate over each word W and find all the words that are subsequences of W. If the size of the dictionary is n, and the size of the largest word is w, this algorithm would run in O(n^2 w).
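The two-pointer check can be sketched as follows (the function name is_subsequence is ours):

```ocaml
(* Returns true if a can be obtained from b by removing zero or more
   characters. pa and pb are indices into a and b respectively. *)
let is_subsequence (a : string) (b : string) : bool =
  let la = String.length a and lb = String.length b in
  let rec go pa pb =
    if pa = la then true            (* every character of a was matched *)
    else if pb = lb then false      (* ran out of b before finishing a *)
    else if a.[pa] = b.[pb] then go (pa + 1) (pb + 1)
    else go pa (pb + 1)
  in
  go 0 0
```

It runs in O(|B|) time for a single pair of words, which is where the factor w in the O(n^2 w) bound comes from.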

For English words, we can use entries from /usr/share/dict/words. In this case, n is around 235k (wc -l /usr/share/dict/words), so a quadratic algorithm will take a while to run (around 5e10 operations).

Another approach is to generate all subsequences of a given word W and search the dictionary for each generated word. There are O(2^w) subsequences of a word of length w. If we use a hash table, we can then do it in O(n w 2^w). In /usr/share/dict/words, the length of the largest word, w, is 24.

Running a calculation with the numbers (R script), the number of high-level operations is 4e10, about the same order of magnitude as the quadratic algorithm.

Distribution using ggplot2

A third approach is to use a trie. This data structure allows us to store the words from the dictionary in a space-efficient way and we can search for all subsequences using this structure. The trie will have at most 2e6 characters (sum of characters of all words), less because of shared prefixes. Since any valid subsequence has to be a node in the trie, the cost of searching for a given word cannot be more than the size of the trie t, so the complexity per word is O(min(2^w, t)). A back-of-the-envelope calculation gives us 2e9. But we’re hoping that the size of the trie will be much less than 2e6.

Before implementing the algorithm, let’s define our trie data structure.

The Trie data structure

A trie is a tree where each node has up to |E| children, where |E| is the size of the alphabet in consideration. For this problem, we’ll use lower case ASCII only, so it’s 26. The node also has a flag telling whether there’s a word ending at this node.

Notice that in this implementation of trie, the character is in the edge of the trie, not in the node. The Map structure from the stdlib uses a tree underneath, so get and set operations are O(log |E|).
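A minimal sketch of such a node, using Map.Make from the standard library. The flag is called hasEntry as in the text; the names ChildMap, children and empty are assumptions:

```ocaml
(* Edges are labeled by characters: a map from char to child node. *)
module ChildMap = Map.Make (Char)

type trie = {
  hasEntry : bool;               (* true if a word ends at this node *)
  children : trie ChildMap.t;    (* outgoing edges, keyed by character *)
}

(* The empty trie: no word ends here and there are no edges. *)
let empty = { hasEntry = false; children = ChildMap.empty }
```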

The insertion is the core method of the structure. At a given node we have a string we want to insert. We look at the first character of the word. If a corresponding edge exists, we keep following down that path. If not, we first create a new node.

To decide whether a trie has a given string, we traverse the trie following the characters of the string. The answer is false if we either can’t find an edge to follow or, after consuming the whole string, reach a node that doesn’t have the hasEntry flag set to true:
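Both operations can be sketched as below. This is a persistent version written for illustration, which may differ from the code on github; apart from hasEntry, all names are assumptions:

```ocaml
module ChildMap = Map.Make (Char)

type trie = { hasEntry : bool; children : trie ChildMap.t }

let empty = { hasEntry = false; children = ChildMap.empty }

(* Characters of a string as a list, to allow pattern matching. *)
let explode (s : string) : char list =
  List.init (String.length s) (String.get s)

(* Follow the edge for the first character, creating the child node if
   it does not exist, and mark the node where the word ends. *)
let rec insert_chars (chars : char list) (node : trie) : trie =
  match chars with
  | [] -> { node with hasEntry = true }
  | c :: rest ->
    let child =
      match ChildMap.find_opt c node.children with
      | Some child -> child
      | None -> empty
    in
    { node with
      children = ChildMap.add c (insert_chars rest child) node.children }

let insert (word : string) (node : trie) : trie =
  insert_chars (explode word) node

(* A word is present if we can follow all its edges and the final node
   has the hasEntry flag set. *)
let rec has_chars (chars : char list) (node : trie) : bool =
  match chars with
  | [] -> node.hasEntry
  | c :: rest ->
    (match ChildMap.find_opt c node.children with
     | Some child -> has_chars rest child
     | None -> false)

let has (word : string) (node : trie) : bool =
  has_chars (explode word) node
```

For example, after inserting "tough" and "to" into the empty trie, has returns true for both words and false for the prefix "tou", whose node exists but is not marked.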

This and other trie methods are available on github.

The search algorithm

Given a word W, we can search for all its subsequences in a trie with the following recursive algorithm: given a trie and a string we perform two searches: 1) for all the subsequences that contain the first character of current string, in which case we “consume” the first character and follow the corresponding node and 2) for all the subsequences that do not contain the first character of the current string, in which case we “consume” the character but stay at the current node. In pseudo-code:

Search(t: TrieNode, w: string):
    If t contains a word, it's a subsequence of the
    original word. Save it.

    If w is empty, return.

    Let c be the first character of w.
    Let wrest be w with the first character removed

    // Pick character c (only if t has an edge for c)
    If t->child[c] exists:
        Search(t->child[c], wrest)

    // Do not pick character c
    Search(t, wrest)

The implementation in OCaml is given below:
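A sketch of the search under a few assumptions: the trie is the persistent map-based one described above, results are collected in a set so that each subsequence is counted once, and all names other than hasEntry are made up for the illustration. It is not necessarily the code from the repository.

```ocaml
module ChildMap = Map.Make (Char)
module StringSet = Set.Make (String)

type trie = { hasEntry : bool; children : trie ChildMap.t }

let empty = { hasEntry = false; children = ChildMap.empty }

let explode (s : string) : char list =
  List.init (String.length s) (String.get s)

let implode (chars : char list) : string =
  String.concat "" (List.map (String.make 1) chars)

let rec insert_chars chars node =
  match chars with
  | [] -> { node with hasEntry = true }
  | c :: rest ->
    let child =
      match ChildMap.find_opt c node.children with
      | Some child -> child
      | None -> empty
    in
    { node with
      children = ChildMap.add c (insert_chars rest child) node.children }

let insert word node = insert_chars (explode word) node

(* node:   current position in the trie
   chars:  characters of the word not yet consumed
   prefix: characters picked so far, in reverse order
   acc:    words found so far *)
let rec search_chars node chars prefix acc =
  let acc =
    if node.hasEntry then StringSet.add (implode (List.rev prefix)) acc
    else acc
  in
  match chars with
  | [] -> acc
  | c :: rest ->
    (* Pick character c, if the trie has an edge for it. *)
    let acc =
      match ChildMap.find_opt c node.children with
      | Some child -> search_chars child rest (c :: prefix) acc
      | None -> acc
    in
    (* Do not pick character c: consume it but stay at this node. *)
    search_chars node rest prefix acc

let search (dictionary : trie) (word : string) : string list =
  StringSet.elements
    (search_chars dictionary (explode word) [] StringSet.empty)
```

With a dictionary made of tough, though, through, thought, thorough and trough, searching for thorough returns every word except thought, which needs a second t.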


Our experiment consists in loading the words from /usr/share/dict/words into a trie, and then, for each word in the dictionary, look for its subsequences. The full code is available on github.

The code takes 90 seconds to run on my laptop. Not too bad but I’m still exploring ways to improve the performance. One optimization I tried is to, instead of returning an explicit list of strings as mentioned in the search implementation, return them encoded in a trie, since we can save some operations due to shared prefixes. I have that version on github, but unfortunately that takes 240 seconds to run and requires more memory.

Another way is to parallelize the code. The search for subsequences is independent for each word, so it’s an embarrassingly parallel case. I haven’t tried this path yet.

The constructed trie has 8e5 nodes, or ~40% of the sum of characters.

Subsequences of “thorough”

The question that inspired this analysis was finding all the subsequences of thorough. It turns out it has 44 subsequences, but most of them are boring, that is, single letter or small words that look completely unrelated to the original word. The most interesting ones are those that start with t and have at least three letters. I selected some of them here:

  • tho
  • thoo
  • thoro
  • thorough
  • thou
  • though
  • thro
  • throu
  • through
  • thug
  • tog
  • tou
  • toug
  • tough
  • trough
  • trug
  • tug

The word with the most subsequences is pseudolamellibranchiate, 1088! The word cloud at the beginning of the post contains the 100 words with the largest number of subsequences. I tried to find interesting words among these, but they’re basically the largest words: large words have exponentially more combinations of subsequences, and hence the chance of them existing in the dictionary is greater. I tried to come up with a penalization for the score:

1) Divide the number of subsequences by the word’s length. This is not enough, the largest words still show on top.
2) Apply log2 to the number of subsequences and divide by the word’s length. In theory this should account for the exponential number of subsequences of a word. This turns out to be too much of a penalization and the shortest words fare too well in this scenario.

I plotted the distribution of number of subsequences by word lengths. We can see a polynomial curve but with increased variance:

Generated with this ggplot2

In the chart above, without jitter we’d see all points with the same x-value on a single vertical line. One neat visualization trick is to add noise (jitter) so we also get a sense of density.

If we use a box plot instead, we can see a quadratic pattern more clearly by looking at the median for each length.

Generated with this ggplot2

Given this result, I tried a final scoring penalization, by dividing the number of subsequences by the square of the length of the word, but it’s still not enough to surface too many interesting words. Among the top 25, streamlined is the most common word, and it has 208 subsequences.

One interesting fact is that the lowest scoring words are those with repeated patterns, for example: kivikivi, Mississippi, kinnikinnick, curucucu and deedeed. This is basically because we only count unique subsequences.


This was a fun problem to think about and even though it didn’t have very interesting findings, I learned more about OCaml and R. After having to deal with bugs, compilation and execution errors, I like OCaml more than before and I like R less than before.

R has too many ways of doing the same thing and the API is too lenient. That works well for the 80% of the cases which it supports, but finding what went wrong in the other 20% is a pain. OCaml on the other hand is very strict. It doesn’t even let you add an int and a float without an explicit conversion.

I learned an interesting syntax that allows us to reuse the qualifier/namespace between several operations when chaining them, for example:
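One construct that fits this description is the local open, Module.(expression), which brings the module’s names into scope for a single expression. Whether this is the exact syntax meant here is an assumption, and the functions below are made up for the illustration:

```ocaml
(* Fully qualified: the module name is repeated for every call. *)
let shout_qualified (s : string) : string =
  String.uppercase_ascii (String.trim s) ^ "!"

(* Local open: String's functions are in scope inside the parentheses. *)
let shout (s : string) : string =
  String.(uppercase_ascii (trim s) ^ "!")
```

Both functions behave the same; the second only saves the repeated qualifier.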

I also used the library Batteries for the first time. It has a nice extension for the rather sparse String module. It allows us to simply do open Batteries but that overrides a lot of the standard modules and that can be very confusing. I was scratching my head for a long time to figure out why the compiler couldn’t find the union() function in the Map module, even though I seemed to have the right version, until I realized it was being overridden by Batteries. From now on, I’ll only use the specific modules, such as BatString, so it’s easy to tell which method is coming from which module.


