Writing JavaScript using OCaml


In this post we’ll explore BuckleScript, a framework that enables developers to write JavaScript applications using OCaml syntax.

Note that BuckleScript is not a way to convert general OCaml code into JavaScript, but rather write JavaScript applications using OCaml’s syntax and type system. The end runtime and libraries are still JavaScript.

For example, if you want to work with dates, you wouldn’t look for the corresponding OCaml library, but instead include the JavaScript Date module in your OCaml code.

Let’s first compare BuckleScript with similar frameworks to understand better why it exists and when to use it. Then we’ll follow with several examples to get a better grasp of BuckleScript and how it translates to JavaScript.

It helps to have some knowledge of OCaml (we have written extensively about it), but the simple examples below should be enough to convey the gist of the benefits.


BuckleScript vs js_of_ocaml

js_of_ocaml is another framework that connects the OCaml and JavaScript worlds.

According to [4], both projects aim to compile OCaml code to JavaScript. The differences pointed out are:

* js_of_ocaml takes the low-level bytecode from the OCaml compiler, while BuckleScript takes the high-level rawlambda representation from the OCaml compiler
* js_of_ocaml focuses more on the existing OCaml ecosystem (opam), while BuckleScript’s major goal is to target npm

A simplistic way to see the differences is that BuckleScript is for JavaScript developers to start using better language features from OCaml, while js_of_ocaml is for OCaml developers to be able to run their code in the browser.

BuckleScript vs ReasonML

ReasonML is often mentioned together with BuckleScript, which makes it a bit confusing to understand their differences at first.

Compared to OCaml, ReasonML has a friendlier syntax (for people coming from JS) and better support for JSX (inline XML tags). The difference in syntax is significant enough that we are really talking about a dialect of OCaml.

Note that BuckleScript and ReasonML are complementary. BuckleScript can compile either OCaml or ReasonML to JavaScript. ReasonML is mostly about the syntax.
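To make the syntax difference concrete, here is a small sketch. The function is our own toy example, and the ReasonML spelling is shown in a comment since the snippet itself is OCaml:

```ocaml
(* OCaml syntax: *)
let add x y = x + y
let inc = add 1

(* The same definitions in ReasonML syntax (shown as a comment since
   this snippet is OCaml):

   let add = (x, y) => x + y;
   let inc = add(1);
*)
```

Both spellings compile to the same underlying OCaml program; only the surface syntax differs.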

BuckleScript vs TypeScript

TypeScript is a language that adds static types on top of JavaScript, similar to Flow.

At first glance, TypeScript and BuckleScript seem to serve different purposes, but one of the main advantages of using OCaml to write JavaScript applications is to provide type safety.

In [2], Hongbo provides a comparison between the two systems. Some of the pros and cons raised are:


TypeScript:

  • Pros:
    • Designed for JS, easy to inter-operate with JS
  • Cons:
    • Slow compiler (not scalable at FB/Google scale)
    • Verbose, very limited type inference
    • Slow start-up time (very hard to build traditional build tools)
    • Types are only used for tooling – soundness is not a design goal, not very reliable
    • No code optimizations

BuckleScript:

  • Pros:
    • Compiles much faster, scales better
    • Sound type system, global type inference
    • Types are used in code optimization, optimizing compiler
    • Outputs to JavaScript but can also generate native backends for X86, ARM
  • Cons:
    • Higher learning curve compared with TypeScript

The author also suggests that OCaml is a better language than JavaScript, without providing more details.


The easiest way to try out the examples is through the BuckleScript playground [3].

To try it locally, we can follow [5]. It generates an initial bsconfig.json file, and the compilation can be run via npm. The process involves converting an OCaml file (.ml) to a JavaScript file (.bs.js). The latter is the one included in your application, but both should be considered part of your codebase and hence committed.


Because most people learning BuckleScript are familiar with JavaScript and less familiar with OCaml, I’ll provide the examples in reverse: how do we implement a given JavaScript snippet in OCaml?

console.log('Hello World')

How do we print to stdout using BuckleScript? The most basic program simply prints Hello World. In OCaml we would use Printf.printf to print to stdout, but OCaml modules are not readily available in BuckleScript. We can use the Js.log() function for this purpose:

let _ = Js.log ("Hello World")


This maps to:

console.log("Hello World");


Note that Js.log is a function provided by BuckleScript’s Js module to make the integration with JavaScript more seamless.

Looping over an Array

We can use an imperative-style code in OCaml:

let sum arr =
  let v = ref 0 in
  for i = 0 to (Array.length arr) - 1 do
    v := !v + arr.(i)
  done;
  !v

let () = Js.log (sum [| 10; 20; 30 |])


Note that arrays have a different syntax than in JavaScript: [| 10; 20; 30 |]. If we generate this code we get:

var Caml_array = require("./stdlib/caml_array.js");

function sum(arr) {
  var v = 0;
  for (var i = 0, i_finish = arr.length - 1 | 0; i <= i_finish; ++i) {
    v = v + Caml_array.caml_array_get(arr, i) | 0;
  }
  return v;
}

console.log(sum(/* array */[
  10,
  20,
  30
]));


The interesting thing here is that we do have access to some basic OCaml libraries, for example Array.

Since we are dealing with a functional language, we might as well use some more idiomatic OCaml code. If we are to translate the code to native JavaScript structures, we have to use functions from the Js module:

let sum arr = Js.Array.reduce (+) 0 arr
let () = Js.log (sum [| 10; 20; 30 |])


Which maps to

function sum(arr) {
  return arr.reduce((function (prim, prim$1) {
    return prim + prim$1 | 0;
  }), 0);
}

console.log(sum(/* array */[
  10,
  20,
  30
]));


Looping over a List

So Array is the basic structure to represent a sequence of items in JavaScript but in OCaml it’s List. What happens if we use List instead?

let sum arr = List.fold_left (+) 0 arr
let () = Js.log (sum [10; 20; 30])


We simply dropped the | to use a List instead of Array and now we can use the more standard fold_left instead of reduce. This translates to:

var List = require("./stdlib/list.js");

function sum(arr) {
  return List.fold_left((function (prim, prim$1) {
    return prim + prim$1 | 0;
  }), 0, arr);
}

console.log(sum(/* :: */[
  10,
  /* :: */[
    20,
    /* :: */[
      30,
      /* [] */0
    ]
  ]
]));


This is very interesting! We studied functional data structures in OCaml extensively in the past, and the key concept is that by default data structures are persistent. If we look closely at what is being passed to sum(), we see it’s a linked-list-like structure: [10, [20, [30, 0]]], the 0 being used to indicate the end.
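This persistence is easy to observe in plain OCaml: prepending to a list shares the existing tail rather than copying it. A minimal sketch (the names are our own):

```ocaml
(* Persistence in practice: prepending to a list reuses the existing
   cells instead of copying them. *)
let rest = [20; 30]
let a = 10 :: rest
let b = 5 :: rest

(* Physical equality (==) shows both lists share the same tail cells. *)
let () =
  assert (List.tl a == rest);
  assert (List.tl b == rest)
```

Because nothing is ever mutated, sharing structure this way is safe, which is exactly why the compiled list encoding above can nest cells freely.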


Another common pattern in JavaScript is to group a bunch of functions inside an Object. The Object name serves as a namespace or module name that can be referenced elsewhere. A natural mapping here is the OCaml module:

module MyModule : sig
  val add: int -> int -> int
  val sub: int -> int -> int
end = struct
  let add x y = x + y
  let sub x y = x - y
end


which maps to:

function add(x, y) {
  return x + y | 0;
}

function sub(x, y) {
  return x - y | 0;
}

var MyModule = /* module */[
  /* add */add,
  /* sub */sub
];

exports.MyModule = MyModule;


Note that the signature is only used for compilation/transpilation checks and is erased from the final JavaScript code. Another curious thing is that the functions are exported as an array in MyModule. To me it would make more sense to export them as an Object.

Currying Functions

A nice feature of functional languages is the concept of currying, which allows us to perform partial application of functions. For example, if we have a function add that sums two numbers, we can derive an increment function that partially applies add by binding the first argument to 1:

module Curry = struct
  let add x y = x + y
  let inc = add 1
end


The resulting JavaScript code is:

function add(x, y) {
  return x + y | 0;
}

function inc(param) {
  return 1 + param | 0;
}

var Curry = /* module */[
  /* add */add,
  /* inc */inc
];


Note that we can already perform partial application of functions in vanilla JavaScript via bind(), but the syntax is not as neat:

const sum = function(a, b) { return a + b; };
const inc = sum.bind(null, 1);


Chaining Functions

Another neat piece of OCaml syntax is the chaining operator. One can chain functions via the |> operator: the result of the left-hand expression is passed as an argument to the right-hand function.

A common use for this is when we want to apply a series of functions in sequence without assigning intermediate results to variables.

For example, if we have a function to normalize a given string by converting it to lower case, trimming the leading and trailing whitespaces and also converting intermediate spaces to underscores, we could write, in JavaScript:

function normalize(s) {
  s = s.toLowerCase();
  s = s.trim();
  s = s.replace(/\s/g, "_");
  return s;
}


An alternative would be to nest the calls so we don’t have to repeat the variable, but that would hurt legibility. In OCaml, we could chain these calls:

let normalize s =
  Js.String.toLowerCase s |>
  Js.String.trim |>
  Js.String.replaceByRe [%bs.re "/ /g"] "_"


Note the [%bs.re] tag. It is a macro that allows us to write regexes using JavaScript syntax. We can avoid repeating the module names if they are all the same:

let normalize s = Js.String.(
  toLowerCase s |>
  trim |>
  replaceByRe [%bs.re "/ /g"] "_"
)

Using JavaScript libraries

One of the main selling points of BuckleScript is that you can adopt it gradually, module by module. This is possible because we can require JavaScript modules inside our OCaml code. For example, if we were to convert the following code that reads a file asynchronously in JavaScript:

const fs = require("fs");

fs.readFile("data.json", "utf8", (err, data) => {
  console.log(data);
});


We could do:

external readFile :
  name:string ->
  ([ `utf8
   | `useAscii [@bs.as "ascii"]
   ] [@bs.string]) ->
  (string -> string -> unit) ->
  unit = ""
  [@@bs.module "fs"]

let callback error data = Js.log(data)

let _ = readFile ~name:"data.json" `utf8 callback


Here, the OCaml code is more verbose but we provided a stronger contract by typing the function readFile(). The syntax for importing modules is

external <function name alias> :
  <function type interface> =
  "<function name>" [@@bs.module "<module name>"]


Note: if <function name> is the same as <function name alias>, the former can be omitted (left as an empty string, as in the readFile example above).

Objects as Maps

In JavaScript, Objects are often used either as maps (key-value pairs) or as records (entries of distinct fields). In OCaml we can rely on the type system to enforce the specific use we want. In the example below, we declare a dict with string values. If we try to set a value of a different type we get a compilation error:

let mapping: string Js.Dict.t = Js.Dict.empty ()

(* Error: This expression has type int but an expression was
   expected of type string *)
let () = Js.Dict.set mapping "some_key" 10


Objects as Records

To represent an Object as a record, we can use an OCaml record type:

type person = {
  name: string;
  age: int;
  job: string [@bs.optional];
} [@@bs.deriving abstract]


We added the [@bs.optional] to indicate that a field is optional. We also added the [@@bs.deriving abstract] attribute to indicate it should not be directly instantiated like

{name: "Joe", age: 20, job: "teacher"}


Instead, it generates a “constructor” function. In the same way, the properties of a record are not directly available. They also need to go through intermediate auto-generated accessors:

(* Creating record *)
let joe = person ~name:"Joe" ~age:20 ~job:"Programmer"
let joeNameVerbose = name joe
let joeName = joe |. name


The generated JavaScript code translates to an Object:

var joe = {
  name: "Joe",
  age: 20,
  job: "Programmer"
};

The interesting thing is that the generated JavaScript Object is mutable, but within the OCaml code, the record cannot be modified. It’s possible to mark it mutable, but the default immutability makes it easier to reason about code.


The benefits being stated, there are two main potential drawbacks of using BuckleScript.

Mixed languages. Adopting BuckleScript will cause the codebase to have a mix of different languages, which makes it harder for new developers to ramp up. Of course this can be mitigated by converting the entire codebase to use OCaml.

Debugging. We’ll be writing code in one language but another language ends up being executed. If a problem happens in the underlying JavaScript code, how do we figure out which OCaml code generated the faulty code?

BuckleScript tries to solve this issue by preserving the structure of the code as much as possible so that it’s easier to understand what maps to what. This works well if we are using the Js wrappers that resemble JavaScript code patterns, but it’s unclear how well the structure is preserved if we use more of OCaml’s persistent data structures or functional programming patterns like currying.

One possible improvement would be to add some traceability to the generated JavaScript code such that you won’t need to look at the JavaScript code most of the time, in the same way that one doesn’t usually need to inspect Assembly code when their C++ application crashes.


In this post we did a comparison of BuckleScript with different frameworks and libraries to understand why and when to use it. Following that, we studied a few basic examples which one might encounter in a day-to-day JavaScript programming and how to express that in OCaml.

Through these examples, we saw the OCaml type system in use, as well as some neat syntax and immutable data structures, which can lead to more readable, succinct and reliable code.


[1] BuckleScript Examples
[2] BuckleScript: An OCaml to JavaScript compiler (Hacker News discussion)
[3] BuckleScript playground
[4] BuckleScript/bucklescript – Comparisons
[5] BuckleScript – New Project

Generalized Tries in OCaml

In Chapter 10 of Purely Functional Data Structures, Okasaki describes a generalized trie. In this post we’ll see how tries can be seen as maps over a specific type of keys and how we can construct maps over any type of keys.

We discussed tries in OCaml in a previous post. At first glance, it looks like a special data structure but we’ll now see it’s basically a map.

The basic form of a map is a pair of (key, value). In most programming languages the key has to be a scalar (e.g. int, string) or some complex type that can be converted to one (think of hashCode() in Java).

For simplicity, let’s assume our scalar is an int and our value is of generic type T. Also, note that the syntax we present below is not valid OCaml (I find OCaml’s type syntax a bit hard to read). A map with int keys can be represented as:

map<int, T>

If we want a two dimensional key, for example (int, int), we can have a map of maps:

map<int, map<int, T>>

and if the dimensions are not fixed, for example, a list of integers, then we can define a recursive structure for our map

map<list<int>, T> = 
  Node of (option<T> * map<int, mapOverList>)

If we think of strings as lists of characters, a trie is then a map where the key is a list of characters. From the type above we can simply change the key to a list of characters; for a basic trie, T is a boolean:

map<string, bool> = 
  Node of (bool * map<char, mapOverList>)

Note that option<bool> is redundant, so we can use bool only.
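To make this concrete, here is a minimal sketch of such a trie over char lists in actual OCaml syntax. The names (CharMap, mem, etc.) are our own, not from the implementation referenced earlier:

```ocaml
module CharMap = Map.Make(Char)

(* A trie keyed by char lists: each node stores whether the path so
   far is a member (the bool from the text) plus one sub-trie per
   next character. *)
type trie = Trie of bool * trie CharMap.t

let empty = Trie (false, CharMap.empty)

let rec insert (word: char list) (Trie (isMember, children)) =
  match word with
  | [] -> Trie (true, children)
  | c :: rest ->
    let child = try CharMap.find c children with Not_found -> empty in
    Trie (isMember, CharMap.add c (insert rest child) children)

let rec mem (word: char list) (Trie (isMember, children)) =
  match word with
  | [] -> isMember
  | c :: rest ->
    (try mem rest (CharMap.find c children) with Not_found -> false)
```

Note how the key (a char list) is consumed one scalar at a time, each scalar indexing into one level of nested maps.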

Maps over trees


We can now generalize the key to more complex recursive types such as trees. Suppose we have the following type for a tree:

tree<int> = (int * tree<int> * tree<int>)

The outermost map is indexed by the root of the tree. The inner maps are indexed by the left and right subtrees respectively:

map<tree<int>, T> =
  map<int, map<tree<int>, map<tree<int>, T>>>

If we expand the key of the second map we get the following:

map<tree<int>, map<tree<int>, T>> =
  map<int, map<tree<int>, map<tree<int>, map<tree<int>, T>>>>

It gets pretty involved very quickly but because we traverse these types recursively, the implementation is still simple. Let’s see how to implement these in OCaml. The type of a map over an int tree can be defined as follows:

open Map
module IntMap = Map.Make(Int64)
type 'a tree = Empty | Node of 'a * 'a tree * 'a tree
type key = IntMap.key tree
type 'a trie = Trie of 'a option * 'a trie trie IntMap.t


Note that the outermost map is a regular IntMap, which uses the root element of the tree, a scalar, as its key.

The search function takes a tree representing the key, and the map. The base case is when the tree is empty, when we’re just past a leaf. The recursive case consists in obtaining the maps in order, first using the root element, then using the left tree and finally using the right tree:

let rec find: 'a. key -> 'a trie -> 'a =
  fun tree trie -> match (tree, trie) with
    | (Empty, Trie (None, children)) -> raise Not_found
    | (Empty, Trie (Some value, children)) -> value
    | (Node (root, left, right), Trie (_, children)) ->
      let trieForRoot = IntMap.find root children in
      let trieForLeft = find left trieForRoot in
      find right trieForLeft


Note that the recursive call is non-uniform, so we need explicit annotations, as we discussed previously.

The insertion is similar, except that when we fail to find a key in any of the maps, we need to first insert it with an empty element before recursing.

Because the Map.find() implementation throws exceptions when a key doesn’t exist, we can wrap the call with a try and if an exception occurs, we can insert the missing key (alternatively we could use Map.mem()).

let rec insert: 'a. key -> 'a -> 'a trie -> 'a trie =
  fun tree value trie -> match (tree, trie) with
    | (Empty, Trie (_, children)) -> Trie (Some value, children)
    | (Node (root, left, right), Trie (trieValue, children)) ->
      let trieForRoot =
        try IntMap.find root children with Not_found -> empty in
      let trieForLeft =
        try find left trieForRoot with Not_found -> empty in
      let newTrieForLeft = insert right value trieForLeft in
      let newTrieForRoot = insert left newTrieForLeft trieForRoot in
      let newChildren = IntMap.add root newTrieForRoot children in
      Trie (trieValue, newChildren)


Maps over products and sums

There are two ways a structure can branch out recursively: through combination or through alternative. For a combination, refer to our definition of Node:

Node of 'a * 'a tree * 'a tree

In theory, any combination of values for 'a, 'a tree, 'a tree are possible values for a Node. The * in between the components in fact represent the cartesian product operator. This is also known as product.

For alternative, the type of the tree itself can be either Empty or Node:

type 'a tree = Empty | Node of (...)

In this case, the valid values for a tree is the sum of values of each alternative. Hence, this is also known as sum.

We can generalize the map with keys of any types by looking at their definition. If it’s a product, we end up with nested maps. For example, if

Tk = (Tk1 * Tk2)

then the map over Tk can be defined as

map<Tk, T> = map<Tk1, map<Tk2, T>>

In our example, this is where the nested maps 'a trie trie IntMap.t come from.

For sums, if the type is

Tk = Tk1 | Tk2

the map over Tk would end up as a product of maps:

map<Tk, T> = (map<Tk1, T> * map<Tk2, T>)

In our example, this is where the product of ('a option) and ('a trie trie IntMap.t) comes from. option can be thought of as a one-dimensional map.
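The sum rule can be sketched directly in OCaml with a toy key type of our own (assuming OCaml ≥ 4.07 for the stdlib Int module):

```ocaml
module CharMap = Map.Make(Char)
module IntMap = Map.Make(Int)

(* A key type that is a sum of two alternatives. *)
type key = C of char | I of int

(* Following map<Tk1 | Tk2, T> = (map<Tk1, T> * map<Tk2, T>), a map
   over this key is a product (pair) of maps, one per alternative. *)
type 'a sumMap = 'a CharMap.t * 'a IntMap.t

let empty: 'a sumMap = (CharMap.empty, IntMap.empty)

let add (k: key) (v: 'a) ((cm, im): 'a sumMap): 'a sumMap =
  match k with
  | C c -> (CharMap.add c v cm, im)
  | I i -> (cm, IntMap.add i v im)

let find (k: key) ((cm, im): 'a sumMap): 'a =
  match k with
  | C c -> CharMap.find c cm
  | I i -> IntMap.find i im
```

Each constructor of the key type routes lookups into its own component of the product, mirroring how the tree trie routed Empty into the 'a option and Node into the nested maps.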


In this post we saw how a trie can be modeled as a map using the same principles as the ones we use to construct matrices, that is, two-dimensional maps (nested maps). We then generalized the string keys to trees, and implemented the insert/find functions in OCaml. I found it pretty hard to reason about these structures.

We then went a step further and saw how to construct maps based on the key structure. And we learned about product and sum of types when discussing recursive types.

Mutually Recursive Modules in OCaml

In Chapter 10 of Purely Functional Data Structures, Okasaki describes a technique called data structure bootstrapping. It’s a way to reuse existing implementation of data structures to construct (bootstrap) new ones.

In one of the examples he creates a new heap implementation with an efficient merge operation using another heap as basis, but it turns out that to implement this, we need to rely on mutually recursive modules, that is, two modules A and B, where A depends on B, and B depends on A.

In this post we’ll study the bootstrapped heap and learn how to implement mutually recursive modules in OCaml.

Heap with efficient merging

Assume we have a heap implementation with O(1) insert and O(log n) merge, findMin and deleteMin operations. We’ve seen such an implementation with Skewed Binomial Heaps.

We’ll see how to construct a new heap implementation which will improve the merge complexity to O(1).

Let’s call the base heap PrimaryHeap and define our heap type as

type 'a heap = Empty | Heap of 'a * ('a heap) PrimaryHeap.heap


This type can be either empty or a node with an element (the root) and a primary heap whose elements are bootstrapped heaps themselves; that is, heap and PrimaryHeap.heap form mutually recursive types. Note that the above is not valid OCaml code. We’re using it to explain the theoretical concepts.

We can think of this as a k-ary tree where the element is the root and the children of that node are the subtrees, but these subtrees are stored in a heap instead of an array.

The root element at each node is the smallest among all of its subtrees. Hence obtaining the minimum element of a heap, findMin, is trivial: we simply return that element:

let findMin (heap: heap): tv =
  match heap with
    | Empty -> raise Empty_heap
    | Heap (elem, _) -> elem


Merging two bootstrapped heaps is analogous to linking two trees. The only invariant we need to maintain is that the smallest root continues being the root.

let merge (heap1: heap) (heap2: heap) = match (heap1, heap2) with
  | (Empty, heap2) -> heap2
  | (heap1, Empty) -> heap1
  | (Heap (element1, primaryHeap1), Heap (element2, primaryHeap2)) ->
    if (Element.compare element1 element2) < 0
    then Heap (element1, PrimaryHeap.insert heap2 primaryHeap1)
    else Heap (element2, PrimaryHeap.insert heap1 primaryHeap2)


Since the primary heap has O(1) insert, the bootstrapped heap has O(1) merge, which was our goal. Note that we can implement insert using merge by creating a singleton node and merging it with an existing heap.

let singleton (elem: tv): heap = Heap (elem, PrimaryHeap.empty)

let insert (elem: tv) (heap: heap): heap =
  merge heap (singleton elem)


We need to handle the deletion of the minimum element, which is the more involved operation. It consists in discarding the root of the present node, and finding a new root from the primary heap.

Since each element in the primary heap is a bootstrapped heap, we first obtain the bootstrapped heap containing the smallest element:

let (Heap (newMinElem, minPrimaryHeap)) =
  PrimaryHeap.findMin primaryHeap


then we remove this node from the primaryHeap, and we merge the minPrimaryHeap back into primaryHeap.

let restPrimaryHeap = PrimaryHeap.deleteMin primaryHeap
in PrimaryHeap.merge minPrimaryHeap restPrimaryHeap


finally we make newMinElem the new root element of our top level bootstrapped heap. The complete code is

let deleteMin (heap: heap) =
  match heap with
    | Empty -> raise Empty_heap
    | Heap (_, primaryHeap) ->
      if PrimaryHeap.isEmpty primaryHeap then Empty
      else
        let (Heap (newMinElem, minPrimaryHeap)) =
          PrimaryHeap.findMin primaryHeap
        in let restPrimaryHeap = PrimaryHeap.deleteMin primaryHeap
        in Heap (newMinElem, PrimaryHeap.merge minPrimaryHeap restPrimaryHeap)


The only missing part is defining the correct type of the bootstrapped heap.

Mutually Recursive Modules


Drawing Hands by M. C. Escher

Okasaki mentions that recursive structures are not supported in Standard ML (at least at the time my copy of the book was printed), but they are supported in OCaml.

To make modules mutually depend on one another, we need to mark them as recursive via the rec keyword and declare both modules at the same time using the and connector. Let’s work with a toy example: two modules, Even and Odd, where each depends on the other.

module rec Even = struct
  type t = Zero | Succ of Odd.t
end
and Odd = struct
  type t = Succ of Even.t
end


This will lead to a compilation error:

Error: Recursive modules require an explicit module type.

We need to write the signatures explicitly:

module rec Even : sig
  type t = Zero | Succ of Odd.t
end = struct
  type t = Zero | Succ of Odd.t
end
and Odd : sig
  type t = Succ of Even.t
end = struct
  type t = Succ of Even.t
end


This blog post from Jane Street describes a way to define mutually recursive modules by only listing their signatures:

module rec Even: sig
  type t = Zero | Succ of Odd.t
end = Even
and Odd: sig
  type t = Succ of Even.t
end = Odd


The OCaml compiler can infer the implementation part from the type definitions, but unfortunately this won’t work if the module has function definitions, which is the case for our heap implementation.

Things get more complicated in our case because the primary heap implementation uses a functor to generate a heap taking the element’s module as parameter. In this case the element’s module is our bootstrapped heap. A valid module definition is given below:

module rec BootstrappedElement: sig
  type t = Empty | Heap of Element.t * PrimaryHeap.heap
  val compare: t -> t -> int
end = struct
  type t = Empty | Heap of Element.t * PrimaryHeap.heap
  let compare heap1 heap2 = match (heap1, heap2) with
    | (Heap (x, _), Heap (y, _)) -> Element.compare x y
end
and PrimaryHeap: IHeapWithMerge
  with type tv := BootstrappedElement.t = SkewBinomialHeap(BootstrappedElement);;


Let’s understand what is going on.

type t = Empty | Heap of Element.t * PrimaryHeap.heap

is the definition we presented above. We also implement the methods from the Set.OrderedType interface, namely compare, since this is the interface the heap maker expects. The comparison is based solely on the root element.

Then we declare the PrimaryHeap type at the same time, with type IHeapWithMerge, and because tv is unbound in that interface, we need to bind it to BootstrappedElement.t:

PrimaryHeap: IHeapWithMerge with type tv := BootstrappedElement.t

Finally we provide the implementation, using the result of the SkewBinomialHeap() functor having the BootstrappedElement module as element type:

PrimaryHeap (...) = SkewBinomialHeap(BootstrappedElement)

The syntax is pretty involved, but it accomplishes what we wanted. We can further refine this definition by adding

include Set.OrderedType with type t := t

to the BootstrappedElement signature. This includes all the interface of Set.OrderedType.

These newly defined modules are defined within a functor, the BootstrappedHeap, together with the methods we defined above. Like other heap generators, the functor takes a module representing the element type as parameter. In this case we can also allow the primary heap type to be passed as parameter so we don’t have to use SkewBinomialHeap as implementation. Any heap with merge will do.

module BootstrappedHeap
  (Element: Set.OrderedType)
  (MakeHeap: functor (Element: Set.OrderedType) -> IHeapWithMerge with type tv = Element.t)
  : IHeap with type tv = Element.t = struct

  module rec BootstrappedElement: sig
    type t = Empty | Heap of Element.t * PrimaryHeap.heap
    include Set.OrderedType with type t := t
  end = struct
    type t = Empty | Heap of Element.t * PrimaryHeap.heap
    let compare heap1 heap2 = match (heap1, heap2) with
      | (Heap (x, _), Heap (y, _)) -> Element.compare x y
  end
  and PrimaryHeap: IHeapWithMerge
    with type tv := BootstrappedElement.t = MakeHeap(BootstrappedElement)

  (* Methods definition below… *)
end


The constructors defined within BootstrappedElement are visible within BootstrappedHeap, but they need qualification, such as BootstrappedElement.Heap. To avoid repeating this qualifier, we can use:

include BootstrappedElement

The complete implementation for BootstrappedHeap can be found on github.


The idea of using implementations of a given data structure to yield improved implementations is amazing! The mutually recursive nature of the bootstrapped heap got me at first, but making analogies with a k-ary tree made it easier to understand.

I was struggling a lot to get the syntax right for the recursive modules required for this implementation until I stumbled upon this github repository, from which I learned many new things about OCaml.


[1] Purely Functional Data Structures, Chapter 10 – Chris Okasaki
[2] Jane Street Tech Blog – A trick: recursive modules from recursive signatures
[3] Github yuga/readpfds: bootStrappedHeap.ml

Polymorphic Recursion in OCaml

In Chapter 10 of Purely Functional Data Structures, Okasaki describes recursive types that are non-uniform. In this post we’ll learn more about these types, how to implement them in OCaml and see an example by studying the implementation of Random Access Binary Lists using such a construct.

Uniform recursive type

As an example of uniform recursive data structure we have a simple list

Cons(1, Cons(2, Cons(3, Nil)))

Which has a recursive type, for example

type 'a myList = Nil | Cons of 'a * 'a myList

Each element of the list is either Nil (terminal) or it has a value of polymorphic type 'a, followed by a recursive list, also of type 'a.

Non-uniform recursive type

Now, what if the type of the nested list is not the same as that of the current list? Then we have a non-uniform polymorphic recursive type, for example:

type 'a seq = Nil | Cons of 'a * ('a * 'a) seq

We’ll name this a sequence. In an int seq, the value in the first node has type int, the element in the second node has type (int, int), the third ((int, int), (int, int)), and so on. This structure is equivalent to a complete binary tree, where the i-th element of the seq represents the i-th level of the tree.

An example of value with this type is:

Cons (1, Cons ((2, 3), Cons (((4, 5), (6, 7)), Nil)))

We need a special syntax to type recursive functions that take recursive non-uniform types, because the type of the function called recursively might be a different polymorphic type than the caller. OCaml by default tries to infer the generic types of the function and bind them to specific instances [2]. For example, in

let f: 'a list -> 'a list = fun x -> 13 :: x

OCaml will assume 'a is int and will compile fine. We can see this by pasting that code in the command line, utop.

utop # let f: 'a list -> 'a list = fun x -> 13 :: x;;
val f : int list -> int list = <fun>

The function will then not be polymorphic anymore. To prevent OCaml from auto-binding specific type instances, we can use a special syntax introduced in OCaml 3.12 [3]

utop # let f3: 'a. 'a list -> 'a list = fun x -> 13 :: x;;

This time we’ll get a compilation error:

Error: This definition has type int list -> int list which is less general than 'a. 'a list -> 'a list

The important thing is that this allow us binding the recursive calls with different types. According to the Jane Street Tech Blog [3]:

Note that a polymorphic type annotation holds inside the body of a recursive definition as well as outside, allowing what is known as polymorphic recursion, where a recursive call is made at some non-trivial instantiation of the polymorphic type.

So for example, we can write this function to calculate the size of a sequence:

let rec size: 'a. 'a seq -> int = fun xs -> match xs with
  | Nil -> 0
  | Cons (_, xs) -> 1 + 2 * (size xs)


The problem with this structure is that it can only represent lists of size in the form of 2^k - 1. To work around that, we allow some items to not hold any elements at all, so that each item corresponds to a digit in the binary representation of the size of the list.

type 'a seq = Nil | Zero of ('a * 'a) seq | One of 'a * ('a * 'a) seq

For example, we can now represent a list of 10 elements, as

Zero (One ((1, 2), Zero (One ((((3, 4), (5, 6)), ((7, 8), (9, 10))), Nil))))
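As a sanity check, the size function from earlier can be adapted to this variant. This is a sketch of my own, not code from the post: a Zero digit contributes no element at its level, while One contributes exactly one.

```ocaml
type 'a seq = Nil | Zero of ('a * 'a) seq | One of 'a * ('a * 'a) seq

(* Polymorphic recursion: size on 'a seq recurses on ('a * 'a) seq *)
let rec size : 'a. 'a seq -> int = fun xs -> match xs with
  | Nil -> 0
  | Zero rest -> 2 * (size rest)
  | One (_, rest) -> 1 + 2 * (size rest)

(* The 10-element list above *)
let ten =
  Zero (One ((1, 2), Zero (One ((((3, 4), (5, 6)), ((7, 8), (9, 10))), Nil))))
```

Evaluating size ten yields 10, matching the reading of the digits Zero, One, Zero, One as the binary number 1010.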


Sequence binary random access list

We can use a sequence to implement a binary random access list in a concise way.


Inserting an element at the beginning is analogous to incrementing the binary number: starting from the least significant digit, if it's a zero we make it a one; if it's a one, we make it a zero and carry over a 1 to the next digit.

The carry over process is simple in this structure. Because the type of an item following an item of type 'a is 'a * 'a, to merge the element being added with the existing element we simply make a pair of them and pass it to the recursive call.

let rec push: 'a. 'a -> 'a seq -> 'a seq =
  fun element digits -> match digits with
    | Nil -> One (element, Nil)
    | Zero restDigits -> One (element, restDigits)
    | One (currentElement, restDigits) ->
        Zero (push (element, currentElement) restDigits)


Head and Tail

Removing or retrieving the first element is analogous to decrementing a binary number. If the digit is one, we make it zero and return the element along with the resulting list. If it's a zero, we make a recursive call to get the next available element. However, since the returned element has type 'a * 'a and our return type is 'a, we only use the first value of the pair and put the second one back as a One digit.

let rec popAux: 'a. 'a seq -> ('a * 'a seq) = function
  | Nil -> raise EmptyListException
  | One (elem, Nil) -> (elem, Nil)
  | One (elem, restDigits) -> (elem, Zero restDigits)
  | Zero restDigits ->
      let ((left, right), restResult) = popAux restDigits in
      (left, One (right, restResult))


Implementing head and tail using popAux is now trivial

let head digits = let (elem, _) = popAux digits in elem;;
let tail digits = let (_, newDigits) = popAux digits in newDigits;;



Finding an element can be done by transforming the problem into smaller instances.

It helps to look at some simple examples. Let’s consider 3 cases.

Case 1. If we had a single element, we either return it if the index is 0, or raise an exception otherwise.

0: (0)

Case 2. If we have 2 elements,

0: ()
1: (0, 1)

Notice that when we go from the first level to the second, all items "double" in size, so we can transform this into the single-element case by treating pairs as single elements. But since the index refers to individual elements, we need to transform it as well, by halving it. This reduces it to Case 1.

If our initial index was either 0 or 1, it’s now 0, and we found our element in level 1.

1: (0)

The problem is that we need to return a single element at level 0, not a pair, so we need to undo the transformation. We can use the parity of the original index to decide which side of the pair to return: if it's even, we return the first element, otherwise the second one.

Case 3. If we have 3 elements,
0: (0)
1: (1, 2)

and our index is larger than 0, we move to the next level but we need to account for the level we’re skipping, so the transformation of index would look like:
0: ()
1: (0)

which is reduced to Case 2.

These 3 cases can be combined to handle lists of any size. For example, say we have 10 elements and want to find the element at position 6:

0: ()
1: (0, 1)
2: ()
3: (((2, 3), (4, 5)), ((6, 7), (8, 9)))

Step 1. This is Case 2. We need to transform this by treating pairs as elements and halving the index:

1': (0)
2': ()
3': ((1, 2), (3, 4))

Note how this reduced the problem to finding the element at position 3 of a list of size 5. Step 2. We are now in Case 3, where we need to skip the current element:

1': ()
2': ()
3': (((0), (1)), ((2), (3)))

Our index is now 2. Step 3. We go one level down:

2: ()
3: (0, 1)

With an index of 1. Step 4. We halve it once again and finally find the level that contains our index.

3: (0)

We now need to recurse back to find exactly which element on that level to pick. On Step 4, we can see our index 1 was on the right side of the pair in level 3, so we pick the right side, that is, ((6, 7), (8, 9)).

On Step 3, our index 2 was on the left side of the innermost pair, that is, (6, 7). On Step 2, we skipped the current element but didn't change levels, so there's no need to choose an element from the pair. Finally, on Step 1, the index 6 was on the left side of the innermost pair, which returns the element at index 6.

In general, we can tell which side of the innermost pair to pick by observing that the indexes are ordered from left to right in a given level. And because every level has an even number of elements, we can assume that the first index in the level – or the first element in the first pair – is even. Hence the parity of the index is sufficient to determine which side of the pair to pick.

With this algorithm in mind, the lookup function is quite compact:

let rec lookup: 'a. int -> 'a seq -> 'a =
  fun index digits -> match (index, digits) with
    | (_, Nil) -> raise IndexOutOfBoundsException
    (* Case 1: the element is at the current level *)
    | (0, One (elem, _)) -> elem
    (* Case 3: skip the element at the current level *)
    | (index, One (_, restDigits)) -> lookup (index - 1) (Zero restDigits)
    (* Case 2: treat pairs as elements and halve the index *)
    | (index, Zero restDigits) ->
        let (left, right) = lookup (index / 2) restDigits in
        if index mod 2 == 0 then left else right


The update follows a similar idea to the lookup, but we also need to return the updated level when returning from the recursion; that is, each level must be rebuilt on the way back.

To accomplish that, we can pass a callback, the updater, that encodes which pair we would pick at each level. We start with a function that simply returns the element to be updated

(fun _ -> element)

Then, at each level we create a new updater, which applies the previous updater on the left or right side of the pair, depending on the parity of the index:

let updater' = fun (x, y) ->
  if index mod 2 == 0
  then (updater x, y)
  else (x, updater y)


When we finally find the level that contains our index, we apply the composed updater. This has the effect of "narrowing down" a whole level to the single element at the target index and replacing its value; the updated levels are then rebuilt as we return from the recursion.

let rec fupdate: 'a. ('a -> 'a) -> int -> 'a seq -> 'a seq =
  fun updater index digits -> match (index, digits) with
    | (_, Nil) -> raise IndexOutOfBoundsException
    | (0, One (elem, restDigits)) -> One (updater elem, restDigits)
    | (index, One (elem, restDigits)) ->
        push elem (fupdate updater (index - 1) (Zero restDigits))
    | (index, Zero restDigits) ->
        let updater' = fun (x, y) ->
          if index mod 2 == 0
          then (updater x, y)
          else (x, updater y)
        in Zero (fupdate updater' (index / 2) restDigits)

let update index element digits =
  fupdate (fun _ -> element) index digits


Structural Decomposition

Okasaki introduces this implementation in the context of Structural Decomposition, which is a technique for creating data structures from incomplete ones. In this example, the raw implementation of the sequence can only represent lists of size 2^k - 1, but by modeling each node in the sequence as a Zero or One digit, where Zero holds no elements, we can work around the size restriction.


The implementation of binary random access lists using sequences is very neat, but quite hard to understand.

One of the major selling points of shorter code is that it tends to contain fewer bugs and fewer corner cases. On the other hand, if the code is also harder to read and understand, it might be harder to spot bugs.

This post helped me understand a bit more about OCaml’s type system. Digging a little also led me to interesting topics such as Parametric Polymorphism [4] and Existential vs. Universally quantified types [5].


[1] Purely Functional Data Structures – Chris Okasaki
[2] Jane Street – Ensuring that a function is polymorphic
[3] Ensuring that a function is polymorphic in OCaml 3.12
[4] Wikipedia – Parametric polymorphism
[5] Existential vs. Universally quantified types in Haskell

Numerical Representations as inspiration for Data Structures


In this chapter Okasaki describes a technique for developing efficient data structures through analogies with numerical representations, in particular binary numbers and their variations.

We’ve seen this pattern arise with Binomial Heaps in the past. Here the author presents the technique in its general form and applies it to another data structure, binary random access lists.

Binary Random Access Lists

These lists allow efficient insertion at/removal from the beginning, and also access and update at a particular index.

The simple version of this structure consists in distributing the elements in complete binary leaf trees. A complete binary leaf tree (CBLF) is one that stores elements only at the leaves, so a tree with height i has 2^(i+1)-1 nodes but only 2^i elements.

Consider an array of size n, and let Bn be the binary representation of n. If the i-th digit of Bn is 1, then we have a tree containing 2^i leaves. We then distribute the elements into these trees, starting with the least significant digit (i.e. the smallest tree), and store the elements within each tree in pre-order.

For example, an array of elements (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) has 11 elements, which is 1011 in binary. So we have one tree with a single leaf (1), a tree with 2 leaves (2, 3) and another containing 8 leaves (4, 5, 6, 7, 8, 9, 10, 11).

We use a dense representation of the binary number, by having explicit elements for the 0 digit. Here’s a possible signature for this implementation:

type 'a tree = Leaf of 'a | Node of {
  size: int; (* Number of elements/leaves in the tree - not nodes *)
  left: 'a tree; (* left sub-tree *)
  right: 'a tree; (* right sub-tree *)
};;
(* Binary representation of the list *)
type 'a digit = Zero | One of 'a tree;;
type 'a t = 'a digit list;;

Inserting a new element consists in adding a new tree with a single leaf. If a tree already exists for a given size, the two have to be merged into a tree of the next index. Merging two CBLFs of the same size is straightforward: we just need to make them children of a new root. Since elements are stored in pre-order, the tree being inserted or coming from a carry over should be the left child.

Looping back to our example, if we want to insert the element 100, we first insert a tree with a single leaf (100). Since the least significant digit already has an element, we need to merge them into a new tree containing (100, 1) and try to insert at the next position. A conflict will arise with (2, 3), so we again merge them into (100, 1, 2, 3) and try the next position. We finally succeed in inserting at position 2, for a new list containing trees like (100, 1, 2, 3) and (4, 5, 6, 7, 8, 9, 10, 11).

The complexity of inserting an element is O(log n) in the worst case, which requires merging trees at every digit (e.g. if Bn = 111...111). Merging two trees is O(1).

let size tree = match tree with
  | Leaf _ -> 1
  | Node {size; _} -> size

let link tree1 tree2 = Node {
  size = (size tree1) + (size tree2);
  left = tree1;
  right = tree2;
}

let rec pushTree tree digits = match digits with
  | [] -> [One tree]
  | Zero :: restDigits -> (One tree) :: restDigits
  | (One currentTree) :: restDigits ->
      Zero :: (pushTree (link tree currentTree) restDigits)

let push element digits = pushTree (Leaf element) digits;;

Removing the first element is analogous to decrementing the number, borrowing from the next digit if the current digit is 0.

Searching for an index consists in first finding the tree containing the index and then searching within the tree. More specifically, because the elements are sorted beginning from the smallest tree to the largest, we can find the right tree just by inspecting the number of elements in each tree until we find the one whose range includes the desired index. Within a tree, elements are stored in pre-order, so we can find the right index in O(height) of the tree.
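Both operations can be sketched as follows. This is my own sketch rather than code from the post; the helper names (popTree, lookupTree) and exception names are assumptions, and the types repeat the definitions above.

```ocaml
exception EmptyListException
exception IndexOutOfBoundsException

type 'a tree = Leaf of 'a | Node of {size: int; left: 'a tree; right: 'a tree}
type 'a digit = Zero | One of 'a tree

let size tree = match tree with
  | Leaf _ -> 1
  | Node {size; _} -> size

(* Removal: analogous to decrementing the binary number. popTree returns the
   smallest tree; a Zero digit "borrows" by splitting the tree found at the
   next position into its two children. *)
let rec popTree digits = match digits with
  | [] -> raise EmptyListException
  | [One tree] -> (tree, [])
  | (One tree) :: restDigits -> (tree, Zero :: restDigits)
  | Zero :: restDigits ->
    (match popTree restDigits with
      | (Node {left; right; _}, newDigits) -> (left, (One right) :: newDigits)
      | _ -> raise EmptyListException)

let pop digits = match popTree digits with
  | (Leaf elem, newDigits) -> (elem, newDigits)
  | _ -> raise EmptyListException

(* Search: skip whole trees by their sizes, then descend within the tree,
   using the fact that elements are stored at the leaves in order *)
let rec lookupTree index tree = match tree with
  | Leaf elem -> if index == 0 then elem else raise IndexOutOfBoundsException
  | Node {size; left; right} ->
    if index < size / 2
    then lookupTree index left
    else lookupTree (index - size / 2) right

let rec lookup index digits = match digits with
  | [] -> raise IndexOutOfBoundsException
  | Zero :: restDigits -> lookup index restDigits
  | (One tree) :: restDigits ->
    if index < size tree
    then lookupTree index tree
    else lookup (index - size tree) restDigits
```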

After finding the right index, returning the element at that index is trivial. Updating the element at a given index requires rebuilding the tree when returning from the recursive calls.

let rec updateTree index element tree = match tree with
  | Leaf _ ->
      if index == 0 then (Leaf element) else raise IndexOutOfBoundsException
  | Node {size; left; right} ->
      if index < size / 2
      then Node {size; left = updateTree index element left; right}
      else Node {
        size;
        left;
        right = updateTree (index - size / 2) element right;
      }

let rec update index element digits = match digits with
  | [] -> raise IndexOutOfBoundsException
  | Zero :: restDigits -> Zero :: (update index element restDigits)
  | (One tree) :: restDigits ->
      if index < size tree
      then (One (updateTree index element tree)) :: restDigits
      else (One tree) :: (update (index - (size tree)) element restDigits)

Okasaki then proposes a few different numbering systems that allow performing insertion/removal in O(1) time. Here we'll only discuss the less obvious but more elegant one, using skew binary numbers.


Skew Binary Random Access Lists

A skew binary number representation supports the digits 0, 1 and 2.

The weight of the i-th digit is 2^(i+1) - 1. In its canonical form, it only allows the least significant non-zero digit to be 2.



Decimal and skew binary representations side by side:

Decimal:     0  1  2   3   4   5   6    7
Skew binary: 0  1  2  10  11  12  20  100

It's possible to show that this number system offers a unique representation for each decimal number. See the Appendix for a sketch of the proof and an algorithm for converting decimals to skewed binary numbers.

Incrementing a number follows these rules:

  • If there’s a digit 2 in the number, turn it into 0 and increment the next digit. By definition that is either 0 or 1, so we can safely increment it without having to continue carrying over.
  • Otherwise the least significant digit is either 0 or 1, and it can be incremented without carry overs.

The advantage of this number system is that increments (and, similarly, decrements) never carry over more than once, so the complexity is O(1), as opposed to the worst-case O(log n) for regular binary numbers.
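As a small sketch of these rules (mine, not from the post), a skew binary number can be represented sparsely as a list of digit weights, least significant first, with a digit 2 showing up as two equal weights at the head:

```ocaml
(* e.g. 10 = 3 + 7 is [3; 7]; decimal 2 is [1; 1], i.e. the digit 2 *)
let inc weights = match weights with
  (* Lowest non-zero digit is 2: zero it out and carry over exactly once,
     since 2w + 1 is the next weight when w = 2^(i+1) - 1 *)
  | w1 :: w2 :: rest when w1 = w2 -> (1 + w1 + w2) :: rest
  (* Otherwise increment the lowest digit by prepending a weight-1 digit *)
  | _ -> 1 :: weights
```

Starting from the empty list (zero) and incrementing repeatedly yields [1], [1; 1], [3], [1; 3], ..., i.e. 1, 2, 10, 11 in skew binary.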

A skew binary random access list can be implemented using this idea. We use a sparse representation (that is, not including 0s). Each digit 1 at position i corresponds to a tree with 2^(i+1) - 1 elements, in this case a complete binary tree of height i+1. A digit 2 is represented by two consecutive trees with the same weight.

type 'a node = Leaf of 'a | Node of {
  element: 'a;
  left: 'a node; (* left sub-tree *)
  right: 'a node; (* right sub-tree *)
};;
type 'a tree = {size: int; tree: 'a node};;
type 'a t = 'a tree list;;


Adding a new element to the beginning of the list is analogous to incrementing the number, which we saw can be done in O(1). Converting a digit 0 to 1 or 1 to 2, is a matter of prepending a tree to a list. To convert a 2 to 0 and increment the next position, we need to merge two trees representing it with the element to be inserted. Because each tree is traversed in pre-order, we make the element the root of the tree.

let push element ls = match ls with
  | {size = size1; tree = tree1} ::
    {size = size2; tree = tree2} ::
    rest ->
      if size1 == size2 (* First non-zero digit is 2 *)
      then {
        size = 1 + size1 + size2;
        tree = Node {element; left = tree1; right = tree2}
      } :: rest
      (* Add a tree of size 1 to the beginning, analogous to converting a
         digit 0 to 1, or 1 to 2 *)
      else {size = 1; tree = Leaf element} :: ls
  | _ -> {size = 1; tree = Leaf element} :: ls


Elements are inserted in pre-order in each tree, so when searching for an
index, we can first find the right tree by looking at the tree sizes and within a tree we can do a “binary search” in O(height) of the tree.
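A lookup along these lines can be sketched as follows (my own sketch, not code from the post; it repeats the types above and assumes the exception name):

```ocaml
exception IndexOutOfBoundsException

type 'a node = Leaf of 'a | Node of {element: 'a; left: 'a node; right: 'a node}
type 'a tree = {size: int; tree: 'a node}

(* Pre-order: the root holds index 0, then the left subtree, then the right.
   Each subtree of a tree of (odd) size s holds (s - 1) / 2 elements. *)
let rec lookupNode index node size = match node with
  | Leaf element ->
    if index == 0 then element else raise IndexOutOfBoundsException
  | Node {element; left; right} ->
    let subSize = size / 2 in
    if index == 0 then element
    else if index <= subSize then lookupNode (index - 1) left subSize
    else lookupNode (index - 1 - subSize) right subSize

(* Skip whole trees by their sizes, then search within the right tree *)
let rec lookup index ls = match ls with
  | [] -> raise IndexOutOfBoundsException
  | {size; tree} :: rest ->
    if index < size
    then lookupNode index tree size
    else lookup (index - size) rest
```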

let rec updateTree index newElement tree size = match tree with
  | Leaf _ -> Leaf newElement
  | Node {element; left; right} ->
    let newSize = ((size + 1) / 2) - 1 in
    if index == 0 then Node {element = newElement; left; right}
    else if index <= newSize then
      Node {
        element;
        left = updateTree (index - 1) newElement left newSize;
        right;
      }
    else
      Node {
        element;
        left;
        right = updateTree (index - newSize - 1) newElement right newSize;
      }

let rec update index element ls = match ls with
  | [] -> raise IndexOutOfBoundsException
  | {size; tree} :: rest ->
    if index < size
    then {size; tree = updateTree index element tree size} :: rest
    else {size; tree} :: (update (index - size) element rest)


Binomial Heaps

In this chapter, this technique is also applied to improve the worst case runtime of insertion of binomial heaps. The implementation, named Skewed Binomial Heap, is on github.


This chapter demonstrated that binary representations are a useful analogy to come up with data structures and algorithms, because they’re simple. This simplicity can lead to inefficient running times, though. Representations such as skewed binary numbers can improve the worst case of some operations with the trade-off of less intuitive and more complex implementations.

Appendix A – Proof

Sketch of the proof. First, note that a given skew binary representation cannot map to two different decimals, since its value is fully determined by the weighted sum of its digits. We then need to show that two distinct representations do not map to the same decimal.

Suppose it does, and let them be B1 and B2. Let k be the largest position where these number have a different digit. Without loss of generality, suppose that B1[k] > B2[k].

First case: suppose that B1[k] = 1, B2[k] = 0, and B2 doesn't have any digit 2. Then B1 is at least M + 2^{k+1} - 1, while B2 is at most M + \sum_{i = 1}^{k} (2^{i} - 1) = M + 2^{k + 1} - k - 2 (M corresponds to the total weight of the digits at positions > k). This implies that B2 < B1, a contradiction.

Second case: suppose that B1[k] = 1, but now B2 does have a digit 2, at position j. It has to be that j < k. Since only zeros follow it, we can write B2's upper bound as

M + \sum_{i = j + 1}^{k} (2^{i} - 1) + 2^{j + 1} - 1

Since 2(2^{j + 1} - 1) < 2^{j + 2} - 1, we have

\sum_{i = j + 1}^{k} (2^{i} - 1) + 2^{j + 1} - 1 < \sum_{i = j + 2}^{k} (2^{i} - 1) + 2^{j + 2} - 1

We can continue this argument until we get that B2 is at most M + 2(2^{k} - 1), which is less than M + 2^{k + 1} - 1, a lower bound on B1.

Third case. Finally, suppose we have B1' such that B1'[k] = 2, and B2'[k] = 1. We can subtract 2^{k+1} - 1 from both and reduce to the previous case. ▢

Appendix B – Conversion algorithm

Converting from a decimal representation to a binary one is straightforward, but it’s more involved to do so for skewed binary numbers.

Suppose we allow leading zeros and consider all the numbers with k digits. For example, if k=2, we have 00, 01, 02, 10, 11, 12 and 20. We can construct the numbers with k+1 digits by prefixing 0 or 1, plus the additional number 2 followed by k zeros. For k=3, we have 000, 001, 002, 010, 011, 012, 020, 100, 101, 102, 110, 111, 112, 120 and finally 200.

More generally, we can see there are 2^(k+1) - 1 numbers with k digits. We can construct the numbers with k+1 digits by prefixing 0 or 1, plus one extra number that starts with 2 and is followed by k zeros, for a total of (2^(k+1) - 1) + (2^(k+1) - 1) + 1 = 2^(k + 2) - 1. So the invariant holds by induction on k, and it is true for the base case, since for k = 1 we enumerated 3 numbers.

This gives us a method to construct the skew binary representation of a number if we know its number of digits, say k. If the number is among the first 2^k - 1 numbers, that is, between 0 and 2^k - 2, it starts with a 0. If it's among the next 2^k - 1, that is, between 2^k - 1 and 2^(k+1) - 3, it starts with a 1. If it's the next one, exactly 2^(k+1) - 2, it starts with a 2.

We can continue recursively by subtracting the corresponding weight for each digit until k = 0. We can find out how many digits a number n has (excluding leading zeros) by finding the smallest k such that 2^(k+1) - 1 is greater than n. For example, for 8 the smallest k is 3, since 2^(k+1) - 1 = 15 and 2^k - 1 = 7.

The Python code below uses these ideas to find the skewed binary number representation in O(log n):

def decimalToSkewedBinaryInner(n, weight):
    if n < weight:
        return (n, [])
    rest, skewDigits = decimalToSkewedBinaryInner(n, 2*weight + 1)
    if rest == 2*weight:
        return (0, skewDigits + [2])
    elif rest >= weight:
        return (rest - weight, skewDigits + [1])
    return (rest, skewDigits + [0])

def decimalToSkewedBinary(n):
    if n == 0:
        return [0]
    remainder, digits = decimalToSkewedBinaryInner(n, 1)
    assert remainder == 0
    return digits

One might ask: why not OCaml? My excuse is that I already have a number theory repository in Python, so it seemed like a nice addition. Converting this to functional code, in particular OCaml, is easy.

This algorithm requires an additional O(log n) memory, while the conversion to a binary number can be done with constant extra memory. My intuition is that this is possible because the weights for binary numbers are powers of the same base, 2, unlike the skewed numbers' weights. Is it possible to work around this?

Lazy Rebuilding


In this chapter Okasaki starts with a common technique data structures use to achieve efficient complexity bounds. A classic example is balanced binary search trees, which, on one hand, need to be balanced to avoid degenerate cases (think of a binary tree that degenerates into a path), but, on the other hand, are usually too expensive to re-balance at every operation.

The tradeoff is to allow a certain degree of “imbalance”, such that the big-O complexity of operations does not degenerate, and to let us perform re-balances only every so often, in a way that leads to an efficient amortized complexity. Structures such as AVL trees and Red-black trees make use of this technique.

The general technique is named batched rebuilding. The main issue with it, though, is that, as we saw, amortized analysis does not play well with persistent data structures.

To address that problem, a technique was developed by Mark Overmars, called global rebuilding.

Global Rebuilding

The basic idea is to keep two copies of a structure in parallel: one on which we perform incremental updates (the write copy) and one used for read operations (the read copy). After a few operations, we copy the write version over the read-only version. The incremental updates are cheap, but they might put the structure in a state that is not suitable for reading. Spreading the update costs across operations gives each operation efficient worst-case bounds.

The major downside of this technique is its complexity and corner cases: we also need to account for changes to the structure that render the write copy invalid.

We'll now describe yet another implementation of queues that uses this technique, developed by Robert Hood and Robert Melville. Similar to other queue implementations, this structure maintains a front and a rear list, the latter in reverse order. It also has the invariant that the rear list cannot be larger than the front.

When that happens we must perform a rotation, which consists in reversing the rear list and appending it to the end of the front list. We cannot perform this operation piecemeal, because we only have efficient access to the first element of a list. The operation we can do partially is concatenating the reverse of a list to the beginning of another list: given xs and ys, we can compute ~xs + ys, where ~ represents the reverse of a list. Note we can also reverse a list piecemeal, going from xs to ~xs, by taking ys = [].

Now, to achieve our goal which is xs + ~ys, we reverse both xs and ys, then concatenate the reverse of ~xs to ~ys:

1) Start with xs and ys
2) Reverse both: ~xs and ~ys
3) Then (~(~xs)) + ~ys which is xs + ~ys
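The three steps above can be sketched monolithically on plain lists; this is my own sketch for intuition, while the Hood-Melville machinery performs the same computation one step at a time:

```ocaml
(* rev_onto acc xs reverses xs onto the front of acc *)
let rec rev_onto acc = function
  | [] -> acc
  | x :: rest -> rev_onto (x :: acc) rest

(* Computes xs + ~ys: reverse both lists, then "un-reverse" ~xs onto ~ys *)
let append_reversed xs ys =
  let rxs = rev_onto [] xs in  (* ~xs *)
  let rys = rev_onto [] ys in  (* ~ys *)
  rev_onto rys rxs             (* ~(~xs) + ~ys = xs + ~ys *)
```

For example, append_reversed [1; 2; 3] [4; 5] gives [1; 2; 3; 5; 4].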

We can perform these operations one step at a time using a "state machine". We start in state Reversing(xs, ys), which reverses xs and ys at the same time into ~xs and ~ys; at that point we switch to state Appending(~xs, ~ys), which transfers the elements of ~xs back onto ~ys, so we eventually reach Done(xs + ~ys).

The problem of keeping a separate state is that it might not be accurate by the time it’s done. In particular, if we remove an element from the front, the Appending step can be shortcut. We can keep a count of how many of the elements in ~xs of Appending are still present. If, by the time we’re appending the number of present elements goes to 0, the last elements of ~xs (that is, the first elements of xs) have been removed, so they do not need to be transferred to ~ys.

A possible implementation of the states is:

type 'a rotationState =
  | Idle
  | Reversing of {
      validCount: int;
      front: 'a list;
      frontReversed: 'a list;
      rear: 'a list;
      rearReversed: 'a list;
    }
  | Appending of {
      validCount: int;
      front: 'a list;
      rear: 'a list;
    }
  | Done of 'a list


The Idle is just a placeholder to make sure the state machine can keep going to the next state even when we’re not currently performing a rotation. The state transition definition is given by:

let nextState rotationStep =
  match rotationStep with
  | Reversing ({
      validCount;
      front = firstElem :: restFront;
      frontReversed;
      rear = lastElem :: restRear;
      rearReversed;
    }) -> Reversing ({
      validCount = validCount + 1;
      front = restFront;
      frontReversed = firstElem :: frontReversed;
      rear = restRear;
      rearReversed = lastElem :: rearReversed;
    })
  | Reversing ({
      validCount;
      front = [];
      frontReversed;
      (* Invariant: restRear must be empty *)
      rear = lastElem :: restRear;
      rearReversed;
    }) -> Appending ({
      validCount;
      front = frontReversed;
      rear = lastElem :: rearReversed;
    })
  | Appending ({validCount = 0; front; rear}) -> Done rear
  (* Transfer one element of front to rear *)
  | Appending ({validCount; front = elem :: restFront; rear}) ->
    Appending ({
      validCount = validCount - 1;
      front = restFront;
      rear = elem :: rear;
    })
  | rotationStep -> rotationStep (* No-op *)


One important detail here is that Appending ({validCount = 0; front; rear}) must appear before the other matching rule for Appending, because the latter also matches this case.

When we remove an element from the front of the queue we need to update the number of valid elements in the front of the rotation state:

let invalidate rotationStep = match rotationStep with
  | Reversing ({validCount; front; frontReversed; rear; rearReversed}) ->
    Reversing ({
      validCount = validCount - 1;
      front;
      frontReversed;
      rear;
      rearReversed;
    })
  | Appending ({validCount = 0; front = _; rear = _ :: restRear}) ->
    Done restRear
  | Appending ({validCount; front; rear}) ->
    Appending ({validCount = validCount - 1; front; rear})
  | rotationStep -> rotationStep


Similar to the Real-time queues, we can guarantee constant worst-case time for all the operations if we execute the state machine twice for each operation. The check() function verifies whether we need a rotation and also executes the nextState() function twice.

let processStateFromQueue {frontSize; front; rearSize; rear; rotation} =
  let newState = nextState (nextState rotation) in
  match newState with
  | Done newFront ->
    {frontSize; front = newFront; rearSize; rear; rotation = Idle}
  | _ -> {frontSize; front; rearSize; rear; rotation = newState}

let check queue = match queue with
  {frontSize; front; rearSize; rear; rotation = _} ->
    if rearSize <= frontSize then processStateFromQueue queue
    else
      (* Initiate the rotation process. *)
      let newState = Reversing ({
        validCount = 0;
        front;
        frontReversed = [];
        rear;
        rearReversed = [];
      })
      in processStateFromQueue {
        frontSize = frontSize + rearSize;
        front;
        rearSize = 0;
        rear = [];
        rotation = newState;
      }


The other operations are then straightforward. The only catch is that pop() must call the invalidate() function, because it decreases the size of front:

let push elem {frontSize; front; rearSize; rear; rotation} =
  check {
    frontSize;
    front;
    rearSize = rearSize + 1;
    rear = elem :: rear;
    rotation;
  }

let peek {front; _} = match front with
  | [] -> raise Empty_queue
  | firstElem :: _ -> firstElem

let pop {frontSize; front; rearSize; rear; rotation} = match front with
  | [] -> raise Empty_queue
  | _ :: restFront ->
    let newState = invalidate rotation in
    check {
      frontSize = frontSize - 1;
      front = restFront;
      rearSize;
      rear;
      rotation = newState;
    }


The full, up-to-date implementation is on github.

Lazy Rebuilding

As we can see, the Hood-Melville implementation is quite involved. Okasaki argues that making use of lazy evaluation and memoization simplifies the implementation. We can indeed see that the implementation of Real-time queues is much cleaner.

One simplification is that we don’t have to sync the write copy with the read copy. We just need to make sure to execute items from the schedule by the time they are needed. Memoization will take care of the rest.

Another advantage in this case is that reversing and concatenating lazily evaluated lists does not require an intermediate reversal of the front, which means that we can remove the front element without the need to invalidate any items in the schedule.

The author provides more examples of lazy rebuilding for deques (double-ended queues), first presenting an amortized version using the Banker's method and then a lazy rebuilding version. The Banker's deque and Real-time deque are also on my github.


In this chapter the technique behind Real-time queues was formalized. I really enjoy the organization of the book, in which the author presents a data structure and from there extracts a generic technique that can be applied to other cases.

When I was studying data structures I don’t recall learning about general techniques underlying existing data structures, such as batched rebuilding for AVL trees and Red-black trees. That would have been interesting.

Eliminating Amortization


In this chapter Okasaki presents techniques to convert persistent data structures with amortized bounds to worst-case bounds. A few reasons to want worst-case guarantees include:

  • Real-time systems, where an expensive operation can cause the system to miss a hard deadline, even if it’s efficient when amortized over time.
  • Parallel systems, in which we want to avoid one processor taking longer than the others because it happened to execute the expensive operation.

The intuition is to make the theoretical amortized analysis part of the actual implementation. In the amortized analysis, through either the Banker's method or the Physicist's method, we saw the idea of paying debt ahead of time, so that by the time an expensive operation takes place, we can show its cost has already been distributed throughout previous operations.

To eliminate the amortization, we can literally pay off these costs when running the algorithm. We do it through a structure called schedule, which contains the unevaluated operations on the data structure. Whenever we perform an operation, we evaluate one or more items of the schedule. Due to memoization, by the time we actually need the evaluated structure, it will be evaluated.

Often the amortized analysis can dictate how the execution of the schedule is performed. For example, if the analysis tells us to pay off 2 units of debt on every action, that can be translated to executing 2 items from the schedule on every action.

The main challenge in this conversion is to modify the data structure in such a way that its operations can be executed incrementally.

We’ll discuss an example using queues and then binomial heaps.

Real-time Queues

For the queue structure, the costly operation is the rotation that takes place when we need to combine the rear with the front. It’s a monolithic operation, so we need to change that.

Let’s start by defining the structure. It’s similar to the stream queue, except that we don’t need the rear to be a stream and we have a schedule field, which is also a stream. The idea is that whenever a rotation happens, it will be “scheduled” to be executed little by little.

type 'a realTimeQueue = {
  front: 'a stream;
  rear: 'a list;
  schedule: 'a stream
};;

The rotate() function is the most complicated part of the structure. The invariant is that we only call this function when |rear| = |front| + 1.

We construct a stream, newSchedule, with the elements of rear reversed, and on the return step of the recursion we append the elements of front to that stream.

Note that this is performed lazily, so a call to rotate only executes one step.

(* Rotate the queue to generate a new front of the queue by reversing the
   rear and appending it. Lazily evaluated. *)
let rec rotate ({ front; rear; schedule }: 'a realTimeQueue): 'a stream =
  match rear with
    (* This should never happen with a valid queue because rotate is only
       called when |rear| > |front| *)
    | [] -> raise Empty_queue
    | last_elem :: rest_rear ->
      let newSchedule = Stream.insert last_elem schedule in
      match front with
        (* At this point, newSchedule contains the rear of the queue reversed *)
        | lazy Nil -> newSchedule
        | lazy (StreamCell (first_elem, rest_front)) ->
          lazy (
            let newFront = rotate {
              front = rest_front;
              rear = rest_rear;
              schedule = newSchedule
            } in
            StreamCell (first_elem, newFront)
          )

Now we have the execution of the schedule. It serves two purposes. One is to execute an item from the schedule and the other is to trigger a rotation when the schedule is empty.

The schedule being empty is a proxy for when |rear| = |front| + 1. Why? Because when the schedule is populated, it has the same size as front (they’re both assigned the same stream), and rear is empty. Whenever we insert an element, increasing the size of rear by one, or remove an element, decreasing the size of front by one, the difference |front| - |rear| decreases by 1, and so does the size of the schedule.

Implementation-wise, maybe it would be clearer if we checked for the empty schedule outside and only called exec() if it was not empty.

(* exec evaluates the first element of the schedule stream. Because of
   memoization, this means that whenever we evaluate 'front', we guarantee
   that all operations are already memoized. *)
let exec ({ front; rear; schedule }: 'a realTimeQueue) =
  match schedule with
    | lazy (StreamCell (_, rest_schedule)) ->
      { front; rear; schedule = rest_schedule }
    (* Due to invariants, this means that |rear| > |front| *)
    | lazy Nil ->
      (* Now newFront = front @ (rev rear) *)
      let newFront = rotate {front; rear; schedule = Stream.empty} in
      {front = newFront; rear = []; schedule = newFront}

With these in place, the usual operations for queue are straightforward. The ones that mutate the queue, push() and pop(), need to make sure to call exec().

let push
  (elem: 'a)
  ({ front; rear; schedule }: 'a realTimeQueue):
  ('a realTimeQueue) =
  exec { front; rear = elem :: rear; schedule }

let pop ({ front; rear; schedule }: 'a realTimeQueue): 'a realTimeQueue =
  match front with
    | lazy Nil -> raise Empty_queue
    | lazy (StreamCell (_, rest_front)) ->
      exec { front = rest_front; rear; schedule }

let peek ({ front; rear; schedule }: 'a realTimeQueue): 'a =
  match front with
    | lazy Nil -> raise Empty_queue
    | lazy (StreamCell (first_elem, _)) -> first_elem

Scheduled Binomial Heaps

We discussed Binomial Heaps in a previous post. The idea is that a binomial heap is a list of binomial trees, and a binomial tree is defined recursively based on its rank.

The heap is represented by a binary number (as a list of digits). If the k-th digit of the number is 1, it indicates the existence of a binomial tree of rank k (containing 2^k elements). A heap with n elements has a unique representation, which is the binary representation of n.

For example, if n = 10, then the binary representation of the heap is 1010, meaning it contains a binomial tree of rank 1 (2 elements) and one of rank 3 (8 elements), holding 10 elements in total.
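To make this correspondence concrete, here is a small illustrative helper (ranks is our own name, not part of the heap implementation) that computes the ranks of the trees in a heap of n elements from the 1-bits of n:

```ocaml
(* Ranks of the binomial trees in a heap of n elements: the positions of
   the 1-bits in the binary representation of n (least significant first). *)
let ranks (n: int): int list =
  let rec go n k acc =
    if n = 0 then List.rev acc
    else go (n / 2) (k + 1) (if n mod 2 = 1 then k :: acc else acc)
  in
  go n 0 []

let () =
  assert (ranks 10 = [1; 3]);  (* 1010: trees of 2 and 8 elements *)
  assert (ranks 7 = [0; 1; 2]) (* 111: trees of 1, 2 and 4 elements *)
```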

Inserting an element into the heap is analogous to incrementing a binary number by 1. We first create a binomial tree with rank 0 containing that element, and try to insert it into the beginning of the digits list. If the position we want to insert is occupied, we need to merge the trees, to generate a tree with a higher rank, which is analogous to the carry over operation when adding two binary numbers.
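The carry analogy can be sketched on plain bits, independently of the heap code (the bit type and increment below are illustrative, not part of the implementation; O and I stand for the digits 0 and 1):

```ocaml
(* Incrementing a binary number stored as a little-endian list of bits.
   An I (one) in the current position forces a carry, mirroring the link
   operation performed on trees in the heap's insertion. *)
type bit = O | I

let rec increment (bits: bit list): bit list =
  match bits with
  | [] -> [I]
  | O :: rest -> I :: rest
  | I :: rest -> O :: increment rest  (* carry over, like linking trees *)

let () =
  (* 1011 + 1 = 1100 (lists are least-significant-bit first) *)
  assert (increment [I; I; O; I] = [O; O; I; I])
```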

The schedule is a list of the (partially unevaluated) digit streams generated as operations are performed.

The structure for the heap is then the following:

(* Binomial tree corresponding to a digit in the heap. *)
type tree = Node of Element.t * tree list;;
type tv = Element.t;;
(* A heap with n elements can be associated to the binary representation of
   n. The 0's correspond to no tree, while the 1s in position i correspond
   to trees with size 2^i *)
type digit = Zero | One of tree;;
type schedule = digit stream list;;
type heap = {
  digits: digit stream;
  schedule: schedule;
};;

As we discussed above, the insertion is analogous to incrementing the binary number. But because the digits are a stream, we need to deal with lazy evaluation:

(* Links two trees of same rank r. The resulting tree has rank r + 1 *)
let link (tree1: tree) (tree2: tree) = match (tree1, tree2) with
  (Node (node1, children1), Node (node2, children2)) ->
    if (Element.compare node1 node2) <= 0
    then Node (node1, tree2 :: children1)
    else Node (node2, tree1 :: children2)

(* Insert a tree into a stream of digits. This assumes that the tree being
   inserted has the rank corresponding to the position of the current
   digits.
   This is analogous to the carrying over operation of adding 1 to a binary
   number. For example, if we are to add 1 to 1011, then we'll have
     101(11) -> match One, link -> 10(11)0
     10(11)0 -> match One, link -> 1(1)00
     1(1)00  -> match Zero      -> 1100 *)
let rec insertTree (tree: tree) (digits: digit stream): digit stream =
  match digits with
    | lazy Nil -> Stream.empty |> Stream.insert (One tree)
    | lazy (StreamCell (firstDigit, rest)) -> match firstDigit with
      | Zero -> Stream.insert (One tree) rest
      | One firstTree ->
        lazy (StreamCell (Zero, (insertTree (link tree firstTree) rest)))

Executing the schedule consists of evaluating one digit from the first number in the schedule. The key is to avoid evaluating the part of the number that has already been evaluated. It’s possible to show that this part starts at the first One digit. The intuition is that the trailing Zeros of the current number were Ones before the increment, so there was a mutation (linking), while the remaining digits were not modified by that operation.

For example, if we have the number 101011, an increment causes it to become 101100. The two trailing zeros in 101100 correspond to linkings of binomial trees.

(* Execute (evaluate) the first item of the schedule. If the first digit of
   the item is Zero, we re-schedule the rest of the digits. If it's a One,
   it means that the remaining digits have been executed, so we just
   discard it. *)
let execSchedule
  (schedule: digit stream list)
  : digit stream list = match schedule with
    | [] -> []
    | firstItem :: rest -> match firstItem with
      | lazy (StreamCell (Zero, job)) -> job :: rest
      | _ -> rest

Finally, inserting a new element consists of creating a binomial tree of rank 0, inserting it, and then executing the schedule.

(* Inserts an element in the heap. *)
let insert (elem: tv) (heap: heap): heap =
  match heap with {digits; schedule} ->
    let newDigits = insertTree (Node (elem, [])) digits in
    let newSchedule = execSchedule (newDigits :: schedule) in
    {digits = newDigits; schedule = newSchedule}

The full code is available on github.

Lessons learned

Because we now have several different implementations of queues, I wanted to implement tests that were applicable to any queue implementing an “interface”.

One way to do that is to put the interface in a separate module, IQueue.ml, and have each implementation declare it as its module type:

open IQueue;;
module Queue_stream: IQueue =


To make the test work with any module implementing the IQueue interface, we can use a functor (a module transformation):

(* TestBase.ml *)
module MakeTest(Queue: IQueue) = struct
  let testSingleInsertion text_ctx =
    let result = Queue.push 10 Queue.newEmpty in
    let resultAsList = Queue.toList result in
    assert_equal
      ~msg:"Should insert one element properly"
      [10]
      resultAsList

  (* ... remaining tests and suite definition ... *)

  let run = run_test_tt_main suite
end

(* RealTimeQueueTest.ml *)
open RealTimeQueue;;
open TestBase;;
module RealTimeQueueTest = TestBase.MakeTest(RealTimeQueue);;
let () = RealTimeQueueTest.run

Another thing I learned was matching lazy patterns. Matching a lazily evaluated variable with the keyword lazy forces the evaluation, allowing a cleaner syntax. For example:

(* Before *)
| value ->
  let forcedValue = Lazy.force value in match forcedValue with

(* After *)
| lazy forcedValue -> match forcedValue with
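For instance, here is a self-contained sketch (describe is a made-up function for illustration):

```ocaml
(* Matching with the lazy keyword forces the suspension before matching
   its contents. *)
let describe (xs: int list Lazy.t): string =
  match xs with
  | lazy [] -> "empty"
  | lazy (first :: _) -> "starts with " ^ string_of_int first

let () =
  assert (describe (lazy []) = "empty");
  assert (describe (lazy [1; 2]) = "starts with 1")
```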

Another syntactic construct that helps with legibility is the record. The examples in Okasaki’s book use tuples for most composed structures, but I prefer the more verbose and explicit records.

(* Tuple *)
type 'a realTimeQueue = 'a stream * 'a list * 'a stream

(* Record *)
type 'a realTimeQueue = {
  front: 'a stream;
  rear: 'a list;
  schedule: 'a stream
}

Finally, another lesson learned is that adding an insertion operation to the Stream API is likely wrong. The idea of the stream is that it avoids evaluation of its tail, so the whole block has to be lazily evaluated.

For example, in the queue implementation, in the first block, we would evaluate the entire rotation and only then make the result lazy, which is not what we want.

(* Incorrect Stream.insert() version *)
let newFront = rotate {
  front = rest_front;
  rear = rest_rear;
  schedule = newSchedule
} in
Stream.insert first_elem newFront (* ==> lazy (StreamCell (first_elem, newFront)) *)

(* Correct version *)
lazy (
  let newFront = rotate {
    front = rest_front;
    rear = rest_rear;
    schedule = newSchedule
  } in
  StreamCell (first_elem, newFront)
)


Eliminating amortization is a very interesting technique, but it has limited application in practice. It complicates the implementation and, as I learned, it can be tricky to spot bugs (for example, the one in which we eagerly evaluate the rotate() function) that would only show up if we profiled the data structure.

Largest sets of subsequences in OCaml

I’ve noticed that there is a set of words in English that look very similar: tough, though, through, thought, thorough and trough. Except for thought, they have one property in common: they’re all subsequences of thorough. It made me wonder if there are other interesting sets of words that are subsequences of other words.

Word cloud made with Jason Davies’ tool (https://www.jasondavies.com/wordcloud/)

This post is an attempt to answer a more general question: given a list of words, what is the largest set of these words such that they’re subsequences of a given word?

A word A is a subsequence of a word B if A can be obtained by removing zero or more characters from B. For example, “ac” is a subsequence of “abc”, so is “bc” and even “abc”, but not “ba” nor “aa”.

A simple algorithm to determine if a word A is a subsequence of a word B is to start with two pointers at the beginning of each word, pA and pB. We move pB forward until pA and pB point to the same character; in that case we move pA forward too. A is a subsequence of B if and only if we reach the end of A before the end of B. We could then iterate over each word W and find all the words that are subsequences of W. If the size of the dictionary is n and the size of the largest word is w, this algorithm runs in O(n^2 w).
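The two-pointer check can be sketched as follows (isSubsequence is our own illustrative name, not from the post’s code):

```ocaml
(* Returns true iff a is a subsequence of b: advance pb until it matches
   the character at pa, then advance pa; a must be exhausted before b. *)
let isSubsequence (a: string) (b: string): bool =
  let rec go pa pb =
    if pa >= String.length a then true
    else if pb >= String.length b then false
    else if a.[pa] = b.[pb] then go (pa + 1) (pb + 1)
    else go pa (pb + 1)
  in
  go 0 0

let () =
  assert (isSubsequence "ac" "abc");
  assert (isSubsequence "abc" "abc");
  assert (not (isSubsequence "ba" "abc"));
  assert (isSubsequence "through" "thorough")
```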

For English words, we can use entries from /usr/share/dict/words. In this case, n is around 235k (wc -l /usr/share/dict/words), so a quadratic algorithm will take a while to run (around 5e10 operations).

Another approach is to generate all the subsequences of a given word W and search the dictionary for each generated subsequence. There are O(2^w) subsequences of a word of length w. If we use a hash table, we can then do it in O(n w 2^w). In /usr/share/dict/words, the length of the largest word, w, is 24.
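The enumeration of the 2^w subsequences can be sketched like this (subsequences is an illustrative helper; words with repeated letters produce duplicates, which is why results would have to be deduplicated):

```ocaml
(* Each character is either kept or skipped, yielding 2^w subsequences
   (including the empty one, with possible duplicates for repeated
   characters). *)
let rec subsequences (s: char list): char list list =
  match s with
  | [] -> [[]]
  | c :: rest ->
    let sub = subsequences rest in
    List.map (fun w -> c :: w) sub @ sub

let () =
  assert (List.length (subsequences ['a'; 'b'; 'c']) = 8)
```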

Running a calculation with the numbers (R script), the number of high-level operations is 4e10, about the same order of magnitude as the quadratic algorithm.

Distribution using ggplot2

A third approach is to use a trie. This data structure allows us to store the words from the dictionary in a space-efficient way and we can search for all subsequences using this structure. The trie will have at most 2e6 characters (sum of characters of all words), less because of shared prefixes. Since any valid subsequence has to be a node in the trie, the cost of search for a given word cannot be more than the size of the trie t, so the complexity per word is O(\min(2^w, t)). A back of envelope calculation gives us 2e9. But we’re hoping that the size of the trie will be much less than 2e6.

Before implementing the algorithm, let’s define our trie data structure.

The Trie data structure

A trie is a tree where each node has up to |E| children, where |E| is the size of the alphabet in consideration. For this problem we’ll use lowercase ASCII only, so |E| = 26. Each node also has a flag telling whether there’s a word ending at that node.

module ChildList = Map.Make(Char);;

(* 1. Map of chars to child nodes
   2. Whether this node represents a word *)
type trie = Node of trie ChildList.t * bool;;

Notice that in this implementation of the trie, the character is on the edge, not in the node. The Map structure from the stdlib uses a tree underneath, so get and set operations are O(log |E|).

The insertion is the core method of the structure. At a given node we have a string we want to insert. We look at the first character of the word. If a corresponding edge exists, we keep following down that path. If not, we first create a new node.

let rec insert (s: char list) (trie: trie): trie =
  let Node(children, hasEntry) = trie in match s with
    | [] -> Node(children, true)
    | first_char :: rest ->
      let currentChild = if ChildList.mem first_char children
        then (ChildList.find first_char children)
        else empty
      in
      let newChild = insert rest currentChild in
      let newChildren = ChildList.add first_char newChild children in
      Node(newChildren, hasEntry)

To decide whether a trie has a given string, we just traverse the trie until we either can’t find an edge to follow, or we reach the end of the string at a node that doesn’t have the hasEntry flag set to true:

let rec has (s: char list) (trie: trie): bool =
  let Node(children, hasEntry) = trie in match s with
    | [] -> hasEntry
    | first_char :: rest ->
      if ChildList.mem first_char children then
        has rest (ChildList.find first_char children)
      else false

This and other trie methods are available on github.

The search algorithm

Given a word W, we can search for all its subsequences in a trie with the following recursive algorithm. Given a trie node and a string, we perform two searches: 1) for all the subsequences that contain the first character of the current string, in which case we “consume” the first character and follow the corresponding node; and 2) for all the subsequences that do not contain the first character, in which case we “consume” the character but stay at the current node. In pseudo-code:

Search(t: TrieNode, w: string):
    Let c be the first character of w.
    Let wrest be w with the first character removed

    If t contains a word, it's a subsequence of the
    original word. Save it.

    // Pick character c
    Search(t->child[c], wrest)

    // Do not pick character c
    Search(t, wrest)

The implementation in OCaml is given below:

(* Search for all subsequences of s in a trie *)
let rec searchSubsequenceImpl
  (s: char list)
  (trie: trie)
  : char list list = match s with
  | [] -> []
  | first_char :: rest_s ->
    let Node(children, _) = trie in
    (* Branch 1: Pick character *)
    let withChar = if Trie.ChildList.mem first_char children
      then begin
        let nextNode = Trie.ChildList.find first_char children in
        let matches = searchSubsequenceImpl rest_s nextNode in
        let Node(_, next_is_word) = nextNode in
        let fullMatches = if next_is_word then ([] :: matches) else matches in
        (* Add the current matching character to all matches *)
        List.map (fun word -> first_char :: word) fullMatches
      end
      else []
    in
    (* Branch 2: Do not pick character *)
    let withoutChar = searchSubsequenceImpl rest_s trie in
    (* Merge results *)
    withChar @ withoutChar

let searchSubsequence (s: string) (trie: trie): string list =
  let chars = BatString.to_list s in
  let results = searchSubsequenceImpl chars trie in
  List.map BatString.of_list results |> removeDuplicates


Our experiment consists of loading the words from /usr/share/dict/words into a trie and then, for each word in the dictionary, looking for its subsequences. The full code is available on github.

The code takes 90 seconds to run on my laptop. Not too bad, but I’m still exploring ways to improve the performance. One optimization I tried was, instead of returning an explicit list of strings as in the search implementation, returning them encoded in a trie, since we can save some operations due to shared prefixes. I have that version on github, but unfortunately it takes 240 seconds to run and requires more memory.

Another way is to parallelize the code. The search for subsequences is independent for each word, so it’s an embarrassingly parallel case. I haven’t tried this path yet.

The constructed trie has 8e5 nodes, or ~40% of the total number of characters.

Subsequences of “thorough”

The question that inspired this analysis was finding all the subsequences of thorough. It turns out it has 44 subsequences, but most of them are boring, that is, single letters or small words that look completely unrelated to the original word. The most interesting ones are those that start with t and have at least three letters. I selected some of them here:

  • tho
  • thoo
  • thoro
  • thorough
  • thou
  • though
  • thro
  • throu
  • through
  • thug
  • tog
  • tou
  • toug
  • tough
  • trough
  • trug
  • tug

The word with the most subsequences is pseudolamellibranchiate, with 1088! The word cloud at the beginning of the post contains the 100 words with the largest number of subsequences. I tried to find interesting words among these, but they’re basically the longest words: long words have exponentially more combinations of subsequences, and hence the chance of those existing in the dictionary is greater. I tried to come up with penalizations for the score:

1) Divide the number of subsequences by the word’s length. This is not enough; the largest words still show up on top.
2) Apply log2 to the number of subsequences and divide by the word’s length. In theory this should account for the exponential number of subsequences of a word. It turns out to be too much of a penalization, and the smallest words fare too well in this scenario.

I plotted the distribution of number of subsequences by word lengths. We can see a polynomial curve but with increased variance:

Generated with this ggplot2

In the chart above, we’d see all points with the same x-value in a single vertical line. One neat visualization trick is to add noise (jitter) so we also get a sense of density.

If we use a box plot instead, we can see a quadratic pattern more clearly by looking at the median for each length.

Generated with this ggplot2

Given this result, I tried a final scoring penalization, dividing the number of subsequences by the square of the length of the word, but it’s still not enough to surface many interesting words. Among the top 25, streamlined is the most common word, with 208 subsequences.

One interesting fact is that the lowest-scoring words are those with repeated patterns, for example: kivikivi, Mississippi, kinnikinnick, curucucu and deedeed. This is basically because we only count unique subsequences.


This was a fun problem to think about and even though it didn’t have very interesting findings, I learned more about OCaml and R. After having to deal with bugs, compilation and execution errors, I like OCaml more than before and I like R less than before.

R has too many ways of doing the same thing and the API is too lenient. That works well for the 80% of the cases which it supports, but finding what went wrong in the other 20% is a pain. OCaml on the other hand is very strict. It doesn’t even let you add an int and a float without an explicit conversion.

I learned an interesting syntax that allows re-using the qualifier/namespace between several operations when chaining them, for example:

Trie.(empty |> insertString "abc" |> insertString "abd")


I also used the library Batteries for the first time. It has a nice extension for the rather sparse String module. It allows us to simply do open Batteries but that overrides a lot of the standard modules and that can be very confusing. I was scratching my head for a long time to figure out why the compiler couldn’t find the union() function in the Map module, even though I seemed to have the right version, until I realized it was being overridden by Batteries. From now on, I’ll only use the specific modules, such as BatString, so it’s easy to tell which method is coming from which module.




Amortization and Persistence via Lazy Evaluation

In this chapter Okasaki works around the problem of doing amortized analysis with persistent data structures: amortized analysis assumes in-place modification, while persistent data structures make (partial) copies. The intuition is that lazy evaluation, which comes with memoization and avoids recomputation, solves this problem.

He adapts the Banker’s and Physicist’s methods to work with lazily evaluated operations and applies them to a few structures including Binomial Heaps, Queues and Lazy Pairing Heaps. In this post we’ll only cover the examples of the Queues using both methods.

We’ll first introduce some concepts and terminology, then present a queue implementation using lazy evaluation that allows us to analyze it under persistence. Following that, we’ll explain the Banker’s and Physicist’s methods and prove that the implementation of push/pop has an efficient amortized cost.

Persistence as a DAG

An execution trace is a DAG where nodes represent the operations (e.g. updates to a data structure) and an edge from nodes v to v' indicates that the operation corresponding to v' uses the output of the one corresponding to v.

For example, if we have this set of operations:

let a = push 0 newEmpty
let b = push 1 a
let c = pop b
let d = push 2 b
let e = append c d
let f = pop c
let g = push 3 d 

The corresponding execution graph is:

Execution Graph

The logical history of an operation v is the set of all operations it depends on (directly or indirectly, and including itself). Equivalently, in terms of the DAG, it’s the set of nodes that have a directed path to v.

A logical future of an operation v is any directed path from v to a terminal node.

A framework for analyzing lazy evaluated data structures

We need to introduce a few more concepts and terminology. Note: we’ll use suspension and lazy operation interchangeably, which can be evaluated or forced.

The unshared cost of an operation is the time it would take to execute it if it had already been performed and memoized before, so if the operation involves any expression that is lazy, that expression would be O(1).

The shared cost of an operation is the time it would take to execute (force) all the suspensions created (but not evaluated) by the operation.

The complete cost is the sum of the shared and unshared costs. Alternatively, the complete cost of an operation is the time it would take to execute the operation if lazy evaluation were replaced by strict evaluation. To see why, first note that the unshared costs have to be paid regardless of laziness. Since we’re assuming no laziness, the operation has to pay the cost associated with the suspensions it creates, which corresponds to the shared costs. Note that under this assumption we don’t need to account for the cost of forcing suspensions created by previous operations, because in theory they have already been evaluated.

When talking about a sequence of operations, we can break down the shared costs into two types: realized and unrealized costs. The realized costs are the shared costs of suspensions that were actually forced by some operation in the sequence. Example: say that operations A and B are in the sequence, A creates a suspension, and then B forces it. The cost for B to force it is included in the realized cost. The unrealized costs are the shared costs of suspensions that were created but never evaluated within the sequence. The total actual cost of a sequence of operations is the sum of the realized costs and the unshared costs.

Throughout a set of operations, we keep track of the accumulated debt, which starts at 0 at the beginning of the sequence. Whenever an operation is performed, we add its shared cost to it. For each operation, we can decide how much of this debt we want to pay. When the debt of a suspension is paid off, we can force it. The amortized cost of an operation is its unshared cost plus the amount of debt it paid (note that it does not include the realized cost). Note that as long as we always pay the cost of a suspension before it’s forced, the amortized cost will be an upper bound on the actual cost.

This framework simplifies the analysis for the case when a suspension is used more than once by assuming that its debt was paid off within the logical history of when it was forced, so we can always analyze a sequence of operations and not worry about branching. This might cause the debt to be paid multiple times, but it simplifies the analysis.

The author uses the term discharge debits as a synonym for pay off debt. I find the latter term easier to grasp, so I’ll stick with it throughout this post.

Let’s introduce an example first and then proceed with the explanation of the Physicist’s method and the corresponding analysis of the example.

The Stream Queue

To allow efficient operations on a queue in the presence of persistence, we can make some of the operations lazy. Recall that in a previous post we defined a queue using two lists. To avoid immediate computation, a natural replacement for lists is their lazy version, the stream data structure, which we also talked about in a previous post.

For the list-based queue, the invariant was that if the front list is empty, then the rear list must be empty as well. For the stream queue, we have a tighter constraint: the front stream must always be at least as long as the rear stream. This constraint is necessary for the analysis.

The definition of the stream queue is the following:

(* - size of front stream
   - stream representing the front of the queue
   - size of rear stream
   - stream representing the (reversed) rear of the queue *)
type 'a queueStream = int * 'a stream * int * 'a stream;;

We store the lengths of the streams explicitly for efficiency.

We’ll be using the Stream developed in the previous chapter, so we’ll refer to the module Stream2 to avoid ambiguity with the standard Stream module.

Inserting an element at the end of the queue is straightforward, since the rear stream represents the end of the queue and is reversed:

let push (elem: 'a) (queue: 'a queueStream): ('a queueStream) = match queue with
  (frontSize, front, rearSize, rear) ->
    check (frontSize, front, rearSize + 1, Stream2.insert elem rear)

The problem is that inserting at the rear can cause the invariant of the queue to be violated. check() changes the structure so that it conforms to the invariant, by potentially reversing rear and concatenating it with front:

let check (queue: 'a queueStream): ('a queueStream) = match queue with
  (leftSize, left, rightSize, right) ->
    if rightSize <= leftSize then queue
    else (leftSize + rightSize, Stream2.concat left (Stream2.reverse right), 0, Stream2.empty)

Removing an element from the queue requires us to evaluate the first element of the front stream. Again, the invariant can be violated in this case so we need to invoke check() again:

let pop (queue: 'a queueStream): ('a queueStream) = match queue with
  (leftSize, left, rightSize, right) ->
    let forcedLeft = Lazy.force left in
    match forcedLeft with
      | Nil -> raise Empty_queue
      | StreamCell (_, rest) -> check (leftSize - 1, rest, rightSize, right)

The complete code for the stream queue is on Github.

Analysis using the Banker’s Method

The idea of the Banker’s Method is basically to define an invariant for the accumulated debt and a strategy for paying it off (that is, to decide how much debt each operation pays off). Then we show that whenever we need to force a suspension, the invariant guarantees that its accumulated debt has been paid off. One property of the Banker’s method is that it allows associating the debt with specific locations of the data structure. This is particularly interesting for streams, because a stream contains multiple (nested) suspensions, so we might force parts of the structure before we have paid the debt associated with the entire structure.

By inspection, we can see that the unshared cost of both push and pop is O(1). It’s obvious in the case of push; in the case of pop, check could in theory take O(m), where m is the size of the queue, but since Stream2.concat() and Stream2.reverse() are both lazy (and hence memoized), they are not included in the unshared costs.

To show that the amortized cost of both operations is O(1), we can show that paying off O(1) debt at each operation is enough to pay for the suspension before it is forced. For the queue, we also need to associate the debt with parts of the data structure, so that we could force the suspension of only some parts of it (for example, on the stream we can evaluate only the head, not necessarily the entire structure).

We now define an invariant that should be respected by all operations. Let d(i) be the debt at the i-th node on the front stream, and D(i) = \sum_{j=0}^{i} d(j) the accumulated debt up to node i. The invariant is:

D(i) \le \min(2i, |f| - |r|)

This constraint allows us evaluating the head at any time, because D(0) = 0, which means its debt has been paid off. The second term in min(), guarantees that if |f| = |r| the entire stream can be evaluated, because D(i) = 0 for all i.

The author then proves that paying off one debt unit in push() and two debt units in pop() is enough to keep the debt within the constraint.

Queue with suspended lists

Because the Physicist’s method cannot assign costs to specific parts of the data structure, it doesn’t matter if the structure can be partially forced (like streams) or if it’s monolithic. With that in mind, we can come up with a simpler implementation of the queue by working with suspended lists instead of streams. Only the front list has to be suspended because the cost we want to avoid, the reversal of the back list and concatenation to the front list, happens on the front list.

On the other hand, we don’t want to evaluate the front list when we perform a peek or pop, so we keep an evaluated version of the front list too.
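Since this construction hinges on OCaml’s lazy values being memoized, here is a minimal demonstration (my own, not from the post) of that behavior: the body of a suspension runs only on the first force.

```ocaml
(* Memoization of Lazy.t: the suspended body executes exactly once,
   no matter how many times the suspension is forced. *)
let count = ref 0
let susp = lazy (incr count; List.rev [1; 2; 3])

let a = Lazy.force susp   (* evaluates the body *)
let b = Lazy.force susp   (* memoized: body not re-run *)
let () = assert (a = [3; 2; 1] && b = a && !count = 1)
```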

The signature of the structure is as follows:

(* Implementation of a queue using lazy lists. As opposed to the
   stream-based queue, this structure allows for a monolithic amortization
   analysis, for example using the Physicist's method as described in
   Okasaki's Purely Functional Data Structures, Chapter 6.

   Only the front list needs to be lazy. The rear of the queue is a regular
   list. We store a copy of the suspended list to be able to access the head
   of the list without having to evaluate the entire list.

   The first element in the structure is the evaluated version of the front
   (forcedFront). The second represents the size of the front list
   (frontSize). The third element is the suspended version of the front
   (lazyFront). The fourth element is the size of the rear list (rearSize)
   and finally the fifth element is the rear list (rear).

   The invariants that must be respected by all operations are:
   1) The front list must never be smaller than the rear list
   2) Whenever lazyFront is non-empty, forcedFront is non-empty *)
type 'a queueSuspended = 'a list * int * ('a list) Lazy.t * int * 'a list;;

As mentioned in the code above, the first invariant we want to enforce is that the front list is never smaller than the rear list:

let conformToFrontNotSmallerThanRear (
  queue: 'a queueSuspended
): ('a queueSuspended) = match queue with
  (forcedFront, frontSize, lazyFront, rearSize, rear) ->
    if rearSize <= frontSize then queue
    else
      (* Rotation: lazily append the reversed rear to the front. *)
      let front = Lazy.force lazyFront
      in (
        front,
        frontSize + rearSize,
        lazy (front @ (List.rev rear)),
        0,
        []
      )

and that the evaluated version of the front list is never empty if the lazy version still has some elements.

let conformToForcedFrontInvariant (
  queue: 'a queueSuspended
): ('a queueSuspended) = match queue with
  | ([], frontSize, lazyFront, rearSize, rear) ->
    (Lazy.force lazyFront, frontSize, lazyFront, rearSize, rear)
  | queue -> queue

The push and pop operations are similar to the other versions of the queue, but since they modify the structure, we may need to adjust the result to conform to the invariants:

let conformToInvariants (queue: 'a queueSuspended): ('a queueSuspended) =
  let queue = conformToFrontNotSmallerThanRear queue
  in conformToForcedFrontInvariant queue

let push (queue: 'a queueSuspended) (elem: 'a): ('a queueSuspended) =
  match queue with (forcedFront, frontSize, lazyFront, rearSize, rear) ->
    conformToInvariants (
      forcedFront,
      frontSize,
      lazyFront,
      rearSize + 1,
      elem :: rear
    )

let pop (queue: 'a queueSuspended): ('a queueSuspended) = match queue with
  | ([], _, _, _, _) -> raise Empty_queue
  | (head :: forcedFront, frontSize, lazyFront, rearSize, rear) ->
    conformToInvariants (
      forcedFront,
      frontSize - 1,
      lazy (List.tl (Lazy.force lazyFront)),
      rearSize,
      rear
    )

Finally, because of our second invariant, peeking at the queue is straightforward:

let peek (queue: 'a queueSuspended): 'a = match queue with
| ([], _, _, _, _) -> raise Empty_queue
| (head :: forcedFront, _, _, _, _) -> head

The complete code for the suspended queue is on Github.

Analysis using the Physicist’s Method

We’ve seen the Physicist’s Method in a previous post, where we ignored the persistence of the data structures. Here we adapt the method to work with debits instead of credits. To avoid confusion, we’ll use \Psi to represent the potential function: \Psi(i) represents the accumulated debt of the structure at step i. At each operation we may decide to pay off some debt, which will then be included in the amortized cost. \Psi(i) - \Psi(i - 1) is the increase in debt after operation i. Remember that the shared cost of an operation corresponds to the increase in debt if we don’t pay off any debt. Thus, the amount of debt paid off at step i is s(i) - (\Psi(i) - \Psi(i - 1)), where s(i) is the shared cost of operation i. Let u(i) and c(i) be the unshared and complete costs of operation i. Given that, by definition, c(i) = u(i) + s(i), we can express the amortized cost (the unshared cost plus the debt paid off) as:

a(i) = c(i) - (\Psi(i) - \Psi(i - 1))

To analyze the suspended queue we need to assign values to the potentials such that by the time we need to evaluate a suspension the potential on the structure is 0 (that is, the debt has been paid off). For the suspended queues we’ll use the following potential function:

\Psi(q) = \min(2|w|, |f| - |r|)

Where w is the forcedFront, f is lazyFront and r is rear.

We now claim that the amortized cost of push is at most 2. If we push an element and that doesn’t cause a rotation (i.e. doesn’t violate |f| \ge |r|), then |r| increases by 1 and the potential decreases by 1. No shared cost is incurred, and the unshared cost, inserting an element at the beginning of rear, is 1, hence the amortized cost for this case is 1 - (-1) = 2. If it does cause a rotation, then it must be that after the insertion |f| = m and |r| = m + 1. After the rotation we have |f| = 2m + 1 and |r| = 0, but w hasn’t changed and cannot be larger than the original f, so the potential function is at most 2m. The reversal of r costs m + 1 and concatenating to a list of size x costs x (discussed previously), plus the cost of initially appending the element to rear, so the unshared cost is 2m + 2. No suspensions were created, so the amortized cost is given by 2m + 2 - (2m) = 2.

Our next claim is that the amortized cost of pop is at most 4. Again, if the pop doesn’t cause a rotation, |w| decreases by 1, so the potential is reduced by 2. The unshared cost is 1, removing an element from w, and the shared cost, 1, comes from the suspension that lazily removes the head of lazyFront. The amortized cost is 2 - (-2) = 4. Note that if |w| = 0, we’ll evaluate f, but the idea is that its debt has been paid off already. If the pop operation causes a rotation, then the analysis is similar to the push case, except that the complete cost must also account for the shared cost of lazily removing the head of lazyFront, so it’s 2m + 3, for an amortized cost of 3.

Note that when the suspensions are evaluated, the potential is 0, either when |f| = |r| or |w| = 0.


In this post we covered a simple data structure, the queue, modified it to be lazily evaluated, and showed, with theory, that it allows for efficient amortized costs. The math for proving the costs is not complicated. The hardest part for me was getting the intuition for how the analysis works. The analogy with debt is very useful.

Meta: since the WordPress code plugin doesn’t support syntax highlighting for OCaml, I’m experimenting with the Gist plugin. Other advantages are that Gists allow comments and have version control!

Amortization Analysis

In Chapter 5 of Purely Functional Data Structures, Okasaki introduces the amortization analysis as preparation for the complexity analysis of functional data structures.

The idea of amortized analysis is to prove bounds on the cost of a sequence of operations, where the average cost per operation is less than the worst case cost of an individual operation.

In this post we’ll perform the analysis of 2 data structures and show that their amortized running times are much better than their worst case. There are 2 methods presented in the book, the Banker’s Method and the Physicist’s Method. I found the Physicist’s Method easier to work with, so I’ll only use that. Also, I find it easier to understand concepts through examples, so we’ll first introduce each data structure and then present the analysis.

Queue with lists

An easy way to implement the queue data structure is by using two lists, one representing the beginning (left) and the other the end (right). Inserting/removing an element at the beginning of a list in OCaml is O(1), while doing it at the end is O(n). For this reason, we store the end of the queue reversed in the right list. (In imperative languages using arrays the opposite is true: insertions/removals at the end are O(1) and at the beginning O(n), so it would be the left part that is stored reversed.)


Queue using lists representing [1, 2, 3, 4, 5, 6]

Insertions are performed by appending an element to the beginning of the right array. Removing from the front of the queue requires removing the first element of the left array. When the left array is empty, we need to reverse the contents of the right array and move it to the left.

We work with an invariant, in which the left array is only empty if the right array is empty as well. That requires a slight change to the removal, where we perform the reversal and move of the right array when the left array has only one element and is about to become empty.
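A condensed sketch of this idea (my own simplified version, which reverses only when the left list is already empty rather than maintaining the invariant above; the post’s full implementation is linked below):

```ocaml
(* Two-list queue: left holds the front, right holds the back reversed. *)
type 'a queue = 'a list * 'a list

let empty: 'a queue = ([], [])

(* O(1): cons onto the reversed right list. *)
let push ((left, right): 'a queue) (x: 'a): 'a queue = (left, x :: right)

(* O(1) amortized: reverse the right list into the left only when needed. *)
let rec pop ((left, right): 'a queue): ('a * 'a queue) =
  match left, right with
  | [], [] -> failwith "empty queue"
  | [], _ -> pop (List.rev right, [])
  | x :: rest, _ -> (x, (rest, right))

let () =
  let q = List.fold_left push empty [1; 2; 3] in
  let (x, q) = pop q in
  let (y, _) = pop q in
  assert (x = 1 && y = 2)
```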

I’ve implemented this idea in OCaml.

Analysis – The Physicist Method

If we only look at the worst case, we will get that removing from a queue is an O(n) operation. We can improve this by looking at the amortized cost. One way to perform this analysis is via the physicist method, in which we define a potential (or cost) function \Phi(d) for the data structure d. In the queue case, we can define it to be the size of the right list.

The amortized cost of step i, a_i is defined by

a_i = t_i + \Phi(d_i) - \Phi(d_{i-1})

Where t_i is the actual cost (in number of operations)

The intuition is that some costly operations, such as reversing the list, which takes O(n) operations, can only happen after O(n) insertions, so the average cost per operation is O(1). Translating it to the potential analogy, each insertion operation increases the potential of the data structure by 1, so by the time a reversal happens, the right list is emptied and that causes a loss of potential of n, which had been accumulated and offsets the cost t_i of the operation.
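As a concrete check (my own numbers, not from the book): start from an empty queue, perform three insertions, then one removal that triggers a reversal. With \Phi = size of the right list:

```latex
% Each insertion: t_i = 1 and \Phi grows by 1, so
a_i = t_i + \Phi(d_i) - \Phi(d_{i-1}) = 1 + 1 = 2
% The removal reverses 3 elements and pops one (t_4 = 4),
% emptying the right list (\Phi drops from 3 to 0):
a_4 = t_4 + \Phi(d_4) - \Phi(d_3) = 4 + (0 - 3) = 1
```

The expensive operation’s amortized cost is small precisely because the potential accumulated by the insertions is released.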

We can get the accumulated amortized cost of the operations up to a specific step by summing them, and we’ll get

A_i = T_i + \Phi(d_i) - \Phi(d_0)

If we choose \Phi(d) such that \Phi(d_i) \ge \Phi(d_0) for all i, then A_i is an upper bound of T_i. Since we assume d_0 is the empty queue, \Phi(d_0) = 0, and since \Phi(d_i) \ge 0, this property is satisfied.

One important remark is that this analysis is only applicable to non-persistent uses of a data structure, while in OCaml data structures are persistent by default. We present the OCaml code as an exercise, but the proper analysis will be done in later chapters.

Splay Trees

Splay Tree is a binary search tree. The insertion of an element e consists in splitting the current tree into 2 subtrees: one with all elements less than or equal to e, which we’ll call small, and another with all elements larger than e, which we’ll call big. We then make e the root of the tree, with small as the left subtree and big as the right subtree.



The partition operation is O(height of the tree). The intuition is: when looking for the small subtree, imagine we’re analyzing a node x. If x > e, then x and all elements in its right subtree are larger than e, so we can discard them and look only at the left subtree. The same idea applies to looking for the big subtree.

In an ideal scenario, the tree will be well balanced in such a way that the height of the tree is O(log n), and so is the insertion operation. But that’s not always the case, so we introduce a balancing strategy: whenever we follow 2 lefts in a row when searching for the big subtree, we perform a right rotation. Similarly, when following 2 rights in a row when searching for the small subtree, we perform a left rotation.

These rotations are enough to make partition, and hence insertion, run in O(log n) amortized time, as we shall see in the next section.
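A sketch of partition() with these rotations (my own reconstruction following Okasaki’s splay-heap partition; names are mine, and the post’s actual code is linked below):

```ocaml
type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

(* Splits t into (elements <= pivot, elements > pivot),
   rotating on two lefts / two rights in a row. *)
let rec partition pivot t = match t with
  | Leaf -> (Leaf, Leaf)
  | Node (l, x, r) ->
    if x <= pivot then begin
      match r with
      | Leaf -> (t, Leaf)
      | Node (rl, y, rr) ->
        if y <= pivot then
          (* two rights in a row: left rotation *)
          let (small, big) = partition pivot rr in
          (Node (Node (l, x, rl), y, small), big)
        else
          let (small, big) = partition pivot rl in
          (Node (l, x, small), Node (big, y, rr))
    end else begin
      match l with
      | Leaf -> (Leaf, t)
      | Node (ll, y, lr) ->
        if y > pivot then
          (* two lefts in a row: right rotation *)
          let (small, big) = partition pivot ll in
          (small, Node (big, y, Node (lr, x, r)))
        else
          let (small, big) = partition pivot lr in
          (Node (ll, y, small), Node (big, x, r))
    end

(* Insertion: partition around the new element and make it the root. *)
let insert x t =
  let (small, big) = partition x t in Node (small, x, big)

(* In-order traversal, for sanity-checking. *)
let rec to_list = function
  | Leaf -> []
  | Node (l, x, r) -> to_list l @ (x :: to_list r)

let () =
  let t = List.fold_left (fun acc x -> insert x acc) Leaf [5; 1; 9; 3; 7] in
  assert (to_list t = [1; 3; 5; 7; 9])
```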



The operation for finding the minimum element is trivial. To delete the minimum element, we also perform rotations. Since we only traverse the left paths, we only have to do a left rotation.

I’ve implemented these ideas in OCaml.

Amortized Analysis of Splay Trees

We first define a potential function. Given a tree t and a node x \in t, let t(x) be the subtree of t rooted in x. Let \phi(t(x)) = \log(|t(x)| + 1), where |t(x)| is the number of nodes of the subtree rooted in x, and let the potential of a tree t, \Phi(t), be the sum of the sub-potentials of all nodes of t, that is,

(1) \Phi(t) = \sum_{x \in t} \phi(t(x)).

The amortized cost of calling partition() is the actual cost of the partition, plus the potential of the resulting trees (small and big), minus the potential of the original tree. Let’s denote small(t) by s and big(t) by b.

(2) \mathcal{A}(t) = \mathcal{T}(t) + \Phi(s) + \Phi(b) - \Phi(t)

We can prove that \mathcal{A}(t) \le 1 + 2 \log(|t|) (see appendix), which means the amortized cost of the partition operation is O(\log n). It implies that insertion also has an amortized cost of O(\log n).

The same idea can be used to prove the amortized costs of the deletion of the minimum element. Okasaki claims that splay trees are one of the fastest implementations for heaps when persistence is not needed.


I found amortized analysis very hard to grasp. My feeling is that the analysis is too abstracted from the real problem, making it not very intuitive. It’s also not clear how one picks the potential function, and whether there’s a choice of potential that might result in a better bound than the “actual” amortized cost.


Appendix

As we can see in the code, there are four cases to consider: left-left, left-right, right-left and right-right. The left-* and right-* cases are symmetric, so let’s focus on the left-left and left-right cases. For the left-left case, suppose we perform the following partition:

Left-left case of partition()

We start with (2)

(3) \mathcal{A}(t) = \mathcal{T}(t) + \Phi(s) + \Phi(b) - \Phi(t)

Since \mathcal{T}(t) corresponds to the number of recursive calls to partition(), \mathcal{T}(t) = 1 + \mathcal{T}(u), so

(4) = 1 + \mathcal{T}(u) + \Phi(s) + \Phi(b) - \Phi(t)

The recursive call on u yields (s’, b’), so we have the equivalence \mathcal{A}(u) = \mathcal{T}(u) + \Phi(s') + \Phi(b') - \Phi(u), and substituting \mathcal{T}(u) gives

(5) = 1 + \mathcal{A}(u) - \Phi(s') - \Phi(b') + \Phi(u) + \Phi(s) + \Phi(b) - \Phi(t)

Since, s = s’,

(6) = 1 + \mathcal{A}(u) - \Phi(b') + \Phi(u) + \Phi(b) - \Phi(t)

and since \Phi(b) - \Phi(b') = \phi(b(x)) + \phi(b(y)) + \Phi(c) + \Phi(d) and \Phi(t) - \Phi(u) = \phi(t(x)) + \phi(t(y)) + \Phi(c) + \Phi(d), we can cancel out some terms and simplify the equation above to

(7) = 1 + \mathcal{A}(u) + \phi(b(x)) + \phi(b(y)) - \phi(t(x)) - \phi(t(y))

by induction, \mathcal{A}(u) \le 1 + 2 \phi(u),

(8) \le 2 + 2\phi(u) + \phi(b(x)) + \phi(b(y)) - \phi(t(x)) - \phi(t(y))

Since u is a subtree of t(y), \phi(u) < \phi(t(y)), and since the nodes in b(y) are a subset of those in t(x), \phi(b(y)) \le \phi(t(x)), so

(9) < 2 + \phi(u) + \phi(b(x))

It’s possible to show that, given x, y, z such that y + z \le x, then 1 + \log y + \log z < 2 \log x. We have that |t| = |u| + |c| + |d| + 2, while |b(x)| = |c| + |d| + 1, thus |u| + |b(x)| < |t|. By the relation above applied to the sizes shifted by 1 (that is, y = |u| + 1, z = |b(x)| + 1, x = |t| + 1), 1 + \log(|u| + 1) + \log(|b(x)| + 1) < 2 \log(|t| + 1), that is, 1 + \phi(u) + \phi(b(x)) < 2 \phi(t), which yields

(10) \mathcal{A}(t) < 1 + 2 \phi(t).
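For completeness, a quick justification of the log inequality used above (my own derivation, assuming base-2 logarithms and y + z \le x):

```latex
% By AM-GM: yz \le \left(\frac{y+z}{2}\right)^2 \le \frac{x^2}{4},
% hence 2yz \le \frac{x^2}{2} < x^2. Taking \log_2 of both sides:
1 + \log y + \log z = \log(2yz) < \log(x^2) = 2 \log x
```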

It’s very subtle, but in this last step we used the assumption of the rotation, in particular in |b(x)| = |c| + |d| + 1. If we hadn’t performed the rotation, we would have |b(x)| = |b'| + |c| + |d| + 2, and |u| + |b(x)| < |t| wouldn’t hold!

For the left-right case, we can do a similar analysis, except that now we’ll apply the recursion to c, and the resulting partition will look different, as depicted below:

Left-right case of partition()

In this case, (8) will become

(11) \le 2 + 2\phi(c) + \phi(b(x)) + \phi(b(y)) - \phi(t(x)) - \phi(t(y))

We then use that \phi(c) < \phi(t(y)) and \phi(t(y)) < \phi(t(x)), which allows us to cancel out some terms, leaving

(12) < 2 + \phi(b(x)) + \phi(b(y))

Similar to what we've done before, we can show that |b(x)| + |b(y)| = |s| + |b| = |t|, since the partitions of t are mutually exclusive. We can then reuse the result from the previous case and get (10).

