Mutually Recursive Modules in OCaml

In Chapter 10 of Purely Functional Data Structures, Okasaki describes a technique called data structure bootstrapping. It’s a way to reuse existing implementation of data structures to construct (bootstrap) new ones.

In one of the examples he creates a new heap implementation with an efficient merge operation using another heap as basis, but it turns out that to implement this, we need to rely on mutually recursive modules, that is, two modules A and B, where A depends on B, and B depends on A.

In this post we’ll study the bootstrapped heap and learn how to implement mutually recursive modules in OCaml.

Heap with efficient merging

Assume we have a heap implementation with O(1) insert, and O(log n) merge, findMin and deleteMin operations. We’ve seen such an implementation with Skewed Binomial Heaps

We’ll see how to construct a new heap implementation which will improve the merge complexity to O(1).

Let’s call the base heap PrimaryHeap and define our heap type as

this type can be either empty or a node with an element (root) and a primary heap whose element is the bootstrapped heap itself, that is, heap and PrimaryHeap.heap form a mutually recursive types. Note that the above is not a valid OCaml code. We’re using it to explain the theoretical concepts.

We can think of this as a k-ary tree where the element is the root and the children of that node are the subtrees, but these subtrees are stored in a heap instead of an array.

The root element at each node is the smallest among all of the subtrees. Hence, to obtain the minimum element for a heap, findMin, is trivial: we can simply return that element:

Merging two bootstrapped heaps is analogous to linking two trees. The only invariant we need to maintain is that the smallest root continues being the root.

Since the primary heap has O(1) insert, the bootstrapped heap has O(1) merge, which was our goal. Note that we can implement insert using merge by creating a singleton node and merging it with an existing heap.

We need to handle the deletion of the minimum element, which is the more involved operation. It consists in discarding the root of the present node, and finding a new root from the primary heap.

Since each element in the primary heap is a bootstrapped heap, we first obtain the bootstrapped heap containing the smallest element:

then we remove this node from the primaryHeap, and we merge the minPrimaryHeap back into primaryHeap.

finally we make newMinElem the new root element of our top level bootstrapped heap. The complete code is

The only missing part is defining the correct type of the bootstrapped heap.

Mutually Recursive Modules

escher

Drawing Hands by M. C. Escher

Okasaki mentions that recursive structures are not supported in Standard ML (at least at the time my copy of the book was printed), but they are supported in OCaml.

To make modules mutually depend on another, we need to mark it as recursive via the rec keyword, and declaring both modules at the same time by using the and connector. Let’s work with a toy example: two modules Even and Odd, where each depend on the other.

This will lead to a compilation error:

Error: Recursive modules require an explicit module type.

We need to write the signatures explicitly:

This blog post from Jane Street describes a way to define mutually recursive modules by only listing its signatures:

The OCaml compiler can infer the implementation part from the type definitions, but unfortunately this won’t work if the module has function definitions, which is the case for our heap implementation.

Things get more complicated in our case because the primary heap implementation uses a functor to generate a heap taking the element’s module as parameter. In this case the element’s module is our bootstrapped heap. A valid module definition is given below:

Let’s understand what is going on.

type t = Empty | Heap of Element.t * PrimaryHeap.heap

is the definition we presented above. We also implement the methods from the Set.OrderedType interface, namely compare, since this is the interface the heap maker expects. The comparison is based solely on the root element.

Then we declare the PrimaryHeap type at the same time, with type IHeapWithMerge, and because tv is unbound in that interface, we need to bind it to BootstrappedElement.t:

PrimaryHeap: IHeapWithMerge with type tv := BootstrappedElement.t

Finally we provide the implementation, using the result of the SkewBinomialHeap() functor having the BootstrappedElement module as element type:

PrimaryHeap (...) = SkewBinomialHeap(BootstrappedElement)

The syntax is pretty involved, but it accomplishes what we wanted. We can further refine this definition by adding

include Set.OrderedType with type t := t

to the BootstrappedElement signature. This includes all the interface of Set.OrderedType.

These newly defined modules are defined within a functor, the BootstrappedHeap, together with the methods we defined above. Like other heap generators, the functor takes a module representing the element type as parameter. In this case we can also allow the primary heap type to be passed as parameter so we don’t have to use SkewBinomialHeap as implementation. Any heap with merge will do.

The constructors define within BootstrappedElement are visible within BootstrappedHeap but they need qualification, such as BootstrappedElement.Heap. To avoid repeating this qualifier, we can use:

include BootstrappedElement

The complete implementation for BootstrappedHeap can be found on github.

Conclusion

The idea of using implementations of a given data structure to yield improve implementations is amazing! The mutual recursion nature of the bootstrap heap got me at first, but making analogies with a k-ary tree made it easier to understand.

I was struggling a lot to get the syntax right for the recursive modules required for this implementation until I stumbled upon this github repository, from which I learned many new things about OCaml.

References

[1] Purely Function Data Structures, Chapter 10 – Chris Okasaki
[2] Jane Street Tech Blog – A trick: recursive modules from recursive signatures
[3] Github yuga/readpfds: bootStrappedHeap.ml

Advertisements

Polymorphic Recursion in OCaml

In Chapter 10 of Purely Functional Data Structures, Okasaki describes recursive types that are non-uniform. In this post we’ll learn more about these types, how to implement them in OCaml and see an example by studying the implementation of Random Access Binary Lists using such a construct.

Uniform recursive type

As an example of uniform recursive data structure we have a simple list

Cons(1, Cons(2, Cons(3, Nil)))

Which has a recursive type, for example

type 'a myList = Nil | Cons of 'a * 'a myList

Each element of the list is either Nil (terminal) or it has a value of a polymorphic type 'a, followed a recursive list also of type 'a.

Non-uniform recursive type

Now, say that the type of the recursive list is not the same as the current list? Then we have a non-uniform polymorphic recursive type, for example:

type 'a seq = Nil | Cons of 'a * ('a * 'a) seq

We’ll name this a sequence. A int seq would have the value in the first node would have type int, but the element from the second node would have type (int, int), the third type ((int, int), (int, int)) and so on. This structure is equivalent to a complete binary tree, where the i-th element of seq represents the i-th level of the tree.

An example of value with this type is:

Cons (1, Cons ((2, 3), Cons (((4, 5), (6, 7)), Nil)))

We need a special syntax to type recursive functions that take recursive non-uniform types, because the type of the function called recursively might be a different polymorphic type than the caller. OCaml by default tries to infer the generic types of the function and bind them to specific instances [2]. For example, in

let f: 'a list -> 'a list = fun x -> 13 :: x

OCaml will assume 'a is int and will compile fine. We can see this by pasting that code in the command line, utop.

utop # let f: 'a list -> 'a list = fun x -> 13 :: x;;
val f : int list -> int list =

The function will then not be polymorphic anymore. To prevent OCaml from auto-binding specific type instances, we can use a special syntax introduced in OCaml 3.12 [3]

utop # let f3: 'a. 'a list -> 'a list = fun x -> 13 :: x;;

This time we’ll get a compilation error:

Error: This definition has type int list -> int list which is less general than ‘a. ‘a list -> ‘a list

The important thing is that this allow us binding the recursive calls with different types. According to the Jane Street Tech Blog [3]:

Note that a polymorphic type annotation holds inside the body of a recursive definition as well as outside, allowing what is known as polymorphic recursion, where a recursive call is made at some non-trivial instantiation of the polymorphic type.

So for example, we can write this function to calculate the size of a sequence:

The problem with this structure is that it can only represent lists of size in the form of 2^k - 1. To work around that, we allow some items to not hold any elements at all, so that each item corresponds to a digit in the binary representation of the size of the list.

type 'a seq = Nil | Zero of ('a * 'a) seq | One of 'a * ('a * 'a) seq

For example, we can now represent a list of 10 elements, as

Zero(One((1, 2), Zero(One(((3, 4), (5, 6)), ((7, 8),(9, 10))), Nil))))

mandelbrot

Sequence binary random access list

We can use a sequence to implement a random access binary access list in a concise way.

Insertion

Inserting an element at the beginning is analogous to incrementing the binary number, that is, starting from the least significant digit, if it’s a zero, we make it one, if it’s one we make it a 0, and carry over a 1, by adding it to the next digit.

The carry over process is simple in this structure. Because the type of an item following an item of type 'a is ('a, 'a), to merge the element to be added with the existing element, we simply make a tuple and pass it to the recursive call.

Head and Tail

Removing or retrieving the first element is analogous to decrementing a binary number. If the digit is one, we make it zero and return the element and the resulting list. If it’s a zero, we make a recursive call to get the next available element. However since the returned element is of type ('a, 'a), and our return type is 'a, we only use the first value of the pair.

Implementing head and tail using popAux is now trivial

Lookup

Finding an element can be done by transforming the problem into smaller instances.

It helps to look at some simple examples. Let’s consider 3 cases.

Case 1. If we had a single element, we either return it if the index is 0, or throw if it’s larger than that.

0: (0)

Case 2. If we have 2 elements,

0: ()
1: (0, 1)

Notice that when we go from the first level to the second, all items “doubled” in size, so we can “transform” this to the single element case by treating pairs as a single element, but since the index has individual elements granularity, we need to transform it by halving it. We reduced it to Case 1.

If our initial index was either 0 or 1, it’s now 0, and we found our element in level 1.


1: (0)

The problem is that we need to return a single element at level 0, not a pair, so we need to undo the transformation. We can use the parity of the original index will to decide which side of the pair to return. If it’s even, we return the first element, otherwise the second one.

Case 3. If we have 3 elements,
0: (0)
1: (1, 2)

and our index is larger than 0, we move to the next level but we need to account for the level we’re skipping, so the transformation of index would look like:
0: ()
1: (0)

which is reduced to Case 2.

These 3 cases can be used to find elements larger than 3. For example, say we have 10 elements and want to find the element at position 6:

0: ()
1: (0, 1)
2: ()
3: (((2, 3), (4, 5)), ((6, 7), (8, 9)))

Step 1. This is Case 2. We need to transform this by treating pairs as elements and halving the index:

1': (0)
2': ()
3': ((1, 2), (3, 4))

Note how this reduced the problem of finding the element at position 3 of a list with size 5. Step 2. We now are in case 3, where we need to skip the current element:

1': ()
2': ()
3': (((0), (1)), ((2), (3)))

Our index is now 2. Step 3. we go one level down

2: ()
3: (0, 1)

With an index of 1. Step 4. Finally, we halve it once again and we finally found the right level that contains our index.

3: (0)

We now need to recurse back to find exactly which element on that level to pick. On Step 4, we can see our index 1 was on the right side of the pair in level 3, so we pick the right side, that is, ((6, 7), (8, 9)).

On Step 3, our index 2 was on the left size of the innermost pair, that is (6, 7). On Step 2, we skipped the current element but didn’t change levels, so there’s no need to choose an element from the pair. Finally, on Step 1, the index 6 was on the left side of the innermost pair, which should return the element with a corresponding index 6.

In general, we can tell which side of the innermost pair to pick by observing that the indexes are ordered from left to right in a given level. And because every level has an even number of elements, we can assume that the first index in the level – or the first element in the first pair – is even. Hence the parity of the index is sufficient to determine which side of the pair to pick.

With this algorithm in mind, the lookup function is quite compact:

The update follows a similar idea as the lookup, but the problem is that we need to return the updated level when returning from the recursion. That is, we need to update the level before returning.

To accomplish that, we can pass a callback, the updater, that encodes which pair we would pick at each level. We start with a function that simply returns the element to be updated

(fun _ -> element)

Then, at each level we create a new updater, which applies the previous updater on the left or right side of the pair, depending on the parity of the index:

When we finally find the level that has our index, we can apply the function, which has the effect of “narrowing” down the elements from a given level to a single element, replacing the value at the target index and then returning the updated elements when returning from the recursion.

After applying the updater, we return the updated level recursively.

Structural Decomposition

Okasaki introduces this implementation in the context of Structural Decomposition, which is a technique for creating data structures from incomplete ones. In this example, the raw implementation of the sequence can only represent lists of size 2^k, but modeling each node in the sequence to be zero or one, zero not having any elements, we can work around the size restriction.

Conclusion

The implementation of random access binary access list using sequences is very neat, but very hard to understand.

One of the major selling points of shorter code is that it tends to contain less bugs and also less corner cases. On the other hand, if the code is also harder to read and understand, it might be harder to spot bugs.

This post helped me understand a bit more about OCaml’s type system. Digging a little also led me to interesting topics such as Parametric Polymorphism [4] and Existential vs. Universally quantified types [5].

References

[1] Purely Function Data Structures – Chris Okasaki
[2] Jane Street – Ensuring that a function is polymorphic
[3] Ensuring that a function is polymorphic in Ocaml 3.12
[4] Wikipedia – Parametric polymorphism
[5] Existential vs. Universally quantified types in Haskell

Numerical Representations as inspiration for Data Structures

Book

In this chapter Okasaki describes a technique for developing efficient data structures through analogies with numerical representations, in particular the binary and its variations.

We’ve seen this pattern arise with Binomial Heaps in the past. Here the author presents the technique in its general form and applies it to another data structure, binary random access lists.

Binary Random Access Lists

These lists allows efficient insertion at/removal from the beginning, and also access and update at a particular index.

The simple version of this structure consists in distributing the elements in complete binary leaf trees. A complete binary leaf tree (CBLF) is one that only stores elements only at the leaves, so a tree with height i, has 2^(i+1)-1 nodes, but only 2^i elements.

Consider an array of size n, and let Bn be the binary representation of n. If the i-th digit of Bn is 1, then we have a tree containing 2^i leaves. We then distribute the elements into these trees, starting with the least significant digit (i.e. the smallest tree) and traversing the tree in
pre-order.

For example, an array of elements (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) has 11 elements, which is 1011 in binary. So we have one tree with a single leave (1), a tree with 2 leaves (2, 3) and another containing 8 leaves (4, 5, 6, 7, 8, 9, 10, 11).

We use a dense representation of the binary number, by having explicit elements for the 0 digit. Here’s a possible signature for this implementation:

Inserting a new element consists in adding a new tree with a single leave. If a tree already exists for a given size, they have to be merged into a tree of the next index. Merging two CBLFs of the same size is straightforward. We just need to make them children of a new root. Since elements are stored in pre-order, the tree being inserted or coming from carry over should be the left child.

Looping back to our example, if we want to insert the element 100, we first insert a tree with a single leaf (100). Since the least significant digit already has an element, we need to merge them into a new tree containing (100, 1) and try to insert at the next position. A conflict will arise with (2, 3), so we again merge them into (100, 1, 2, 3) and try the next position. We finally succeed in inserting at position 2, for a new list containing trees like (100, 1, 2, 3) and (4, 5, 6, 7, 8, 9, 10, 11).

The complexity of inserting an element is O(log n) in the worst case which requires merging tree for all digits (e.g. if Bn = 111...111). Merging two trees is O(1).

Removing the first element is analogous to decrementing the number, borrowing from the next digit if the current digit is 0.

Searching for an index consists in first finding the tree containing the index and then searching within the tree. More specifically, because the elements are sorted beginning from the smallest tree to the largest, we can find the right tree just by inspecting the number of elements in each tree until we find the one whose range includes the desired index. Within a tree, elements are stored in pre-order, so we can find the right index in O(height) of the tree.

After finding the right index, returning the element at that index is trivial. Updating the element at a given index requires rebuilding the tree when returning from the recursive calls.

Okasaki then proposes a few different numbering systems that allow to perform insertion/removal in O(1) time. Here we’ll only discuss the less obvious but more elegant one, using skew binary numbers.

characters

Skew Binary Random Access Lists

A skew binary number representation supports the digits 0, 1 and 2.

The weight of the i-th digit is 2^(i+1) - 1. In its canonical form, it only allows the least significant non-zero digit to be 2.

Examples:

table-v2

Decimal and Skewed Binary

It’s possible to show this number system offers a unique representation for decimal numbers. See the Appendix for a sketch of the proof and an algorithm for converting decimals to skewed binary numbers.

Incrementing a number follows these rules:

  • If there’s a digit 2 in the number, turn it into 0 and increment the next digit. By definition that is either 0 or 1, so we can safely increment it without having to continue carrying over.
  • Otherwise the least significant digit is either 0 or 1, and it can be incremented without carry overs.

The advantage of this number system is that increments (similarly, decrements) never carry over more than once so the complexity O(1), as opposed to the worst-case O(log n) for regular binary numbers.

A skew binary random access list can be implemented using this idea. We use a sparse representation (that is, not including 0s). Each digit one with position i corresponds to a tree with (2^(i+1) - 1) elements, in this case a complete binary tree with height i+1. A digit 2 is represented by two consecutive trees
with same weight.

Adding a new element to the beginning of the list is analogous to incrementing the number, which we saw can be done in O(1). Converting a digit 0 to 1 or 1 to 2, is a matter of prepending a tree to a list. To convert a 2 to 0 and increment the next position, we need to merge two trees representing it with the element to be inserted. Because each tree is traversed in pre-order, we make the element the root of the tree.

Elements are inserted in pre-order in each tree, so when searching for an
index, we can first find the right tree by looking at the tree sizes and within a tree we can do a “binary search” in O(height) of the tree.

Binomial Heaps

In this chapter, this technique is also applied to improve the worst case runtime of insertion of binomial heaps. The implementation, named Skewed Binomial Heap, is on github.

Conclusion

This chapter demonstrated that binary representations are a useful analogy to come up with data structures and algorithms, because they’re simple. This simplicity can lead to inefficient running times, though. Representations such as skewed binary numbers can improve the worst case of some operations with the trade-off of less intuitive and more complex implementations.

Appendix A – Proof

Sketch of the proof. First, it’s obvious that two different decimals cannot map to the same binary representation. Otherwise the same equation with the same weights would result in different values. We need to show that two binary representations do not map to the same decimal.

Suppose it does, and let them be B1 and B2. Let k be the largest position where these number have a different digit. Without loss of generality, suppose that B1[k] > B2[k].

First case. suppose that B1[k] = 1, and B2[k] = 0 and B2 doesn’t have any digit 2. B1 is then at least M + 2^{k+1} - 1, while B2 is at most M + \sum_{i = 1}^{k} (2^{i} - 1) which is M + 2^{k + 1} - k (M corresponds to the total weight of digits in positions > k). This implies that B2 < B1, a contradiction.

Second case. suppose that B1[k] = 1, but now B2 does have a digit 2 at position j. It has to be that j < k. Since only zeros follow it, we can write B2‘s upper bound as

M + \sum_{i = j + 1}^{k} (2^{i} - 1) + 2^{j + 1} - 1

Since 2(2^{j + 1} - 1) < 2^{j + 2} - 1, we have

\sum_{i = j + 1}^{k} (2^{i} - 1) + 2^{j + 1} - 1 < \sum_{i = j + 2}^{k} (2^{i} - 1) + 2^{j + 2} - 1

We can continue this argument until we get that B2 is less than M + 2(2^{k} - 1) which is less than M + 2^{k + 1} - 1, B1.

Third case. Finally, suppose we have B1' such that B1'[k] = 2, and B2'[k] = 1. We can subtract 2^{k+1} - 1 from both and reduce to the previous case. ▢

Appendix B – Conversion algorithm

Converting from a decimal representation to a binary one is straightforward, but it’s more involved to do so for skewed binary numbers.

Suppose we allow trailing zeros and have all the numbers with k-digits. For example, if k=2, we have 00, 01, 02, 10, 11, 12 and 20. We can construct the numbers with k+1-digits by either prefixing 0 or 1, and the additional 2 followed by k zeros. For k=3, we have 000, 001, 002, 010, 011, 012, 020, 100, 101, 102, 110, 111, 112, 120 and finally 200.

More generally, we can see there are 2^(k+1) - 1 numbers with k digits. We can construct the k+1 digits by appending 0 or 1 and then adding an extra number which is starts 2 and is followed by k zeros, for a total of 2^(k+1) - 1 + 2^(k+1) - 1 + 1 = 2^(k + 2) - 1, so we can see this invariant holds by induction on k, and we verify that is true for the base, since for k = 1 we enumerated 3 numbers.

This gives us a method to construct the skewed number representation if we know the number of its digits say, k. If the number is the first 2^(k) - 1 numbers, that is, between 0 and 2^k - 2, we know it starts with a 0. If it’s the next 2^(k) - 1, that is, between 2^k - 1 and 2^(k+1) - 3, we know it starts with a 1. If it’s the next one, exactly 2^(k+1) - 2, we know it starts with a 2.

We can continue recursively by subtracting the corresponding weight for this digit until k = 0. We can find out how many digits a number n has (if we’re to exclude leading zeros) by find the smallest k such that 2^(k+1)-1 is greater than n. For example, for 8, the smallest k is 3, since 2^(k+1)-1 = 15, and 2^(k)-1 = 7.

The Python code below uses these ideas to find the skewed binary number representation in O(log n):

One might ask: why not OCaml? My excuse is that I already have a number theory repository in Python, so it seemed like a nice addition. Converting this to functional code, in particular OCaml is easy.

This algorithm requires an additional O(log n) memory, while the conversion to a binary number can be done with constant extra memory. My intuition is that this is possible because the weights for the binary numbers are powers of the same number, 2^k, unlike the skewed numbers’ weights. Is it possible to work around this?

Eliminating Amortization

Book

In this chapter Okasaki presents techniques to convert persistent data structures with amortized bounds to worst-case bounds. A few reasons to want worst-case guarantees include:

  • Real-time systems, where an expensive operation can cause the system to miss a hard deadline, even if it’s efficient when amortized over time.
  • Parallel systems, in which we want avoid one processor to take more time than the other if it happens to execute the expensive operation.

The intuition is to make the theoretical amortized analysis part of the actual implementation. In the amortized analysis, we saw either through the Banker’s method or the Physicist method the idea of paying debt ahead of time so by the time an expensive operation takes place, we could show it’s cost had already been distributed throughout previous operations.

To eliminate the amortization, we can literally pay off these costs when running the algorithm. We do it through a structure called schedule, which contains the unevaluated operations on the data structure. Whenever we perform an operation, we evaluate one or more items of the schedule. Due to memoization, by the time we actually need the evaluated structure, it will be evaluated.

Often the amortized analysis can dictate how the execution of the schedule is performed. For example, if the analysis tells us to pay off 2 units of debt on every action, that can be translated to executing 2 items from the schedule on every action.

The main challenge in this conversion is to modify the data structure in such a way it can be partially executed.

We’ll discuss an example using queues and then binomial heaps.

Real-time Queues

For the queue structure, the costly operation is the rotation that takes place when we need to combine the rear with the front. It’s a monolithic operation, so we need to change that.

Let’s start by defining the structure. It’s similar to the stream queue, except that we don’t need the rear to be a stream and we have a schedule field, which is also a stream. The idea is that whenever a rotation happens, it will be “scheduled” to be executed little by little.

The rotation() function is the most complicated part of the structure. The invariant is that we only call this function when |rear| = |front| + 1.

We construct a stream with the elements of rear reversed, newSchedule and on the return step of the recursion we append the elements of front to that stream.

Note that this is performed lazily, so a call to rotate only executes one step.

Now we have the execution of the schedule. It serves two purposes. One is to execute an item from the schedule and the other is to trigger a rotation when the is schedule empty.

The schedule being empty is a proxy to when |rear| = |front| + 1. Why? Because when the schedule is populated, it has the same size of front (they’re both assigned the same stream), and rear is empty. Whenever we insert an element, increasing the size of rear by one, or remove an element, reducing the size of front by one, we decrease the difference between |front| - |rear| by 1, and so the size of the schedule is decreased by 1.

Implementation-wise, maybe it would be more clear if we checked for the empty schedule outside and only called exec() is it was not empty.

With these in place, the usual operations for queue are straightforward. The ones that mutate the queue, push() and pop(), need to make sure to call exec().

Scheduled Binomial Heaps

We discussed Binomial Heaps in a previous post. The idea is that a binomial heap is a list of binomial trees, an a binomial tree is defined recursively based on it’s rank.

The heap is represented by a binary number (as a list of digits). If the k-th digit of the number is 1, it indicates the existence of a binomial tree of rank k (containing 2^(k-1)). A heap with n elements, has a unique representation, which is the binary representation of n.

For example, if n = 10, then the binary representation of the heap is 1010, meaning it contains a binomial tree of rank 2 (2 elements), and one with rank 4 (8 elements), holding 10 elements in total.

Inserting an element into the heap is analogous to incrementing a binary number by 1. We first create a binomial tree with rank 0 containing that element, and try to insert it into the beginning of the digits list. If the position we want to insert is occupied, we need to merge the trees, to generate a tree with a higher rank, which is analogous to the carry over operation when adding two binary numbers.

The schedule is a list of all the numbers generated when any operation is performed.

The structure for the heap is then the following:

As we discussed above, the insertion is analogous to incrementing the binary number. But because the digits are a stream, we need to deal with lazy evaluation:

Executing the schedule consists in evaluating one digit from the first number on the schedule. The key is to avoid evaluating the part of the number that already has been evaluated. It’s possible to show this happens when we find the first digit one. The intuition is that the trailing zeros from the current number were 1’s before the increment, so there was a mutation (linking) while the remaining digits were not modified by that operation.

For example, if we have the number 101011, an increment causes it to become 101100. The two trailing zeros in 101100 correspond to a linking of the binomial tree.

Finally, inserting a new element consists in creating a binomial tree of ranking 0, insert it, and then execute the schedule.

The full code is available on github.

Lessons learned

Because we now have several different implementations of queues, I wanted to implement tests that were applicable to any queue implementing an “interface”.

One way to do that is to put the interface in a separate module, IQueue.ml and have each implementation require it as their module type:

To make the test work with any module implementing the IQueue interface, we can use a functor (module transformation) and

Other things I’ve learned were matching lazy patterns. Matching a lazily evaluated variable with the keyword lazy forces the evaluation, so we can use a cleaner syntax, for example:

Another syntactic construct that helps with legibility is the record. The examples in Okazaki’s book use tuples for most of composed structs, but I prefer the more verbose and explicit records.

Finally, another lesson learned is that adding an insertion operation to the Stream API is likely wrong. The idea of the stream is that is avoids evaluation of its tail, so the whole block has to lazily evaluated.

For example, in the queue implementation, in the first block, we will evaluate all the rotation and then make the result lazy, which is not what we want.

Conclusion

Eliminating evaluation is a very interesting technique, but it has limited application is practice. It complicates the implementation and, as I learned, it can be tricky to spot bugs (for example, the one in which we’re evaluating the rotate() function) that would only show up if we profile the data structure.

Amortization and Persistence via Lazy Evaluation

In this chapter Okasaki works around the problem of doing amortized analysis with persistent data structures because the amortized analysis assumes in place modification while for persistent data structures (partial) copies are made. The intuition is that lazy evaluation, which comes with memoization and avoids recomputation, solves this problem.

He adapts the Banker’s and Physicists’s methods to work with lazy evaluated operations and applies them to a few structures including Binomial Heaps, Queues and Lazy Pairing Heaps. In this post we’ll only cover the examples of the Queues using both methods.

We’ll first introduce some concepts and terminology, then we’ll present a queue implementation using lazy evaluation that allows us analyzing it under persistence. Following that we’ll explain the Banker’s and Physicist’s methods and prove that the implementation for push/pop has an efficient amortized cost.

Persistence as a DAG

An execution trace is a DAG where nodes represent the operations (e.g. updates to a data structure) and an edge from nodes v to v' indicates that the operation corresponding to v' uses the output of the one corresponding to v.

For example, if we have this set of operations:

let a = push 0 newEmpty
let b = push 1 a
let c = pop b
let d = push 2 b
let e = append c d
let f = pop c
let g = push 3 d 

The corresponding execution graph is:

Execution Graph

The logical history of an operation v is the set of all operations it depends on (directly or indirectly, and including itself). Equivalently, in terms of the DAG, it’s the set of nodes that have a directed path to v.

A logical future of an operation v is any directed path from v to a terminal node.

A framework for analyzing lazy evaluated data structures

We need to introduce a few more concepts and terminology. Note: we’ll use suspension and lazy operation interchangeably, which can be evaluated or forced.

The unshared cost of an operation is the time it would take to execute it if it had already been performed and memoized before, so if the operation involves any expression that is lazy, that expression would be O(1).

The shared cost of an operation is the time it would take it to execute (force) all the suspensions created (but not evaluated) by the operation.

The complete cost is the sum of the shared and unshared costs. Alternatively, the complete cost of an operation is the time it would take to execute the operation if lazy evaluation was replaced by strict. To see why, first we note that the unshared costs have to be paid regardless of laziness. Since we’re assuming no laziness, the operation has to pay the cost associated with the suspension it creates, which corresponds to the shared costs. Note that under this assumption we wouldn’t need to account for the cost of forcing suspensions created by previous operations because in theory they have already been evaluated.

When talking about a sequence of operations, we can break down the shared costs into two types: realized and unrealized costs. The realized costs are the shared costs from suspensions were actually forced by some operation in the sequence. Example: say that operations A and B are in the sequence and A creates a suspension, and then B forces it. The cost for B to force it is included in the realized cost. The unrealized costs are the shared costs for suspensions that were created but never evaluated within the sequence. The total actual cost of a sequence of operations is the sum of the realized costs and the unshared costs.

Throughout a set of operations, we keep track of the accumulated debt, which starts at 0 at the beginning of the sequence. Whenever an operation is performed, we add its shared cost to it. For each operation, we can decide how much of this debt we want to pay. When the debt of a suspension is paid off, we can force it. The amortized cost of an operation is its unshared cost plus the amount of debt it paid (note that it does not include the realized cost). Note that as long as we always pay the cost of a suspension before it’s forced, the amortized cost will be an upper bound on the actual cost.

This framework simplifies the analysis for the case when a suspension is used more than once by assuming that its debt was paid off within the logical history of when it was forced, so we can always analyze a sequence of operations and don’t worry about branching. This might cause the debt being paid multiple times, but it simplifies the analysis.

The author uses the term discharge debit as synonym of pay off debt. I find the latter term easier to grasp, so I’ll stick with it throughout this post.

Let’s introduce an example first and then proceed with the explanation of the Physicist’s method and the corresponding analysis of the example.

The Stream Queue

To allow efficient operations on a queue in the presence of persistence, we can make some of the operations lazy. Recall in a previous post we defined a queue using two lists. To avoid immediate computation, a natural replacement for lists is using its lazy version, the stream data structure, which we also talked about in a previous post.

For the list-based queue, the invariant was that if the front list is empty, then the rear list must be empty as well. For the stream queue, we have a tighter constraint: ‘front’ must be always greater or equal than ‘rear’. This constraint is necessary for the analysis.

The definition of the stream queue is the following:

We store the lengths of the streams explicitly for efficiency.

We’ll be using the Stream developed in the previous chapter, so we’ll refer to the module Stream2 to avoid ambiguity with the standard Stream module.

Inserting an element at the end of the queue is straightforward, since the rear stream represents the end of the queue and is reversed:

The problem is that inserting at rear can cause the invariant of the queue to be violated. check() changes the structure so to conform to the invariant by potentially reversing rear and concatenating with front:

Removing an element from the queue requires us to evaluate the first element of the front stream. Again, the invariant can be violated in this case so we need to invoke check() again:

The complete code for the stream queue is on Github.

Analysis using the Banker’s Method

The idea of the Banker’s Method is basically define an invariant for the accumulated debt and a strategy for paying it off (that is, decide how much debt each operation pays off). Then we show that whenever we need to force a suspension, the invariant guarantees that the accumulated debt has been paid off. One property of the Banker’s method is that it allows associating the debt to specific locations of the data structure. This is particularly interesting for streams, because it contains multiple (nested) suspensions, so we might force parts of this structure before we paid the debt associated with the entire structure.

By inspection, we can see that the unshared cost of both push and pop are O(1). It’s obvious in the case of push, and in the case of pop, in theory check could take O(m) where m is the size of the queue, but since Stream2.concat() and Stream2.reverse() are both lazy, and hence memoized, they are not included in the unshared costs.

To show that the amortized cost of both operations is O(1), we can show that paying off O(1) debt at each operation is enough to pay for the suspension before it is forced. For the queue, we also need to associate the debt with parts of the data structure, so that we could force the suspension of only some parts of it (for example, on the stream we can evaluate only the head, not necessarily the entire structure).

We now define an invariant that should be respected by all operations. Let d(i) be the debt at the i-th node on the front stream, and D(i) = \sum_{j=0}^{i} d(j) the accumulated debt up to node i. The invariant is:

D(i) \le \min(2i, |f| - |r|)

This constraint allows us evaluating the head at any time, because D(0) = 0, which means its debt has been paid off. The second term in min(), guarantees that if |f| = |r| the entire stream can be evaluated, because D(i) = 0 for all i.

The author then proves that by paying off one debt in push() and two debt units in pop() is enough to keep the debt under the constraint.

Queue with suspended lists

Because the Physicist’s method cannot assign costs to specific parts of the data structure, it doesn’t matter if the structure can be partially forced (like streams) or if it’s monolithic. With that in mind, we can come up with a simpler implementation of the queue by working with suspended lists instead of streams. Only the front list has to be suspended because the cost we want to avoid, the reversal of the back list and concatenation to the front list, happens on the front list.

On the other hand, we don’t want to evaluate the front list when we perform a peek or pop, so we keep a evaluated version of the front list too.

The signature of the structure is as follows:

As mentioned in the code above, the invariants we want to enforce is that the front list is never smaller than the rear list

and that the evaluated version of the front list is never empty if the lazy version still has some elements.

The push and pop operations are similar to the other versions of queue, but since we mutate the structure, we might need to adjust it to conform to the invariants:

Finally, because of our second invariant, peeking at the queue is straightforward:

The complete code for the suspended queue is on Github.

Analysis using the Physicist’s Method

We’ve seen the Physicist’s Method in a previous post when we’re ignore the persistence of the data structures. We adapt the method to work with debits instead of credits. To avoid confusion, we’ll use \Psi to represent the potential function. \Psi(i), represents the accumulated debt of the structure at step i. At each operation we may decide to pay off some debit, which will be then included in the amortized cost. We have that \Psi(i) - \Psi(i - 1) is the increase in debt after operation i. Remember that the shared cost of an operation corresponds to the increase in debt if we don’t pay any of the debt. Thus, we can find out how much debt was paid off then by s_i - \Psi(i) - \Psi(i - 1), where s(i) is the shared costs of operation i. Let u(i) and c(i) be the unshared and complete costs of the operation i. Given that, by definition, c(i) = u(i) + s(i), we can then express the amortized cost as:

a_i = cc_i - (\Psi(i) - \Psi(i - 1))

To analyze the suspended queue we need to assign values to the potentials such that by the time we need to evaluate a suspension the potential on the structure is 0 (that is, the debt has been paid off). For the suspended queues we’ll use the following potential function:

\Psi(q) = \min(2|w|, |f| - |r|)

Where w is the forcedFront, f is lazyFront and r is rear.

We now claim that the amortized cost of push is at most 2. If we push and element that doesn’t cause a rotation (i.e. doesn’t violate |f| \ge |r|), then |r| increases by 1, and the potential decreases by 1. No shared is incurred and the unshared cost, inserting an element at the beginning of rear is 1, hence the amortized cost for this case is 1 – (-1) = 2. If it does cause a rotation, then it must be that after the insertion |f| = m and |r| = m + 1. After the rotation we have |f| = 2*m + 1 and |r| = 0, but w hasn’t changed and cannot be larger than the original f, so the potential function is at most 2*m. The reversal of r costs m + 1 and concatenating to a list of size x costs x (discussed previously), plus the cost of initially appending an element to read, so the unshared cost is 2*m + 2. No suspensions were created, so the amortized cost is given by 2*m + 2 - (2*m) = 2.

Our next claim is that the amortized cost of pop is at most 4. Again, if pop doesn’t cause a rotation, |w| decreases by 1, so the potential is reduced by 2. The unshared cost is 1, removing an element from w, and the shared cost, 1 comes from the suspension that lazily removes the head of lazyFront. The amortized cost is 2 – (-2) = 4. Note that if w = 0, we’ll evaluate f, but the ideas is that it has been paid off already. If the pop operation causes a rotation, then the analysis is similar to the push case, except that the complete cost is must account for the shared cost of lazily reming the head of lazyFront, so it’s 2*m + 3, for an amortized cost of 3.

Note that when the suspensions are evaluated, the potential is 0, either when |f| = |r| or |w| = 0.

Conclusion

In this post we covered a simple data structure, the queue, and modified it to be lazy evaluated and can show, with theory, that it allows for efficient amortized costs. The math for proving the costs is not complicated. The hardest part for me to grasp is to get the intuition of how the analysis works. The analogy with debt is very useful.

Meta: Since wordpress code plugin doesn’t support syntax highlighting for OCaml, I’m experimenting with the Gist plugin. Other advantages is that Gists allow comments and has version control!

Persistent Data Structures

Book

In this post we’ll discuss some basic data structures in functional programming and their implementation in OCaml. Our main reference are the first two chapters of Purely Functional Data Structures by Chris Okasaki.

The first chapter explains persistent data structures and the second chapter provides examples of three common data structures in imperative programming and how to implement them in ML (a functional language).

Persistent Data Structures

The core idea behind designing data structures in functional languages is that they’re immutable, that is, if you want to modify it, a new copy has to be made.

If we want to modify a data structure in the naive way, we would need to clone the entire structure with the modifications. Instead, this is done in a smart way so different versions of the data structure share them. Two examples are given to illustrate this idea: lists and binary trees.

Immutable Linked Lists

Lists are a central native data structure in functional languages. In Ocaml, they’re implemented as linked list. From our data structure class, we know that inserting an element at the beginning of a list is an O(1) operation, but this requires modifying pointers in the data structure. As we said data structures are immutable, so we need to make a copy to perform this operation.

Note that we only need to make the new node point to the beginning of the original list, which means we don’t have to clone this list, we just reuse it.

Now, if we’re inserting in the middle of the list will require copying the existing nodes until we reach our newly inserted element, which we can then point to the remaining of the original list. Consider the example below, where we insert the element 4 in the second position of the list.

Inserting a new element (4) in a linked list

Inserting a new element (4) in a linked list

Note that inserting an element at the beginning of a linked list is much more efficient than at the end, both in terms of complexity and space. If we were to insert it at the end, we would require full copies and insertion would be O(length of the list). As an experiment, we can write a function that generates a list representing a range of integers in two ways.

In example 1, we create the list by inserting an element at the beginning of the array,

let rec list_range_fast start_range end_range =
  if start_range == end_range then []
  else start_range :: (list_range_fast (start_range + 1) end_range)
;;

In example 2 we do it at the end:

let rec list_range_slow start_range end_range =
  if start_range == end_range then []
  else (list_range_slow start_range (end_range - 1)) @ [end_range]
;;

If I run the slow version with start_range = 0 and end_range = 50000, it takes over a minute to run on my computer, while the fast version runs in a few milliseconds.

Immutable Binary Trees

Binary trees are a generalized version of a linked list and it works the same way. The key point to notice is that when inserting an element in a (binary) tree only the nodes in the path from the root to the inserted node need to have their pointers modified, so we need to clone a number of nodes that are at most the height of the tree. See the example below:

Inserting an element in a binary tree

Inserting an element in a binary tree. Green nodes are the new ones

With this basic concept understood for simple constructs like linked lists and binary trees, we can construct basic data structures on top of them: heaps and balanced binary trees.

Leftist Heaps

A (minimum) leftist heap is a binary tree with the following properties:

1. The value of a node is smaller or equal the value of its children.

Let the spine of a node be the path defined by following the right children until a leaf is found (i.e. the rightmost path), and the rank of a node the length of its spine.

2. The rank of the left child is always greater or equal than the right child.

It’s possible to show that the rank of a heap of n nodes is less or equal than O(floor(log n + 1)).

Insertion. Before talking about inserting an element in a heap, let’s define the more general merge() function, which merges two leftist heaps, A and B, into one. We compare the root of each heap, say A.val and B.val. If A.val <= B.val, we make A.val the root of the new heap, A.left the left of the new heap and the right of the new heap will be the merge of A.right and B. If A.val >= B.val we do an analogous operation.

Note that the result of the merge might violate property 2, since we’re adding a new node to the right subtree. This is easy to fix: we just to swap the left and right subtree when coming back from the recursion. This is what makeTree() does.

Note that we always perform the merges of the right, which means that the number of such merges will be proportional to the largest rank the original heaps, which implies we can do it in O(log(A.size + B.size)).

In the context of immutable structures, this implementation of heap is efficient because the only nodes whose pointers we need to mutate (even when a swap is needed) are on the spine of the trees, so it only needs to create O(log n) new nodes.

The insertion process consists in creating a heap with one element and merging into the target heap.

Returning the minimum element is trivial: we just need to return the element at the root. Removing the minimum is easy as well: just merge the left and right subtree using merge().

My implementation for the leftist heap is on github.

Binomial Heaps

Binomial heaps are an alternative heap implementation. Before defining a binomial heap, let’s introduce the binomial tree. A binomial tree is a can be defined recursively based on a property called rank. A single node is a binomial tree of rank 0. A tree of rank r > 0, is formed by combining two trees of rank r-1 making one tree the leftmost child of the other. We denote this operation as linking.

Binomial Heap

Examples of binomial trees of different ranks

A binomial heap is a list of binomial trees ordered by increasing rank, with no trees with the same rank.

Insertion. Before talking about insertion of an element into a heap, let’s see how to insert a binomial tree of rank r into the heap. To keep the order of the rank of the tree list, we need to traverse through it to find the right place to insert it. Since we can’t have two trees with the same rank, if we run into this case we need to merge those two trees into one with rank r + 1. This can cascade so we might need to repeat this process with the new tree.

Linking a tree is a direct application of its definition. We just need to decide which of the trees will become child of the other. Since we want a minimum heap, we need to guarantee that the minimum element is always on top, so we always make the tree with smallest root to be on top.

Linking is a constant time operation, since we only need to update the pointer of the root of the top tree, which also means we only need to clone the root node to generate a new immutable binomial tree.

For the complexity of traversing the heap list, we can first node that a heap of rank r has 2^r nodes. Which means that a heap of n elements has at most log(n) trees. In fact, a neat way to represent a binomial heap of size n is by its binary representation. For example, 10(dec) = 1010(bin), so for a heap of size 10, we should have a list of trees with ranks 1 and 3. This shows that we can traverse a list in O(log n) and in the worst case we might need to clone this many number of nodes.

Returning the minimum element requires us to scan the list of trees, because even though we know the minimum element is the root of a tree, we don’t know which tree it is, so this operation is O(log n). For removing this element, we can define an auxiliary function, removeMinTree(), which removes the tree with the minimum element from the tree list. Since we only want to remove the root of this tree, we need to re-insert the subtrees back to the heap.

One key observation is that, in a binomial tree of rank r, the children of the root are also binomial trees of ranks from 0 to r-1, which form a binomial heap. We can then define a merge() function that merges two heaps using an idea similar to a merge sort. If we refer back to the analogy of the binary representation of the heap size, the merge operation is analogous to adding two binary numbers!

My implementation for the binomial heap is on github.

Red Black Trees

A red-black tree is a binary seach tree in which every node is either Red or Black. It respects the following invariants:

1. No red node has a red child
2. Every path from the root to an empty node has the same number of black nodes

Property. the height of a red-black tree of n nodes is at most 2*floor(log(n + 1))

Proof sketch. If the tree has a path to an empty node of length greater than 2*floor(log(n + 1)), this must have more than floor(log(n + 1)) black nodes in the path because we cannot have two consecutive red nodes (invariant 1). Now, by removing all the red nodes, we must have a complete tree of height >= floor(log(n) + 1) + 1, which means 2^(floor(log(n + 1)) + 1) - 1 which is greater than n.

Membership. Since a Red-Black tree is a binary search tree, search can be done in O(height of the tree) which is O(log n) by the property above.

Insertion. inserting an element in a Red Black tree is similar to inserting it into a binary search tree. The challenge is that it may violate one of the invariants. To avoid that we must perform re-balancing along the path we follow to insert the node in the tree.

We always color the inserted node Red. This doesn’t violate the Black nodes constraint, but it might violate the Red nodes one, in case its parent is also Red. If that’s the grandparent of the inserted is necessarily Black. We now have 4 possible scenarios, depicted at the top, right, bottom and left trees:

Unbalanced Red-Black trees and the result of the balancing operation

Unbalanced Red-Black trees and the result of the balancing operation

We assume that the subtrees a, b, c and d are balanced. For all these 4 cases we can perform a “rotation” to achieve the tree at the center.

The Black nodes constraint is not violated, because the path from the root still has the same number of Black nodes as before, and we fixed the consecutive Reds. Notice that y might violate the Red node constraints with its parent, so we need to do it recursively.

In terms of complexity, the insertion can be done in O(log n) and rebalancing takes a constant amount of time. Regarding the immutable structure, notice we only need to change nodes from the insertion path, so only O(log n) nodes have to be cloned.

My implementation for the Red Black tree in Ocaml is on github.

Conclusion

The first two chapters from this book are really interesting. I have seen the binomial heap and Red-Black trees before in data structure classes and also implemented some data structures such as AVL trees in functional programming in the past (Lisp) during college.

I wasn’t aware of the immutability of data in functional programming until much later, when I was learning Haskell. Okasaki first introduces this concept in Chapter 2, so it allow us keeping that in mind when studying the implementation of functional data structures.

He doesn’t make it explicit on Chapter 3 that the data structures presented are efficient in terms of extra memory necessary to clone them, but they are easy to see.

The ML syntax is very similar to OCaml, but it was a good exercise implementing the code on my own. I tried making it more readable with comments and longer variable names. This also led me to learn a few OCaml constructs and libraries, including:

* How to perform assertions (See the binomial heap implementation)
* Unit testing (using OUnit2 library)
* The “or matching” pattern (see the balance() function in red_black_tree.ml)