Lazy Rebuilding

Book

In this chapter Okasaki starts with a common technique data structures use to achieve efficient complexity times. A classic example are balanced binary search trees, which, on one hand, need to be balanced to avoid degenerated cases (think of a binary tree which can degenerate to a path), but on the other hand, balancing is usually too expensive to perform at every operation.

The tradeoff is to allow a certain degree of “imbalance”, such that the big-O complexity of operations does not degenerate, and to let us perform re-balances only every so often, in a way that leads to an efficient amortized complexity. Structures such as AVL trees and Red-black trees make use of this technique.

The general technique is named batched rebuilding. The main issue with it though is that we saw that amortized analysis does not play well persistent data structures.

To address that problem, a technique was developed by Mark Overmars, called global rebuilding.

Global Rebuilding

The basic idea is to keep two copies of a structure in parallel, one of which we perform incremental updates (write) and the other is used for read operations. After a few operations, we copy the write version into the read-only version. The idea is that the incremental updates are cheap, but might put the structure in a state that is not suitable for reading. This allows for distributing the costs across operations such that each operation has worst case efficient bounds.

The major downside of this technique is the complexity and corner cases, because we also need to account for changes to the structure that renders the write copy of the structure invalid.

We’ll now describe yet another implementation of queues that use this technique, developed by Robert Hood and Robert Melville. Similar to other queue implementations, this structure maintains a front and rear lists, the latter in reverse order. It also has the invariant that the rear list cannot be larger than the front.

When that happens we must perform a rotation, which consists in reversing the rear queue and appending it to the end of the front queue. We cannot perform this operation piecemeal, because we only have efficient access to the first element of a list. The operation we can do partially is concatenating the reverse of a list to the beginning of another list, that is, given xs and ys, we can do ~xs + ys, where ~ represents the reverse of a list. Note we can also reverse a list, that is xs to ~xs piecemeal by having ys = [].

Now, to achieve our goal which is xs + ~ys, we reverse both xs and ys, then concatenate the reverse of ~xs to ~ys:

1) Start with xs and ys
2) Reverse both: ~xs and ~ys
3) Then (~(~xs)) + ~ys which is xs + ~ys

We can perform these operations one step at a time by using a “state machine”, we first start with a state Reversing(xs, ys) which will reverse xs and ys at the same time into ~xs and ~ys, at which point we switch to the state Appending(~xs, ~yx) which will concatenate the xs into ~ys, so then we get to Done(xs + ~ys).

The problem of keeping a separate state is that it might not be accurate by the time it’s done. In particular, if we remove an element from the front, the Appending step can be shortcut. We can keep a count of how many of the elements in ~xs of Appending are still present. If, by the time we’re appending the number of present elements goes to 0, the last elements of ~xs (that is, the first elements of xs) have been removed, so they do not need to be transferred to ~ys.

A possible implementation of the states is:

The Idle is just a placeholder to make sure the state machine can keep going to the next state even when we’re not currently performing a rotation. The state transition definition is given by:

One important detail here is that Appending ({validCount = 0; front; rear}) must appear before the other matching rule for Appending, because the other one includes this.

When we remove an element from the front of the queue we need to update the number of valid elements in the front of the rotation state:

Similar to the Real-time queues, we can guarantee constant time worst case for all the operations if we execute the state machine twice for each operation. The check() function verifies if we need a rotation and also executes the nextStep() function twice.

The other operations are then straightforward. The only thing is that pop() must call the invalidate() function because it’s decreasing the size of front:

The full, up-to-date implementation in on github.

Lazy Rebuilding

As we can see, the Hood-Melville implementation is quite involved. Okasaki argues that making use of lazy evaluation and memoization simplifies the implementation. We can indeed see that the implementation of Real-time queues is much cleaner.

One simplification is that we don’t have to sync the write copy with the read copy. We just need to make sure to execute items from the schedule by the time they are needed. Memoization will take care of the rest.

Another advantage in this case is that reversing and concatenating lazily evaluated lists does not require an intermediate reversal of the front, which means that we can remove the front element without the need to invalidate any items in the schedule.

The author provided more examples of lazy rebuilding for deques (double-ended queues) by first presenting an amortized version using the Banker’s method and then a lazy rebuilding version. The Banker’s deque and Real-time deque are also on my github.

Conclusion

In this chapter the technique behind Real-time queues was formalized. I really enjoy the organization of the book in which the author presents a data structure and from there he extracts a generic pattern of technique that can be applicable to other cases.

When I was studying data structures I don’t recall learning about general techniques underlying existing data structures, such as batched rebuilding for AVL trees and Red-black trees. That would have been interesting.

Eliminating Amortization

Book

In this chapter Okasaki presents techniques to convert persistent data structures with amortized bounds to worst-case bounds. A few reasons to want worst-case guarantees include:

  • Real-time systems, where an expensive operation can cause the system to miss a hard deadline, even if it’s efficient when amortized over time.
  • Parallel systems, in which we want avoid one processor to take more time than the other if it happens to execute the expensive operation.

The intuition is to make the theoretical amortized analysis part of the actual implementation. In the amortized analysis, we saw either through the Banker’s method or the Physicist method the idea of paying debt ahead of time so by the time an expensive operation takes place, we could show it’s cost had already been distributed throughout previous operations.

To eliminate the amortization, we can literally pay off these costs when running the algorithm. We do it through a structure called schedule, which contains the unevaluated operations on the data structure. Whenever we perform an operation, we evaluate one or more items of the schedule. Due to memoization, by the time we actually need the evaluated structure, it will be evaluated.

Often the amortized analysis can dictate how the execution of the schedule is performed. For example, if the analysis tells us to pay off 2 units of debt on every action, that can be translated to executing 2 items from the schedule on every action.

The main challenge in this conversion is to modify the data structure in such a way it can be partially executed.

We’ll discuss an example using queues and then binomial heaps.

Real-time Queues

For the queue structure, the costly operation is the rotation that takes place when we need to combine the rear with the front. It’s a monolithic operation, so we need to change that.

Let’s start by defining the structure. It’s similar to the stream queue, except that we don’t need the rear to be a stream and we have a schedule field, which is also a stream. The idea is that whenever a rotation happens, it will be “scheduled” to be executed little by little.

The rotation() function is the most complicated part of the structure. The invariant is that we only call this function when |rear| = |front| + 1.

We construct a stream with the elements of rear reversed, newSchedule and on the return step of the recursion we append the elements of front to that stream.

Note that this is performed lazily, so a call to rotate only executes one step.

Now we have the execution of the schedule. It serves two purposes. One is to execute an item from the schedule and the other is to trigger a rotation when the is schedule empty.

The schedule being empty is a proxy to when |rear| = |front| + 1. Why? Because when the schedule is populated, it has the same size of front (they’re both assigned the same stream), and rear is empty. Whenever we insert an element, increasing the size of rear by one, or remove an element, reducing the size of front by one, we decrease the difference between |front| - |rear| by 1, and so the size of the schedule is decreased by 1.

Implementation-wise, maybe it would be more clear if we checked for the empty schedule outside and only called exec() is it was not empty.

With these in place, the usual operations for queue are straightforward. The ones that mutate the queue, push() and pop(), need to make sure to call exec().

Scheduled Binomial Heaps

We discussed Binomial Heaps in a previous post. The idea is that a binomial heap is a list of binomial trees, an a binomial tree is defined recursively based on it’s rank.

The heap is represented by a binary number (as a list of digits). If the k-th digit of the number is 1, it indicates the existence of a binomial tree of rank k (containing 2^(k-1)). A heap with n elements, has a unique representation, which is the binary representation of n.

For example, if n = 10, then the binary representation of the heap is 1010, meaning it contains a binomial tree of rank 2 (2 elements), and one with rank 4 (8 elements), holding 10 elements in total.

Inserting an element into the heap is analogous to incrementing a binary number by 1. We first create a binomial tree with rank 0 containing that element, and try to insert it into the beginning of the digits list. If the position we want to insert is occupied, we need to merge the trees, to generate a tree with a higher rank, which is analogous to the carry over operation when adding two binary numbers.

The schedule is a list of all the numbers generated when any operation is performed.

The structure for the heap is then the following:

As we discussed above, the insertion is analogous to incrementing the binary number. But because the digits are a stream, we need to deal with lazy evaluation:

Executing the schedule consists in evaluating one digit from the first number on the schedule. The key is to avoid evaluating the part of the number that already has been evaluated. It’s possible to show this happens when we find the first digit one. The intuition is that the trailing zeros from the current number were 1’s before the increment, so there was a mutation (linking) while the remaining digits were not modified by that operation.

For example, if we have the number 101011, an increment causes it to become 101100. The two trailing zeros in 101100 correspond to a linking of the binomial tree.

Finally, inserting a new element consists in creating a binomial tree of ranking 0, insert it, and then execute the schedule.

The full code is available on github.

Lessons learned

Because we now have several different implementations of queues, I wanted to implement tests that were applicable to any queue implementing an “interface”.

One way to do that is to put the interface in a separate module, IQueue.ml and have each implementation require it as their module type:

To make the test work with any module implementing the IQueue interface, we can use a functor (module transformation) and

Other things I’ve learned were matching lazy patterns. Matching a lazily evaluated variable with the keyword lazy forces the evaluation, so we can use a cleaner syntax, for example:

Another syntactic construct that helps with legibility is the record. The examples in Okazaki’s book use tuples for most of composed structs, but I prefer the more verbose and explicit records.

Finally, another lesson learned is that adding an insertion operation to the Stream API is likely wrong. The idea of the stream is that is avoids evaluation of its tail, so the whole block has to lazily evaluated.

For example, in the queue implementation, in the first block, we will evaluate all the rotation and then make the result lazy, which is not what we want.

Conclusion

Eliminating evaluation is a very interesting technique, but it has limited application is practice. It complicates the implementation and, as I learned, it can be tricky to spot bugs (for example, the one in which we’re evaluating the rotate() function) that would only show up if we profile the data structure.

Largest sets of subsequences in OCaml

I’ve noticed that there is this set of words in English that look very similar: tough, though, through, thought, thorough, through and trough. Except thought, they have one property in common: they’re all subsequence of thorough. It made me wonder if there are interesting sets of words that are subsequences of other words.

Word cloud made with Jason Davie’s tool (https://www.jasondavies.com/wordcloud/)

This post is an attempt to answer a more general question: given a list of words, what is the largest set of these words such that they’re subsequences of a given word?

A word A is a subsequence of a word B if A can be obtained by removing zero or more characters from B. For example, “ac” is a subsequence of “abc”, so is “bc” and even “abc”, but not “ba” nor “aa”.

A simple algorithm to determine if a word A is a subsequence of another is to start with 2 pointers at the beginning of each word, pA and pB. We move pB forward until pA and pB point to the same character. In that case we move pA forward. A is a subsequence of B if and only if we reach the end of A before B. We could then iterate over each word W and find all the words that are subsequences of W. If the size of the dictionary is n, and the size of the largest word is w, this algorithm would run in O(n^2 w).

For English words, we can use entries from /usr/share/dict/words. In this case, n is around 235k (wc -l /usr/share/dict/words), so a quadratic algorithm will take a while to run (around 5e10 operations).

Another approach is to generate all subsequences of words for a given word W and search the dictionary for the generated word. There are O(2^w) subsequences of a word of length w. If we use a hash table, we can then do it in O(n w 2^w). In /usr/share/dict/words, the length of the largest word, w, is 24.

Running a calculation with the numbers (R script), the number of high-level operations is 4e10, about the same order of magnitude as the quadratic algorithm.

Distribution using ggplot2

A third approach is to use a trie. This data structure allows us to store the words from the dictionary in a space-efficient way and we can search for all subsequences using this structure. The trie will have at most 2e6 characters (sum of characters of all words), less because of shared prefixes. Since any valid subsequence has to be a node in the trie, the cost of search for a given word cannot be more than the size of the trie t, so the complexity per word is O(\min(2^w, t)). A back of envelope calculation gives us 2e9. But we’re hoping that the size of the trie will be much less than 2e6.

Before implementing the algorithm, let’s define out trie data structure.

The Trie data structure

A trie is a tree where each node has up to |E| children, where |E| is the size of the alphabet in consideration. For this problem, we’ll use lower case ascii only, so it’s 26. The node has also a flag telling whether there’s a word ending at this node.

Notice that in this implementation of trie, the character is in the edge of the trie, not in the node. The Map structure from the stlib uses a tree underneath so get and set operations are O(log |E|).

The insertion is the core method of the structure. At a given node we have a string we want to insert. We look at the first character of the word. If a corresponding edge exists, we keep following down that path. If not, we first create a new node.

To decide whether a trie has a given string, we just need to traverse the trie until we either can’t find an edge to follow or after reaching the end node it doesn’t have the hasEntry flag set to true:

This and other trie methods are available on github.

The search algorithm

Given a word W, we can search for all its subsequences in a trie with the following recursive algorithm: given a trie and a string we perform two searches: 1) for all the subsequences that contain the first character of current string, in which case we “consume” the first character and follow the corresponding node and 2) for all the subsequences that do not contain the first character of the current string, in which case we “consume” the character but stay at the current node. In pseudo-code:

Search(t: TrieNode, w: string):
    Let c be the first character of w.
    Let wrest be w with the first character removed

    If t contains a word, it's a subsequence of the
    original word. Save it.

    // Pick character c
    Search(t->child[c], wrest)

    // Do not pick character c
    Search(t, wrest)

The implementation in OCaml is given below:

Experiments

Our experiment consists in loading the words from /usr/share/dict/words into a trie, and then, for each word in the dictionary, look for its subsequences. The full code is available on github.

The code takes 90 seconds to run on my laptop. Not too bad but I’m still exploring ways to improve the performance. One optimization I tried is to, instead of returning an explicit list of strings as mentioned in the search implementation, return them encoded in a trie, since we can save some operations due to shared prefixes. I have that version on github, but unfortunately that takes 240 seconds to run and requires more memory.

Another way is to parallelize the code. The search for subsequences is independent for each word, so it’s an embarrassingly parallel case. I haven’t tried this path yet.

The constructed trie has 8e5 nodes or ~40% of the size of sum of characters.

Subsequences of “thorough”

The question that inspired this analysis was finding all the subsequences of thorough. It turns out it has 44 subsequences, but most of them are boring, that is, single letter or small words that look completely unrelated to the original word. The most interesting ones are those that start with t and have at least three letters. I selected some of them here:

  • tho
  • thoo
  • thoro
  • thorough
  • thou
  • though
  • thro
  • throu
  • through
  • thug
  • tog
  • tou
  • toug
  • tough
  • trough
  • trug
  • tug

The word with most subsequences is pseudolamellibranchiate, 1088! The word cloud at the beginning of the post contains the 100 words with the largest number of subsequences. I tried to find interesting words among these, but they’re basically the largest words – large words have exponentially more combination of subsequences, and hence the chance of them existing in the dictionary is greater. I tried to come up with penalization for the score:

1) Divide the number of subsequences by the word’s length. This is not enough, the largest words still show on top.
2) Apply log2 to the number of subsequences and divide by the word’s length. In theory this should account for the exponential number of subsequences of a word. This turns out to be too much of a penalization and the smallest word fare too well in this scenario.

I plotted the distribution of number of subsequences by word lengths. We can see a polynomial curve but with increased variance:

Generated with this ggplot2

In the chart above, we’d see all points with the same x-value in a single vertical line. One neat visualization trick is to add noise (jitter) so we also get a sense of density.

If we use a box plot instead, we can see a quadratic pattern more clearly by looking at the median for each length.

Generated with this ggplot2

Given this result, I tried a final scoring penalization, by dividing the number of subsequences by the square of the length of the word, but it’s still not enough to surface too many interesting words. Among the top 25, streamlined is the most common word, and it has 208 subsequences.

One interesting fact is that the lowest scoring words are those with repeated patterns, for example: kivikivi, Mississippi, kinnikinnick, curucucu and deedeed. This is basically because we only count unique subsequences.

Conclusion

This was a fun problem to think about and even though it didn’t have very interesting findings, I learned more about OCaml and R. After having to deal with bugs, compilation and execution errors, I like OCaml more than before and I like R less than before.

R has too many ways of doing the same thing and the API is too lenient. That works well for the 80% of the cases which it supports, but finding what went wrong in the other 20% is a pain. OCaml on the other hand is very strict. It doesn’t even let you add an int and a float without an explicit conversion.

I learned an interesting syntax that allows to re-use the qualifier/namespace between several operations when chaining them, for example:

I also used the library Batteries for the first time. It has a nice extension for the rather sparse String module. It allows us to simply do open Batteries but that overrides a lot of the standard modules and that can be very confusing. I was scratching my head for a long time to figure out why the compiler couldn’t find the union() function in the Map module, even though I seemed to have the right version, until I realized it was being overridden by Batteries. From now on, I’ll only use the specific modules, such as BatString, so it’s easy to tell which method is coming from which module.

References

OCaml

  • [1] OCaml Tutorials > Map
  • [2] Strings – Batteries included
  • [3] Using batteries when compiling

R

  • [1] R Tutorial – Histogram
  • [2] Creating plots in R using ggplot2 – part 10: boxplots
  • [3] My R Cheat sheet

Amortization and Persistence via Lazy Evaluation

In this chapter Okasaki works around the problem of doing amortized analysis with persistent data structures because the amortized analysis assumes in place modification while for persistent data structures (partial) copies are made. The intuition is that lazy evaluation, which comes with memoization and avoids recomputation, solves this problem.

He adapts the Banker’s and Physicists’s methods to work with lazy evaluated operations and applies them to a few structures including Binomial Heaps, Queues and Lazy Pairing Heaps. In this post we’ll only cover the examples of the Queues using both methods.

We’ll first introduce some concepts and terminology, then we’ll present a queue implementation using lazy evaluation that allows us analyzing it under persistence. Following that we’ll explain the Banker’s and Physicist’s methods and prove that the implementation for push/pop has an efficient amortized cost.

Persistence as a DAG

An execution trace is a DAG where nodes represent the operations (e.g. updates to a data structure) and an edge from nodes v to v' indicates that the operation corresponding to v' uses the output of the one corresponding to v.

For example, if we have this set of operations:

let a = push 0 newEmpty
let b = push 1 a
let c = pop b
let d = push 2 b
let e = append c d
let f = pop c
let g = push 3 d 

The corresponding execution graph is:

Execution Graph

The logical history of an operation v is the set of all operations it depends on (directly or indirectly, and including itself). Equivalently, in terms of the DAG, it’s the set of nodes that have a directed path to v.

A logical future of an operation v is any directed path from v to a terminal node.

A framework for analyzing lazy evaluated data structures

We need to introduce a few more concepts and terminology. Note: we’ll use suspension and lazy operation interchangeably, which can be evaluated or forced.

The unshared cost of an operation is the time it would take to execute it if it had already been performed and memoized before, so if the operation involves any expression that is lazy, that expression would be O(1).

The shared cost of an operation is the time it would take it to execute (force) all the suspensions created (but not evaluated) by the operation.

The complete cost is the sum of the shared and unshared costs. Alternatively, the complete cost of an operation is the time it would take to execute the operation if lazy evaluation was replaced by strict. To see why, first we note that the unshared costs have to be paid regardless of laziness. Since we’re assuming no laziness, the operation has to pay the cost associated with the suspension it creates, which corresponds to the shared costs. Note that under this assumption we wouldn’t need to account for the cost of forcing suspensions created by previous operations because in theory they have already been evaluated.

When talking about a sequence of operations, we can break down the shared costs into two types: realized and unrealized costs. The realized costs are the shared costs from suspensions were actually forced by some operation in the sequence. Example: say that operations A and B are in the sequence and A creates a suspension, and then B forces it. The cost for B to force it is included in the realized cost. The unrealized costs are the shared costs for suspensions that were created but never evaluated within the sequence. The total actual cost of a sequence of operations is the sum of the realized costs and the unshared costs.

Throughout a set of operations, we keep track of the accumulated debt, which starts at 0 at the beginning of the sequence. Whenever an operation is performed, we add its shared cost to it. For each operation, we can decide how much of this debt we want to pay. When the debt of a suspension is paid off, we can force it. The amortized cost of an operation is its unshared cost plus the amount of debt it paid (note that it does not include the realized cost). Note that as long as we always pay the cost of a suspension before it’s forced, the amortized cost will be an upper bound on the actual cost.

This framework simplifies the analysis for the case when a suspension is used more than once by assuming that its debt was paid off within the logical history of when it was forced, so we can always analyze a sequence of operations and don’t worry about branching. This might cause the debt being paid multiple times, but it simplifies the analysis.

The author uses the term discharge debit as synonym of pay off debt. I find the latter term easier to grasp, so I’ll stick with it throughout this post.

Let’s introduce an example first and then proceed with the explanation of the Physicist’s method and the corresponding analysis of the example.

The Stream Queue

To allow efficient operations on a queue in the presence of persistence, we can make some of the operations lazy. Recall in a previous post we defined a queue using two lists. To avoid immediate computation, a natural replacement for lists is using its lazy version, the stream data structure, which we also talked about in a previous post.

For the list-based queue, the invariant was that if the front list is empty, then the rear list must be empty as well. For the stream queue, we have a tighter constraint: ‘front’ must be always greater or equal than ‘rear’. This constraint is necessary for the analysis.

The definition of the stream queue is the following:

We store the lengths of the streams explicitly for efficiency.

We’ll be using the Stream developed in the previous chapter, so we’ll refer to the module Stream2 to avoid ambiguity with the standard Stream module.

Inserting an element at the end of the queue is straightforward, since the rear stream represents the end of the queue and is reversed:

The problem is that inserting at rear can cause the invariant of the queue to be violated. check() changes the structure so to conform to the invariant by potentially reversing rear and concatenating with front:

Removing an element from the queue requires us to evaluate the first element of the front stream. Again, the invariant can be violated in this case so we need to invoke check() again:

The complete code for the stream queue is on Github.

Analysis using the Banker’s Method

The idea of the Banker’s Method is basically define an invariant for the accumulated debt and a strategy for paying it off (that is, decide how much debt each operation pays off). Then we show that whenever we need to force a suspension, the invariant guarantees that the accumulated debt has been paid off. One property of the Banker’s method is that it allows associating the debt to specific locations of the data structure. This is particularly interesting for streams, because it contains multiple (nested) suspensions, so we might force parts of this structure before we paid the debt associated with the entire structure.

By inspection, we can see that the unshared cost of both push and pop are O(1). It’s obvious in the case of push, and in the case of pop, in theory check could take O(m) where m is the size of the queue, but since Stream2.concat() and Stream2.reverse() are both lazy, and hence memoized, they are not included in the unshared costs.

To show that the amortized cost of both operations is O(1), we can show that paying off O(1) debt at each operation is enough to pay for the suspension before it is forced. For the queue, we also need to associate the debt with parts of the data structure, so that we could force the suspension of only some parts of it (for example, on the stream we can evaluate only the head, not necessarily the entire structure).

We now define an invariant that should be respected by all operations. Let d(i) be the debt at the i-th node on the front stream, and D(i) = \sum_{j=0}^{i} d(j) the accumulated debt up to node i. The invariant is:

D(i) \le \min(2i, |f| - |r|)

This constraint allows us evaluating the head at any time, because D(0) = 0, which means its debt has been paid off. The second term in min(), guarantees that if |f| = |r| the entire stream can be evaluated, because D(i) = 0 for all i.

The author then proves that by paying off one debt in push() and two debt units in pop() is enough to keep the debt under the constraint.

Queue with suspended lists

Because the Physicist’s method cannot assign costs to specific parts of the data structure, it doesn’t matter if the structure can be partially forced (like streams) or if it’s monolithic. With that in mind, we can come up with a simpler implementation of the queue by working with suspended lists instead of streams. Only the front list has to be suspended because the cost we want to avoid, the reversal of the back list and concatenation to the front list, happens on the front list.

On the other hand, we don’t want to evaluate the front list when we perform a peek or pop, so we keep a evaluated version of the front list too.

The signature of the structure is as follows:

As mentioned in the code above, the invariants we want to enforce is that the front list is never smaller than the rear list

and that the evaluated version of the front list is never empty if the lazy version still has some elements.

The push and pop operations are similar to the other versions of queue, but since we mutate the structure, we might need to adjust it to conform to the invariants:

Finally, because of our second invariant, peeking at the queue is straightforward:

The complete code for the suspended queue is on Github.

Analysis using the Physicist’s Method

We’ve seen the Physicist’s Method in a previous post when we’re ignore the persistence of the data structures. We adapt the method to work with debits instead of credits. To avoid confusion, we’ll use \Psi to represent the potential function. \Psi(i), represents the accumulated debt of the structure at step i. At each operation we may decide to pay off some debit, which will be then included in the amortized cost. We have that \Psi(i) - \Psi(i - 1) is the increase in debt after operation i. Remember that the shared cost of an operation corresponds to the increase in debt if we don’t pay any of the debt. Thus, we can find out how much debt was paid off then by s_i - \Psi(i) - \Psi(i - 1), where s(i) is the shared costs of operation i. Let u(i) and c(i) be the unshared and complete costs of the operation i. Given that, by definition, c(i) = u(i) + s(i), we can then express the amortized cost as:

a_i = cc_i - (\Psi(i) - \Psi(i - 1))

To analyze the suspended queue we need to assign values to the potentials such that by the time we need to evaluate a suspension the potential on the structure is 0 (that is, the debt has been paid off). For the suspended queues we’ll use the following potential function:

\Psi(q) = \min(2|w|, |f| - |r|)

Where w is the forcedFront, f is lazyFront and r is rear.

We now claim that the amortized cost of push is at most 2. If we push and element that doesn’t cause a rotation (i.e. doesn’t violate |f| \ge |r|), then |r| increases by 1, and the potential decreases by 1. No shared is incurred and the unshared cost, inserting an element at the beginning of rear is 1, hence the amortized cost for this case is 1 – (-1) = 2. If it does cause a rotation, then it must be that after the insertion |f| = m and |r| = m + 1. After the rotation we have |f| = 2*m + 1 and |r| = 0, but w hasn’t changed and cannot be larger than the original f, so the potential function is at most 2*m. The reversal of r costs m + 1 and concatenating to a list of size x costs x (discussed previously), plus the cost of initially appending an element to read, so the unshared cost is 2*m + 2. No suspensions were created, so the amortized cost is given by 2*m + 2 - (2*m) = 2.

Our next claim is that the amortized cost of pop is at most 4. Again, if pop doesn’t cause a rotation, |w| decreases by 1, so the potential is reduced by 2. The unshared cost is 1, removing an element from w, and the shared cost, 1 comes from the suspension that lazily removes the head of lazyFront. The amortized cost is 2 – (-2) = 4. Note that if w = 0, we’ll evaluate f, but the ideas is that it has been paid off already. If the pop operation causes a rotation, then the analysis is similar to the push case, except that the complete cost is must account for the shared cost of lazily reming the head of lazyFront, so it’s 2*m + 3, for an amortized cost of 3.

Note that when the suspensions are evaluated, the potential is 0, either when |f| = |r| or |w| = 0.

Conclusion

In this post we covered a simple data structure, the queue, and modified it to be lazy evaluated and can show, with theory, that it allows for efficient amortized costs. The math for proving the costs is not complicated. The hardest part for me to grasp is to get the intuition of how the analysis works. The analogy with debt is very useful.

Meta: Since wordpress code plugin doesn’t support syntax highlighting for OCaml, I’m experimenting with the Gist plugin. Other advantages is that Gists allow comments and has version control!

Amortization Analysis

In Chapter 5 of Purely Functional Data Structures, Okasaki introduces the amortization analysis as preparation for the complexity analysis of functional data structures.

The idea of amortized analysis is to prove the complexity of an operation when the average cost per operation is less than the worst case cost of each individual operation.

In this post we’ll perform analysis of 2 data structures and show that their amortized running time is much better than their worst case. There are 2 methods presented in the book, the Banker’s Method and the Physicist Method. I found the Physicist easier to work with, so I’ll only use that. Also, I find it easy to understand concepts with an examples, so we’ll first introduce the data structure and then introduce the analysis.

Queue with lists

An easy way to implement the queue data structure is by using two lists, one representing the beginning (left) and the other the end (right). Inserting/removing an element from the beginning of a list in OCaml is O(1), while doing it at the end is O(n). For this reason, we store the end of the queue reversed in the right list. (In imperative languages the opposite is true, insertions/removals at the end are O(1) and at the beginning O(n), so the left array that is reversed).

x

Queue using lists representing [1, 2, 3, 4, 5, 6]

Insertions are performed by appending an element to the beginning of the right array. Removing from the front of the queue requires removing the first element of the left array. When the left array is empty, we need to reverse the contents of the right array and move it to the left.

We work with an invariant, in which the left array is only empty if the right array is empty as well. That requires a slight change to the removal, where we perform the reversal and move of the right array when the left array has only one element and is about to become empty.

I’ve implemented this idea in Ocaml.

Analysis – The Physicist Method

If we only look at the worst case, we will get that removing from a queue is an O(n) operation. We can improve this by looking at the amortized cost. One way to perform this analysis is via the physicist method, in which we define a potential (or cost) function \Phi(d) for the data structure d. In the queue case, we can define it to be the size of the right list.

The amortized cost of step i, a_i is defined by

a_i = t_i + \Phi(d_i) - \Phi(d_{i-1})

Where t_i is the actual cost (in number of operations)

The intuition is that some costly operations such as reversing the list, which takes O(n) operations, can only happen after O(n) insertions happened, so the average cost per operation is O(1). Translating it to the potential analogy, each insersion operation increases the potential of the data structure by 1, so by the time a revert happens, the right list is emptied and that causes a loss of potential of n, which had been accumulated and that offsets the cost of the operation t_i.

We can get the accumulated amortized cost of the operations up to a specific step by summing them, and we’ll get

A_i = T_i + \Phi(d_i) - \Phi(d_0)

If we choose \Phi(d) such that \Phi(d_i) \ge \Phi(d_0), for all i, then A_i is an upper bound of T_i. Since we assume d_0 is the empty queue, \Phi(d_0) = 0 and we have \Phi(d_i) \ge 0, this property is satisfied.

One important remark is that this analysis is only applicable to non-persistent data structures, which is not the case in OCaml. We present the OCaml code as exercise, but the correct analysis will done in later chapters.

Splay Trees

Splay Tree is a binary search tree. The insertion of an element e consists in spliting the current tree in 2 subtrees, one of all elements less or equal than e, which we’ll call small and another will all elements larger than e, which we’ll call big. We then make e the root of tree and small the left subtree and big the right subtree.

Partition

Partition

The partition operation is O(height of tree). The intuition is, when looking for the small subtree, imagine we’re analysing a node x. If x >= e, then we know all elements in the left subtree are also >= e, so are all the elements in the right subtree, so we can discard it and only look at the left subtree. The same idea applies to looking for the big subtree.

In an ideal scenario, the tree will be well balanced in such a way that the height of the tree is O(log n), and so the insertion operation. But that’s not always the case, so we introduce a balancing strategy. Whenever we follow 2 lefts in a row when searching for the big subtree, we perform a right rotation. Similarly, when following 2 rights in a row when searching for the small subtree, we perform a left rotation.

These re-balancing are enough to make partition and also insertion an O(log n) amortized time, as we shall see in the next section.

Rotation

Rotation

The operation for finding the minimum element is trivial. To delete the minimum element, we also perform rotations. Since we only traverse the left paths, we only have to do a left rotation.

I’ve implemented these ideas in OCaml.

Amortized Analysis of Splay Trees

We first define a potential function. Given a tree t and a node x \in t, let t(x) be the subtree of t rooted in x. Then \phi(t(x)) = \log(|t(x)| + 1) be a potential function where |t(x)| the number of nodes of the subtree of which x is root, and let the potential of a tree t, \Phi(t), be the sum of the sub-potentials of all nodes of t, that is,

(1) \Phi(t) = \sum_{x \in t} \phi(t(x)).

The amortized equation of calling partition() is the actual cost of the partition, plus the potential of the resulting trees (small and big) minus the potential of the original tree. Let’s denote small(t) as s, and big(t) as b.

(2) \mathcal{A}(t) = \mathcal{T}(t) + \Phi(s) + \Phi(b) - \Phi(t)

We can prove that \mathcal{A}(t) \le 1 + 2 \log(|t|) (see appendix), which means the amortized cost of the partition operation is O(\log n). It implies that insertion also has an amortized cost of O(\log n).

The same idea can be used to prove the amortized costs of the deletion of the minimum element. Okasaki claims that splay trees are one of the fastest implementations for heaps when persistence is not needed.

Conclusion

I found amortized analysis very hard to grasp. My feeling is that the amortized analysis is too abstracted from the real problem, making it not very intuitive. It’s also not clear how one picks the potential function and whether there’s a choice of potential that might result in a better than the “actual” amortized cost.

Appendix

As we can see in code, there are four cases to consider: left-left, left-right, right-left and right-right. The left-* and right-* are symmetric, so let’s focus on the left-left and left-right cases. For the left-left case, suppose we perform the following partition:

Left-left case of partition()

Left-left case of partition()

We start with (2)

(3) \mathcal{A}(t) = \mathcal{T}(t) + \Phi(s) + \Phi(b) - \Phi(t)

Since = \mathcal{T} corresponds to the number of recursive calls to partition, \mathcal{T}(t) = 1 + \mathcal{T}(u), so

(4) = 1 + \mathcal{T}(u) + \Phi(s) + \Phi(b) - \Phi(t)

The recursive call to u yields (s’, b’), so we have the equivalence, \mathcal{A}(u) = \mathcal{T}(u) + \Phi(s') + \Phi(b') - \Phi(u),

(5) = 1 + \mathcal{A}(u) - \Phi(s') - \Phi(b') + \Phi(u) + \Phi(s) + \Phi(b) - \Phi(t)

Since, s = s’,

(6) = 1 + \mathcal{A}(u) - \Phi(b') + \Phi(u) + \Phi(b) - \Phi(t)

and \Phi(b') - \Phi(b) = \phi(b(x)) + \phi(b(y)) + \Phi(c) + \Phi(d), and \Phi(t) - \Phi(u) = \phi(t(x)) + \phi(t(y)) + \Phi(c) + \Phi(d), we can cancel out some terms and simplify the equation above to

(7) = 1 + \mathcal{A}(u) - \phi(b(x)) + \phi(b(y)) - \phi(t(x)) - \phi(t(y))

by induction, \mathcal{A}(u) \le 1 + 2 \phi(u),

(8) \le 2 + 2\phi(u) + \phi(b(x)) + \phi(b(y)) - \phi(t(x)) - \phi(t(y))

Since u is a subtree of t(y), \phi(u) < \phi(t(y)) and since the nodes in b is a subset of t, \phi(b(y)) \le \phi(t(x))

(9) < 2 + \phi(u) + \phi(b(x))

It’s possible to show that, given x, y, z such that y + z \le x, then 1 + \log y + \log z < 2 \log x. We have that |t| = |u| + |c| + |d| + 2, while |b(x)| = |c| + |d| + 1, thus, |u| + |b(x)| < |t| and by the relation above, 1 + \log (|u|) + \log (|b(x)|) + 1 < 2 \log (|t|) + 1, \phi(u) + \phi(b(x)) + 1 < 2 \phi(t), which yields

(10) \mathcal{A}(t) < 1 + 2 \phi(t).

It’s very subtle, but in this last step we used the assumption of the rotation, in particular in |b(x)| = |c| + |d| + 1, if we hadn’t performed the rotation, |b(x)| = |b| + |c| + |d| + 2, and |u| + |b(x)| < |t| wouldn’t hold!

For the left-right case, we can do a similar analysis, except that now we’ll apply the recursion to c, and the resulting partition will look different, as depicted below:

Left-right case of partition()

Left-right case of partition()

In this case, (8) will become

(11) \le 2 + 2\phi(c) + \phi(b(x)) + \phi(b(y)) - \phi(t(x)) - \phi(t(y))

We then use that \phi(c) < \phi(t(y)) and \phi(t(y)) < \phi(t(x)) this will allow us to cancel out some terms to

(12) < 2 + \phi(b(x)) + \phi(b(y))

Similar to what we've done before, we can show that |b(x)| + |b(y)| = |s| + |b| = |t| since the partitions of |t| are mutually exclusive. We can still use the result from the other case and get (10).

References

[1] Wikipedia – Amortized Analysis

Streams and Lazy Evaluation in OCaml

In this short post we’ll discuss lazy evaluation in OCaml and study a data structure called Stream. It’s based mainly on chapter 4 of Purely Functional Data Structures and it’s part of a series of study notes on that book.

Lazy Evaluation

Lazy evaluation is a property in which an expression is not evaluated immediately (suspended) and when it’s evaluated the first time, the subsequent calls are cached (memoized). Functional languages like Haskell are lazy evaluated, but not OCaml, which is eagerly evaluated. Because the results are memoized, expressions that are lazily evaluated must always return the same value given the same inputs. In Haskell it’s easy to enforce because functions are pure, that is, they do not rely on side effects.

Lazy?

Lazy? Source: Flickr – Brian Gratwicke

In the book the author defines a notation for lazy evaluation:

datatype a susp = $ of a

In OCaml, we can work with lazily evaluated expressions through the Lazy module. The definition of a suspension is similar:

type 'a t = 'a lazy_t

and we can use the lazy construct. Let’s define a simple expensive function, a naive Fibonacci, which runs at O(2^n) time:

let rec fibo n =
  if n <= 1 then 1
  else (fibo (n - 1)) + (fibo (n - 2))
;;

We can create a lazy evaluated version of it:

let lazy_fibo n = lazy (fibo n);;

We can see that by assigning it to a variable, it doesn’t cause the function to be executed:

let r = lazy_fibo 42;;

The author defines a matching operator ($) that causes a lazy expression to be evaluated, but I couldn’t find a corresponding operator in OCaml. Nevertheless, the Lazy module has the force() function, which does exactly that:

Lazy.force r;; // It might take a while!

Note that if we execute the same expression again, the second time it returns much faster, because of the memoization.

We are now ready to introduce the stream data structure.

Streams

A stream is a lazy version of a linked list. Recall that a linked list is composed of nodes which point to the next node, or equivalently, to the remaining of the list. The usual definition of a linked list is:

type 'a node = Nil | Node of 'a * 'a node

If we want to be explicit that a node is actually pointing to a sublist, we could use an intermediate type, listType:

type 'a node = Nil | Node of 'a * 'a list
and 'a list = 'a node

Note that node and list are mutually recursive (they depend on each other), so we have to define them together by using the and construct.

In a stream the pointer to the remaining of the list is lazily evaluated, so the type is:

type 'a streamCell = Nil | StreamCell of 'a * 'a stream
and
'a stream = ('a streamCell) Lazy.t

With this basic structure we can implement many of the list functions for streams.

Concatenation

Let’s start with the concat operator (++):

let rec (++) (streamA: 'a stream) (streamB: 'a stream): ('a stream) =
  let computedStreamA = Lazy.force streamA in
  match computedStreamA with
    | Nil -> streamB
    | StreamCell (elem, rest) -> lazy (StreamCell (elem, rest ++ streamB))
;;

Note that it never evaluates streamB and it only evaluates the first cell of streamA.

To help us testing, we can define function to convert from a list:

let rec fromList (l: 'a list): ('a stream) = match l with
  | [] -> lazy Nil
  | x :: xs -> lazy (StreamCell (x, fromList xs))
;;

and a function that forces the evaluation of the entire stream, essentially converting it back to a list:

let rec toList (stream: 'a stream): ('a list) =
  let computedStream = Lazy.force stream in
  match computedStream with
    | Nil -> []
    | StreamCell (elem, rest) -> elem :: (toList rest)
;;

Take

The take(n) function returns the first n elements from a stream. Like the concat function, only the first node of the stream is evaluated. The recursive call is suspended.

let rec take (n: int) (stream: 'a stream) : ('a stream) =
  if n == 0 then lazy Nil
  else
    let computedStream = Lazy.force stream in
    match computedStream with
      | Nil -> lazy Nil
      | StreamCell (elem, rest) -> lazy (StreamCell (elem, (take (n - 1) rest)))
;;

Drop

The drop(n) function removes the first n elements from a stream and returns the result. In this case, we need to evaluate all the n recursive calls:

let rec drop (n: int) (stream: 'a stream): ('a stream) =
  if n == 0 then stream
  else
    let computedStream = Lazy.force stream in
    match computedStream with
      | Nil -> lazy Nil
      | StreamCell (_, rest) -> drop (n - 1) rest
;;

take and drop look very similar but one is lazy while the other is not. That’s because the head of the stream is not suspended, but the tail is. In the drop case we need to find the (n+1)-th element that will be the new head of the stream. In the take case, we’re not changing the head, and since the tail is suspended, it can wait.

Reverse

The reverse function reverts the order the elements in a stream. In this case it’s more obvious that since we’re changing the location of the head, it must be eagerly evaluated.

let reverse (stream: 'a stream): ('a stream) =
  let rec reverse' = fun oldStream newStream ->
    let computedStream = Lazy.force oldStream in
    match computedStream with
      | Nil -> newStream
      | StreamCell (elem, rest) -> reverse' rest  (lazy (StreamCell (elem, newStream)))
  in reverse' stream (lazy Nil)
;;

Conclusion

In this post we saw that OCaml is not lazy evaluated but we can rely on the Lazy module to accomplish that. We also learned a new data structure, stream, which is recursively lazily evaluated and operations like concat and take play well with laziness, while other like drop and reverse do not.

The full implementation with comments is available on github.

References

[1] OCaml Module Lazy
[2] cyocum – Mutually Recursive Types
[3] Implementing lazy from scratch in OCaml

Persistent Data Structures

Book

In this post we’ll discuss some basic data structures in functional programming and their implementation in OCaml. Our main reference are the first two chapters of Purely Functional Data Structures by Chris Okasaki.

The first chapter explains persistent data structures and the second chapter provides examples of three common data structures in imperative programming and how to implement them in ML (a functional language).

Persistent Data Structures

The core idea behind designing data structures in functional languages is that they’re immutable, that is, if you want to modify it, a new copy has to be made.

If we want to modify a data structure in the naive way, we would need to clone the entire structure with the modifications. Instead, this is done in a smart way so different versions of the data structure share them. Two examples are given to illustrate this idea: lists and binary trees.

Immutable Linked Lists

Lists are a central native data structure in functional languages. In Ocaml, they’re implemented as linked list. From our data structure class, we know that inserting an element at the beginning of a list is an O(1) operation, but this requires modifying pointers in the data structure. As we said data structures are immutable, so we need to make a copy to perform this operation.

Note that we only need to make the new node point to the beginning of the original list, which means we don’t have to clone this list, we just reuse it.

Now, if we’re inserting in the middle of the list will require copying the existing nodes until we reach our newly inserted element, which we can then point to the remaining of the original list. Consider the example below, where we insert the element 4 in the second position of the list.

Inserting a new element (4) in a linked list

Inserting a new element (4) in a linked list

Note that inserting an element at the beginning of a linked list is much more efficient than at the end, both in terms of complexity and space. If we were to insert it at the end, we would require full copies and insertion would be O(length of the list). As an experiment, we can write a function that generates a list representing a range of integers in two ways.

In example 1, we create the list by inserting an element at the beginning of the array,

let rec list_range_fast start_range end_range =
  if start_range == end_range then []
  else start_range :: (list_range_fast (start_range + 1) end_range)
;;

In example 2 we do it at the end:

let rec list_range_slow start_range end_range =
  if start_range == end_range then []
  else (list_range_slow start_range (end_range - 1)) @ [end_range]
;;

If I run the slow version with start_range = 0 and end_range = 50000, it takes over a minute to run on my computer, while the fast version runs in a few milliseconds.

Immutable Binary Trees

Binary trees are a generalized version of a linked list and it works the same way. The key point to notice is that when inserting an element in a (binary) tree only the nodes in the path from the root to the inserted node need to have their pointers modified, so we need to clone a number of nodes that are at most the height of the tree. See the example below:

Inserting an element in a binary tree

Inserting an element in a binary tree. Green nodes are the new ones

With this basic concept understood for simple constructs like linked lists and binary trees, we can construct basic data structures on top of them: heaps and balanced binary trees.

Leftist Heaps

A (minimum) leftist heap is a binary tree with the following properties:

1. The value of a node is smaller or equal the value of its children.

Let the spine of a node be the path defined by following the right children until a leaf is found (i.e. the rightmost path), and the rank of a node the length of its spine.

2. The rank of the left child is always greater or equal than the right child.

It’s possible to show that the rank of a heap of n nodes is less or equal than O(floor(log n + 1)).

Insertion. Before talking about inserting an element in a heap, let’s define the more general merge() function, which merges two leftist heaps, A and B, into one. We compare the root of each heap, say A.val and B.val. If A.val <= B.val, we make A.val the root of the new heap, A.left the left of the new heap and the right of the new heap will be the merge of A.right and B. If A.val >= B.val we do an analogous operation.

Note that the result of the merge might violate property 2, since we’re adding a new node to the right subtree. This is easy to fix: we just to swap the left and right subtree when coming back from the recursion. This is what makeTree() does.

Note that we always perform the merges of the right, which means that the number of such merges will be proportional to the largest rank the original heaps, which implies we can do it in O(log(A.size + B.size)).

In the context of immutable structures, this implementation of heap is efficient because the only nodes whose pointers we need to mutate (even when a swap is needed) are on the spine of the trees, so it only needs to create O(log n) new nodes.

The insertion process consists in creating a heap with one element and merging into the target heap.

Returning the minimum element is trivial: we just need to return the element at the root. Removing the minimum is easy as well: just merge the left and right subtree using merge().

My implementation for the leftist heap is on github.

Binomial Heaps

Binomial heaps are an alternative heap implementation. Before defining a binomial heap, let’s introduce the binomial tree. A binomial tree is a can be defined recursively based on a property called rank. A single node is a binomial tree of rank 0. A tree of rank r > 0, is formed by combining two trees of rank r-1 making one tree the leftmost child of the other. We denote this operation as linking.

Binomial Heap

Examples of binomial trees of different ranks

A binomial heap is a list of binomial trees ordered by increasing rank, with no trees with the same rank.

Insertion. Before talking about insertion of an element into a heap, let’s see how to insert a binomial tree of rank r into the heap. To keep the order of the rank of the tree list, we need to traverse through it to find the right place to insert it. Since we can’t have two trees with the same rank, if we run into this case we need to merge those two trees into one with rank r + 1. This can cascade so we might need to repeat this process with the new tree.

Linking a tree is a direct application of its definition. We just need to decide which of the trees will become child of the other. Since we want a minimum heap, we need to guarantee that the minimum element is always on top, so we always make the tree with smallest root to be on top.

Linking is a constant time operation, since we only need to update the pointer of the root of the top tree, which also means we only need to clone the root node to generate a new immutable binomial tree.

For the complexity of traversing the heap list, we can first node that a heap of rank r has 2^r nodes. Which means that a heap of n elements has at most log(n) trees. In fact, a neat way to represent a binomial heap of size n is by its binary representation. For example, 10(dec) = 1010(bin), so for a heap of size 10, we should have a list of trees with ranks 1 and 3. This shows that we can traverse a list in O(log n) and in the worst case we might need to clone this many number of nodes.

Returning the minimum element requires us to scan the list of trees, because even though we know the minimum element is the root of a tree, we don’t know which tree it is, so this operation is O(log n). For removing this element, we can define an auxiliary function, removeMinTree(), which removes the tree with the minimum element from the tree list. Since we only want to remove the root of this tree, we need to re-insert the subtrees back to the heap.

One key observation is that, in a binomial tree of rank r, the children of the root are also binomial trees of ranks from 0 to r-1, which form a binomial heap. We can then define a merge() function that merges two heaps using an idea similar to a merge sort. If we refer back to the analogy of the binary representation of the heap size, the merge operation is analogous to adding two binary numbers!

My implementation for the binomial heap is on github.

Red Black Trees

A red-black tree is a binary seach tree in which every node is either Red or Black. It respects the following invariants:

1. No red node has a red child
2. Every path from the root to an empty node has the same number of black nodes

Property. the height of a red-black tree of n nodes is at most 2*floor(log(n + 1))

Proof sketch. If the tree has a path to an empty node of length greater than 2*floor(log(n + 1)), this must have more than floor(log(n + 1)) black nodes in the path because we cannot have two consecutive red nodes (invariant 1). Now, by removing all the red nodes, we must have a complete tree of height >= floor(log(n) + 1) + 1, which means 2^(floor(log(n + 1)) + 1) - 1 which is greater than n.

Membership. Since a Red-Black tree is a binary search tree, search can be done in O(height of the tree) which is O(log n) by the property above.

Insertion. inserting an element in a Red Black tree is similar to inserting it into a binary search tree. The challenge is that it may violate one of the invariants. To avoid that we must perform re-balancing along the path we follow to insert the node in the tree.

We always color the inserted node Red. This doesn’t violate the Black nodes constraint, but it might violate the Red nodes one, in case its parent is also Red. If that’s the grandparent of the inserted is necessarily Black. We now have 4 possible scenarios, depicted at the top, right, bottom and left trees:

Unbalanced Red-Black trees and the result of the balancing operation

Unbalanced Red-Black trees and the result of the balancing operation

We assume that the subtrees a, b, c and d are balanced. For all these 4 cases we can perform a “rotation” to achieve the tree at the center.

The Black nodes constraint is not violated, because the path from the root still has the same number of Black nodes as before, and we fixed the consecutive Reds. Notice that y might violate the Red node constraints with its parent, so we need to do it recursively.

In terms of complexity, the insertion can be done in O(log n) and rebalancing takes a constant amount of time. Regarding the immutable structure, notice we only need to change nodes from the insertion path, so only O(log n) nodes have to be cloned.

My implementation for the Red Black tree in Ocaml is on github.

Conclusion

The first two chapters from this book are really interesting. I have seen the binomial heap and Red-Black trees before in data structure classes and also implemented some data structures such as AVL trees in functional programming in the past (Lisp) during college.

I wasn’t aware of the immutability of data in functional programming until much later, when I was learning Haskell. Okasaki first introduces this concept in Chapter 2, so it allow us keeping that in mind when studying the implementation of functional data structures.

He doesn’t make it explicit on Chapter 3 that the data structures presented are efficient in terms of extra memory necessary to clone them, but they are easy to see.

The ML syntax is very similar to OCaml, but it was a good exercise implementing the code on my own. I tried making it more readable with comments and longer variable names. This also led me to learn a few OCaml constructs and libraries, including:

* How to perform assertions (See the binomial heap implementation)
* Unit testing (using OUnit2 library)
* The “or matching” pattern (see the balance() function in red_black_tree.ml)