CSS Layout

Introduction

In a previous post we studied some major components of the browser, including the rendering engine. In this post we’ll dig a bit further on the layout of the render tree and take a look at an important piece of this process: the cascading style sheets or CSS.

To recap, every DOM element is usually represented by one rectangle. The job of the rendering engine is to determine two properties of these rectangles: their size (height, width), position (top, left) and stacking order when they overlap. In this post we’ll see how different CSS properties can affect the resulting rectangle.

Note: we’ll use box and rectangle interchangeably in this post.

CSS

History

hakon

CSS stands for cascading style sheets. It was proposed by Håkon Wium Lie in 1994.

The CSS 1 specification was finished in 1996. CSS 2 was created to address some issues with the previous version in 1997. CSS 3 was started in 1998 but hasn’t been concluded yet! The thing is that CSS 3 is subdivided in modules and each is fairly independent of each other, which resulted in different modules having different phases.

The diagram below provides a good overview on the different modules from CSS and their stage:

css-modules-status

CSS Modules and levels

Wikipedia has an interesting history of the development of CSS, including the initial lack of compliance to the specification which caused a lot of headaches to front-end developers, especially when working with early versions of the Internet Explorer.

It’s interesting to take a look at the early days of CSS because it plays a big role in how it looks today (due to back-compatibility). It’s useful to remember that back in the days when CSS first came around, web pages were generally pure HTML containing mostly text, so a lot of CSS was designed around concepts such as paragraphs and simple images.

In this post we’ll focus on the CSS modules that affect the layout of the DOM elements, in particular the Visual formatting model. According to the spec, these are factors that influence the layout of a box:

* Box dimensions (height and width properties)
* Type of box (display property)
* Positional schemes (position, left and top properties)
* Relationship between elements (hierarchy of DOM tree)
* External information (e.g. window size)

We’ll briefly cover some of these properties and then play with a few examples.

The display property

There are many possible values for the display property. The main ones are none, block, inline and inline-block. I’ve been using flex increasingly but that deserves a post in itself (this article is a great reference).

Display none removes the element from the layout calculation so it’s effectively invisible.

A value of block causes the element to be visually formatted as a block [5] (other values like list-item and table do to). In general a block is a box that starts at a new line and takes the entire width of the parent.

An inline box on the other hand starts from the left of the previous box. It also ignores explicit width and height values and any explicit vertical spacing (i.e.
top/bottom of margin/padding).

The main difference between inline and inline-box is that the latter does account for width and height, and vertical spacing [6].

The position property

There are 5 possible values for the position property: static, relative, absolute, fixed, sticky.

static is the default positioning schema and it follows the normal positioning flow of the page.

A relative positioned element accounts for top and left properties. These are in relation to the parent.

An absolute positioned element is similar to a relative, except that it is removed from the normal layout flow (i.e. other elements ignore its existence when being positioned) and its top and left are in relation to the first positioned ancestor in the DOM tree (or the document if none is). A positioned element is any with position != static.

Here is an example where we only change the inner (red) div to static, relative and absolute, respectively.

Screen Shot 2018-01-13 at 4.25.01 PM

In the first example, A ignores the top/left properties. In the third example is “crosses” the boundary of the middle (blue) box because it’s not positioned.

Note that in the code above we have set overflow to auto. This is a hack to prevent margin-collapsing.

An element with position:fixed is similar to position:absolute, except that instead of having its offset relative to an ancestor with position:relative, usually it’s relative to the viewport. That implies that such element scrolls with the page. The special case happens when one of its ancestors has transform, perspective, or filter set to something other than none, in which case it behaves much like an absolute positioned element.

Finally, as described in [2] an element with position:sticky is treated as relatively positioned until it crosses a specified threshold, at which point it is treated as fixed until it reaches the boundary of its parent.

The float property

When a element is floated, it is taken out of the normal layout flow of the document. It is shifted to the left (assuming a float:left) until it touches the edge of its containing box, or another floated element.

Because float implies the use of the block layout, it modifies the computed value of most of the display values to block.

Floats are better understood in relationship with other types of boxes. Let’s check some examples.

Float + Block

Whether the renderer accounts for a floated element when positioning a new block depends on the block formatting context. For example:

Screen Shot 2018-01-12 at 10.20.07 AM

The green block ignores the presence of the blue block but its child (yellow) does not. That’s because blue and yellow are in different block formatting contexts.

Float + Inline

Analogous to a block context, an inline formatting context is a set of inline elements. In such context, the boxes are laid out horizontally. Each “row” of inline elements is called a line box. In the presence of floats, the spec states the following:

In general, the left edge of a line box touches the left edge of its containing block and the right edge touches the right edge of its containing block. However, floating boxes may come between the containing block edge and the line box edge. Thus, although line boxes in the same inline formatting context generally have the same width (that of the containing block), they may vary in width if available horizontal space is reduced due to floats.

We can see an example of that in here. In this example the first three line boxes have a shorter width than the fourth one due to the presence of the float element.

Screen Shot 2018-01-19 at 11.20.05 PM

Clear

When applied to a floating element, clear moves the margin edge of the element below the margin edge of all relevant floats (depending on whether it’s clear left, right or both). Here’s an example:

Screen Shot 2018-01-15 at 7.49.52 PM

The z-index property

Besides determining the size and position of boxes, the layout engine needs to determine how to handle overlapping. The boxes ordering is transitive, meaning that if a box A is under B, and B is under C, A has to be under C.

The main attribute to control the stack order of elements is the z-index property (in reference of the z-axis, commonly used as the “depth” dimension in 3D). But this number only applies for boxes under the same class. As we’ll see now, it’s more complicated than it seems.

First, we need to define the concept of stacking context. A stacking context encompasses a set of DOM elements which can be compared to each other. The order of the stack context always take precedence over individual orders of elements within a stack context. For example, imagine that we have 2 stacking contexts:

Stack context 1: [A -> B -> C -> D]
Stack context 2: [E -> F -> G]
Stack context 1 -> Stack context 2

The arrow (->) represents that the element on the left is on top of the element on the right. Because A, B, C and D belong to context 1, all of them are placed over E, F and G, no matter how big the z-index of elements in the second group are.

Another way to see it is that a stacking context defines an “atomic position”, meaning that elements from outside it cannot be placed in between its elements. It has to be either above or below.

In the DOM tree, if a node E satisfies some conditions, it starts a new stack context, which means that all elements in the DOM subtree of that element will be under that stack context (we say E is the root of such stacking context). Note that a stack context can contain other stack contexts. A few properties that cause a stacking context to be formed are:

* position: absolute or relative and z-index other than auto
* position: fixed or sticky
* opacity less than 1.
* others

Note that position: static ignores z-indexes, so a corollary is that the use of z-index effectively creates a new stacking context.

Within a stacking context, the order of elements is defined by the spec. Here’s a simplified version, from bottom to top:

* The background and borders of the element forming the stacking context;
* Stacking contexts with negative z-indexes;
* Non-positioned block boxes;
* Non-positioned floats;
* Non-positioned inlined boxes;
* Stacking contexts with positive z-indexes;

Another corollary is that a parent is always rendered below its children due to the first rule above.

Here is an example with elements of each of these categories.

Screen Shot 2018-01-18 at 8.51.44 PM


Here is an interesting example with stacking contexts.

Screen Shot 2018-01-19 at 8.47.39 PM

The red box is the parent, so it has to be under the green box. The green, purple and black boxes are in the same stacking-context so they are ordered based on the z-indexes (since red doesn’t start a stack context, green belongs to the top-level stack context). Finally, note how pink has the highest z-index, but is still placed under green because it belongs to the stack context of purple, which is placed under green.

References

[1] MDN Web Docs: CSS display
[2] MDN Web Docs: CSS Position
[3] MDN Web Docs: CSS Float
[4] A Complete Guide to Flexbox
[5] MDN Web Docs: Block formatting context
[6] StackOverflow: display: inline vs inline-block
[7] All about floats
[8] CSS 2.1 Specification: Visual formatting model

Conclusion

My main goal with this study was to learn more about the layout algorithm used by render engines but this turned out to be an analysis of some CSS properties. It was an interesting exercise anyway. I work with CSS on a daily basis but often times I don’t take the time to read the spec or understanding why a given set of properties behave in a certain way.

Advertisements

2017 in Review

This is a meta-post to review what happened in 2017.

I focused on finishing reading Purely Functional Data Structures, by Chris Okasaki, which was one of my goals from last year, so it’s no surprise that more than half of my posts were notes on it. In the process, I learned more about OCaml’s features. Speaking of OCaml, I used it to answer a question on subsequences.

I also completed a course on Scala, which was another of my 2017’s resolutions.

I finally took some time to play with my raspberry Pi, to develop a web server and a simple monitoring system.

Other things included learning more about how JavaScript works, about Google’s Spanner database. I’ve also attended a conference, the OpenVis Conf.

Personal

The end of the year is a good time to look back and remember all the things I’ve done besides work and the technical blog.

Trips

As in 2016, 2017 was very busy with trips.

I visited quite a few cities in the Eastern US this year. This included Washington DC, Baltimore, Boston and Chicago. I must say they have better museums than the West coast. My favorite ones were the Smithsonian and the Boston MFA.

cities2

Top: Lincoln Memorial in Washington DC, Downtown Boston. Bottom: Marina City Towers in Chicago, MIT in Cambridge.

I love National Parks and was glad to be able to visit a few of them. This included trips to Sequoia and Kings Canyon National Parks, here in California. In Seattle, I made a quick trip to the Olympic National Park. And more recently, Carlsbad Caverns and Big Bend (New Mexico and Texas).

nparks

Top: Trail in Kings Canyon National Park; Cape Flattery, near Olympic National Park.
Bottom: Santa Elena Canyon in Big Bend National Park; White Sands National Monument

For a long time I wished to visit Japan and I was finally able to make it. I spent 2 weeks there, exploring the main cities like Tokyo, Kyoto, Nara, Osaka, Komatsu, Hiroshima, Himeji.

japan

Top: 1. Tokyo, 2. Temple in Nara, 3. Fushimi Inari-taisha.
Bottom: 4. Itsukushima Shrine, 5. Himeji Castle, 6. Mount Hakusan National Park

Books

I read a lot of fiction this year.

War and Peace took me a long time to get through, but it was a very rewarding read. It’s a mix between history, philosophy and fiction, set during the 1812 invasion or Russia by Napoleon. Slaughterhouse-Five is also about war, the WWII, with a dark humor take.

If on a winter’s night a traveler by Italo Calvino is sort of a meta-book, full of self-references. Pretty interesting read.

Hard-Boiled Wonderland and the End of the World, Strange Weather in Tokyo and Snow Country all have their background in Japan. I enjoyed the idea of reading books related to a country before visiting it.

I’m usually not into science-fiction but I enjoyed reading Red Mars.

books

For non-fiction, Survival of the Sickest was my favorite science book of the year. Homo Deus was also very interesting. It’s from the same author of Homo Sapiens, Yuval Harari. Even though it’s about the future of humanity, I liked his discussions of the present better – especially about humanism.

The Blog in 2017

Like last year, the most popular post was the Introduction to the Parsec Library, with 1.6k visits. From this year, the notes on Google’s Spanner database paper got 250 views. Overall it had 9.5k visitors.

I kept the resolution to post at least once a month. The blog completed 5 years with 66 posts.

Resolutions for 2018

For this year, I plan to learn Rust. I haven’t decided which technical book to read yet. I also want to learn mobile development and read more papers.

Generalized Tries in OCaml

In Chapter 10 of Purely Functional Data Structures, Okasaki describes a generalized trie. In this post we’ll see how tries can be seen as maps over a specific type of keys and how we can construct maps over any type of keys.

We discussed tries in OCaml in a previous post. At first glance, it looks like a special data structure but we’ll now see it’s basically a map.

The basic form of a map is a pair of (key, value). In most programming languages the key has to be a scalar (e.g. int, string) or some complex that can be converted to one (think of hashCode() in Java).

For simplicity, let’s assume our scalar is an int and our value is of generic type T. Also, note that the syntax we present below is not valid OCaml (I find OCaml’s type syntax a bit hard to read). A map with int keys can be represented as:

map<int, T>

If we want a two dimensional key, for example (int, int), we can have a map of maps:

map<int, map<int, T>>

and if the dimensions are not fixed, for example, a list of integers, then we can define a recursive structure for our map

map<list<int>, T> = 
  Node of (option<T> * map<int, mapOverList>)

If we think of strings as lists of characters, a trie is then a map where the key is a list of characters. From the type above we can simply change the key of our map above to a list of characters and for a basic trie T is a boolean

map<string, bool> = 
  Node of (bool * map<char, mapOverList>)

Note that option<bool> is redundant, so we can use bool only.

Maps over trees

tree2

We can now generalize the key to more complex recursive types such as trees. Suppose we fave the following type for tree:

tree<int> = (int * tree<int> * tree<int>)

The outermost map is indexed by the root of the tree. The inner maps are indexed by the left and right subtrees respectively:

map<tree<int>, T> =
  map<int, map<tree<int>, map<tree<int>, T>>>

If we expand the key of the second map we get the following:

map<tree<int>, map<tree<int>, T>> =
  map<int, map<tree<int>, map<tree<int>, map<tree<int>, T>>>

It gets pretty involved very quickly but because we traverse these types recursively, the implementation is still simple. Let’s see how to implement these in OCaml. The type of a map over an int tree can be defined as follows:

Note that the outermost map is a regular IntMap, which uses the root element of the map, a scalar, as key.

The search function takes a tree representing the key, and the map. The base case is when the tree is empty, when we’re just past a leaf. The recursive case consists in obtaining the maps in order, first using the root element, then using the left tree and finally using the right tree:

Note that the recursive call is non-uniform, so we need explicit annotations, as we discussed previously.

The insertion is similar, expect that when we fail to find a key in any of the maps, we need to first insert it with an empty element before recursing.

Because the Map.find() implementation throws exceptions when a key doesn’t exist, we can wrap the call with a try and if an exception occurs, we can insert the missing key (alternatively we could use Map.mem()).

Maps over products and sums

There are two ways a structure can branch out recursively: through combination or through alternative. For a combination, refer to our definition of Node:

Node of 'a * 'a tree * 'a tree

In theory, any combination of values for 'a, 'a tree, 'a tree are possible values for a Node. The * in between the components in fact represent the cartesian product operator. This is also known as product.

For alternative, the type of the tree itself can be either Empty or Node:

type 'a tree = Empty | Node of (...)

In this case, the valid values for a tree is the sum of values of each alternative. Hence, this is also known as sum.

We can generalize the map with keys of any types by looking at their definition. If it’s a product, we end up with nested maps. For example, if

Tk = (Tk1 * Tk2)

then the map over Tk can be defined as

map<Tk, T> = map<Tk1, map<Tk2, T>>

In our example, this came the nested maps 'a trie trie IntMap.t.

For sums, if the type is

Tk = Tk1 | Tk2

the map over Tk would end up as a product of maps:

map<Tk, T> = (map<Tk1, T> * map<Tk2, T>)

In our example, this came the product ('a option) and ('a trie trie IntMap.t). option can be thought as a one-dimensional map.

Conclusion

In this post we saw how a trie can be modeled as a map using the same principles as the ones we use to construct matrices, that is, two-dimensional maps (nested maps). We then generalized the string keys to trees, and implemented the insert/find functions in OCaml. I found it pretty hard to reason about these structures.

We then went a step further and saw how to construct maps based on the key structure. And we learned about product and sum of types when discussing recursive types.

Mutually Recursive Modules in OCaml

In Chapter 10 of Purely Functional Data Structures, Okasaki describes a technique called data structure bootstrapping. It’s a way to reuse existing implementation of data structures to construct (bootstrap) new ones.

In one of the examples he creates a new heap implementation with an efficient merge operation using another heap as basis, but it turns out that to implement this, we need to rely on mutually recursive modules, that is, two modules A and B, where A depends on B, and B depends on A.

In this post we’ll study the bootstrapped heap and learn how to implement mutually recursive modules in OCaml.

Heap with efficient merging

Assume we have a heap implementation with O(1) insert, and O(log n) merge, findMin and deleteMin operations. We’ve seen such an implementation with Skewed Binomial Heaps

We’ll see how to construct a new heap implementation which will improve the merge complexity to O(1).

Let’s call the base heap PrimaryHeap and define our heap type as

this type can be either empty or a node with an element (root) and a primary heap whose element is the bootstrapped heap itself, that is, heap and PrimaryHeap.heap form a mutually recursive types. Note that the above is not a valid OCaml code. We’re using it to explain the theoretical concepts.

We can think of this as a k-ary tree where the element is the root and the children of that node are the subtrees, but these subtrees are stored in a heap instead of an array.

The root element at each node is the smallest among all of the subtrees. Hence, to obtain the minimum element for a heap, findMin, is trivial: we can simply return that element:

Merging two bootstrapped heaps is analogous to linking two trees. The only invariant we need to maintain is that the smallest root continues being the root.

Since the primary heap has O(1) insert, the bootstrapped heap has O(1) merge, which was our goal. Note that we can implement insert using merge by creating a singleton node and merging it with an existing heap.

We need to handle the deletion of the minimum element, which is the more involved operation. It consists in discarding the root of the present node, and finding a new root from the primary heap.

Since each element in the primary heap is a bootstrapped heap, we first obtain the bootstrapped heap containing the smallest element:

then we remove this node from the primaryHeap, and we merge the minPrimaryHeap back into primaryHeap.

finally we make newMinElem the new root element of our top level bootstrapped heap. The complete code is

The only missing part is defining the correct type of the bootstrapped heap.

Mutually Recursive Modules

escher

Drawing Hands by M. C. Escher

Okasaki mentions that recursive structures are not supported in Standard ML (at least at the time my copy of the book was printed), but they are supported in OCaml.

To make modules mutually depend on another, we need to mark it as recursive via the rec keyword, and declaring both modules at the same time by using the and connector. Let’s work with a toy example: two modules Even and Odd, where each depend on the other.

This will lead to a compilation error:

Error: Recursive modules require an explicit module type.

We need to write the signatures explicitly:

This blog post from Jane Street describes a way to define mutually recursive modules by only listing its signatures:

The OCaml compiler can infer the implementation part from the type definitions, but unfortunately this won’t work if the module has function definitions, which is the case for our heap implementation.

Things get more complicated in our case because the primary heap implementation uses a functor to generate a heap taking the element’s module as parameter. In this case the element’s module is our bootstrapped heap. A valid module definition is given below:

Let’s understand what is going on.

type t = Empty | Heap of Element.t * PrimaryHeap.heap

is the definition we presented above. We also implement the methods from the Set.OrderedType interface, namely compare, since this is the interface the heap maker expects. The comparison is based solely on the root element.

Then we declare the PrimaryHeap type at the same time, with type IHeapWithMerge, and because tv is unbound in that interface, we need to bind it to BootstrappedElement.t:

PrimaryHeap: IHeapWithMerge with type tv := BootstrappedElement.t

Finally we provide the implementation, using the result of the SkewBinomialHeap() functor having the BootstrappedElement module as element type:

PrimaryHeap (...) = SkewBinomialHeap(BootstrappedElement)

The syntax is pretty involved, but it accomplishes what we wanted. We can further refine this definition by adding

include Set.OrderedType with type t := t

to the BootstrappedElement signature. This includes all the interface of Set.OrderedType.

These newly defined modules are defined within a functor, the BootstrappedHeap, together with the methods we defined above. Like other heap generators, the functor takes a module representing the element type as parameter. In this case we can also allow the primary heap type to be passed as parameter so we don’t have to use SkewBinomialHeap as implementation. Any heap with merge will do.

The constructors define within BootstrappedElement are visible within BootstrappedHeap but they need qualification, such as BootstrappedElement.Heap. To avoid repeating this qualifier, we can use:

include BootstrappedElement

The complete implementation for BootstrappedHeap can be found on github.

Conclusion

The idea of using implementations of a given data structure to yield improve implementations is amazing! The mutual recursion nature of the bootstrap heap got me at first, but making analogies with a k-ary tree made it easier to understand.

I was struggling a lot to get the syntax right for the recursive modules required for this implementation until I stumbled upon this github repository, from which I learned many new things about OCaml.

References

[1] Purely Function Data Structures, Chapter 10 – Chris Okasaki
[2] Jane Street Tech Blog – A trick: recursive modules from recursive signatures
[3] Github yuga/readpfds: bootStrappedHeap.ml

Polymorphic Recursion in OCaml

In Chapter 10 of Purely Functional Data Structures, Okasaki describes recursive types that are non-uniform. In this post we’ll learn more about these types, how to implement them in OCaml and see an example by studying the implementation of Random Access Binary Lists using such a construct.

Uniform recursive type

As an example of uniform recursive data structure we have a simple list

Cons(1, Cons(2, Cons(3, Nil)))

Which has a recursive type, for example

type 'a myList = Nil | Cons of 'a * 'a myList

Each element of the list is either Nil (terminal) or it has a value of a polymorphic type 'a, followed a recursive list also of type 'a.

Non-uniform recursive type

Now, say that the type of the recursive list is not the same as the current list? Then we have a non-uniform polymorphic recursive type, for example:

type 'a seq = Nil | Cons of 'a * ('a * 'a) seq

We’ll name this a sequence. A int seq would have the value in the first node would have type int, but the element from the second node would have type (int, int), the third type ((int, int), (int, int)) and so on. This structure is equivalent to a complete binary tree, where the i-th element of seq represents the i-th level of the tree.

An example of value with this type is:

Cons (1, Cons ((2, 3), Cons (((4, 5), (6, 7)), Nil)))

We need a special syntax to type recursive functions that take recursive non-uniform types, because the type of the function called recursively might be a different polymorphic type than the caller. OCaml by default tries to infer the generic types of the function and bind them to specific instances [2]. For example, in

let f: 'a list -> 'a list = fun x -> 13 :: x

OCaml will assume 'a is int and will compile fine. We can see this by pasting that code in the command line, utop.

utop # let f: 'a list -> 'a list = fun x -> 13 :: x;;
val f : int list -> int list =

The function will then not be polymorphic anymore. To prevent OCaml from auto-binding specific type instances, we can use a special syntax introduced in OCaml 3.12 [3]

utop # let f3: 'a. 'a list -> 'a list = fun x -> 13 :: x;;

This time we’ll get a compilation error:

Error: This definition has type int list -> int list which is less general than ‘a. ‘a list -> ‘a list

The important thing is that this allow us binding the recursive calls with different types. According to the Jane Street Tech Blog [3]:

Note that a polymorphic type annotation holds inside the body of a recursive definition as well as outside, allowing what is known as polymorphic recursion, where a recursive call is made at some non-trivial instantiation of the polymorphic type.

So for example, we can write this function to calculate the size of a sequence:

The problem with this structure is that it can only represent lists of size in the form of 2^k - 1. To work around that, we allow some items to not hold any elements at all, so that each item corresponds to a digit in the binary representation of the size of the list.

type 'a seq = Nil | Zero of ('a * 'a) seq | One of 'a * ('a * 'a) seq

For example, we can now represent a list of 10 elements, as

Zero(One((1, 2), Zero(One(((3, 4), (5, 6)), ((7, 8),(9, 10))), Nil))))

mandelbrot

Sequence binary random access list

We can use a sequence to implement a random access binary access list in a concise way.

Insertion

Inserting an element at the beginning is analogous to incrementing the binary number, that is, starting from the least significant digit, if it’s a zero, we make it one, if it’s one we make it a 0, and carry over a 1, by adding it to the next digit.

The carry over process is simple in this structure. Because the type of an item following an item of type 'a is ('a, 'a), to merge the element to be added with the existing element, we simply make a tuple and pass it to the recursive call.

Head and Tail

Removing or retrieving the first element is analogous to decrementing a binary number. If the digit is one, we make it zero and return the element and the resulting list. If it’s a zero, we make a recursive call to get the next available element. However since the returned element is of type ('a, 'a), and our return type is 'a, we only use the first value of the pair.

Implementing head and tail using popAux is now trivial

Lookup

Finding an element can be done by transforming the problem into smaller instances.

It helps to look at some simple examples. Let’s consider 3 cases.

Case 1. If we had a single element, we either return it if the index is 0, or throw if it’s larger than that.

0: (0)

Case 2. If we have 2 elements,

0: ()
1: (0, 1)

Notice that when we go from the first level to the second, all items “doubled” in size, so we can “transform” this to the single element case by treating pairs as a single element, but since the index has individual elements granularity, we need to transform it by halving it. We reduced it to Case 1.

If our initial index was either 0 or 1, it’s now 0, and we found our element in level 1.


1: (0)

The problem is that we need to return a single element at level 0, not a pair, so we need to undo the transformation. We can use the parity of the original index will to decide which side of the pair to return. If it’s even, we return the first element, otherwise the second one.

Case 3. If we have 3 elements,
0: (0)
1: (1, 2)

and our index is larger than 0, we move to the next level but we need to account for the level we’re skipping, so the transformation of index would look like:
0: ()
1: (0)

which is reduced to Case 2.

These 3 cases can be used to find elements larger than 3. For example, say we have 10 elements and want to find the element at position 6:

0: ()
1: (0, 1)
2: ()
3: (((2, 3), (4, 5)), ((6, 7), (8, 9)))

Step 1. This is Case 2. We need to transform this by treating pairs as elements and halving the index:

1': (0)
2': ()
3': ((1, 2), (3, 4))

Note how this reduced the problem of finding the element at position 3 of a list with size 5. Step 2. We now are in case 3, where we need to skip the current element:

1': ()
2': ()
3': (((0), (1)), ((2), (3)))

Our index is now 2. Step 3. we go one level down

2: ()
3: (0, 1)

With an index of 1. Step 4. Finally, we halve it once again and we finally found the right level that contains our index.

3: (0)

We now need to recurse back to find exactly which element on that level to pick. On Step 4, we can see our index 1 was on the right side of the pair in level 3, so we pick the right side, that is, ((6, 7), (8, 9)).

On Step 3, our index 2 was on the left size of the innermost pair, that is (6, 7). On Step 2, we skipped the current element but didn’t change levels, so there’s no need to choose an element from the pair. Finally, on Step 1, the index 6 was on the left side of the innermost pair, which should return the element with a corresponding index 6.

In general, we can tell which side of the innermost pair to pick by observing that the indexes are ordered from left to right in a given level. And because every level has an even number of elements, we can assume that the first index in the level – or the first element in the first pair – is even. Hence the parity of the index is sufficient to determine which side of the pair to pick.

With this algorithm in mind, the lookup function is quite compact:

The update follows a similar idea as the lookup, but the problem is that we need to return the updated level when returning from the recursion. That is, we need to update the level before returning.

To accomplish that, we can pass a callback, the updater, that encodes which pair we would pick at each level. We start with a function that simply returns the element to be updated

(fun _ -> element)

Then, at each level we create a new updater, which applies the previous updater on the left or right side of the pair, depending on the parity of the index:

When we finally find the level that has our index, we can apply the function, which has the effect of “narrowing” down the elements from a given level to a single element, replacing the value at the target index and then returning the updated elements when returning from the recursion.

After applying the updater, we return the updated level recursively.

Structural Decomposition

Okasaki introduces this implementation in the context of Structural Decomposition, which is a technique for creating data structures from incomplete ones. In this example, the raw implementation of the sequence can only represent lists of size 2^k, but modeling each node in the sequence to be zero or one, zero not having any elements, we can work around the size restriction.

Conclusion

The implementation of random access binary access list using sequences is very neat, but very hard to understand.

One of the major selling points of shorter code is that it tends to contain less bugs and also less corner cases. On the other hand, if the code is also harder to read and understand, it might be harder to spot bugs.

This post helped me understand a bit more about OCaml’s type system. Digging a little also led me to interesting topics such as Parametric Polymorphism [4] and Existential vs. Universally quantified types [5].

References

[1] Purely Function Data Structures – Chris Okasaki
[2] Jane Street – Ensuring that a function is polymorphic
[3] Ensuring that a function is polymorphic in Ocaml 3.12
[4] Wikipedia – Parametric polymorphism
[5] Existential vs. Universally quantified types in Haskell

Numerical Representations as inspiration for Data Structures

Book

In this chapter Okasaki describes a technique for developing efficient data structures through analogies with numerical representations, in particular the binary and its variations.

We’ve seen this pattern arise with Binomial Heaps in the past. Here the author presents the technique in its general form and applies it to another data structure, binary random access lists.

Binary Random Access Lists

These lists allows efficient insertion at/removal from the beginning, and also access and update at a particular index.

The simple version of this structure consists in distributing the elements in complete binary leaf trees. A complete binary leaf tree (CBLF) is one that only stores elements only at the leaves, so a tree with height i, has 2^(i+1)-1 nodes, but only 2^i elements.

Consider an array of size n, and let Bn be the binary representation of n. If the i-th digit of Bn is 1, then we have a tree containing 2^i leaves. We then distribute the elements into these trees, starting with the least significant digit (i.e. the smallest tree) and traversing the tree in
pre-order.

For example, an array of elements (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) has 11 elements, which is 1011 in binary. So we have one tree with a single leave (1), a tree with 2 leaves (2, 3) and another containing 8 leaves (4, 5, 6, 7, 8, 9, 10, 11).

We use a dense representation of the binary number, by having explicit elements for the 0 digit. Here’s a possible signature for this implementation:

Inserting a new element consists in adding a new tree with a single leave. If a tree already exists for a given size, they have to be merged into a tree of the next index. Merging two CBLFs of the same size is straightforward. We just need to make them children of a new root. Since elements are stored in pre-order, the tree being inserted or coming from carry over should be the left child.

Looping back to our example, if we want to insert the element 100, we first insert a tree with a single leaf (100). Since the least significant digit already has an element, we need to merge them into a new tree containing (100, 1) and try to insert at the next position. A conflict will arise with (2, 3), so we again merge them into (100, 1, 2, 3) and try the next position. We finally succeed in inserting at position 2, for a new list containing trees like (100, 1, 2, 3) and (4, 5, 6, 7, 8, 9, 10, 11).

The complexity of inserting an element is O(log n) in the worst case which requires merging tree for all digits (e.g. if Bn = 111...111). Merging two trees is O(1).

Removing the first element is analogous to decrementing the number, borrowing from the next digit if the current digit is 0.

Searching for an index consists in first finding the tree containing the index and then searching within the tree. More specifically, because the elements are sorted beginning from the smallest tree to the largest, we can find the right tree just by inspecting the number of elements in each tree until we find the one whose range includes the desired index. Within a tree, elements are stored in pre-order, so we can find the right index in O(height) of the tree.

After finding the right index, returning the element at that index is trivial. Updating the element at a given index requires rebuilding the tree when returning from the recursive calls.

Okasaki then proposes a few different numbering systems that allow to perform insertion/removal in O(1) time. Here we’ll only discuss the less obvious but more elegant one, using skew binary numbers.

characters

Skew Binary Random Access Lists

A skew binary number representation supports the digits 0, 1 and 2.

The weight of the i-th digit is 2^(i+1) - 1. In its canonical form, it only allows the least significant non-zero digit to be 2.

Examples:

table-v2

Decimal and Skewed Binary

It’s possible to show this number system offers a unique representation for decimal numbers. See the Appendix for a sketch of the proof and an algorithm for converting decimals to skewed binary numbers.

Incrementing a number follows these rules:

  • If there’s a digit 2 in the number, turn it into 0 and increment the next digit. By definition that is either 0 or 1, so we can safely increment it without having to continue carrying over.
  • Otherwise the least significant digit is either 0 or 1, and it can be incremented without carry overs.

The advantage of this number system is that increments (similarly, decrements) never carry over more than once so the complexity O(1), as opposed to the worst-case O(log n) for regular binary numbers.

A skew binary random access list can be implemented using this idea. We use a sparse representation (that is, not including 0s). Each digit one with position i corresponds to a tree with (2^(i+1) - 1) elements, in this case a complete binary tree with height i+1. A digit 2 is represented by two consecutive trees
with same weight.

Adding a new element to the beginning of the list is analogous to incrementing the number, which we saw can be done in O(1). Converting a digit 0 to 1 or 1 to 2, is a matter of prepending a tree to a list. To convert a 2 to 0 and increment the next position, we need to merge two trees representing it with the element to be inserted. Because each tree is traversed in pre-order, we make the element the root of the tree.

Elements are inserted in pre-order in each tree, so when searching for an
index, we can first find the right tree by looking at the tree sizes and within a tree we can do a “binary search” in O(height) of the tree.

Binomial Heaps

In this chapter, this technique is also applied to improve the worst case runtime of insertion of binomial heaps. The implementation, named Skewed Binomial Heap, is on github.

Conclusion

This chapter demonstrated that binary representations are a useful analogy to come up with data structures and algorithms, because they’re simple. This simplicity can lead to inefficient running times, though. Representations such as skewed binary numbers can improve the worst case of some operations with the trade-off of less intuitive and more complex implementations.

Appendix A – Proof

Sketch of the proof. First, it’s obvious that two different decimals cannot map to the same binary representation. Otherwise the same equation with the same weights would result in different values. We need to show that two binary representations do not map to the same decimal.

Suppose it does, and let them be B1 and B2. Let k be the largest position where these number have a different digit. Without loss of generality, suppose that B1[k] > B2[k].

First case. suppose that B1[k] = 1, and B2[k] = 0 and B2 doesn’t have any digit 2. B1 is then at least M + 2^{k+1} - 1, while B2 is at most M + \sum_{i = 1}^{k} (2^{i} - 1) which is M + 2^{k + 1} - k (M corresponds to the total weight of digits in positions > k). This implies that B2 < B1, a contradiction.

Second case. suppose that B1[k] = 1, but now B2 does have a digit 2 at position j. It has to be that j < k. Since only zeros follow it, we can write B2‘s upper bound as

M + \sum_{i = j + 1}^{k} (2^{i} - 1) + 2^{j + 1} - 1

Since 2(2^{j + 1} - 1) < 2^{j + 2} - 1, we have

\sum_{i = j + 1}^{k} (2^{i} - 1) + 2^{j + 1} - 1 < \sum_{i = j + 2}^{k} (2^{i} - 1) + 2^{j + 2} - 1

We can continue this argument until we get that B2 is less than M + 2(2^{k} - 1) which is less than M + 2^{k + 1} - 1, B1.

Third case. Finally, suppose we have B1' such that B1'[k] = 2, and B2'[k] = 1. We can subtract 2^{k+1} - 1 from both and reduce to the previous case. ▢

Appendix B – Conversion algorithm

Converting from a decimal representation to a binary one is straightforward, but it’s more involved to do so for skewed binary numbers.

Suppose we allow trailing zeros and have all the numbers with k-digits. For example, if k=2, we have 00, 01, 02, 10, 11, 12 and 20. We can construct the numbers with k+1-digits by either prefixing 0 or 1, and the additional 2 followed by k zeros. For k=3, we have 000, 001, 002, 010, 011, 012, 020, 100, 101, 102, 110, 111, 112, 120 and finally 200.

More generally, we can see there are 2^(k+1) - 1 numbers with k digits. We can construct the k+1 digits by appending 0 or 1 and then adding an extra number which is starts 2 and is followed by k zeros, for a total of 2^(k+1) - 1 + 2^(k+1) - 1 + 1 = 2^(k + 2) - 1, so we can see this invariant holds by induction on k, and we verify that is true for the base, since for k = 1 we enumerated 3 numbers.

This gives us a method to construct the skewed number representation if we know the number of its digits say, k. If the number is the first 2^(k) - 1 numbers, that is, between 0 and 2^k - 2, we know it starts with a 0. If it’s the next 2^(k) - 1, that is, between 2^k - 1 and 2^(k+1) - 3, we know it starts with a 1. If it’s the next one, exactly 2^(k+1) - 2, we know it starts with a 2.

We can continue recursively by subtracting the corresponding weight for this digit until k = 0. We can find out how many digits a number n has (if we’re to exclude leading zeros) by find the smallest k such that 2^(k+1)-1 is greater than n. For example, for 8, the smallest k is 3, since 2^(k+1)-1 = 15, and 2^(k)-1 = 7.

The Python code below uses these ideas to find the skewed binary number representation in O(log n):

One might ask: why not OCaml? My excuse is that I already have a number theory repository in Python, so it seemed like a nice addition. Converting this to functional code, in particular OCaml is easy.

This algorithm requires an additional O(log n) memory, while the conversion to a binary number can be done with constant extra memory. My intuition is that this is possible because the weights for the binary numbers are powers of the same number, 2^k, unlike the skewed numbers’ weights. Is it possible to work around this?

Lazy Rebuilding

Book

In this chapter Okasaki starts with a common technique data structures use to achieve efficient complexity times. A classic example are balanced binary search trees, which, on one hand, need to be balanced to avoid degenerated cases (think of a binary tree which can degenerate to a path), but on the other hand, balancing is usually too expensive to perform at every operation.

The tradeoff is to allow a certain degree of “imbalance”, such that the big-O complexity of operations does not degenerate, and to let us perform re-balances only every so often, in a way that leads to an efficient amortized complexity. Structures such as AVL trees and Red-black trees make use of this technique.

The general technique is named batched rebuilding. The main issue with it though is that we saw that amortized analysis does not play well persistent data structures.

To address that problem, a technique was developed by Mark Overmars, called global rebuilding.

Global Rebuilding

The basic idea is to keep two copies of a structure in parallel, one of which we perform incremental updates (write) and the other is used for read operations. After a few operations, we copy the write version into the read-only version. The idea is that the incremental updates are cheap, but might put the structure in a state that is not suitable for reading. This allows for distributing the costs across operations such that each operation has worst case efficient bounds.

The major downside of this technique is the complexity and corner cases, because we also need to account for changes to the structure that renders the write copy of the structure invalid.

We’ll now describe yet another implementation of queues that use this technique, developed by Robert Hood and Robert Melville. Similar to other queue implementations, this structure maintains a front and rear lists, the latter in reverse order. It also has the invariant that the rear list cannot be larger than the front.

When that happens we must perform a rotation, which consists in reversing the rear queue and appending it to the end of the front queue. We cannot perform this operation piecemeal, because we only have efficient access to the first element of a list. The operation we can do partially is concatenating the reverse of a list to the beginning of another list, that is, given xs and ys, we can do ~xs + ys, where ~ represents the reverse of a list. Note we can also reverse a list, that is xs to ~xs piecemeal by having ys = [].

Now, to achieve our goal which is xs + ~ys, we reverse both xs and ys, then concatenate the reverse of ~xs to ~ys:

1) Start with xs and ys
2) Reverse both: ~xs and ~ys
3) Then (~(~xs)) + ~ys which is xs + ~ys

We can perform these operations one step at a time by using a “state machine”, we first start with a state Reversing(xs, ys) which will reverse xs and ys at the same time into ~xs and ~ys, at which point we switch to the state Appending(~xs, ~yx) which will concatenate the xs into ~ys, so then we get to Done(xs + ~ys).

The problem of keeping a separate state is that it might not be accurate by the time it’s done. In particular, if we remove an element from the front, the Appending step can be shortcut. We can keep a count of how many of the elements in ~xs of Appending are still present. If, by the time we’re appending the number of present elements goes to 0, the last elements of ~xs (that is, the first elements of xs) have been removed, so they do not need to be transferred to ~ys.

A possible implementation of the states is:

The Idle is just a placeholder to make sure the state machine can keep going to the next state even when we’re not currently performing a rotation. The state transition definition is given by:

One important detail here is that Appending ({validCount = 0; front; rear}) must appear before the other matching rule for Appending, because the other one includes this.

When we remove an element from the front of the queue we need to update the number of valid elements in the front of the rotation state:

Similar to the Real-time queues, we can guarantee constant time worst case for all the operations if we execute the state machine twice for each operation. The check() function verifies if we need a rotation and also executes the nextStep() function twice.

The other operations are then straightforward. The only thing is that pop() must call the invalidate() function because it’s decreasing the size of front:

The full, up-to-date implementation in on github.

Lazy Rebuilding

As we can see, the Hood-Melville implementation is quite involved. Okasaki argues that making use of lazy evaluation and memoization simplifies the implementation. We can indeed see that the implementation of Real-time queues is much cleaner.

One simplification is that we don’t have to sync the write copy with the read copy. We just need to make sure to execute items from the schedule by the time they are needed. Memoization will take care of the rest.

Another advantage in this case is that reversing and concatenating lazily evaluated lists does not require an intermediate reversal of the front, which means that we can remove the front element without the need to invalidate any items in the schedule.

The author provided more examples of lazy rebuilding for deques (double-ended queues) by first presenting an amortized version using the Banker’s method and then a lazy rebuilding version. The Banker’s deque and Real-time deque are also on my github.

Conclusion

In this chapter the technique behind Real-time queues was formalized. I really enjoy the organization of the book in which the author presents a data structure and from there he extracts a generic pattern of technique that can be applicable to other cases.

When I was studying data structures I don’t recall learning about general techniques underlying existing data structures, such as batched rebuilding for AVL trees and Red-black trees. That would have been interesting.