In Chapter 10 of Purely Functional Data Structures, Okasaki describes a generalized trie. In this post we’ll see how tries can be seen as maps over a specific type of keys and how we can construct maps over any type of keys.
The basic form of a map is a pair of (key, value). In most programming languages the key has to be a scalar (e.g. int, string) or some complex that can be converted to one (think of
hashCode() in Java).
For simplicity, let’s assume our scalar is an
int and our value is of generic type
T. Also, note that the syntax we present below is not valid OCaml (I find OCaml’s type syntax a bit hard to read). A map with int keys can be represented as:
If we want a two dimensional key, for example
(int, int), we can have a map of maps:
map<int, map<int, T>>
and if the dimensions are not fixed, for example, a list of integers, then we can define a recursive structure for our map
map<list<int>, T> = Node of (option<T> * map<int, mapOverList>)
If we think of strings as lists of characters, a trie is then a map where the key is a list of characters. From the type above we can simply change the key of our map above to a list of characters and for a basic trie T is a boolean
map<string, bool> = Node of (bool * map<char, mapOverList>)
option<bool> is redundant, so we can use
Maps over trees
We can now generalize the key to more complex recursive types such as trees. Suppose we fave the following type for
tree<int> = (int * tree<int> * tree<int>)
The outermost map is indexed by the root of the tree. The inner maps are indexed by the left and right subtrees respectively:
map<tree<int>, T> = map<int, map<tree<int>, map<tree<int>, T>>>
If we expand the key of the second map we get the following:
map<tree<int>, map<tree<int>, T>> = map<int, map<tree<int>, map<tree<int>, map<tree<int>, T>>>
It gets pretty involved very quickly but because we traverse these types recursively, the implementation is still simple. Let’s see how to implement these in OCaml. The type of a map over an int tree can be defined as follows:
|module IntMap = Map.Make(Int64)|
|type 'a tree = Empty | Node of 'a * 'a tree * 'a tree|
|type key = IntMap.key tree|
|type 'a trie = Trie of 'a option * 'a trie trie IntMap.t|
Note that the outermost map is a regular
IntMap, which uses the root element of the map, a scalar, as key.
The search function takes a tree representing the key, and the map. The base case is when the tree is empty, when we’re just past a leaf. The recursive case consists in obtaining the maps in order, first using the root element, then using the left tree and finally using the right tree:
|let rec find: 'a. key -> 'a trie -> 'a =|
|fun tree trie -> match (tree, trie) with|
|| (Empty, Trie (None, children)) -> raise Not_found|
|| (Empty, Trie (Some value, children)) -> value|
|| (Node (root, left, right), Trie (_, children)) ->|
|let trieForRoot = M.find root children in|
|let trieForLeft = find left trieForRoot in|
|find right trieForLeft|
Note that the recursive call is non-uniform, so we need explicit annotations, as we discussed previously.
The insertion is similar, expect that when we fail to find a key in any of the maps, we need to first insert it with an empty element before recursing.
Map.find() implementation throws exceptions when a key doesn’t exist, we can wrap the call with a
try and if an exception occurs, we can insert the missing key (alternatively we could use
|let rec insert: 'a. key -> 'a -> 'a trie -> 'a trie =|
|fun tree value trie -> match (tree, trie) with|
|| (Empty, Trie (_, children)) -> Trie (Some value, children)|
|| (Node (root, left, right), Trie (trieValue, children)) ->|
|let trieForRoot = try|
|M.find root children|
|Not_found -> empty|
|let trieForLeft = try|
|find left trieForRoot|
|Not_found -> empty|
|let newTrieForLeft = insert right value trieForLeft in|
|let newTrieForRoot = insert left newTrieForLeft trieForRoot in|
|let newChildren = M.add root newTrieForRoot children in|
|Trie (trieValue, newChildren)|
Maps over products and sums
There are two ways a structure can branch out recursively: through combination or through alternative. For a combination, refer to our definition of
Node of 'a * 'a tree * 'a tree
In theory, any combination of values for
'a tree are possible values for a Node. The
* in between the components in fact represent the cartesian product operator. This is also known as product.
For alternative, the type of the tree itself can be either Empty or Node:
type 'a tree = Empty | Node of (...)
In this case, the valid values for a tree is the sum of values of each alternative. Hence, this is also known as sum.
We can generalize the map with keys of any types by looking at their definition. If it’s a product, we end up with nested maps. For example, if
Tk = (Tk1 * Tk2)
then the map over
Tk can be defined as
map<Tk, T> = map<Tk1, map<Tk2, T>>
In our example, this came the nested maps
'a trie trie IntMap.t.
For sums, if the type is
Tk = Tk1 | Tk2
the map over
Tk would end up as a product of maps:
map<Tk, T> = (map<Tk1, T> * map<Tk2, T>)
In our example, this came the product
('a option) and
('a trie trie IntMap.t). option can be thought as a one-dimensional map.
In this post we saw how a trie can be modeled as a map using the same principles as the ones we use to construct matrices, that is, two-dimensional maps (nested maps). We then generalized the string keys to trees, and implemented the insert/find functions in OCaml. I found it pretty hard to reason about these structures.
We then went a step further and saw how to construct maps based on the key structure. And we learned about product and sum of types when discussing recursive types.