Software transactional memory in Haskell


In this post we’ll discuss transactional memory in Haskell. This post is a set of notes from Chapter 28 of Real World Haskell and Chapter 20 of Parallel and Concurrent Programming in Haskell.

We’ll first introduce the concept of software transactional memories, then explain what problem it tries to solve and then go through some toy examples in Haskell to learn about the API provided by the Control.Concurrent.STM library.


What is Software Transactional Memory? According to [1]:

Software Transactional Memory (STM) is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing. STM is strategy implemented in software, rather than as a hardware component.

It’s one of many alternatives to deal with shared memory among several concurrent threads in concurrent environments. We’ve discussed parallel and concurrent programming in a previous post. In that case, our choice for thread synchronization was using locks implemented with mutable variables.

The Problem

One problem with using locks to grant exclusive access to memory is that we need to careful with pitfalls like deadlocks. For example, consider the following function that implements transferring values between two variables by using locks:

import Control.Concurrent

transfer :: MVar Int -> MVar Int -> Int -> IO ()
transfer a b val = do
  valA <- takeMVar a
  valB <- takeMVar b
  putMVar a (valA - val)
  putMVar b (valB + val)

First, lets recap how MVars behave: an MVar either has a value or is empty. If a thread calls takeMVar() on an empty MVar, it blocks until another thread puts a value there. On the other hand, a thread blocks if it tries to putMVar() on a MVar that is not empty.

In our example, we’re basically acquiring a lock to the variables a and b, so no other thread can read/write to them while we’re performing our operations. After we’re done, we write the new values back to the variables, releasing the acquired locks.

The problem here is the potential of a deadlock. Imagine a scenario in which two threads, T1 and T2, are calling transfer() at the same time, but T1 wants to transfer funds from a1 to a2 (calling transfer a1 a2) while T2 do the opposite (calling transfer a2 a1). It could happen that T1 acquires the lock to a1, but before it can acquire the lock to a2, T2 gets it. Now, T1 is blocked waiting for the lock to a2 and T2 waiting for a1, causing a deadlock. A proposed solution is to always acquire the locks in a consistent order. For example by assigning IDs to the variables and acquiring the locks ordered by IDs. In this case, both T1 and T2 would try to acquire a1 and then a2.

This issue might not be as obvious if we’re dealing with a complex real-world application, so we want models to prevent such cases. One such solution is using Software Transactional Memory (STM). In this model, the reading and writing to shared variables can be done atomically by using transactions, that is, either all operations succeed or none of them are executed (we rollback to the initial state).

The STM Library

We can draw some parallels between MVars and the STM library (I recommend reading this post for an introduction).

Transaction variable. or TVar, is a parametrized type from the Control.Concurrent.STM library, similar to MVar. To create a new TVar, we can use the function newTVar. Let’s take a look at the type interface for this function:

> :t newTVar
newTVar :: a -> STM (TVar a)

We see it return a TVar wrapped in a STM monad (remember the analogous function for MVars, newMVar(), returns it wrapped in the IO monad). Before talking about this monad, let’s describe the ways to access the contents of a TVar:

> :t readTVar
readTVar :: TVar a -> STM a
> :t writeTVar
writeTVar :: TVar a -> a -> STM ()

The STM monad., similar to the IO monad, allow us to perform side effect actions, but STM limits side-effects to only TVars. [1] provides a great explanation about the difference between these two monads:

Why is STM a different monad from IO? The STM implementation relies on being able to roll back the effects of a transaction in the event of a conflict with another transaction (…). A transaction can be rolled back only if we can track exactly what effects it has, and this would not be possible if arbitrary I/O were allowed inside a transaction—we might have performed some I/O that cannot be undone, like making a noise or launching some missiles. For this reason, the STM monad permits only side effects on TVars, and the STM implementation tracks these effects to ensure the correct transaction semantics.

Composability. One feature of STM is composability, since we can combine two or more STM action in another STM. This enables better reuse, something we can’t do easily if we use mechanisms like locks.

For an example, imagine we have a function that uses MVars to modify a variable atomically. Now suppose we want to modify two variables atomically. Even though we have the function to do this independently to each variable, we can’t combine those in order to get a new function that modifies both variables atomically.

Since STM are modelled as monads, we can simply combine them using the do notation. For example, suppose we have a function bump(), which increments the content of a TVar by a given amount:

import Control.Concurrent.STM

bump :: TVar Int -> Int -> STM ()
bump var amount = do
  val <- readTVar var
  writeTVar var (val + amount)

We can rewrite the transfer function we had for MVar in terms of bump():

transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer a b val = do
  bump a val
  bump b (-val)

Because we’ll execute all these steps in a single transaction, we don’t have to worry about acquiring the locks for both variables before hand, so we were able to combine independent functions but still have consistency guarantees.

Executing a STM. Even though we’re combining smaller STM actions into more complex ones, we’re still restricted to the STM land, but eventually we’ll have to interface with the real-world, which means we’ll have to convert a STM action to an IO action. That’s exactly what the combinator atomically does:

>:t atomically
atomically :: STM a -> IO a 

This will execute the entire STM action in a transaction.

We also might want to create TVars in the IO space instead of the STM space, for example if we’re interfacing with IO code that will execute the STM action. For example, if we want to test our composite transfer() function, we can define:

testTransfer :: IO ()
testTransfer = do
  a <- newTVarIO 1
  b <- newTVarIO 2
  atomically $ incrementBoth a b
  printTVar a
  printTVar b

printTVar :: TVar Int -> IO ()
printTVar var = do
  val <- readTVarIO var
  putStrLn $ show val

Here we create two sample TVars, with 1 and 2 as values. Since we’re inside an IO do block, we need to return the TVars in the IO space, so that’s why we use the newTVarIO() variant. We can then execute the transfer() function atomically and finally print the contents of the variables afterwards. Since testTransfer() returns an IO action we can try this out on GHCI:

> testTransfer

Rollback. How does STM implements the transaction? What happens when two threads try to access the same TVar at the same time? Consider two threads T1 and T2 calling bump() on the same variable at the same time. It might happen that both will execute the first line of the function:

val <- readTVar var 

before writing the incremented value back which, if we didn’t have a transaction in place, it would cause a data consistency problem (the thread to write last would overwrite the results of the other). Because we’re executing using a transaction, if either of the threads realizes the contents of their variables changed since the transaction began, it will rollback and restart the transaction. In this case, suppose T2 manages to write back the the variable first. T1 will have to rollback because the state it had at beginning of the transaction changed. T1 will then keep retrying the transaction until it succeeds.

Custom rollbacks. We just saw that STM implements the automatic rollback to guarantee data consistency. But what if we want to force a rollback when some condition is not met? For example, in our transfer example, we could rollback if the balance of one of the variables would become negative. There is another function, retry, which if called will cause the entire transaction to rollback.

Let’s change our bump() function to rollback if the resulting value is less than 0.

bumpIfValid :: TVar Int -> Int -> STM ()
bumpIfValid var amount = do
  val <- readTVar var
  if val + amount < 0
    then retry
    else writeTVar var (val + amount)

To test the code above, we can have one thread trying to decrement a variable by an invalid amount, while another thread will increment the content. The decrementing thread will keep trying to run the transaction until the increment thread is done:

bumpAndPrint :: TVar Int -> Int -> MVar () -> IO ()
bumpAndPrint var amount lock = do
  atomically $ bumpIfValid var amount
  val <- readTVarIO var
  withMVar lock $
    \_ -> putStrLn ("New value: " ++ (show val))

testValidBumps :: Int -> IO ()
testValidBumps valA = do
  a <- newTVarIO valA
  lock <- newMVar ()
  forkIO $ bumpAndPrint a (-1) lock
  threadDelay 1000000 -- sleep 1 second
  bumpAndPrint a 3 lock

In this implementation, we fork a new thread that will try to decrement the TVar a by 1, if it doesn’t succeed, it will try until the condition is met. We’ve added a delay to the main thread to make sure the child thread has a chance to execute first. Also, we’re using MVars as a lock to make sure the putStrLn() operation happens atomically, so the output doesn’t come out all scrambled (we discussed this technique before).

Let’s test calling testValidBumps() in GHCI with 0:

> testValidBumps
New value: 3
New value: 2

The child transaction didn’t succeed at first. After a second, the main thread will increment the TVar by 3, after which the child thread will be able to proceed. Now, if we start with an amount enough for the child thread to succeed, say 1, we get:

> testValidBumps
New value: 0
New value: 3

OR’ing STMs. Note that when combining multiple STM actions in a do block, we’re essentially chaining them using the bind Monad operator. If any STMs in that block can triggers a retry, the entire composed STM will rollback. In a sense, the success of a composed STM is a “AND” of the success of the sub-STMs.

Another way to combine multiple STMs is to “OR” such that an STM succeeds if any of its sub-STMs succeed. We can do that by using the orElse combinator,

> :t orElse
orElse :: STM a -> STM a -> STM a

It tries to execute the first STM and if that is rolled back via retry, then the second action is then executed. If the second one also triggers a retry, then the composite STM retries. Let’s extend our previous example, by trying to bump two variables. If we fail to bump the first, we try the second.

bumpEitherAndPrint :: TVar Int -> TVar Int -> Int -> MVar () -> IO ()
bumpEitherAndPrint a b amount lock = do
  atomically $ 
    (bumpIfValid a amount) `orElse` (bumpIfValid b amount)
  valA <- readTVarIO a
  valB <- readTVarIO b
  withMVar lock $ do
    let valStr = (show valA) ++ ", " ++ (show valB)
    \_ -> putStrLn ("New value: " ++ valStr)

testOrElse :: Int -> Int -> IO ()
testOrElse valA valB = do
  a <- newTVarIO valA
  b <- newTVarIO valB
  lock <- newMVar ()
  forkIO $ bumpEitherAndPrint a b (-1) lock
  threadDelay 1000000
  bumpEitherAndPrint a b 3 lock

We have two variables now and the second thread will try first to decrement a and if it doesn’t succeed it tries b. Let’s try with a few cases:

> testOrElse 1 1
New value: 0, 1
New value: 3, 1

In the code above, it succeed in decrementing the first variable.

> testOrElse 0 1
New value: 0, 0
New value: 3, 0

This time, it failed to decrement the first variable, but succeeded on the second.

> testOrElse 0 0
New value: 3, 0
New value: 2, 0

Finally, in this case, it failed decrementing both variables, so the whole transaction was rolled out, until the main thread eventually bumped the first variable, allowing it to be then decremented.


In this post we learned a new technique to deal with shared memory in a concurrent environment. The STM monad provides a neat design that allows composability and simplifies the work of using transactions in Haskell.

The main references I used for this post offers more practical and complex examples. I personally understand concepts better with toy examples, so I tried to use those in this post. The books offer interesting examples, which provide the power of this design.

After reading/writing about STMs, I feel like I’ve improved my understanding of the IO monad by seeing the differences between it and STM.

Further Reading. [4] offers a good introduction to the subject and describes the dinning philosophers problem and implements a solution using STM.

Jones’ paper [3] is very digestible and instructive. It offers a solution to a concurrency exercise, the Santa Claus problem [5].

Last but not least, Marlow [1] offers a really nice discussion about performance, based on the current implementation of transactions by STM. The takeaway is to always minimize the amount of work we do inside an atomic block, since it increases the chance of rollbacks and those are not free.

Worth noting, in the same chapter [1], Marlow implements a deque data structure (a list-like structure which allows inserting/removing elements from either ends in O(1) – amortized in this case) based on Okasaki’s Purely Functional Data Structures, which I’m planning to start reading soon.


[1] Parallel and Concurrent Programming in Haskell – Chapter 10. Software Transactional Memory
[2] Real World Haskell – Chapter 28. Software transactional memory
[3] Beautiful concurrency, Simon Peyton Jones
[4] Computational Thoughts – Some examples of Software Transactional Memory in Haskell
[5] A New Exercise in Concurrency, John A. Trono


Haskell Basic Networking

This post is a set of notes from Chapter 27 of Real World Haskell. In this chapter, the authors discuss basic network programming using Haskell. It presents two simple client-server communication examples: one using UDP and the other TCP.

In this post, we’ll start by revising some basic computer network concepts and then will comment on different parts of the examples presented in the book.


The Transport Layer

The communication between two computers is often organized in multiple layers, following the OSI model standard. One of the layers is the transport layer. This layer is responsible for transferring data from a source to a destination, offering different levels of guarantees. The most famous transport layer protocols are UDP and TCP.

UDP stands for User Datagram Protocol and TCP for Transmission Control Protocol.


UDP provides a lightweight abstraction to send data from one host to another, by sending pieces of information, called Datagram, one at a time. According to [2]:

A datagram is an independent, self-contained message sent over the network whose arrival, arrival time, and content are not guaranteed.

Because of this, we have no guarantee the packet will arrive in order or that the packets will arrive at all. UDP uses checksum to verify whether a given packets arrived to the host was corrupted.

TCP offers more guarantees than UDP, but is less performant. It first establishes a connection between the client and the server and then sends TCP segments. Within a connection, TCP in the server is able to sort the segments in the order they were sent by the client. Also, it can retransmit segments if it doesn’t receive confirmation.

Network sockets

A network socket is the endpoint of inter-process communication between computers in a network.

The sockets types include:

* Datagram which uses the User Datagram Protocol (UDP)
* Stream, which uses the Transmission Control Protocol (TCP) or Stream Control Transmission Protocol (SCTP).
* Raw sockets, which bypass the transport layer.

Unix-based systems use the Berkeley sockets API which uses file descriptors (integers) to identify a socket.

Client-server using UDP

Let’s study the code. As the authors mention [1], the functions provided by the Network.Socket module, are corresponding to the low-level functions in C, so we can refer to those for documentation.

The getaddrinfo() function

The getaddrinfo() function takes a node (hostname), a service (port) and a set of hints flags as inputs and returns a list of structures called addrinfo as output. It will try to find all the addresses matching the constraints provided from the inputs.

There are two modes we’re interested in here: listening and publishing. For the listening mode, we can provide a flag AI_PASSIVE to the hints flags and a null value to node. According to the man page:

If the AI_PASSIVE flag is specified in hints.ai_flags, and node is NULL, then the returned socket addresses will be suitable for bind(2)ing a socket that will accept(2) connections

In Haskell we’re doing exactly that for the server:

addrinfos <- getAddrInfo
               (Just (defaultHints {addrFlags = [AI_PASSIVE]}))
               (Just port)

For the publishing mode, the docs say:

If the AI_PASSIVE flag is not set in hints.ai_flags, then the returned socket addresses will be suitable for use with connect(2), sendto(2), or sendmsg(2)

In our client code we then do:

addrinfos <- getAddrInfo 
               (Just hostname) 
               (Just port) 

The socket() function

A socket is like a network file descriptor. The socket() function takes the family domain, type of socket and the protocol. It’s not clear from the docs what this protocol refers to, expect that 0 is the default protocol and it’s dependent of the address family (first parameter). [3] Suggests it’s the application layer protocol (e.g. HTTP, POP3).

Since we’re going to use UDP, the arguments passed to the socket function in our Haskell code are:

sock <- socket 
          (addrFamily serveraddr) 

Server: Listening

With the socket file descriptor, we can bind an address to it using the bind function. It takes a socket file descriptor and the address and returns 0 on success or -1 on error.

To receive the messages, we use the recvfrom() function, which takes the socket, the maximum size of the packet and will return the message and the address of the sender. In the Haskell version, we have recvFrom implemented in Network.Socket. The documentation has the following warning though:

Do not use the send and recv functions defined in this module in new code, as they incorrectly represent binary data as a Unicode string. As a result, these functions are inefficient and may lead to bugs in the program. Instead use the send and recv functions defined in the ByteString module.

We can use the ByteString version by doing

import Network.Socket hiding (send, sendTo, recv, recvFrom)
import Network.Socket.ByteString

We also need to update all the places we use Strings with ByteString.

Client: Sending data

From the client side, we can use the sendto() function, providing the socket file descriptor, the data and the address of the server. The function will return the number of bytes sent.

In our Haskell code, we have

sendTo (slSocket syslogh) omsg (slAddress syslogh)

Where slSocket gets the socket, osmg is the message, and slAddress the host address. This call might not send the entire message at once, so we have to keep calling this function until the message is completely sent.


After trying running the code above for the client and server, I was not able to have the server print out the messages sent from the client in a Mac OS X. My first suspicion was that the server code had some missing configuration or bug.

I’ve tried using netcat, a tool for reading or writing to network connections via UDP or TCP. To listen to port 1514 using UDP we can do it by running:

nc -u -l -k 1514

The u flag indicates we’re using UDP (default is TCP). The l flag indicates we’re listening instead of sending, and k tells netcat not to disconnect after the client disconnects. So we now basically have a simple server on localhost:1514.

I’ve made a binary for the syslogclient.hs code example, by simply adding a main function and compiling it using ghc:

main = do
  message <- getLine
  h <- openlog "localhost" "1514" "syslogclient"
  syslog h USER INFO message
  closelog h

When running:

$ ghc syslogclient.hs
$ ./syslogclient 
hello world

I didn’t see any output from the netcat side. The next test was verifying if the client code had an issue. I took a similar approach with the syslogserver.hs code, adding the main function and generating a binary:

main = do
  putStrLn "Starting server...\n"
  serveLog "1514" plainHandler

Then started the server up:

$ ghc syslogserver.hs
$ ./syslogserver 

This time I used netcat to send the message using UDP. The command I ended up using was

echo "hello world" | nc -4u localhost 1514

As in the listening mode, the u flag here tells netcat to use the UDP protocol and 4 forces it to use IPv4 only. And this finally worked!

At this point there were a couple of questions hanging: what configuration is missing from the client code and why the server only displays the message if I force it to use IPv4 addresses?

Trace. One tool I’ve been missing from Haskell was the ability to print variables values at specific points in code. I’ve found on StackOverflow an interesting discussion which points out Debug.Trace.trace as a simple function to do this.

It’s an impure function and also messes up with lazy evaluation, so it’s recommended only for debugging purposes. It can be used in a neat way. Say we have a function

someFunction x y = x + y 

and we want to print the contents of x and y during runtime. We can just add one line, with minimal modification to existing code:

someFunction | trace ("Value of x: " ++ x ++ " and y: " ++ y) False = undefined
someFunction x y = x + y

Because trace prints its first argument and returns the second, we basically using this syntax

someFunction x y | False = undefined
someFunction x y = x + y

We’ll try to do the pattern matching with the first form, but since it returns False, we’ll end up executing the second form of someFunction(). Another option is to create a standalone print() function to print a given value. For example,

print x = trace ("Value of x: " ++ x) x

With this trick in our toolkit, we can inspect which addresses returned by getAddrInfo() in the client by adding

traceAddrs :: [AddrInfo] -> [AddrInfo]
traceAddrs addrs = trace (intercalate ", " (map (show . addrAddress) addrs)) addrs

When running the client code again, we get the following output:

[::1]:1514, [::1]:1514,,

The first two values, “::1“, represent an IPV6 address (0000:0000:0000:0000:0000:0000:0000:0001). According to Wikipedia,

Consecutive sections of zeroes are replaced with a double colon (::). The double colon may only be used once in an address, as multiple use would render the address indeterminate

Since we pick up the first address returned by getAddrInfo, we’re using IPv6 to connect to the server. We can force it to use IPv4 by passing the AF_INET flag:

addrinfos <- getAddrInfo
    -- set it to use IPv4
    (Just (defaultHints {addrFamily = AF_INET}))
    (Just port)

We can now run the client and send a message, and it will successfully be sent to the server.

Doing a similar investigation on the server code, we get:,, [::1]:1514, [::1]:1514, 

Since we’re picking the head of the list, the server is actually listening on an IPv4 address. We can force it to use IPv6 by passing the AF_INET6 flag.

addrinfos <- getAddrInfo
    (Just (defaultHints {addrFlags = [AI_PASSIVE], addrFamily = AF_INET6}))
    Nothing (Just port)

Now the server can listen to requests of both IPv4 and IPv6 clients. Mystery solved!

Client-server using TCP

Server: Multi-threaded Listening

There are a couple of differences between the TCP and UDP server.

1. The socket type we use is Stream instead of a Datagram.

2. Second, we call the listen function, which marks the socks as accepting connections. The second argument is the maximum size of the connection queue:

listen sock 5

3. Instead of recvFrom(), we then call accept, which picks the first of the pending connections in the queue, and creates a new socket. The server then spawns a new thread to handle that socket, so that the main thread can continue processing more connections.

procRequests :: MVar () -> Socket -> IO ()
procRequests lock mastersock = do
  (connsock, clientaddr) <- accept mastersock
  forkIO $ procMessages lock connsock clientaddr
  procRequests lock mastersock

4. Use a file handle instead of a socket. Because we keep a stick connection, we can use a file handle to abstract the reading from the socket.

Each thread reads the message from the connection

-- Converts a socket (connsock) to a handle
connhdl <- socketToHandle connsock ReadMode
-- Set handle to buffering mode
hSetBuffering connhdl LineBuffering
-- Read contents
messages <- hGetContents connhdl
-- Print messages
mapM_ (handle lock clientaddr) (lines messages)
-- Close connection
hClose connhdl

Here we use an MVar as a lock to guarantee that at most one thread is writing to stdout at a time. Otherwise we would see messages from different threads mixed up. This is the exact same approach we used in our Haskell Concurrent Programming post, when talking about using MVar as a lock.

Client: Sticky connection

Our TCP client also looks similar to the UDP counterpart, with a couple of differences.

1. As we did with the TCP server, we use Stream instead of Datagram.

2. We also mark the socket as keep-alive:

setSocketOption sock KeepAlive 1

which is basically telling the OS to periodically send packages to probe the server we’re connected to. This serves both as a check to see if the server is still alive or to prevent the connection from being dropped due to inactivity [4].

3. We establish a stick connection with the server:

connect sock (addrAddress serveraddr)

4. As in the TCP server, we use a file handle instead of a socket:

h <- socketToHandle sock WriteMode

which provides us using common IO file functions like hPutStrLn().

Every time we type a line, we want to send that string to the server. In the code below, we write a line to your file and flush it so it is sent to the server immediately.

hPutStrLn (slHandle syslogh) sendmsg
hFlush (slHandle syslogh)

5. Keep sending read lines from stdio until EOF

I’ve added a simple main function to the code so we can compile the client code into a binary, and also added a function, readData(), to read lines from stdio until we send an EOF character:

import Control.Monad
readData :: SyslogHandle -> IO ()
readData h = do
  done <- isEOF
  unless done readLine
    readLine = do
                 message <- getLine
                 syslog h USER INFO message
         	 readData h

main = do
  h <- openlog "localhost" "1514" "syslogtcpclient"
  readData h
  closelog h


Given that we have our binaries, I’ve started a server first and then ran two client binaries. I was able to type messages in each of the clients and verified the server was handling them properly.


Writing this post, I’ve learned about network programming and debugging in Haskell. I’ve had classes about network programming back in college, but it didn’t seem fun at the time. When we study things for our own curiosity, it’s much more interesting.

Also, in studying this chapter, I’ve tried using a more “curious mindset”, always questioning why things are this way or another, and this forced me to do more research and learning things beyond those the book provided.


[1] Real World Haskell – Chapter 27. Sockets and Syslog
[2] Oracle Java – What Is a Datagram?
[3] StackOverflow – Socket Protocol Fundamentals
[4] TCP Keepalive HOWTO

Haskell Concurrent Programming



In this post we’ll talk about concurrency and parallelism in Haskell. This post is a set of notes from Chapter 24 of Real World Haskell and Chapter 2 and Chapter 3 from Parallel and Concurrent Programming in Haskell.

Concurrency and parallelism are similar concepts, and it’s hard to draw a line between their definitions. In general, concurrency represents the ability of an application handling multiple (concurrent) tasks at the same time. It doesn’t require the underlying hardware to have multiple cores, so even applications running on single core can implement concurrency via threads.

Parallelism, in other hand, usually means breaking a slow task in smaller pieces that can be computed in parallel in order to speed the elapsed time. To be worth it, it requires multiple-cores or multiple machines.

In an operating system context, a process represents an instance of a program. It has a self-contained environment (e.g. memory space) and contains one or more threads. Threads can be thought as lightweight processes that share the environment with other threads in the same process [4].

Haskell threads

Forking. The module Control.Concurrent implements a thread system. The program starts out with a single thread (referred as main thread) and we create new threads from the current thread using a fork.

We can use the forkIO function, which takes an IO action and runs that in a separate thread. The remaining code continues running on the main thread. Because of the concurrent nature of threads, the system doesn’t guarantee order of execution between those. In the example below, we fork the main thread to print “hello” and the the main thread prints “world”.

import Control.Concurrent

forkExample = do
  forkIO $ print "hello"
  print "world"

If we run this code many times we’ll get varying results from "hello""world" to "wor"lhde"llo" (printing the whole sentence is not atomic because Haskell uses lazy evaluation – we’ll see next how to avoid this behavior).

Mutable variable or MVar is a structure provided by the Control.Concurrent module. It can either have a value or be empty. If we try to write (putMVar) to a MVar that already has a value, the writing thread blocks until some other thread extracts (takeMVar) the current value from that MVar. On the other hand, if a thread tries to read from an empty MVar, it blocks until another thread puts a value into it.

We can use MVars to synchronize threads. In the fork example, there were no guarantees in the order of execution, but in the following example, we can guarantee the child thread will only execute putStrLn after putMVar is executed.

mvarExample = do
  m <- newEmptyMVar
  forkIO $ do
    v <- takeMVar m
    putStrLn ("received " ++ show v)
  putStrLn "sending"
  putMVar m "wake up!"

The MVar provides a mutex (mutual exclusion) for a piece of data (the "wake up" string), since only one thread can read the value from the MVar. They also provide a communication mechanism between two threads.

Channel or Chan, can be viewed as a generalization of a MVar. In a channel, a thread doesn’t block when inserting a value. It rather enqueues that value. When a thread reads from the channel, it pops an element from the front of the queue, or blocks if the queue is empty.

We can think of Chan as containing a queue of infinite size, whereas MVar has a queue of size 1.

In the following example, we have two children threads writing to a channel and the main thread reading from them. We guarantee the program will only finish after but “hello world” and “now i quit” are printed, but the order is not guaranteed.

chanExample = do
  ch <- newChan
  forkIO $ do
    putStrLn "thread 1"
    writeChan ch "hello world"
  forkIO $ do
    putStrLn "thread 2"
    writeChan ch "now i quit"
  putStrLn "main thread"
  readChan ch >>= print
  readChan ch >>= print

MVar as a lock. If we run the code above, chances are the output will come out mingled, like

thrtmehaariden a 1dt
"hello world"
"now i quit"

We can make the print function atomic by using an MVar and require the threads to “acquire” it in order to invoke the print function. The following example creates an atomic version of putStrLn using the MVar lock:

atomicPutStrLn :: MVar a -> String -> IO ()
atomicPutStrLn lock msg = withMVar lock $ (\_ -> putStrLn msg)

The function withMVar() takes an MVar and a function. It extracts the value of the MVar (lock in the example above), executes the function with that value and puts the value back into the MVar. In this case, since we’re using it just for locking, we don’t care about the value inside the MVar.

We can replace putStrLn with atomicPutStrLn in our previous code:

chanExampleStrict = do
  ch <- newChan
  lock <- newMVar ()
  forkIO $ do
    atomicPutStrLn lock "thread 1"
    writeChan ch "hello world"
  forkIO $ do
    atomicPutStrLn lock "thread 2"
    writeChan ch "now i quit"
  atomicPutStrLn lock "main thread"
  readChan ch >>= print
  readChan ch >>= print

Note how we create the MVar lock with a dummy value and use it for the atomicPutStrLn calls [5].

Parallel Programming

Setup. By default Haskell only uses a single core, even if more are available. To turn on multi-threaded support, we must use the -threaded flag during compilation.

Then, in runtime, we can provide the -Nx flag to the RTS (Run Time System) when running the program (where x is a natural number representing the number of cores).

One challenge in parallelizing Haskell code is due to lazy evaluation. We have less control on when part of the code will be actually evaluated so we need to impose some strictness to guarantee it will be executed in parallel.

Weak head normal form vs. normal form

This answer on StackOverflow gives a very nice explanation between Weak Head Normal Form (WHNF) and Normal Form (NF). Copying parts of it here.

An expression in normal form is fully evaluated, and no sub-expression could be evaluated any further (i.e. it contains no un-evaluated thunks).

These expressions are all in normal form:

(2, "hello")
\x -> (x + 1)

These expressions are not in normal form:

1 + 2                 -- we could evaluate this to 3
(\x -> x + 1) 2       -- we could apply the function
"he" ++ "llo"         -- we could apply the (++)
(1 + 1, 2 + 2)        -- we could evaluate 1 + 1 and 2 + 2

An expression in weak head normal form has been evaluated to the outermost data constructor or lambda abstraction (the head). Sub-expressions may or may not have been evaluated.

(1 + 1, 2 + 2)       -- the outermost part is the data constructor (,)
\x -> 2 + 2          -- the outermost part is a lambda abstraction
'h' : ("e" ++ "llo") -- the outermost part is the data constructor (:)

These expressions are not in weak head normal form:

1 + 2                -- the outermost part here is an application 
                     -- of (+)
(\x -> x + 1) 2      -- the outermost part is an application of 
                     -- (\x -> x + 1)
"he" ++ "llo"        -- the outermost part is an application of (++)

We can analyze whether a given expression has been evaluated in ghci using the :sprint command. It prints the contents of a variable if it was already evaluated of “_” otherwise. For example:

> let a = 1 + 2
> :sprint a
a = _
> print a  -- evaluates a
> :sprint a
a = 3

This is useful to understand how things get evaluated. Another interesting example is working with lists.

> let a = [1, 2, 3, 4, 5]
> :sprint a
a = _
> length a
> :sprint a
-- We know the length but don't have to evaluate the contents
a = [_, _, _, _, _]
> sum a
> :sprint a
-- To perform a sum, we have to go over all elements
a = [1, 2, 3, 4, 5]

Your results may vary, since Haskell might decide to perform more than the bare minimum evaluation.

The seq function takes two arguments (seq x y) and before y is evaluated to the WHFN, x is also evaluated to WHFN. The par function (from the Control.Parallel module) is similar to seq but it also tries to evaluate the first argument in parallel.

The compiler might decide to evaluate the second argument of seq before the first if it thinks it would improve performance. pseq is a stricter version in which the first argument is always evaluated first.

WHFN might not be enough for parallelizing. To make sure we’re splitting an expensive task among cores, we need to force full evaluation (normal form). To see why, consider the following example, in which we want to parallelize the map function:

parallelMap :: (a -> b) -> [a] -> [b]
parallelMap f (x:xs) = let r = f x
		       in r `par` r : parallelMap f xs
parallelMap _ _      = []

If b has a nested structure, it’s not guaranteed that calling r par will fully evaluate r, since it only guarantees to evaluate the outermost constructor. To overcome that, one option is to use evaluation strategies.

Evaluation Strategies

Before talking about strategies, let’s introduce the Eval monad, provided by Control.Parallel.Strategies:

data Eval a
instance Monad Eval

runEval :: Eval a -> a

rpar :: a -> Eval a
rseq :: a -> Eval a

The rpar and rseq are the counterpart of par and pseq respectively. rpar indicates its argument can be evaluated in parallel (non-blocking), while rseq forces the evaluation of its arguments before continuing (blocking).

Using this monad, our parallelMap function would become:

parallelMap :: (a -> b) -> [a] -> Eval [b]
parallelMap f (x:xs) = do
  r  <- rpar  (f x)
  rs <- parMap f xs
  return (r:rs)
parallelMap _ _ = return []

Strategy is a design pattern in which we make the code independent of algorithms, so different algorithms can be used interchangeably. The Control.Parallel module uses it to separate the evaluation strategy from the application code.

In this context, a strategy is basically a function that takes in a type and defines a way to evaluate that type. More specifically,

type Strategy a = a -> Eval a

A simple example is defining a strategy for a pair. We can evaluate each element in parallel:

parPair :: Strategy (a,b)
parPair (a,b) = do
  a' <- rpar a
  b' <- rpar b
  return (a',b')

Note that a strategy “wraps” expression in the Eval monad. We can use runEval to “extract” that and evaluate the code. For the example with pairs, we could do:

runEval (parPair (fib 35, fib 36))

The using function expects a value and a strategy as parameter, apply the strategy over the value and then evaluates it using runEval.

using :: a -> Strategy a -> a
x `using` s = runEval (s x)

This syntax is easier to read and is more explicit about the separation of the evaluation strategy and the actual code. The pair example with using would be:

(fib 35, fib 36) `using` parPair

The problem with rpar and rseq is that they only force evaluation to Weak Head Normal Form. Thus, if the elements within the pair are nested in another constructor, they might not be fully evaluated. To solve that, we can use different evaluation strategies other than rpar.

In order to do that, we can step up one level of abstraction and generalize parPair by defining a function that defines which strategies to evaluate to each pair:

evalPair :: Strategy a -> Strategy b -> Strategy (a,b)
evalPair sa sb (a,b) = do
  a' <- sa a
  b' <- sb b
  return (a',b')

We can redefine now parPair in terms of evalPair

parPair :: Strategy (a,b)
parPair = evalPair rpar rpar

We can then use the rdeepseq which expect types implementing the NFData (NF is for normal form) interface. This strategy evaluates a structure to normal form, by traversing it recursively, by calling force.

import Control.DeepSeq

rdeepseq :: NFData a => Strategy a
rdeepseq x = rseq (force x)

parPairDeepSeq :: NFData a, NFData b) => Strategy (a,b)
parPairDeepSeq = evalPair rdeepseq rdeepseq


In this post we covered concurrency and parallelism in Haskell. We learned how to work with multi-threads using the Control.Concurrent module, which also provides mechanism for mutual exclusion and communication between threads.

In the second part of the post, we saw how to make use of multiple cores to speed up expensive parts of the code by dividing the task in smaller pieces, that can be then executed in parallel. We learned that one of the main difficulties in doing parallel work in Haskell is due to lazy evaluation. We’ve covered ways to address that problem using evaluation strategies, which are designed in such a way that they are decoupled from the actual code being parallelized.

I’ve heard about Parallel and Concurrent Programming in Haskell, written by Simon Marlow before, but didn’t have a chance to check it out. Having read the first 3 chapters so far, I think it’s really well written and easy to follow. I’m excited to read more.


[1] Real World Haskell, Chapter 24
[2] Parallel and Concurrent Programming in Haskell, Chapter 2, 3
[3] Design Patterns: Elements of Reusable Object-Oriented Software – Strategy
[4] Oracle – Processes and Threads
[5] StackOverflow – Can I ensure that Haskell performs atomic IO?

Haskell Profiling and Optimization

I’ve been practicing my Haskell skills by solving programming challenges. During my ACM-ICPC competitor days, I used to practice a lot on SPOJ. The good thing about SPOJ is that it accepts submissions in many many languages (most other sites are limited to C, C++, Pascal and Java).

There’s a Brazilian fork of SPOJ, called SPOJ-Br. I preferred using this one because it contains problems of Brazil’s national high school contests, which are usually easier.

The problem

One of the problem I was just trying to solve boils down to: Given a list of numbers, one in each line from stdin, read the first line as N. Then return the sum of the next N lines. Repeat until N = 0.

This problem is pretty straightforward to do in C++, but for the Haskell version I started getting time limit exceeded (when you program takes more time than the problem setters expected). I had no clue on what was causing this, but I happened to read the Profiling and Optimization chapter from Real World Haskell.

This chapter is particularly well-written and introduces a lot of new concepts and tool, which will be the focus of this post.

The program

The code to solve the aforementioned problem is quite simple: we convert all lines to an array of int’s (using (map read) . lines,), read the first line as n, take the first n entries and recurse for the remaining of the entries:

main = interact $ unlines . f . (map read) . lines

f::[Int] -> [String]
f (n:ls)
  | n == 0    = []
  | otherwise = [show rr] ++ (f rest)
     where (xs, rest) = splitAt n ls
           rr = sum xs
f _ = []

For this code, I’ve generated a random input of 10 chunks of 100k entries, for a total of 1M lines. Running it with the following command:

time ./test_unline_line arq.out

Resulted in:

real 0m23.852s
user 0m23.699s
sys 0m0.147s

Now, let’s profile to get more details.

Setting up profiling

To generate profiling traces from our program, we need to compile it using some extra flags. Suppose our source code is named program.hs. We can run:

ghc -O2 prog.hs -prof -auto-all -caf-all -fforce-recomp -rtsopts

Like in gcc, ghc has different levels of optimization, and in this case we’re using O2. The -prof tells ghc to turn on profiling.

When profiling our code, we need to specify the cost centers, that is, pieces of code we want to inspect. A way to do that is annotating functions. For example, we could annotate an existing function as follows:

foo xs = {-# SCC "foo" #-} ...

alternatively, we can use the -auto-all flag for adding automatic annotations to all functions, which is also less intrusive (no code modifications).

Haskell memoizes functions with no arguments, so even if we invoke then multiple times in the code, they’re only evaluated once. These functions are called Constant Applicative Forms, or CAF

We can turn on the profiling for CAF’s using the option -caf-all.

-fforce-recomp will force recompilation. ghc might not compile the file again if the source code didn’t change, but if we’re playing with different flags, then we want to force it.

Haskell compiled code is linked to a run time system (RTS) which offers a lot of options. To be able to provide this options in when running a program, we have to use the flag -rtsopts.

Static report

A quick way to gather some statistic from the running program is by passing the -sstderr flag to the RTS, so we can do

time ./prog +RTS -sstderr <

Which generated:

  18,691,649,984 bytes allocated in the heap
   1,300,073,768 bytes copied during GC
       7,155,200 bytes maximum residency (240 sample(s))
         335,312 bytes maximum slop
              21 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 35414 collections,     0 parallel,  0.76s,  0.81s elapsed
  Generation 1:   240 collections,     0 parallel,  0.35s,  0.39s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   11.89s  ( 11.97s elapsed)
  GC    time    1.10s  (  1.20s elapsed)
  RP    time    0.00s  (  0.00s elapsed)
  PROF  time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   12.99s  ( 13.18s elapsed)

  %GC time       8.5%  (9.1% elapsed)

  Alloc rate    1,572,435,294 bytes per MUT second

  Productivity  91.5% of total user, 90.2% of total elapsed

With this report, we can see things like the amount of memory used and the time spent by the garbage collector.

Time and Allocation Profiling Report

If we want a break-down by cost centers, we can run our program with the -p argument. This will generate a file:

time ./prog +RTS -p <

	Sat Sep 27 16:25 2014 Time and Allocation Profiling Report  (Final)

	   test_unline_line +RTS -p -RTS

	total time  =        0.34 secs   (17 ticks @ 20 ms)
	total alloc = 10,736,138,528 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

main                           Main                 100.0   98.8
f                              Main                   0.0    1.2

                                                                                               individual    inherited
COST CENTRE              MODULE                                               no.    entries  %time %alloc   %time %alloc

MAIN                     MAIN                                                   1           0   0.0    0.0   100.0  100.0
 main                    Main                                                 242           2 100.0   98.8   100.0  100.0
  f                      Main                                                 243          11   0.0    1.2     0.0    1.2
 CAF                     Text.Read.Lex                                        204           4   0.0    0.0     0.0    0.0
 CAF                     GHC.IO.Handle.FD                                     174           3   0.0    0.0     0.0    0.0
 CAF                     GHC.IO.Encoding.Iconv                                135           2   0.0    0.0     0.0    0.0
 CAF                     GHC.Conc.Signal                                      128           1   0.0    0.0     0.0    0.0

From the above, we can see most of the time is being spent on the main function.

Graphic Allocation Profiling Report

The report above is useful for profiling the overall time and memory allocation in the program. We can also see a time-series of heap allocation. We can break down by different dimensions, one common is by cost-center, which is done simply by adding the -hc flag:

time ./prog +RTS -p -hc <

This will generate a prog.hp file (heap profile). In our case, the file contained only a few samples (points), which might not give a good picture of the memory behavior. We can provide another parameter -iP, where P is the period of sampling. Doing it with P=0.01,

time ./prog +RTS -p -hc -i0.01 <

We get much more samples. Now we can use a tool to parse this data into a chart using gnuplot. The output format is post script. In my case, I find it better to use a pdf viewer, so I’ve used another tool, ps2pdf :)

$ hp2ps -c prog.hp
$ ps2pdf

The -c option tells hp2ps to use colors in the charts (it’s grayscale by default). After opening the generated prog.pdf in our favorite pdf reader we get:

Figure 1: Heap allocation

Figure 1: Heap allocation

The spikes in the chart represent the chunks of lists. Ideally haskell could be completely lazy and stream the lists instead of loading it up into memory.

Optimizing our code: Bang Patterns

One way to avoid loading up the entire list in memory is to write the sum function as an accumulator. In the example below, g' will recursively accumulate the sum of each chunk and append the result with the results of the remaining chunks:

main = interact $ unlines . g . (map read) . lines

g::[Int] -> [String]
g (n:ls)
  | n == 0    = []
  | otherwise = g' n ls 0
g _ = []

g' n (l:ls) cnt
  | n == 0 = [show cnt] ++ (g (l:ls))
  | otherwise = g' (n-1) ls (cnt + l)

If we run this code this won’t actually improve the memory footprint comparing to the previous attempt. The reason it that at each recursion call, because we’re doing it lazily, we keep a reference to the variable cnt until we reach the end of the chunk to compute the sum.

A way to force strictness for cnt is using a language extension, the Bang Patterns. We just need to do two changes in the code above: add the macro at the top of the file and in the g' function definition, use !cnt. This will force cnt to be evaluated strictly (instead of lazily).

{-# LANGUAGE BangPatterns #-}
main = interact $ unlines . g . (map read) . lines

g::[Int] -> [String]
g (n:ls)
  | n == 0    = []
  | otherwise = g' n ls 0
g _ = []

g' n (l:ls) !cnt
  | n == 0 = [show cnt] ++ (g (l:ls))
  | otherwise = g' (n-1) ls (cnt + l)

The resulting heap graph shows the benefits of doing this:

Figure 2: Heap allocation - optimized version

Figure 2: Heap allocation – optimized version

Note that now the maximum memory usage is 45K, 2 orders of magnitude less than the 3M of our initial version. Even though changing the code we managed to use less memory, it didn’t improve the runtime significantly (both were around 4 secs). It’s time to investigate other tools and strategies.

The core

Another idea to improve the performance of a program is to get the optimized code that is generated by the ghc compiler before transforming it into imperative machine code, which is know as the core. It’s still a valid Haskell syntax, but it’s very hard to read. In [1], the authors suggest investigating the core to identify where the compiler is not optimizing properly and then modify the original code to help it.

We can generate the core code by compiling with the following instructions:

$ ghc -O2 -ddump-simpl prog.hs

The problem is that this generates a inlined code, which makes it hard to understand what’s going on. I couldn’t get any good idea from the core from prog.hs, but we can take a learn a bit how to better interpret it. There are some interesting constructions here:

Function annotations. Every function generated has annotations, which make it harder to read:

Main.f [...]
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType S]

According to [2], those are used by ghc for later stages of compilation. For inspecting the code though, we could remove the annotations.

Primitive types and boxed types. Most Haskell types are high-level abstractions that only point to data in the heap, but to not contain the actual values. For one side, it leads to cleaner programs and simpler APIs, but on the other hand it adds an overhead. When optimizing code, the compiler will try to convert types to the primitive versions (unboxing), so it’s rare that we’ll need to work with primitive types directly.

By convention primitive types are ended on the # sign. For example, comparing to C primitive types in parenthesis, we have Int# (long int), Double# (double), Addr# (void *).

We also see the GHC.Prim.State# type. GHC.Prim contains a collection of unboxed types. In particular, the State monad has a primitive type and it appears in the generated code because the IO monad (used in the function main through interact) can be written in terms of the State monad [2].

Gonzalez studies the core in more depth in this blog post, especially in regards to the IO monad. It’s an interesting read.

Strings vs. Bytestrings

Still stuck in the running time problem, I decided to ask a question on Stack Overflow and even though the main question was not answered, someone suggested using ByteStrings instead of Strings.

Chapter 8 of Real World Haskell actually talks about ByteStrings as a cheaper alternative to Strings [1]. The reason is that Strings are essentially a List of Chars, so we have 2 layers of abstraction, while ByteString is a data structure specialized for strings.

One limitation of ByteStrings is that it only works with 8-bit characters. For Unicode handling, the most common alternative is Data.Text, which also has a low overhead compared to String and can handle Unicode [3].

Converting the code was basically changing the Prelude function calls dealing with Strings to use the ByteStrings. Since most of the functions have the same name, we had to qualify them.

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

main = BC.interact $ BC.unlines . f . (map (getInt . BC.readInt)) . BC.lines

getInt::Maybe (Int, BC.ByteString) -> Int
getInt (Just (x, _)) = x
getInt Nothing       = 0

f::[Int] -> [B.ByteString]
f (n:ls)
  | n == 0    = []
  | otherwise = [(BC.pack . show) rr] ++ (f rest)
     where (xs, rest) = splitAt n ls
           rr = sum xs
f _ = []

Running this code yields a running time of 0.4s, roughly 10x faster than the String version. This was enough to make this program pass in the online judge.


This post was motivated by a slow running program when solving a programming challenge. While looking for ways to improve it, we learned some techniques for profiling code. We also learned about the core, which is a optimized (and hard to read) Haskell code generated by ghc during the compilation.

It turned out it the improvement necessary to speed up the code was only swapping String with ByteString. Henceforth, I’ll make sure to always use ByteStrings, at least when writing programming challenges solutions.

Upon writing this post, I stumbled into Zyang’s posts, which seem to delve into great detail on the core functionality. I didn’t have time to read, but I’ve bookmark those for future reading: Unraveling the mystery of the IO monad and Tracing the compilation of Hello Factorial!.


[1] Real World Haskell – Chapter 25: Profiling and optimization
[2] Haskell for all – “Hello, core!”
[3] School of Haskell – ByteString Bits and Pieces

Monad Transformers

In this post we’ll talk briefly about Monad Transformers. Chapter 18 from the Real World Haskell inspired this, but as usual, it was a bit hard for me to digest. The best source so far for this subject was the Haskell wiki [1].


In a high-level, monad transformers are monads generated by combining monads into a new one (thus transforming monads in monads).

Intuitively, it does so by using the analogy of wrapping, as we have for Monads, but monad transformers wraps monads inside monads. Thus, Dan Piponi makes an analogy of onion layers [3]. The idea of transformers is to avoid boilerplate in common scenarios where two monads are used in conjunction.

We’ll present some monad transformers, all follow the pattern where two monads are combined, the first one is fixed, and the other is generic. The fixed monads are Maybe, List and State and their correspondent transformers are called MaybeT, ListT and StateT, respectively. The Writer and Read monads also have corresponding transformers, but we’re not talking about them here.

As in [1], we’ll be more detailed in describing the MaybeT which is the simplest of our examples, and for the other two, we’ll limit ourselves to the definition and a brief explanation.

The MaybeT monad

Let’s start by recapping the Maybe monad, as seen in a previous post.

Review of the Maybe monad

The Maybe data type can be defined as follows:

data Maybe a = Nothing | Just a
 deriving (Show)

The implementation of the monad interface is the following:

instance Monad Maybe where
  return  = Just
  Just x   >>= f = f x
  Nothing  >>= f = Nothing

Remembering, return wraps an element in the monad and >>= is the bind operator, which takes an element wrapped in a monad, extracts it, applies a function f, which returns another element wrapped in the monad.

The MaybeT data type

Monads can contain other monads and one particular useful combination is of a monad containing the Maybe monad. While we can accomplish that by regular use of monads, we can also avoid some boilerplate code by having a special type that encodes this combination, in this case the MaybeT data type.

We can think of MaybeT data type as a 3-layer wrapping. The inner layer being the Maybe monad, then a generic monad and then the actual MaybeT wrapper.

newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) } 

MaybeT is a Monad

MaybeT is also a monad. One possible implementation is:

instance Monad m => Monad (MaybeT m) where
  return  = MaybeT . return . Just
  x >>= f = MaybeT $ do maybe_value <- runMaybeT x
                        bindOfMaybe f maybe_value

bindOfMaybe f maybe_value = case maybe_value of
  Nothing    -> return Nothing
  Just value -> runMaybeT $ f value

Let’s break in parts:

(1) return = MaybeT . return . Just

The first part is Just, which encapsulates the inner element in the Maybe monad. The second return encapsulates in the generic monad m and finally we encapsulate in the MaybeT monad.


x >>= f = MaybeT $ do maybe_value <- runMaybeT x
                      bindOfMaybe f maybe_value

The type signature is given by:

(>>=) :: MaybeT m a -> (a -> MaybeT m b) -> MaybeT m b

The bind operation has to do the opposite operation first, that is,
de-encapsulate the three layers of monads before running f on it.

Then, we need to encapsulate into Maybe, m and MaybeT again

Alternatively, we can use the chained notation:

x >>= f = MaybeT $
            runMaybeT x >>=
              \maybe_value -> bindOfMaybe f maybe_value

The ListT monad

Let’s review the list [] monad, as we saw in a previous post.

instance Monad [] where
  return x = [x]
  xs >>= f =
      let yss = map f xs
       in concat yss

The idea is very similar to the Maybe monad, we wrap the list in two other layers, the intermediate one being a generic monad. The implementation of the monad class type is essentially the same, except that, again, we have to do extra wraps and unwraps:

newtype ListT m a = ListT { runListT :: m [a] }

instance (Monad m) => Monad (ListT m) where
  return x = ListT $ return [x]
  tm >>= f = ListT $ do xs  <- runListT tm
		        yss <- mapM (runListT . f) xs
                        return (concat yss)
  -- Alternatively
  -- tm >>= f = ListT $ runListT tm
  --                      >>= \xs -> mapM (runListT . f) xs
  --                        >>= \yss -> return (concat yss)

The StateT monad

We’ve talked about the State monad before. It can be defined in the following way:

newtype State s a =
    State { runState :: (s -> (a,s)) }

And the monad implementation is given by:

instance Monad (State s) where
    return a        = State $ \s -> (a,s)
    (State x) >>= f = State $ \s ->
          let (v,s') = x s
                  in runState (f v) s'

The idea of combining it with another generic monad and wrapping it, leads to a analogous idea to the List/ListT classes. Let’s define the StateT class:

newtype StateT s m a =
    StateT { runStateT :: (s -> m (a,s)) }

In this case, the generic monad wraps the result of the previous function s -> (a, s). The monad implementation is similar to the State one, except that we have to take into account the extra layer:

instance (Monad m) => Monad (StateT s m) where
  return a         = StateT $ \s -> return (a,s)
  (StateT x) >>= f = StateT $ \s -> do
    (v,s') <- x s
    runStateT (f v) s'

For the return definition, we wrap (a, s) in using the m monad, by the use of return function of m.

The bind operator, x applied to s will return m (a, s). We extract if from m by using the <- operator and then run the function on a, but since (f v) returns the result wrapped in StateT, we need to extract it using runStateT, and finally wrap into m again and then into StateT.

The MonadTrans interface

All the monad transformers can implement the MonadTrans interface, which basically defines the function lift.

ghci> :m +Control.Monad.Trans
ghci> :info MonadTrans
class MonadTrans t where lift :: (Monad m) => m a -> t m a
  	-- Defined in Control.Monad.Trans

Lift is a generic version of liftM, in a sense it allows a function that only applies to the inner element to be applicable to the top-level monad. The implementation of the MaybeT monad is the following:

instance MonadTrans MaybeT where
    lift = MaybeT . (liftM Just)

Stack of monads

After understand better the concept of Monad transformers, Chapter 18 from Real World Haskell [2] becomes easier to digest and it’s quite interesting.

At one point, it discusses the real power of combining multiple monad transformers (for example, the generic monad in MaybeT could be another monad transformer, say WriterT). This “equips” a given data type with traits corresponding to the underlying monads (in the example the optional nature of Maybe and the logging capabilities of the Writer monad).


[1] Wikibooks: Haskell/Monad
[2] Real World Haskell: Chapter 18. Monad transformers
[3] A Neighborhood of Infinity: Grok Haskell Monad Transformers

An Introduction to the Parsec Library


The process of compiling or interpreting programs requires, as one of its steps, parsing the source code and structuring them. More specifically, we have a step to convert a string into a set of tokens, called lexical analysis, which is carried out by a lexer or tokenizer.

After that, we structure those tokens in a such a way that encodes meaning of these tokens, such as abstract syntax trees (AST). This step is called parsing and is out by a parser.

The parsec library

Parse combinators are high-order functions that combine smaller combinators (parsers) to build more complex ones. This resembles the idea of context free grammars (CFG), which we talked about in a previous post, where we have productions of form

S \rightarrow A | B

Here, S can be formed by either from A or B, which in turn, can be other productions or terminals.

The parsec library is an implementation of a parser combinator in Haskell. We talked about combinators in Haskell previously (in portuguese).

Parsec vs. Bison/Yacc/Antlr

All Bison, Yacc and Antlr are not actual parsers, but rather parsers generators. They take a grammar file and generate parsers for the languages that can be described by those grammars.

Parsec, on the other hand, is a parser that you write yourself.

In this post we’ll go through several basic concepts of the parsec library, using as the main reference, the book Real World Haskell, Chapter 16.

The source code for all the examples to follow can be found on the blog’s github repository.

Basic combinators

The basic idea of a parser is that it takes an input (for example, a string), it consumes the characters of this string until it’s done with what it’s supposed to parse and then pass the remaining (unparsed) string along, which might be used as input to subsequent parsers.

One of the simples parsers is the one that only consumes a single specific character. There’s a parser named char at Data.Char library, so let’s write one that parses the letter a:

import Data.Char (char)
charAParser = (char 'a')

To test our parser with an input, we use the parse function form Text.Parsec

import Text.Parsec
-- parsedInput = parse someParser "source name" "some input"

This function takes as input a parser, a source name and the actual input. The source name parameter is not important for us now, so we can pass the empty string. Let’s write a simple wrapper to avoid boilerplate:

test p = parse p ""

We can now test our parser with same sample inputs:

> test charAParser "a"
Right 'a'
> test charAParser "ab"
Right 'a'
> test charAParser "ba"
Left (line 1, column 1):
unexpected "b"
expecting "a"

It extracts the first character of the input string if it’s the 'a' character, otherwise it throws an error. If we want to match any char, there’s also the function anyChar. Running it with the same examples:

> test anyChar "a"
Right 'a'
> test anyChar "ab"
Right 'a'
> test anyChar "ba"
Right 'b'

Note that it doesn’t fail for strings starting with 'b'. So far our parsers only match one character, so for example, the string "ab", it only returns the first character.

We can use a parser for the string too. There’s the string combinator but let’s develop our own and show how we can combine combinators to form new ones. There’s the many combinator that applies the combinator passed as argument until it fails.

Thus, we can write a string parser as many anyChar:

stringParser:: Parsec String st String
stringParser = many anyChar

Now let’s try it with the string "ab":

> test stringParser "ab"
Right "ab"

More useful than matching all characters is matching all except some, so we know when to stop parsing. For that, we can use noneOf instead of anyChar. It takes a list of characters as parameter and matches any character that is not on that list.

Let’s now write a wordParser, which keeps parsing all characters until it finds an whitespace:

wordParser:: Parsec String st String
wordParser = many $ noneOf [' ']

Let’s try it on the most classical string example:

> test wordParser "hello world"
Right "hello"

Note that our parsers are throwing away all the unparsed strings. How can we parse the remaining, unparsed string?

The two flavors of the Parsec library: Monads and Applicatives Functors

We’ve talked about Functors and Monads before, but not about Applicatives functors.

Intuitively, they are a structure in between Functors and Monads, that is, they’re more complex and general than Functors but less than Monads. We can also make the analogy of wrappers that we did for monads.

Originally, the Parsec library was written with Monads in mind, but Applicative functors were introduced after that and using them to write parsers usually leads to more clear syntax. So, in this post, we’ll use the applicative flavor to write our parsers.

Here, we’ll only provide an overview of some of the main applicative operators. For further details, the book Learn You a Haskell for Great Good has a nice introduction to Applicatives.

Operators Cheat Sheet. We can use the Maybe typeclass to illustrate the main applicative operators, since it implements an applicative functor.

() Unwrap the contents of both sides, combine them and wrap again

> Just (+3)  Just 9
Just 12

(*>) Unwrap the contents of both sides, but discard the result on the left

> Just (+3) *> Just 9
Just 9
> Just 7 *> Just 8
Just 8

(<*) Unwrap the contents of both sides, but discard the result on the right.

> Just 7 <* Just 8
Just 7

() Unwrap the contents of the right, combine the left and right arguments and return

> (+3)  Just 9
Just 12

(<$) Unwrap the contents of the right, but only wrap the one to the left

> 3 <$ Just 9
Just 3

This analogy of wrappers applied to parsers is not as natural though. In this case, we can think of unwrapping as executing the parser, by consuming the input and wrapping the result as getting the parsed token. The unparsed string is always carried over from parser to parser.

Hopefully with the next examples this will become clearer:

Parsing the second word

If we are to get the token from the second parser instead of the first, we need to execute both parsers but ignore the result of the first. Thus, we can use the operator (*>) to obtain something like

wordParser *> wordParser

This won’t quite work because the first parser doesn’t consume the whitespace, so the second parser will stop before consuming anything. We can fix that by consuming the whitespace:

wordParser *> (char ' ') *> wordParser

So let’s write:

secondWordParser:: Parsec String st String
secondWordParser = wordParser *> (char ' ')  *> wordParser

and now we can test:

> test secondWordParser "ab cd"
> cd

Parsing two words

We can also return both tokens if we use the operator () and then combine them into a list:

twoWordsParser:: Parsec String st [String]
twoWordsParser = listfy  wordParser  ((char ' ') *> wordParser)
                   where listfy a b = [a, b]

Parsing multiple words

Generalizing, we can parse multiple words with the aid of the many combinator:

wordsParser:: Parsec String st [String]
wordsParser = (:)  wordParser  many ((char ' ') *> wordParser)

We could actually write this using the sepBy1 parser, which parses a list of tokens separated by a separator and requires the list to have at least one element:

wordsParserAlt:: Parsec String st [String]
wordsParserAlt = sepBy1 (char ' ')

Simple CSV parser

With what we’ve seen so far, we can write a very basic CSV parser in 4 lines of code.

csvParser:: Parsec String st [[String]]
csvParser = lineParser `endBy` newline <* eof
              where lineParser = cellParser `sepBy` (char ',')
                    cellParser = many $ noneOf ",\n"

Note that it doesn't handle some corner cases like escaped commas within cells. For a full example, refer to either [1] or [2].

Choosing combinators

Recall that in Context Free Grammars, we can have production rules of the type:

S \rightarrow A | B

which means that S can be generated either from A or B. In Parsec, we can express this option using the () operator. Let’s write a simple parser that parses either the "cat" or "dog" strings:

dogCatParser:: Parsec String st String
dogCatParser = (string "dog")  (string "cat")

Testing on some inputs:

> test dogCatParser "dog"
Right "dog"
> test dogCatParser "cat"
Right "cat"
> test dogCatParser "elephant"
Left (line 1, column 1):
unexpected "e"
expecting "cat" or "dog"

Let’s write another example with different animal names:

camelCatParser:: Parsec String st String
camelCatParser = (string "camel")  (string "cat")

and try again with the input "cat":

> test camelCatParser "cat"
Left (line 1, column 1):
unexpected "t"
expecting "camel"

The parser failed because the strings have common prefix. It started matching the camel parser, but it also consumed the "ca" characters and then it failed to match the cat parser.

The try combinator

To avoid this, problem, there’s the try combinator, which will make a parser to not consume its input if it fails to match:

camelCatTryParser:: Parsec String st String
camelCatTryParser = try (string "camel")  (string "cat")

which works as expected:

> test camelCatTryParser "cat"
Right "cat"

We can see that it’s straightforward to convert a standard context free grammar into a haskell program using parsec.

Simple Expression Parser

So far we our parsers have only returned strings and list of strings. We can use data types to structure our parsed data in a way that is easier to evaluate later.

For our example, we’ll build a very simple parser for expressions that only contain + and - binary operators, terminals are all integers and all binaries are surrounded by parenthesis so we don’t have to handle precedence. Examples of valid expressions are "12", "(1+2)", "((3+4)-5)", whereas "1+2" is invalid (no parenthesis).

The first thing we want to do is to define our data types. Our number type, TNumber, is just an alias to Int. Our operator type, TOperator can be one of addition (TAdd) or subtraction (TSubtract). Finally, the expression is either binary (TNode) or a number (TTerminal).

type TNumber = Int

data TOperator = TAdd
               | TSubtract
                 deriving (Eq, Ord, Show)

data TExpression = TNode (TExpression) TOperator (TExpression)
                 | TTerminal TNumber
                   deriving (Show)

From what we’ve seen so far, it’s not very complicated to write parsers for TNumber and TOperator:

numberParser:: Parsec String st TNumber
numberParser = read  (many $ oneOf "0123456789")

operatorParser:: Parsec String st TOperator
operatorParser = chooseOp  (oneOf "+-")
                   where chooseOp '+' = TAdd
                         chooseOp '-' = TSubtract

For the expression we have two choices. Either we parse another expression enclosed in parenthesis or we parse a terminal. In the first case, we call the binaryExpressionParser which looks for the left expression, the operator and then the right expression.

expressionParser:: Parsec String st TExpression
expressionParser = (between (char '(') (char ')') binaryExpressionParser) 
                   (TTerminal  numberParser)

binaryExpressionParser:: Parsec String st TExpression
binaryExpressionParser = TNode  expressionParser  operatorParser  expressionParser

And that’s it! We can now run an example with a valid expression:

> test expressionParser "(123+(324-456))"
Right (JNode (JTerminal 123) JAdd (JNode (JTerminal 324) JSubtract (JTerminal 456)))

The advantage of having this AST is that it’s now very simple to evaluate:

evaluate:: TExpression -> TNumber
evaluate (TNode exp1 TAdd exp2)      = (evaluate exp1) + (evaluate exp2)
evaluate (TNode exp1 TSubtract exp2) = (evaluate exp1) - (evaluate exp2)
evaluate (TTerminal v)               = v

And the final test:

> let Right x = expressionParser "(123+(324-456))"
> evaluate x

It works! We implemented a simple parser and interpreter for a very limited arithmetic expression. There are much better tools to do expression parsing (see [5] for a tutorial), but it’s out of the scope of this post.


We’ve learned the basics of the Parsec library and built some non-trivial parsers gluing together basic parsers using combinators. We even started scratching the parsing of programming languages by writing a parser for arithmetic expressions.

The Parsec applications presented in the Real World Haskell book are great. I felt that the content was a bit hard to follow, but writing helped me get a better understanding of the subject.


[1] Real World Haskell – Chapter 16
[2] A gentle introduction to Parsec
[3] StackOverflow – Parsec vs Yacc/Bison/Antlr: Why and when to use Parsec?
[4] Wikipedia – Parser Combinator
[5] Haskell Wiki – Parsing expressions and statements

An Introduction to Agda


Per Martin-Löf is a Swedish logician, philosopher and mathematician statistician. Since 1970 he has been mainly publishing on logic.

Martin-Löf has been active in developing intuitionistic type theory as a constructive foundation of mathematics and his work on type theory has influenced computer science.

He completed his PhD program under supervision of Andrei Kolgomorov and is currently Professor at Stockholm University, being a member of the Royal Swedish Academy of Sciences.

In this post we’ll talk about assisted theorem proving, more specifically using a tool named Agda, which has it’s syntax based on Haskell and the engine is based on the Per Martin-Löf’s intuitionistic type theory.

We’ll first talk about the basic concepts such as assisted theorem proving and dependent type theory and then present a very brief introduction to Agda based on a tutorial from Ulf Norell and James Chapman [3].

Assisted Theorem Proving

We can classify theorem proving in the following categories:

1) Manual
2) Manual with machine support
3) Interactive
4) Automated

(1) is the oldest, most known and widely used method by mathematicians. (2) Since the introduction of computers, we now can have machines to verify syntax and make manual proofs less error prone. (3) We can also write proofs in a way the machine understands so that it can perform some logical deductions steps for us. (4) In the last extreme case, we can let the machine do all the job and find proofs by itself.

Dependent Types

Basically speaking, dependent type is a type that depends on a value. In typed programming languages we can have a type like vector T, representing a vector containing elements of type T, but in a dependent type language we could also have vector T n, representing vectors with elements of type T and length n. We’ll see an example for this type in the next section using Agda.


Agda is a dependent type language based on intuitionistic type theory, introduced by Martin-Löf. It was first implemented by Catarina Coquand (who is married to Thierry Coquand, which inspired the Coq tool, another famous assisted proof tool). Later it was rewritten as Agda 2 by Ulf Norell, one of the authors of the tutorial we’re going to follow.


The first thing we want to do before starting is to get Agda running. This can be done using the cabal package manager from Haskell:

> cabal install Agda

On Mac, this will install some binaries to ~/Library/Haskell. We want to use some of those binaries, so let’s include in our bin path:

export PATH="$HOME/Library/Haskell/bin:$PATH"

Agda has a very good integration with Emacs. On Mac we can make use of that integration in Aquamacs. To set up the right configuration into the .emacs file, we can now run:

> agda-mode setup

Main Concepts

Latex and Unicode. Agda uses makes use of unicode characters to make the code more readable and accepts LaTeX syntax to define them. For example, we can input \rightarrow which will be transformed into the character. Also there are shortcuts to avoid typing a lot (for example \r is expanded to \rightarrow)

Delimiters. Agda is very flexible in naming, by only considering very few characters as delimiter (including space, parenthesis and curly brackets). This means we can have names like :<a*, but then we have to be careful in adding space whenever we need to separate two variables.

Modules and name convention. Agda files have the extension .agda. For our tutorial, let’s create one called basic.agda. The first thing we want to do is to define the module, which is used to manage namespaces. By convention, the module name has to match the file name, so we write:

module basic where

Data Types. Our basic working blocks will be types and we can define them by data types, which have syntax similar to data types in Haskell. For example, we can model the boolean type, which we call Bool. In the following code we have the type named Bool which is a subset of Set (pre-defined) and we provide two constructors, both being of the type Bool.

data Bool : Set where
  true : Bool
  false : Bool

In the same manner we can define the set of natural numbers. Since it’s impossible to explicitly define all value like we did for booleans, we do so by defining a function suc recursively, which takes a Nat and returns its successor, like in the following example:

-- Natural numbers
data Nat : Set where
  zero : Nat
  suc  : Nat -> Nat

Note that we are saying the constructor suc represents a function that receives a Nat and return a Nat, but we are not providing the implementation for it.

So let’s write some functions to work with the Nat type. More specifically, addition and multiplication.

Functions. In Agda, when we write a function surrounded by underscores, like _+_, we’re defining a function that takes two arguments using infix notation. Let’s define the addition operation:

-- Addition
_+_ : Nat -> Nat -> Nat 
zero  + m = m
suc n + m = suc (n + m)

Note how we define everything in terms of the basic elements from Nat, that is, zero and suc, we have no notion of values. This is important because if we want to encode proofs as programs, we have to be generic and ‘value agnostic’. In this function we pattern match on different values of the first argument: If it’s zero, we are in the base case. Otherwise, the first argument must be a successor of some another value n, so we do recursion on n until it hit the base case.

Multiplication is analogous, but here we use the _+_ operator to define the second matched pattern:

-- Multiplication
_*_ : Nat -> Nat -> Nat
zero  * m = zero
suc n * m = m + n * m

The code above will cause an error because Agda requires strictly different priorities for different operators.

We can attribute some random values to these operators using infixl, which takes a priority value and a function. The higher the value, the higher the priority. Since we want multiplication to have higher priority than addition we can assign a higher number for it, for example:

-- Defining priority
infixl 6 _*_
infixl 4 _+_

Parametrized types. We can define a type that depends on other types, like parametrized types in Haskell or templates in C++. In this tutorial, we’ll create a list of elements of a given type A, as in the following data type:

-- Definition of a list with parametrized type
data List (A : Set) : Set where
  -- empty operator
  [] : List A
  -- append operator
  _::_ : A -> List A -> List A

Let’s write the map function over List. Differently from Haskell, Agda doesn’t perform type inference by default, so we need to explicitly provide the types when defining/calling a function that depends on other types. In the following example, the types A and B must be provided as parameters:

map1 : (A : Set) -> (B : Set) -> (A -> B) -> List A -> List B 
map1 A B f [] = []
map1 A B f (x :: xs) = f x :: map1 A B f xs

We have the following syntax sugar to combine two or more types that are subset of the same type:

(A : Set) -> (B : Set) is equivalent to (A B : Set)

Implicit Types. We have the option to make types implicit and let Agda figure out the right one. The syntax is to use curly brackets instead of parenthesis:

map2 : {A B : Set} -> (A -> B) -> List A -> List B 
map2 f [] = []
map2 f (x :: xs) = f x :: map2 f xs

Dependent Types. So far the types and operations we discussed so far don’t seem to show the real power of dependent types. So let’s define a new version of List, enhanced with the length of the list, which we’ll then call Vector:

data Vec (A : Set) : Nat -> Set where
  []   : Vec A zero
  -- Make the length type implicit
  _::_ : {n : Nat} -> A -> Vec A n -> Vec A (suc n)

Here we have two parameters when defining the data type, Nat -> Set. Notice also how when defining the list recursively we update the value n properly.

One nice thing about this is that in defining a head function, we don’t need to handle the empty case as long as we restrict the type of non-empty lists:

-- Doesn't need to check for empty lists. Restriction is encoded in
-- the function type!
head : {A : Set}{n : Nat} -> Vec A (suc n) -> A
head (x :: xs) = x 

Proofs as types. We are now going to write an example on how to use dependent types to encode proofs ( as types.

Let’s start by defining data types to represent the two possible values of a boolean. Note that we don’t specify constructors here.

data False : Set where
data True  : Set where

Now we define a function to convert a boolean values into these types:

isTrue : Bool -> Set
isTrue true = True
isTrue false = False

Let’s, for instance, define a function to be use as predicate, in this case, the < operator:

-- Definition of the < operator
_<_ : Nat -> Nat -> Bool
_ < zero = false
zero  < suc n = true
suc m < suc n = m < n

We will need the following function, that returns the length of a list, in our final example:

-- Definition of the length function
length : {A : Set} -> List A -> Nat
length [] = zero
length (x :: xs) = suc (length xs)

Finally, we can now write a sample function that encodes a predicate as a type. In the following example, the function lookup accepts one member of the family of predicates in which n less than the size of the list is a true statement.

-- Function that looks up the i-th element in a list
-- param 1: list
-- param 2: look up index
-- param 3: proof
lookup : {A : Set}(xs : List A)(n : Nat) -> isTrue (n < length xs) -> A
lookup [] n ()
lookup (x :: xs) zero p = x
lookup (x :: xs) (suc n) p = lookup xs n p

This concludes our brief introduction to Agda syntax and dependent types. In following posts we’ll learn more about writing proofs and have Agda to perform logical derivations for us.


I have close to zero knowledge in formal systems. I’ve been exposed briefly to it by reading parts of Gödel, Escher and Bach and the subject sounds very interesting. I based most of the content of this post in a few introductory tutorials [3-6], but I still didn’t get a good grasp of the power of systems like Agda.

I was initially interested in learn Coq but due to some problems in setting it up, I ended up looking for alternatives and found Agda. It seems that Coq is based on Ocaml and Agda is based on Haskell and since I’ve been focused in learning Haskell, it was a pleasant surprise. Agda has also has a good integration with Emacs, which is my favorite editor, so this was a plus.


[1] Per Martin-Löf – Wikipedia
[2] Proof Assistant – Comparison of Systems – Wikipedia
[3] Dependently Typed Programming in Agda – Ulf Norell and James Chapman
[4] Interactive Theorem Proving, Lecture Notes 0 – Anton Setzer
[5] Interactive Theorem Proving, Lecture Notes 1 – Anton Setzer
[6] Interactive Theorem Proving, Lecture Notes 2 – Anton Setzer