Turing Machines and Undecidability


Jeffrey Ullman is a professor of Computer Science at Stanford. He is famous for his book Introduction to Automata Theory, Languages, and Computation. He teaches the Automata course in Coursera, and I’ve just finished the most recent edition.

Ullman’s research interests include database theory, data integration, data mining, and education using the information infrastructure. He was Sergey Brin’s PhD advisor and won the Knuth Prize in 2000 [1].

We divide this post in two parts. In the first part, we’ll provide a brief overview of automata theory, including basic concepts and terminology. In Part 2, we’ll focus on the limitations of automatons by discussing undecidability.

Part 1: Automata Theory

In the following sections, we’ll go from the more limited automata, to a push-down automata and then to a turing machine.

Finite automata

We can think of a finite automaton as a state machine. It’s represented by a directed graph where the nodes represent states and each edge (u, v) connecting two states has a label, which is a symbol from a given alphabet \Sigma. Consider an initial state q_0, a set of final states F and a string w of the alphabet \Sigma.

To be more generic, we can describe the edges of the graph as a transition function. It takes as input a state p and a symbol c and outputs a new state q. We denote such function as \delta, so we have \delta(p, c) \rightarrow q.


Figure 1: Sample DFA

We can then decide if input w is accepted by this automaton by simulating it on the automaton. That consists in consuming each character c from w at a time and moving from one state to another by following the edge labeled c. If there’s no such edge, we can assume it goes to a dead-end state where it gets stuck and is rejected. If after consuming all the input it is in one of the final states, we say it’s accepted, otherwise it’s rejected.

If there’s always at most one edge to take at any point of the simulation, we have a deterministic way to decide wheter w is accepted, so we call such class of automaton, determinist finite automata or DFA for short. In case it has multiple choices, we have a non-deterministic finite automata, or NFA.

Surprisingly, a DFA is as powerful as a NFA, because we can simulate a NFA with a DFA, though such DFA might have an exponential number of states. DFA’s are also as powerful as regular expressions, which is out of the scope of this post.

We define a set of strings as a language. We say that an automaton defines a given language if it accepts all and only the strings contained in this language.

In the case of finite automata, the class of languages that they define is called regular languages.

Push-down automata

A push-down automata is essentially a finite automata equipped with a stack. The symbols that can go into the stack are those from the input and also some additional ones. In this automaton, the transition function takes as input, the state p, the input symbol c and the element at the top of the stack X. It outputs the new state q and the new top of the stack that will replace the top X, which can be more than one element or event the empty symbol, meaning that we didn’t put any element on top.

This type of automaton can be shown to be equivalent to a context-free grammar (CFG), which is out of the scope of this post as well, but it’s very common for describing programming languages.

The set of languages for which there is a push-down automata that define them, is called context-free languages, which contains the regular languages.

Turing machines

What could we accomplish with a finite automaton with 2 stacks? Quite a bit. Such automata are called two-push down stack automata. It’s possible to show that a finite automaton with 2 stacks is equivalent to a Turing machine.

So, what is exactly a Turing machine? It’s a device with an state, an infinite tape and a needle that points to a given position on this tape. The input w is initially written on the tape and the needle points to the first character of w. All the other positions in the tape contains the blank symbol.

Figure 2: A concept of the Turing Machine

The transition function that takes as input the state p and the input symbol pointed by the needle c and it outputs the new state q, the new symbol that will be written replacing c and the direction that the needle moves after that (either right or left). In case there’s no defined transition function for a state and input symbol, we say that the machine halts.

There are two main flavors of TM regarding when an input considered accepted: On options is that the TM has a set of final states and whenever we get to one of that states, we accept. The other option is by accepting when the TM halts.

It’s possible to show that in the general case both versions are equivalent, in the sense that both define the same class of languages, which was historically denoted by recursive enumerable languages.

An algorithm for a decision problem is a special type of TM that accepts by final state and it’s always guaranteed to halt (whether or not it accepts). The class of languages that are defined by algorithms are called recursive languages.

Extending Turing machines

We can simulate more general TMs using our definition of TM. For example, we can model a TM with multiple tracks. Instead of having a single element, each position on the tape can contain a tuple, where each element of the such tuple represents the element on each tape with the same position.

We can even have multiple needles, by using a dedicated tape for representing the current position of each additional needle.

Part 2: Undecidability

In this second part of the post we’ll discuss about undecidability. A problem is said undecidable if there’s no algorithm to solve it, that is, no TM that is guaranteed to halt on all its inputs.

The Turing Machine Paradox

Question: Are Turing machines powerful enough so that for any language there exists a corresponding TM that defines it?

Not really. We will next define a language that creates a paradox if a TM exists for it. Before that, we first discuss how we can encode TM’s as strings so it can be used as input to other TM’s.

Encoding Turing machines. Given a description of a Turing machine (the input, the transition functions), it’s possible to encode it as a binary string. Each possible Turing machine will have a distinct binary representation. We can then convert these strings to integer. Since two strings can be converted to the same integer because of leading zeroes (e.g. both 010 and 10 map to 2), we append a 1 before all the strings (e.g. now 1010 maps to 10 and 110 to 6).

We can now order all Turing machines by their corresponding integer representation. Note that some integers might not be encodings of an actual TM, but for simplicity, we can assume it’s a TM that accepts the empty language.

Since a TM can be seen as a string, we can say that a TM accepts itself if it accepts the corresponding encoding of itself as a binary string. We can even define a language as the set of TM encodings that don’t accept themselves:

L_d = \{w | w \mbox{ is the encoding of a TM that doesn't accept itself} \}

So we want to know whether there is a TM T_d for L_d. Suppose there is. Then we ask: does T_d accepts itself?

The self-referencing property is the origin Russell’s Paradox. It essentially defines a set S that contains all elements (which can also be sets) that do not contain themselves. The paradox is that if S contains itself, then there’s a set in S that do contain itself, which is a contradiction. On the other hand, if it doesn’t contain itself, S left out a set, which also contradicts the definition of S.

We can use the same analogy for TMs. If T_d accepts itself, then it is accepting a TM that doesn’t comply with the constraints. If it does not, it left out the encoding of a TM that doesn’t accept itself.

The only assumption we made so far is that there is such TM for language L_d, so we can conclude this was a wrong assumption and thus L_d is not a recursive enumerable language.

The Universal Turing Machine

Question: Is there an algorithm to tell whether a Turing Machine accepts a given input w?

To answer this question, we’ll first define a special type of turing machine. The Universal Turing Machine (UTM) is a TM that takes as input a TM M encoded as string and an input w as a pair (M, w) and accepts it if and only if the corresponding TM accepts w. We denote by L_u the language of the UTM.

We can design the UTM as a TM with 3 tapes (see multiple tapes TM on the Extended Turing Machines section),

* Tape 1 holds the M description
* Tape 2 represents the tape of M
* Tape 3 represents the state of M

The high-level steps we need to carry over (and for which we have to define the proper transition function, but we’re not doing here) are:

1. Verify that M is a valid TM
2. Decide how many “bits” of the tape it needs to represent one symbol from M
3. Initialize Tape 2 with input w and Tape 3 with the initial state of M
4. Simulate M, by reading from Tape 1 the possible transitions given the state in Tape 3 and the input at Tape 2.
5. If the simulation of M halts accepting the string w, then the UTM accepts (M,w).

We can now answer the question in the beginning of this section by the following proposition:

Proposition: L_u is recursive enumerable, but not recursive.

Proof: It’s recursive enumerable by definition, since it’s defined by a TM (in this case the UTM). So we need to show that the UTM is not guaranteed to always halt, that is,.

Suppose there’s such an algorithm. Then we have an algorithm to decide whether a string is in L_d. First we check if w is a valid encoding of a TM. If it’s not, then we assume it represents a TM that defines an empty language, so obviously it doesn’t accept itself and thus should go into L_d.

Otherwise, we ask our hypothetical algorithm whether it accepts (w, w). If it does, then the TM corresponding to w accepts itself and should not go into L_d. On the other hand, if our algorithm doesn’t accept, it should go into L_d. We just described an algorithm that defines L_d, but in the previous section we proved that L_d has no TM that defines it. This contradiction means that our assumption that an algorithm exists for L_u is incorrect.


This is equivalent to the halting problem, in which we want to know whether a program (a Turing machine) will halt for a given input or run forever.

Post Correspondence Problem

Question: What are other problems undecidable problems?

We can show a problem is undecidable by reducing an already known undecidable problem to it. In particular, if we can show a particular problem can be used to simulate an universal Turing machine, we automatically prove it undecidable. We’ll now do this with a fairly simple problem to describe and that has no apparent connection to turing machines.

The post correspondence problem (PCP) can be stated as follows. Given n pairs of strings of the form (a_i, b_i), 1 \le i \le n, the problem consists in finding a list of indexes R = \{r_1, r_2, \cdots, r_k\} such that concatenating the pairs in this order leads to the same string, that is, a_{r_1}a_{r_2} \cdots a_{r_k} = b_{r_1}b_{r_2} \cdots b_{r_k} (note that an index can be used more than once in the solution).

There’s a stricter version of this problem in which the first pair of the input must be the first pair in the solution, which we’ll refer to MPCP. This doesn’t make the problem harder because it’s possible to reduce it to the original PCP.

It’s possible to simulate a TM by solving the MPCP so that there’s a solution to the MPCP if, and only if, the TM accepts an input w. We provide the detailed reduction in the Appendix for the interested reader. This means that the if there exist an algorithm for PCP (and thus for MPCP), we have an algorithm for any TM, including the UTM, which is a contradiction, and thus we conclude that PCP is undecidable.

Many other problems can be proved to be undecidable by reducing the PCP to them. The list includes some decision problems regarding Context Free Grammars. For example, given a CFG, tell whether a string can be generated in more than one way (whether a CFG is ambiguous); or whether a CFG generates all the strings over the input alphabet; or whether a CFL is regular;


So far this is the second course I finish on coursera (the other one I did was the Probabilistic Graphical Models). I’m very glad that classes from top tier universities are freely available on the web.

Automata theory was one of the gaps in my computer science theory base, especially Turing machines. I’ve done an introductory course on theoretical computer science, but it was mostly focused on intractable problems and cryptography.


[1] Wikipedia – Jeffrey Ullman
[2] Slides from Automata Course – Coursera


Proposition. It’s possible to decide whether an universal turing machine accepts a string by reducing it to MPCP.

Sketch of proof. We won’t give a formal proof of the reduction, but the idea is pretty neat and after understanding it, the reduction becomes more intuitive.

First thing, let’s define a snapshot from a simulation of a TM. A snapshot is a compact description of the TM in a particular simulation step. We can represent it by placing the needle to the left of the tape position it points to. We can omit both the right and left infinite endpoints that are composed eintirely by blanks.

Thus, the first snapshot is represented by q_0 w_1\cdots w_n. If we move the needle to the right writing X to the current position and changing the state to p, the snapshot becomes x p w_2 \cdots w_n. If instead from the first snapshot we moved to the left, writing X and changing the state to p, we would have p \square x w_2 \cdots w_n, where \square is the blank symbol and we need to represent it because it’s not part of the contiguous infinite blanks block anymore.

And each incomplete solution of the MPCP, we have the first part one step behind the second part. With that, we can “tie” to snapshots together by a pair, which can be use to encode the transition function.

The instance we’ll build for the MPCP will consist of the following classes of pairs:

First pair, represent the initial snapshot:

(1) (\#, \#q_0w\#)

Delimiter pair, to separate adjacent snapshots:

(2) (\#, \#)

Copy pair, to copy tape symbols that are not involved in the transition function. For each tape symbol X we create the pair

(3) (X, X)

Transition function pairs

For a move to the right, that is, for each \delta(q, X) \rightarrow (p, Y, R) we create the pair

(4) (qX, Yp)

For a move to the left, that is, for each \delta(q, X) \rightarrow (p, Y, R) we create the pair

(5) (ZqX, pZY)

In case X is the blank character, the right movement, that is \delta(q, B) \rightarrow (p, Y, R) will become:

(6) we create (q\#, Yp\#)

The left movement, that is \delta(q, B) \rightarrow (p, Y, L) will become:

(7) (Zq\#, pZY\#)

Final state pairs

For a final state f and for all tape symbols X, Y we add the following auxiliary pairs, from (8) to (10):

(8) (XfY, f)

(9) (fY, f)

(10) (Xf, f)

(11) (f\#\#, \#)

Simulation – Example

The solution will have to start with pair (1), because it’s the first pair and we’re solving the MPCP.

\#q_0 w_1 \cdots w_n

We can now simulate a movement. For example, say we have a right move defined by \delta(q_0, w_1) \rightarrow (p, x, R) by applying the corresponding pair (4) (q_0 w_1, x p), which will lead us to the partial solution

\#q_0 w_1
\#q_0 w_1 \cdots w_n \# x p

for now, we can only use the pairs (3), because we need to match w_2, w_3, \cdots w_n, so we apply those pairs until we get

\#q_0 w_1 \cdots w_n
\#q_0 w_1 \cdots w_n \# x p w_2 \cdots w_n

Now, the only pair that can follow is (30, that is, (\#, \#), which will finish the snapshot:

\#q_0 w_1 \cdots w_n \#
\#q_0 w_1 \cdots w_n \# x p w_2 \cdots w_n \#

Note that the pairs of form (3) are only used to copy the remaining part of the tape. That’s because the transition function itself is local, in the sense that it doesn’t know about the tape values that are not involved its definition.

This movement is analogous for the other transition functions.

The remaining case is when the reach a final state, in which we want to accept my making the first part of the partial solution to catch up with the second part.

Say we are in this stage:

\cdots \#
\cdots \#ABfCDE\#

where f is a final stage. We first apply the copy pair (3) on A:

\cdots \#A
\cdots \#ABfCDE\#A

and then use pair (8) as (BfC, f) to get

\cdots \#ABfC
\cdots \#ABfCDE\#Af

we can only copy and finish the snapshot using (3) and (4):

\cdots \#ABfCDE\#
\cdots \#ABfCDE\#AfDE\#

we need the pair (8) again, now as (AfD, f) to get

\cdots \#ABfCDE\#AfD
\cdots \#ABfCDE\#AfDE\#f

again, by copying and finishing the snapshot we have,

\cdots \#ABfCDE\#AfDE\#
\cdots \#ABfCDE\#AfDE\#fE\#

now we use the pair (9) as (fE, f) to obtain

\cdots \#ABfCDE\#AfDE\#fE
\cdots \#ABfCDE\#AfDE\#fE\#f

ending another snapshot

\cdots \#ABfCDE\#AfDE\#fE\#
\cdots \#ABfCDE\#AfDE\#fE\#f\#

finally we use the closing pair (11)

\cdots \#ABfCDE\#AfDE\#fE\#f\#\#
\cdots \#ABfCDE\#AfDE\#fE\#f\#\#

Note that the final closing is carried over using several auxiliary snapshots that don’t actually correspond to the actual simulation, but it’s easy to obtain the right snapshots from a solution to the MPCP.


An Introduction to Agda


Per Martin-Löf is a Swedish logician, philosopher and mathematician statistician. Since 1970 he has been mainly publishing on logic.

Martin-Löf has been active in developing intuitionistic type theory as a constructive foundation of mathematics and his work on type theory has influenced computer science.

He completed his PhD program under supervision of Andrei Kolgomorov and is currently Professor at Stockholm University, being a member of the Royal Swedish Academy of Sciences.

In this post we’ll talk about assisted theorem proving, more specifically using a tool named Agda, which has it’s syntax based on Haskell and the engine is based on the Per Martin-Löf’s intuitionistic type theory.

We’ll first talk about the basic concepts such as assisted theorem proving and dependent type theory and then present a very brief introduction to Agda based on a tutorial from Ulf Norell and James Chapman [3].

Assisted Theorem Proving

We can classify theorem proving in the following categories:

1) Manual
2) Manual with machine support
3) Interactive
4) Automated

(1) is the oldest, most known and widely used method by mathematicians. (2) Since the introduction of computers, we now can have machines to verify syntax and make manual proofs less error prone. (3) We can also write proofs in a way the machine understands so that it can perform some logical deductions steps for us. (4) In the last extreme case, we can let the machine do all the job and find proofs by itself.

Dependent Types

Basically speaking, dependent type is a type that depends on a value. In typed programming languages we can have a type like vector T, representing a vector containing elements of type T, but in a dependent type language we could also have vector T n, representing vectors with elements of type T and length n. We’ll see an example for this type in the next section using Agda.


Agda is a dependent type language based on intuitionistic type theory, introduced by Martin-Löf. It was first implemented by Catarina Coquand (who is married to Thierry Coquand, which inspired the Coq tool, another famous assisted proof tool). Later it was rewritten as Agda 2 by Ulf Norell, one of the authors of the tutorial we’re going to follow.


The first thing we want to do before starting is to get Agda running. This can be done using the cabal package manager from Haskell:

> cabal install Agda

On Mac, this will install some binaries to ~/Library/Haskell. We want to use some of those binaries, so let’s include in our bin path:

export PATH="$HOME/Library/Haskell/bin:$PATH"

Agda has a very good integration with Emacs. On Mac we can make use of that integration in Aquamacs. To set up the right configuration into the .emacs file, we can now run:

> agda-mode setup

Main Concepts

Latex and Unicode. Agda uses makes use of unicode characters to make the code more readable and accepts LaTeX syntax to define them. For example, we can input \rightarrow which will be transformed into the character. Also there are shortcuts to avoid typing a lot (for example \r is expanded to \rightarrow)

Delimiters. Agda is very flexible in naming, by only considering very few characters as delimiter (including space, parenthesis and curly brackets). This means we can have names like :<a*, but then we have to be careful in adding space whenever we need to separate two variables.

Modules and name convention. Agda files have the extension .agda. For our tutorial, let’s create one called basic.agda. The first thing we want to do is to define the module, which is used to manage namespaces. By convention, the module name has to match the file name, so we write:

module basic where

Data Types. Our basic working blocks will be types and we can define them by data types, which have syntax similar to data types in Haskell. For example, we can model the boolean type, which we call Bool. In the following code we have the type named Bool which is a subset of Set (pre-defined) and we provide two constructors, both being of the type Bool.

data Bool : Set where
  true : Bool
  false : Bool

In the same manner we can define the set of natural numbers. Since it’s impossible to explicitly define all value like we did for booleans, we do so by defining a function suc recursively, which takes a Nat and returns its successor, like in the following example:

-- Natural numbers
data Nat : Set where
  zero : Nat
  suc  : Nat -> Nat

Note that we are saying the constructor suc represents a function that receives a Nat and return a Nat, but we are not providing the implementation for it.

So let’s write some functions to work with the Nat type. More specifically, addition and multiplication.

Functions. In Agda, when we write a function surrounded by underscores, like _+_, we’re defining a function that takes two arguments using infix notation. Let’s define the addition operation:

-- Addition
_+_ : Nat -> Nat -> Nat 
zero  + m = m
suc n + m = suc (n + m)

Note how we define everything in terms of the basic elements from Nat, that is, zero and suc, we have no notion of values. This is important because if we want to encode proofs as programs, we have to be generic and ‘value agnostic’. In this function we pattern match on different values of the first argument: If it’s zero, we are in the base case. Otherwise, the first argument must be a successor of some another value n, so we do recursion on n until it hit the base case.

Multiplication is analogous, but here we use the _+_ operator to define the second matched pattern:

-- Multiplication
_*_ : Nat -> Nat -> Nat
zero  * m = zero
suc n * m = m + n * m

The code above will cause an error because Agda requires strictly different priorities for different operators.

We can attribute some random values to these operators using infixl, which takes a priority value and a function. The higher the value, the higher the priority. Since we want multiplication to have higher priority than addition we can assign a higher number for it, for example:

-- Defining priority
infixl 6 _*_
infixl 4 _+_

Parametrized types. We can define a type that depends on other types, like parametrized types in Haskell or templates in C++. In this tutorial, we’ll create a list of elements of a given type A, as in the following data type:

-- Definition of a list with parametrized type
data List (A : Set) : Set where
  -- empty operator
  [] : List A
  -- append operator
  _::_ : A -> List A -> List A

Let’s write the map function over List. Differently from Haskell, Agda doesn’t perform type inference by default, so we need to explicitly provide the types when defining/calling a function that depends on other types. In the following example, the types A and B must be provided as parameters:

map1 : (A : Set) -> (B : Set) -> (A -> B) -> List A -> List B 
map1 A B f [] = []
map1 A B f (x :: xs) = f x :: map1 A B f xs

We have the following syntax sugar to combine two or more types that are subset of the same type:

(A : Set) -> (B : Set) is equivalent to (A B : Set)

Implicit Types. We have the option to make types implicit and let Agda figure out the right one. The syntax is to use curly brackets instead of parenthesis:

map2 : {A B : Set} -> (A -> B) -> List A -> List B 
map2 f [] = []
map2 f (x :: xs) = f x :: map2 f xs

Dependent Types. So far the types and operations we discussed so far don’t seem to show the real power of dependent types. So let’s define a new version of List, enhanced with the length of the list, which we’ll then call Vector:

data Vec (A : Set) : Nat -> Set where
  []   : Vec A zero
  -- Make the length type implicit
  _::_ : {n : Nat} -> A -> Vec A n -> Vec A (suc n)

Here we have two parameters when defining the data type, Nat -> Set. Notice also how when defining the list recursively we update the value n properly.

One nice thing about this is that in defining a head function, we don’t need to handle the empty case as long as we restrict the type of non-empty lists:

-- Doesn't need to check for empty lists. Restriction is encoded in
-- the function type!
head : {A : Set}{n : Nat} -> Vec A (suc n) -> A
head (x :: xs) = x 

Proofs as types. We are now going to write an example on how to use dependent types to encode proofs ( as types.

Let’s start by defining data types to represent the two possible values of a boolean. Note that we don’t specify constructors here.

data False : Set where
data True  : Set where

Now we define a function to convert a boolean values into these types:

isTrue : Bool -> Set
isTrue true = True
isTrue false = False

Let’s, for instance, define a function to be use as predicate, in this case, the < operator:

-- Definition of the < operator
_<_ : Nat -> Nat -> Bool
_ < zero = false
zero  < suc n = true
suc m < suc n = m < n

We will need the following function, that returns the length of a list, in our final example:

-- Definition of the length function
length : {A : Set} -> List A -> Nat
length [] = zero
length (x :: xs) = suc (length xs)

Finally, we can now write a sample function that encodes a predicate as a type. In the following example, the function lookup accepts one member of the family of predicates in which n less than the size of the list is a true statement.

-- Function that looks up the i-th element in a list
-- param 1: list
-- param 2: look up index
-- param 3: proof
lookup : {A : Set}(xs : List A)(n : Nat) -> isTrue (n < length xs) -> A
lookup [] n ()
lookup (x :: xs) zero p = x
lookup (x :: xs) (suc n) p = lookup xs n p

This concludes our brief introduction to Agda syntax and dependent types. In following posts we’ll learn more about writing proofs and have Agda to perform logical derivations for us.


I have close to zero knowledge in formal systems. I’ve been exposed briefly to it by reading parts of Gödel, Escher and Bach and the subject sounds very interesting. I based most of the content of this post in a few introductory tutorials [3-6], but I still didn’t get a good grasp of the power of systems like Agda.

I was initially interested in learn Coq but due to some problems in setting it up, I ended up looking for alternatives and found Agda. It seems that Coq is based on Ocaml and Agda is based on Haskell and since I’ve been focused in learning Haskell, it was a pleasant surprise. Agda has also has a good integration with Emacs, which is my favorite editor, so this was a plus.


[1] Per Martin-Löf – Wikipedia
[2] Proof Assistant – Comparison of Systems – Wikipedia
[3] Dependently Typed Programming in Agda – Ulf Norell and James Chapman
[4] Interactive Theorem Proving, Lecture Notes 0 – Anton Setzer
[5] Interactive Theorem Proving, Lecture Notes 1 – Anton Setzer
[6] Interactive Theorem Proving, Lecture Notes 2 – Anton Setzer