Today I would like to discuss the topic of self-referential data structures, using Haskell as the language of choice since it easily supports this construct.

Firstly, what do we mean by a self-referential data structure? As the name suggests, it is a data structure that in some way refers to itself. The key point to take here is that we want the data structure to refer indeed to itself, and not some other instance of the same of the same type.

For example, say you are building a binary tree in Java, and you have a basic Node class:

public class Node{
  int value;
  Node left;
  Node right;
}

This is not a self-referential data structure since the values of left and right refer to some other instances of type Node, and not this node itself.

Instead of talking more, here is a simplest example of a self-referential data structure in Haskell - it is the list of natural numbers.

nats = 1 : map (+1) nats

Try running this, and you will see that you indeed get the infinite stream of natural numbers. How does this work? It much easier to see when broken down into individual pieces. The key to understanding such construct is remembering that, in this case, nats is the same on both left and right hand side, it cannot change. Do not think that the nats on the right is some kind of reduced version of nats.

In the explanation, I will use ? to indicate what is known as a thunk, or a yet-to-be evaluated value. Another key to understanding this is that many functions in Haskell, including map are lazy, meaning they don’t process the whole input at once, they do so rather when it’s needed.

nats = 1 : map (+1) nats
nats = 1 : map (+1) 1:?
nats = 1 : 2 : map (+1) 2:?
nats = 1 : 2 : 3 : map (+1) 3:?

Let’s look at two more examples.

Generating the infinite list of even numbers.

evens = 0 : map (+2) evens

and of the odd numbers

odds = 1 : map (+2) odds

And now, if you want a bizarre way to generate the natural numbers, here you go!

concatMap (\(a,b) -> [a,b]) $ zip evens odds

Let’s look at something marginally more complicated, such as the Fibonacci sequence.

    0,1,1,2,3,5,8,13,21,...
      0,1,1,2,3,5,8, 13...   +
    ----------------------
    0,1,2,3,5,8,13,21,...

As you can see from my attempt at showing the relationship how the Fibonacci sequence can be generated using itself, it is quite easy! You would zipWith fibonacci with its tails using the addition function! What could be simpler?

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

If you still haven’t caught on how this works (it took me a while!), here is the expansion.

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
fibs = 0 : 1 : zipWith (+) 0:1:? 1:?
fibs = 0 : 1 : (0+1) : zipWith (+) 1:? ?
fibs = 0 : 1 : 1 : zipWith (+) 1:1:? 1:?
fibs = 0 : 1 : 1 : 2 : zipWith (+) 1:2:? 2:?
fibs = 0 : 1 : 1 : 2 : 3 : zipWith (+) 2:3:? 3:?

ad infitum… Really do take the time to understand this example, or you will have trouble coming up with your own self-referential solution methods.

Let’s have a look at another example. The problem statement is to generate the Excel column numbering, meaning [“A”,“B”,“C”,..,“Z”,“AA”,“AB”,..“ZZ”,“AAA”,..]. This is a problem whose solution is infinite, and whose structure lends itself to using a self-referntial data structure.

A good thing to do mentally when working with such problems is to assume that you already have a part of the solution, and see if from that you could compute the whole solution.

Think for example that you just have the first set of columns [“A”,..,“Z”], which is easily obtainable. To get the more of the sequence, notice that you simply need to take each element of this list, and append to it each of the values from [“A”,..,“Z”]. Then you will have [“AA”,..,“ZZ”], and again, simply take each of these elements, and append each element from [“A”..“Z”] to it. Let’s put this into code!

excel = "" : [s ++ [p] | s <- excel, p <- ['A'..'Z'] ]

See that we need the empty string in the beginning to jumpstart the computation! Other than that, it’s quite simple!

Let’s do another example, the Pascal triangle. The Pascal triangle is another structure that is very easily constructed from previous parts of itself, mainly by summing up adjacent numbers in one row. Try doing this yourself first, and then look at my solution below, to see if you came up with something similar.

pascal = [1,1] : map (\e -> 1 : zipWith (+) e (drop 1 e) ++ [1]) pascal

Then, the first 10 elements are:

Prelude> take 10 pascal
[[1,1],[1,2,1],[1,3,3,1],[1,4,6,4,1],[1,5,10,10,5,1],[1,6,15,20,15,6,1],[1,7,21,35,35,21,7,1],[1,8,28,56,70,56,28,8,1],[1,9,36,84,126,126,84,36,9,1],[1,10,45,120,210,252,210,120,45,10,1]]

One final example! We shall iteratively compute the points of triangles in the Sierpinski triangle. If you haven’t come across it, the Sierpinski triangle is a fractal, a self-referntial concept in itself, so surely we should be able to use a self-referential data structure to compute it!

Here is a picture of the Sierpinski triangle. Imagine the bottom left corner to be (0,0), bottom right (1,0), and the top (0.5, sqrt 2).

Sierpinski

We can then define three transformations that generate the three subtriangles give any triangle:

t1 (x,y) = (0.5*x,0.5*y)
t2 (x,y) = (0.5*x + 0.5, 0.5*y)
t3 (x,y) = (0.5*x + 0.25, 0.5*y + (sqrt 2) / 2)

As before, we need the initial triangle, which we can describe by:

start = [(0.0,0.0),(1.0,0.0),(0.5,sqrt 2)]

Let us now construct a function that will take one triangle, and generate its three successors (in whatever order you desire).

genTriangles points  = [map t points, t <- [t1,t2,t3] ]

Now, we use a similar construct as before to generate an infinite stream of the points making up the triangles in the Sierpinski triangle.

sierpinski = [start] ++ concatMap genTriangles sierpinski

Looking at the output, and at a picture, we confirm that this is correct!

Prelude> take 10 sierpinski 
[[(0.0,0.0),(1.0,0.0),(0.5,1.4142135623730951)],[(0.0,0.0),(0.5,0.0),(0.25,0.7071067811865476)],[(0.5,0.0),(1.0,0.0),(0.75,0.7071067811865476)],[(0.25,0.7071067811865476),(0.75,0.7071067811865476),(0.5,1.4142135623730951)],[(0.0,0.0),(0.25,0.0),(0.125,0.3535533905932738)],[(0.5,0.0),(0.75,0.0),(0.625,0.3535533905932738)],[(0.25,0.7071067811865476),(0.5,0.7071067811865476),(0.375,1.0606601717798214)],[(0.25,0.0),(0.5,0.0),(0.375,0.3535533905932738)],[(0.75,0.0),(1.0,0.0),(0.875,0.3535533905932738)],[(0.5,0.7071067811865476),(0.75,0.7071067811865476),(0.625,1.0606601717798214)]]

1 Conclusion

I hope you enjoyed this and learned something about Haskell self-referential data structures! Let me know if I have made an error or if anything is not clear! I myself am still a beginner in Haskell, and am writing a lot for my own benefit. You know you understand a topic when you can coherently explain it to someone.

Self referential data structures in Haskell

Posted on June 11, 2012

Last updated on February 28, 2013

1 Conclusion