Storing deep hierarchies in a database

Introducing nested sets: part one

Image for post
Image for post
Photo by Blake Weyland on Unsplash

The problem we’re trying to solve

id name
1 Europe
2 America

and countries:

id name continent_id
1 France 1
2 United Kingdom 1
3 Sweden 1
4 Canada 2

Each country belongs within a specified continent (let’s just pretend it’s that simple for this example!), specified by its id in the continent_id field. The query to fetch details for a given country is straightforward:

SELECT AS continent, AS country
country.continent_id =
AND = '2';

and produces the output:

| continent | country |
| Europe | United Kingdom |
1 row in set (0.00 sec)

But what if you want to store other location granularities, or other hierarchical data such as generic subject categories, with a variable or unbounded number of levels?

A naive solution

id name parent_place
1 Europe NULL
2 France 1
3 United Kingdom 1
4 England 3
5 Wales 3
6 Sweden 1

We can then use a join each time we want to ascend the hierarchy, like so:

SELECT AS place1, AS place2, AS place3
place p1,
place p2,
place p3
WHERE = '4'
AND = p1.parent_place
AND = p2.parent_place;

And we get results such as:

| place1 | place2 | place3 |
| England | United Kingdom | Europe |
1 row in set (0.00 sec)

This is fine if we have a fixed depth that we know ahead of time, but it doesn’t scale well, and it’s impractical if we have an unbounded depth.

In such cases, we can simply keep re-executing a query, feeding it the id of each ancestor until we get a NULL parent_place back, but that could be a lot of queries for a potentially deep hierarchy such as a subject categorisation.

All we’ve really done with this approach is to shove everything into a single table, without actually solving the original problem. How can we represent a generic hierarchy with unbounded depth, and still make the data accessible?

Enter ‘nested sets’.

Nested sets

id name lft rgt
1 Europe 1 14
2 France 2 3
3 United Kingdom 4 11
4 England 5 6
5 Wales 7 8
6 Scotland 9 10
7 Sweden 12 13

First, it’s very important to note that the two columns we’ve introduced — lft and rgt — do not reference other locations. In fact, they stand for ‘left’ and ‘right’, and they’re all (well, mostly) we need to store a full, unbounded hierarchy and be able to answer questions such as:

  1. What are my ancestors?
  2. What are my descendants?
  3. What are my siblings, in order?

each with a single straightforward query. That sounds too simple, right —how have we managed to pack all that information in, with the addition of just a single column? It turns out it’s all in the properties of lft and rgt and how we maintain those properties when we update the data.

Take a look at the following diagram which represents the data in the previous table:

Image for post
Image for post
Each box represents a record in our table, with its lft and rgt values displayed

This should make it far more obvious what the cryptic ‘left’ and ‘right’ nature of those two new columns is. They are, almost literally, points on a line which we can imagine our entries laid out along, from left to right. In particular, the following properties are obeyed:

  • If an item has no parent (e.g. Europe), its lft value is 1
  • If an item is the first child (e.g. England), its lft value is its parent’s lft value + 1
  • If an item is not the first child (e.g. Sweden), its lft value is its previous sibling’s rgt value + 1
  • If an item has children (e.g. United Kingdom), its rgt value is its last child’s rgt value + 1
  • If an item doesn’t have any children (e.g. France), its rgt value is its own lft value + 1

You can think of each record as a box, ‘containing’ all of its descendants — in order — and being ‘contained within’ all of its ancestors. Hence ‘nested sets’.

It turns out these properties even allow us to trivially answer a question such as “How many descendants do I have?” — it turns out this question isn’t particularly useful, but it’s interesting that we can answer it, given that an answer would be unfathomable using the naive approach presented earlier.

Interestingly, we can’t actually easily answer questions such as “how many children do I have?” or even “what are my children?”, but we’ll deal with that issue in due course (don’t worry, it’s easy to fix!). For now, let’s see how we can answer the original questions we promised to.

What are my ancestors?

location ancestor,
location original
original.lft > ancestor.lft
AND original.rgt < ancestor.rgt
AND = 5;

This gives the output:

| id | name |
| 1 | Europe |
| 3 | United Kingdom |
2 rows in set (0.00 sec)

What are my descendants?

location descendant,
location original
original.lft < descendant.lft
AND original.rgt > descendant.rgt
AND = 1;

which gives us:

| id | name |
| 2 | France |
| 3 | United Kingdom |
| 4 | England |
| 5 | Wales |
| 6 | Scotland |
| 7 | Sweden |
6 rows in set (0.00 sec)

How many descendants do I have?

(location.rgt - location.lft - 1) DIV 2 AS num_descendants
WHERE = 3;

This query gives the result:

| num_descendants |
| 3 |
1 row in set (0.00 sec)

The method works because a record’s ‘width’ is directly related to the number of records that fit ‘inside’ it i.e. its descendants.

Follow up

Supporting code — for MySQL — is available in the ‘nested-sets-part-one’ repository in GitHub. All the table setup is contained in setup.sql, whilst each query in this article is contained in a separate file. See the README if you’re unsure on how to run it.

Written by

Technologist & writer, Bobby is currently working on several projects including a management dashboard for static websites and an education portal.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store