Using ltree for hierarchical structures in PostgreSQL

Hello my dear friends. In the previous article we learned about storing the tree structures in the RDBMS. In this article we will learn how to work with ltree module for PostgreSQL, which allow store data in a hierarchical tree-like structure.

What is ltree?

Ltree is a PostgreSQL module. It is implements a data type ltree for representing labels of data stored in a hierarchical tree-like structure. Extensive facilities for searching through label trees are provided.

Why ltree?

  • The ltree implements a materialized path, which very quick for INSERT/UPDATE/DELETE and pretty quick for SELECT operations
  • It will be generally faster than using a recursive CTE or recursive function that constantly needs to recalculate the branching
  • As built in query syntax and operators specifically designed for querying and navigating trees
  • Indexes!!!

Initial data

First of all you should enable extension in your database. You can do this by this command:

CREATE EXTENSION ltree;

Let’s create table and add to it some data:

CREATE TABLE comments (user_id integer, description text, path ltree);
INSERT INTO comments (user_id, description, path) VALUES ( 1, md5(random()::text), '0001');
INSERT INTO comments (user_id, description, path) VALUES ( 2, md5(random()::text), '0001.0001.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 2, md5(random()::text), '0001.0001.0001.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 1, md5(random()::text), '0001.0001.0001.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 5, md5(random()::text), '0001.0001.0001.0003');
INSERT INTO comments (user_id, description, path) VALUES ( 6, md5(random()::text), '0001.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 6, md5(random()::text), '0001.0002.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 6, md5(random()::text), '0001.0003');
INSERT INTO comments (user_id, description, path) VALUES ( 8, md5(random()::text), '0001.0003.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 9, md5(random()::text), '0001.0003.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 11, md5(random()::text), '0001.0003.0002.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 2, md5(random()::text), '0001.0003.0002.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 5, md5(random()::text), '0001.0003.0002.0003');
INSERT INTO comments (user_id, description, path) VALUES ( 7, md5(random()::text), '0001.0003.0002.0002.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 20, md5(random()::text), '0001.0003.0002.0002.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 31, md5(random()::text), '0001.0003.0002.0002.0003');
INSERT INTO comments (user_id, description, path) VALUES ( 22, md5(random()::text), '0001.0003.0002.0002.0004');
INSERT INTO comments (user_id, description, path) VALUES ( 34, md5(random()::text), '0001.0003.0002.0002.0005');
INSERT INTO comments (user_id, description, path) VALUES ( 22, md5(random()::text), '0001.0003.0002.0002.0006');

Also we should add some indexes:

CREATE INDEX path_gist_comments_idx ON comments USING GIST(path);
CREATE INDEX path_comments_idx ON comments USING btree(path);

As you can see, I create table ‘comments’ with field ‘path’, which contain full path by tree for this comment. As you can see, for tree divider I use 4 numbers and dot.

Let’s found all comments, where path begin from ‘0001.0003’:

$ SELECT user_id, path FROM comments WHERE path <@ '0001.0003';
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       5 | 0001.0003.0002.0003
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
(12 rows)

We can check this sql by EXPLAIN command:

$ EXPLAIN ANALYZE SELECT user_id, path FROM comments WHERE path <@ '0001.0003';
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on comments  (cost=0.00..1.24 rows=2 width=38) (actual time=0.013..0.017 rows=12 loops=1)
   Filter: (path <@ '0001.0003'::ltree)
   Rows Removed by Filter: 7
 Total runtime: 0.038 ms
(4 rows)

Let’s for test disable seq scan:

$ SET enable_seqscan=false;
SET
$ EXPLAIN ANALYZE SELECT user_id, path FROM comments WHERE path <@ '0001.0003';
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using path_gist_comments_idx on comments  (cost=0.00..8.29 rows=2 width=38) (actual time=0.023..0.034 rows=12 loops=1)
   Index Cond: (path <@ '0001.0003'::ltree)
 Total runtime: 0.076 ms
(3 rows)

Now it’s slower, but you can see, how it use index. First request use sequence scan because we have not many data in table.

We can do select “path <@ ‘0001.0003’” in another way:

$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*';
user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       5 | 0001.0003.0002.0003
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
(12 rows)

Also you should not forget about ordering of data. Example:

$ INSERT INTO comments (user_id, description, path) VALUES ( 9, md5(random()::text), '0001.0003.0001.0001');
$ INSERT INTO comments (user_id, description, path) VALUES ( 9, md5(random()::text), '0001.0003.0001.0002');
$ INSERT INTO comments (user_id, description, path) VALUES ( 9, md5(random()::text), '0001.0003.0001.0003');
$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*';
user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       5 | 0001.0003.0002.0003
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
(15 rows)

Now with order:

$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*' ORDER by path;
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       5 | 0001.0003.0002.0003
(15 rows)

There are several modifiers that can be put at the end of a non-star label in lquery to make it match more than just the exact match: * “@” - match case-insensitively, for example a@ matches A * “*” - match any label with this prefix, for example foo* matches foobar * “%” - match initial underscore-separated words

Also, you can write several possibly-modified labels separated with (OR) to match any of those labels, and you can put ! (NOT) at the start to match any label that doesn’t match any of the alternatives. Example:
$ SELECT user_id, path FROM comments WHERE path ~ '0001.*{1,2}.0001|0002.*' ORDER by path;
 user_id |           path
---------+--------------------------
       2 | 0001.0001.0001
       2 | 0001.0001.0001.0001
       1 | 0001.0001.0001.0002
       5 | 0001.0001.0001.0003
       6 | 0001.0002.0001
       8 | 0001.0003.0001
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       5 | 0001.0003.0002.0003
(19 rows)

Now let’s use it for our commentable system. Find all direct childrens for parent ‘0001.0003’:

$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*{1}' ORDER by path;
 user_id |      path
---------+----------------
       8 | 0001.0003.0001
       9 | 0001.0003.0002
(2 rows)

Find all childrens for parent ‘0001.0003’:

$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*' ORDER by path;
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       5 | 0001.0003.0002.0003
(15 rows)

Find parent for children ‘0001.0003.0002.0002.0005’:

$ SELECT user_id, path FROM comments WHERE path = subpath('0001.0003.0002.0002.0005', 0, -1) ORDER by path;
 user_id |        path
---------+---------------------
       2 | 0001.0003.0002.0002
(1 row)

If you path will not be unique, it will get several records.

Summary

As can be seen, working with ltree materialized path is very simple. In this article, I have listed are not all the possible usage of ltree. It is not considered full-text search issues ltxtquery. But you can found this in documentation.

That’s all folks! Thank you for reading till the end.

Published:

September 02 2013

LOADING DISCUSSION