EXPLAIN to me, why are you so slow?

Oleksii Vasyliev, Railsware

EXPLAIN to me,
why are you so
slow?

Brought to you by Alexey Vasiliev, Railsware

Oleksii Vasyliev

Your SQL query is slow.
What to check?

ExplainEverything

EXPLAIN

PostgreSQL devises a query plan for each query it receives. Choosing the right plan to match the query structure and the properties of the data is absolutely critical for good performance, so the system includes a complex planner that tries to choose good plans. You can use the EXPLAIN command to see what query plan the planner creates for any query

EXPLAIN ANALYZE

The ANALYZE option causes the sql statement to be executed first and then actual run-time statistics in the returned information including total elapsed time expended within each plan node and the number of rows it actually returned

EXPLAIN ANALYZE

The ANALYZE statement actually executes the SQL statement and discards the output information, therefore, if you want to analyze any update statement such as INSERT, UPDATE, or DELETE without affecting the data, you should wrap the EXPLAIN ANALYZE in a transaction as follow:

BEGIN;
EXPLAIN ANALYZE sql_statement;
ROLLBACK;

EXPLAIN VERBOSE

The VERBOSE parameter allows you to show additional information regarding the plan

EXPLAIN COSTS

The COSTS option includes the estimated startup and total costs of each plan node, as well as the estimated number of rows and the estimated width of each row in the query plan

EXPLAIN BUFFERS

This parameter adds information to the buffer usage. BUFFERS only can be used when ANALYZE is enabled

EXPLAIN TIMING

This parameter includes the actual startup time and time spent in each node in the output. The TIMING defaults to TRUE and it may only be used when ANALYZE is enabled

EXPLAIN SUMMARY

The SUMMARY parameter adds summary information such as total timing after the query plan. Note that when ANALYZE option is used, the summary information is included by default

EXPLAIN FORMAT

Specify the output format of the query plan such as TEXT, XML, JSON, and YAML. This parameter is set to TEXT by default

EXPLAIN results

EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2=t2.unique2 ORDER BY t1.fivethous;
                                                              QUERY PLAN
  --------------------------------------------------------------------------------------------------------------------------------------------
  Sort (cost=717.34..717.59 rows=101 width=488) (actual time=7.761..7.774 rows=100 loops=1)
    Sort Key: t1.fivethous
    Sort Method: quicksort Memory: 77kB
    -> Hash Join (cost=230.47..713.98 rows=101 width=488) (actual time=0.711..7.427 rows=100 loops=1)
        Hash Cond: (t2.unique2 = t1.unique2)
        -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) (actual time=0.007..2.583 rows=10000 loops=1)
          -> Hash (cost=229.20..229.20 rows=101 width=244) (actual time=0.659..0.659 rows=100 loops=1)
              Buckets: 1024 Batches: 1 Memory Usage: 28kB
              -> Bitmap Heap Scan on tenk1 t1 (cost=5.07..229.20 rows=101 width=244) (actual time=0.080..0.526 rows=100 loops=1)
                Recheck Cond: (unique1 < 100)
                  -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) (actual time=0.049..0.049 rows=100 loops=1)
                    Index Cond: (unique1 < 100)
Planning time: 0.194 ms
Execution time: 8.008 ms

First reaction

EXPLAIN tree structure

Sort
└── Hash Join
    ├── Seq Scan
    └── Hash
        └── Bitmap Heap Scan
            └── Bitmap Index Scan

Index and Table Access

Index and Table Access

Index and Table Access

Join Operations

Join Operations

Sorting and Grouping

Sorting and Grouping

Top-N Queries

But how are the decisions made?

HowItSelectPlan

How does PostgreSQL optimize queries?

Dynamic programming

Genetic algorithm

HowItSelectPlan

Planner Cost Constants

seq_page_cost = 1.0 # measured ON an arbitrary scale
random_page_cost = 4.0 # same scale AS above
cpu_tuple_cost = 0.01 # same scale AS above
cpu_index_tuple_cost = 0.005 # same scale AS above
cpu_operator_cost = 0.0025 # same scale AS above

Planner Cost Constants

EXPLAIN ANALYZE SELECT * FROM users WHERE id = 15;
                          QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using users_pkey on users  (cost=0.42..8.44 rows=1 width=1111)
 (actual time=1.024..1.025 rows=1 loops=1)
  Index Cond: (id = 15)
 Total runtime: 1.071 ms
(3 rows)

Planner Cost Constants

SET enable_indexscan = OFF;
EXPLAIN ANALYZE SELECT * FROM users WHERE id = 15;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on users  (cost=4.43..8.44 rows=1 width=1111)
 (actual time=0.674..0.675 rows=1 loops=1)
   Recheck Cond: (id = 15)
   -> Bitmap Index Scan on users_pkey  (cost=0.00..4.43 rows=1 width=0)
   (actual time=0.664..0.664 rows=1 loops=1)
         Index Cond: (id = 15)
 Total runtime: 0.714 ms
(5 rows)

Planner Cost Constants

SET enable_bitmapscan = OFF;
EXPLAIN ANALYZE SELECT * FROM users WHERE id = 15;
                    QUERY PLAN
-----------------------------------------------------------------
Seq Scan on users  (cost=0.00..16474.22 rows=1 width=1111)
 (actual time=0.015..867.794 rows=1 loops=1)
   Filter: (id = 15)
   Rows Removed by Filter: 310789
 Total runtime: 867.817 ms
(4 rows)

Planner Cost Constants

Index Scan using users_pkey on users (cost=0.42..8.44 rows=1 width=1111)
Bitmap Heap Scan on users (cost=4.43..8.44 rows=1 width=1111)
Seq Scan on users (cost=0.00..16474.22 rows=1 width=1111)

The range (number..number) shows cost for starting the operation row and cost for getting all rows returned by this operation

Starting cost and rows

EXPLAIN ANALYZE SELECT * FROM users ORDER BY created_at DESC LIMIT 100;
                                 QUERY PLAN
-------------------------------------------------------------------------
Limit  (cost=27570.71..27570.96 rows=100 width=1111)
 (actual time=439.334..439.455 rows=100 loops=1)
   -> Sort (cost=27570.71..28347.35 rows=310658 width=1111)
   (actual time=439.332..439.380 rows=100 loops=1)
         Sort Key: created_at
         Sort Method: top-N heapsort  Memory: 74kB
         -> Seq Scan on users  (cost=0.00..15697.58 rows=310658 width=1111)
         (actual time=0.006..205.236 rows=310789 loops=1)
 Total runtime: 439.521 ms

Width

# EXPLAIN ANALYZE SELECT * FROM users ORDER BY id DESC LIMIT 100;
                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=0.42..14.79 rows=100 width=1111)
 (actual time=0.971..20.660 rows=100 loops=1)
   ->  Index Scan Backward using users_pkey on users
   (cost=0.42..44641.72 rows=310658 width=1111)
   (actual time=0.968..20.578 rows=100 loops=1)
 Total runtime: 20.755 ms
(3 rows)

Width

# EXPLAIN ANALYZE SELECT id, email FROM users ORDER BY id DESC LIMIT 100;
                           QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=0.42..14.79 rows=100 width=27)
 (actual time=0.034..0.354 rows=100 loops=1)
   ->  Index Scan Backward using users_pkey on users
   (cost=0.42..44641.72 rows=310658 width=27)
   (actual time=0.032..0.263 rows=100 loops=1)
 Total runtime: 0.426 ms
(3 rows)
# EXPLAIN analyze SELECT * FROM t LIMIT 100;
                          QUERY PLAN
---------------------------------------------------------------------------
 LIMIT  (cost=0.00..9.33 ROWS=100 width=608)
 (actual TIME=0.008..0.152 ROWS=100 loops=1)
   ->  Seq Scan ON t  (cost=0.00..93333.86 ROWS=999986 width=608)
  (actual TIME=0.007..0.133 ROWS=100 loops=1)
 Total runtime: 0.181 ms
(3 ROWS)
 Nested Loop  (cost=0.00..3.17 rows=1 width=8)
(actual time=0.316..0.797 rows=2 loops=1)
   ->  Seq Scan on s  (cost=0.00..1.07 rows=1 width=4)
   (actual time=0.072..0.141 rows=2 loops=1)
         Filter: (id = 7)
         Rows Removed by Filter: 4
   ->  Seq Scan on s2  (cost=0.00..2.09 rows=1 width=4)
  (actual time=0.128..0.160 rows=1 loops=2)
         Filter: (id = 7)
         Rows Removed by Filter: 86
 Total runtime: 1.115 ms
 Nested Loop  (cost=0.00..10715.90 rows=26284 width=4449)
(actual time=0.054..291.131 rows=26284 loops=1)
  ->  Index Scan using books_index_title on books
  (cost=0.00..3306.28 rows=26284 width=3357)
  (actual time=0.033..50.773 rows=26284 loops=1)
  ->  Index Scan using categories_pkey on categories
  (cost=0.00..0.27 rows=1 width=1092)
  (actual time=0.002..0.003 rows=1 loops=26284)
        Index Cond: (categories.id = books.category_id)
Total runtime: 312.212 ms

Examples

Improve work_mem

Bitmap Heap Scan on posts  (cost=2.07..16.39 rows=9 width=704)
  Recheck Cond: (id < 10)
  -> Bitmap Index Scan on posts_pkey
  (cost=0.00..2.07 rows=9 width=0)
        Index Cond: (id < 10)

Improve work_mem

Hash Join  (cost=2579.00..53605.00 rows=50000 width=72) (actual time=66.820..959.794 rows=21 loops=1)
  Hash Cond: ((authors.name)::text = (publications.author)::text)
  ->  Seq Scan on authors  (cost=0.00..20310.00 rows=1000000 width=50) (actual time=0.059..267.217 rows=1000000 loops=1)
  ->  Hash  (cost=1270.00..1270.00 rows=50000 width=88) (actual time=38.054..38.054 rows=50000 loops=1)
        Buckets: 4096  Batches: 2  Memory Usage: 2948kB
        ->  Seq Scan on publications  (cost=0.00..1270.00 rows=50000 width=88) (actual time=0.010..14.211 rows=50000 loops=1)
Planning time: 0.489 ms
Execution time: 960.482 ms

Improve work_mem

Hash Join  (cost=1895.00..32705.00 rows=50000 width=72) (actual time=59.224..624.716 rows=21 loops=1)
  Hash Cond: ((authors.name)::text = (publications.author)::text)
  ->  Seq Scan on authors  (cost=0.00..20310.00 rows=1000000 width=50) (actual time=0.031..146.327 rows=1000000 loops=1)
  ->  Hash  (cost=1270.00..1270.00 rows=50000 width=88) (actual time=34.436..34.436 rows=50000 loops=1)
        Buckets: 8192  Batches: 1  Memory Usage: 5860kB
        ->  Seq Scan on publications  (cost=0.00..1270.00 rows=50000 width=88) (actual time=0.008..13.382 rows=50000 loops=1)
Planning time: 0.481 ms
Execution time: 625.796 ms

CREATE STATISTICS

# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t
	WHERE a = 1 AND b = 1;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..195.00 rows=1 width=8)
 (actual rows=100 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 9900

CREATE STATISTICS

# CREATE STATISTICS stts (dependencies) ON a, b FROM t;
# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t
	WHERE a = 1 AND b = 1;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..195.00 rows=100 width=8)
 (actual rows=100 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 9900

Ratio between live and dead rows

# SELECT
  relname AS ObjectName,
  pg_stat_get_live_tuples(c.oid) AS LiveTuples,
  pg_stat_get_dead_tuples(c.oid) AS DeadTuples
FROM pg_class c;

Tools

PostgreSQL EXPLAIN Visualizers

PlannerViz

PostgreSQL tools

<Thank You!> Questions?

Contact information

QuestionsSlide