PostgreSQL and Big Data

Alexey Vasiliev, Railsware

Alexey Vasiliev

Web and Mobile Developer (Ruby, Java, JavaScript, Objective-C, C/C++, Golang, Elixir/Erlang), DevOps
Open-Source libs: PGTune, SQL Joins Visualizer, RWbox, Go-Kinesis, ElixirV8, WebP-ffi, Zopfli-ffi, MongodbLogger, SMTRails, SHTRails, ...
Open-Source books: Cooking Infrastructure by Chef, Setting up and scaling of PostgreSQL (Russian)
Leading RWpod podcast about Ruby and JavaScript

Big Data

What is Big Data?

That which doesn't fit in an Excel spreadsheet.

Stéphane Hamel

Any data too big for copy to DVDs and fit into 1 Lada.

Kazakhstan dictionary

When person monitors database, it is called small data and when database monitors person, it is called big data.

Tarun Kumar

What is Big Data?

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.

PostgreSQL

What is PostgreSQL?

PostgreSQL logo PostgreSQL ("Postgres") - is an object-relational database management system (ORDBMS) with an emphasis on extensibility and standards-compliance. Based on the SQL language and supports many of the features of the standard SQL:2011. PostgreSQL evolved from the Ingres project at the University of California, Berkeley. In 1982 the leader of the Ingres team, Michael Stonebraker, left Berkeley to make a proprietary version of Ingres. He returned to Berkeley in 1985 and started a post-Ingres project

PostgreSQL strengths

database support of virtually unlimited size
powerful and reliable transaction and replication mechanisms
extensible embedded programming languages: are supported PL/pgSQL, PL/Perl, PL/Python and PL/Tcl, PL/Java, PL/PHP, PL/Py, PL/R, PL/Ruby, PL/Scheme, PL/sh and PL/V8 + loading C-compatible modules
inheritance
easy extensibility

PostgreSQL limits

Limit	Value
Maximum Database Size	Unlimited
Maximum Table Size	32 TB
Maximum Row Size	1.6 TB
Maximum Field Size	1 GB
Maximum Rows per Table	Unlimited
Maximum Columns per Table	250-1600 depending on column types
Maximum Indexes per Table	Unlimited

Indexes

Index use cases

Data search - all indexes support search values on equality. Some indexes also support prefix search (like “abc%”), arbitrary ranges search
Optimizer - B-Tree and R-Tree indexes represent a histogram arbitrary precision
Join - indexes can be used for Merge, Index algorithms
Relation - indexes can be used for except/intersect operations
Aggregations - indexes can effectively calculate some aggregation function
Grouping - indexes can effectively calculate the arbitrary grouping and aggregate functions (sort-group algorithm)

B-Tree

R-Tree

Hash Index

Bitmap Index

Generalized Search Tree (GiST) index

Generalized Inverted (GIN) index

Block Range (BRIN) Indexes (9.5+)

Block Range (BRIN) Indexes

# \dt+ orders
                  List of relations
 Schema |  Name  | Type  | Owner | Size  | Description
--------+--------+-------+-------+-------+-------------
 public | orders | table | leo   | 13 GB |
(1 row)

Block Range (BRIN) Indexes

# EXPLAIN ANALYSE SELECT count(*) FROM orders
    WHERE order_date BETWEEN '2012-01-04 09:00:00'
    and '2014-01-04 14:30:00';
...
Planning time: 0.140 ms
Execution time: 30172.482 ms
(6 rows)

Block Range (BRIN) Indexes

# CREATE INDEX idx_order_date_brin ON orders
 USING BRIN (order_date);
# \di+ idx_order_date_brin
                              List of relations
 Schema |        Name         | Type  | Owner | Table  |  Size  | Description
--------+---------------------+-------+-------+--------+--------+-------------
 public | idx_order_date_brin | index | leo   | orders | 504 kB |
(1 row)

Block Range (BRIN) Indexes

# EXPLAIN ANALYSE SELECT count(*) FROM orders
       WHERE order_date BETWEEN '2012-01-04 09:00:00'
       and '2014-01-04 14:30:00';
...
->  Bitmap Index Scan on idx_order_date_brin
      Index Cond: ((order_date >= '2012-01-04 09:00:00+00'::timestamp with time zone) AND (order_date <= '2014-01-04 14:30:00+00'::timestamp with time zone))
Planning time: 0.108 ms
Execution time: 6347.701 ms
(9 rows)

Block Range (BRIN) Indexes

# CREATE INDEX idx_order_date_brin_32
 ON orders
 USING BRIN (order_date) WITH (pages_per_range = 32);

# CREATE INDEX idx_order_date_brin_512
 ON orders
 USING BRIN (order_date) WITH (pages_per_range = 512);

Block Range (BRIN) Indexes

# \di+ idx_order_date_brin*
                                List of relations
 Schema |          Name           | Table  |  Size
--------+-------------------------+--------+---------
 public | idx_order_date_brin     | orders | 504 kB
 public | idx_order_date_brin_32  | orders | 1872 kB
 public | idx_order_date_brin_512 | orders | 152 kB
(3 rows)

JSONB

The only difference between json and jsonb is their storage:

json is stored in its plain text format, while
jsonb is stored in some binary representation

There are 3 major consequences of this:

jsonb usually takes more disk space to store than json (sometimes not)
jsonb takes more time to build from its input representation than json
json operations take significantly more time than jsonb (& parsing also needs to be done each time you do some operation at a json typed value)

JSONB

# SELECT '{"name": "Joe", "age": 30}'::jsonb
       || '{"town": "London"}'::jsonb;
                    ?column?
 ----------------------------------------------
  {"age": 30, "name": "Joe", "town": "London"}

JSONB Operators

Operator	Description
->	Get an element by key as a JSON object
->>	Get an element by key as a text object
#>	Get an element by path as a JSON object
#>>	Get an element by path as a text object
<@, @>	Evaluate whether a JSON object contains a key/value pair
?	Evaluate whether a JSON object contains a key or a value
?\|	Evaluate whether a JSON object contains ANY of keys or values
?&	Evaluate whether a JSON object contains ALL of keys or values
\|\|	Insert or Update an element to a JSON object
-	Delete an element by key from a JSON object
#-	Delete an element by path from a JSON object

Analytics

WITH

# WITH prepared_data AS ( ... )
SELECT data, count(data),
       min(data), max(data)
FROM prepared_data
GROUP BY data;

WITH RECURSIVE

# WITH RECURSIVE t(n) AS (VALUES (1)
UNION ALL SELECT n+1 FROM t WHERE n < 100)
SELECT sum(n) FROM t;

 sum
------
 5050
(1 row)

Window functions

# SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
 depname  | empno | salary |          avg
-----------+-------+--------+-----------------------
 develop   |    11 |   5200 | 5020.0000000000000000
 develop   |     7 |   4200 | 5020.0000000000000000
 develop   |     9 |   4500 | 5020.0000000000000000
 develop   |     8 |   6000 | 5020.0000000000000000
 develop   |    10 |   5200 | 5020.0000000000000000
 personnel |     5 |   3500 | 3700.0000000000000000
 personnel |     2 |   3900 | 3700.0000000000000000
 sales     |     3 |   4800 | 4866.6666666666666667
 sales     |     1 |   5000 | 4866.6666666666666667
 sales     |     4 |   4800 | 4866.6666666666666667
(10 rows)

Window functions

# SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
 salary |  sum
--------+-------
   3500 |  3500
   3900 |  7400
   4200 | 11600
   4500 | 16100
   4800 | 25700
   4800 | 25700
   5000 | 30700
   5200 | 41100
   5200 | 41100
   6000 | 47100
(10 rows)

FILTER replaces CASE WHEN

# SELECT
  COUNT(*) AS unfiltered,
  SUM( CASE WHEN i < 5 THEN 1 ELSE 0 END ) AS filtered
FROM generate_series(1,10) AS s(i);

# SELECT
  COUNT(*) AS unfiltered,
  COUNT(*) FILTER (WHERE i < 5) AS filtered
FROM generate_series(1,10) AS s(i);

Ordered-set (WITHIN GROUP) aggregates (9.4+)

percentile_cont()
percentile_disc()
mode()
rank()
dense_rank()
percent_rank()
cume_dist()

Mode() - most common value

# SELECT mode() WITHIN GROUP (ORDER BY some_value)
      AS modal_value FROM tbl;
 mode
------
    9
(1 row)

Rank() - rank of a value in a subset

# SELECT v, RANK() OVER(ORDER BY v) FROM t;
| V | RANK |
|---|------|
| a |    1 |
| a |    1 |
| b |    3 |
| c |    4 |
| c |    4 |
| d |    6 |

Dense_rank() - rank() with no gaps

# SELECT v, DENSE_RANK() OVER(ORDER BY v) FROM t;
| V | DENSE_RANK |
|---|------------|
| a |          1 |
| a |          1 |
| b |          2 |
| c |          3 |
| c |          3 |
| d |          4 |

GROUPING SETS (9.5+)

# SELECT * FROM items_sold;
 brand | size | sales
-------+------+-------
 Foo   | L    |  10
 Foo   | M    |  20
 Bar   | M    |  15
 Bar   | L    |  5
(4 rows)

GROUPING SETS (9.5+)

# SELECT brand, size, sum(sales) FROM items_sold
      GROUP BY GROUPING SETS ((brand), (size), ());
 brand | size | sum
-------+------+-----
 Foo   |      |  30
 Bar   |      |  20
       | L    |  15
       | M    |  35
       |      |  50

CUBE (9.5+)

# SELECT brand, size, sum(sales) FROM items_sold GROUP BY CUBE (brand, size);
 brand | size | sum
-------+------+-----
 Bar   | L    |   5
 Bar   | M    |  15
 Bar   |      |  20
 Foo   | L    |  10
 Foo   | M    |  20
 Foo   |      |  30
       |      |  50
       | L    |  15
       | M    |  35

CUBE (9.5+)

# CUBE(c1, c2, c3)
    (с1, null, null)
    (null, c2, null)
    (null, null, c3)
    (c1, c2, null)
    (c1, null, c3)
    (null, c2, c3)
    (c1, c2, c3)
    (null, null, null)

ROLLUP (9.5+)

# SELECT brand, size, sum(sales) FROM items_sold
      GROUP BY ROLLUP (brand, size);
   brand | size | sum
  -------+------+-----
   Bar   | L    |   5
   Bar   | M    |  15
   Bar   |      |  20
   Foo   | L    |  10
   Foo   | M    |  20
   Foo   |      |  30
         |      |  50

ROLLUP (9.5+)

# ROLLUP(c1, c2, c3, c4)
    (c1, c2, c3, c4)
    (c1, c2, c3, null)
    (c1, c2, null, null)
    (c1, null, null, null)
    (null, null, null, null)

TABLESAMPLE

# EXPLAIN ANALYZE SELECT * FROM test TABLESAMPLE SYSTEM ( 0.001 );
                  QUERY PLAN
------------------------------------------------------------------
 Sample Scan (system) on test  (cost=0.00..0.10 rows=10 width=25)
   (actual time=0.029..0.042 rows=136 loops=1)
 Planning time: 0.125 ms
 Execution time: 0.061 ms
(3 rows)

TABLESAMPLE

# EXPLAIN ANALYZE SELECT * FROM test TABLESAMPLE BERNOULLI ( 0.001 );
                  QUERY PLAN
------------------------------------------------------------------
Sample Scan (bernoulli) on test  (cost=0.00..7353.10 rows=10 width=25)
    (actual time=2.779..27.288 rows=15 loops=1)
 Planning time: 0.112 ms
 Execution time: 27.312 ms
(3 rows)

TABLESAMPLE - Sampling methods

TABLESAMPLE REPEATABLE

The optional REPEATABLE clause specifies a seed number or expression to use for generating random numbers within the sampling method. The seed value can be any non-null floating-point value. Two queries that specify the same seed and argument values will select the same sample of the table, if the table has not been changed meanwhile

# SELECT * FROM t TABLESAMPLE SYSTEM (0.001) REPEATABLE (200);

Materialized Views

# REFRESH MATERIALIZED VIEW CONCURRENTLY my_view

Sharding

Yahoo claims 2-petabyte database is world's biggest, busiest

the system is described as an over 2 petabyte repository of user click stream and context data with an update rate for 24 billion events per day

James Hamilton

Petascale SQL DB at Yahoo!

Sharding solutions

PL/Proxy (Skype, Open Source)
Pivotal Greenplum (Pivotal, Open Source)
CitusDB (CitusData, Open Source)
AWS Redshift (Amazon, closed source)

Postgres-XC

Write‐scalable PostgreSQL cluster (more than 3x scalability performance speedup with five servers, compared with pure PostgreSQL)
Synchronous multi‐master configuration (any update to any master is visible from other masters immediately)
Table location transparent
Based upon PostgreSQL
Same API to Apps as PostgreSQL

Pg_shard

CREATE EXTENSION pg_shard;
CREATE TABLE customer_reviews
(
    customer_id TEXT NOT NULL,
    review_date DATE,
    review_rating INTEGER
);
SELECT master_create_distributed_table('customer_reviews', 'customer_id');
SELECT master_create_worker_shards('customer_reviews', 16, 2);

Pg_shard Limitations

Transactional semantics for queries that span across multiple shards
Unique constraints on columns other than the partition key, or foreign key constraints
Distributed "JOIN"s also aren't supported
Table alterations are not supported
"DROP TABLE" does not have any special semantics when used on a distributed table
Queries such as "INSERT INTO foo SELECT bar, baz FROM qux" are not supported

Cstore_fdw - columnar store for analytics with PostgreSQL

Foreign table inheritance (9.5+)

CREATE TABLE events (...);
CREATE TABLE events_current () INHERITS(main);
CREATE FOREIGN TABLE events_201501 (
CHECK(20150101 <= event_date AND event_date <= 20150131)
)
INHERITS(main) SERVER cstore_server OPTIONS(compression 'pglz');

Fast INSERT and look-ups into current table
Periodically move data to archive table for compression
Query both via main table

Additional Options - "NoSQL" database

CouchDB, MongoDB, HBase, Cassandra, Redis
Document store
Map/Reduce querying

Foreign Data Wrappers

Foreign Data Wrappers

<Thank You!> Questions?

Contact information

web: leopard.in.ua
github: le0pard
twitter: @leopard_me
podcast: RWpod