Streaming huge databases using logical decoding


Transcript of Streaming huge databases using logical decoding

Page 1: Streaming huge databases using logical decoding

Streaming Huge Databases using Logical Decoding

Adventures of a naive programmer

Oleksandr “Alex” Shulgin, Zalando SE

Page 2: Streaming huge databases using logical decoding

Overview

● Introduction
● Problem Statement
● Approach
● Problems
● Some Numbers
● Conclusion

Page 3: Streaming huge databases using logical decoding

Introduction

What is Logical Decoding all about?

● A new feature of PostgreSQL since version 9.4.

● Allows streaming database changes in a custom format.

● Requires an Output Plugin to be written (yes, in C).

● Consistent snapshot before all the changes?

Page 4: Streaming huge databases using logical decoding

Logical Decoding

CREATE_REPLICATION_SLOT "slot1" LOGICAL <plugin_name>;

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

SET TRANSACTION SNAPSHOT 'XXXXXXXX-N';

SELECT pg_export_snapshot();
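For reference, a minimal client-side sketch of the snapshot import half of this sequence with psycopg2 (DSN, snapshot id and table name are placeholders): every worker that imports the snapshot exported above sees the database exactly as it was when the slot was created.

import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
conn.set_session(
    isolation_level=psycopg2.extensions.ISOLATION_LEVEL_REPEATABLE_READ)

cur = conn.cursor()
# Must be the very first statement of the REPEATABLE READ transaction.
cur.execute("SET TRANSACTION SNAPSHOT %s", ("XXXXXXXX-N",))

# From here on, all reads in this transaction use the imported snapshot.
cur.execute("SELECT count(*) FROM myschema.mytable")
print(cur.fetchone()[0])
conn.rollback()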

Page 5: Streaming huge databases using logical decoding

Logical Decoding Output

{

"action": "I", /* INSERT */

"relation": [

"myschema", /* INTO myschema.mytable(id) */

"mytable"

],

"newtuple": {

"id": 1 /* VALUES(1) */

}

}
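To make the format concrete, here is a small consumer sketch (not the project's actual code; apply_insert is a stand-in for whatever writes to the target) that parses one such change record:

import json

def apply_insert(schema, table, row):
    # Stand-in target: a real worker would ship the row to the target system.
    print("INSERT INTO %s.%s: %r" % (schema, table, row))

def handle_change(raw):
    change = json.loads(raw)
    schema, table = change["relation"]
    if change["action"] == "I":
        apply_insert(schema, table, change["newtuple"])
    # "U" / "D" (UPDATE / DELETE) would be dispatched analogously.

handle_change('{"action": "I", "relation": ["myschema", "mytable"], "newtuple": {"id": 1}}')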

Page 6: Streaming huge databases using logical decoding

Problem Statement


Page 7: Streaming huge databases using logical decoding

Approach


https://github.com/zalando/saiki-tawon

Page 8: Streaming huge databases using logical decoding

Command Line and Dashboard

Page 9: Streaming huge databases using logical decoding

Long Live the Snapshot

{
  "w-8ee20b3": {
    "snapshot_id": "51E426C2-1",
    "ts_start": "2016-01-14 11:02:40 UTC",
    "heartbeat": "2016-01-15 07:10:27 UTC",
    "pid": 1,
    "backend_pid": 58793
  },
  "w-fbfb655": {
    "snapshot_id": "51E426C4-1",
    "ts_start": "2016-01-14 11:02:41 UTC",
    "heartbeat": "2016-01-15 07:10:28 UTC",
    "pid": 1,
    "backend_pid": 58794
  },

...
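As a sketch of how each entry above could be produced (the slide does not show where this registry is stored, so publishing it is left out; the worker key and field layout follow the example):

import datetime
import json
import os
import socket

def worker_state(snapshot_id, backend_pid, ts_start):
    """Build one registry entry; the heartbeat is refreshed periodically for
    as long as the worker keeps its snapshot transaction open."""
    now = datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S UTC")
    return {
        socket.gethostname(): {            # e.g. "w-8ee20b3"
            "snapshot_id": snapshot_id,
            "ts_start": ts_start,
            "heartbeat": now,
            "pid": os.getpid(),
            "backend_pid": backend_pid,    # pg_backend_pid() of the export connection
        }
    }

print(json.dumps(worker_state("51E426C2-1", 58793, "2016-01-14 11:02:40 UTC"), indent=2))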

Page 10: Streaming huge databases using logical decoding

The Source System

● Ubuntu precise (12.04) 3.2.0-xx-generic

● CPU: Xeon @ 2.50 GHz, 24 cores

● RAM: 125.88 GB

● 6x HDDs (spinning drives) in a RAID 1+0, 5 TB total capacity

● Data size: 3.0 TB / 17 B rows (+ 1.8 TB indexes)

● PostgreSQL 9.6devel

Page 11: Streaming huge databases using logical decoding

The Target System

“Things are working amazingly fast when you write to /dev/null.”

– proverbial wisdom

Page 12: Streaming huge databases using logical decoding

Problems?

Page 13: Streaming huge databases using logical decoding

Problems!

● OpenVPN quickly becomes the bottleneck on the laptop

Page 14: Streaming huge databases using logical decoding

Problems

● OpenVPN quickly becomes the bottleneck on the laptop

Obvious solution: deploy workers closer to the database.

Docker + Mesosphere DCOS

https://zalando-techmonkeys.github.io/

Page 15: Streaming huge databases using logical decoding

Problems

● Workers quickly run out of memory

The (problematic) code:

cursor.execute("SELECT * FROM mytable")

Page 16: Streaming huge databases using logical decoding

Problems

● Workers quickly run out of memory

The (problematic) code:

cursor.execute("SELECT * FROM mytable")

● Invokes PQexec(), so libpq buffers the entire result set in client memory.

● An asynchronous connection doesn’t help.

● psycopg2 is not designed to stream query results.

Page 17: Streaming huge databases using logical decoding

Problems

● Invoke COPY protocol!

Corrected code:

cursor.copy_expert(

"COPY (SELECT * FROM mytable) TO STDOUT",

...)

Page 18: Streaming huge databases using logical decoding

Problems

● Invoke COPY protocol!

Corrected code:

cursor.copy_expert(

"COPY (SELECT * FROM mytable) TO STDOUT",

...)

● Tried 32 MB, then 64 MB per worker: it was not enough...

● One of the values was around 80 MB(!). Not much we can do.
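Putting it together, a minimal sketch (placeholder names) of how copy_expert() streams the table through a small file-like sink, so the worker only holds one COPY chunk at a time (apart from any single oversized value, as noted above):

import psycopg2

class RowSink(object):
    """File-like object: copy_expert() calls write() with consecutive chunks
    of COPY text, so nothing close to the full result set is ever buffered."""
    def __init__(self):
        self.bytes_seen = 0

    def write(self, data):
        self.bytes_seen += len(data)
        # A real worker would parse rows out of `data` here and forward them
        # to the target, instead of just counting bytes.

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
cur = conn.cursor()
sink = RowSink()
cur.copy_expert("COPY (SELECT * FROM myschema.mytable) TO STDOUT", sink)
print("streamed %d bytes" % sink.bytes_seen)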

Page 19: Streaming huge databases using logical decoding

More Problems?

● More problems with this code

The correct(?) code:

cursor.copy_expert(

"COPY (SELECT * FROM mytable) TO STDOUT",

...)

Page 20: Streaming huge databases using logical decoding

More Problems?

● More problems with this code

The correct(?) code:

cursor.copy_expert(

"COPY (SELECT * FROM mytable) TO STDOUT",

...)

● The query also needs to be schema-qualified, and ONLY may be needed to skip inheritance children: SELECT * FROM [ONLY] myschema.mytable

Page 21: Streaming huge databases using logical decoding

NoSQL?

● How about some JSON for comparison?

SELECT row_to_json(x.*) FROM mytable AS x

● Slows down the export 2-3 times.

● Not 100% equivalent to what the output plugin emits.

● Have to write a C function for every plugin.
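For comparison, the JSON run can go through the same COPY path; a sketch with placeholder names, discarding the output to match the /dev/null target:

import psycopg2

class NullSink(object):
    """Discards everything that copy_expert() writes to it."""
    def write(self, data):
        pass

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
cur = conn.cursor()
cur.copy_expert(
    "COPY (SELECT row_to_json(x.*) FROM myschema.mytable AS x) TO STDOUT",
    NullSink())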

Page 22: Streaming huge databases using logical decoding

What if we wrote a generic function...

CREATE FUNCTION pg_logical_slot_stream_relation(

IN slot_name NAME,

IN relnamespace NAME,

IN relname NAME,

IN nochildren BOOL DEFAULT FALSE,

VARIADIC options TEXT[] DEFAULT '{}',

OUT data TEXT

)

RETURNS SETOF TEXT ...

Page 23: Streaming huge databases using logical decoding

The Final Code

cursor.copy_expert(

"COPY (SELECT pg_logical_slot_stream_relation(...)) TO STDOUT",

...)

● Do not use SELECT … FROM pg_logical_slot_… because it caches the whole result in the backend.

● Requires changes to core PostgreSQL.

● Ideally should not require a slot, only a snapshot.

● Slots cannot be used concurrently (yet).

Page 24: Streaming huge databases using logical decoding

At Last: Some Numbers

6 client processes, SSL (no compression), 1 Gbit/s network interface

Query                  Run Time             Volume   Throughput
SELECT * FROM …        7.5 h                2.7 TB   105 MB/s
pglogical / JSON       17.5 h               6.7 TB   112 MB/s
pglogical / native     30+ h (incomplete)   11+ TB   106 MB/s
Bottled Water / Avro   13.5 h               5.0 TB   108 MB/s

Page 25: Streaming huge databases using logical decoding

Space for Improvement

In its native protocol format, pglogical_output emits metadata with every tuple.

● Metadata overhead: 5.0 TB (167%)

○ nspname + relname + attnames

● Protocol overhead: 1.5 TB (50%)

○ message type + lengths

Set plugin option relmeta_cache_size to -1.
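If the proposed streaming function were used, the option could presumably be passed through its VARIADIC options argument as alternating name/value pairs, the same convention as the built-in pg_logical_slot_* functions; a sketch with placeholder names:

import psycopg2

class NullSink(object):
    def write(self, data):
        pass  # discard; a real worker would forward the decoded rows

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
cur = conn.cursor()
cur.copy_expert(
    "COPY (SELECT pg_logical_slot_stream_relation("
    "    'slot1', 'myschema', 'mytable', false,"
    "    'relmeta_cache_size', '-1')) TO STDOUT",
    NullSink())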

Page 26: Streaming huge databases using logical decoding

A Common Number: ~110 MB/s

● The network is apparently the bottleneck.

● What if we enable SSL compression?

Page 27: Streaming huge databases using logical decoding

sslcompression=1?

● Nowadays this is really hard to do, because of the CRIME vulnerability.

● Older distro versions: set the OPENSSL_DEFAULT_ZLIB environment variable.

● Newer distro versions: OpenSSL is compiled without zlib. Good luck!

● TLSv1.3 will remove support for compression.

● HINT: your streaming replication is likely to be running uncompressed.

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-4929
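On the client side, requesting compression is just a connection parameter; a minimal sketch (host and database names assumed) that also checks whether the server actually granted it (pg_stat_ssl exists since PostgreSQL 9.5):

import psycopg2

# sslcompression=1 asks libpq for a compressed SSL connection; with newer
# OpenSSL builds it is silently ignored unless zlib support was enabled,
# e.g. via OPENSSL_DEFAULT_ZLIB as mentioned above.
conn = psycopg2.connect(
    "host=db.example.com dbname=mydb sslmode=require sslcompression=1")
cur = conn.cursor()
cur.execute("SELECT compression FROM pg_stat_ssl WHERE pid = pg_backend_pid()")
print("SSL compression active:", cur.fetchone()[0])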

Page 28: Streaming huge databases using logical decoding

A Much Better Picture

Page 29: Streaming huge databases using logical decoding

Compression FTW!

24 client processes, SSL (with compression)

Query                  Run Time              Volume                Notes
SELECT * FROM …        3 h (vs. 7.5 h)       2.7 TB
pglogical / JSON       7-8* h (vs. 17.5 h)   6.7 TB                *ordering
pglogical / native     8-9* h (vs. 30+ h)    7.2 TB (vs. 11+ TB)
Bottled Water / Avro   10 h (vs. 13.5 h)     5.0 TB

Page 30: Streaming huge databases using logical decoding

In Conclusion

● Set relmeta_cache_size when using the pglogical_output native protocol.

● Run a benchmark to see if you need compression.

● Order tables from largest to smallest.

● Do start consuming the replication slot once the export is finished, since an idle slot keeps accumulating WAL on the source (a minimal consumer sketch follows below).

● Help needed: review the proposed streaming interface!
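As promised above, a minimal consumer sketch for the slot, assuming a psycopg2 version with replication support (2.7 or later); slot and DSN names are placeholders:

import psycopg2
import psycopg2.extras

conn = psycopg2.connect(
    "dbname=mydb",
    connection_factory=psycopg2.extras.LogicalReplicationConnection)
cur = conn.cursor()
cur.start_replication(slot_name="slot1", decode=True)

def consume(msg):
    # Forward one decoded change to the target, then acknowledge it so the
    # server can recycle the corresponding WAL.
    print(msg.payload)
    msg.cursor.send_feedback(flush_lsn=msg.data_start)

cur.consume_stream(consume)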

Page 32: Streaming huge databases using logical decoding

Thank you! Questions?