Streaming huge databases using logical decoding


Transcript of Streaming huge databases using logical decoding

Page 1: Streaming huge databases using logical decoding

Streaming Huge Databases using Logical Decoding

Adventures of a naive programmer

Oleksandr “Alex” Shulgin, Zalando SE

Page 2: Streaming huge databases using logical decoding

Overview

● Introduction
● Problem Statement
● Approach
● Problems
● Some Numbers
● Conclusion

Page 3: Streaming huge databases using logical decoding

Introduction

What is Logical Decoding all about?

● A new feature of PostgreSQL since version 9.4.

● Allows streaming database changes in a custom format.

● Requires an Output Plugin to be written (yes, in C).

● Consistent snapshot before all the changes?

Page 4: Streaming huge databases using logical decoding

Logical Decoding

CREATE_REPLICATION_SLOT "slot1" LOGICAL <plugin_name>;

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

SET TRANSACTION SNAPSHOT 'XXXXXXXX-N';

SELECT pg_export_snapshot();
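For reference, a minimal client-side sketch of the snapshot import half of this sequence with psycopg2 (DSN, snapshot id and table name are placeholders): every worker that imports the snapshot exported above sees the database exactly as it was when the slot was created.

import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
conn.set_session(
    isolation_level=psycopg2.extensions.ISOLATION_LEVEL_REPEATABLE_READ)

cur = conn.cursor()
# Must be the very first statement of the REPEATABLE READ transaction.
cur.execute("SET TRANSACTION SNAPSHOT %s", ("XXXXXXXX-N",))

# From here on, all reads in this transaction use the imported snapshot.
cur.execute("SELECT count(*) FROM myschema.mytable")
print(cur.fetchone()[0])
conn.rollback()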

Page 5: Streaming huge databases using logical decoding

Logical Decoding Output

{

"action": "I", /* INSERT */

"relation": [

"myschema", /* INTO myschema.mytable(id) */

"mytable"

],

"newtuple": {

"id": 1 /* VALUES(1) */

}

}
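To make the format concrete, here is a small consumer sketch (not the project's actual code; apply_insert is a stand-in for whatever writes to the target) that parses one such change record:

import json

def apply_insert(schema, table, row):
    # Stand-in target: a real worker would ship the row to the target system.
    print("INSERT INTO %s.%s: %r" % (schema, table, row))

def handle_change(raw):
    change = json.loads(raw)
    schema, table = change["relation"]
    if change["action"] == "I":
        apply_insert(schema, table, change["newtuple"])
    # "U" / "D" (UPDATE / DELETE) would be dispatched analogously.

handle_change('{"action": "I", "relation": ["myschema", "mytable"], "newtuple": {"id": 1}}')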

Page 6: Streaming huge databases using logical decoding

Problem Statement


Page 7: Streaming huge databases using logical decoding

Approach


https://github.com/zalando/saiki-tawon

Page 8: Streaming huge databases using logical decoding

Command Line and Dashboard

Page 9: Streaming huge databases using logical decoding

Long Live the Snapshot

{
  "w-8ee20b3": {
    "snapshot_id": "51E426C2-1",
    "ts_start": "2016-01-14 11:02:40 UTC",
    "heartbeat": "2016-01-15 07:10:27 UTC",
    "pid": 1,
    "backend_pid": 58793
  },
  "w-fbfb655": {
    "snapshot_id": "51E426C4-1",
    "ts_start": "2016-01-14 11:02:41 UTC",
    "heartbeat": "2016-01-15 07:10:28 UTC",
    "pid": 1,
    "backend_pid": 58794
  },

...
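As a sketch of how each entry above could be produced (the slide does not show where this registry is stored, so publishing it is left out; the worker key and field layout follow the example):

import datetime
import json
import os
import socket

def worker_state(snapshot_id, backend_pid, ts_start):
    """Build one registry entry; the heartbeat is refreshed periodically for
    as long as the worker keeps its snapshot transaction open."""
    now = datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S UTC")
    return {
        socket.gethostname(): {            # e.g. "w-8ee20b3"
            "snapshot_id": snapshot_id,
            "ts_start": ts_start,
            "heartbeat": now,
            "pid": os.getpid(),
            "backend_pid": backend_pid,    # pg_backend_pid() of the export connection
        }
    }

print(json.dumps(worker_state("51E426C2-1", 58793, "2016-01-14 11:02:40 UTC"), indent=2))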

Page 10: Streaming huge databases using logical decoding

The Source System

● Ubuntu precise (12.04) 3.2.0-xx-generic

● CPU: Xeon @ 2.50 GHz, 24 cores

● RAM: 125.88 GB

● 6x HDDs (spinning drives) in a RAID 1+0, 5 TB total capacity

● Data size: 3.0 TB / 17 B rows (+ 1.8 TB indexes)

● PostgreSQL 9.6devel

Page 11: Streaming huge databases using logical decoding

The Target System

“Things are working amazingly fast when you write to /dev/null.”

– proverbial wisdom

Page 12: Streaming huge databases using logical decoding

Problems?

Page 13: Streaming huge databases using logical decoding

Problems!

● OpenVPN quickly becomes the bottleneck on the laptop

Page 14: Streaming huge databases using logical decoding

Problems

● OpenVPN quickly becomes the bottleneck on the laptop

Obvious solution: deploy workers closer to the database.

Docker + Mesosphere DCOS

https://zalando-techmonkeys.github.io/

Page 15: Streaming huge databases using logical decoding

Problems

● Workers quickly run out of memory

The (problematic) code:

cursor.execute("SELECT * FROM mytable")

Page 16: Streaming huge databases using logical decoding

Problems

● Workers quickly run out of memory

The (problematic) code:

cursor.execute("SELECT * FROM mytable")

● Invokes PQexec(), so libpq buffers the entire result set in client memory.

● An asynchronous connection doesn’t help.

● psycopg2 is not designed to stream query results.

Page 17: Streaming huge databases using logical decoding

Problems

● Invoke COPY protocol!

Corrected code:

cursor.copy_expert(

"COPY (SELECT * FROM mytable) TO STDOUT",

...)

Page 18: Streaming huge databases using logical decoding

Problems

● Invoke COPY protocol!

Corrected code:

cursor.copy_expert(

"COPY (SELECT * FROM mytable) TO STDOUT",

...)

● Tried 32 MB, then 64 MB per worker: it was not enough...

● One of the values was around 80 MB(!). Not much we can do.
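Putting it together, a minimal sketch (placeholder names) of how copy_expert() streams the table through a small file-like sink, so the worker only holds one COPY chunk at a time (apart from any single oversized value, as noted above):

import psycopg2

class RowSink(object):
    """File-like object: copy_expert() calls write() with consecutive chunks
    of COPY text, so nothing close to the full result set is ever buffered."""
    def __init__(self):
        self.bytes_seen = 0

    def write(self, data):
        self.bytes_seen += len(data)
        # A real worker would parse rows out of `data` here and forward them
        # to the target, instead of just counting bytes.

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
cur = conn.cursor()
sink = RowSink()
cur.copy_expert("COPY (SELECT * FROM myschema.mytable) TO STDOUT", sink)
print("streamed %d bytes" % sink.bytes_seen)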

Page 19: Streaming huge databases using logical decoding

More Problems?

● More problems with this code

The correct(?) code:

cursor.copy_expert(

"COPY (SELECT * FROM mytable) TO STDOUT",

...)

Page 20: Streaming huge databases using logical decoding

More Problems?

● More problems with this code

The correct(?) code:

cursor.copy_expert(

"COPY (SELECT * FROM mytable) TO STDOUT",

...)

● The query also needs to be schema-qualified, and ONLY may be needed to skip inheritance children: SELECT * FROM [ONLY] myschema.mytable

Page 21: Streaming huge databases using logical decoding

NoSQL?

● How about some JSON for comparison?

SELECT row_to_json(x.*) FROM mytable AS x

● Slows down the export 2-3 times.

● Not 100% equivalent to what the output plugin emits.

● Have to write a C function for every plugin.
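For comparison, the JSON run can go through the same COPY path; a sketch with placeholder names, discarding the output to match the /dev/null target:

import psycopg2

class NullSink(object):
    """Discards everything that copy_expert() writes to it."""
    def write(self, data):
        pass

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
cur = conn.cursor()
cur.copy_expert(
    "COPY (SELECT row_to_json(x.*) FROM myschema.mytable AS x) TO STDOUT",
    NullSink())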

Page 22: Streaming huge databases using logical decoding

What if we wrote a generic function...

CREATE FUNCTION pg_logical_slot_stream_relation(

IN slot_name NAME,

IN relnamespace NAME,

IN relname NAME,

IN nochildren BOOL DEFAULT FALSE,

VARIADIC options TEXT[] DEFAULT '{}',

OUT data TEXT

)

RETURNS SETOF TEXT ...

Page 23: Streaming huge databases using logical decoding

The Final Code

cursor.copy_expert(

"COPY (SELECT pg_logical_slot_stream_relation(...)) TO STDOUT",

...)

● Do not use SELECT … FROM pg_logical_slot_… because it caches the whole result in the backend.

● Requires changes to core PostgreSQL.

● Ideally should not require a slot, only a snapshot.

● Slots cannot be used concurrently (yet).

Page 24: Streaming huge databases using logical decoding

At Last: Some Numbers

6 client processes, SSL (no compression), 1 Gbit/s network interface

Query                  Run Time             Volume   Throughput
SELECT * FROM …        7.5 h                2.7 TB   105 MB/s
pglogical / JSON       17.5 h               6.7 TB   112 MB/s
pglogical / native     30+ h (incomplete)   11+ TB   106 MB/s
Bottled Water / Avro   13.5 h               5.0 TB   108 MB/s

Page 25: Streaming huge databases using logical decoding

Space for Improvement

In its native protocol format, pglogical_output emits metadata with every tuple.

● Metadata overhead: 5.0 TB (167%)

○ nspname + relname + attnames

● Protocol overhead: 1.5 TB (50%)

○ message type + lengths

Set plugin option relmeta_cache_size to -1.
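If the proposed streaming function were used, the option could presumably be passed through its VARIADIC options argument as alternating name/value pairs, the same convention as the built-in pg_logical_slot_* functions; a sketch with placeholder names:

import psycopg2

class NullSink(object):
    def write(self, data):
        pass  # discard; a real worker would forward the decoded rows

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
cur = conn.cursor()
cur.copy_expert(
    "COPY (SELECT pg_logical_slot_stream_relation("
    "    'slot1', 'myschema', 'mytable', false,"
    "    'relmeta_cache_size', '-1')) TO STDOUT",
    NullSink())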

Page 26: Streaming huge databases using logical decoding

A Common Number: ~110 MB/s

● The network is apparently the bottleneck.

● What if we enable SSL compression?

Page 27: Streaming huge databases using logical decoding

sslcompression=1?

● Nowadays this is really hard to do, because of the CRIME vulnerability.

● Older distro versions: set the OPENSSL_DEFAULT_ZLIB environment variable.

● Newer distro versions: OpenSSL is compiled without zlib. Good luck!

● TLSv1.3 will remove support for compression.

● HINT: your streaming replication is likely to be running uncompressed.

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-4929
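On the client side, requesting compression is just a connection parameter; a minimal sketch (host and database names assumed) that also checks whether the server actually granted it (pg_stat_ssl exists since PostgreSQL 9.5):

import psycopg2

# sslcompression=1 asks libpq for a compressed SSL connection; with newer
# OpenSSL builds it is silently ignored unless zlib support was enabled,
# e.g. via OPENSSL_DEFAULT_ZLIB as mentioned above.
conn = psycopg2.connect(
    "host=db.example.com dbname=mydb sslmode=require sslcompression=1")
cur = conn.cursor()
cur.execute("SELECT compression FROM pg_stat_ssl WHERE pid = pg_backend_pid()")
print("SSL compression active:", cur.fetchone()[0])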

Page 28: Streaming huge databases using logical decoding

A Much Better Picture

Page 29: Streaming huge databases using logical decoding

Compression FTW!

24 client processes, SSL (with compression)

Query                  Run Time              Volume                Notes
SELECT * FROM …        3 h (vs. 7.5 h)       2.7 TB
pglogical / JSON       7-8* h (vs. 17.5 h)   6.7 TB                *ordering
pglogical / native     8-9* h (vs. 30+ h)    7.2 TB (vs. 11+ TB)
Bottled Water / Avro   10 h (vs. 13.5 h)     5.0 TB

Page 30: Streaming huge databases using logical decoding

In Conclusion

● Set relmeta_cache_size when using the pglogical_output native protocol.

● Run a benchmark to see if you need compression.

● Order tables from largest to smallest.

● Do start consuming the replication slot once the export is finished, since an idle slot keeps accumulating WAL on the source (a minimal consumer sketch follows below).

● Help needed: review the proposed streaming interface!
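As promised above, a minimal consumer sketch for the slot, assuming a psycopg2 version with replication support (2.7 or later); slot and DSN names are placeholders:

import psycopg2
import psycopg2.extras

conn = psycopg2.connect(
    "dbname=mydb",
    connection_factory=psycopg2.extras.LogicalReplicationConnection)
cur = conn.cursor()
cur.start_replication(slot_name="slot1", decode=True)

def consume(msg):
    # Forward one decoded change to the target, then acknowledge it so the
    # server can recycle the corresponding WAL.
    print(msg.payload)
    msg.cursor.send_feedback(flush_lsn=msg.data_start)

cur.consume_stream(consume)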

Page 32: Streaming huge databases using logical decoding

Thank you! Questions?