Reaching 1 billion rows / second
Hans-Jürgen Schönig
www.postgresql-support.de
Reaching a milestone
Traditional PostgreSQL limitations
- Traditionally:
  - We could only use 1 CPU core per query
  - Scaling was possible by running more than one query at a time
  - Usually hard to do
PL/Proxy: The traditional way to do it
- PL/Proxy is a stored procedure language to scale out to shards.
- Worked nicely for OLTP workloads
- Somewhat usable for analytics
- A LOT of manual work
PL/Proxy: The future
- Still ok for OLTP
- Certainly not the way to scale out in the future
- Too much manual work
- Not transparent
- Not cool enough
On the app level
- Doing scaling on the app level
- A lot of manual work
- Not cool enough
- Needs a lot of development
- Why use a database if work is still manual?
- Solving things on the app level is certainly not an option
The goal
Our goal on the PostgreSQL level
- Import massive amounts of data
- Run typical aggregates
- Process 1 billion rows in less than a second
- Scale out to as many nodes as needed
Coming up with a data structure
- We tried to keep that simple:

    node=# \d t_demo
                Table "public.t_demo"
     Column |  Type   | Collation | Nullable |
    --------+---------+-----------+----------+
     id     | serial  |           | not null |
     grp    | integer |           |          |
     data   | real    |           |          |
    Indexes:
        "idx_id" btree (id)
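A minimal sketch of how such a table could be created and filled for a local test. The row count, the ten-value grouping column and the use of random() are assumptions for illustration; the slides do not show how the data was generated.

```sql
-- Hypothetical setup: same column layout as t_demo on the slide.
CREATE TABLE t_demo (
    id   serial,
    grp  integer,
    data real
);

-- Assumed test data: 1 million rows spread over 10 groups (0..9).
INSERT INTO t_demo (grp, data)
SELECT i % 10, random()
FROM generate_series(1, 1000000) AS i;

CREATE INDEX idx_id ON t_demo (id);

-- Refresh planner statistics after the bulk load.
ANALYZE t_demo;
```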
The query
    SELECT grp, count(data)
    FROM t_demo
    GROUP BY 1;
Single server performance
Tweaking a simple server
- The main questions are:
  - How much can we expect from a single server?
  - How well does it scale with many CPUs?
  - How far can we get?
PostgreSQL parallelism
- Parallel queries have been added in PostgreSQL 9.6
- It can do a lot
- It is by far not feature complete yet
- Number of workers will be determined by the PostgreSQL optimizer
- We do not want that
- We want ALL cores to be at work
Adjusting CPU core usage
- Usually the number of processes per scan is derived from the size of the table

    test=# SHOW min_parallel_relation_size ;
     min_parallel_relation_size
    ----------------------------
     8MB
    (1 row)

- One process is added each time the table size triples
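The tripling rule can be tabulated with a small query. This is a sketch that assumes the 8MB threshold shown above and the behaviour described on the slide (one worker once the threshold is reached, one more each time the table size triples). The cap imposed by max_parallel_workers_per_gather is ignored here.

```sql
-- For n planned workers the table has to reach roughly threshold * 3^(n-1).
SELECT n                        AS workers,
       8 * 3::numeric ^ (n - 1) AS min_table_size_mb
FROM generate_series(1, 8) AS n;
```

Extending the series to 32 workers gives 8MB times 3^31, which is far more data than any realistic table holds.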
Overruling the planner
- We could never have enough data to make PostgreSQL go for 16 or 32 cores.
- Even if the value is set to a couple of kilobytes.
- The default mechanism can be overruled:

    test=# ALTER TABLE t_demo
               SET (parallel_workers = 32);
    ALTER TABLE
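The per-table setting only takes effect if the server is allowed to start that many workers, because the planner still caps the worker count at max_parallel_workers_per_gather. A sketch of the related settings, assuming PostgreSQL 9.6 or later; the values are illustrative and not taken from the talk.

```sql
-- Size of the background worker pool; changing it requires a server restart.
ALTER SYSTEM SET max_worker_processes = 64;

-- Upper limit of workers a single Gather node may use (session level here).
SET max_parallel_workers_per_gather = 32;

-- Check how many workers the planner intends to launch.
EXPLAIN SELECT grp, count(data) FROM t_demo GROUP BY 1;
```

From PostgreSQL 10 on, max_parallel_workers limits the total number of parallel workers as well and may have to be raised too.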
Making full use of cores
- How well does PostgreSQL scale on a single box?
- For the next test we assume that I/O is not an issue
- If I/O does not keep up, CPU does not make a difference
- Make sure that data can be read fast enough.
- Observation: 1 SSD might not be enough to feed a modern Intel chip
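One way to take I/O out of the picture for a test is to load the table into shared buffers beforehand. A sketch, assuming the pg_prewarm contrib extension is installed and shared_buffers is large enough to hold t_demo; the slides do not say how the data was kept in memory.

```sql
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Reads all blocks of t_demo into shared buffers and returns the block count.
SELECT pg_prewarm('t_demo');

-- Compare the table size with the configured buffer pool.
SELECT pg_size_pretty(pg_relation_size('t_demo')) AS table_size,
       current_setting('shared_buffers')          AS shared_buffers;
```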
Single node scalability (1)
[Figure: single node scalability chart (image not included in this transcript)]
Single node scalability (2)
- We used a 16 core box here
- As you can see, the query scales up nicely
- Beyond 16 cores hyperthreading kicks in
- We managed to gain around 18%
Single node scalability (3)
- On a single Google VM we could reach close to 40 million rows / second
- For many workloads this is already more than enough
- Rows / sec will of course depend on type of query
Moving on to many nodes
The basic system architecture (1)
- We want to shard data to as many nodes as needed
- For the demo: Place 100 million rows on each node
- We do so to eliminate the I/O bottleneck
- In case I/O happens we can always compensate using more servers
- Use parallel queries on each shard
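A sketch of how such a layout can be wired up with postgres_fdw and table inheritance, assuming a local t_demo table already exists on the main node. Server names, host names, the database name and the credentials are hypothetical; the slides do not show the actual setup.

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- One foreign server per shard (hypothetical host and database names).
CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', dbname 'node');
CREATE SERVER shard2 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard2.example.com', dbname 'node');

-- Map the local user to a (hypothetical) remote role.
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'demo', password 'secret');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard2
    OPTIONS (user 'demo', password 'secret');

-- Each shard appears as a child of the local t_demo table.
CREATE FOREIGN TABLE t_demo_shard1 ()
    INHERITS (t_demo) SERVER shard1 OPTIONS (table_name 't_demo');
CREATE FOREIGN TABLE t_demo_shard2 ()
    INHERITS (t_demo) SERVER shard2 OPTIONS (table_name 't_demo');
```

A query against t_demo then scans the local table plus every foreign child, so adding a node amounts to one more server, user mapping and foreign table.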
Testing with two nodes (1)
    explain SELECT grp, COUNT(data) FROM t_demo GROUP BY 1;
    Finalize HashAggregate
      Group Key: t_demo.grp
      ->  Append
            ->  Foreign Scan (partial aggregate)
            ->  Foreign Scan (partial aggregate)
            ->  Partial HashAggregate
                  Group Key: t_demo.grp
                  ->  Seq Scan on t_demo
Testing with two nodes (2)
- Throughput doubles as long as partial results are small
- Planner pushes down stuff nicely
- Linear increases are necessary to scale to 1 billion rows
Preconditions to make it work (1)
- postgres_fdw uses cursors on the remote side
- cursor_tuple_fraction has to be set to 1 to improve the planning process
- set fetch_size to a large value
- That is the easy part
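Both settings can be applied with plain SQL. A sketch, assuming a foreign server named shard1 and a remote database named node (both hypothetical names) and an arbitrary batch size; the values used in the talk are not given.

```sql
-- On the main node: fetch rows from the shard in large batches.
ALTER SERVER shard1 OPTIONS (ADD fetch_size '100000');

-- On each shard: make cursors plan for fetching the complete result
-- instead of optimizing for the first few rows.
ALTER DATABASE node SET cursor_tuple_fraction = 1;
```

The database-level setting applies to new sessions only, so existing postgres_fdw connections have to reconnect before it takes effect.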
Preconditions to make it work (2)
- We have to make sure that all remote nodes work at the same time
- This requires “parallel append and async fetching”
- All queries are sent to the many nodes in parallel
- Data can be fetched in parallel
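How this was enabled for the talk is not shown on the slides. For comparison only: PostgreSQL 14 and later ship an async_capable option in postgres_fdw that lets an Append node work on several foreign scans concurrently. A sketch, assuming such a version and a hypothetical foreign server named shard1:

```sql
-- Allow foreign scans on this server to run asynchronously under an Append.
ALTER SERVER shard1 OPTIONS (ADD async_capable 'true');

-- Inspect the plan; asynchronous scans may be labelled "Async Foreign Scan".
EXPLAIN (COSTS OFF) SELECT grp, count(data) FROM t_demo GROUP BY 1;
```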
Preconditions to make it work (3)
- PostgreSQL could not be changed this way without substantial work having been done recently
- Traditionally joins had to be done BEFORE aggregation
- This is a showstopper for distributed aggregation because all the data has to be fetched from the remote host before aggregation
- Kyotaro Horiguchi fixed this, which made our work possible
- This was a HARD task!
Preconditions to make it work (4)
- Easy tasks:
  - Aggregates have to be implemented to handle partial results coming from shards
  - Code is simple and available as an extension
Parallel execution on shards is now possible
- Dissect aggregation
- Send partial queries to shards in parallel
- Perform parallel execution on shards
- Add up data on main node
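The idea of dissecting an aggregate can be imitated by hand on a single machine. In this sketch two temporary tables stand in for two shards; the table names and rows are made up, and the real setup pushes the partial step to remote nodes rather than using a subquery.

```sql
-- Two temporary tables stand in for two shards (hypothetical names and data).
CREATE TEMP TABLE shard_a (grp integer, data real);
CREATE TEMP TABLE shard_b (grp integer, data real);
INSERT INTO shard_a VALUES (0, 1.5), (0, 2.5), (1, 3.5);
INSERT INTO shard_b VALUES (0, 4.5), (1, 5.5), (1, NULL);

-- Step 1: each shard computes a partial count per group.
-- Step 2: the main node adds up the partial counts.
SELECT grp, sum(partial_count) AS count
FROM (
    SELECT grp, count(data) AS partial_count FROM shard_a GROUP BY grp
    UNION ALL
    SELECT grp, count(data) AS partial_count FROM shard_b GROUP BY grp
) AS partials
GROUP BY grp
ORDER BY grp;
```

The query returns 3 for group 0 and 2 for group 1, the same as count(data) over all rows at once. An aggregate such as avg needs two partial values per group (a sum and a count), which is why aggregates have to be written with partial results in mind.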
Final results
    node=# SELECT grp, count(data) FROM t_demo GROUP BY 1;
     grp |   count
    -----+-----------
       0 | 320000000
       1 | 320000000
     ...
       9 | 320000000
    (10 rows)
    Planning time: 0.955 ms
    Execution time: 2910.367 ms
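The throughput follows from the numbers in this output: 10 groups of 320 million rows each, processed in about 2.9 seconds. The arithmetic, written as a query for convenience:

```sql
-- Total rows and rows per second derived from the result shown above.
SELECT 10 * 320000000::bigint                             AS total_rows,
       round(10 * 320000000::bigint / (2910.367 / 1000))  AS rows_per_second;
```

That is 3.2 billion rows at roughly 1.1 billion rows per second. Spread over 32 nodes holding 100 million rows each, this comes to about 34 million rows per second per node.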
Hardware used
- We used 32 boxes (16 cores) on Google
- Data was in memory
- Adding more servers is EASY
Future ideas
- JIT compilation will speed up execution
- More parallelism for more executor nodes
- General speedups (tuple deforming, etc.)
- In the future FEWER cores will be needed to achieve similar results
Contact us
Cybertec Schönig & Schönig GmbH
Hans-Jürgen Schönig
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
www.postgresql-support.de
Follow us on Twitter: @PostgresSupport