Postgres-XC/XL Scale-out Approach in PostgreSQL
Postgres Conference, HangZhou, China
July 25th, 2015
Koichi Suzuki
NTT DATA INTELLILINK Corporation
Copyright © 2015 NTT DATA INTELLILINK Corporation
Introduction
About the Speaker
● Fellow at NTT DATA Intellilink Corporation
● Principal, Technology Professionals at NTT DATA Group
In Charge Of
● General Database Technology
● Databases in huge data warehouses and their design
● PostgreSQL and its cluster technology
In The Past
● Character Set Standard (Extended Unix Code, Unicode, etc.)
● Heisei-font development (Technical Committee)
● Oracle Porting
● Object-Relational Database
Motivation
● Growing database workloads in both OLTP (OnLine Transaction Processing) and OLAP (OnLine Analytical Processing) applications
● Shared-Nothing Approach
● Performance with commodity hardware/software
● Extension to existing PostgreSQL
● Transparent API
  – Internal API could be different
● Transparent libpq Interface
● No significant restriction on transaction ACID properties or the SQL language
Scale-out approach
● Distribution/Replication of table rows among different database “nodes”
● Parallelism
● Local join operation
● SQL planning for row distribution/replication
● Consistent and synchronous transaction management among “nodes”
● Performance with commodity hardware/software
Read Scale-out in PostgreSQL Master/Slave
[Figure: the Master runs read/write transactions and ships WAL (or Redo Log) to the Slave, which serves read-only transactions with a possible time delay]
Scaling Out in Postgres XL/XC
[Figure: read/write transactions run on every node, each with its own local disk, kept consistent by backend transaction synchronization; no delay in update visibility]
OLTP Workload Scalability and Table Design
DBT-1 Workload Scalability
[Figure: DBT-1 (Rev) scalability chart]
Table Design in DBT-1 Benchmark
CUSTOMER: C_ID, C_UNAME, C_PASSWD, C_FNAME, C_LNAME, C_ADDR_ID, C_PHONE, C_EMAIL, C_SINCE, C_LAST_VISIT, C_LOGIN, C_EXPIRATION, C_DISCOUNT, C_BALANCE, C_YTD_PMT, C_BIRTHDATE, C_DATA
ADDRESS: ADDR_ID, ADDR_STREET1, ADDR_STREET2, ADDR_CITY, ADDR_STATE, ADDR_ZIP, ADDR_CO_ID, ADDR_C_ID
ORDERS: O_ID, O_C_ID, O_DATE, O_SUB_TOTAL, O_TAX, O_TOTAL, O_SHIP_TYPE, O_BILL_ADDR_ID, O_SHIP_ADDR_ID, O_STATUS
ORDER_LINE: OL_ID, OL_O_ID, OL_I_ID, OL_QTY, OL_DISCOUNT, OL_COMMENTS, OL_C_ID
ITEM: I_ID, I_TITLE, I_A_ID, I_PUB_DATE, I_PUBLISHER, I_SUBJECT, I_DESC, I_RELATED1, I_RELATED2, I_RELATED3, I_RELATED4, I_RELATED5, I_THUMBNAIL, I_IMAGE, I_SRP, I_COST, I_AVAIL, I_ISBN, I_PAGE, I_BACKING, I_DIMENSIONS
CC_XACTS: CX_I_ID, CX_TYPE, CX_NUM, CX_NAME, CX_EXPIRY, CX_AUTH_ID, CX_XACT_AMT, CX_XACT_DATE, CX_CO_ID, CX_C_ID
AUTHOR
STOCK: ST_I_ID, ST_STOCK
SHOPPING_CART: SC_ID, SC_C_ID, SC_DATE, SC_SUB_TOTAL, SC_TAX, SC_SHIPPING_COST, SC_TOTAL, SC_C_FNAME, SC_C_LNAME, SC_C_DISCOUNT
SHOPPING_CART_LINE: SCL_SC_ID, SCL_I_ID, SCL_QTY, SCL_COST, SCL_SRP, SCL_TITLE, SCL_BACKING, SCL_C_ID
COUNTRY: CO_ID, CO_NAME, CO_EXCHANGE, CO_CURRENCY
Legend: Distributed with Customer ID; Replicated; Distributed with Item ID; Distributed with Shopping Cart ID
MPP Performance – DBT-3 (TPC-H)
By courtesy of Mason Sharp, Postgres-XL leader
Scale Out Approach (1): Table Distribution/Replication
Categorize tables into two groups:
Large and frequently-updated tables
→ Distribute rows among nodes (Distributed Tables)
→ Based on a column value (distribution key)
→ Hash, modulo or round-robin
→ Parallelism among transactions (OLTP) or in SQL processing (OLAP)
Smaller and stable tables
→ Replicate among nodes (Replicated Tables)
→ Join Pushdown
Avoid, as far as possible, joins between Distributed Tables whose join keys differ from the distribution key.
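The three distribution methods can be sketched as functions that map a row to a datanode. This is a minimal illustration under stated assumptions, not Postgres-XC/XL code: the node names `dn1`..`dn4` and the helper names (`stable_hash`, `hash_node`, `modulo_node`, `round_robin_node`, `replicated_nodes`) are hypothetical, and MD5 merely stands in for whatever hash function the cluster really uses.

```python
import hashlib
from itertools import count

# Hypothetical datanode names used only for this illustration.
DATANODES = ["dn1", "dn2", "dn3", "dn4"]


def stable_hash(value) -> int:
    """Deterministic hash of a distribution-key value.

    Assumption: MD5 stands in for the cluster's own hash function.
    Python's built-in hash() is avoided because str hashing is salted
    per process, which would move rows between runs.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash_node(key) -> str:
    """Hash distribution: hash the key, then reduce it modulo the node count."""
    return DATANODES[stable_hash(key) % len(DATANODES)]


def modulo_node(key: int) -> str:
    """Modulo distribution: use the integer key value directly."""
    return DATANODES[key % len(DATANODES)]


_next_slot = count()


def round_robin_node() -> str:
    """Round-robin distribution: ignore the row and rotate through nodes."""
    return DATANODES[next(_next_slot) % len(DATANODES)]


def replicated_nodes() -> list:
    """Replicated table: every datanode stores a full copy of every row."""
    return list(DATANODES)
```

Because `hash_node` depends only on the key value, a customer row and every order row carrying the same customer ID land on the same datanode when both tables are hashed on customer ID; that co-location is what lets their join run locally. Round-robin spreads rows evenly but gives no such guarantee.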
Scale Out Table Design in DBT-1
Three distribution keys:
● Customer ID
● Shopping Cart ID
● Item ID
Some transactions involve joins across distributed tables on non-distribution join keys.
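Whether a join can be pushed down to the datanodes follows from how each side is placed. The sketch below encodes a simplified rule for inner equi-joins; it is an assumption for illustration, not the actual Postgres-XC/XL planner logic. The `TableDist` class, `join_is_local` function and the per-table distribution choices are hypothetical (loosely following the three DBT-1 keys above), since the slide's table-to-key assignment is not spelled out in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableDist:
    """How one table is spread over datanodes."""
    name: str
    dist_column: Optional[str]  # None means the table is replicated


def join_is_local(left: TableDist, left_col: str,
                  right: TableDist, right_col: str) -> bool:
    """True if the equi-join left.left_col = right.right_col can run
    entirely inside each datanode, with no rows shipped between nodes.

    Simplified rule for inner joins (an assumption, not the real planner):
    - if either side is replicated, every datanode already holds all of
      that side, so each node can join its local rows;
    - two distributed tables are co-located only when both join columns
      are their distribution keys (same hash method and node set assumed).
    """
    if left.dist_column is None or right.dist_column is None:
        return True
    return left_col == left.dist_column and right_col == right.dist_column


# Hypothetical assignment, loosely following the three DBT-1 keys.
customer = TableDist("customer", "c_id")
orders = TableDist("orders", "o_c_id")
order_line = TableDist("order_line", "ol_c_id")
item = TableDist("item", "i_id")
address = TableDist("address", "addr_c_id")
country = TableDist("country", None)
```

With this assignment, `customer` joined to `orders` on customer ID stays local, and any join with the replicated `country` table stays local, but `order_line` joined to `item` on item ID does not: that is the kind of join on a non-distribution key that the slide warns about.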
Some More in XL/XC Node Configuration
Node Configuration: Two-Tier Approach
Coordinator:
● Maintains global catalog information
● Builds global SQL plans and SQL statements for datanodes
● Interacts with datanodes to execute local SQL statements and accumulates the results
Datanode:
● Maintains actual data (local data)
● Runs local SQL statements from the Coordinator (in XL, a datanode may ask other datanodes for their local data)
Coordinator and Datanode
[Figure: read/write transactions arrive at the Coordinators, which work with the Datanodes]
Node Configuration: Yet Another Node: GTM
GTM: Global Transaction Manager
Synchronizes each node's transaction status
Why GTM? Doesn't the Two-Phase Commit Protocol Work?
The Two-Phase Commit Protocol Does:
● Maintain database consistency in transactions updating more than one node.
The Two-Phase Commit Protocol Doesn't:
● Maintain atomic visibility of updates to other transactions (next slide)
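A toy, in-memory model of two-phase commit makes the "does/doesn't" split concrete. The class and function names are hypothetical; the log strings only mirror PostgreSQL's `PREPARE TRANSACTION`, `COMMIT PREPARED` and `ROLLBACK PREPARED` commands, and nothing here talks to a real database. The sketch guarantees an all-or-nothing outcome across nodes, but says nothing about *when* other transactions see each node's commit.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A toy datanode taking part in one distributed transaction."""
    name: str
    can_prepare: bool = True
    state: str = "active"  # active -> prepared -> committed, or aborted
    log: list = field(default_factory=list)

    def prepare(self) -> bool:
        """Phase 1: make the changes durable and vote yes, or vote no."""
        if self.can_prepare:
            self.state = "prepared"
            self.log.append("PREPARE TRANSACTION")
            return True
        self.state = "aborted"
        self.log.append("ABORT")
        return False

    def commit(self) -> None:
        """Phase 2 (success): finish a prepared transaction."""
        self.state = "committed"
        self.log.append("COMMIT PREPARED")

    def rollback(self) -> None:
        """Phase 2 (failure): undo a prepared transaction, if any."""
        if self.state == "prepared":
            self.state = "aborted"
            self.log.append("ROLLBACK PREPARED")


def two_phase_commit(participants) -> bool:
    """Coordinator: commit on every node only if every node voted yes."""
    # This sketch asks every node to prepare even after a "no" vote;
    # a real coordinator may stop early.
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

The `commit()` calls in phase 2 reach the nodes one after another, so for a moment node A is committed while node B is still only prepared. The protocol is silent about what a concurrent reader sees during that window, which is the atomic-visibility gap this slide refers to.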
Atomic Visibility and GTM
[Figure: TXN 1 and TXN 2 running against Node A and Node B]
TXN 1: updates A and B → prepares A and B → commits A and B
TXN 2: reads B and gets the old value; reads A and gets the new value → inconsistent read!
GTM monitors TXN activity and makes the new values visible at one point in time.
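The inconsistent read, and how a global view of commits avoids it, can be replayed with a toy model. This is a sketch loosely based on the idea that the GTM hands out global snapshots; the `Node`, `GTM` and `txn2_reads` names are hypothetical, and real snapshot handling in Postgres-XC/XL is more involved. Without the GTM, a reader trusts each node's local commit status; with it, a reader only trusts commits that the GTM has published cluster-wide.

```python
class Node:
    """A toy datanode holding one value as a list of versions."""

    def __init__(self, name: str, value: str):
        self.name = name
        self.versions = [(0, value)]  # (writer xid, value); xid 0 = initial load
        self.locally_committed = {0}

    def write(self, xid: int, value: str) -> None:
        self.versions.append((xid, value))

    def commit_local(self, xid: int) -> None:
        self.locally_committed.add(xid)

    def read(self, is_visible) -> str:
        # Return the newest version whose writer passes the visibility test.
        for xid, value in reversed(self.versions):
            if is_visible(self, xid):
                return value
        raise LookupError("no visible version")


class GTM:
    """Toy global transaction manager: the cluster-wide set of committed xids."""

    def __init__(self):
        self.committed = {0}

    def snapshot(self) -> frozenset:
        return frozenset(self.committed)

    def mark_committed(self, xid: int) -> None:
        self.committed.add(xid)


def txn2_reads(use_gtm: bool) -> tuple:
    """Replay the slide: TXN 2 reads B, then A, between the two local commits."""
    a, b, gtm = Node("A", "old"), Node("B", "old"), GTM()
    a.write(1, "new")  # TXN 1 (xid 1) updates A and B; both are prepared
    b.write(1, "new")
    a.commit_local(1)  # the commit reaches node A first...
    snap = gtm.snapshot()  # ...and TXN 2 starts in this window

    def visible(node, xid):
        local_ok = xid in node.locally_committed
        return local_ok and (xid in snap if use_gtm else True)

    seen = (b.read(visible), a.read(visible))
    b.commit_local(1)
    gtm.mark_committed(1)  # GTM publishes xid 1 only after every node commits
    return seen
```

`txn2_reads(False)` returns `("old", "new")`, the inconsistent read from the slide, while `txn2_reads(True)` returns `("old", "old")`: TXN 2 sees neither update until the GTM publishes the commit, after which a fresh snapshot would see both.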
Final Configuration: GTM, Coordinator and Datanode
[Figure: read/write transactions enter the Coordinators, which work with the Datanodes; the GTM synchronizes transaction status across all nodes]
Configuration in Practice
Just like configuring many database servers to talk to each other
● Many pitfalls
● pgxc_ctl provides a simpler way to configure the whole cluster
● Provide only the needed parameters
● pgxc_ctl does the rest, issuing the needed commands and SQL statements.
  – Visit http://sourceforge.net/p/postgres-xc/xc-wiki/PGOpen2013_Postgres_Open_2013/
Scalability in OLTP Workloads
OLTP Workload Characteristics
Number of Transactions: Many
Number of Involved Table Rows: Small
Locality of Row Allocation: High
Update Frequency: High
Scaling Out OLTP Workload
[Figure: read/write transactions spread over the Coordinators and Datanodes, coordinated by the GTM; transactions run in parallel; high workload]
Scalability in OLAP (Analytic) Workloads
OLAP Workload Characteristics
Number of Transactions: Small
Number of Involved Table Rows: Huge
Locality of Row Allocation: Low
Update Frequency: Low
Scaling Out OLAP Workload
[Figure: Coordinator, Datanode and GTM. A SQL statement arrives at a Coordinator, which runs small local SQL statements on each Datanode in parallel and performs the top-level aggregation. Coordinator workload is low, so fewer coordinators may be needed.]
Join Offloading
Join Offloading: When row allocation is available
● Replicated Table and Partitioned Table
  – The coordinator can determine which datanode to go to from the WHERE clause
Join Offloading: When row allocation is available
● Replicated Table and Partitioned Table
  – When the coordinator cannot determine which datanode to go to from the WHERE clause
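The two join-offloading cases differ in how many datanodes the coordinator must contact. A simplified routing decision for reading one table might look like the sketch below; it is an assumption for illustration, not the coordinator's actual planner. The names `DATANODES`, `node_for_key` and `target_nodes` are hypothetical, MD5 stands in for the cluster's hash function, and the WHERE clause is reduced to a dict of `column = constant` predicates joined by AND.

```python
import hashlib
from typing import Optional

DATANODES = ["dn1", "dn2", "dn3"]  # hypothetical node names


def node_for_key(value) -> str:
    """Datanode holding rows with this distribution-key value (stand-in hash)."""
    h = int.from_bytes(hashlib.md5(str(value).encode("utf-8")).digest()[:8], "big")
    return DATANODES[h % len(DATANODES)]


def target_nodes(dist_column: Optional[str], where_equals: dict) -> list:
    """Pick the datanodes a coordinator must contact to read one table.

    dist_column: the distribution key, or None for a replicated table.
    where_equals: column -> constant pairs from "col = const" predicates
    combined with AND in the WHERE clause (a deliberate simplification).
    """
    if dist_column is None:
        # Replicated: every node has all rows, so any one node serves a read.
        return [DATANODES[0]]
    if dist_column in where_equals:
        # The WHERE clause pins the distribution key: exactly one node.
        return [node_for_key(where_equals[dist_column])]
    # The WHERE clause does not identify the node: ask every datanode.
    return list(DATANODES)
```

When a partitioned table is pinned to one node and the other side of the join is replicated, the whole join can be shipped to that single node. In the second case the same join runs on every datanode in parallel, and the coordinator only merges the partial results.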
Parallel Aggregation
Aggregate Functions in PostgreSQL
[Figure: the state transition function is applied to each row, then the finalize function produces the result]
Aggregate Functions in Postgres-XC/XL
[Figure: each Datanode runs the state transition function over its local rows; the Coordinator merges the per-datanode states with a collector function, then applies the finalize function. For AVG, each Datanode returns (Sum, Count), and the Coordinator computes AVG ← (Sum, Count).]
Similar to Map Reduce!
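The AVG example can be written as three small functions: transition on each datanode, collection on the coordinator, then finalize. This is a minimal sketch of the map/reduce-style split the slide describes; the function names and the `(sum, count)` tuple layout are illustrative choices, not the actual catalog definitions of any Postgres-XC/XL aggregate.

```python
from functools import reduce

# State for AVG: (running sum, running count).
INITIAL_STATE = (0, 0)


def transition(state, value):
    """State transition function, run on each datanode for its local rows."""
    s, n = state
    return (s + value, n + 1)


def collect(state_a, state_b):
    """Collector function, run on the coordinator to merge datanode states."""
    return (state_a[0] + state_b[0], state_a[1] + state_b[1])


def finalize(state):
    """Finalize function: turn the merged (sum, count) into the average."""
    s, n = state
    return None if n == 0 else s / n


def parallel_avg(partitions):
    # "Map": each datanode folds its own rows into a (sum, count) state.
    local_states = [reduce(transition, rows, INITIAL_STATE) for rows in partitions]
    # "Reduce": the coordinator merges the states, then finalizes once.
    return finalize(reduce(collect, local_states, INITIAL_STATE))
```

For example, `parallel_avg([[1, 2, 3], [4, 5], [6]])` returns `3.5`. Shipping `(Sum, Count)` rather than a per-node average is what keeps the result correct: averaging the three per-node averages (2, 4.5 and 6) would give about 4.17 instead.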
Specific statements
● CREATE BARRIER – Synchronizes all nodes' WAL for restoration.
● CREATE/ALTER/DROP NODE – Maintenance of cluster nodes
  ● Caution! Not automatically propagated. Issue it to each coordinator.
● CREATE/DROP NODE GROUP – Alias for a group of nodes
● EXECUTE DIRECT
  – Runs SQL locally
  – Read operations only
    ● If you are a superuser, turn xc_maintenance_mode on with a SET statement to allow write operations.
    ● You are responsible for any inconsistencies and side effects!
Specific catalogs
● pgxc_class – Definition of table distribution
● pgxc_node – Postgres-XC node information
● pgxc_group – Node group
Specific functions
● pgxc_version() – Shows the version
● pgxc_pool_check() – Checks whether the connection pooler is consistent with the pgxc_node catalog.
● pgxc_pool_reload() – Reloads cached connection data and synchronizes the pooler's connection information with pgxc_node.
● pgxc_lock_for_backup()
  – Only for adding new nodes.
  – Locks DDL execution so the catalog stays stable while it is backed up and copied to the new node.
Specific statements, catalogs, functions and parameters
See http://postgres-x2.github.io/reference/1.2/html/sql-commands.html for details.
Specific parameters (planner parameters not included)
● gtm_backup_barrier (bool) – Enables the CREATE BARRIER statement.
● persistent_datanode_connections (bool) – If “true”, a session never releases its connections.
● xc_maintenance_mode
  – Enables write operations in the EXECUTE DIRECT statement.
  – Only allowed for superusers.
● min_pool_size
  – Threshold for the pooler to create new connections.
● max_pool_size
  – Maximum number of pooled connections.
● pooler_port
  – Port number for the pooler (pgxc_ctl takes care of it)
● gtm_port
  – GTM port number (pgxc_ctl takes care of it)
Specific parameters (cont.)
● max_datanodes
● max_coordinators
● pgxcnode_cancel_delay
  – Timeout, in milliseconds, to wait for a cancel operation.
  – Mainly for automated testing.
● gtm_host
  – GTM host name/IP address. pgxc_ctl takes care of this.
● pgxc_node_name
  – Name of this node. pgxc_ctl takes care of this.
Community status and future
XC and XL community
● Postgres-XC is the original community
  – Based upon PostgreSQL 9.3
  – Tested more for OLTP workloads
  – Community activity now continues as Postgres-X2
  – Stabilization
    ● Many Chinese engineers participate
    ● The next minor release is planned for this August
● Postgres-XL became a separate community, more product-oriented and aiming for better stability
  – Based upon PostgreSQL 9.2
  – Shares most of the XC code base
  – Tested more for OLAP workloads
    ● Direct data capture between datanodes
  – Provides many fixes. Most of them apply to XL as well
  – Just finished merging with PostgreSQL 9.5 alpha
● Unified again?
Product status
● The source code inherits the whole PostgreSQL repository (as of some point)
● Fundamental features are all available
  – Global transaction management
  – SQL statements
  – Utilities
● Further challenges
  – Subtransactions (needed for full function support)
  – Catching up with PostgreSQL (needed?)
XC and XL community
● Both communities need many more resources to move forward
  – Developers
  – Testers
  – Real workloads
● Now several Chinese firms are working together.
  – Many more active members are welcome!
XC and XL community sites
Postgres-XC
https://github.com/postgres-x2
https://postgres-x2.github.io
https://groups.google.com/forum/#!forum/postgres-x2-dev
https://groups.google.com/forum/#!forum/postgres-x2-general
Postgres-XL
http://www.postgres-xl.org/
Configuring Postgres-XC
pgxc_ctl
● Postgres-XC contrib module
● Postgres-XC configuration and operation tool
  – A kind of Postgres-XC shell
  – Built-in commands
  – Can invoke any bash command
    ● Does not expand $(variable).
● Simple configuration
● Avoids many pitfalls in manual configuration and operation
● Bash-based configuration file
● You can write your favorite bash script for your configuration
pgxc_ctl built-in commands (major ones)
● prepare – Creates a configuration file template
● deploy – Deploys Postgres-XC binaries to the necessary nodes
● init [all]
  – Initializes the Postgres-XC cluster
    ● Runs initdb and initgtm on the necessary nodes
    ● Does additional configuration
    ● Initializes node configuration
● start/stop – Starts/stops the cluster and nodes
● clean – Cleans up existing resources
● monitor – Shows which nodes are running
pgxc_ctl built-in commands (major ones)
● createdb – Similar to createdb, but selects one coordinator to run it.
● psql
  – Similar to psql, but selects one coordinator, or lets you specify the coordinator name to connect to.
● add
  – Adds a gtm_proxy, coordinator or datanode (master or slave)
● remove
  – Removes a gtm_proxy, coordinator or datanode (master or slave)
Demonstration