200 million qps on commodity hardware : Getting started with MySQL Cluster 7.4

Copyright © 2015, Oracle and/or its affiliates. All rights reserved.


OpenWorld 2015

200 Million QPS on Commodity Hardware

Getting Started with MySQL Cluster 7.4

Frazer Clement, MySQL Cluster Technical Lead
Bernd Ocklin, Director, MySQL Cluster Engineering

October 26th, 2015

200 Million QPS on Commodity Hardware: Getting Started with MySQL Cluster 7.4

1. Users, Features and Releases
2. Design for Availability and Scale
3. Performance, getting to 200M queries/second
4. How to get started with MySQL Cluster

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Keynote: State of the Dolphin
Monday, 4.00-6.00 pm, YBCA Theater

• Rich Mason, SVP & General Manager MySQL GBU, Oracle
• Tomas Ulin, VP MySQL Engineering, Oracle

Customer Experiences

• Hari Tatrakal, Director of Database Services, Live Nation
• Olaniyi Oshinowo, MySQL & Open Source Technologies Leader, GE
• Ernie Souhrada & Rob Wultsch, Database Engineers, Pinterest

MySQL Cluster content @ OpenWorld

- Fully Elastic Real-Time Services with MySQL Cluster
  Bernd Ocklin, Oracle
  Conference Session, Tuesday 11am, Moscone South, 262

- MySQL Server and MySQL Cluster at India’s Financial Inclusion Gateway Service
  NEC et al
  Conference Session, Tuesday 5.15pm, Moscone South, 250

- Get Started with MySQL Cluster
  Benedita Vasconcelos, Oracle
  Hands On Lab, Thursday 9.30am, Hotel Nikko - Peninsula

MySQL Community Reception @ OpenWorld

Celebrate, Have Fun and Mingle with Oracle’s MySQL Engineers & Your Peers

• Tuesday, October 27th, 7 pm
• Jillian’s at Metreon: 175 Fourth Street, San Francisco, CA 94103
  At the corner of Howard and 4th St.; only a 2-min walk from Moscone Center (same place as last year)

Join us!

MySQL Cluster deployments

Web
- High volume OLTP
- eCommerce
- On-Line Gaming
- Digital Marketing
- User Profile Management
- Session Management & Caching

Telecoms
- Service Delivery Platforms
- VAS: VoIP, IPTV & VoD
- Mobile Content Delivery
- Mobile Payments

Other
- Online gaming: AAA + profile management
- Payment fraud detection
- DBMS research
- Many more, some unknown

Who's using MySQL Cluster?


MySQL Cluster highlights

- High Throughput Reads & Writes: Distributed, parallel architecture. Transactional, ACID-compliant relational database.
- Carrier-Grade Availability: Shared-nothing design, synchronous data replication. Sub-second failover & self-healing recovery.
- Real-Time Responsiveness: Data structures optimized for RAM. Real-time extensions. Predictable low latency, bounded access times.
- On-Line, Linear Scalability: Incrementally scale out, scale up and scale on-line. Linearly scale with distribution awareness.
- Low TCO, Open platform: GPL & Commercial editions, scale on COTS. Flexible APIs: SQL, C++, Java, OpenJPA, LDAP & HTTP.

MySQL Cluster highlights

- SQL: Joins, Foreign Keys, Transactions, Row locks, Triggers, Views, Stored procedures, Blobs, keyless tables, newSQL, MySQL compatible... connectors for most languages, ORMs etc...
- NoSQL: Full C++ API for best control and performance (MySQLD SE built on top). Other APIs: Java, JPA, Node.js, Memcache....
- HA: 99.999% uptime systems (five nines), No single point of failure (SPOF), Heartbeating, cluster membership, automatic failover + recovery, automatic client failover, transactional DDL, CP, async replication, advanced exception logging...
- Performance and parallelism: High throughput, low bounded latency (200M read tx/s). Batching, optimised protocols, Intra and Inter query parallelism, pushed parallel filters, pushed parallel joins, non-blocking event driven multithreaded....

HA, High performance, Relational, Transactional, Distributed, Parallel, SQL, NoSQL, Shared-Nothing, Commodity ...

MySQL Cluster highlights

- Scalability: Scale-out nodegroups or stateless API clients online, Scale-up data nodes and clients online with multithreading, scale up hardware online
- Replication: Synchronous two phase commit internally, Transactional HA async replication between clusters, conflict detection + resolution...
- Storage: Data transparently distributed and balanced by hash, Indexed columns in memory, others on disk or memory, Secondary unique and ordered indexes, Redundant Redo logs and periodic checkpoints...
- Manageability: Online add + drop (index, column), Online consistent backup, Online upgrade, Online OS or hardware upgrade, consolidated cluster logs, C management API for tooling...
- Shared nothing, Commodity: No need for shared storage, In-memory data uses disk frugally, TCP over Ethernet / Infiniband etc, No special layer 2 requirements. Open source.

HA, High performance, Relational, Transactional, Distributed, Parallel, SQL, NoSQL, Shared-Nothing, Commodity ...

MySQL Cluster Releases

Timeline: 2012, 2013, 2014, 2015 ...

7.2
- Distributed parallel joins
- Multi-TC
- Active-Active
- Memcached
- MySQL Server 5.5

7.3
- Foreign keys
- Client lib performance
- node.js
- MySQL Server 5.6

7.4
- Restart performance
- Active-Active
- Internal reporting
- MySQL Server 5.6

Regular fixes and improvements

MySQL Cluster is built on top of and tracks GA MySQL Server releases, gaining their features, optimisations and bug fixes.

MySQL Cluster 7.4

- Active-Active replication enhancements
- System restart and maintenance activities parallelised
- Improved observability and manageability
- Performance optimisations in the data node kernel

More detail and download links at dev.mysql.com

MySQL Cluster Architecture

[Figure: Application Nodes (including REST and JPA), two Cluster Mgmt nodes, and Data Nodes arranged in three node groups, with the NdbApi protocol between application and data nodes. Table 1 is split into fragments F1 to F6: Node Group 1 stores F1 and F4, Node Group 2 stores F2 and F5, Node Group 3 stores F3 and F6.]

Tables and Indices are horizontally partitioned, distributed across and replicated within the NodeGroups. Application Nodes, including MySQLD, use NdbApi to perform transactional operations and queries on data.

Most Application Nodes are themselves Servers for various client protocols.
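The partitioning described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the MD5-based hash, the modulo placement rule and the function names are stand-ins chosen here so that the result matches the fragment layout in the figure; they are not taken from the MySQL Cluster implementation.

```python
import hashlib

NODE_GROUPS = 3   # from the figure: Node Group 1..3
FRAGMENTS = 6     # from the figure: F1..F6 of Table 1


def fragment_for_key(primary_key: str) -> int:
    """Map a primary key to a fragment number 1..6 (hash is a stand-in)."""
    digest = hashlib.md5(primary_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % FRAGMENTS + 1


def node_group_for_fragment(fragment: int) -> int:
    """Place fragments round-robin over node groups (assumed rule)."""
    return (fragment - 1) % NODE_GROUPS + 1


# Reproduces the layout in the figure: F1 and F4 in group 1, and so on.
layout = {f"F{f}": node_group_for_fragment(f) for f in range(1, FRAGMENTS + 1)}

if __name__ == "__main__":
    print(layout)
    key = "user:42"  # hypothetical primary key
    frag = fragment_for_key(key)
    print(key, "-> fragment", frag, "-> node group", node_group_for_fragment(frag))
```

Every node in the chosen node group would hold a replica of that fragment, which is what the availability slides build on.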

MySQL Cluster Architecture for Availability

[Figure: the cluster architecture diagram, repeated.]

Redundant components

Redundancy for availability
- All nodes in each nodegroup store the same data
- Can survive data node failures so long as one node per nodegroup is available.
- Load balanced, Synchronous 2PC, heartbeating, automatic failover, recovery

MySQL Cluster is a CP system in that consistency is favoured over availability. Async replication between clusters gives AP properties.
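The survival rule above ("one node per nodegroup") is easy to state as code. The sketch below assumes three node groups of two data nodes each, as in the figure; the node ids are hypothetical, and the check deliberately ignores arbitration and network partitions.

```python
from typing import Dict, List, Set

# Assumed topology: 3 node groups x 2 replicas; node ids are illustrative.
NODE_GROUPS: Dict[int, List[int]] = {1: [1, 2], 2: [3, 4], 3: [5, 6]}


def cluster_survives(failed_nodes: Set[int],
                     node_groups: Dict[int, List[int]] = NODE_GROUPS) -> bool:
    """True if every node group still has at least one live data node."""
    return all(
        any(node not in failed_nodes for node in members)
        for members in node_groups.values()
    )


if __name__ == "__main__":
    print(cluster_survives({1, 3, 5}))  # one node lost in each group
    print(cluster_survives({1, 2}))     # a whole node group lost
```

Under these assumptions the cluster tolerates losing half of its data nodes, provided the failures are spread so that no node group loses all of its members.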

MySQL Cluster Architecture for Availability

[Figure: the cluster architecture diagram, repeated.]

Redundant components

Redundancy for availability
- Two (or more) management servers.
- Used for configuration, node startup/shutdown, triggering backups, logging + 'split-brain' arbitration
- Not critical – not involved in transaction processing / querying

Management nodes act as lightweight arbitrators, avoiding the cost of odd-sized data node quorums to cope with single failures.

MySQL Cluster Architecture for Availability

[Figure: the cluster architecture diagram, repeated.]

Redundant components

Redundancy for availability
- API nodes are stateless and consistent, can use n + m sparing with simple front end load balancing.
- NdbApi automatically balances, fails over and back on data node failures.
- The network must have no SPOF either: no single failure should take out more than one cluster member.

Availability also comes from support for online operations: Schema changes, Hardware and OS upgrades, Software upgrades, Cluster scaling

MySQL Cluster Architecture for Scale Out

[Figure: the cluster architecture diagram, repeated.]

Performance + Capacity: online scale out of the back end by adding whole node groups (Read + Write scaling)

Data Nodes can be added online, while transactions and queries are running. Existing data is rebalanced across all nodegroups.

Performance + HA: online scale out of the front end / API nodes

Application Nodes can be added and removed online; all have equal, consistent access to the data stored by the data nodes.

MySQL Cluster Architecture for Scale Up

Configurable parallelism within a Data node

[Figure: threads inside an ndbmtd data node]
- Main thread
- Replication thread
- LDM instances (shared nothing)
- TC instances (shared nothing)
- Send threads
- Receive threads
- IO threads
- Connect threads
- Watchdog

TC: Transaction coordinator
LDM: Local data manager (Table + Index partitions)

Request processing threads: TC and LDM threads do most work, must be well fed by Send + Receive threads.

Generally no more than one request processing thread per [HT] core

MySQL Cluster Architecture for Scale Up

[Figure: layers of an application node]
- Clients, speaking a client protocol (mysql, memcached, ldap...)
- Application node (Mysqld, Memcached, Node.js*, Java, Slapd...) running many* threads:
  - Protocol decoding
  - Business logic / State machines
  - Database / Persistence layer
- NdbApi calls into libndbclient
- One or more API connections ('Protocol 6')

- Can scale the number of threads to meet demand
- Can scale the number of NdbApi connections to avoid bottlenecks


MySQL Cluster Performance

Scale Out / Scale Up

- Distributed efficiency: Protocol design, optimisation, packing, multiplexing. Data distribution awareness. Locality: pushed down filtering and joining.
- Coordination avoidance: Non blocking reads. Parallel commit.
- Balance: Hash partitioning.
- Local efficiency: OS call amortisation. Non blocking execution. Cache friendly data structures. Lock free shared data structures. Local data structures. Multi granularity pools.

See MySQL Connect 2012 session 'Breakthrough performance with MySQL Cluster'

MySQL Cluster Performance

Optimisations build in layers, from higher volume, simpler, smaller footprint work at the bottom to lower volume, more complex, bigger footprint work at the top:

- SQL joins, aggregates: MySQL Server SQL optimisations, distributed parallel filter + join
- SQL R/W of multi rows: Batching hints, distribution awareness, read removal
- NoSQL R/W of multi rows: Optimised 2PC, async APIs
- NoSQL R/W of single rows: Low level efficiency, coordination avoidance

MySQL Cluster Performance

7.2
- Feb 2012: 1 billion NoSQL reads per minute
- Jul 2012: 1 billion writes per minute

7.3
- Jun 2013: 8.5x better performance per NdbApi connection

7.4
- Feb 2015: 200 million NoSQL reads per second
- 50% better Sysbench read performance
- 2.5 million SQL statements per second
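The 2012 and 2015 read figures are quoted in different units (per minute versus per second). A short conversion puts them side by side. Note the assumption hidden in any such comparison: the two benchmarks ran on different hardware and cluster sizes, so the ratio is a headline figure, not a like-for-like speedup.

```python
SECONDS_PER_MINUTE = 60

reads_per_minute_2012 = 1_000_000_000  # 7.2, Feb 2012 (from the slide)
reads_per_second_2015 = 200_000_000    # 7.4, Feb 2015 (from the slide)

# Convert both figures into both units.
reads_per_second_2012 = reads_per_minute_2012 / SECONDS_PER_MINUTE
reads_per_minute_2015 = reads_per_second_2015 * SECONDS_PER_MINUTE

# Ratio of the headline numbers, computed in per-minute units.
headline_ratio = reads_per_minute_2015 / reads_per_minute_2012

if __name__ == "__main__":
    print(f"2012: {reads_per_second_2012 / 1e6:.1f} million reads/s")
    print(f"2015: {reads_per_minute_2015 / 1e9:.0f} billion reads/minute")
    print(f"headline ratio: {headline_ratio:.0f}x")
```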

MySQL Cluster Performance

- 7.2: Data node: Multiple Transaction Coordinator (TC) threads
- 7.3: NdbApi: Connection thread contention reduction
- 7.4: Data node: Scan + PK lookup optimisations, Send + Recv optimisations
- ...

Regular improvements compound over releases

MySQL Cluster Performance

NoSQL Bulk benchmarks
- Getting to millions of requests per second on a distributed system is often a matter of efficient multiplexing and demultiplexing of individual requests.
- Modern hardware is very capable, so it is important to keep out of the way, avoiding context switches, threads, lock contention, small messages, extra hops, and unnecessary communication or coordination.
- Many small requests must be gathered together and handled in bulk, without adversely affecting latency or application semantics.
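The "gather small requests and handle them in bulk" idea can be sketched as follows. This is a toy model, not the NdbApi send path: the function name, the (destination, key) request shape and the `max_batch` limit are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def batch_by_destination(requests: List[Tuple[int, str]],
                         max_batch: int) -> List[Tuple[int, List[str]]]:
    """Group (destination, key) requests into per-destination messages.

    Each message carries at most max_batch keys, so many small requests
    share one network message instead of being sent individually.
    """
    pending: Dict[int, List[str]] = defaultdict(list)
    messages: List[Tuple[int, List[str]]] = []
    for destination, key in requests:
        pending[destination].append(key)
        if len(pending[destination]) == max_batch:
            # Batch is full: emit it and start a new one for this destination.
            messages.append((destination, pending.pop(destination)))
    # Flush partially filled batches.
    for destination, keys in pending.items():
        if keys:
            messages.append((destination, keys))
    return messages


# 1000 single-key reads spread over 4 hypothetical data nodes.
requests = [(i % 4, f"key{i}") for i in range(1000)]
messages = batch_by_destination(requests, max_batch=50)

if __name__ == "__main__":
    print(len(requests), "requests sent as", len(messages), "messages")
```

A real implementation also has to bound how long a partial batch may wait, which is the latency side of the trade-off the slide mentions.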

MySQL Cluster Performance

NoSQL benchmark tool flexAsynch
- Delivered as part of source distribution
- Multithreaded C++ NdbApi application
- Uses the asynchronous features of NdbApi, which allow a single thread to participate in multiple concurrent database transactions.
- Row operations using the full primary key
- Can make use of NdbApi Distribution Awareness hints to minimise communication
- Parameters: Number of API connections, Number of threads, Number of parallel transactions per thread, Number of rows per transaction, Number of columns, Size of each column, Lockmode, Distribution Awareness, Thread partitioning …

Unlike e.g. MySQLD / Memcached, flexAsynch has no upstream clients to serve, so it is simpler.

Details: http://mikaelronstrom.blogspot.co.uk/2013/11/how-to-make-efficient-scalable-key.html

MySQL Cluster 200 million NoSQL reads/s

72 API client machines running flexAsynch
- 216 NdbApi connections
- 18,432 client threads
- > 10 million concurrent reads

32 Data node machines running ndbmtd
- 384 TC threads
- 384 LDM threads

1 Management node

Measured rates
- 100 bytes data / read
- 19 GB/s aggregate data read rate
- 6.4 M reads/s per data node
- 612 MB/s data node read rate
- 2.86 M reads/s per client
- 272 MB/s read per client
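The totals on this slide follow from the per-machine settings given elsewhere in the deck (3 NdbApi connections and 256 flexAsynch threads per client machine, 12 TC and 12 LDM threads per data node). The short check below recomputes them; the "reads in flight per thread" figure is derived here from the slide's lower bound of 10 million and is not a number stated in the talk.

```python
CLIENT_MACHINES = 72
DATA_NODE_MACHINES = 32

CONNECTIONS_PER_CLIENT = 3    # from the software configuration slide
THREADS_PER_CLIENT = 256      # from the software configuration slide
TC_THREADS_PER_NODE = 12
LDM_THREADS_PER_NODE = 12

api_connections = CLIENT_MACHINES * CONNECTIONS_PER_CLIENT
client_threads = CLIENT_MACHINES * THREADS_PER_CLIENT
tc_threads = DATA_NODE_MACHINES * TC_THREADS_PER_NODE
ldm_threads = DATA_NODE_MACHINES * LDM_THREADS_PER_NODE

# Aggregate rate implied by the per-machine figures on the slide.
reads_from_data_nodes = 6_400_000 * DATA_NODE_MACHINES
reads_from_clients = 2_860_000 * CLIENT_MACHINES

# Derived: average concurrent reads per client thread at 10 million in flight.
reads_in_flight_per_thread = 10_000_000 / client_threads

if __name__ == "__main__":
    print(api_connections, client_threads, tc_threads, ldm_threads)
    print(f"{reads_from_data_nodes / 1e6:.1f}M vs {reads_from_clients / 1e6:.1f}M reads/s")
    print(f"about {reads_in_flight_per_thread:.0f} reads in flight per thread")
```

Both per-machine figures multiply out to slightly more than 200 million reads per second, which is consistent with the headline number.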

MySQL Cluster 200 million NoSQL reads/s

[Figure: flexAsynch clients and ndbmtd data nodes, joined by 'The Infiniband Cloud™'.]
- flexAsynch: 72 x 256 threads, 72 x 3 API connections
- ndbmtd: 32 x 12 TC + LDM threads, > 100 GB data
- 10 million concurrent reads

Successive builds of the same figure:
- Not distribution aware, extra hop to data
- Distribution aware, minimal hops
- Batching of requests
- Partitioned client threads
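The difference between "extra hop" and "minimal hops" can be illustrated with a toy hop counter. The model is an assumption made for this sketch: each read is first sent to a coordinating data node, and costs one more hop if that node does not own the row. The MD5-based `owner_node` function is a stand-in, and replicas are ignored.

```python
import hashlib

DATA_NODES = 32  # as in the benchmark


def owner_node(key: str) -> int:
    """Data node that owns the row for this key (stand-in hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % DATA_NODES


def hops(key: str, coordinator: int) -> int:
    """1 hop if the coordinator owns the row, otherwise 2."""
    return 1 if coordinator == owner_node(key) else 2


keys = [f"row-{i}" for i in range(10_000)]

# Not distribution aware: coordinator picked round-robin.
unaware_hops = sum(hops(key, i % DATA_NODES) for i, key in enumerate(keys))

# Distribution aware: coordinator chosen from the key itself.
aware_hops = sum(hops(key, owner_node(key)) for key in keys)

if __name__ == "__main__":
    print("round-robin coordinator:", unaware_hops, "hops")
    print("distribution aware:     ", aware_hops, "hops")
```

With 32 data nodes a randomly chosen coordinator owns the row only about 1 time in 32, so in this model nearly every unaware read pays the extra hop.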

MySQL Cluster 200 million NoSQL reads/s

Intel hardware lab (Thanks!)

105 machines, each with 28 cores (56 HT threads)
- 2 sockets, Intel Xeon 'Haswell' E5-2697 v3 processors
- Each socket:
  - 14 cores (28 HT threads)
  - 2.6GHz base, 3.6GHz turbo
  - 35MB LLC
  - 64GB DDR4 memory
  - Infiniband + Gig Ethernet

56 Gbps switched Infiniband network. ~1 Tbps bisection bandwidth

Software configuration

Data nodes:
- 12 LDM threads (non-HT)
- 12 TC threads (HT)
- 2 Send threads (non-HT)
- 8 Receive threads (HT)
- MaxSendDelay config

API nodes:
- 3 NdbApi connections per client machine
- 256 flexAsynch threads per client machine

Scripts: https://dev.mysql.com/downloads/benchmarks.html
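As a rough illustration, the data node thread counts above could be expressed through the `ThreadConfig` data node parameter. The helper below only renders the thread counts given on the slide; it is a sketch, not the benchmark's actual configuration. CPU bindings (the HT / non-HT placement) and the `MaxSendDelay` value are left out because the slide does not give them, and the helper name is made up for this example.

```python
from typing import Dict

# Thread counts from the slide, keyed by ThreadConfig thread type.
thread_counts: Dict[str, int] = {"ldm": 12, "tc": 12, "send": 2, "recv": 8}


def render_thread_config(counts: Dict[str, int]) -> str:
    """Render a ThreadConfig line with counts only (no cpubind)."""
    parts = ",".join(f"{name}={{count={count}}}" for name, count in counts.items())
    return f"ThreadConfig={parts}"


thread_config_line = render_thread_config(thread_counts)
total_threads = sum(thread_counts.values())

if __name__ == "__main__":
    print(thread_config_line)
    print("threads configured:", total_threads, "of 56 HT threads per machine")
```

The rendered line would belong in the data node section of the cluster configuration file; the published benchmark scripts remain the authoritative source for the settings actually used.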

MySQL Cluster NoSQL Scale Out

[Figure: Data node throughput scaling. Million NoSQL reads/s (y-axis, 0 to 250) as the number of data nodes scales (x-axis, 0 to 35). Near-linear scaling, 92% efficiency at 32 nodes.]

[Figure: API connection scaling. Million NoSQL reads/s (y-axis, 0 to 180) as API connections scale (x-axis, 0 to 180) @ 24 data nodes. API node scaling saturates Data nodes with Infiniband interrupts.]

Infiniband adapters configured for latency rather than throughput, but benchmarks reached within 10% of maximum throughput in any case.
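One common way to read "92% efficiency at 32 nodes" is throughput at n nodes divided by n times the single-node throughput. The slide does not spell out its definition, so the formula below is an assumption, and the single-node rate it derives is implied by that assumption rather than a measured result from the talk.

```python
def scaling_efficiency(throughput_n: float, nodes: int, throughput_1: float) -> float:
    """Fraction of ideal linear scaling achieved at the given node count."""
    return throughput_n / (nodes * throughput_1)


def implied_single_node_rate(per_node_rate_at_n: float, efficiency: float) -> float:
    """Single-node rate that would give this efficiency (assumed definition)."""
    return per_node_rate_at_n / efficiency


PER_NODE_RATE_AT_32 = 6_400_000  # reads/s per data node, from the benchmark slide
EFFICIENCY_AT_32 = 0.92          # from the scaling chart

baseline = implied_single_node_rate(PER_NODE_RATE_AT_32, EFFICIENCY_AT_32)
check = scaling_efficiency(32 * PER_NODE_RATE_AT_32, 32, baseline)

if __name__ == "__main__":
    print(f"implied single-node rate: {baseline / 1e6:.2f} million reads/s")
    print(f"efficiency at 32 nodes:   {check:.2f}")
```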

Getting started with MySQL Cluster

- Try Cluster at OOW! Benedita's Hands-on Lab on Thursday morning
- Getting started video on YouTube: https://www.youtube.com/watch?v=4OixfzhOJoA
- QuickStart whitepaper: http://downloads.mysql.com/tutorials/cluster/mysql_wp_cluster_quickstart.pdf
- MySQL Cluster 'Getting Started' page: https://www.mysql.com/products/cluster/start.html
- MySQL Cluster courses: education.oracle.com

Getting started with MySQL Cluster

Tips

- Start small and simple
  - Minimal nodes + configuration
  - (< 10M concurrent reads!)
  - Start on localhost to rule out firewall issues
- Get it up and running, then add complexity
- Experiment with mysql / mysqld, node failures, applications
- Consider using MySQL Cluster Manager (https://edelivery.oracle.com)
- Ask for help: forums.mysql.com

[Figure: a minimal cluster on "My laptop", with two data nodes each holding fragments F1 and F4.]


MySQL Cluster Performance Gains

Synchronous API
- Operation definition and execution are separated.
- A single user thread can define a batch of operations, then execute them together, with only one API ↔ DB round trip.
- A transaction can contain one or more batches of operations.
- 1 user thread : 1 executing transaction

Asynchronous API adds:
- A single user thread can define, execute and wait for the results of multiple independent transactions.
- 1 user thread : n executing transactions

The async API allows the number of client threads to be reduced, giving efficiency gains.
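The "1 user thread : n executing transactions" model can be mimicked with Python's asyncio. This is an analogy only, not NdbApi code: `FakeCluster`, its `execute` method and the 10 ms latency are invented for the sketch. Each simulated transaction defines a batch of reads and executes it in one round trip, and a single thread keeps 100 such transactions in flight at once.

```python
import asyncio
from typing import List


class FakeCluster:
    """Stand-in for a database connection that counts round trips."""

    def __init__(self) -> None:
        self.round_trips = 0

    async def execute(self, operations: List[str]) -> List[str]:
        # One call is one API <-> DB round trip, whatever the batch size.
        self.round_trips += 1
        await asyncio.sleep(0.01)  # simulated network latency
        return [f"row:{key}" for key in operations]


async def run_transaction(cluster: FakeCluster, keys: List[str]) -> List[str]:
    # Define the whole batch first, then execute it together.
    operations = list(keys)
    return await cluster.execute(operations)


async def main() -> tuple:
    cluster = FakeCluster()
    # One thread, 100 independent transactions of 10 reads each.
    transactions = [
        run_transaction(cluster, [f"k{t}-{i}" for i in range(10)])
        for t in range(100)
    ]
    results = await asyncio.gather(*transactions)
    return cluster.round_trips, sum(len(rows) for rows in results)


round_trips, rows_read = asyncio.run(main())

if __name__ == "__main__":
    print(rows_read, "rows read in", round_trips, "round trips on one thread")
```

In this model a thread-per-transaction design would need 100 threads to reach the same concurrency, which is the thread reduction the slide refers to.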