MySQL Cluster no PayPal

34
Henrique Leandro [email protected] Consultor Senior MySQL [email protected] Dec-2012 MySQL Cluster @ PayPal Wednesday, December 12, 12

description

Por que o PayPal escolheu o MySQL como solução para seus problemas de Big Data

Transcript of MySQL Cluster no PayPal

Page 1: MySQL Cluster no PayPal

<Insert Picture Here>

Henrique [email protected]

Consultor Senior [email protected]

Dec-2012

MySQL Cluster @ PayPal

Wednesday, December 12, 12

Page 2: MySQL Cluster no PayPal

MySQL Connect Conference BrazilDecember 04, 2012 v1.3

Big Data is a Big Scam (Most of the Time)

Daniel Austin, PayPal Technical Staff

Wednesday, December 12, 12

Page 3: MySQL Cluster no PayPal

Confidential and Proprietary 3

Big Myths about Big Data

YESQL: A Counterexample

Q&A

Today’s Agenda

Wednesday, December 12, 12

Page 4: MySQL Cluster no PayPal

Confidential and Proprietary

THE FUNDAMENTAL PROBLEM IN DISTRIBUTED DATA SYSTEMS

“How Do We Manage Reliable Distribution of Data Across Geographical Distances?”

Wednesday, December 12, 12

Page 5: MySQL Cluster no PayPal

Confidential and Proprietary

Big Data Myth #1: Big Data = NoSQL

• ‘Big Data’ Refers to a Common Set of Problems– Large Volumes– High Rates of Change

• Of Data• Of Data Models• Of Data Presentation and Output

– Often Require ‘Fast Data’ as well as ‘Big’• Near-real Time Analytics• Mapping Complex Structures

Takeaway: Big Data is the problem, NoSQL is one (proposed) solution

Wednesday, December 12, 12

Page 6: MySQL Cluster no PayPal

Confidential and Proprietary

The NoSQL Solution

• NoSQL Systems provide a solution that relaxes many of the common constraints of typical RDBMS systems– Slow - RDBMS has not scaled with CPUs– Often require complex data management (SOX,

SOR)– Costly to build and maintain, slow to change and

adapt– Intolerant of CAP models (more on this later)

• Non-relational models, usually key-value• May be batched or streaming• Not necessarily distributed geographically

Wednesday, December 12, 12

Page 7: MySQL Cluster no PayPal

Confidential and Proprietary

3 Kinds of Big Data Systems

1. Columnar K-V Systems– Hadoop, Hbase, Cassandra, PNUTs

2. Document-Based– MongoDB, TerraCotta

3. Graph-Based– FlockDB, Voldemort

Takeaway: These were originally designed as solutions to specific problems because no commercial solution would work.

Wednesday, December 12, 12

Page 8: MySQL Cluster no PayPal

Confidential and Proprietary

Big Data Myth #2: The CAP Theorem Doesn’t Say What You Think It Does

• Consistency, Availability, (Network) Partition• The Real Story: These are not Independent

Variables• AP =CP (Um, what? But…A != C )• Variations:

– PACELC (adds latency tolerance)

Takeaway: the real story here is about the tradeoffs made by designers of different systems, and the main tradeoff is between consistency and availability, usually in favor of the latter.

Wednesday, December 12, 12

Page 9: MySQL Cluster no PayPal

Confidential and Proprietary

Big Data Hype Cycle: Where Are We Now?

There are currently more than 120+ NoSQL databases listed at nosql-databases.com!

You Are Here ?

As the pace of new technology solutions has slowed, some clear winners have emerged.

Wednesday, December 12, 12

Page 10: MySQL Cluster no PayPal

Confidential and Proprietary

BIG DATA MYTH #3: BIG DATA AND NOSQL ARE NEW IDEAS

• The first and most successful such system is DNS, created in 1983.

• Began with flat files• Currently serves the entire

Internet (!)• DNS is an AP system, availability

is #1• Many extensions complicate a

simple design• Suggests a new term for CAP-like

ideas: variability• DNS variability is very high,

often 2-3x the mean

Wednesday, December 12, 12

Page 11: MySQL Cluster no PayPal

Confidential and Proprietary

Big Data Myth: You Need A Big Data System

Well, Maybe….But Before You Go There…There are essentially two ‘Big Data Problems’:“I have too much data and it’s coming in too fast to handle with any RDBMS.”

“I have a lot of data distributed geographically and need to be able to read and write from anywhere in near real-time.”

Takeaway: if you have one of these Big Data problems, a NoSQL solution might work for you.But there are also other alternatives…

Wednesday, December 12, 12

Page 12: MySQL Cluster no PayPal

Confidential and Proprietary 12

Big Myths About Big Data

YESQL: A Counterexample

Q&A

Today’s Agenda

Wednesday, December 12, 12

Page 13: MySQL Cluster no PayPal

Confidential and Proprietary

“Develop a globally distributed DB For user-related data.”• Must Not Fail (99.999%)• Must Not Lose Data. Period.• Must Support Transactions• Must Support (some) SQL• Must WriteRead 32-bit integer globally in

1000ms• Maximum Data Volume: 100 TB• Must Scale Linearly with Costs

Mission YESQL

Wednesday, December 12, 12

Page 14: MySQL Cluster no PayPal

Confidential and Proprietary

What about “High Performance”?

•Maximum lightspeed distance on Earth’s Surface: ~67 ms

•Target: data available worldwide in < 1000 ms

Sound Easy?

Think Again!

Wednesday, December 12, 12

Page 15: MySQL Cluster no PayPal

Confidential and Proprietary

WHY MYSQL CLUSTER?Pro

• True HA by design– Fast recovery

• Supports (some) X-actions

• Relational Model• In-memory

architecture = high performance

• Disk storage for non-indexed data (since 5.1)

• APIs, APIs, APIs

Con• Some semantic

limitations on fields• Size constraints (2

TB?) Hardware limits also

• Higher cost/byte• Requires reasonable

data partitioning• Higher complexity

Wednesday, December 12, 12

Page 16: MySQL Cluster no PayPal

Confidential and Proprietary

How MySQL Cluster Works in One Slide

Graphics courtesy dev.mysql.com

Wednesday, December 12, 12

Page 17: MySQL Cluster no PayPal

Confidential and Proprietary 17

Wednesday, December 12, 12

Page 18: MySQL Cluster no PayPal

Confidential and Proprietary

CIRCULAR REPLICATION/FAILOVER

Graphics courtesy O’Reilly OnLamp.com

Wednesday, December 12, 12

Page 19: MySQL Cluster no PayPal

Confidential and Proprietary

• Click to edit Master text styles

19

Wednesday, December 12, 12

Page 20: MySQL Cluster no PayPal

Confidential and Proprietary

AWS Meets MySQL Cluster

• Why AWS?– Cheap and easy infrastructure-in-a-box

(Or so I thought! Ha!)• Services Used:

– EC2 (Centos 5.3, small instances for mgm & query nodes, XL for data

– Elastic IPs/ELB– EBS Volumes– S3– Cloudwatch

Wednesday, December 12, 12

Page 21: MySQL Cluster no PayPal

Confidential and Proprietary

ARCHITECTURAL TILESTiling Rules• Never separate NDB & SQL• Ndb:2-SQL:1-MGM:1• Scale by adding more tiles• Failover 1st to nearest AZ• Then to nearest DC• At least 1 replica/AZ• Don’t share nodes• Mgmt nodes are redundantLimitations• AWS is network-bound @ 250

MBPS – ouch!• Need specific ACL across AZ

boundaries• AZs not uniform!• No GSLB

A B

C

AWS Availability Zones

NDB SQLMGM

Unused (not present in all locations)

ELB

Wednesday, December 12, 12

Page 22: MySQL Cluster no PayPal

Confidential and Proprietary

Architecture Stack

A B A B A B

A B A B

5 AWS Data Centers:US-E, US-W, TK, EU, AS

A BA B

Scale by Tiling

Wednesday, December 12, 12

Page 23: MySQL Cluster no PayPal

Confidential and Proprietary

SYSTEM READ/WRITE PERFORMANCE (!)

What we tested:• 32 & 256 byte char fields• Reads, writes, query speed vs. volume• Data replication speedsResults:• Global replication < 350 ms• 256 byte read < 1000 ms worldwide

06/19/2011 06/20/2011 06/21/2011 06/22/2011 06/23/2011

In-region replication tests

Wednesday, December 12, 12

Page 24: MySQL Cluster no PayPal

Confidential and Proprietary

Data Models and Query Optimization for NDB

• Network Latency is an obvious issue• Data model requires all segments present

in each geo-region• Parameterized (Linked) Joins

– SPJ technique from Clustra (see Clement Frazer’s blog for details)

Wednesday, December 12, 12

Page 25: MySQL Cluster no PayPal

Confidential and Proprietary

Hard Lessons, Shared

• Be Careful…– With “Eventual Consistency”-related concepts– ACID, CAP are not really as well-defined as we’d like

considering how often we invoke them• MySQL Cluster is a good solution

– Real HA, real SQL– Notable limitations around fields, datatypes– Successfully competes with NoSQL systems for most use

cases – better in many cases• NoSQL Systems

– All have relatively low levels of maturity– More suitable for simpler key-value models– Victim of Tech Fashion

Wednesday, December 12, 12

Page 26: MySQL Cluster no PayPal

Confidential and Proprietary 26

Wednesday, December 12, 12

Page 27: MySQL Cluster no PayPal

Confidential and Proprietary 27

Wednesday, December 12, 12

Page 28: MySQL Cluster no PayPal

28

MySQL Enterprise Edition

Wednesday, December 12, 12

Page 29: MySQL Cluster no PayPal

28

MySQL Enterprise Edition

Mais produtividade, menores riscos e maior capacidade para o MySQL.

Oracle Premier Lifetime Support

Oracle Product Certifications/Integrations

MySQL Enterprise High Availability

MySQL Enterprise Security

MySQL Enterprise Scalability

MySQL Enterprise Backup

MySQL Enterprise Monitor/Query Analyzer

MySQL Workbench

MySQL Enterprise Audit

Wednesday, December 12, 12

Page 30: MySQL Cluster no PayPal

Confidential and Proprietary

Future Directions

• Alternate solution using Pacemaker, Heartbeat– Uses InnoDB, not NDB

• Implement Memcached plugin–To test NoSQL functionality from MySQL

• Add simple connection-based persistence to preserve connections during failover

• Better data node distribution• Better testing & monitoring

Wednesday, December 12, 12

Page 31: MySQL Cluster no PayPal

Confidential and Proprietary

Summing Up On YESQL v0.85

• It works! Far better than expected.• Very fast, very reliable• Reduced complexity since v0.7• AWS poses challenges that private data

centers may not experience• You can achieve high performance and

availability without giving up relational models and read consistency!

Wednesday, December 12, 12

Page 32: MySQL Cluster no PayPal

Confidential and Proprietary

The Big Picture on Big Data

• Only use Big Data solutions when you have a real Big Data problem.– Don’t be a Dedicated Follower of Tech Fashion!

• Not all Big Data solutions are created equal– What tradeoffs are most important to you?– Consistency, Fault Tolerance, Availability,

Performance, Variability • Is your data model a fit for NoSQL?

– You don’t have to give up the relational model in most cases, so don’t!

• You can achieve high performance and availability without giving up relational models and read consistency! Just say YESQL!

Wednesday, December 12, 12

Page 33: MySQL Cluster no PayPal

Twitter: @daniel_b_austinEmai: [email protected]

“In the long run, we are all dead eventually consistent.”Maynard Keynes on NoSQL Databases

With apologies and thanks to the real DB experts, Andrew Goodman, Yves Trudeau, Clement Frazer, Daniel Abadi, Kent Beck, and everyone else who contributed. It really works!

Wednesday, December 12, 12

Page 34: MySQL Cluster no PayPal

<Insert Picture Here>

Henrique [email protected]

Consultor Senior [email protected]

Dec-2012

Obrigado

Wednesday, December 12, 12