Sharding using MySQL and PHP
Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Sharding using PHP
Mats Kindahl (Senior Principal Software Developer)
About the Presentation
After this presentation you should know what sharding is and the basic caveats surrounding sharding. You should also have an idea of what is needed to develop a sharding solution.
Program Agenda
Why do we shard
Introduction to sharding
High-level sharding architecture
Elements of a sharding solution
Sharding planning
What is sharding?
● Slice your database into independent data “shards”
● Queries execute only on one shard
● Shards can be stored on different servers
Splintering
Horizontal Partitioning
Sharding for locality “Big Data” close to user
Sharding for performance
Reduced working set
Parallel processing
Database vs. cache
Sharding Limitations
● Auto-increment
– Composite key
– Distributed key generation
– UUID?
● Cross-shard joins
– Very expensive: avoid them
– Federated tables?
Developing a Sharding Solution
High-level Architecture
● Broker
– Distributes queries
● Sharding Database
– Information about the shards
– If it goes down, everything goes down
– Needs to be highly available (HA)
Running Example: Employees sample database
Table Rows
salaries 2 844 047
titles 443 308
employees 300 024
dept_emp 331 603
dept_manager 24
departments 9
Areas to cover
● Data
● Meta-Data
● Query
● Operations
Data
● Partition Data: Key Columns, Dependent Columns, Tables to Shard
● Mapping Keys: Range Mapping, Hash Mapping, List Mapping
● Shard Allocation: Single Shard, Multiple Shards
Partitioning the data
Table Rows
salaries 284 404 700
titles 44 330 800
employees 30 002 400
dept_emp 33 160 300
dept_manager 2 400
departments 900
Partitioning the data: sharding column(s)
● Sharding columns dictated by queries
– Queries should give same result before and after sharding
● One or more columns
– Does not have to be primary key, but easier if it is
● Sharding key is needed for re-sharding
emp_no birth_date first_name last_name gender hire_date
4711 1989-06-13 John Smith M 2009-12-24
19275 1954-11-12 Sally Smith F 1975-01-01
27593 1477-05-19 Mats Kindahl M 2002-02-27
587003 1830-08-28 Charles Bell M 2003-11-31
Partitioning the data: sharding column(s)
● Choice of sharding columns
– Distribution
– Locality
● Avoid non-unique keys
– Difficult to get good distribution
– Avoid: Country
– Prefer: Employee ID
[Figure: sharding by country gives uneven shards, e.g. US: 200 million, SE: 9 million]
Partitioning the data: dependent columns
Table Rows
salaries 284 404 700
titles 44 330 800
employees 30 002 400
dept_emp 33 160 300
dept_manager 2 400
departments 900
[Slide annotation: four tables are marked “??” and labelled “Foreign keys”]
Partitioning the data: dependent columns
● Referential Integrity Constraint
– Example query joining salaries and employees
– Same key, same shard
● JOIN within a shard
SELECT first_name, last_name, salary
  FROM salaries JOIN employees USING (emp_no)
 WHERE emp_no = 21012
   AND CURRENT_DATE BETWEEN from_date AND to_date;
Partitioning the data: dependent columns
● Referential Integrity
– Foreign Keys
● Dependent rows
– Same shard
– Join on equality
● Sharding Columns
– Follow foreign keys
Handy query to find all dependent columns:
mysql> SELECT table_schema, table_name, column_name
    ->   FROM information_schema.key_column_usage
    ->   JOIN information_schema.table_constraints
    ->        USING (table_schema, table_name, constraint_name)
    ->  WHERE constraint_type = 'FOREIGN KEY'
    ->    AND referenced_table_schema = 'employees'
    ->    AND referenced_table_name = 'employees'
    ->    AND referenced_column_name = 'emp_no';
+--------------+--------------+-------------+
| table_schema | table_name   | column_name |
+--------------+--------------+-------------+
| employees    | dept_emp     | emp_no      |
| employees    | dept_manager | emp_no      |
| employees    | salaries     | emp_no      |
| employees    | titles       | emp_no      |
+--------------+--------------+-------------+
4 rows in set (0.56 sec)
Partitioning the data: unsharded tables
Table Rows
salaries 284 404 700
titles 44 330 800
employees 30 002 400
dept_emp 33 160 300
dept_manager 2 400
departments 900
Partitioning the data: unsharded tables
● Referential Integrity Constraint
– Join with sharded tables
– Tables dept_emp (and dept_manager) reference two tables
● Shard table departments?
– Not necessary: small table
– Difficult to get right: keeping shards of two tables in same location
SELECT first_name, last_name, GROUP_CONCAT(dept_name)
  FROM employees JOIN dept_emp USING (emp_no)
       JOIN departments USING (dept_no)
 WHERE emp_no = 21012
 GROUP BY emp_no;
Partitioning the data: unsharded tables
● Solution: do not shard departments
– Keep table on all shards
– Joins will only need to address one shard
● You need to consider
… how to update unsharded table
SELECT first_name, last_name, GROUP_CONCAT(dept_name)
  FROM employees JOIN dept_emp USING (emp_no)
       JOIN departments USING (dept_no)
 WHERE emp_no = 21012
 GROUP BY emp_no;
Mapping Keys to Shards
● Given
– Sharding key value
– Optional other information (tables accessed, RO or RW, etc.)
● Provide the following
– Shard location (host, port)
– Shard identifier (if you have multiple shards for each server)
Mapping Keys to Shards
● Range Mapping: range of values for each shard
– Type-dependent
● Hash Mapping: hash of key to find shard
– Type-independent
– Complicated?
● List Mapping: list of keys for each shard
– Does not offer good distribution
Shard Allocation: Single Shard per Server
● Idea: there is only one shard on each server
● Advantage: Cross-database queries do not require rewriting
● Disadvantage: Expensive to balance server load
… moving hot data from server requires re-sharding
SELECT first_name, last_name
  FROM employees.employees JOIN expenses.reciepts USING (emp_no)
 WHERE currency = 'USD'
Shard Allocation: Multiple Shards per Server
● Idea: Keep several “virtual shards” on each server
● Advantages
– Easier to balance load of servers
… move hot virtual shards to other server
– Improves performance
– Increases availability
Shard Allocation: Multiple Shards per Server
● Disadvantage: cross-database queries require rewriting
– Error-prone
– Expensive?
● Queries that go to one database are not a problem
SELECT first_name, last_name
  FROM employees.employees JOIN expenses.reciepts USING (emp_no)
 WHERE currency = 'USD'
Shard Allocation: Multiple Shards per Server
● Idea: Add suffix to database name (optionally table name)
employees_N.employees
employees_N.employees_N
● Idea: Keep substitution pattern in query string
SELECT first_name, last_name
  FROM {employees.employees} JOIN {expenses.reciepts} USING (emp_no)
 WHERE currency = 'USD'
Shard Allocation: Multiple Shards per Server
class my_mysqli extends mysqli {
    public $shard_id;

    // Expand {db.table} into db_<shard id>.table before running the query.
    public function query($query, $resultmode = MYSQLI_STORE_RESULT) {
        $real_query = preg_replace('/\{(\w+)\.(\w+)\}/',
                                   "$1_{$this->shard_id}.$2", $query);
        return parent::query($real_query, $resultmode);
    }
}
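The substitution can be tried without a database connection. The following is a minimal sketch, assuming a stand-alone helper named rewrite_for_shard() (a hypothetical name, not part of the slides) that applies the same placeholder pattern to a query string:

```php
<?php
// Hypothetical helper (not from the slides): expands {db.table} placeholders
// into "db_<shard id>.table" using the same pattern as the class above.
function rewrite_for_shard(string $query, int $shard_id): string
{
    // ${1} and ${2} refer to the database and table names captured by the pattern.
    return preg_replace('/\{(\w+)\.(\w+)\}/', '${1}_' . $shard_id . '.${2}', $query);
}

$query = 'SELECT first_name, last_name '
       . 'FROM {employees.employees} JOIN {expenses.reciepts} USING (emp_no) '
       . "WHERE currency = 'USD'";

echo rewrite_for_shard($query, 3), "\n";
// prints: SELECT first_name, last_name FROM employees_3.employees JOIN expenses_3.reciepts USING (emp_no) WHERE currency = 'USD'
```

Only names wrapped in braces are rewritten, so references to tables that are kept unsharded can be left without braces.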
Meta Data
● Mapping Schemes: Range Mapping, Hash Mapping, List Mapping
● Shard Information: Shard ID, Shard Host, Shard Specifics*
● Mapping Methods: Static Sharding, Dynamic Sharding
* If you use multiple shards per server
Mapping Methods: Static Sharding
● Idea: Compute shard statically
● Advantages
– Simple
– No extra lookups
– No single point of failure
● Disadvantage
– Lack of flexibility
Mapping Methods: Static Sharding, in code
class Dictionary {
    private $emp_no;

    public function __construct() { ... }

    public function set_key($emp_no) {
        $this->emp_no = $emp_no;
    }

    public function get_connection() {
        // Static mapping: sharding key modulo the number of shards.
        $i = $this->shardinfo[$this->emp_no % count($this->shardinfo)];
        return new mysqli("p:{$i->host}", $i->user, $i->passwd, $i->db, $i->port);
    }
}
● Dictionary class
● Input: sharding key
● Output: connection
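The static computation itself needs no connection. Below is a small sketch of just that step, assuming a hypothetical shard list and a helper named static_shard_index(); the host names are invented for illustration:

```php
<?php
// Hypothetical shard list; host names are made up for this example.
$SHARDS = [
    ['host' => 'shard0.example.com'],
    ['host' => 'shard1.example.com'],
    ['host' => 'shard2.example.com'],
];

// Static sharding: the shard index is computed from the key alone.
// Assumes a non-negative integer key such as emp_no.
function static_shard_index(int $emp_no, int $shard_count): int
{
    if ($shard_count < 1) {
        throw new InvalidArgumentException('At least one shard is required');
    }
    return $emp_no % $shard_count;
}

$index = static_shard_index(20101, count($SHARDS));
echo $SHARDS[$index]['host'], "\n"; // prints shard1.example.com
```

Because the result depends on the shard count, adding a server changes the mapping for most keys, which is the lack of flexibility noted for static sharding.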
Mapping Methods: Static Sharding, in code
$HIRED = <<<END_OF_QUERY
SELECT first_name, last_name, hire_date, salary
  FROM employees AS e, salaries AS s
 WHERE s.emp_no = e.emp_no
   AND e.emp_no = ?
   AND CURRENT_DATE BETWEEN s.from_date AND s.to_date
END_OF_QUERY;

$DICTIONARY = new Dictionary();

$DICTIONARY->set_key($emp_no);
$link = $DICTIONARY->get_connection();
if ($stmt = $link->prepare($HIRED)) {
    $stmt->bind_param('i', $emp_no);
    $stmt->execute();
    $stmt->bind_result($first, $last, $hire, $salary);
    while ($stmt->fetch())
        printf("%s %s was hired on %s and has a salary of %s\n",
               $first, $last, $hire, $salary);
}
Mapping Methods: Dynamic Sharding
● Idea: use a sharding database to keep track of shard locations
● Advantages:
– Easy to migrate shards
– Easy to re-shard
● Disadvantages:
– Complex
● Performance?
Dynamic sharding, in code
$FETCH_SHARD = <<<END_OF_QUERY
shard selection query
END_OF_QUERY;

class Dictionary {
    var $dict;
    var $emp_no;

    public function __construct() {
        $this->dict = new mysqli('shardinfo.example.com', ...);
    }

    public function set_key($emp_no) {
        $this->emp_no = $emp_no;
    }

    public function get_connection() {
        global $FETCH_SHARD;  // the query text is defined at file scope
        $stmt = $this->dict->prepare($FETCH_SHARD);
        $stmt->bind_param('i', $this->emp_no);
        $stmt->execute();
        $stmt->bind_result($no, $host, $user, $passwd, $db, $port);
        $stmt->fetch();
        return new mysqli("p:{$host}", $user, $passwd, $db, $port);
    }
}
Mapping Schemes: Range Mapping
● Most basic scheme
● One row for each range
● Just store lower bound
Shard ID Lower
0 0
1 20000
2 50000
SELECT shard_id, hostname, port
  FROM shard_ranges JOIN shard_locations USING (shard_id)
 WHERE key_id = 1 AND 2345 >= shard_ranges.lower_bound
 ORDER BY shard_ranges.lower_bound DESC LIMIT 1;
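The rule behind range mapping is to pick the range with the largest lower bound that does not exceed the key. A minimal in-memory sketch of that rule follows, using the lower bounds from the table above; the function name find_shard_by_range() is an assumption for illustration:

```php
<?php
// Lower bounds taken from the table above: shard ID => lower bound.
$RANGES = [0 => 0, 1 => 20000, 2 => 50000];

// Returns the shard whose lower bound is the largest one not exceeding $key,
// or null if the key is below every lower bound.
function find_shard_by_range(array $ranges, int $key): ?int
{
    $best_id = null;
    $best_lower = null;
    foreach ($ranges as $shard_id => $lower) {
        if ($lower <= $key && ($best_lower === null || $lower > $best_lower)) {
            $best_id = $shard_id;
            $best_lower = $lower;
        }
    }
    return $best_id;
}

echo find_shard_by_range($RANGES, 2345), "\n";  // prints 0
echo find_shard_by_range($RANGES, 20101), "\n"; // prints 1
```

Storing only the lower bound keeps the table small: splitting a range means adding one row rather than updating two.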
Mapping Schemes: Regular Hashing
● Computing a hash from the key
ShardID = SHA1(key) mod N
● Adding (or removing) a shard
… can require moving rows between many shards
… often a lot of rows
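To get a feel for how many rows move, the sketch below counts the keys whose shard changes when going from 4 to 5 shards. It is an illustration under assumptions: the helper names are invented, the keys are an arbitrary range of employee numbers, and only the first 7 hex digits of the SHA1 value are used so the number fits in a PHP integer on any platform.

```php
<?php
// Shard ID = hash(key) mod N, using a 28-bit prefix of SHA1 as the hash value.
function hash_shard(int $key, int $shard_count): int
{
    return hexdec(substr(sha1((string) $key), 0, 7)) % $shard_count;
}

// Counts the keys that map to a different shard after the shard count changes.
function count_moved(array $keys, int $old_count, int $new_count): int
{
    $moved = 0;
    foreach ($keys as $key) {
        if (hash_shard($key, $old_count) !== hash_shard($key, $new_count)) {
            $moved++;
        }
    }
    return $moved;
}

$keys = range(10001, 11000); // arbitrary sample of employee numbers
printf("%d of %d keys change shard when going from 4 to 5 shards\n",
       count_moved($keys, 4, 5), count($keys));
```

With a uniformly distributed hash, a key keeps its shard only when the hash gives the same remainder for both 4 and 5, which holds for about one key in five, so roughly 80% of the rows would have to move.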
Mapping Schemes: Regular Hashing
[Figure: keys emp_no=36912, emp_no=23456, emp_no=43210 and emp_no=20101 mapped by HASH(key) mod N onto shards 0 to 4; going from N to N+1 shards moves keys between shards]
Mapping Schemes: Consistent Hashing
● Computing a hash from the key
SHA1(key)
● Adding (or removing) a shard
… only requires moving rows from one shard to the new shard
Shard ID Hash
6 08b1286ad1bebe6...
2 1c2d4132144211a...
4 9893238ed75cfc9...
1 989bb9d2bc381f4...
5 cab8c76b85c4e24...
3 eccf30f69fe850f...
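A table like the one above can be treated as a ring of hash points. The sketch below is one possible reading, not the presenter's implementation: each shard gets a single point (the SHA1 of a made-up name such as "shard-3"), and a key belongs to the first shard point at or after the key's own hash, wrapping around at the end.

```php
<?php
// Builds the ring: one SHA1 point per shard, sorted by hash value.
// The "shard-<id>" naming is a hypothetical choice for this example.
function build_ring(array $shard_ids): array
{
    $ring = [];
    foreach ($shard_ids as $id) {
        $ring[sha1("shard-$id")] = $id;
    }
    ksort($ring, SORT_STRING);
    return $ring;
}

// Finds the shard owning $key: the first point at or after the key's hash.
function lookup_shard(array $ring, string $key): int
{
    if (count($ring) === 0) {
        throw new InvalidArgumentException('The ring has no shards');
    }
    $hash = sha1($key);
    foreach ($ring as $point => $id) {
        if (strcmp($hash, (string) $point) <= 0) {
            return $id;
        }
    }
    return reset($ring); // past the last point: wrap around to the first one
}

$ring = build_ring([1, 2, 3, 4, 5]);
printf("emp_no=20101 is stored on shard %d\n", lookup_shard($ring, 'emp_no=20101'));
```

When a sixth shard is added, only the keys that fall between the new point and its predecessor change owner, and they all move to the new shard. In practice each shard is often given several points on the ring to even out the distribution.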
Mapping Schemes: Consistent Hashing
[Figure: hash ring with shard1 to shard5 and the keys emp_no=20101, emp_no=43210, emp_no=23456 and emp_no=36912 placed on the ring]
Query Handling
● Query Dispatch: Mechanism, Single/Multi Cast, Handling Reads, Handling Updates, Transaction Handling
● Sharding Key: Parsing, Application provided
● Connector Caches: Time (TTL), On Error, Explicit
Query Dispatch: Mechanism
● Proxy
– Sharding key extracted from query
– Requires extra hop
● Application level
– Application provides sharding key
– No extra hop
Query Dispatch: Query Type
● Read Query
– How do you ensure that it is executing on the right shard?
– How do you ensure that it is not cross-shard?
● Update Query
– Updating an unsharded table – think about consistency
Query Dispatch: Handling Transactions
● All statements of a transaction should go to the same session
– Sharding key on start of transaction?
– Is it a read-only or read-write transaction?
● Statements for different transactions can go to different sessions
– How to detect transaction boundaries
● Maintaining the session state
Query Dispatch: Handling Transactions
BEGIN
SELECT salary INTO @s FROM salaries WHERE emp_no = 20101;
SET @s = 1.1 * @s;
INSERT INTO salaries VALUES (20101, @s);
COMMIT
BEGIN
INSERT INTO ...
COMMIT
Slide annotations:
– Sharding key? Ah, there it is!
– Session state?
– Hmm... looks like a read transaction
– Oops... it was a write transaction!
– Transaction done! Clear session state?
– New transaction! Different connection?
Extracting Sharding Key
● Parsing the query
– Locating the key
– Handling Transactions
● Application-provided sharding key
– Annotating queries
– Separate function in connector
Extracting Sharding Key: Parsing Query
● Problem: Locating the key
● No generic parser
– Application specific parser
– Constrain application developer
● Transactions
– Key needed for first statement
INSERT INTO titles(emp_no, title, from_date)
SELECT emp_no, '', CURRENT_DATE
  FROM titles JOIN employees USING (emp_no)
 WHERE first_name = 'Keith'

BEGIN
SELECT …
INSERT …
COMMIT;
Extracting Sharding Key: Application Provided
● Idea: Provide key explicitly
● Annotate the statement
● Extend connection manager
– Demonstrated previously
/* emp_no=20101 */ BEGIN;
SELECT …
INSERT …
COMMIT;

…
$DICT->set_key($key);
$link = $DICT->get_connection();
…
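If statements are annotated with a leading comment, a proxy or connector only needs to read that comment rather than parse the SQL. A minimal sketch follows, assuming the annotation always has the form /* column=number */ at the start of the statement; extract_sharding_key() is a hypothetical helper name:

```php
<?php
// Reads a leading "/* column=number */" annotation and returns the key,
// or null when the statement carries no such annotation.
function extract_sharding_key(string $statement, string $column = 'emp_no'): ?int
{
    $pattern = '~^\s*/\*\s*' . preg_quote($column, '~') . '\s*=\s*(\d+)\s*\*/~';
    if (preg_match($pattern, $statement, $matches)) {
        return (int) $matches[1];
    }
    return null;
}

var_dump(extract_sharding_key('/* emp_no=20101 */ BEGIN;')); // int(20101)
var_dump(extract_sharding_key('BEGIN;'));                    // NULL
```

Putting the annotation on the first statement of a transaction makes the key available before any statement has to be routed.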
Operations: Monitoring the System
● Monitor load of each node
… to see if any node gets an unfair number of queries
● Monitor load of each shard (multiple shards per node)
… to see if a shard gets an unfair number of queries
Operations: Re-balancing the System
● If an instance is hot:
– Move Shard: Move one shard to another instance
● If a shard is hot:
– Split Shard: Split the shard into multiple shards
– Move Shard: Move one of the shards to another instance
● If a shard is cold:
– Merge Shard: Merge a shard with other shards
● Avoid it – very difficult to do on-line
Operations: Moving a Shard
● Offline (trivial)
– Bring source and target nodes down
– Copy shard from source to target
– Update dictionary
● Online (tricky)
– We go through it on the following slides
Operations: Online Move of Shard
1. Backup shard
– Might be multiple databases
– Note down binary log position
● “Backup position”
– Online backup
● mysqldump
● MySQL Enterprise Backup
2. Restore backup on destination
[Figure: the application, the source server (Src) and the destination server (Dst); the backup position (@Pos) is marked]
Operations: Online Move of Shard
3. Start replication
– Source to target
– Start replication from backup position
– Only replicate shard?
replicate-wild-do-table=db_1.*
Operations: Online Move of Shard
4. Wait until destination is close enough
5. Write lock on source
LOCK TABLES
6. Note binary log position
– “Catch-up Position”
Operations: Online Move of Shard
7. Wait for destination to reach catch-up position
START SLAVE UNTIL
MASTER_POS_WAIT
Operations: Online Move of Shard
8. Update sharding database
… will re-direct queries
9. Stop replication
RESET SLAVE
10. Drop old shard
… unless you just wanted a copy
Operations: Splitting a Shard
● Application dependent
– Change sharding key?
– Change sharding scheme?
● Can be expensive
● You will have to do it
… eventually
Operations: Splitting a Shard
1. Copy shard to new location
– Use on-line move described on previous slides
2. Update sharding database
– Will re-direct queries
3. Remove rows from both shards
– Remove rows that do not belong to the shard
[Figure: steps 1 to 3 shown between one.example.com and two.example.com]
Great! Let's Shard!
Wait a minute...
When to shard?
● Inherently more complex
– Requires careful planning
– Application design?
● Alternatives?
– Functional partitioning?
– Archiving old data?
Preparations for sharding
● Monitor the system
– Types of queries
● What are the join queries
– Access patterns
● What tables are accessed
● Find natural partition keys
– Robust and easy to implement
– Watch out for cross-shard joins
Summary
● What are your goals?
● Do your homework
● Don't be too eager
● Plan
● Develop sharding solution
● Revise the plans
Thanks for attending!
● Questions? Comments?
● Download MySQL!
http://dev.mysql.com
● Read our book!
– Covers replication, sharding, scale-out, and much much more