NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility

NoSQL Meets Relational — the MySQL Ecosystem Gains More FlexibilityColin Charles, Chief Evangelist, Percona [email protected] / [email protected] http://bytebot.net/blog/ | @bytebot on TwitterLondon MySQL Meetup14 March 2017

whoami• Chief Evangelist (in the CTO office), Percona Inc

• Focusing on the MySQL ecosystem (MySQL, Percona Server, MariaDB Server), as well as the MongoDB ecosystem (Percona Server for MongoDB) + 100% open source tools from Percona like Percona Monitoring & Management, Percona xtrabackup, Percona Toolkit, etc.

• Founding team of MariaDB Server (2009-2016), previously at Monty Program Ab, merged with SkySQL Ab, now MariaDB Corporation

• Formerly MySQL AB (exit: Sun Microsystems)• Past lives include Fedora Project (FESCO), OpenOffice.org• MySQL Community Contributor of the Year Award winner 2014

SQL• 1970 E.F. Codd paper “A Relational Model of Data for Large Shared

Data Banks”• ANSI standard in 1986• Manage your data in relational systems, with transaction controls,

SQL operators, and better cross-vendor compatibility

NoSQL/Big Data• “not only SQL”• 2004 Google paper “MapReduce: Simplified Data Processing on

Large Clusters” -> Hadoop ~2008 “big data movement”• 2006 Google paper “BigTable: A Distributed Storage System for

Structured Data” -> NoSQL DBs like MongoDB ~2009• Built without having a standardised structured query language• Non-relational, distributed, horizontally scalable, schema-free, easy

replication, eventually consistent

Eventual Consistency (BASE) & ACID• ACID: Atomicity, Consistency, Isolation, Durability

• the base of many transactional engines, like InnoDB/XtraDB• BASE: Basically available, Soft state, Eventual Consistency

• this is based on existence of CAP theorem which states that its impossible for a distributed computer system to simultaneously provide the following 3 guarantees:

• Consistency (all nodes have the same data)• Availability (guarantee that every request receives response)• Partition tolerance (continued operation with failure of parts of system) • Satisfy 2 at the same time, but not 3

NewSQL• 2012 Google paper “Spanner: Google’s Globally-Distributed

Database”• goals: scalable, multi-version, globally distributed, synchronously

replicated• comes with a unique clock API allowing non-blocking reads in past,

lock-free read-only transactions, atomic schema changes, etc.• “We believe it is better to have application programmers deal with

performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.”

Is this the landscape?

Why NoSQL?• Couchbase says… (source: https://www.couchbase.com/nosql-resources/

why-nosql)• Support large numbers of concurrent users (tens of thousands, perhaps

millions)• Deliver highly responsive experiences to a globally distributed base of

users• Be always available – no downtime• Handle semi- and unstructured data• Rapidly adapt to changing requirements with frequent updates and new

features

https://www.couchbase.com/nosql-resources/why-nosql

Why NoSQL?• MongoDB says… (source: https://www.mongodb.com/nosql-explained)• When compared to relational databases, NoSQL databases are more

scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:

• Large volumes of rapidly changing structured, semi-structured, and unstructured data

• Agile sprints, quick schema iteration, and frequent code pushes• Object-oriented programming that is easy to use and flexible• Geographically distributed scale-out architecture instead of expensive,

monolithic architecture

https://www.mongodb.com/nosql-explained

Why NoSQL?• DataStax says… (source: http://www.datastax.com/resources/

whitepapers/why-nosql)• In general, NoSQL refers to progressive data management engines

that go beyond legacy relational databases in satisfying the needs of today’s modern business applications. A very flexible data model, horizontal scalability, distributed architectures, and the use of languages and interfaces that are “not only” SQL typically characterize NoSQL technology.

http://www.datastax.com/resources/whitepapers/why-nosql

Can’t MySQL do all that?• Support large numbers of concurrent users• Automatic master failover• Geographically distributed scale-out architecture• Handle regular code pushes and schema changes• Be always available with no downtime• Not only an SQL interface

HandlerSocket• Direct access to InnoDB/XtraDB for

CRUD operations• SQL: 105,000 qps (60% usr, 28%

sys)• memcached: 420,000 qps (8% usr,

88% sys)• HandlerSocket: 750,000 qps (45%

usr, 53% sys)

Dynamic Columns in MariaDB Server 5.3+• Allows you to create virtual columns with dynamic content for each row in

table• Basically a blob with handling functions (GET, CREATE, ADD, DELETE,

EXISTS, LIST, JSON)• Store different attributes for each item (like a web store). Hard to do relationally• In MariaDB 10: name support (instead of referring to columns by numbers,

name it), convert all dynamic column content to JSON array, interface with Cassandra

• https://kb.askmonty.org/en/dynamic-columns/ • INSERT INTO tbl SET dyncol_blob=COLUMN_CREATE("column_name",

"value");

https://kb.askmonty.org/en/dynamic-columns/

memcached in MySQL 5.6+• innodb_memcache daemon plugin contains an embedded

memcached with InnoDB as a storage backend• Data accessed using memcache’s key-value protocol• Access different tables at same time; or same table using different

columns as memcache key• PHP? PECL mysqlnd_memache extension• Works with NDBCLUSTER too• Security via SASL

node.js with NDBCLUSTER • end-to-end JavaScript development, from browser to server• native access to storage layer, bypassing SQL layer• Also asynchronous interface• MySQL Cluster 7.3 (http://www.clusterdb.com/mysql/mysql-cluster-

with-node-js)

http://www.clusterdb.com/mysql/mysql-cluster-with-node-js

CassandraSE• Access data in your Cassandra cluster from

MariaDB• CQL is great but now you can just work in

SQL without switching (think ORMs too)• Use cases: sensor data, log collection &

analysis, form versioning, time series data, user activity tracking

• MariaDB users want a globally replicated table that’s fault tolerant?

CONNECT• CONNECT will speak XML, JSON, BSON or even grab data over an

ODBC connection• You can CONNECT to Oracle (via ODBC), join results from

Cassandra (via CassandraSE) and have all your results sit in InnoDB• MariaDB Server 10.0 and greater

MySQL JSON import/export• EXPLAIN in MySQL 5.6 has JSON output now• mysqljsonimport• mysqljsonexport• Export/import a database using the above commands in JSON

format

So… what’s wrong with MySQL?• Out of box high availability is missing• Multi-master replication is missing • Out of box high scalability is missing• Out of the box sharding is missing• Synchronous geographic replication for anything besides

NDBCLUSTER• No SQL interface

Sharding192.168.0.1

User id int(10)

username char(15)password char(15)

email char(50)

192.168.0.2User id int(10)


email char(50)

192.168.0.3User id int(10)


email char(50)

192.168.0.1User id int(10)


email char(50)

192.168.0.2

UserInfo login datetime

md5 varchar(32)guid varchar(32)

Better if INSERTheavy and there’sless frequentlychanged data

How do you Shard: alphabetically? n-users/shard? postal codes?

Sharding frameworks• Roll your own framework• Tungsten Replicator• SPIDER• Tumblr JetPants• MySQL Fabric/MySQL Router• MariaDB MaxScale• Vitess

Online schema changes• Facebook OCS• oak-online-alter• pt-online-schema-change• SoundCloud LHM (Large Hadron Migrator)• Github gh-ost

Galera Cluster (Percona XtraDB Cluster / MariaDB Galera Cluster)• High availability & scalability, with virtually synchronous replication

(transaction committed to all nodes or none), multi-master replication support, parallel replication (apply events on slaves in parallel), automatic node provisioning, data consistency (all slaves synced)

• Sounds rather NewSQL capable, no? :-)• Load balancing with ProxySQL / MariaDB MaxScale• Percona XtraDB Cluster 5.7: engineering within Percona, load

balancing with ProxySQL out of the box, integration with PMM, benefits of all the MySQL 5.7 feature-set (including encryption)

Group replication• Fully synchronous replication (update everywhere), self-healing, with

elasticity, redundancy• Single primary mode supported• MySQL InnoDB Cluster - a combination of group replication, Router, to

make magic!• Recent blogs:• https://www.percona.com/blog/2017/02/24/battle-for-synchronous-

replication-in-mysql-galera-vs-group-replication/• https://www.percona.com/blog/2017/02/15/group-replication-shipped-

early/

Handling Failure• MySQL-MMM• Severalnines ClusterControl• Orchestrator• MySQL MHA• Percona Replication Manager• Tungsten Replicator• 5.6: mysqlfailover, mysqlrpladmin• (MariaDB) Replication Manager

Orchestrator• Reads replication topologies, keeps state, continuous polling• Modify your topology — move slaves around• Nice GUI, JSON API, CLI

MySQL MHA• Like MMM, specialized solution for MySQL replication• Developed by Yoshinori Matsunobu at DeNA• Automated and manual failover options• Topology: 1 master, many slaves• Choose new master by comparing slave binlog positions• Can be used in conjunction with other solutions• http://code.google.com/p/mysql-master-ha/

Load Balancers for multi-master clusters• Synchronous multi-master clusters like Galera require load balancers• HAProxy• Galera Load Balancer (GLB)• MaxScale• ProxySQL

MySQL Router• Routing between applications and any backend MySQL servers• Failover• Load Balancing• Pluggable architecture (connection routing)

MariaDB MaxScale• “Pluggable router” that offers connection & statement based load

balancing• Possibilities are endless - use it for logging, writing to other

databases (besides MySQL), preventing SQL injections via regex filtering, route via hints, query rewriting, have a binlog relay, etc.

• Load balance your Galera clusters

ProxySQL• High Performance MySQL proxy with a GPL license• Performance is a priority - the numbers prove it• Can query rewrite• Sharding by host/schema or both, with rule engine + modification to

SQL + application logic

Facebook DocStore• Dynamic Columns• Indexes• Compile time option, with JSON support naturally• Never made it outside of the Facebook mysql-5.6 tree

MySQL 5.7 JSON• Provide native support for JavaScript applications• JSON data type• optimised for read intensive workloads, parse and validation on INSERT only,

dictionary, in-place updates• all native JSON types (Numbers, strings, bool, objects, arrays) with extensions (Date,

time, timestamp, datetime, other)• schemaless, UTF8MB4• Functions to handle JSON data• SELECT JSON_ARRAY(1, "abc", NULL, TRUE, CURTIME());• Indexing of JSON data• Also spatial GeoJSON functions

No SQL… with mysqlsh!• X Protocol, X Dev API• CRUD and SQL operations, authentication via SASL, streaming commands

via pipelines• mysqlsh: JavaScript, Python, good ‘ole SQL• Connect using a URI - colin@server:33060/testdb• db.city.insert(“ID”, “Name”, “CountryCode”).• You can now use MySQL as a document store (yes, a document collection)

with the mysqlx plugin• The basis of MySQL InnoDB Cluster (with MySQL Router and group

replication)

So… what’s wrong with MySQL?• Out of box high availability is missing -> MHA, MySQL Router,

mysqlfailover• Multi-master replication is missing -> Galera Cluster, Group

Replication• Out of box high scalability is missing -> ProxySQL, MySQL Router• Out of the box sharding is missing -> SPIDER, JetPants, vitess• Synchronous geographic replication for anything besides

NDBCLUSTER -> Galera Cluster, Group Replication• No SQL interface -> mysqlsh, Document Store

Thank you!Colin [email protected] / [email protected]://bytebot.net/blog | @bytebot on twitterslides: slideshare.net/bytebot

NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility

Technology

Transcript of NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility