NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
NoSQL Meets Relational - the MySQL Ecosystem Gains More Flexibility
Colin Charles, Chief Evangelist, Percona
[email protected] / [email protected]
http://bytebot.net/blog/ | @bytebot on Twitter
London MySQL Meetup
14 March 2017
whoami
• Chief Evangelist (in the CTO office), Percona Inc
• Focusing on the MySQL ecosystem (MySQL, Percona Server, MariaDB Server), as well as the MongoDB ecosystem (Percona Server for MongoDB) + 100% open source tools from Percona like Percona Monitoring & Management, Percona XtraBackup, Percona Toolkit, etc.
• Founding team of MariaDB Server (2009-2016), previously at Monty Program Ab, merged with SkySQL Ab, now MariaDB Corporation
• Formerly MySQL AB (exit: Sun Microsystems)
• Past lives include Fedora Project (FESCo), OpenOffice.org
• MySQL Community Contributor of the Year Award winner 2014
SQL
• 1970 E.F. Codd paper “A Relational Model of Data for Large Shared Data Banks”
• ANSI standard in 1986
• Manage your data in relational systems, with transaction controls, SQL operators, and better cross-vendor compatibility
NoSQL/Big Data
• “not only SQL”
• 2004 Google paper “MapReduce: Simplified Data Processing on Large Clusters” -> Hadoop ~2008 “big data movement”
• 2006 Google paper “Bigtable: A Distributed Storage System for Structured Data” -> NoSQL DBs like MongoDB ~2009
• Built without having a standardised structured query language
• Non-relational, distributed, horizontally scalable, schema-free, easy replication, eventually consistent
Eventual Consistency (BASE) & ACID
• ACID: Atomicity, Consistency, Isolation, Durability
  • the base of many transactional engines, like InnoDB/XtraDB
• BASE: Basically Available, Soft state, Eventual consistency
  • this builds on the CAP theorem, which states that it's impossible for a distributed computer system to simultaneously provide all 3 of the following guarantees:
    • Consistency (all nodes have the same data)
    • Availability (guarantee that every request receives a response)
    • Partition tolerance (continued operation with failure of parts of the system)
    • Satisfy 2 at the same time, but not 3
NewSQL
• 2012 Google paper “Spanner: Google’s Globally-Distributed Database”
• goals: scalable, multi-version, globally distributed, synchronously replicated
• comes with a unique clock API allowing non-blocking reads in the past, lock-free read-only transactions, atomic schema changes, etc.
• “We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.”
Is this the landscape?
Why NoSQL?
• Couchbase says… (source: https://www.couchbase.com/nosql-resources/why-nosql)
• Support large numbers of concurrent users (tens of thousands, perhaps millions)
• Deliver highly responsive experiences to a globally distributed base of users
• Be always available – no downtime
• Handle semi- and unstructured data
• Rapidly adapt to changing requirements with frequent updates and new features
Why NoSQL?
• MongoDB says… (source: https://www.mongodb.com/nosql-explained)
• When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:
  • Large volumes of rapidly changing structured, semi-structured, and unstructured data
  • Agile sprints, quick schema iteration, and frequent code pushes
  • Object-oriented programming that is easy to use and flexible
  • Geographically distributed scale-out architecture instead of expensive, monolithic architecture
Why NoSQL?
• DataStax says… (source: http://www.datastax.com/resources/whitepapers/why-nosql)
• In general, NoSQL refers to progressive data management engines that go beyond legacy relational databases in satisfying the needs of today’s modern business applications. A very flexible data model, horizontal scalability, distributed architectures, and the use of languages and interfaces that are “not only” SQL typically characterize NoSQL technology.
Can’t MySQL do all that?
• Support large numbers of concurrent users
• Automatic master failover
• Geographically distributed scale-out architecture
• Handle regular code pushes and schema changes
• Be always available with no downtime
• Not only an SQL interface
HandlerSocket
• Direct access to InnoDB/XtraDB for CRUD operations
• SQL: 105,000 qps (60% usr, 28% sys)
• memcached: 420,000 qps (8% usr, 88% sys)
• HandlerSocket: 750,000 qps (45% usr, 53% sys)
Dynamic Columns in MariaDB Server 5.3+
• Allows you to create virtual columns with dynamic content for each row in a table
• Basically a blob with handling functions (COLUMN_GET, COLUMN_CREATE, COLUMN_ADD, COLUMN_DELETE, COLUMN_EXISTS, COLUMN_LIST, COLUMN_JSON)
• Store different attributes for each item (like a web store). Hard to do relationally
• In MariaDB 10: name support (instead of referring to columns by numbers, name them), convert all dynamic column content to JSON array, interface with Cassandra
• https://kb.askmonty.org/en/dynamic-columns/
• INSERT INTO tbl SET dyncol_blob=COLUMN_CREATE("column_name", "value");
memcached in MySQL 5.6+
• innodb_memcache daemon plugin contains an embedded memcached with InnoDB as a storage backend
• Data accessed using memcached’s key-value protocol
• Access different tables at the same time; or the same table using different columns as the memcached key
• PHP? PECL mysqlnd_memcache extension
• Works with NDBCLUSTER too
• Security via SASL
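The key-value access mentioned on this slide goes over the ordinary memcached text protocol. As a rough sketch only, the Python below shows how a `set` and a `get` are framed on the wire and how a single-value reply is read back. The function names are hypothetical, nothing here is specific to the InnoDB plugin, and the port mentioned afterwards (11211, the usual memcached default) is an assumption about the setup.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a memcached text-protocol 'set' request for one key."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Frame a memcached text-protocol 'get' request for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(raw: bytes):
    """Return the value from a single-key 'get' reply, or None on a miss.

    A hit looks like: VALUE <key> <flags> <bytes>\\r\\n<data>\\r\\nEND\\r\\n
    A miss is just:   END\\r\\n
    """
    if raw == b"END\r\n":
        return None
    header, rest = raw.split(b"\r\n", 1)
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError(f"unexpected reply header: {header!r}")
    size = int(parts[3])
    return rest[:size]
```

In a real setup these bytes would be written to a TCP socket connected to the server running the plugin (assumed here to listen on port 11211); an existing memcached client library would normally do this framing for you.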
node.js with NDBCLUSTER
• end-to-end JavaScript development, from browser to server
• native access to storage layer, bypassing SQL layer
• Also asynchronous interface
• MySQL Cluster 7.3 (http://www.clusterdb.com/mysql/mysql-cluster-with-node-js)
CassandraSE
• Access data in your Cassandra cluster from MariaDB
• CQL is great but now you can just work in SQL without switching (think ORMs too)
• Use cases: sensor data, log collection & analysis, form versioning, time series data, user activity tracking
• MariaDB users want a globally replicated table that’s fault tolerant?
CONNECT
• CONNECT will speak XML, JSON, BSON or even grab data over an ODBC connection
• You can CONNECT to Oracle (via ODBC), join results from Cassandra (via CassandraSE) and have all your results sit in InnoDB
• MariaDB Server 10.0 and greater
MySQL JSON import/export
• EXPLAIN in MySQL 5.6 has JSON output now
• mysqljsonimport
• mysqljsonexport
• Export/import a database using the above commands in JSON format
So… what’s wrong with MySQL?
• Out of the box high availability is missing
• Multi-master replication is missing
• Out of the box high scalability is missing
• Out of the box sharding is missing
• Synchronous geographic replication is missing for anything besides NDBCLUSTER
• NoSQL interface is missing
Sharding
• Layout 1 (diagram): the same User table on each server
  • 192.168.0.1: User (id int(10), username char(15), password char(15), email char(50))
  • 192.168.0.2: User (id int(10), username char(15), password char(15), email char(50))
  • 192.168.0.3: User (id int(10), username char(15), password char(15), email char(50))
• Layout 2 (diagram): different tables on different servers
  • 192.168.0.1: User (id int(10), username char(15), password char(15), email char(50))
  • 192.168.0.2: UserInfo (login datetime, md5 varchar(32), guid varchar(32))
• Better if INSERT heavy and there’s less frequently changed data
• How do you shard: alphabetically? n-users/shard? postal codes?
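The three options raised in the question above (alphabetically, n users per shard, postal codes) can each be written as a small routing function. The following is a minimal Python sketch, not taken from the talk: the letter boundaries, the users-per-shard figure and the postal-code prefix table are all made-up values for illustration, and only the three server addresses come from the slide.

```python
from bisect import bisect_right

# The three servers shown on the slide.
SHARDS = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]

# Hypothetical boundaries: a-i -> first shard, j-r -> second, s-z -> third.
# Anything that sorts before "j" (digits, empty string) lands on the first shard.
ALPHA_BOUNDARIES = ["j", "s"]

# Hypothetical mapping from the first letter of a postal code to a shard.
POSTAL_PREFIXES = {"E": SHARDS[0], "N": SHARDS[1], "S": SHARDS[2]}


def shard_by_alphabet(username: str) -> str:
    """Range sharding on the first character of the username."""
    return SHARDS[bisect_right(ALPHA_BOUNDARIES, username[:1].lower())]


def shard_by_user_count(user_id: int, users_per_shard: int = 1_000_000) -> str:
    """n-users/shard: ids 1..n on the first shard, n+1..2n on the second, etc."""
    index = (user_id - 1) // users_per_shard
    if not 0 <= index < len(SHARDS):
        raise ValueError(f"no shard provisioned for user id {user_id}")
    return SHARDS[index]


def shard_by_postal_code(postal_code: str) -> str:
    """Geographic sharding on the first letter of the postal code."""
    prefix = postal_code.strip().upper()[:1]
    if prefix not in POSTAL_PREFIXES:
        raise ValueError(f"no shard mapped for postal code {postal_code!r}")
    return POSTAL_PREFIXES[prefix]
```

Whichever key is chosen, the application (or a proxy in front of it) has to call the same function on every read and write, which is why changing the scheme later usually means moving data.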
Sharding frameworks
• Roll your own framework
• Tungsten Replicator
• SPIDER
• Tumblr JetPants
• MySQL Fabric/MySQL Router
• MariaDB MaxScale
• Vitess
Online schema changes
• Facebook OSC
• oak-online-alter-table
• pt-online-schema-change
• SoundCloud LHM (Large Hadron Migrator)
• GitHub gh-ost
Galera Cluster (Percona XtraDB Cluster / MariaDB Galera Cluster)
• High availability & scalability, with virtually synchronous replication (transaction committed to all nodes or none), multi-master replication support, parallel replication (apply events on slaves in parallel), automatic node provisioning, data consistency (all slaves synced)
• Sounds rather NewSQL capable, no? :-)
• Load balancing with ProxySQL / MariaDB MaxScale
• Percona XtraDB Cluster 5.7: engineering within Percona, load balancing with ProxySQL out of the box, integration with PMM, benefits of all the MySQL 5.7 feature-set (including encryption)
Group replication
• Fully synchronous replication (update everywhere), self-healing, with elasticity, redundancy
• Single primary mode supported
• MySQL InnoDB Cluster - a combination of group replication, Router, to make magic!
• Recent blogs:
  • https://www.percona.com/blog/2017/02/24/battle-for-synchronous-replication-in-mysql-galera-vs-group-replication/
  • https://www.percona.com/blog/2017/02/15/group-replication-shipped-early/
Handling Failure
• MySQL-MMM
• Severalnines ClusterControl
• Orchestrator
• MySQL MHA
• Percona Replication Manager
• Tungsten Replicator
• 5.6: mysqlfailover, mysqlrpladmin
• (MariaDB) Replication Manager
Orchestrator
• Reads replication topologies, keeps state, continuous polling
• Modify your topology: move slaves around
• Nice GUI, JSON API, CLI
MySQL MHA
• Like MMM, specialized solution for MySQL replication
• Developed by Yoshinori Matsunobu at DeNA
• Automated and manual failover options
• Topology: 1 master, many slaves
• Choose new master by comparing slave binlog positions
• Can be used in conjunction with other solutions
• http://code.google.com/p/mysql-master-ha/
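The "choose new master by comparing slave binlog positions" step can be pictured as picking the replica that has read furthest into the old master's binary log. The Python below is a simplified sketch of that idea and is not MHA's actual code: the `ReplicaStatus` class and function names are hypothetical, the two fields are modelled on the `Master_Log_File` and `Read_Master_Log_Pos` columns of `SHOW SLAVE STATUS`, and binlog file names are assumed to end in a numeric sequence such as `mysql-bin.000042`.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ReplicaStatus:
    host: str
    master_log_file: str       # e.g. "mysql-bin.000042"
    read_master_log_pos: int   # byte offset read within that file


def binlog_coordinates(status: ReplicaStatus) -> Tuple[int, int]:
    """Turn (file, position) into a tuple that sorts in replication order."""
    sequence = int(status.master_log_file.rsplit(".", 1)[1])
    return (sequence, status.read_master_log_pos)


def pick_new_master(replicas: Iterable[ReplicaStatus]) -> ReplicaStatus:
    """Return the replica that has read the most of the old master's binlog."""
    candidates = list(replicas)
    if not candidates:
        raise ValueError("no replicas available for promotion")
    # On a tie, max() keeps the first replica encountered.
    return max(candidates, key=binlog_coordinates)
```

A real failover tool also has to bring the other replicas up to the same point before repointing them, which this sketch leaves out.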
Load Balancers for multi-master clusters
• Synchronous multi-master clusters like Galera require load balancers
• HAProxy
• Galera Load Balancer (GLB)
• MaxScale
• ProxySQL
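What such a load balancer does for a multi-master cluster can be reduced to two steps: drop nodes that are not fully synced, then spread connections across the rest. The Python below is a toy round-robin sketch under that assumption and does not reflect how HAProxy, GLB, MaxScale or ProxySQL are implemented. The class name and node names are hypothetical; the `"Synced"` state string is the value Galera reports in its `wsrep_local_state_comment` status variable.

```python
from itertools import count
from typing import Dict, Iterable


class RoundRobinBalancer:
    """Rotate connections across the nodes that currently report 'Synced'."""

    def __init__(self, nodes: Iterable[str]):
        self._nodes = list(nodes)
        self._counter = count()

    def pick(self, states: Dict[str, str]) -> str:
        """Choose the next node; `states` maps node name -> reported state."""
        healthy = [n for n in self._nodes if states.get(n) == "Synced"]
        if not healthy:
            raise RuntimeError("no synced node available")
        return healthy[next(self._counter) % len(healthy)]
```

The health states would come from periodically polling each node; here they are simply passed in so the selection logic can be read on its own.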
MySQL Router
• Routing between applications and any backend MySQL servers
• Failover
• Load Balancing
• Pluggable architecture (connection routing)
MariaDB MaxScale
• “Pluggable router” that offers connection & statement based load balancing
• Possibilities are endless - use it for logging, writing to other databases (besides MySQL), preventing SQL injections via regex filtering, route via hints, query rewriting, have a binlog relay, etc.
• Load balance your Galera clusters
ProxySQL
• High Performance MySQL proxy with a GPL license
• Performance is a priority - the numbers prove it
• Can query rewrite
• Sharding by host/schema or both, with rule engine + modification to SQL + application logic
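Sharding by schema with a rule engine comes down to an ordered list of rules where the first match decides which group of backends receives the query. The Python below is a small sketch of that first-match idea, not ProxySQL's implementation or configuration syntax. The field names borrow from columns of ProxySQL's `mysql_query_rules` table (`schemaname`, `match_pattern`, `destination_hostgroup`), while the schema names, hostgroup numbers and the matching logic itself are simplified assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    destination_hostgroup: int
    schemaname: Optional[str] = None      # match on the client's current schema
    match_pattern: Optional[str] = None   # regex matched against the query text


# Hypothetical rule set: two schema-based shards, then reads to a replica group.
RULES = [
    Rule(destination_hostgroup=10, schemaname="shard_a"),
    Rule(destination_hostgroup=20, schemaname="shard_b"),
    Rule(destination_hostgroup=30,
         match_pattern=r"^\s*SELECT\b(?!.*\bFOR\s+UPDATE\b)"),
]
DEFAULT_HOSTGROUP = 0  # everything else goes to the writer group


def route(query: str, schema: str) -> int:
    """Return the hostgroup of the first rule whose conditions all match."""
    for rule in RULES:
        if rule.schemaname is not None and rule.schemaname != schema:
            continue
        if rule.match_pattern is not None and not re.search(
            rule.match_pattern, query, re.IGNORECASE | re.DOTALL
        ):
            continue
        return rule.destination_hostgroup
    return DEFAULT_HOSTGROUP
```

Because the rules are evaluated in order, the schema rules here take priority over the read/write split; reordering the list changes the routing.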
Facebook DocStore
• Dynamic Columns
• Indexes
• Compile time option, with JSON support naturally
• Never made it outside of the Facebook mysql-5.6 tree
MySQL 5.7 JSON
• Provide native support for JavaScript applications
• JSON data type
• optimised for read intensive workloads, parse and validation on INSERT only, dictionary, in-place updates
• all native JSON types (numbers, strings, bool, objects, arrays) with extensions (date, time, timestamp, datetime, other)
• schemaless, UTF8MB4
• Functions to handle JSON data
• SELECT JSON_ARRAY(1, "abc", NULL, TRUE, CURTIME());
• Indexing of JSON data
• Also spatial GeoJSON functions
No SQL… with mysqlsh!
• X Protocol, X Dev API
• CRUD and SQL operations, authentication via SASL, streaming commands via pipelines
• mysqlsh: JavaScript, Python, good ol’ SQL
• Connect using a URI - colin@server:33060/testdb
• db.city.insert("ID", "Name", "CountryCode").
• You can now use MySQL as a document store (yes, a document collection) with the mysqlx plugin
• The basis of MySQL InnoDB Cluster (with MySQL Router and group replication)
So… what’s wrong with MySQL?
• Out of the box high availability is missing -> MHA, MySQL Router, mysqlfailover
• Multi-master replication is missing -> Galera Cluster, Group Replication
• Out of the box high scalability is missing -> ProxySQL, MySQL Router
• Out of the box sharding is missing -> SPIDER, JetPants, Vitess
• Synchronous geographic replication is missing for anything besides NDBCLUSTER -> Galera Cluster, Group Replication
• NoSQL interface is missing -> mysqlsh, Document Store
© 2017 Percona
Thank you!
Colin Charles
[email protected] / [email protected]
http://bytebot.net/blog | @bytebot on Twitter
slides: slideshare.net/bytebot