Moving Data from MySQL to your Big Data Platform

Transcript of Moving Data from MySQL to your Big Data Platform

Page 1: Moving Data from MySQL to your Big Data Platform

Moving Data from MySQL to your Big Data Platform

Dave Stokes

MySQL Community Manager

Oracle Corporation

[email protected]

@Stoker

Slideshare.net/davestokes

Page 2: Moving Data from MySQL to your Big Data Platform

MySQL – Ubiquity

● Everyone has MySQL someplace

● Simple, no- or low-cost relational database management system

– Low opportunity cost

– Low operating cost

– Works with everything

● Twenty years old

Page 4: Moving Data from MySQL to your Big Data Platform

What is Big Data?

● Ask ten folks and you will get ten different answers

● Ask ten DBAs and you will get at least 11 different answers

And most of those answers will include the word 'Hadoop'

Page 5: Moving Data from MySQL to your Big Data Platform

BIG as in

● Non Relational data

– Graph (relationships, not relational)

– Document

● Speed of data

– Drinking from the 'fire hose'

● Size of Data

– Working set will not fit in memory

● Stream of bits (not really data .. yet)

– Have to store it before you can read it

Page 6: Moving Data from MySQL to your Big Data Platform

Options Before Moving off MySQL

● MySQL can easily handle some NoSQL

– DeNA's HandlerSocket

– MySQL NoSQL access to InnoDB or NDB storage engines

– JSON

– Columnar Storage Engines

Page 7: Moving Data from MySQL to your Big Data Platform

MySQL can do key/value pair data

● HandlerSocket

● MySQL NoSQL plugin

– Access InnoDB/NDB storage engine data as key/value pair while allowing simultaneous SQL access on same data

– Uses memcached protocol

● Easy to retrofit to older databases

● Easy install

– One plugin, one SQL script

Page 8: Moving Data from MySQL to your Big Data Platform

MySQL can do key/value pair data

● HandlerSocket

● MySQL NoSQL plugin

– Access InnoDB/NDB storage engine data as key/value pair while allowing simultaneous SQL access on same data

– Uses memcached protocol

● Easy to retrofit to older databases

● 9X faster than SQL

– No SQL parsing or optimizing

– Drinking from the fire hose

Page 9: Moving Data from MySQL to your Big Data Platform

Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

Page 10: Moving Data from MySQL to your Big Data Platform

Using telnet to send memcached commands and receive results through the ASCII protocol:

● telnet 127.0.0.1 11211

● set a11 10 0 9

● 123456789

● STORED

● get a11

● VALUE a11 0 9

● 123456789

● END

● quit

Here we are storing '123456789', a string 9 characters long, at key a11. In the memcached set command, 10 is the flags value and 0 is the expiration time (0 means the item never expires).
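The same exchange can be sketched in Python. This is a minimal illustration of the memcached text protocol only; the function names build_set and parse_get are hypothetical, and actually sending the bytes to port 11211 over a socket is left out.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a memcached text-protocol 'set' request."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def parse_get(response: bytes) -> dict:
    """Decode the reply to a 'get' into {key: (flags, value)}."""
    items = {}
    pos = 0
    while True:
        end = response.index(b"\r\n", pos)
        line = response[pos:end]
        pos = end + 2
        if line == b"END":
            return items
        # Header line looks like: VALUE <key> <flags> <bytes>
        _, key, flags, size = line.split()
        size = int(size)
        items[key.decode("ascii")] = (int(flags), response[pos:pos + size])
        pos += size + 2  # skip the data block and its trailing CRLF


# Request and reply taken from the telnet session on this slide.
request = build_set("a11", b"123456789", flags=10)
reply = parse_get(b"VALUE a11 0 9\r\n123456789\r\nEND\r\n")
```

Because the server only has to split a short header line, there is no SQL text to parse or optimize on this path.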

Page 12: Moving Data from MySQL to your Big Data Platform

JSON

● HTTP Plugin for lightweight clients

– Key/Value

– SQL queries returned in JSON

– CRUD for JSON mapped tables

● MySQL 5.7.7 Release Candidate

– JSON data type

– Server side functions, comparator

– Functional Indexes

Page 13: Moving Data from MySQL to your Big Data Platform

Columnar storage engines

● Compresses the heck out of your data

– Fast analytics

– Sweet spot is very large data sets

– Retains 'MySQL-ness'

● Calpont InfiniDB

– Calpont out of business

– MariaDB supporting InfiniDB

● Infobright

– Very quick

Page 14: Moving Data from MySQL to your Big Data Platform

But you need something different

● You need to get data from MySQL to another technology

● Questions:

– Volume – how much stuff

– Frequency – how often, continuous

– What does the consuming side require?

● Encoding, EOL, EOF, escaping, character sets, etc.

– Freshness – is timing critical? Does stale data affect the analysis?

– Staff abilities

– Process control and oversight

– Intangibles

Page 15: Moving Data from MySQL to your Big Data Platform

So how do we get data from MySQL to X???

Page 16: Moving Data from MySQL to your Big Data Platform

Flat File – Lowest Common Denominator

● Use SELECT ... INTO OUTFILE to serialize data into a flat file for later import

– Quick and easy; quality depends on your query and available disk space

● EXAMPLE

SELECT a,b,a+b INTO OUTFILE '/tmp/result.txt' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' FROM test_table;
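INTO OUTFILE writes the file on the MySQL server host. When the file has to be produced on a client instead, a script can emit a similar format. The sketch below is one way to do that with Python's csv module; the function name rows_to_flat_file and the sample rows are hypothetical, and NULL handling is not covered.

```python
import csv
import io


def rows_to_flat_file(rows, out):
    """Write rows as comma-separated text, quoting only non-numeric fields."""
    writer = csv.writer(
        out,
        delimiter=",",                 # FIELDS TERMINATED BY ','
        quotechar='"',                 # OPTIONALLY ENCLOSED BY '"'
        quoting=csv.QUOTE_NONNUMERIC,  # numbers stay bare, strings are quoted
        lineterminator="\n",           # LINES TERMINATED BY '\n'
    )
    for row in rows:
        writer.writerow(row)


# Hypothetical rows standing in for the result of SELECT a, b, a+b.
sample = [(1, 2, 3), (10, 20, 30)]
buffer = io.StringIO()
rows_to_flat_file(sample, buffer)
flat_text = buffer.getvalue()
```

In practice the rows would come from a database cursor and out would be an open file rather than an in-memory buffer.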

Page 17: Moving Data from MySQL to your Big Data Platform

Hadoop fs -put file /filesystem

● Use scp to copy data to data node

● /hadoop dfs -put ~/foo.txt /user/hadoop-user/input/foo.txt

– Puts data into HDFS

● How many data nodes do you have?

– Easy with 1 or a few, not with dozens or more

Page 18: Moving Data from MySQL to your Big Data Platform

Sqoop

● Sqoop is a tool that can connect to MySQL to transfer data to HDFS.

– Can be set up as a cron job for automation

● $ sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root --password secret --table salesdata --where "order_date > '2015-03-01'"

– One way

– https://blog.safaribooksonline.com/2013/05/02/importing-data-from-relational-databases-into-hadoop/
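For a cron-driven import, a small wrapper can assemble the command line before handing it to the scheduler. The sketch below only builds the argument list and does not run anything; the function name sqoop_import_args, the table name, and both HDFS paths are hypothetical, and --password-file is used here in place of a password typed on the command line.

```python
import shlex


def sqoop_import_args(jdbc_url, user, password_file, table, where, target_dir):
    """Return the argv list for a one-table Sqoop import."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,  # keeps the password off the command line
        "--table", table,
        "--where", where,                  # row filter applied on the MySQL side
        "--target-dir", target_dir,        # HDFS directory that receives the files
    ]


args = sqoop_import_args(
    "jdbc:mysql://localhost:3306/sqoop",
    "root",
    "/user/hadoop-user/.sqoop-password",   # hypothetical path
    "salesdata",
    "order_date > '2015-03-01'",
    "/user/hadoop-user/salesdata",         # hypothetical path
)
command_line = shlex.join(args)  # quoted string suitable for a crontab entry
```

Passing args to subprocess.run would execute the import on a host where Sqoop is installed.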

Page 19: Moving Data from MySQL to your Big Data Platform

Flume

● Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.

● The use of Apache Flume is not only restricted to log data aggregation. Since data sources are customizable, Flume can be used to transport massive quantities of event data including but not limited to network traffic data, social-media-generated data, email messages and pretty much any data source possible.

Page 20: Moving Data from MySQL to your Big Data Platform

ETL Tools – Extract, Transform & Load

● Pentaho

– Pentaho Data Integration

– General purpose

● BIRT by Actuate

– General purpose

● Apache Pig

– A = load 'passwd' using PigStorage(':');

– B = foreach A generate $0 as id;

– store B into 'id.out';
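For readers new to Pig, the three statements load colon-delimited records, keep the first field of each, and store the result. A rough Python equivalent of that transform is sketched below; the function name extract_ids and the sample records are hypothetical.

```python
def extract_ids(lines, delimiter=":"):
    """Return the first delimited field of each non-empty line."""
    return [line.split(delimiter, 1)[0] for line in lines if line.strip()]


# Hypothetical passwd-style records, in place of loading the 'passwd' file.
sample = [
    "root:x:0:0:root:/root:/bin/bash",
    "mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/false",
]
ids = extract_ids(sample)
```

Pig runs the same logic as distributed jobs over files in HDFS, which is the reason to use it once the input no longer fits on one machine.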

Page 21: Moving Data from MySQL to your Big Data Platform

MySQL Hadoop Applier

● Similar to MySQL replication

● One way

● Hint: Use date as part of key to purge data

– HDFS is append only, new rows append

– Use an after image time stamp as part of the primary key
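One way to read the hint: if each key starts with a sortable timestamp, old rows can be found by comparing key prefixes. The sketch below assumes an ISO-style timestamp followed by the row id; the key layout and the names make_key and keys_older_than are hypothetical, not something the Applier defines.

```python
from datetime import date, datetime


def make_key(event_time, row_id):
    """Build a key whose leading part sorts chronologically."""
    return f"{event_time:%Y-%m-%dT%H:%M:%S}|{row_id}"


def keys_older_than(keys, cutoff):
    """Select keys whose timestamp falls before the cutoff date."""
    cutoff_prefix = cutoff.isoformat()  # e.g. '2015-03-01'
    return [k for k in keys if k < cutoff_prefix]


keys = [
    make_key(datetime(2015, 2, 28, 23, 59, 0), 7),
    make_key(datetime(2015, 3, 1, 0, 0, 0), 8),
]
purge = keys_older_than(keys, date(2015, 3, 1))
```

Because ISO timestamps compare correctly as plain strings, the purge step needs no date parsing on the stored keys.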

Page 23: Moving Data from MySQL to your Big Data Platform

The Hadoop Applier

● Uses an API provided by libhdfs, a C library to manipulate files in HDFS. The library comes precompiled with Hadoop distributions.

● It connects to the MySQL master to read the binary log and then:

● Fetches the row insert events occurring on the master

● Decodes these events, extracts data inserted into each field of the row, and uses content handlers to get it in the format required

● Appends it to a text file in HDFS.

● Databases are mapped as separate directories, with their tables mapped as sub-directories, under a Hive data warehouse directory. Data inserted into each table is written into text files in Hive / HDFS. Data can be in comma-separated format or any other format, configurable through command-line arguments.
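The mapping described above can be sketched as two small functions: one that derives a table's directory and one that renders an inserted row as a delimited line. This is only an illustration of the idea, not the Applier's code; the warehouse root, the database and table names, and both function names are assumptions.

```python
import posixpath


def hdfs_table_dir(warehouse_dir, database, table):
    """Directory that receives one table's text files."""
    return posixpath.join(warehouse_dir, database, table)


def row_to_line(values, delimiter=","):
    """Render one inserted row as a delimited text line."""
    return delimiter.join(str(v) for v in values) + "\n"


# Assumed warehouse root; the real location depends on the Hive configuration.
target = hdfs_table_dir("/user/hive/warehouse", "shop", "orders")
line = row_to_line((42, "2015-03-01 10:15:00", "widget"))
```

Each decoded insert event would produce one such line, appended to a text file under the table's directory.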

Page 25: Moving Data from MySQL to your Big Data Platform

Other tools

● Continuent Tungsten

– Continuous feed to HDFS

● Oracle GoldenGate

– Continuous feed to HDFS

● Amazon Elastic MapReduce

● Many, Many more

Page 26: Moving Data from MySQL to your Big Data Platform

Other Apache projects

● Ambari

● Avro

● Cassandra

● Chukwa

● Hive

● Mahout

● Pig

● Spark

● Tez

● Zookeeper

Page 27: Moving Data from MySQL to your Big Data Platform

Just Getting Started with Hadoop?

● Apache Bigtop Project

– Proven components that work together, reducing aggravation

– Very well documented

– Easy to set up and run

– Best way to 'get your feet wet'

– http://bigtop.apache.org

Page 28: Moving Data from MySQL to your Big Data Platform

Tutorials

● Both Cloudera and Hortonworks have great tutorials

– Do BOTH tutorials!

Page 29: Moving Data from MySQL to your Big Data Platform

ODBC/JDBC

● Just about every programming language and ETL (extract, transform, load) program has an ODBC/JDBC connector

– Open Database Connectivity & Java Database Connectivity

● Proven middleware API that provides a layer between databases and applications

● May have to 'roll your own' program to serialize data and format it for your Big Data platform's native feeding format

– Very rare these days!

– If you have to do this, you may be on the bleeding edge
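If a hand-written exporter does turn out to be necessary, it is usually a short loop over a database cursor. The sketch below streams a query result as one JSON object per line. It uses Python's built-in sqlite3 module purely as a stand-in so it runs without a server; with MySQL, a DB-API connector would supply the connection instead. The function name export_json_lines and the sample table are hypothetical.

```python
import io
import json
import sqlite3


def export_json_lines(conn, query, out):
    """Stream a query result to `out`, one JSON object per row."""
    cur = conn.cursor()
    cur.execute(query)
    columns = [d[0] for d in cur.description]  # column names from the cursor
    count = 0
    for row in cur:
        out.write(json.dumps(dict(zip(columns, row))) + "\n")
        count += 1
    return count


# sqlite3 is only a stand-in so the sketch runs without a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO test_table VALUES (?, ?)", [(1, 2), (3, 4)])

buffer = io.StringIO()
written = export_json_lines(
    conn, "SELECT a, b, a + b AS total FROM test_table ORDER BY a", buffer
)
```

Iterating over the cursor rather than fetching everything at once keeps memory use flat, which matters when the working set is large.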

Page 31: Moving Data from MySQL to your Big Data Platform

Summary Q&A

● It is easy to move data from MySQL to NoSQL

– Ubiquity of MySQL, starting point

– Many tools can access MySQL as a source

● Consider data flow issues

– Once, regular, consistent

– Do you need to go back to MySQL?

– Collations, character sets may be important

Page 32: Moving Data from MySQL to your Big Data Platform

Contact Info

[email protected]

● @Stoker

● Slides at Slideshare.net/davestokes