MongoDB & Hadoop, Sittin' in a Tree
K Young - CEO, Mortar
MongoDB + Hadoop: sittin’ in a tree
OF THIS SESSION
Overview
• Super-fast intro to Hadoop, Pig
• Why MongoDB + Pig?
• Demo: move data MongoDB <=> Pig
• Demo: processing data with Pig
SUPER-FAST INTRO
Hadoop
• From Google research
• Built for massive parallelization
• Batch (for now)
• Widely applicable
SUPER-FAST INTRO
Hadoop
Social Graph
Predict
Detect
Genetics
ON HADOOP
Pig
• Less code
• Expressive code
• Compiles to MapReduce (MR)
• Insulates from API
• Popular (LinkedIn, Twitter, Salesforce, Yahoo, Stanford University...)
BRIEF, EXPRESSIVE
LIKE PROCEDURAL SQL
Pig
(thanks: Twitter Hadoop World presentation)
FOR SERIOUS
The Same Script, In MapReduce
MONGODB NATIVE MAPREDUCE
Alternatives to Hadoop
Write MapReduce in JavaScript
• JavaScript is not fast
• Has limited data types
• Hard to use complex analytic libs
• Adds load to data store
MONGODB AGGREGATION FRAMEWORK
Alternatives to Hadoop
Great when
• You are doing SQL-style aggregation
• You do not require external data libs
• Extra load on the data store is OK
MOTIVATIONS
MongoDB + Pig
Data storage and data processing are often separate concerns
Hadoop is built for scalable processing of large datasets
SIMILAR PHILOSOPHY
MongoDB, Pig
Poly-structured data
• MongoDB: stores data, regardless of structure
• Pig: reads data, regardless of structure (got its name because pigs are omnivorous)
FAST INTRO
Mortar
Open-source code-based dev framework for data, built on Hadoop and Pig
Inspired by Rails
Self-contained, organized, executable projects
> gem install mortar
> git clone https://github.com/mortardata/mongo-pig-examples.git
MONGO => PIG
LOAD
Mongo-Hadoop connector
LOAD 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>' USING com.mongodb.hadoop.pig.MongoLoader();
PIG => MONGO
STORE
STORE result INTO 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>'
    USING com.mongodb.hadoop.pig.MongoStorage(
        'update [key1, key2, key3]',
        '{key1: 1, key2: 1, key3: 1}, {unique:false, dropDups: false}');
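The LOAD and STORE statements above embed credentials directly in the connection string. Characters such as '@' or ':' inside a username or password would break that string unless they are percent-encoded. The following is a minimal Python sketch of building such a string; the helper name build_mongo_uri and all sample values are hypothetical, and the '<database>.<collection>' path simply mirrors the format shown on the slides.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host, port, database, collection):
    """Build a connection string in the format shown on the slides.

    Hypothetical helper for illustration only. The username and password
    are percent-encoded so that reserved characters such as '@' and ':'
    cannot be mistaken for URI delimiters.
    """
    user = quote_plus(username)
    secret = quote_plus(password)
    return f"mongodb://{user}:{secret}@{host}:{port}/{database}.{collection}"


# Sample values are made up for this sketch.
uri = build_mongo_uri("app", "p@ss:word", "localhost", 27017, "mydb", "events")
print(uri)
```

The resulting string can then be pasted into the quoted location of a LOAD or STORE statement.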
GENERATE IT
What’s my schema?
Pig is schema-optional.
• No schema: document#'user'#'name'
• With schema: user.name
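As a rough analogy for the two access styles, here is a small Python sketch (not Pig itself). Without a schema, the record is one nested map and every access is a key lookup; with a schema, the fields are named and typed up front. The sample document and the User and Event classes are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

# Hypothetical sample document, as it might come out of a MongoDB collection.
raw = {"user": {"name": "Ada", "age": 36}, "action": "login"}

# No schema: the whole record is a map, so each step is a key lookup,
# comparable to document#'user'#'name' in Pig.
name_without_schema = raw["user"]["name"]


# With a schema: field names and types are declared once...
@dataclass
class User:
    name: str
    age: int


@dataclass
class Event:
    user: User
    action: str


def apply_schema(doc: dict) -> Event:
    """Turn the raw map into a record with named fields."""
    return Event(user=User(**doc["user"]), action=doc["action"])


# ...and access becomes plain field access, comparable to user.name in Pig.
event = apply_schema(raw)
name_with_schema = event.user.name

print(name_without_schema, name_with_schema)
```

Both styles reach the same value; the schema only changes how the fields are addressed and what can be checked ahead of time.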
CHARACTERIZE IT
What’s in the collection?
Hadoop-based utility describes your collection
• Field name
• Unique value count
• Example value
• Data type
• Example value count
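To make the report fields above concrete, here is a minimal in-memory Python sketch of the same idea. It is not the Hadoop-based utility from the talk: the function names, the dotted naming of nested fields, and the choice of the most frequent value as the "example value" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def flatten(doc, prefix=""):
    """Yield (dotted_field_name, value) pairs, descending into nested dicts."""
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, prefix=f"{name}.")
        else:
            yield name, value


def characterize(docs):
    """Summarize each field seen in a list of documents."""
    value_counts = defaultdict(Counter)  # field -> Counter of repr(value)
    type_names = defaultdict(set)        # field -> set of Python type names
    for doc in docs:
        for field, value in flatten(doc):
            # repr() keeps unhashable values (lists, etc.) countable.
            value_counts[field][repr(value)] += 1
            type_names[field].add(type(value).__name__)

    report = {}
    for field, counter in value_counts.items():
        # Assumption: use the most frequent value as the example.
        example, example_count = counter.most_common(1)[0]
        report[field] = {
            "unique_values": len(counter),
            "example_value": example,
            "example_value_count": example_count,
            "data_types": sorted(type_names[field]),
        }
    return report


# Hypothetical sample documents.
docs = [
    {"user": {"name": "Ada"}, "action": "login"},
    {"user": {"name": "Ada"}, "action": "logout"},
    {"user": {"name": "Bob"}, "action": "login", "ms": 12},
]
report = characterize(docs)
for field, summary in sorted(report.items()):
    print(field, summary)
```

A field that appears with more than one type (for example both int and str) shows up with several entries in data_types, which is often the first thing worth checking in a poly-structured collection.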
LINKS
Appendix
Reference:
http://help.mortardata.com/reference/loading_and_storing_data/MongoDB
Mongo-Hadoop connector
https://github.com/mortardata/mongo-hadoop
@kky @mortardata
help.mortardata.com