Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
MongoDB & Hadoop: Providing Business Insights
Thomas Boyd, Senior Solutions Architect, MongoDB
What is MongoDB?
The leading NoSQL database
Document Database
Open-Source
General Purpose
MongoDB Document Model
RDBMS vs. MongoDB
MongoDB:
{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name : "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up : "Neray, Graham",
  pay_band : "C",
  benefits : [
    { type : "Health",
      plan : "PPO Plus" },
    { type : "Dental",
      plan : "Standard" }
  ]
}
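The following minimal Python sketch makes the RDBMS vs. document contrast concrete. Python is used for all examples here. The sketch splits the employee document into the rows a relational design would typically need: one `employees` row, plus one `benefits` row per plan, linked by a foreign key. The table and column names, including `employee_id`, are hypothetical, and the ObjectId is treated as a plain string.

```python
# Sketch: split the embedded document into the rows a relational schema
# would typically use. Table and column names are hypothetical.

employee_doc = {
    "_id": "4c4ba5e5e8aabf3",  # ObjectId shown as a plain string
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}


def to_relational(doc):
    """Return (employee_row, benefit_rows) for one employee document."""
    # Parent table: every top-level field except the embedded array.
    employee_row = {k: v for k, v in doc.items() if k != "benefits"}
    # Child table: one row per array element, with a foreign key back
    # to the parent row.
    benefit_rows = [
        {"employee_id": doc["_id"], "type": b["type"], "plan": b["plan"]}
        for b in doc["benefits"]
    ]
    return employee_row, benefit_rows


employee_row, benefit_rows = to_relational(employee_doc)
print(employee_row["employee_name"])  # Dunham, Justin
print(len(benefit_rows))              # 2
```

In the relational form, reading this employee back requires joining the two tables on `employee_id`. In the document model, the embedded `benefits` array comes back with a single read.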
What is Hadoop?
“The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.”*
*source: hadoop.apache.org
• Large datasets
• Analytics
• Batch
• Map-Reduce
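Map-Reduce is the programming model behind Hadoop's batch analytics. A map step emits key/value pairs, the framework groups them by key (the shuffle), and a reduce step combines each group. This single-process Python sketch imitates those three phases to count records per department. It is not Hadoop's Java API, and the function names and sample records are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    # Emit (key, value) pairs; here, one count per department.
    yield record["department"], 1


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine every value emitted for one key.
    return key, sum(values)


def map_reduce(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


records = [
    {"department": "Marketing"},
    {"department": "Sales"},
    {"department": "Marketing"},
]
print(map_reduce(records))  # {'Marketing': 2, 'Sales': 1}
```

Hadoop runs the same three phases, but it spreads the map and reduce calls across many machines and performs the shuffle over the network.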
Enterprise IT Stack
• Applications: CRM, ERP, Collaboration, Mobile, BI
• Data Management
  – Online Data: RDBMS
  – Offline Data: EDW, Hadoop
• Infrastructure: OS & Virtualization, Compute, Storage, Network
• Spanning all layers: Management & Monitoring; Security & Auditing
Consideration: Online vs. Offline
Online:
• Real-time
• Low-latency
• High availability
Offline:
• Long-running
• High-latency
• Availability is lower priority
Hadoop is good for…
• Risk Modeling
• Churn Analysis
• Recommendation Engine
• Ad Targeting
• Transaction Analysis
• Trade Surveillance
• Network Failure Prediction
• Search Quality
• Data Lake
MongoDB is good for…
• 360 Degree View of the Customer
• Mobile & Social Apps
• Fraud Detection
• User Data Management
• Content Management & Delivery
• Reference Data
• Product Catalogs
• Machine to Machine Apps
• Data Hub
MongoDB and Hadoop: Complementary
Hadoop:
• “Data Lake”
• In-depth analytics
MongoDB:
• Real-time systems
• Light-weight analytical workloads
Use MongoDB + Hadoop Together
E-Commerce (MongoDB):
• Products & Inventory
• Real-time recommendations
• Customer profile
• Session management
• Customer clickstream
• Fraud detection
Analysis (Hadoop):
• Transaction history
• Clickstream history
• Recommendation model
• Fraud modeling
Connected via: MongoDB Connector for Hadoop
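The e-commerce split can be sketched as two steps. An offline job (the Hadoop side) turns transaction history into a recommendation model, and the online application (the MongoDB side) answers requests with a simple lookup. In this minimal sketch, a co-purchase count stands in for the recommendation model, and a Python dict stands in for the MongoDB collection the connector would write to. The function names and sample orders are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_recommendations(orders, top_n=2):
    """Offline step: count how often products are bought together."""
    co_counts = defaultdict(Counter)
    for items in orders:
        for a, b in combinations(sorted(set(items)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    # One document per product, shaped for a fast key lookup online.
    return {
        product: {"_id": product,
                  "recommended": [p for p, _ in counts.most_common(top_n)]}
        for product, counts in co_counts.items()
    }


def recommend(cache, product_id):
    """Online step: a single key lookup, no analysis at request time."""
    doc = cache.get(product_id)
    return doc["recommended"] if doc else []


orders = [
    ["tent", "stove"],
    ["tent", "stove", "lamp"],
    ["lamp", "fuel"],
    ["stove", "fuel"],
]
cache = build_recommendations(orders)
print(recommend(cache, "tent"))  # ['stove', 'lamp']
```

All the expensive work happens in the batch step, so the real-time path stays a cheap key lookup.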
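Example – Fraud Detection
• Payments (MongoDB): online payments processing
• Nightly Analysis (Hadoop): fraud modeling, with payment data moved through the MongoDB Connector for Hadoop
• 3rd Party Data Sources: additional input to the nightly analysis
• Results Cache (MongoDB): holds the output of the nightly analysis
• Fraud Detection: query-only access to the Payments data and the Results Cache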
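The fraud detection flow separates a nightly modeling job from query-only checks at payment time. The sketch below assumes a deliberately simple, hypothetical scoring rule: the share of a card's payments made at merchants flagged by a third-party source. It uses a dict as the results cache. The names `nightly_fraud_model` and `check_payment`, the threshold, and the sample data are all illustrative.

```python
def nightly_fraud_model(payments, flagged_merchants):
    """Offline step: per-card share of payments at flagged merchants.

    The scoring rule is a hypothetical placeholder for a real fraud model.
    """
    totals, hits = {}, {}
    for p in payments:
        card = p["card"]
        totals[card] = totals.get(card, 0) + 1
        if p["merchant"] in flagged_merchants:
            hits[card] = hits.get(card, 0) + 1
    return {card: hits.get(card, 0) / n for card, n in totals.items()}


def check_payment(results_cache, payment, threshold=0.5):
    """Online step: a query-only lookup of the precomputed score."""
    score = results_cache.get(payment["card"], 0.0)
    return "review" if score >= threshold else "approve"


payments = [
    {"card": "c1", "merchant": "m1"},
    {"card": "c1", "merchant": "m9"},
    {"card": "c2", "merchant": "m1"},
]
third_party_flags = {"m9"}  # stands in for the 3rd party data sources
results_cache = nightly_fraud_model(payments, third_party_flags)
print(check_payment(results_cache, {"card": "c1", "merchant": "m2"}))  # review
```

The online check never recomputes the model. A card the nightly job has not seen gets a default score of 0.0 in this sketch.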
Customer example – Global Travel Firm
Travel (MongoDB):
• Flights, hotels and cars
• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)
Algorithms (Hadoop):
• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine
Connected via: MongoDB Connector for Hadoop
Customer example – MetLife
Insurance (MongoDB):
• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection
Churn Analysis (Hadoop):
• Customer action analysis
• Churn prediction algorithms
Connected via: MongoDB Connector for Hadoop
Customer example – Criteo
Ad-Serving (MongoDB):
• Catalogs and products
• User profiles
• Clicks
• Views
• Transactions
Algorithms (Hadoop):
• User segmentation
• Recommendation engine
• Prediction engine
Connected via: MongoDB Connector for Hadoop
What is the MongoDB-Hadoop Connector?
• Java Map-Reduce, Hadoop Streaming, Pig, & Hive access to MongoDB
  – MongoDB as input
    • mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat
    • mongo.input.uri=mongodb://my-db:27017/db1.collection1
  – MongoDB as output
    • mongo.job.output.format=com.mongodb.hadoop.MongoOutputFormat
    • mongo.output.uri=mongodb://my-db:27017/db1.collection2
  – Using MongoDB backup files
    • mongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat
    • mapred.output.dir=file:///results.bson
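The connector settings are ordinary key/value job properties. This Python sketch assembles the settings for a job that reads one collection and writes another. It then renders them as `-D key=value` pairs, the generic option form Hadoop accepts for jobs launched through `ToolRunner`. The class names assume the connector's `com.mongodb.hadoop` package. The host and collection names are the slide's placeholders, and the helper functions are hypothetical.

```python
# Class names assume the connector's com.mongodb.hadoop package.
INPUT_FORMAT = "com.mongodb.hadoop.MongoInputFormat"
OUTPUT_FORMAT = "com.mongodb.hadoop.MongoOutputFormat"


def connector_properties(input_uri, output_uri):
    """Settings for a job that reads one collection and writes another."""
    for uri in (input_uri, output_uri):
        if not uri.startswith("mongodb://"):
            raise ValueError(f"not a MongoDB connection string: {uri}")
    return {
        "mongo.job.input.format": INPUT_FORMAT,
        "mongo.input.uri": input_uri,
        "mongo.job.output.format": OUTPUT_FORMAT,
        # The output collection goes in mongo.output.uri, not mongo.input.uri.
        "mongo.output.uri": output_uri,
    }


def as_generic_options(props):
    """Render properties as Hadoop generic '-D key=value' arguments."""
    args = []
    for key, value in props.items():
        args += ["-D", f"{key}={value}"]
    return args


props = connector_properties(
    "mongodb://my-db:27017/db1.collection1",
    "mongodb://my-db:27017/db1.collection2",
)
print(" ".join(as_generic_options(props)))
```

Keeping input and output URIs under separate keys is what lets one job read from one collection and write its results to another.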
Enhancing MongoDB-Hadoop Connector
• Version 1.1.0, July 2013
  – Pig support
  – Hive support
  – Streaming support
  – Read/Write MongoDB backups
  – Update writes
  – Much more…
• Version 1.2.0, December 2013
  – Apache Hadoop 2.2 support
  – Multiple collections as Map-Reduce source
  – Multiple mongos support
  – Custom splitting support
  – Performance improvements
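Splitting determines how the input is divided into Hadoop input splits so that map tasks can run in parallel. This sketch shows plain range-based splitting on a sorted key. It is a generic technique assumed here for illustration, not the connector's own splitter, and `range_splits` is a hypothetical helper.

```python
def range_splits(sorted_ids, max_docs_per_split):
    """Cut sorted keys into contiguous [lower, upper) ranges of bounded size."""
    if max_docs_per_split < 1:
        raise ValueError("max_docs_per_split must be positive")
    splits = []
    for start in range(0, len(sorted_ids), max_docs_per_split):
        lower = sorted_ids[start]
        nxt = start + max_docs_per_split
        # Upper bound is exclusive; the last split is open-ended (None).
        upper = sorted_ids[nxt] if nxt < len(sorted_ids) else None
        splits.append((lower, upper))
    return splits


print(range_splits([1, 2, 3, 4, 5, 6, 7], 3))  # [(1, 4), (4, 7), (7, None)]
```

Each range could then be handed to one map task as a query bounded on that key.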
MongoDB Native Analytics
• Rich query language
• Native secondary indexes
• Geospatial indexes & search
• Text indexes & search
• Aggregation framework
• JavaScript Map-Reduce
• Client-side analytics
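The aggregation framework processes documents through a pipeline of stages. The pipeline below is written in MongoDB's stage syntax (`$match`, `$group`, `$sum`) and counts pay band C employees per department, using field names from the employee document on the Document Model slide. The sample documents are made up. `run_pipeline` is a hypothetical pure-Python toy that mimics only these two stages to show what they do. In practice the server runs the pipeline, for example through `db.collection.aggregate(pipeline)` in the mongo shell.

```python
pipeline = [
    {"$match": {"pay_band": "C"}},
    {"$group": {"_id": "$department", "headcount": {"$sum": 1}}},
]


def run_pipeline(docs, pipeline):
    """Toy evaluator: equality-only $match and $group with $sum."""
    for stage in pipeline:
        op, spec = next(iter(stage.items()))
        if op == "$match":
            docs = [d for d in docs
                    if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")  # assumes "_id": "$field"
            groups = {}
            for d in docs:
                key = d.get(key_field)
                out = groups.setdefault(key, {"_id": key})
                for name, acc in spec.items():
                    if name == "_id":
                        continue
                    arg = acc["$sum"]  # only $sum is handled
                    # "$field" sums a field; a number adds a constant.
                    inc = d.get(arg[1:], 0) if isinstance(arg, str) else arg
                    out[name] = out.get(name, 0) + inc
            docs = list(groups.values())
        else:
            raise ValueError(f"unsupported stage: {op}")
    return docs


docs = [
    {"department": "Marketing", "pay_band": "C"},
    {"department": "Marketing", "pay_band": "C"},
    {"department": "Sales", "pay_band": "C"},
    {"department": "Sales", "pay_band": "B"},
]
print(run_pipeline(docs, pipeline))
# [{'_id': 'Marketing', 'headcount': 2}, {'_id': 'Sales', 'headcount': 1}]
```

Filtering with `$match` before `$group` keeps the grouping stage from processing documents that are discarded anyway.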
Resources
• White paper: Big Data: Examples and Guidelines for the Enterprise Decision Maker
  http://www.mongodb.com/lp/whitepaper/big-data-nosql
• Recorded Webinar Series: Thrive with Big Data
  http://www.mongodb.com/lp/big-data-series
• Recorded Webinar: What’s New with MongoDB Hadoop Integration
  http://www.mongodb.com/presentations/webinar-whats-new-mongodb-hadoop-integration
• Documentation: MongoDB Connector for Hadoop
  http://docs.mongodb.org/ecosystem/tools/hadoop/
• Trouble Tickets (project = Hadoop Integration)
  http://jira.mongodb.org
• Subscriptions, support, consulting, training
  https://www.mongodb.com/products/how-to-buy