Post on 15-Jan-2015
Distributed, fault-tolerant, transactional
Real-Time Integration: MongoDB and SQL Databases
Eugene Dvorkin, Architect, WebMD
WebMD: A lot of data; a lot of traffic
~900 million page views a month
~100 million unique visitors a month
How We Use MongoDB
User Activity
Why Move Data to RDBMS?
Preserve existing investment in BI and data warehouse
To use an analytical database such as Vertica
To use SQL
Why Move Data In Real-time?
Batch processing is slow
No ad-hoc queries
No real-time reports
Challenge in moving data
Transform documents to a relational structure
Insert into the RDBMS at a high rate
Scale easily as data volume and velocity increase
Our Solution to move data in Real-time: Storm
Storm – an open-source, distributed real-time computation system
Developed by Nathan Marz; later acquired by Twitter
Hadoop vs. Storm
Why Storm?
JVM-based framework
Guaranteed data processing
Supports development in multiple languages
Scalable and transactional
Overview of Storm cluster
Master node (Nimbus)
Cluster coordination (ZooKeeper)
Worker nodes run worker processes
Storm Abstractions
Tuples, Streams, Spouts, Bolts and Topologies
Tuples
An ordered list of elements
Example: ("ns:events", "email:edvorkin@gmail.com")
Stream
Unbounded sequence of tuples
Example: Stream of messages from message queue
Spout
Source of streams
Reads from a stream of data – queues, web logs, API calls, the MongoDB oplog
Emits documents as tuples
Bolts
Process tuples and create new streams
Bolts
Apply functions / transforms
Calculate and aggregate data (word count!)
Access DBs, APIs, etc.
Filter data
Map/Reduce
Topology
Storm transforms and moves the data
MongoDB
How to read all incoming data from MongoDB?
Use the MongoDB oplog
What is OpLog?
The replication mechanism in MongoDB
It is a capped collection
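For orientation, an oplog entry carries a handful of fixed fields; a representative entry (values are illustrative, shown in mongo-shell form) looks like:

```
{
  "ts" : Timestamp(1369861833, 1),            // when the op was applied (seconds, increment)
  "op" : "i",                                 // i = insert, u = update, d = delete
  "ns" : "test.people",                       // database.collection
  "o"  : { "_id" : 1, "name" : "John Backus" }  // the document itself
}
```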
Spout: reading from OpLog
Located in the local database, in the oplog.rs collection
Operations: insert (i), update (u), delete (d)
Namespace (ns): the database and collection name – this maps to a table
Data object (o): the document the operation applies to
Sharded cluster
Automatic discovery of the sharded cluster
Example: shard vs. replica set discovery
Example: shard discovery
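The discovery example itself isn't reproduced in this transcript. As a hedged sketch (class and method names are hypothetical): discovery usually means asking a mongos for the config database's shards collection, where each entry's host field encodes either a standalone shard ("host1:27017") or a replica-set shard ("rs0/host1:27017,host2:27017"). Parsing that field tells the spout which replica sets to tail:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: parse the "host" field of a config.shards entry.
// "rs0/host1:27017,host2:27017" -> replica set "rs0" plus its members;
// a standalone shard has no "setName/" prefix.
final class ShardHostParser {
    static String setName(String host) {
        int slash = host.indexOf('/');
        return slash < 0 ? null : host.substring(0, slash);
    }

    static List<String> members(String host) {
        int slash = host.indexOf('/');
        String hosts = slash < 0 ? host : host.substring(slash + 1);
        return Arrays.asList(hosts.split(","));
    }
}
```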
Spout: reading data from OpLog
How to read data continuously from the oplog?
Use a tailable cursor
Example: a tailable cursor behaves like tail -f
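The cursor example isn't reproduced in the transcript; below is a minimal sketch, assuming the 2.x-era MongoDB Java driver that the deck's other snippets use (BasicDBObject) and a locally running replica set. The connection details and checkpoint handling are placeholders:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.Bytes;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import org.bson.types.BSONTimestamp;

// Sketch only: tail local.oplog.rs the way `tail -f` follows a file.
// Requires a live replica set; error handling and reconnection elided.
class OplogTailer {
    public static void main(String[] args) throws Exception {
        MongoClient mongo = new MongoClient("localhost", 27017);
        DBCollection oplog = mongo.getDB("local").getCollection("oplog.rs");
        BSONTimestamp lastProcessedTs = new BSONTimestamp(0, 0); // load from a checkpoint store
        DBObject query = new BasicDBObject("ts", new BasicDBObject("$gt", lastProcessedTs));
        DBCursor cursor = oplog.find(query)
                .addOption(Bytes.QUERYOPTION_TAILABLE)   // cursor stays open at end of data
                .addOption(Bytes.QUERYOPTION_AWAITDATA); // block briefly waiting for new entries
        while (cursor.hasNext()) {
            DBObject entry = cursor.next();
            // emit the entry as a tuple; record entry.get("ts") as the new checkpoint
        }
    }
}
```

Because the oplog is a capped collection, the tailable cursor keeps returning newly appended entries instead of closing when it reaches the end.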
Manage timestamps
Use the ts field (the timestamp in each oplog entry) to track processed records
If the system restarts, resume from the last recorded ts
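The restart rule can be made concrete with a small sketch (names are hypothetical, not from the deck): an oplog ts is a (seconds, increment) pair, and an entry counts as unprocessed only if its pair is strictly greater than the recorded one.

```java
// Hypothetical sketch of the checkpoint logic: an oplog timestamp is a
// (seconds, increment) pair; compare seconds first, then the increment.
final class OplogCheckpoint {
    private int seconds;
    private int inc;

    // true if the entry at (ts, i) has not been processed yet
    boolean isNew(int ts, int i) {
        return ts > seconds || (ts == seconds && i > inc);
    }

    // remember the last processed entry's timestamp
    void record(int ts, int i) {
        seconds = ts;
        inc = i;
    }
}
```

On restart, the spout loads the stored pair and issues its oplog query with ts strictly greater than it.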
Spout: reading from OpLog
Spout – Code Example
Topology
Working With Embedded Arrays
An array represents a one-to-many relationship in an RDBMS
Example: working with embedded arrays
{"_id": 1, "ns": "person_awards", "o": {"award": "National Medal of Science", "year": 1975, "by": "National Science Foundation"}}
{"_id": 1, "ns": "person_awards", "o": {"award": "Turing Award", "year": 1977, "by": "ACM"}}
public void execute(Tuple tuple) {
    .........
    if (field instanceof BasicDBList) {
        BasicDBObject arrayElement = processArray(field);
        ......
        outputCollector.emit("documents", tuple, arrayElement);
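The bolt above is fragmentary; as a self-contained sketch of the same idea (plain collections instead of driver types, names hypothetical), each array element becomes its own child document under a derived namespace:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: split an embedded array into one child document per
// element, tagged with the parent _id and a derived namespace
// (person + awards -> person_awards), mirroring the slides' output.
final class ArrayExtractor {
    static List<Map<String, Object>> extract(Object parentId, String ns,
                                             String field, List<?> elements) {
        List<Map<String, Object>> children = new ArrayList<>();
        for (Object element : elements) {
            Map<String, Object> child = new LinkedHashMap<>();
            child.put("_id", parentId);          // keep the foreign key to the parent row
            child.put("ns", ns + "_" + field);   // child table name
            child.put("o", element);             // the array element itself
            children.add(child);
        }
        return children;
    }
}
```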
Parse documents with Bolt
{"ns": "people", "op": "i", "o": {"_id": 1, "name": {"first": "John", "last": "Backus"}, "birth": "Dec 03, 1924"}}
["ns": "people", "op": "i", ["_id": 1, "name_first": "John", "name_last": "Backus", "birth": "Dec 03, 1924"]]
@Override
public void execute(Tuple tuple) {
    ......
    final BasicDBObject oplogObject = (BasicDBObject) tuple.getValueByField("document");
    final BasicDBObject document = (BasicDBObject) oplogObject.get("o");
    ......
    outputValues.add(flattenDocument(document));
    outputCollector.emit(tuple, outputValues);
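flattenDocument itself is not shown in the deck; the sketch below reconstructs the likely idea over plain java.util maps (the real bolt operates on BasicDBObject), joining nested keys with an underscore so name.first becomes name_first:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reconstruction of flattenDocument: collapse nested maps
// into column-style keys ({name: {first: ...}} -> name_first).
final class DocFlattener {
    static Map<String, Object> flatten(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", doc, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix, Map<String, Object> doc,
                                    Map<String, Object> out) {
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "_" + e.getKey();
            if (e.getValue() instanceof Map) {
                // recurse into embedded documents
                flattenInto(key, (Map<String, Object>) e.getValue(), out);
            } else {
                out.put(key, e.getValue());
            }
        }
    }
}
```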
Write to SQL with SQLWriter Bolt
["ns": "people", "op": "i", ["_id": 1, "name_first": "John", "name_last": "Backus", "birth": "Dec 03, 1924"]]

insert into people (_id, name_first, name_last, birth)
values (1, 'John', 'Backus', 'Dec 03, 1924');

insert into people_awards (_id, awards_award, awards_year, awards_by)
values (1, 'Turing Award', 1977, 'ACM');

insert into people_awards (_id, awards_award, awards_year, awards_by)
values (1, 'National Medal of Science', 1975, 'National Science Foundation');
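createInsertStatement is referenced on the next slide but never shown; a naive sketch over a flat column map (hypothetical names; string concatenation is used only for illustration, and production code should bind values through PreparedStatement):

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: build an INSERT statement from a table name and a
// flat column map. Strings are quoted naively; prefer PreparedStatement
// parameters in real code.
final class InsertBuilder {
    static String createInsertStatement(String table, Map<String, Object> row) {
        StringJoiner cols = new StringJoiner(", ");
        StringJoiner vals = new StringJoiner(", ");
        for (Map.Entry<String, Object> e : row.entrySet()) {
            cols.add(e.getKey());
            Object v = e.getValue();
            vals.add(v instanceof Number ? v.toString()
                    : "'" + v.toString().replace("'", "''") + "'");
        }
        return "insert into " + table + " (" + cols + ") values (" + vals + ")";
    }
}
```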
@Override
public void prepare(.....) {
    ....
    Class.forName("com.vertica.jdbc.Driver");
    con = DriverManager.getConnection(dBUrl, username, password);

@Override
public void execute(Tuple tuple) {
    String insertStatement = createInsertStatement(tuple);
    try {
        Statement stmt = con.createStatement();
        stmt.execute(insertStatement);
        stmt.close();
Topology Definition

TopologyBuilder builder = new TopologyBuilder();
// define our spout
builder.setSpout(spoutId, new MongoOpLogSpout("mongodb://", opslog_progress));
builder.setBolt(arrayExtractorId, new ArrayFieldExtractorBolt(), 5).shuffleGrouping(spoutId);
builder.setBolt(mongoDocParserId, new MongoDocumentParserBolt()).shuffleGrouping(arrayExtractorId, documentsStreamId);
builder.setBolt(sqlWriterId, new SQLWriterBolt(rdbmsUrl, rdbmsUserName, rdbmsPassword)).shuffleGrouping(mongoDocParserId);

// run locally for development:
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("test", conf, builder.createTopology());

// or submit to a production cluster:
StormSubmitter.submitTopology("OfflineEventProcess", conf, builder.createTopology());
Lesson learned
By leveraging the MongoDB oplog (or another capped collection), tailable cursors, and the Storm framework, you can build a fast, scalable, real-time data processing pipeline.
Resources
Book: Getting Started with Storm
Storm project wiki
Storm starter project
Storm contributions project
Running a Multi-Node Storm Cluster tutorial
Implementing a real-time trending topic
A Hadoop Alternative: Building a real-time data pipeline with Storm
Storm use cases
Resources (cont’d)
Understanding the Parallelism of a Storm Topology
Trident – a high-level Storm abstraction
A practical Storm Trident API
Storm online forum
Mongo connector from 10gen Labs
MoSQL streaming translator in Ruby
Project source code
New York City Storm Meetup
Questions
Eugene Dvorkin, Architect, WebMD
edvorkin@webmd.net
Twitter: @edvorkin
LinkedIn: eugenedvorkin
Next Sessions at 2:50
5th Floor:
West Side Ballroom 3&4: Data Modeling Examples from the Real World
West Side Ballroom 1&2: Growing Up MongoDB
Juilliard Complex: Business Track: MetLife Leapfrogs Insurance Industry with MongoDB-Powered Big Data Application
Lyceum Complex: Ask the Experts: MongoDB Monitoring and Backup Service Session
7th Floor:
Empire Complex: How We Fixed Our MongoDB Problems
SoHo Complex: High Performance, High Scale MongoDB on AWS: A Hands On Guide