MongoDB and In-Memory Computing

Post on 13-Apr-2017


Elevate Your Enterprise Architecture with an In-Memory

Computing Strategy

Dylan Tong, Principal Solutions Architect, dylan.tong@mongodb.com

In-Memory Computing

How can we process data as fast as possible by leveraging in-memory speed at its best?

What are the possibilities if we could?

High-frequency trading (HFT) is a program trading platform that uses powerful computers to transact a large number of orders at very fast speeds. It uses complex algorithms to analyze multiple markets and execute orders based on market conditions.

Typically, the traders with the fastest execution speeds are more profitable than traders with slower execution speeds.

Source: Investopedia

Speed Matters…

Amazon found that it increased revenue by 1% for every 100 ms of improvement. [Source: Amazon]

A 1-second delay in page load time equals 11% fewer page views, a 16% decrease in customer satisfaction, and 7% loss in conversions. [Source: Aberdeen Group]

A study found that 27% of the participants who did mobile shopping were dissatisfied due to the experience being too slow. [Source: Forrester Consulting]

How Fast?

Storage       Latency      Normalized to 1 s
RAM access    100s of ns   ~6 min
SSD access    100s of µs   ~6 days
HDD access    10s of ms    ~12 months
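The normalized column can be sanity-checked with quick arithmetic: scale every latency by the same factor and the ratios between tiers are preserved. A rough sketch (the slide's figures are order-of-magnitude, so the computed values only approximately match):

```javascript
// Typical storage latencies, in seconds.
const NS = 1e-9, US = 1e-6, MS = 1e-3;
const latencySeconds = {
  ram: 100 * NS, // ~100s of ns
  ssd: 100 * US, // ~100s of µs
  hdd: 10 * MS,  // ~10s of ms
};

// SSD is ~1,000x slower than RAM; HDD is ~100,000x slower.
const ssdVsRam = latencySeconds.ssd / latencySeconds.ram;
const hddVsRam = latencySeconds.hdd / latencySeconds.ram;

// Scale so a RAM access "feels like" ~6 minutes, as on the slide.
const scale = (6 * 60) / latencySeconds.ram;
const ssdScaledDays = (latencySeconds.ssd * scale) / 86400;          // ~4 days
const hddScaledMonths = (latencySeconds.hdd * scale) / (30 * 86400); // ~14 months

console.log({ ssdVsRam, hddVsRam, ssdScaledDays, hddScaledMonths });
```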

Why Now?

Average $/GB of RAM*:

Year    Avg. $/GB
2015    $4.37
2013    $5.50
2010    $12.37
2005    $189
2000    $1,107
1995    $30,875
1990    $103,880
1985    $859,375
1980    $6,328,125

Last 10 Years… "Generally affordable"

*http://www.statisticbrain.com/average-historic-price-of-ram/
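The magnitude of that decline is worth making explicit: the figures above span roughly a 1.4-million-fold drop in price per GB. A quick check on the slide's own numbers:

```javascript
// Price of RAM, $/GB, from the figures above.
const pricePerGB = { 1980: 6328125, 2000: 1107, 2010: 12.37, 2015: 4.37 };

// Overall decline from 1980 to 2015: roughly 1.45 million-fold.
const dropFactor = pricePerGB[1980] / pricePerGB[2015];

// Average halving rate over the 35-year span: price halved
// roughly every 1.7 years.
const years = 2015 - 1980;
const halvings = Math.log2(dropFactor);
const yearsPerHalving = years / halvings;

console.log({ dropFactor: Math.round(dropFactor), yearsPerHalving });
```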

Why Now?

Last 5 Years… "An Option at Scale" (same average $/GB data as above, zoomed in on 2010–2015)

*http://www.statisticbrain.com/average-historic-price-of-ram/

"This will process these data using algorithms for machine learning and artificial intelligence before sending the data back to the car. The zFAS board will in this way continuously extend its capabilities to master even complex situations increasingly better," Audi stated. "The piloted cars from Audi thus learn more every day and with each new situation they experience."

Source: T3.com

The possibilities…

Challenges: Scale

Challenges: Cost Viability

~$34,777/yr. per node scales to ~$1.74M/yr. for infrastructure to support 100 TB

Challenges: Cost Viability

Storage Type   Avg. Cost ($/GB)   Cost at 100 TB ($)
RAM            5.00               500K
SSD            0.47-1.00          47K to 100K
HDD            0.03               3K
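The "Cost at 100 TB" column is simple multiplication (in decimal units, 100 TB = 100,000 GB). It also hints at why the hybrid approach later in this deck is attractive: keeping only a hot subset in RAM changes the bill dramatically. A sketch of that arithmetic, with a hypothetical 5 TB hot / 95 TB warm split:

```javascript
// Cost per GB from the table above (approximate prices of the era).
const costPerGB = { RAM: 5.0, SSD: 0.47, HDD: 0.03 };

const GB_AT_100TB = 100 * 1000; // decimal units, as the slide uses

// All-in cost per tier: { RAM: 500000, SSD: 47000, HDD: 3000 }.
const costAt100TB = {};
for (const [tier, price] of Object.entries(costPerGB)) {
  costAt100TB[tier] = price * GB_AT_100TB;
}

// Hypothetical hybrid: 5 TB hot data in RAM, the remaining 95 TB on SSD.
const hybrid = 5000 * costPerGB.RAM + 95000 * costPerGB.SSD; // ~$69,650

console.log(costAt100TB, hybrid);
```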

http://www.statisticbrain.com/average-cost-of-hard-drive-storage/

http://www.myce.com/news/ssd-price-per-gb-drops-below-0-50-how-low-can-they-go-70703/

Challenges: Durability (Volatile Memory)

• What happens when things fail, and what data may be lost?

• How does the system synchronize with your durable storage? Does it do this well, and is it simple to implement?

Challenges: Design Still Matters, Even on RAM

Scenario: eCommerce Modernization Initiative

Business Problem: Customer experience is suffering during high-traffic events.

Technology Limitations:
• Too expensive to scale the system to support spike events.
• Scaling the system is hard, and engineering teams can't react fast enough in the event of unexpected growth.
• A caching solution is implemented, but it mostly helps only with read performance; synchronizing writes has been a development nightmare.

Business Problem: Lack of mobile customers in Europe and Asia has been attributed to latency issues.

Technology Limitation:
• It is difficult to extend the data architecture globally, so the effort has been put on hold.

Scenario: eCommerce Modernization Initiative (continued)

Business Problem: Below-industry conversion rate performance has been attributed partly to poor personalization.

Technology Limitations:
• Customer info is siloed across the Enterprise, and it's too complicated to bring this data together so effective models can be built to drive personalization.
• A "Big Data" project to bring data together to drive machine learning and cognitive capabilities in the platform failed, as data scientists reported the platform was too slow to develop on and performance was impractical.

Business Problem: Business analysts have siloed views of the eCommerce channel, and information isn't getting to them fast enough.

Technology Limitations:
• Related to the limitations above.
• Integrating data into the data warehouse is slow and hard to maintain.

[Architecture diagram: eCommerce datastores (Orders, Product Catalog, Inventory, and Customer Data: Profile, Sessions, Carts, Personalization) on NoSQL/RDBMS behind Platform Services and a Platform API, alongside dependent external data sources and integrations: CRM, ERP, PIM, Data Warehouse, and BI Tools.]

Scenario: eCommerce Modernization Initiative

[Diagram: the silo data-sources problem. Product Catalog and Customer Data (Profile, Sessions, Carts, Personalization) are assembled from siloed sources: NoSQL, RDBMS, CRM, ERP, PIM, partner sources such as supplier databases, and a legacy mainframe. Result: SLOW AND POOR SCALABILITY.]

[Diagram: the same sources consolidated into an Operational Single View on a MongoDB Enterprise Data Hub.]

Operational Single View

{
  product_name: 'Acme Paint',
  color: ['Red', 'Green'],
  size_oz: [8, 32],
  finish: ['satin', 'eggshell']
}

{
  product_name: 'T-shirt',
  size: ['S', 'M', 'L', 'XL'],
  color: ['Heather Gray' … ],
  material: '100% cotton',
  wash: 'cold',
  dry: 'tumble dry low'
}

{
  product_name: 'Mountain Bike',
  brake_style: 'mechanical disc',
  color: 'grey',
  frame_material: 'aluminum',
  no_speeds: 21,
  package_height: '7.5x32.9x55',
  weight_lbs: 44.05,
  suspension_type: 'dual',
  wheel_size_in: 26
}
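The three documents above can coexist in one collection precisely because MongoDB does not enforce a uniform schema: a query simply matches on whatever fields exist. A minimal plain-JavaScript illustration of that matching idea (no database required; field names taken from the slide):

```javascript
// Heterogeneous documents, as they would sit in one product catalog collection.
const products = [
  { product_name: 'Acme Paint', color: ['Red', 'Green'], size_oz: [8, 32] },
  { product_name: 'T-shirt', size: ['S', 'M', 'L', 'XL'], material: '100% cotton' },
  { product_name: 'Mountain Bike', color: 'grey', no_speeds: 21, weight_lbs: 44.05 },
];

// A query like { color: 'grey' } matches documents where the field equals the
// value, or where an array field contains it; documents lacking the field
// simply don't match.
function matchesColor(doc, value) {
  const c = doc.color;
  return Array.isArray(c) ? c.includes(value) : c === value;
}

const hits = products.filter((d) => matchesColor(d, 'grey'));
console.log(hits.map((d) => d.product_name)); // [ 'Mountain Bike' ]
```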

Documents in the same product catalog collection in MongoDB

Dynamic Schema

Flexible Data Model: facilitates agile development and continuous delivery methodologies

Scalability: scale-out dynamically as demand grows

Still Agile, Scalable and Simple

High Performance:
• More predictable, lower latency on less in-memory infrastructure.

In-Memory Storage Engine

Infrastructure Optimization:
• Assign a data subset to the In-Memory SE via Zone Sharding.
• Optimize on cost vs. performance without silos.

Rich Query Capability:
• Full MongoDB query and indexing support.

[Diagram: In-Memory SE nodes and WiredTiger nodes across WEST and EAST zones. SHARD 1 (TAG: WEST, IN_MEM), SHARD 2 (TAG: WEST, WT), SHARD 3 (TAG: EAST, IN_MEM), SHARD 4 (TAG: EAST, WT). Local read/write with strong consistency; session data is geographically localized, with in-memory engine latency.]
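The tagged-shard layout in the diagram can be expressed with MongoDB's zone sharding commands. A sketch in mongo shell syntax, with hypothetical shard, database, and collection names; actual zones and key ranges would depend on the deployment:

```javascript
// Hypothetical names: shards rs-west-mem / rs-east-mem run the In-Memory SE,
// rs-west-wt / rs-east-wt run WiredTiger. Sessions are sharded on region.
sh.enableSharding("ecommerce");
sh.shardCollection("ecommerce.sessions", { region: 1, _id: 1 });

// Associate each shard with a zone.
sh.addShardToZone("rs-west-mem", "WEST_IN_MEM");
sh.addShardToZone("rs-west-wt", "WEST_WT");
sh.addShardToZone("rs-east-mem", "EAST_IN_MEM");
sh.addShardToZone("rs-east-wt", "EAST_WT");

// Pin hot session data for each region onto its in-memory zone.
sh.updateZoneKeyRange(
  "ecommerce.sessions",
  { region: "west", _id: MinKey },
  { region: "west", _id: MaxKey },
  "WEST_IN_MEM"
);
sh.updateZoneKeyRange(
  "ecommerce.sessions",
  { region: "east", _id: MinKey },
  { region: "east", _id: MaxKey },
  "EAST_IN_MEM"
);
```

Zone (tag-aware) sharding is how the balancer keeps each key range on the nodes with the desired storage engine and geography.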

Durability and Fault-Tolerance:

• Mixed replica sets allow data to be replicated from the In-Memory SE to the WiredTiger SE.

• Full high availability: automatic failover, cross-geography.
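One way to realize such a mixed replica set is to start some members with the in-memory engine and at least one with WiredTiger; replication then gives the volatile members a durable copy. A sketch with hypothetical hostnames (the storage engine is chosen per mongod at startup, not in the replica set config):

```javascript
// Started beforehand, e.g.:
//   mongod --replSet rs0 --storageEngine inMemory   (mem1, mem2)
//   mongod --replSet rs0 --storageEngine wiredTiger (wt1)
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mem1.example.net:27017", priority: 2 }, // In-Memory SE
    { _id: 1, host: "mem2.example.net:27017", priority: 1 }, // In-Memory SE
    // WiredTiger member holds the durable copy; priority 0 keeps it from
    // being elected primary, hidden keeps application reads off it.
    { _id: 2, host: "wt1.example.net:27017", priority: 0, hidden: true },
  ],
});
```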

In-Memory Storage Engine

[Diagram: platform databases (NoSQL, RDBMS) and dependent external data sources and integrations (CRM, ERP, PIM, partner sources such as supplier databases, and a legacy mainframe) feeding an Operational Unified View.]

Advanced Personalization

1. TRAIN/RE-TRAIN ML MODELS

2. APPLY MODELS TO REAL-TIME STREAM OF INTERACTIONS

3. DRIVE TARGETED CONTENT, RECOMMENDATIONS…ETC.

Why Spark?

Speed. By exploiting in-memory optimizations, Spark has shown up to 100x higher performance than MapReduce running on Hadoop.

Simplicity. Easy-to-use APIs for operating on large datasets, including a collection of sophisticated operators for transforming and manipulating semi-structured data.

Unified Framework. Packaged with higher-level libraries, including support for SQL queries, machine learning, stream and graph processing. These standard libraries increase developer productivity and can be combined to create complex workflows.

Operational Single View

+Spark Connector

• Native Scala connector, certified by Databricks
• Exposes all Spark APIs & libraries
• Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations
• Locality awareness to reduce data movement

Locality Awareness

[Diagram: a driver program (Spark context) dispatches tasks through a cluster manager; tasks run co-located with the MongoDB nodes holding their data, reducing data movement.]

Operational Single View

+Spark Connector

Blend client data from multiple internal and external sources to drive real-time campaign optimization

MongoDB+Spark at China Eastern

• 180M fare calculations & 1.6 billion searches per day

• Their Oracle database peaked at 200 searches per second.

• Radically re-architected their fare engine to meet the required 100x growth in search traffic.

ETL

(Yesterday’s) Data at the Speed of Thought?

BI Connector


db.orders.aggregate([
  { $group: { _id: null, total: { $sum: "$price" } } }
])

SELECT SUM(price) AS total
FROM orders
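The two statements compute the same thing: a single group over all documents (_id: null) summing price. That semantics can be mirrored in a few lines of plain JavaScript over a hypothetical sample of orders:

```javascript
// Sample documents standing in for the orders collection.
const orders = [
  { _id: 1, item: 'abc', price: 10 },
  { _id: 2, item: 'jkl', price: 20 },
  { _id: 3, item: 'xyz', price: 5 },
];

// Equivalent of { $group: { _id: null, total: { $sum: "$price" } } }:
// one output document accumulating the sum of "price" across all inputs.
const result = [{ _id: null, total: orders.reduce((sum, o) => sum + o.price, 0) }];

console.log(result); // [ { _id: null, total: 35 } ]
```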

Dylan Tong, Principal Solutions Architect, dylan.tong@mongodb.com

Q&A