MongoDB and In-Memory Computing

Post on 13-Apr-2017


Elevate Your Enterprise Architecture with an In-Memory

Computing Strategy

Dylan Tong, Principal Solutions Architect, dylan.tong@mongodb.com

In-Memory Computing

How can we process data as fast as possible by leveraging in-memory speed at its best?

What are the possibilities if we could?

High-frequency trading (HFT) is a program trading platform that uses powerful computers to transact a large number of orders at very fast speeds. It uses complex algorithms to analyze multiple markets and execute orders based on market conditions.

Typically, the traders with the fastest execution speeds are more profitable than traders with slower execution speeds.

Source: Investopedia

Speed Matters…

Amazon found that it increased revenue by 1% for every 100 ms of improvement. [Source: Amazon]

A 1-second delay in page load time equals 11% fewer page views, a 16% decrease in customer satisfaction, and 7% loss in conversions. [Source: Aberdeen Group]

A study found that 27% of the participants who did mobile shopping were dissatisfied due to the experience being too slow. [Source: Forrester Consulting]

How Fast?

Storage       Latency      Normalized to 1 s
RAM access    100s of ns   ~6 min
SSD access    100s of µs   ~6 days
HDD access    10s of ms    ~12 months
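The normalized column can be sanity-checked with quick arithmetic: scale every latency by the same factor and the ratios between tiers are preserved. A rough sketch (the slide's figures are order-of-magnitude, so the computed values only approximately match):

```javascript
// Typical storage latencies, in seconds.
const NS = 1e-9, US = 1e-6, MS = 1e-3;
const latencySeconds = {
  ram: 100 * NS, // ~100s of ns
  ssd: 100 * US, // ~100s of µs
  hdd: 10 * MS,  // ~10s of ms
};

// SSD is ~1,000x slower than RAM; HDD is ~100,000x slower.
const ssdVsRam = latencySeconds.ssd / latencySeconds.ram;
const hddVsRam = latencySeconds.hdd / latencySeconds.ram;

// Scale so a RAM access "feels like" ~6 minutes, as on the slide.
const scale = (6 * 60) / latencySeconds.ram;
const ssdScaledDays = (latencySeconds.ssd * scale) / 86400;          // ~4 days
const hddScaledMonths = (latencySeconds.hdd * scale) / (30 * 86400); // ~14 months

console.log({ ssdVsRam, hddVsRam, ssdScaledDays, hddScaledMonths });
```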

Why Now?

Average $/GB of RAM*:

Year    Avg. $/GB
2015    $4.37
2013    $5.50
2010    $12.37
2005    $189
2000    $1,107
1995    $30,875
1990    $103,880
1985    $859,375
1980    $6,328,125

Last 10 Years… "Generally affordable"

*http://www.statisticbrain.com/average-historic-price-of-ram/
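The magnitude of that decline is worth making explicit: the figures above span roughly a 1.4-million-fold drop in price per GB. A quick check on the slide's own numbers:

```javascript
// Price of RAM, $/GB, from the figures above.
const pricePerGB = { 1980: 6328125, 2000: 1107, 2010: 12.37, 2015: 4.37 };

// Overall decline from 1980 to 2015: roughly 1.45 million-fold.
const dropFactor = pricePerGB[1980] / pricePerGB[2015];

// Average halving rate over the 35-year span: price halved
// roughly every 1.7 years.
const years = 2015 - 1980;
const halvings = Math.log2(dropFactor);
const yearsPerHalving = years / halvings;

console.log({ dropFactor: Math.round(dropFactor), yearsPerHalving });
```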

Why Now?

Last 5 Years… "An Option at Scale" (same average $/GB data as above, zoomed in on 2010–2015)

*http://www.statisticbrain.com/average-historic-price-of-ram/

"This will process these data using algorithms for machine learning and artificial intelligence before sending the data back to the car. The zFAS board will in this way continuously extend its capabilities to master even complex situations increasingly better," Audi stated. "The piloted cars from Audi thus learn more every day and with each new situation they experience."

Source: T3.com

The possibilities…

Challenges: Scale

Challenges: Cost Viability

~$34,777/yr. per node scales to ~$1.74M/yr. for infrastructure to support 100 TB

Challenges: Cost Viability

Storage Type   Avg. Cost ($/GB)   Cost at 100 TB ($)
RAM            5.00               500K
SSD            0.47-1.00          47K to 100K
HDD            0.03               3K
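The "Cost at 100 TB" column is simple multiplication (in decimal units, 100 TB = 100,000 GB). It also hints at why the hybrid approach later in this deck is attractive: keeping only a hot subset in RAM changes the bill dramatically. A sketch of that arithmetic, with a hypothetical 5 TB hot / 95 TB warm split:

```javascript
// Cost per GB from the table above (approximate prices of the era).
const costPerGB = { RAM: 5.0, SSD: 0.47, HDD: 0.03 };

const GB_AT_100TB = 100 * 1000; // decimal units, as the slide uses

// All-in cost per tier: { RAM: 500000, SSD: 47000, HDD: 3000 }.
const costAt100TB = {};
for (const [tier, price] of Object.entries(costPerGB)) {
  costAt100TB[tier] = price * GB_AT_100TB;
}

// Hypothetical hybrid: 5 TB hot data in RAM, the remaining 95 TB on SSD.
const hybrid = 5000 * costPerGB.RAM + 95000 * costPerGB.SSD; // ~$69,650

console.log(costAt100TB, hybrid);
```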

http://www.statisticbrain.com/average-cost-of-hard-drive-storage/

http://www.myce.com/news/ssd-price-per-gb-drops-below-0-50-how-low-can-they-go-70703/

Challenges: Durability (Volatile Memory)

• What happens when things fail, and what data may be lost?

• How does the system synchronize with your durable storage? Does it do this well, and is it simple to implement?

Challenges: Design Still Matters, Even on RAM

Scenario: eCommerce Modernization Initiative

Business Problem: Customer experience is suffering during high-traffic events.

Technology Limitations:
• Too expensive to scale the system to support spike events.
• Scaling the system is hard, and engineering teams can't react fast enough in the event of unexpected growth.
• A caching solution is implemented, but it mostly helps only with read performance; synchronizing writes has been a development nightmare.

Business Problem: Lack of mobile customers in Europe and Asia has been attributed to latency issues.

Technology Limitation:
• It is difficult to extend the data architecture globally, so the effort has been put on hold.

Scenario: eCommerce Modernization Initiative (continued)

Business Problem: Below-industry conversion rate performance has been attributed partly to poor personalization.

Technology Limitations:
• Customer info is siloed across the Enterprise, and it's too complicated to bring this data together so effective models can be built to drive personalization.
• A "Big Data" project to bring data together to drive machine learning and cognitive capabilities in the platform failed, as data scientists reported the platform was too slow to develop on and performance was impractical.

Business Problem: Business analysts have siloed views of the eCommerce channel, and information isn't getting to them fast enough.

Technology Limitations:
• Related to the limitations above.
• Integrating data into the data warehouse is slow and hard to maintain.

[Architecture diagram: eCommerce datastores (Orders, Product Catalog, Inventory, and Customer Data: Profile, Sessions, Carts, Personalization) on NoSQL/RDBMS behind Platform Services and a Platform API, alongside dependent external data sources and integrations: CRM, ERP, PIM, Data Warehouse, and BI Tools.]

Scenario: eCommerce Modernization Initiative

[Diagram: the silo data-sources problem. Product Catalog and Customer Data (Profile, Sessions, Carts, Personalization) are assembled from siloed sources: NoSQL, RDBMS, CRM, ERP, PIM, partner sources such as supplier databases, and a legacy mainframe. Result: SLOW AND POOR SCALABILITY.]

[Diagram: the same sources consolidated into an Operational Single View on a MongoDB Enterprise Data Hub.]

Operational Single View

{
  product_name: 'Acme Paint',
  color: ['Red', 'Green'],
  size_oz: [8, 32],
  finish: ['satin', 'eggshell']
}

{
  product_name: 'T-shirt',
  size: ['S', 'M', 'L', 'XL'],
  color: ['Heather Gray' … ],
  material: '100% cotton',
  wash: 'cold',
  dry: 'tumble dry low'
}

{
  product_name: 'Mountain Bike',
  brake_style: 'mechanical disc',
  color: 'grey',
  frame_material: 'aluminum',
  no_speeds: 21,
  package_height: '7.5x32.9x55',
  weight_lbs: 44.05,
  suspension_type: 'dual',
  wheel_size_in: 26
}
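The three documents above can coexist in one collection precisely because MongoDB does not enforce a uniform schema: a query simply matches on whatever fields exist. A minimal plain-JavaScript illustration of that matching idea (no database required; field names taken from the slide):

```javascript
// Heterogeneous documents, as they would sit in one product catalog collection.
const products = [
  { product_name: 'Acme Paint', color: ['Red', 'Green'], size_oz: [8, 32] },
  { product_name: 'T-shirt', size: ['S', 'M', 'L', 'XL'], material: '100% cotton' },
  { product_name: 'Mountain Bike', color: 'grey', no_speeds: 21, weight_lbs: 44.05 },
];

// A query like { color: 'grey' } matches documents where the field equals the
// value, or where an array field contains it; documents lacking the field
// simply don't match.
function matchesColor(doc, value) {
  const c = doc.color;
  return Array.isArray(c) ? c.includes(value) : c === value;
}

const hits = products.filter((d) => matchesColor(d, 'grey'));
console.log(hits.map((d) => d.product_name)); // [ 'Mountain Bike' ]
```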

Documents in the same product catalog collection in MongoDB

Dynamic Schema

Flexible Data Model: facilitates agile development and continuous delivery methodologies

Scalability: scale-out dynamically as demand grows

Still Agile, Scalable and Simple

High Performance:
• More predictable, lower latency on less in-memory infrastructure.

In-Memory Storage Engine

Infrastructure Optimization:
• Assign a data subset to the In-Memory SE via Zone Sharding.
• Optimize on cost vs. performance without silos.

Rich Query Capability:
• Full MongoDB query and indexing support.

[Diagram: In-Memory SE nodes and WiredTiger nodes across WEST and EAST zones. SHARD 1 (TAG: WEST, IN_MEM), SHARD 2 (TAG: WEST, WT), SHARD 3 (TAG: EAST, IN_MEM), SHARD 4 (TAG: EAST, WT). Local read/write with strong consistency; session data is geographically localized, with in-memory engine latency.]
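The tagged-shard layout in the diagram can be expressed with MongoDB's zone sharding commands. A sketch in mongo shell syntax, with hypothetical shard, database, and collection names; actual zones and key ranges would depend on the deployment:

```javascript
// Hypothetical names: shards rs-west-mem / rs-east-mem run the In-Memory SE,
// rs-west-wt / rs-east-wt run WiredTiger. Sessions are sharded on region.
sh.enableSharding("ecommerce");
sh.shardCollection("ecommerce.sessions", { region: 1, _id: 1 });

// Associate each shard with a zone.
sh.addShardToZone("rs-west-mem", "WEST_IN_MEM");
sh.addShardToZone("rs-west-wt", "WEST_WT");
sh.addShardToZone("rs-east-mem", "EAST_IN_MEM");
sh.addShardToZone("rs-east-wt", "EAST_WT");

// Pin hot session data for each region onto its in-memory zone.
sh.updateZoneKeyRange(
  "ecommerce.sessions",
  { region: "west", _id: MinKey },
  { region: "west", _id: MaxKey },
  "WEST_IN_MEM"
);
sh.updateZoneKeyRange(
  "ecommerce.sessions",
  { region: "east", _id: MinKey },
  { region: "east", _id: MaxKey },
  "EAST_IN_MEM"
);
```

Zone (tag-aware) sharding is how the balancer keeps each key range on the nodes with the desired storage engine and geography.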

Durability and Fault-Tolerance:

• Mixed replica sets allow data to be replicated from the In-Memory SE to the WiredTiger SE.

• Full high availability: automatic failover, cross-geography.
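One way to realize such a mixed replica set is to start some members with the in-memory engine and at least one with WiredTiger; replication then gives the volatile members a durable copy. A sketch with hypothetical hostnames (the storage engine is chosen per mongod at startup, not in the replica set config):

```javascript
// Started beforehand, e.g.:
//   mongod --replSet rs0 --storageEngine inMemory   (mem1, mem2)
//   mongod --replSet rs0 --storageEngine wiredTiger (wt1)
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mem1.example.net:27017", priority: 2 }, // In-Memory SE
    { _id: 1, host: "mem2.example.net:27017", priority: 1 }, // In-Memory SE
    // WiredTiger member holds the durable copy; priority 0 keeps it from
    // being elected primary, hidden keeps application reads off it.
    { _id: 2, host: "wt1.example.net:27017", priority: 0, hidden: true },
  ],
});
```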

In-Memory Storage Engine

[Diagram: platform databases (NoSQL, RDBMS) and dependent external data sources and integrations (CRM, ERP, PIM, partner sources such as supplier databases, and a legacy mainframe) feeding an Operational Unified View.]

Advanced Personalization

1. TRAIN/RE-TRAIN ML MODELS

2. APPLY MODELS TO REAL-TIME STREAM OF INTERACTIONS

3. DRIVE TARGETED CONTENT, RECOMMENDATIONS…ETC.

Why Spark?

Speed. By exploiting in-memory optimizations, Spark has shown up to 100x higher performance than MapReduce running on Hadoop.

Simplicity. Easy-to-use APIs for operating on large datasets, including a collection of sophisticated operators for transforming and manipulating semi-structured data.

Unified Framework. Packaged with higher-level libraries, including support for SQL queries, machine learning, stream and graph processing. These standard libraries increase developer productivity and can be combined to create complex workflows.

Operational Single View

+Spark Connector

• Native Scala connector, certified by Databricks
• Exposes all Spark APIs & libraries
• Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations
• Locality awareness to reduce data movement

Locality Awareness

[Diagram: a driver program (Spark context) dispatches tasks through a cluster manager; tasks run co-located with the MongoDB nodes holding their data, reducing data movement.]

Operational Single View

+Spark Connector

Blend client data from multiple internal and external sources to drive real-time campaign optimization

MongoDB+Spark at China Eastern

• 180M fare calculations & 1.6 billion searches per day

• Their Oracle database peaked at 200 searches per second.

• Radically re-architected their fare engine to meet the required 100x growth in search traffic.

ETL

(Yesterday’s) Data at the Speed of Thought?

BI Connector


db.orders.aggregate([
  { $group: { _id: null, total: { $sum: "$price" } } }
])

SELECT SUM(price) AS total
FROM orders
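The two statements compute the same thing: a single group over all documents (_id: null) summing price. That semantics can be mirrored in a few lines of plain JavaScript over a hypothetical sample of orders:

```javascript
// Sample documents standing in for the orders collection.
const orders = [
  { _id: 1, item: 'abc', price: 10 },
  { _id: 2, item: 'jkl', price: 20 },
  { _id: 3, item: 'xyz', price: 5 },
];

// Equivalent of { $group: { _id: null, total: { $sum: "$price" } } }:
// one output document accumulating the sum of "price" across all inputs.
const result = [{ _id: null, total: orders.reduce((sum, o) => sum + o.price, 0) }];

console.log(result); // [ { _id: null, total: 35 } ]
```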

Dylan Tong, Principal Solutions Architect, dylan.tong@mongodb.com

Q&A