Post on 13-Apr-2017
Elevate Your Enterprise Architecture with an In-Memory Computing Strategy
Dylan Tong
Principal Solutions Architect
dylan.tong@mongodb.com
In-Memory Computing
How can we process data as fast as possible by leveraging in-memory speed at its best?
What are the possibilities if we could?
High-frequency trading (HFT) is a program trading platform that uses powerful computers to transact a large number of orders at very fast speeds. It uses complex algorithms to analyze multiple markets and execute orders based on market conditions.
Typically, the traders with the fastest execution speeds are more profitable than traders with slower execution speeds.
Source: Investopedia
Speed Matters…
Amazon found that it increased revenue by 1% for every 100ms of improvement. [Source: Amazon]
A 1-second delay in page load time equals 11% fewer page views, a 16% decrease in customer satisfaction, and a 7% loss in conversions. [Source: Aberdeen Group]
A study found that 27% of the participants who did mobile shopping were dissatisfied due to the experience being too slow. [Source: Forrester Consulting]
How Fast?
Operation     Latency       Normalized to 1 s
RAM access    100s of ns    ~6 min
SSD access    100s of µs    ~6 days
HDD access    10s of ms     ~12 months
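The slide does not say what the "Normalized to 1 s" column is scaled against. A minimal sketch, assuming the common convention of stretching one CPU cycle of about 0.3 ns to 1 second, and picking one representative latency inside each of the slide's ranges (both are assumptions, not figures from the deck), lands close to the minutes, days, and months shown above:

```python
# Assumption (not stated on the slide): one ~0.3 ns CPU cycle is scaled to 1 s.
CPU_CYCLE_NS = 0.3

# Assumed representative latencies, in nanoseconds, chosen inside the slide's ranges.
LATENCY_NS = {
    "RAM": 120,          # "100s of ns"
    "SSD": 150_000,      # "100s of µs"
    "HDD": 10_000_000,   # "10s of ms"
}

def normalized_seconds(latency_ns, baseline_ns=CPU_CYCLE_NS):
    """Scale a latency so that the baseline latency corresponds to 1 second."""
    return latency_ns / baseline_ns

scaled = {name: normalized_seconds(ns) for name, ns in LATENCY_NS.items()}

ram_minutes = scaled["RAM"] / 60
ssd_days = scaled["SSD"] / 86_400
hdd_months = scaled["HDD"] / (86_400 * 30)  # 30-day months for simplicity

print(f"RAM: ~{ram_minutes:.1f} min")
print(f"SSD: ~{ssd_days:.1f} days")
print(f"HDD: ~{hdd_months:.1f} months")
```

The point of the scaling is the ratio between tiers, which does not depend on the assumed baseline: each step down from RAM to SSD to HDD costs two to three orders of magnitude.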
Why Now?
Year    Average $/GB*
2015    $4.37
2013    $5.50
2010    $12.37
2005    $189
2000    $1,107
1995    $30,875
1990    $103,880
1985    $859,375
1980    $6,328,125
[Chart: average $/GB of RAM, 2005 to 2015. Last 10 Years… “Generally affordable”]
*http://www.statisticbrain.com/average-historic-price-of-ram/
Why Now?
[Chart: average $/GB of RAM, 2010 to 2015. Last 5 Years… “An Option at Scale”]
"This will process these data using algorithms for machine learning and artificial intelligence before sending the data back to the car.
The zFAS board will in this way continuously extend its capabilities to master even complex situations increasingly better," Audi stated. "The piloted cars from Audi thus learn more every day and with each new situation they experience."
Source: T3.com
The possibilities…
Challenges: Scale
Challenges: Cost Viability
[image] = $34,777/yr.
~$1.74M/yr. for infrastructure to support 100TB
Challenges: Cost Viability
Storage Type    Avg. Cost ($/GB)    Cost at 100TB ($)
RAM             5.00                500K
SSD             0.47-1.00           47K to 100K
HDD             0.03                3K
http://www.statisticbrain.com/average-cost-of-hard-drive-storage/
http://www.myce.com/news/ssd-price-per-gb-drops-below-0-50-how-low-can-they-go-70703/
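The "Cost at 100TB" column follows directly from the per-GB prices. A small sketch of that arithmetic, assuming decimal units (1 TB = 1,000 GB), which is what the slide's round numbers imply:

```python
GB_PER_TB = 1_000  # assumption: decimal terabytes, matching the slide's round figures

# (low, high) average cost in $/GB, taken from the table above.
COST_PER_GB = {
    "RAM": (5.00, 5.00),
    "SSD": (0.47, 1.00),
    "HDD": (0.03, 0.03),
}

def cost_range(capacity_tb, low_per_gb, high_per_gb):
    """Return the (low, high) raw media cost in whole dollars for a capacity in TB."""
    capacity_gb = capacity_tb * GB_PER_TB
    return round(low_per_gb * capacity_gb), round(high_per_gb * capacity_gb)

costs_100tb = {name: cost_range(100, lo, hi) for name, (lo, hi) in COST_PER_GB.items()}

for name, (lo, hi) in costs_100tb.items():
    label = f"${lo:,}" if lo == hi else f"${lo:,} to ${hi:,}"
    print(f"{name}: {label}")
```

These are media costs only. They do not include servers, replication, or operations, which is why the infrastructure figure on the previous slide is so much higher.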
Challenges: Durability
Volatile Memory
• What happens when things fail, and what data may be lost?
• How does the system synchronize with your durable storage? Does it do this well, and is it simple to implement?
Challenges: Design Still Matters
on RAM
Scenario : ECommerce Modernization Initiative

Business Problem: Customer experience is suffering during high-traffic events.
Technology Limitations:
• Too expensive to scale the system to support spike events.
• Scaling the system is hard, and engineering teams can’t react fast enough to unexpected growth.
• A caching solution is in place, but it mostly helps only read performance; synchronizing writes has been a development nightmare.

Business Problem: Lack of mobile customers in Europe and Asia has been attributed to latency issues.
Technology Limitation:
• Difficult to extend the data architecture globally, so the effort is on hold.

Business Problem: Below-industry conversion rate has been attributed partly to poor personalization.
Technology Limitations:
• Customer info is siloed across the enterprise, and it’s too complicated to bring this data together so that effective models can be built to drive personalization.
• A “Big Data” project to bring data together and drive machine learning and cognitive capabilities in the platform failed: data scientists reported the platform was too slow to develop on, and performance was impractical.

Business Problem: Business analysts have siloed views of the eCommerce channel, and information isn’t getting to them fast enough.
Technology Limitations:
• Related to the limitations above.
• Integrating data into the data warehouse is slow and hard to maintain.
[Diagram: Platform API and Platform Services on top of the eCommerce datastores (NoSQL, RDBMS) holding Orders, Product Catalog, Inventory, and Customer Data (Profile, Sessions, Carts, Personalization). Dependent external data sources and integrations: CRM, ERP, PIM, data warehouse, BI tools, …]
Scenario : ECommerce Modernization Initiative
Silo Data-sources Problem
[Diagram: Customer Data (Profile, Sessions, Carts, Personalization) and Product Catalog spread across separate sources: NoSQL, RDBMS, CRM, ERP, PIM, partner sources (supplier databases…etc.), legacy mainframe. SLOW AND POOR SCALABILITY]
Operational Single View
[Diagram: sources (NoSQL, RDBMS, CRM, ERP, PIM, partner sources (supplier databases…etc.), legacy mainframe) and a MongoDB Enterprise Data Hub providing an Operational Single View of Customer Data (Profile, Sessions, Carts, Personalization) and Product Catalog]
Reference: Metlife Wall Presentation
{
  product_name: 'Acme Paint',
  color: ['Red', 'Green'],
  size_oz: [8, 32],
  finish: ['satin', 'eggshell']
}
{
  product_name: 'T-shirt',
  size: ['S', 'M', 'L', 'XL'],
  color: ['Heather Gray' … ],
  material: '100% cotton',
  wash: 'cold',
  dry: 'tumble dry low'
}
{
  product_name: 'Mountain Bike',
  brake_style: 'mechanical disc',
  color: 'grey',
  frame_material: 'aluminum',
  no_speeds: 21,
  package_height: '7.5x32.9x55',
  weight_lbs: 44.05,
  suspension_type: 'dual',
  wheel_size_in: 26
}
Documents in the same product catalog collection in MongoDB
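To make the idea concrete without a running database, the sketch below uses a plain Python list of dicts as a stand-in for the product catalog collection. The `find` helper is hypothetical and only imitates one MongoDB matching rule (an equality filter on an array field matches when any element equals the value); it is not a driver API.

```python
# Stand-in for one collection: documents share product_name but little else.
catalog = [
    {"product_name": "Acme Paint", "color": ["Red", "Green"],
     "size_oz": [8, 32], "finish": ["satin", "eggshell"]},
    {"product_name": "T-shirt", "size": ["S", "M", "L", "XL"],
     "color": ["Heather Gray"],  # remaining colors are elided on the slide
     "material": "100% cotton", "wash": "cold", "dry": "tumble dry low"},
    {"product_name": "Mountain Bike", "brake_style": "mechanical disc",
     "color": "grey", "frame_material": "aluminum", "no_speeds": 21,
     "package_height": "7.5x32.9x55", "weight_lbs": 44.05,
     "suspension_type": "dual", "wheel_size_in": 26},
]

_MISSING = object()

def find(collection, **criteria):
    """Hypothetical helper: return documents whose fields equal, or contain, each value."""
    def matches(doc):
        for field, wanted in criteria.items():
            actual = doc.get(field, _MISSING)
            if isinstance(actual, list):
                if wanted not in actual:
                    return False
            elif actual != wanted:
                return False
        return True
    return [doc for doc in collection if matches(doc)]

red_products = [d["product_name"] for d in find(catalog, color="Red")]
grey_products = [d["product_name"] for d in find(catalog, color="grey")]
with_wheels = [d["product_name"] for d in catalog if "wheel_size_in" in d]
```

The same `color` filter works whether a document stores one color as a string or several as an array, and documents that lack a field such as `wheel_size_in` simply do not match.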
Dynamic Schema
Flexible Data Model: facilitates agile development and continuous delivery methodologies
Scalability: scale-out dynamically as demand grows
Still Agile, Scalable and Simple
In-Memory Storage Engine
High Performance:
• More predictable and lower latency on less in-memory infrastructure.
Infrastructure Optimization:
• Assign a data subset to the In-Memory SE via Zone Sharding.
• Optimize on cost vs. performance without silos.
Rich Query Capability:
• Full MongoDB Query and Indexing Support.
[Diagram: IN-MEMORY SE NODES and WIREDTIGER NODES across WEST and EAST]
[Diagram: an Update routed to a shard in its own region]
SHARD 1 TAG: WEST, IN_MEM
SHARD 2 TAG: WEST, WT
SHARD 3 TAG: EAST, IN_MEM
SHARD 4 TAG: EAST, WT
Local Read/Write with Strong Consistency
Session Data Geographically Localized, and with In-memory Engine Latency
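The tagging scheme above can be read as a two-part routing rule: region picks WEST or EAST, and the kind of data picks the storage engine. The following is a toy model of that rule, not MongoDB's actual zone or balancer logic; the `route` function and the choice of `sessions` as the only in-memory collection are assumptions made for illustration.

```python
# Shard tags as shown on the slide.
SHARD_TAGS = {
    "SHARD 1": {"WEST", "IN_MEM"},
    "SHARD 2": {"WEST", "WT"},
    "SHARD 3": {"EAST", "IN_MEM"},
    "SHARD 4": {"EAST", "WT"},
}

# Assumption: only session data is assigned to the In-Memory storage engine.
IN_MEMORY_COLLECTIONS = {"sessions"}

def route(region, collection):
    """Toy rule: pick the shard whose tags include both the region and the engine."""
    engine = "IN_MEM" if collection in IN_MEMORY_COLLECTIONS else "WT"
    wanted = {region, engine}
    for shard, tags in SHARD_TAGS.items():
        if wanted <= tags:
            return shard
    raise LookupError(f"no shard tagged {sorted(wanted)}")

print(route("WEST", "sessions"))
print(route("EAST", "orders"))
```

In a real cluster this mapping is expressed as zone ranges on the shard key, so the shard key has to carry whatever the zones discriminate on (here, region and data subset).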
Durability and Fault-Tolerance:
• Mixed ReplicaSets allow data to be replicated from In-Memory SE to WT SE.
• Full High Availability: automatic fail-over, cross geography.
In-Memory Storage Engine
[Diagram: Operational Unified View over the platform databases (NoSQL, RDBMS) and dependent external data sources and integrations: CRM, ERP, PIM, partner sources (supplier databases…etc.), legacy mainframe]
Advanced Personalization
1. TRAIN/RE-TRAIN ML MODELS
2. APPLY MODELS TO REAL-TIME STREAM OF INTERACTIONS
3. DRIVE TARGETED CONTENT, RECOMMENDATIONS…ETC.
Why Spark?
Speed. By exploiting in-memory optimizations, Spark has shown up to 100x higher performance than MapReduce running on Hadoop.
Simplicity. Easy-to-use APIs for operating on large datasets. This includes a collection of sophisticated operators for transforming and manipulating semi-structured data.
Unified Framework. Packaged with higher-level libraries, including support for SQL queries, machine learning, stream and graph processing. These standard libraries increase developer productivity and can be combined to create complex workflows.
Operational Single View
+Spark Connector
• Native Scala connector, certified by Databricks
• Exposes all Spark APIs & libraries
• Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations
• Locality awareness to reduce data movement
Locality Awareness
[Diagram: DRIVER PROGRAM (SPARK CONTEXT) and CLUSTER MANAGER dispatching Tasks; Operational Single View + Spark Connector]
Blend client data from multiple internal and external sources to drive real time campaign optimization
MongoDB+Spark at China Eastern
180m fare calculations & 1.6 billion searches per day
Oracle database peaked at 200 searches per second.
Radically re-architected their fare engine to meet the required 100x growth in search traffic.
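The "100x" figure is consistent with the other two numbers on the slide. A quick check, assuming the 1.6 billion daily searches are spread evenly over the day (real traffic peaks would be higher than this average):

```python
SEARCHES_PER_DAY = 1_600_000_000   # from the slide
ORACLE_PEAK_PER_SECOND = 200       # from the slide
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: traffic is uniform across the day, so this is an average rate.
average_per_second = SEARCHES_PER_DAY / SECONDS_PER_DAY
growth_factor = average_per_second / ORACLE_PEAK_PER_SECOND

print(f"average searches per second: {average_per_second:,.0f}")
print(f"growth over the old peak: {growth_factor:.0f}x")
```

That works out to roughly 18,500 searches per second on average, a little over 90 times the old peak of 200 per second.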
ETL
(Yesterday’s) Data at the Speed of Thought?
BI Connector
db.orders.aggregate([
  { $group: { _id: null, total: { $sum: "$price" } } }
])
SELECT SUM(price) AS total
FROM orders
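The two statements above compute the same thing. The sketch below checks that equivalence on a few made-up orders: Python's built-in `sqlite3` stands in for the SQL side, and a hypothetical `group_sum` helper imitates what the `$group` stage with `_id: null` returns. Neither is the BI Connector itself, and the sample data is invented for illustration.

```python
import sqlite3

# Made-up sample orders (not from the deck).
orders = [
    {"item": "paint", "price": 12.5},
    {"item": "t-shirt", "price": 20.0},
    {"item": "bike", "price": 350.0},
]

def group_sum(docs, field):
    """Imitate {$group: {_id: null, total: {$sum: "$<field>"}}} over a list of dicts.

    Documents without the field contribute 0, as $sum does for missing fields.
    """
    return [{"_id": None, "total": sum(doc.get(field, 0) for doc in docs)}]

# SQL side: the same rows in an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (item TEXT, price REAL)")
conn.executemany("INSERT INTO orders VALUES (:item, :price)", orders)
sql_total = conn.execute("SELECT SUM(price) AS total FROM orders").fetchone()[0]
conn.close()

agg_total = group_sum(orders, "price")[0]["total"]
print(sql_total, agg_total)
```

Grouping on `_id: null` puts every document in a single group, which is why the aggregation returns one total just like the ungrouped `SELECT SUM(price)`.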
Resources for You
Spark Connector
• Download: Spark Packages, GitHub
• Documentation
• Whitepaper: Turning Analytics into Real-Time Action
• Education: M233: Getting Started with Spark and MongoDB
In-Memory Storage Engine
• Download: Enterprise Server
• Documentation
BI Connector
• Download: BI Connector
• Documentation
Q&A