Posted on 20-Aug-2015
Common MongoDB Use-Cases
Kevin Hanson
Solutions Architect, 10gen
@hungarianhc ~ kevin@10gen.com
Intro to NoSQL and MongoDB (completed)
How to Get Started with Your MongoDB Pilot Project (August 7th)
Follow-up:
@hungarianhc
kevin@10gen.com
Emerging NoSQL Space
[Diagram: The beginning: RDBMS. Last 10 years: RDBMS and Data Warehouse. Today: RDBMS, Data Warehouse, and NoSQL.]
Qualities of NoSQL Workloads
Flexible data models
• Lists, Nested Objects
• Sparse schemas
• Semi-structured data
• Agile Development
High Throughput
• Lots of reads
• Lots of writes
Large Data Sizes
• Aggregate data size
• Number of objects
Low Latency
• Both reads and writes
• Millisecond latency
Cloud Computing
• Run anywhere
• No assumptions about hardware
• No / Few Knobs
Commodity Hardware
• Ethernet
• Local disks
MongoDB was designed for this
Flexible data models
• JSON-based object model
• Dynamic schemas
High Throughput
• Replica sets to scale reads
• Sharding to scale writes
Large Data Sizes
• 1000s of shards in a single DB
• Partitioning of data
Low Latency
• In-memory cache
• Scale-out working set
Cloud Computing
• Scale-out to overcome hardware limitations
Commodity Hardware
• Designed for "typical" OS and local file system
Example customers
[Customer logos, grouped by use case: User Data Management, High Volume Data Feeds, Content Management, Operational Intelligence, Product Data Management]
USE CASES THAT LEVERAGE NOSQL
High Volume Data Feeds
Machine Generated Data
• More machines, more sensors, more data
• Variably structured
Stock Market Data
• High frequency trading
Social Media Firehose
• Multiple sources of data
• Each changes its format constantly
High Volume Data Feed
[Diagram: multiple data sources writing to a sharded cluster]
• Asynchronous writes
• Flexible document model can adapt to changes in sensor format
• Write to memory with periodic disk flush
• Scale writes over multiple shards
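A minimal sketch of the idea behind "scale writes over multiple shards", assuming a hypothetical `sensorId` shard key and a plain hash-mod-N router. This is a simplified stand-in, not MongoDB's routing: a real cluster assigns shard-key ranges (or hashed-key ranges) to chunks and mongos sends each write to the shard owning that chunk. The sketch only shows that a stable key keeps one source's writes together while spreading different sources across shards, and that readings with different formats can sit side by side.

```javascript
// Simplified illustration only. MongoDB routes by chunk ranges on the shard
// key (or its hash); here we just hash a hypothetical `sensorId` mod N.

// FNV-1a 32-bit string hash: any stable hash works for this sketch.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

// Pick a shard index in [0, shardCount) from the document's key.
function routeToShard(doc, shardCount) {
  return fnv1a(String(doc.sensorId)) % shardCount;
}

// Readings from different sources need not share a format.
const readings = [
  { sensorId: "s-1", temp: 21.5 },
  { sensorId: "s-2", temp: 19.0, humidity: 0.4 },
  { sensorId: "s-3", pressure: 1013 },
  { sensorId: "s-1", temp: 21.7 },
];

const shards = [[], [], []];
for (const r of readings) {
  shards[routeToShard(r, shards.length)].push(r);
}

console.log(shards.map((s) => s.length));
```

Because the hash is deterministic, both `s-1` readings always land on the same shard, which is what lets each shard absorb its share of the write load independently.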
Operational Intelligence
Ad Targeting
• Large volume of state about users
• Very strict latency requirements
Customer Facing Dashboards
• Expose report data to millions of customers
• Report on large volumes of data
• Reports that update in real time
Social Media Monitoring
• Need to join the conversation _now_
Operational Intelligence
[Diagram: dashboards and an API reading from the cluster]
• Low latency reads
• Parallelize queries across replicas and shards
• In-database aggregation
• Flexible schema adapts to changing input data
• Can use the same cluster to collect, store, and report on data
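In-database aggregation means the server computes summaries instead of shipping raw events to the dashboard. The sketch below is an in-memory JavaScript model of what a `$match` + `$group` pipeline computes; the `events` collection and its `campaign` and `type` fields are hypothetical.

```javascript
// Hypothetical event documents; field names are illustrative only.
const events = [
  { campaign: "spring", type: "impression" },
  { campaign: "spring", type: "click" },
  { campaign: "summer", type: "click" },
  { campaign: "spring", type: "click" },
];

// In-memory equivalent of:
//   db.events.aggregate([
//     { $match: { type: "click" } },
//     { $group: { _id: "$campaign", clicks: { $sum: 1 } } }
//   ])
function clicksPerCampaign(docs) {
  const totals = new Map();
  for (const d of docs) {
    if (d.type !== "click") continue; // $match stage
    totals.set(d.campaign, (totals.get(d.campaign) || 0) + 1); // $group + $sum
  }
  return [...totals].map(([id, clicks]) => ({ _id: id, clicks }));
}

console.log(clicksPerCampaign(events));
// [ { _id: 'spring', clicks: 2 }, { _id: 'summer', clicks: 1 } ]
```

MongoDB does not guarantee the order of `$group` output, so a dashboard that needs a stable order would add a `$sort` stage.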
Behavioral Profiles
[Diagram: 1. See ad, 2. See ad, 3. Click, 4. Convert]
{ cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: "ad1", time: 123 },
        { impression: "ad2", time: 232 },
        { click: "ad2", time: 235 },
        { add_to_cart: "laptop", sku: "asdf23f", time: 254 },
        { purchase: "laptop", time: 354 }
      ]
    }
  }
}
• Rich profiles collecting multiple complex actions
• Scale out to support high throughput of activities tracked
• Indexing and querying to support matching and frequency capping
• Dynamic schemas make it easy to track vendor-specific attributes
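Frequency capping can be read straight off the behavioral profile document. A minimal sketch, assuming a hypothetical rule "show an ad only if this user has seen it fewer than `cap` times since `sinceTime`"; `canShowAd` is illustrative, and a production system would more likely answer this with an indexed query than with an application-side scan.

```javascript
// Profile document from the slide (quotes normalized to valid JavaScript).
const profile = {
  cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: "ad1", time: 123 },
        { impression: "ad2", time: 232 },
        { click: "ad2", time: 235 },
        { add_to_cart: "laptop", sku: "asdf23f", time: 254 },
        { purchase: "laptop", time: 354 },
      ],
    },
  },
};

// Hypothetical capping rule: count this user's impressions of `adId`
// for `advertiser` at or after `sinceTime` and compare against `cap`.
function canShowAd(profile, advertiser, adId, cap, sinceTime = 0) {
  const actions = profile.advertiser?.[advertiser]?.actions ?? [];
  const seen = actions.filter(
    (a) => a.impression === adId && a.time >= sinceTime
  ).length;
  return seen < cap;
}

console.log(canShowAd(profile, "apple", "ad2", 1)); // false: ad2 already seen once
console.log(canShowAd(profile, "apple", "ad2", 2)); // true
console.log(canShowAd(profile, "apple", "ad1", 1, 200)); // true: ad1 was seen at 123
```

Because every action for one advertiser lives in a single array inside one document, the whole check needs one document read.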
Product Data
E-Commerce Product Catalog
• Diverse product portfolio
• Complex querying and filtering
Flash Sales
• Scale for short bursts of high volume traffic
• Scalable, but consistent view of inventory
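One common way to keep a consistent view of inventory during a flash sale is a conditional decrement: the update only matches while enough stock remains, and because a single-document update in MongoDB is atomic, concurrent buyers cannot oversell. The sketch models that single step in memory; the `inventory` collection, the `flash-001` sku, and `reserve` are hypothetical.

```javascript
// Hypothetical inventory documents keyed by sku.
const inventory = new Map([["flash-001", { sku: "flash-001", qty: 2 }]]);

// In-memory model of:
//   db.inventory.updateOne({ sku: sku, qty: { $gte: n } }, { $inc: { qty: -n } })
// Returns true if the reservation was applied (modifiedCount === 1 in the shell).
function reserve(sku, n = 1) {
  const item = inventory.get(sku);
  if (!item || item.qty < n) return false; // filter did not match: no update
  item.qty -= n; // $inc applied to the matched document
  return true;
}

const results = [reserve("flash-001"), reserve("flash-001"), reserve("flash-001")];
console.log(results, inventory.get("flash-001").qty); // [ true, true, false ] 0
```

Putting the stock check in the filter, rather than reading the quantity first and writing it back later, is what prevents two buyers from both claiming the last unit.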
Product Data
{ sku: "00e8da9b", type: "MP3", details: { artist: "John Coltrane", title: "A love supreme", length: 123 } }
{ sku: "00a9f3a", type: "Book", details: { author: "David Eggers", title: "You shall know our velocity", isbn: "0-9703355-5-5" } }
• Flexible data model for similar, but different objects
• Indexing and rich query API for easy searching and sorting
db.products.find({ "details.author": "David Eggers" }).sort({ "details.title": -1 });
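The query relies on dot notation to reach fields inside the embedded `details` document. A minimal in-memory sketch of that behavior, for illustration only: `getPath` and `findAndSort` are hypothetical helpers, and unlike MongoDB they do not traverse arrays along the path.

```javascript
// Product documents from the slide (quotes normalized to valid JavaScript).
const products = [
  { sku: "00e8da9b", type: "MP3",
    details: { artist: "John Coltrane", title: "A love supreme", length: 123 } },
  { sku: "00a9f3a", type: "Book",
    details: { author: "David Eggers", title: "You shall know our velocity", isbn: "0-9703355-5-5" } },
];

// Resolve a dotted path such as "details.author" inside a nested document.
// Simplification: MongoDB's dot notation also reaches into arrays; this does not.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

// Filter on equality at `field`, then sort by `sortField`
// (direction 1 = ascending, -1 = descending, as in MongoDB's sort()).
function findAndSort(docs, field, value, sortField, direction) {
  return docs
    .filter((d) => getPath(d, field) === value)
    .sort((a, b) => {
      const x = getPath(a, sortField);
      const y = getPath(b, sortField);
      return x < y ? -direction : x > y ? direction : 0;
    });
}

const byEggers = findAndSort(products, "details.author", "David Eggers", "details.title", -1);
console.log(byEggers.map((p) => p.sku)); // [ '00a9f3a' ]
```

The MP3 document has no `details.author` field, so it simply does not match; differently shaped products can share one collection without sparse columns.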
Content Management
News Site
• Comments and user generated content
• Personalization of content, layout
Multi-Device Rendering
• Generate layout on the fly for each device that connects
• No need to cache static pages
Sharing
• Store large objects
• Simple modeling of metadata
Content Management
{ camera: "Nikon d4", location: [ -122.418333, 37.775 ] }
{ camera: "Canon 5d mkII", people: [ "Jim", "Carol" ], taken_on: ISODate("2012-03-07T18:32:35.002Z") }
{ origin: "facebook.com/photos/xwdf23fsdf", license: "Creative Commons CC0", size: { dimensions: [ 124, 52 ], units: "pixels" } }
• Flexible data model for similar, but different objects
• Horizontal scalability for large data sets
• Geospatial indexing for location-based searches
• GridFS for large object storage
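A location-based search asks "which documents lie within some distance of this point?". The sketch answers that with a brute-force scan and the haversine great-circle formula, which is a swapped-in technique for illustration: MongoDB instead uses a geospatial index (`2dsphere`, or the legacy `2d`) with operators such as `$near` and `$geoWithin`. It assumes `location` holds `[longitude, latitude]`, matching the slide's first document; `nearby` and the search point are hypothetical.

```javascript
// Photo documents from the slide (quotes normalized to valid JavaScript).
const photos = [
  { camera: "Nikon d4", location: [-122.418333, 37.775] },
  { camera: "Canon 5d mkII", people: ["Jim", "Carol"], taken_on: new Date("2012-03-07T18:32:35.002Z") },
  { origin: "facebook.com/photos/xwdf23fsdf", license: "Creative Commons CC0",
    size: { dimensions: [124, 52], units: "pixels" } },
];

const EARTH_RADIUS_KM = 6371; // mean Earth radius

// Great-circle distance between two [longitude, latitude] pairs (haversine).
function haversineKm([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Documents with a location within `maxKm` of `center`, nearest first.
// Documents without a location field are skipped.
function nearby(docs, center, maxKm) {
  return docs
    .filter((d) => Array.isArray(d.location))
    .map((d) => ({ doc: d, km: haversineKm(center, d.location) }))
    .filter((r) => r.km <= maxKm)
    .sort((a, b) => a.km - b.km)
    .map((r) => r.doc);
}

console.log(nearby(photos, [-122.4194, 37.7749], 1).map((p) => p.camera)); // [ 'Nikon d4' ]
```

A scan like this touches every document; the point of a geospatial index is to avoid exactly that as the collection grows.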
User Data Management
Video Games
• User state and session management
Social Graphs
• Scale out to large graphs
• Easy to search and process
Identity Management
• Authentication, Authorization and Accounting
User Game State
• Flexible documents support new game features without schema migration
• Sharding enables the whole data set to be in memory, ensuring low latency
• JSON data model maps well to HTML5/JS and Flash-based clients
• Easy to store entire player state in a single document
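When the whole player state is one document, a game action becomes one update against that document, and a new feature can add fields on the fly. The sketch applies a tiny, simplified subset of MongoDB's update operators (`$set`, `$inc`, `$push` with dotted paths) in memory; the player document, the `players` collection, and the `pets` feature are hypothetical, and this is not how the server implements updates.

```javascript
// Hypothetical player document: the whole game state lives in one document.
const player = {
  _id: "p1",
  name: "player_one",
  stats: { level: 3, xp: 1200 },
  inventory: ["sword"],
};

// Walk to the parent object of a dotted path, creating objects as needed.
function parentOf(doc, path) {
  const keys = path.split(".");
  const last = keys.pop();
  let node = doc;
  for (const k of keys) {
    if (node[k] === undefined) node[k] = {};
    node = node[k];
  }
  return [node, last];
}

// Simplified in-memory subset of MongoDB's $set, $inc and $push operators.
function applyUpdate(doc, update) {
  for (const [path, value] of Object.entries(update.$set || {})) {
    const [node, key] = parentOf(doc, path);
    node[key] = value;
  }
  for (const [path, delta] of Object.entries(update.$inc || {})) {
    const [node, key] = parentOf(doc, path);
    node[key] = (node[key] || 0) + delta;
  }
  for (const [path, value] of Object.entries(update.$push || {})) {
    const [node, key] = parentOf(doc, path);
    (node[key] = node[key] || []).push(value);
  }
  return doc;
}

// Shell equivalent (hypothetical collection):
//   db.players.updateOne({ _id: "p1" }, { $inc: { "stats.xp": 50 },
//     $push: { inventory: "shield" }, $set: { "pets.dragon": { level: 1 } } })
applyUpdate(player, {
  $inc: { "stats.xp": 50 },
  $push: { inventory: "shield" },
  $set: { "pets.dragon": { level: 1 } }, // new feature, no schema migration
});
console.log(player);
```

The `pets` subdocument simply appears on this player; other players' documents are untouched until they use the feature.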
Social Graph
• Documents enable disk locality of all profile data for a user
• Sharding partitions user profiles across available servers
• Native support for arrays makes it easy to store connections inside the user profile
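With connections embedded as an array in each profile, common graph questions reduce to a few profile lookups. A minimal in-memory sketch, assuming hypothetical profiles with a `friends` array of `_id`s; in MongoDB the friend profiles for one user could be fetched with a single `$in` query on `_id`.

```javascript
// Hypothetical user profiles; each stores its connections as an array of _ids.
const users = new Map([
  ["alice", { _id: "alice", name: "Alice", friends: ["bob", "carol"] }],
  ["bob", { _id: "bob", name: "Bob", friends: ["alice", "carol", "dave"] }],
  ["carol", { _id: "carol", name: "Carol", friends: ["alice", "bob"] }],
  ["dave", { _id: "dave", name: "Dave", friends: ["bob"] }],
]);

// Connections shared by two users: intersect their embedded arrays.
function mutualFriends(a, b) {
  const other = new Set(users.get(b).friends);
  return users.get(a).friends.filter((id) => other.has(id));
}

// Friend-of-friend suggestions: one profile lookup per direct friend,
// excluding the user and anyone already connected.
function suggestions(id) {
  const me = users.get(id);
  const direct = new Set(me.friends);
  const out = new Set();
  for (const f of me.friends) {
    for (const ff of users.get(f).friends) {
      if (ff !== id && !direct.has(ff)) out.add(ff);
    }
  }
  return [...out];
}

console.log(mutualFriends("alice", "bob")); // [ 'carol' ]
console.log(suggestions("alice")); // [ 'dave' ]
```

Each lookup reads one self-contained profile, which is what lets the profiles be partitioned across servers without cross-shard joins.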
IS MY USE CASE A GOOD FIT FOR MONGODB?
Good fits for MongoDB
(Application characteristic: why MongoDB might be a good fit)
• Large number of objects to store: sharding lets you split objects across multiple servers.
• High write or read throughput: sharding plus replication lets you scale read and write traffic across multiple servers.
• Low latency access: the memory-mapped storage engine caches documents in RAM, enabling in-memory performance. Data locality of documents can significantly improve latency over join-based approaches.
• Variable data in objects: dynamic schemas and the JSON data model enable flexible data storage without sparse tables or complex joins.
• Cloud-based deployment: sharding and replication let you work around hardware limitations in the cloud.
Thanks!