Post on 15-Jul-2015
GHX proprietary information: Please do not copy or distribute
GHX Use of MongoDBBill Perkins: Director of Data and Architecture
Jeff Sherard: Manager of Database Services
1
What is a GHX?
• 500K+ documents a day– Scaling to many millions
• Saved $5.5 Billion dollars over last 5 years
Healthcare Providers
Healthcare Providers
Healthcare Providers
Healthcare Providers
1000’s
Healthcare Providers
Healthcare Providers
Healthcare Suppliers
100’s
85% of US marketPurchase OrdersPurchase Order AcknowledgementAdvanced Shipment NoticeInvoiceContractItem MasterVendor Master…
GHX proprietary information: Please do not copy or distribute
Yah so, tell me what the Mongo selection process felt like?
4
Weighted Selection Criteria
8
Criteria Weight
Scalability
Development throughput
Flexible Data Model
Developer Impact
Database Services Impact
3
1
3
2
2
Applicability of Different NoSQL Data Models
Characteristics (1-4, lower is better)
Data Model
Pe
rfo
rman
ce
Scal
abili
ty
Flex
ibili
ty
Co
mp
lexi
ty
Ind
exSu
pp
ort
Examples Applicability
Key-Value Stores
1 3 1 1 4
Oracle NoSQL , Redis, Riak, Membrain, LevelDB, Voldemort, Castle, Aerospike
LOW Stores a key associated with a binary. Good for cache. We will need multiple keys on objects,
Column Store
1 1 2 3 3Cassandra, HBase,Hypertable, Acuru
MEDIUM Very scalable, good tooling support, no built-in second index, steep learning curve
Document Store
2 2 1 1 1MongoDB, Couchbase, RavenDB, RethinkDB, CouchDB
HIGH Fields everything, flexible model, supports multiple indexes. Low learning. Matches our model.
Graph Database
2 3 4 4 2Neo4j, InfiniteGraph, OrientDB
LOW Very good at quick storage. Queries are harder. Horizontal scale is questionable. Our data is not graph oriented.
9
DBs that made first cut
Platform Applicability Community / Market Traction
Oracle noSQL
LOW Low traction, potential community is high due to Oracle. This is BerkleyDB with an Oracle wrapper. May be good for cache.
Cassandra MEDIUM High traction, good community. Companies are using this to solve very large problems. Scales well. Does not support secondary indexes natively. Good tool support. Good integration with ETL / BI tools.
Hbase MEDIUM High traction, good community. No one is using this successfully for OLTP problems. Very complex.
MongoDB HIGH High traction, high community. Lots of features including many types of indexes. Leading in deployments and growing fastest. Developer friendly. Tooling for operations is command line.
10
GHX proprietary information: Please do not copy or distribute
So there, what was it like when Mongo battled Cassandra?
11
Performance Test Results• Mongo config:
– Primary and 2 replicas
• Cassandra config:
– 3 node ring
• Mongo outperformed Cassandra in each test
• Different architectures: Mongo performs best in read heavy scenarios, while Cassandra is the opposite.
• Mongo’s latency was better on each test.
13
0
5000
10000
15000
20000
25000
30000
35000
40000
99/1 90/10 70/30 50/50 30/70
OPS by Read/Write %
MongoDB Datastax
0
1
2
3
4
5
6
7
99/1 90/10 70/30 50/50 30/70
Latency by Read/Write %
MongoDB Datastax
Mongo: Index test results• Added 3 secondary indexes to mongo collection
• YCSB test reflects the cost of index management in writes, but does not use secondary indexes for reads
14
99/1 90/10 70/30 50/50 30/70
Primary Key Only 36274 29244 21042 17381 15270
Add 3 Indexes 31151 23297 15158 11742 9142
Performance Drop 14% 20% 28% 32% 40%
0
5000
10000
15000
20000
25000
30000
35000
40000
Op
era
tio
ns
pe
r Se
con
d
0
1000
2000
3000
4000
5000
6000
7000
8000
99/1 30/70 50/50 70/30 90/10
Read/write ratio
PerformancePerformance w/ indexes
Thro
ugh
pu
tLa
ten
cy
Mongo: Sharding test results
• 90/10 read / write ratio
• This was not the same configuration as previous results. We are interested in the gain, not the raw value
15
Application Server(YCSB)
MongoS (router)
MongoD(shard 1)
MongoD(shard 2)
Application Server(YCSB)
MongoS (router)
Shards Ops/Second Gain %
1 12K ----
2 22.6K 88.3%
Rolling up Results to Match our Requirements
16
Criteria Weight Mongo Cassandra
Scalability
Development throughput
Flexible Data Model
Developer Impact
Database Services Impact
3
1
3
2
2
GHX proprietary information: Please do not copy or distribute
Yah so, what does your data look like?
17
Event Driven Work Broker
Work Broker
Worker A
Worker A
Worker A
Worker A
Worker A
Worker B
Worker A
Worker A
Worker C
2. Broker assigns it toexactly one worker
Event A
1. Reads events from Mongo
Work Event B
3. Work Log and Event B generated by Worker A
4. Repeat…
Distributed Event Broker
• Many readers reading from mongo trying to assign to workers on as fast as they can take the work
…
Assignment Approach 1: FindandModify
• Steps
– FindandModify
• Find event of certain type that are not assigned
• Mark them as assigned
• Do this BATCH_SIZE times
• Result
– Really high WriteLock% (90+)
– Upset stakeholder
– **mongo did not recommend this
Assignment Approach 2: N Updates and a find
• Steps
– Update event of certain type that are not assigned
• Mark them as assigned with batch ID
• Do this BATCH_SIZE times
– Find all with batch ID
• Result
– Lots of loss # modify > # find
– Upset stakeholder
– Only tested this, never used in prod
Assignment Approach 3: Find-Modify-Find
• Steps– Find event _ID where event unassigned and of
certain type limit BATCH_SIZE
– Modify set of events found by writing a batch ID to assign it
– Find the modified events by batch ID
• Result– WriteLock% went down
– Lots of loss• Count of # found > # modified > # found
Assignment Approach 4: Don’t do it
• Steps– Assign workers a range of events on startup
– Mark every event with a random number token (1-1000)
– Worker updates an event with token that is in its range
• Results (moving to this now)– No conflict or loss expected
– WorkerManager rebalances ranges, monitors health of workers
Throughput Challenges
• 25MM customer documents * 20 activities per doc * 4K per activity = 2TB per day throughput
• 90,000 operations per second at peak load
Scaling
• 7 to 10 shards (21-30 Mongod)– 9,000 to 12,700 db operations per shard per second
Execution
< 1 day
Audit
30 days
Archive
quite a while
GHX proprietary information: Please do not copy or distribute
Yah good, how do you deploy this thing?
26
GHX proprietary information: Please do not copy or distribute
So there, how does ops feel about Mongo after being in prod
for 9 months?
29