MongoDB and the MEAN Stack

Posted on 18-Jun-2015


Ger Hartnett & Alan Spencer, MongoDB Dublin

Overview

• A fictional story of a startup using MongoDB & the MEAN stack to build an IoT application

• We’ll take a devops perspective and show you what to watch out for in a framework like MEAN

• Tips you can use to help the development team focus on the right things when close to production

• Questions: How many of you are from operations? How many from development?

5 Things we Learned

• Capacity planning/prototyping is a good idea, but performance is sensitive to sample test data

• The MEAN stack rocks: it’s fast to get started, but the profiler can help you understand what’s under the hood

• Realtime/incremental aggregation works well with IoT workloads - the “MMS approach”

• With Node.js/Express, the number of app servers becomes the bottleneck before MongoDB does

• Performance tuning patterns apply: “bottleneck whack-a-mole” & “slam-dunk optimization”

Context: IoT & MEAN

Internet of Things

“The rise of device oriented development … new architectural and workflow challenges … distinctly different from … web and mobile development so far.” - Morten Bagai

Big Data => Humongous Data


Internet of Things

• Bosch: “IoT brings root and branch changes to the world of business”

• Richard Kreuter's webinar, May 2013

• An earlier bootcamp project looked at sharding for IoT

Photo by jurvetson - Creative Commons Attribution License - http://www.flickr.com/photos/jurvetson/916142


MEAN stack

MongoDB - the database

Express - web app framework/router

Angular - browser HTML/JS MVC

Node - JavaScript application server

Photo by benmizen - Creative Commons ShareAlike License - http://www.flickr.com/photos/benmizen/9456440635

Learn more about MEAN

• Valeri Karpov (MongoDB Kernel Tools Team): http://thecodebarbarian.wordpress.com/2013/07/22/introduction-to-the-mean-stack-part-one-setting-up-your-tools/

• MEAN.io: http://mean.io

About MongoDB Bootcamp

We invest in technical new hires. Everyone does “bootcamp”:

• NYC for 2 weeks, learning the product internals

• Then a longer project of 3-4 weeks

• In our case, we wanted to do a bit of everything: capacity planning, iterating on user stories, with MongoDB as one component

The Application

Location-based advertising - IoMT

• IoT example 3 from Richard’s webinar

[Diagram: a customer surrounded by nearby advertisers]

User Stories - for the application

• US1 - customer looks for advertisers nearby

• US2 - advertiser wants to see how many customers saw the offer

• US3 - find hot spots where there are many customers but few advertisers

Photo by consumerist - Creative Commons Attribution License - http://www.flickr.com/photos/consumerist/2158190589

Document / Model / Controller

Document:

{ name: 'Long Hall', pos: [-6.265535, 53.3418364], kind: 'pub' }

Model (advertiser.js):

var mongoose = require('mongoose'),
    Schema = mongoose.Schema;

var AdvertiserSchema = new Schema({
  name: { type: String, default: '' },
  pos:  [Number],                          // [longitude, latitude]
  kind: { type: String, default: 'place' }
});

mongoose.model('Advertiser', AdvertiserSchema);

Controller (advertisers.js):

var mongoose = require('mongoose'),
    Advertiser = mongoose.model('Advertiser');

exports.all = function(req, res) {
  var findQuery = { near: [ Number(req.query.lng), Number(req.query.lat) ],
                    maxDistance: Number(req.query.dist) };
  Advertiser.geoSearch({ kind: 'pub' }, findQuery, function (err, advertisers) {
    if (err) { return res.status(500).jsonp({ error: err.message }); }
    res.jsonp(advertisers);
  });
};

The haystack examples sent us in the wrong direction initially.
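The controller passes `Number(req.query.…)` straight into the query, so a missing or malformed parameter becomes `NaN` and reaches MongoDB. A minimal sketch of validating the query string first is below; `parseNumber` and `buildGeoSearchOptions` are hypothetical helpers, not part of Express or Mongoose, and the ranges assume the `[longitude, latitude]` order used in the Advertiser documents.

```javascript
// Hypothetical helpers: validate ?lng=&lat=&dist= before calling Advertiser.geoSearch.

function parseNumber(value, name, min, max) {
  // Express query values are strings (or arrays when a key is repeated)
  if (typeof value !== 'string' || value.trim() === '') {
    throw new RangeError(name + ' is required');
  }
  const n = Number(value);
  // rejects NaN, Infinity and out-of-range values
  if (!Number.isFinite(n) || n < min || n > max) {
    throw new RangeError(name + ' must be a number between ' + min + ' and ' + max);
  }
  return n;
}

function buildGeoSearchOptions(query) {
  const lng = parseNumber(query.lng, 'lng', -180, 180);
  const lat = parseNumber(query.lat, 'lat', -90, 90);
  const dist = parseNumber(query.dist, 'dist', 0, Number.MAX_VALUE); // must not be negative
  // legacy coordinate pairs are [longitude, latitude]
  return { near: [lng, lat], maxDistance: dist };
}
```

In the controller this would replace the inline `findQuery` object, with a `RangeError` mapped to an HTTP 400 response instead of a failed database call.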

CRUD interface & Mongoose

[Screenshot: CRUD interface]

We raised & fixed a bug in Mongoose; the pull request was merged.


US1 Initial Measurements

• MongoDB shell scripts: 9 advertisers, small area, distance 10km

• MongoDB has 5 kinds of geo query and 3 kinds of geo index

• geoSearch (haystack) looked much better than the others (our first mistake)

• TIP: performance is sensitive to the test data & the query
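For comparison with the haystack `geoSearch` command, a `$nearSphere` filter on the same `pos` field might be built as below. This is a sketch assuming a `2d` or `2dsphere` index on `pos` and legacy `[lng, lat]` pairs; with legacy pairs `$nearSphere` takes `$maxDistance` in radians, so kilometres are divided by the Earth's radius (6378.1 km is the approximate value MongoDB's documentation uses for this conversion). `nearSphereFilter` is a hypothetical helper name.

```javascript
// Hypothetical helper: a $nearSphere filter for advertisers stored with legacy [lng, lat] pairs.
const EARTH_RADIUS_KM = 6378.1; // approximate radius used for km -> radians conversion

function nearSphereFilter(lng, lat, maxKm, kind) {
  const filter = {
    // results come back sorted by distance from [lng, lat]
    pos: { $nearSphere: [lng, lat], $maxDistance: maxKm / EARTH_RADIUS_KM }
  };
  if (kind !== undefined) {
    filter.kind = kind; // optional extra condition, e.g. 'pub'
  }
  return filter;
}
```

The resulting object can be passed to `find()` in the Node driver or in Mongoose; unlike `geoSearch`, it does not depend on a haystack bucket size.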


The good thing about frameworks is… they do lots of things for developers.

…and the bad thing about frameworks? They do lots of things for developers.

To find out what’s happening - debug

We used Express passport-http to add Basic/Digest auth (client id lookup). It can be hard to figure out what a framework like Express/Mongoose really does.

Tip: mongoose.set('debug', true) gives detailed logging on the console:

Mongoose: clients.findOne({ _id: ObjectId("…") })
Mongoose: advertisers.geoHaystack({…[-6.267765, 53.34087]})

Find out what’s happening - profiler

db.system.profile.find()
{"op": "query", "ns": "tings.clients", ...
{"op": "command", "command": {"geoSearch" ...
{"op": "update", "ns": "tings.sessions" ...

Where did that sessions update come from? Fixing it is not obvious:

exports.all = function(req, res) {
  . . .
  req.session = null;   // drop the session so nothing is written to tings.sessions
  res.jsonp(advertisers);
};

Result: a 10% performance improvement.

Tip: the MongoDB profiler shows the operations really happening on the DB; check them with the developers.
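Once profiling is on (for example `db.setProfilingLevel(2)` in the mongo shell records all operations), a quick way to spot traffic nobody expected, such as the `tings.sessions` updates, is to group the profile entries by operation and namespace. The sketch below assumes the `system.profile` documents have already been fetched into an array; `summariseProfile` is a hypothetical helper and the sample entries in the test are made up.

```javascript
// Hypothetical helper: group system.profile entries by op + namespace.
function summariseProfile(entries) {
  const byKey = new Map();
  for (const e of entries) {
    const key = e.op + ' ' + e.ns;
    const s = byKey.get(key) || { key: key, count: 0, totalMillis: 0 };
    s.count += 1;
    s.totalMillis += e.millis || 0; // millis = server time spent on the operation
    byKey.set(key, s);
  }
  // most expensive first, so unexpected operations stand out
  return Array.from(byKey.values()).sort((a, b) => b.totalMillis - a.totalMillis);
}
```

Showing this summary to the developers makes it easier to ask "which code path issues this?" for each line.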

Back to the application

• US1 - customer looks for advertisers nearby
  • Need to store the customer location

• US2 - advertiser wants to see how many customers are nearby
  • US2 means we built on US1

Being a startup, we decided to take a naive, pragmatic approach:
• Store all samples
• US2 aggregates on demand
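The naive US2 query can be sketched as an aggregation pipeline over the raw samples. This assumes, hypothetically, that each sample is stored as `{ advertiserId, customerId, ts, pos }` and records which advertiser's offer the customer saw; the field names and `uniqueCustomersPipeline` are illustrative, not the authors' schema.

```javascript
// Hypothetical: naive on-demand aggregation over raw samples for US2.
function uniqueCustomersPipeline(advertiserId, since, until) {
  return [
    // this advertiser's samples in the time window
    // (an index on { advertiserId: 1, ts: 1 } would help this stage)
    { $match: { advertiserId: advertiserId, ts: { $gte: since, $lt: until } } },
    // one group per distinct customer
    { $group: { _id: '$customerId' } },
    // count the groups
    { $group: { _id: null, uniqueCustomers: { $sum: 1 } } }
  ];
}
```

The pipeline would be passed to `aggregate()` on the samples collection. Every call has to read all raw samples in the window, which is why its cost grows with the insert rate.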


US2 - Aggregation of Raw Samples

• 1 hour of raw samples @ 2k RPS = 7.2M documents

• Aggregation on 7.2M raw samples took 1 second on our instances

• Significant impact: running it every 2 seconds, RPS dropped by a factor of 4! (single instance)

[Diagram: raw inserts go into Samples; queries aggregate over Samples on demand]

US2 - Pre-aggregation

[Diagram: raw inserts still go into Samples, and each sample also updates pre-aggregated documents; queries aggregate over the pre-aggregated documents instead of the raw samples]

An MMS-type approach:
• One document per advertiser-customer-month
• Using update with multi: true (more on this later)
• The query now only needs to aggregate unique customers
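A minimal sketch of the "one document per advertiser-customer-month" idea: each incoming sample becomes an upsert with `$inc` counters, and US2 counts matching documents, since each document is one distinct customer. The field names (`advertiserId`, `customerId`, `ts`, `daily`, `total`) and helper names are hypothetical, and this sketch uses one single-document upsert per sample rather than the `multi: true` update mentioned on the slide.

```javascript
// Hypothetical pre-aggregation helpers for US2.
function monthKey(date) {
  return date.getUTCFullYear() + '-' + String(date.getUTCMonth() + 1).padStart(2, '0');
}

// Upsert for one incoming sample: { advertiserId, customerId, ts }
function preAggregateUpdate(sample) {
  const month = monthKey(sample.ts);
  return {
    filter: { _id: sample.advertiserId + ':' + sample.customerId + ':' + month },
    update: {
      // written only when the month document is first created
      $setOnInsert: { advertiserId: sample.advertiserId, customerId: sample.customerId, month: month },
      // running total plus a per-day counter inside the month document
      $inc: { total: 1, ['daily.' + sample.ts.getUTCDate()]: 1 }
    },
    options: { upsert: true }
  };
}

// US2: the number of matching documents is the number of unique customers.
function uniqueCustomersFilter(advertiserId, month) {
  return { advertiserId: advertiserId, month: month };
}
```

With the Node driver the pieces map onto `updateOne(filter, update, options)` for each sample and a document count over `uniqueCustomersFilter(...)` for the query.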

US1 measurements revisited

• MongoDB shell scripts

• More realistic data: the old measurements repeated the same locations

• 110k advertisers, with clusters in DUB and NYC

• Performance was best for $near and $nearSphere (2x better than haystack)
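Because the results depended so much on the test data, it helps to generate it reproducibly. The sketch below produces advertisers clustered around Dublin and New York with a seeded PRNG (mulberry32) and Box-Muller Gaussian offsets; the city coordinates, the 0.05° spread and the `kind` mix are illustrative assumptions, not the authors' data set.

```javascript
// Hypothetical test-data generator: clustered advertisers, reproducible from a seed.

// mulberry32: small deterministic PRNG returning floats in [0, 1)
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// approximate city centres as [longitude, latitude]
const CLUSTERS = [
  { name: 'DUB', center: [-6.2603, 53.3498] },
  { name: 'NYC', center: [-73.9857, 40.7484] }
];

function generateAdvertisers(count, seed, sigmaDeg = 0.05) {
  const rand = mulberry32(seed);
  // Box-Muller: standard normal sample; 1 - rand() avoids log(0)
  const gauss = () => Math.sqrt(-2 * Math.log(1 - rand())) * Math.cos(2 * Math.PI * rand());
  const docs = [];
  for (let i = 0; i < count; i++) {
    const c = CLUSTERS[i % CLUSTERS.length]; // alternate between the clusters
    docs.push({
      name: 'adv-' + i,
      kind: i % 4 === 0 ? 'pub' : 'place',
      pos: [c.center[0] + gauss() * sigmaDeg, c.center[1] + gauss() * sigmaDeg],
      cluster: c.name
    });
  }
  return docs;
}
```

Calling `generateAdvertisers(110000, 42)` gives a data set of the size mentioned on the slide; the same seed always yields the same documents, so runs against different indexes or queries stay comparable.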

Where does the time go?

Stages measured:
• Express/Mongoose/Node
• Customer lookup
• Find ($near)
• Save sample to DB
• Save sample to file
• Pre-agg = multiple docs (6)
• Pre-agg = multi-update of 1 doc
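A breakdown like this can be collected in the app server by timing each stage of a request. The sketch below is a hypothetical `StageTimer` helper (not part of Express or Mongoose) that accumulates wall-clock time per stage name with `process.hrtime.bigint()` and works for both synchronous and promise-returning stages; it needs Node 10.7 or later.

```javascript
// Hypothetical helper: accumulate wall-clock time per request stage.
class StageTimer {
  constructor() {
    this.totalsNs = new Map(); // stage name -> accumulated nanoseconds (BigInt)
  }

  add(stage, ns) {
    this.totalsNs.set(stage, (this.totalsNs.get(stage) || 0n) + ns);
  }

  // Times fn; the time is recorded even if fn throws or its promise rejects.
  time(stage, fn) {
    const start = process.hrtime.bigint();
    const stop = () => this.add(stage, process.hrtime.bigint() - start);
    let result;
    try {
      result = fn();
    } catch (err) {
      stop();
      throw err;
    }
    if (result && typeof result.then === 'function') {
      return Promise.resolve(result).finally(stop);
    }
    stop();
    return result;
  }

  // Stages sorted by total time, largest first, in milliseconds.
  report() {
    return Array.from(this.totalsNs, ([stage, ns]) => ({ stage, ms: Number(ns) / 1e6 }))
      .sort((a, b) => b.ms - a.ms);
  }
}
```

Wrapping, say, the customer lookup, the `$near` find and the sample save in `timer.time('…', …)` and printing `timer.report()` at the end of a load test shows which stage dominates.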


Deployment

[Diagram: Chrome (Postman) and NodeLoad as clients, HAProxy, several Node.js app servers, and a single mongod]


Scaling


Pattern: “slam-dunk optimization”

[Diagram: the same deployment (clients, HAProxy, Node.js servers, mongod) with the bottlenecks numbered:]

1 - number of Node.js servers

2 - HAProxy

3 - load generator threads/bandwidth

Performance tips

1. Increase the number of Node.js servers
2. Increase the performance of the proxy/balancer instance; HAProxy was more balanced than Amazon ELB
3. Tweak Nodeload (generates/measures REST load):
   • Nodeload concurrency = 3x the number of Node servers
   • Run Nodeload on the same machine as HAProxy

Development recommendation: the Postman Chrome extension generates REST requests with Basic Auth

Back to the application

US3 Overview

What are the top 10 hot sales areas?

• What is an “area”…?

Requirements:
• Little impact, easy to calculate
• Approximately regular size
• Optimal approximate distance - “bounding areas”
• Plays nice with sharding

Candidates: internals of haystack, 2dsphere? Polygon? MGRS?


US3 - Hot box - Sales, go sell!
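Once customers and advertisers are counted per area (for example per “box” key), the top-10 question becomes a ranking. The score used below, customers per advertiser with +1 to avoid dividing by zero, is an assumption for illustration, not the authors' formula; `topHotAreas` is a hypothetical helper.

```javascript
// Hypothetical US3 ranking: many customers, few advertisers.
// customerCounts / advertiserCounts map area key -> count.
function topHotAreas(customerCounts, advertiserCounts, n = 10) {
  return Object.keys(customerCounts)
    .map((area) => {
      const customers = customerCounts[area];
      const advertisers = advertiserCounts[area] || 0;
      // assumed score: customers per advertiser (+1 avoids division by zero)
      return { area, customers, advertisers, score: customers / (advertisers + 1) };
    })
    // highest score first; ties broken by area key so the output is deterministic
    .sort((a, b) => b.score - a.score || a.area.localeCompare(b.area))
    .slice(0, n);
}
```

The counts themselves could come from grouping customer samples and advertisers by area key in MongoDB; only the final ranking is shown here.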

MGRS - Military Grid Reference System

• 4QFJ123678: precision level 100m

Image by Mikael Rittri - Creative Commons ShareAlike License http://en.wikipedia.org/wiki/File:MGRSgridHawaiiSchemeAARealigned.png

MGRS - But at the poles…

Image by Mikael Rittri - Creative Commons ShareAlike License http://en.wikipedia.org/wiki/File:MGRSgridNorthPole.png

Introducing the ‘box’


The “box” - the poor man’s MGRS

• Reinvented the sphere
• Long/lat -> box number
• Tailored to a specific distance
• Boxes are at least 1km
• Search in the current box and the 8 neighbouring boxes
• Filter out points outside the circle in JS
• Performed relatively well
• Can be used to shard
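One way such a box scheme could work is sketched below, under a spherical-Earth assumption: latitude is cut into rows of equal height, each row gets its own longitude step so every box is at least `sizeKm` wide, and a search uses the current box plus its 8 neighbours, followed by an exact distance filter in JS. This is not the authors' implementation; all names (including a `box` field on the documents) are hypothetical, and the poles and the antimeridian are not handled. For a search radius up to the box size, the nine boxes approximately cover the search circle.

```javascript
// Hypothetical "box" helpers (spherical Earth; poles and antimeridian not handled).
const MEAN_EARTH_RADIUS_KM = 6371;
const KM_PER_DEGREE = (Math.PI * MEAN_EARTH_RADIUS_KM) / 180; // ~111.2 km

// Rows are latitude bands of height sizeKm.
function rowOf(lat, sizeKm) {
  return Math.floor(lat / (sizeKm / KM_PER_DEGREE));
}

// Longitude width for a row, measured at its poleward edge,
// so that every box in the row is at least sizeKm wide.
function lonStepForRow(row, sizeKm) {
  const latStep = sizeKm / KM_PER_DEGREE;
  const polewardLat = Math.min(Math.max(Math.abs(row * latStep), Math.abs((row + 1) * latStep)), 89.9);
  return sizeKm / (KM_PER_DEGREE * Math.cos((polewardLat * Math.PI) / 180));
}

function boxKey(lng, lat, sizeKm) {
  const row = rowOf(lat, sizeKm);
  const col = Math.floor((lng + 180) / lonStepForRow(row, sizeKm));
  return row + ':' + col;
}

// Current box plus 8 neighbours; each row has its own step, so the column is recomputed per row.
function neighbourhoodKeys(lng, lat, sizeKm) {
  const row = rowOf(lat, sizeKm);
  const keys = [];
  for (let r = row - 1; r <= row + 1; r++) {
    const col = Math.floor((lng + 180) / lonStepForRow(r, sizeKm));
    for (let c = col - 1; c <= col + 1; c++) keys.push(r + ':' + c);
  }
  return keys;
}

// Great-circle distance in km (haversine formula).
function haversineKm(lng1, lat1, lng2, lat2) {
  const rad = Math.PI / 180;
  const dLat = (lat2 - lat1) * rad;
  const dLng = (lng2 - lng1) * rad;
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dLng / 2) ** 2;
  return 2 * MEAN_EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Candidates would come from a query like { box: { $in: neighbourhoodKeys(...) } };
// points in the corners of the 3x3 block are then filtered out here.
function withinRadius(candidates, lng, lat, radiusKm) {
  return candidates.filter((d) => haversineKm(lng, lat, d.pos[0], d.pos[1]) <= radiusKm);
}
```

Storing `boxKey(...)` on each advertiser at insert time turns the geo search into an equality/`$in` lookup on an ordinary index, which is also what makes the key usable as a shard key.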

Replication

Impact of Replication

• Secondary reads
• Worked for this app
• Beware - don’t try this at home!

Apply the production notes

• Change from the default readahead
• Disable NUMA & THP (transparent huge pages)
• ext4 or XFS, mounted with noatime
• Load test the workload on different configurations:
  • Instance Store / EBS (PIOPS)
  • SSDs / spinning rust
  • AWS instance types

Recap

5 Things we Learned

• Capacity planning/prototyping is a good idea, but performance is sensitive to sample test data

• The MEAN stack rocks: it’s fast to get started, but the profiler can help you understand what’s under the hood

• Realtime/incremental aggregation works well with IoT workloads - the “MMS approach”

• Performance tuning patterns apply: “bottleneck whack-a-mole” & “slam-dunk optimization”

• With Node.js/Express, the number of app servers becomes the bottleneck before MongoDB does

Next Steps

• We plan to publish this as a blog post series and a GitHub project
• Check blog.mongodb.org
• Continue to explore…

Next Steps - continuation

• Hadoop/YARN for aggregations
• Use the “box” to geo-shard
• Try 2.6 bulk updates
• Dynamic angular-google-maps with socket.io
• Implement in another framework (Go/Clojure) to load MongoDB with less hardware
• Find the balance between batch and pre-aggregation (see next slide)

Learn More & Thank You

• Introduction to MEAN - Valeri Karpov: http://thecodebarbarian.wordpress.com/2013/07/22/introduction-to-the-mean-stack-part-one-setting-up-your-tools/

• MEAN.io: http://mean.io

• Richard Kreuter's webinar - M2M: http://www.mongodb.com/presentations/webinar-realizing-promise-machine-machine-m2m-mongodb

• Building MongoDB Into Your Internet of Things: http://blog.mongohq.com/building-mongodb-into-your-internet-of-things-a-tutorial/

• Schema design for time series data (MMS): http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb