Weather of the Century: Design and Performance

André Spiegel, Consulting Engineer, MongoDB (#MongoDB)

description

This talk walks you through how you can use MongoDB to store and analyze worldwide weather data from the entire 20th century in a graphical application.

Transcript of Weather of the Century: Design and Performance

Page 1: Weather of the Century: Design and Performance

Consulting Engineer, MongoDB

André Spiegel

#MongoDB

The Weather of the Century: Design and High Performance

Page 2: Weather of the Century: Design and Performance

What was the weather when you were born?

Page 5: Weather of the Century: Design and Performance

Data Format: Raw and in MongoDB

0303725053947282013060322517+40779-073969FM-15+0048KNYC V0309999C00005030485MN0080475N5+02115+02005100975ADDAA101000095AU100001015AW1105GA1025+016765999GA2045+024385999GA3075+030485999GD11991+0167659GD22991+0243859GD33991+0304859...

{
  "st" : "u725053",
  "ts" : ISODate("2013-06-03T22:51:00Z"),
  "airTemperature" : { "value" : 21.1, "quality" : "5" },
  "atmosphericPressure" : { "value" : 1009.7, "quality" : "5" }
}

Station Identifier (»NYC Central Park«)
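
The step from the raw record to the document can be sketched as fixed-width slicing. The column offsets below are an assumption: the slide does not give them, and they were chosen because they reproduce the document shown above from the sample line. The `u` prefix on the station identifier and the helper name `parseRecord` are likewise illustrative.

```javascript
// Truncated sample record from the slide (the elided tail "..." is left out).
const raw =
  "0303725053947282013060322517+40779-073969FM-15+0048KNYC " +
  "V0309999C00005030485MN0080475N5+02115+02005100975ADD";

// Hypothetical parser: offsets are assumed, not taken from the talk.
function parseRecord(line) {
  const date = line.slice(15, 23); // YYYYMMDD
  const time = line.slice(23, 27); // HHMM, treated as UTC
  const ts = new Date(Date.UTC(
    Number(date.slice(0, 4)),
    Number(date.slice(4, 6)) - 1, // JavaScript months are zero-based
    Number(date.slice(6, 8)),
    Number(time.slice(0, 2)),
    Number(time.slice(2, 4))
  ));
  return {
    st: "u" + line.slice(4, 10),
    ts,
    // Values are stored as signed integers scaled by 10.
    airTemperature: {
      value: Number(line.slice(87, 92)) / 10,
      quality: line.slice(92, 93),
    },
    atmosphericPressure: {
      value: Number(line.slice(99, 104)) / 10,
      quality: line.slice(104, 105),
    },
  };
}

const doc = parseRecord(raw);
console.log(doc);
```

Only the four fields shown on the slide are extracted; the remaining sections of the record (everything after `ADD`) are ignored in this sketch.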

Page 6: Weather of the Century: Design and Performance

How Big Is It?

• 2.5 billion data points

• 4 terabytes (1.6 kB per document)

• “moderately big”
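
As a quick check, the two figures above are consistent with each other. This is a back-of-the-envelope sketch that assumes "1.6k" means 1.6 kB in decimal units and that 1.6 kB is the average document size.

```javascript
const documents = 2.5e9;     // data points, from the slide
const avgDocBytes = 1.6e3;   // assumed: 1.6 kB average per document
const totalBytes = documents * avgDocBytes;
const totalTerabytes = totalBytes / 1e12;

console.log(`${totalTerabytes} TB`);
```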

Page 7: Weather of the Century: Design and Performance

How to do this with MongoDB?

Page 8: Weather of the Century: Design and Performance

First Deployment

• A single server with a really big disk

• Application: c3.8xlarge

• mongod: i2.8xlarge, 251 GB RAM, 6 TB SSD

Page 9: Weather of the Century: Design and Performance

Second Deployment

• A really big cluster where everything is in RAM

• Application / mongos: c3.8xlarge

• mongod: 100 x r3.2xlarge, each with 61 GB RAM and 100 GB disk
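
The "everything is in RAM" claim can be checked roughly against the data size of 4 terabytes. The sketch below assumes decimal units, an even spread of data across shards, and ignores indexes, storage overhead and memory used by the operating system, so it is an optimistic estimate.

```javascript
const GB = 1e9;
const TB = 1e12;

const dataBytes = 4 * TB;           // total data size from "How Big Is It?"
const singleServerRam = 251 * GB;   // first deployment
const shardCount = 100;             // second deployment
const ramPerShard = 61 * GB;

const clusterRam = shardCount * ramPerShard;
const dataPerShard = dataBytes / shardCount; // assumes an even distribution

const fitsInRam = (bytes, ram) => bytes <= ram;

console.log("single server holds all data in RAM:", fitsInRam(dataBytes, singleServerRam));
console.log("each shard holds its share in RAM:", fitsInRam(dataPerShard, ramPerShard));
console.log("data per shard (GB):", dataPerShard / GB);
```

Under these assumptions each shard holds about 40 GB of data in 61 GB of RAM, while the single server can keep only a small fraction of the data set in memory.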

Page 13: Weather of the Century: Design and Performance

Now... how much would you pay?

• Single server: $60,000 / yr

• Cluster: $700,000 / yr

Page 14: Weather of the Century: Design and Performance

Use Cases

• Bulk loading
  – getting all data into the system

• Latency and throughput for queries
  – point in space-time
  – one station, one year
  – the whole world, once upon a time

• Aggregation and Exploration
  – warmest and coldest day ever, etc.

Page 15: Weather of the Century: Design and Performance

Bulk Loading: Principles

• On the application side:
  – batch size
  – number of client threads
  – use unordered bulk writes

• On the server side:
  – Journaling off (temporarily!)
  – Index later
  – In cluster: pre-split, no balancing
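
The application-side principles can be sketched as a small loader that sends documents in fixed-size batches as unordered writes. This is a minimal sketch, not the loader used in the talk: `batches` and `bulkLoad` are hypothetical helper names, and `collection` is assumed to be any object with an `insertMany(docs, options)` method that returns a promise, such as a collection from the MongoDB Node.js driver.

```javascript
// Split an array of documents into batches of at most `batchSize`.
function* batches(docs, batchSize) {
  for (let i = 0; i < docs.length; i += batchSize) {
    yield docs.slice(i, i + batchSize);
  }
}

// Insert all documents batch by batch and return how many were sent.
async function bulkLoad(collection, docs, batchSize) {
  let sent = 0;
  for (const batch of batches(docs, batchSize)) {
    // ordered: false lets the server continue with the remaining
    // documents of a batch when one of them fails.
    await collection.insertMany(batch, { ordered: false });
    sent += batch.length;
  }
  return sent;
}
```

To mirror the "number of client threads" knob, several such loaders could be run concurrently, each over its own slice of the input, for example with `Promise.all`.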

Page 17: Weather of the Century: Design and Performance

Bulk Loading: Single Server

[Chart: throughput as a function of batch size and number of threads]

8 threads, batch size 100 → 85,000 doc/s

Page 18: Weather of the Century: Design and Performance

Bulk Loading: Single Server

• Settings: 8 threads, batch size 100

• Total loading time: 10 h 20 min

• Documents per second: 70,000

• Index build time: 7 h 40 min (ts_1_st_1)

Page 20: Weather of the Century: Design and Performance

Bulk Loading: Cluster

144 threads, batch size 200 → 220,000 doc/s

Page 21: Weather of the Century: Design and Performance

Bulk Loading: Cluster

• Shard Key: Station ID, hashed

• Settings: 10 mongos @ 144 threads, batch size 200

• Total loading time: 3 h 10 min

• Documents per second: 228,000

• Index build time: 5 min (ts_1_st_1)
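
The average load rates can be recomputed from the total loading times. The sketch assumes the rounded total of 2.5 billion documents from the "How Big Is It?" slide, so the results land near the reported figures rather than exactly on them.

```javascript
// Average documents per second over a load lasting `hours` h `minutes` min.
function docsPerSecond(totalDocs, hours, minutes) {
  return totalDocs / (hours * 3600 + minutes * 60);
}

const totalDocs = 2.5e9; // rounded figure, an assumption for this estimate

const singleServerRate = docsPerSecond(totalDocs, 10, 20); // 10 h 20 min
const clusterRate = docsPerSecond(totalDocs, 3, 10);       // 3 h 10 min

console.log(Math.round(singleServerRate)); // 67204
console.log(Math.round(clusterRate));      // 219298
```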

Page 23: Weather of the Century: Design and Performance

Queries: Point in Space-Time

db.data.find({ "st" : "u747940", "ts" : ISODate("1969-07-16T12:00:00Z") })

[Chart: query latency in ms (avg, 95th and 99th percentile), single server vs. cluster]

Max. throughput: 40,000/s (single server), 610,000/s (cluster, 10 mongos)
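
The latency charts in this talk report the average together with the 95th and 99th percentile. As a reminder of what those numbers mean, here is a small sketch using the nearest-rank method; the talk does not say which percentile definition was used, and the sample latencies are made up.

```javascript
// Nearest-rank percentile of an ascending array (p between 0 and 100).
function percentile(sortedMs, p) {
  const rank = Math.ceil((p * sortedMs.length) / 100);
  return sortedMs[Math.max(rank, 1) - 1];
}

// Summarize a list of latencies in milliseconds.
function summarize(latenciesMs) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const avg = sorted.reduce((sum, x) => sum + x, 0) / sorted.length;
  return { avg, p95: percentile(sorted, 95), p99: percentile(sorted, 99) };
}

// Hypothetical sample: 100 measurements of 1, 2, ..., 100 ms.
const sample = Array.from({ length: 100 }, (_, i) => i + 1);
const stats = summarize(sample);
console.log(stats);
```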

Page 25: Weather of the Century: Design and Performance

Queries: One Station, One Year

db.data.find({ "st" : "u103840", "ts" : { "$gte" : ISODate("1989-01-01"), "$lt" : ISODate("1990-01-01") } })

[Chart: query latency in ms (avg, 95th and 99th percentile), single server vs. cluster]

Max. throughput: 20/s (single server), 430/s (cluster, 10 mongos)

Targeted query

Page 27: Weather of the Century: Design and Performance

Queries: The Whole World, Once Upon...

db.data.find({ "ts" : ISODate("2000-01-01T00:00:00Z") })

[Chart: query latency in ms (avg, 95th and 99th percentile), single server vs. cluster]

Max. throughput: 8/s (single server), 310/s (cluster, 10 mongos)

Scatter/gather query
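
Why one query is targeted and the other is scatter/gather follows from the shard key (station ID, hashed). The sketch below imitates the routing decision. The FNV-1a hash is only a stand-in, since MongoDB uses its own hash function and chunk ranges, and `shardFor` and `targetShards` are hypothetical names.

```javascript
const SHARDS = 100; // shard count of the cluster deployment

// Stand-in 32-bit FNV-1a hash; NOT the hash MongoDB uses internally.
function hash(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Every document of a station lands on the same shard.
function shardFor(stationId) {
  return hash(stationId) % SHARDS;
}

// Shards a query has to visit: one if it names a station, all otherwise.
function targetShards(query) {
  if (typeof query.st === "string") {
    return [shardFor(query.st)];
  }
  return Array.from({ length: SHARDS }, (_, i) => i);
}

const oneStation = targetShards({ st: "u103840", ts: { $gte: new Date("1989-01-01") } });
const wholeWorld = targetShards({ ts: new Date("2000-01-01T00:00:00Z") });
console.log(oneStation.length, wholeWorld.length); // 1 100
```

A query that includes the station ID is answered by a single shard, whereas a query on the timestamp alone has to be sent to every shard and the partial results merged.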

Page 28: Weather of the Century: Design and Performance

Analytics and Exploration

• Analytics means ad-hoc queries for which we do not have an index
  – Find all tornados
  – Maximum reported temperature

• We cannot just index everything
  – memory
  – write performance

Page 31: Weather of the Century: Design and Performance

Analytics: Find all Tornados

db.data.find({ "presentWeatherObservation.condition" : "99" })

Single server: 1 h 28 min

Cluster: 47 s

Page 35: Weather of the Century: Design and Performance

Analytics: Maximum Temperature

db.data.aggregate([
  { "$match" : { "airTemperature.quality" : { "$in" : [ "1", "5" ] } } },
  { "$group" : { "_id" : null, "maxTemp" : { "$max" : "$airTemperature.value" } } }
])

61.8 °C = 143 °F

Single server: 4 h 45 min

Cluster: 2 min
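
What the pipeline computes can be shown in plain JavaScript over a handful of documents. This is an illustration of the logic only, not of how the server executes it; the sample documents and the name `maxTemperature` are invented for the example.

```javascript
// Plain-JavaScript equivalent of the $match + $group stages.
function maxTemperature(docs) {
  const acceptedQuality = new Set(["1", "5"]);
  let maxTemp = null;
  for (const doc of docs) {
    const t = doc.airTemperature;
    // $match: keep only readings with quality code "1" or "5".
    if (!t || !acceptedQuality.has(t.quality)) continue;
    // $group with $max: keep the largest value seen so far.
    if (maxTemp === null || t.value > maxTemp) maxTemp = t.value;
  }
  // Unlike the pipeline, this returns a result even when nothing matches.
  return { _id: null, maxTemp };
}

// Hypothetical sample documents.
const sampleDocs = [
  { st: "u725053", airTemperature: { value: 21.1, quality: "5" } },
  { st: "u725053", airTemperature: { value: 99.9, quality: "9" } }, // filtered out
  { st: "u103840", airTemperature: { value: -3.4, quality: "1" } },
  { st: "u103840", airTemperature: { value: 30.2, quality: "5" } },
];

const result = maxTemperature(sampleDocs);
console.log(result);
```

Every document has to be inspected, which is why the run time on a single server is dominated by reading the whole data set.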

Page 36: Weather of the Century: Design and Performance

Summary: Single Server

Pro

• Cost-effective

• Very good latency for single queries

Con

• Some operations are prohibitive:
  – Indexing
  – Table scans

Page 37: Weather of the Century: Design and Performance

Summary: Cluster

Con

• High cost

Pro

• High throughput

• Very good latency for single queries

• Scatter-gather yields significant speed-up

• Analytics are possible
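
To put the trade-off in numbers, the timings reported in the talk can be turned into speed-up factors and set against the cost ratio. The sketch assumes the $60,000 / yr figure refers to the single server and the $700,000 / yr figure to the cluster.

```javascript
const seconds = (h, m, s = 0) => h * 3600 + m * 60 + s;

// Timings as reported on the slides.
const timings = {
  bulkLoad:    { single: seconds(10, 20), cluster: seconds(3, 10) },
  indexBuild:  { single: seconds(7, 40),  cluster: seconds(0, 5) },
  tornadoScan: { single: seconds(1, 28),  cluster: seconds(0, 0, 47) },
  maxTemp:     { single: seconds(4, 45),  cluster: seconds(0, 2) },
};

// Speed-up of the cluster over the single server, rounded to one decimal.
const speedups = Object.fromEntries(
  Object.entries(timings).map(([name, t]) => [
    name,
    Math.round((t.single / t.cluster) * 10) / 10,
  ])
);

// Assumed mapping of yearly cost to deployment (see lead-in).
const costRatio = Math.round((700000 / 60000) * 10) / 10;

console.log(speedups);
console.log("cost ratio:", costRatio);
```

Index builds and full scans speed up by roughly two orders of magnitude, which is more than the increase in cost, while bulk loading gains much less.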

Page 38: Weather of the Century: Design and Performance

Thank you.