Real-world queuing strategies - Nosql Search Roadshow
Transcript of Real-world queuing strategies - Nosql Search Roadshow
![Page 1: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/1.jpg)
“Hold this for a moment” Real-world queuing strategies
David Dawson Director of Technology
Marcus Kern VP, Technology
![Page 2: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/2.jpg)
Powerful mobile technology that puts ideas in motion – an mCMS and a mobile campaign platform available for both: self-service or managed service.
VELTI PLATFORMS
The complete mobile engagement solution. We help brands progress along their mobile roadmap, from fast growth pilots to optimisation of current assets and revenue growth.
MOBILE MARKETING
Cultivate relationships that build excitement through fun and interesting experiences they want to participate in. From on-pack promos to premium competitions.
LARGE SCALE CAMPAIGNS
Rewards based performance marketing, aimed at increasing customer lifetime value, revenue growth and acquisition of insightful consumer analytics. We provide both the programme and loyalty fulfillment.
LOYALTY MCRM
We build your mCRM engine that builds opt in customers into a mobile database and pushes it through the measurement tool so we can show you what you spend and what you gain.
The complete mobile advertising solution. Our own ad network & exchange, equipped with dynamic “real time” analytics of all your mobile activity using our Visualise tool, all under one roof.
VELTI MEDIA
Our instantly available predictive analysis and personalisation tool provides a single view of your brand from all your dispersed data points and overlays sales data in real time so you can manage your mobile campaigns “in action”.
BRAND BLOTTER
WHAT DO WE DO ?
![Page 3: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/3.jpg)
3 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Velti Technologies • Erlang • RIAK & leveldb • Redis • Ubuntu
• Ruby on Rails • Java • Node.js • MongoDB • MySQL
![Page 4: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/4.jpg)
4 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Two parts to this story • Queuing Strategies
• Optimizing hardware
![Page 5: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/5.jpg)
5 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Background • Building apps rely on messaging + queuing > 8 years • Multiple iterations, also combining NoSQL to help scale • Started with Mysql
– Operations friendly – Although struggled to scale as we our throughputs increased
• What about existing solutions ( e.g. RabbitMQ ) – Features missing until recently – More confidence from a iterative approach of our existing design
![Page 6: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/6.jpg)
6 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Building a Robust Queue
Q Workers Q
• Reliable + Replicated • Scheduled jobs + Retries • High performance ( >10,000 tx/sec ) • Multiple producers and consumers ( > 100 ) • Easy to debug + Operations friendly
Sender Receiver
Q
Producers Consumers
![Page 7: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/7.jpg)
7 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
~
Test Harness
Connection pool
Reporting thread
Configuration
Producer threads
Consumer threads
Harness
Mysql + lock
Mysql + Redis
Mysql
Implementations
• Built using Jruby – Fast ( Hotspot ) – Threads without the GIL
• Pluggable design – Multiple implementations
• Configurable variables – Batch size – Number of Producers and
Consumers – Number of itterations
• Reporting
![Page 8: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/8.jpg)
8 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #1 • Mysql only ( v5.5 ) percona • Innodb ( xtradb ) • Replication • 1 x table ( ‘queue’ )
– Id ( primary key, auto_inc, int ) – Worker_id ( int ) – Process_at ( datetime ) – Payload ( varchar ) – Index ( worker_id, process_at )
• Dedicated hardware – Harness: HP DL365 ( 12 cores ) – Mysql: HP DL365 ( 12 cores )
![Page 9: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/9.jpg)
9 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #1
Insert into queue ( worker_id, process_at, payload ) values ( 0, ’2012-01-01 01:01:00’, ’{ json}’ )
Update queue set worker_id=123 where worker_id=0 and process_at <= now() limit 10
Select * from queue where worker_id=123
Update queue set worker_id=-1 where id=2
Update queue set worker_id=-1 where id=3
Insert into queue ( worker_id, process_at, payload ) values ( 0, ’2012-01-01 01:01:00’, ’{ json}’ )
Multiple write operations Batched update / read operations
id worker_id process_at payload 1 -1 2012-01-01 01:01:00 { json }
id worker_id process_at payload 1 -1 2012-01-01 01:01:00 { json }
2 0 2012-01-01 01:01:00 { json }
id worker_id process_at payload 1 -1 2012-01-01 01:01:00 { json }
2 0 2012-01-01 01:01:00 { json }
3 0 2012-01-01 01:01:00 { json }
id worker_id process_at payload 1 -1 2012-01-01 01:01:00 { json }
2 123 2012-01-01 01:01:00 { json }
3 123 2012-01-01 01:01:00 { json }
id worker_id process_at payload 1 -1 2012-01-01 01:01:00 { json }
2 -1 2012-01-01 01:01:00 { json }
3 123 2012-01-01 01:01:00 { json }
id worker_id process_at payload 1 -1 2012-01-01 01:01:00 { json }
2 -1 2012-01-01 01:01:00 { json }
3 -1 2012-01-01 01:01:00 { json }
![Page 10: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/10.jpg)
10 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #1
0"
500"
1,000"
1,500"
2,000"
2,500"
3,000"
00:00"
00:15"
00:30"
00:45"
01:00"
01:15"
01:30"
01:45"
02:00"
02:15"
02:30"
02:45"
03:00"
03:15"
03:30"
03:45"
04:00"
04:15"
04:30"
04:45"
05:00"
Transac0on
s"per"se
cond
"
Time"[mm:ss]"
Mode"1,"Producers:"1,"Consumers:"1"Pop"on"Pop"off"
Total"popped"ON:"500000"BB"Total"popped"OFF:"500000"
![Page 11: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/11.jpg)
11 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #1
0"
2,000"
4,000"
6,000"
8,000"
10,000"
12,000"
14,000"
16,000"
18,000"
00:00"
00:15"
00:30"
00:45"
01:00"
01:15"
01:30"
01:45"
02:00"
02:15"
02:30"
02:45"
03:00"
03:15"
03:30"
03:45"
04:00"
04:15"
04:30"
04:45"
05:00"
Transac2on
s"per"se
cond
"
Time"[mm:ss]"
Mode"1,"Producers:"10,"Consumers:"10"Pop"on"Pop"off"
Total"popped"ON:"500000"DD"Total"popped"OFF:"500000"
![Page 12: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/12.jpg)
12 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #1
0"
5,000"
10,000"
15,000"
20,000"
25,000"
00:00"
00:15"
00:30"
00:45"
01:00"
01:15"
01:30"
01:45"
02:00"
02:15"
02:30"
02:45"
03:00"
03:15"
03:30"
03:45"
04:00"
04:15"
04:30"
04:45"
05:00"
Transac0on
s"per"se
cond
"
Time"[mm:ss]"
Mode"1,"Producers:"30,"Consumers:"30"Pop"on"Pop"off"
Total"popped"ON:"500000"BB"Total"popped"OFF:"136587"
![Page 13: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/13.jpg)
13 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #1
0"
5,000"
10,000"
15,000"
20,000"
25,000"
00:00"
00:15"
00:30"
00:45"
01:00"
01:15"
01:30"
01:45"
02:00"
02:15"
02:30"
02:45"
03:00"
03:15"
03:30"
03:45"
04:00"
04:15"
04:30"
04:45"
05:00"
Transac0on
s"per"se
cond
"
Time"[mm:ss]"
Mode"1,"Producers:"50,"Consumers:"50"Pop"on"Pop"off"
Total"popped"ON:"304321"BB"Total"popped"OFF:"35579"
![Page 14: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/14.jpg)
14 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #1
0"
5000"
10000"
15000"
20000"
25000"
00:00"
00:15"
00:30"
00:45"
01:00"
01:15"
01:30"
01:45"
02:00"
02:15"
02:30"
02:45"
03:00"
03:15"
03:30"
03:45"
04:00"
04:15"
04:30"
04:45"
05:00"
Transac/on
s"per"se
cond
"
Time"[mm:ss]"
Mode"1"queue"'pop<on'"with"varying"Producer"levels"
1"Producer"10"Producers"30"Producers"50"Producers"
![Page 15: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/15.jpg)
15 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #2 • Same Mysql setup as implementation #1 • Although we wrap a lock around the point of
most contention ( batch update ) – Select get_lock( str, timeout ) – Select release_lock( str )
![Page 16: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/16.jpg)
16 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #2 ( mysql + Lock )
Insert into queue ( worker_id, process_at, payload ) values ( 0, ’2012-01-01 01:01:00’, ’{ json}’ )
Update queue set worker_id=123 where worker_id=0 and process_at > now() limit 10
Select * from queue where worker_id=123
Update queue set worker_id=-1 where id=2
Update queue set worker_id=-1 where id=3
Insert into queue ( worker_id, process_at, payload ) values ( 0, ’2012-01-01 01:01:00’, ’{ json}’ )
Multiple write operations Batched update / read operations
id worker_id process_at payload 1 -1 2012-01-01 01:01:00 { json }
id worker_id process_at payload 1 -1 2012-01-01 01:01:00 { json }
2 0 2012-01-01 01:01:00 { json }
id worker_id process_at payload 1 -1 2012-01-01 01:01:00 { json }
2 0 2012-01-01 01:01:00 { json }
3 0 2012-01-01 01:01:00 { json }
id worker_id process_at payload 1 -1 2012-01-01 01:01:00 { json }
2 123 2012-01-01 01:01:00 { json }
3 123 2012-01-01 01:01:00 { json }
id worker_id process_at payload 1 -1 2012-01-01 01:01:00 { json }
2 -1 2012-01-01 01:01:00 { json }
3 123 2012-01-01 01:01:00 { json }
id worker_id process_at payload 1 -1 2012-01-01 01:01:00 { json }
2 -1 2012-01-01 01:01:00 { json }
3 -1 2012-01-01 01:01:00 { json }
Select get_lock( ‘queue’,-1 )
Select release_lock( ‘queue’ )
![Page 17: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/17.jpg)
17 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #2
0"
5,000"
10,000"
15,000"
20,000"
25,000"
30,000"
00:00"
00:15"
00:30"
00:45"
01:00"
01:15"
01:30"
01:45"
02:00"
02:15"
02:30"
02:45"
03:00"
03:15"
03:30"
03:45"
04:00"
04:15"
04:30"
04:45"
05:00"
Transac0on
s"per"se
cond
"
Time"[mm:ss]"
Mode"2,"Producers:"50,"Consumers:"50"Pop"on"Pop"off"
Total"popped"ON:"500000"BB"Total"popped"OFF:"500000"
![Page 18: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/18.jpg)
18 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #2
0"
5000"
10000"
15000"
20000"
25000"
00:00"
00:15"
00:30"
00:45"
01:00"
01:15"
01:30"
01:45"
02:00"
02:15"
02:30"
02:45"
03:00"
03:15"
03:30"
03:45"
04:00"
04:15"
04:30"
04:45"
05:00"
Transac/on
s"per"se
cond
"
Time"[mm:ss]"
Mode"2"queue"'pop<on'"with"varying"Producer"levels"
1"Producer"10"Producers"30"Producers"50"Producers"
![Page 19: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/19.jpg)
19 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #3 • Same Mysql setup as implementation #1 • 1 x table ( ‘queue’ )
– Id ( primary key, auto_inc, int ) – Status( enum ) – Process_at ( datetime ) – Payload ( varchar )
• 1 x Redis using the following data structures – SortedSet ( range query, schedule jobs ) – Queue ( fast push / pop sematics )
• Dedicated hardware – Harness: HP DL365 ( 12 cores ) – Mysql + Redis: HP DL365 ( 12 cores )
![Page 20: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/20.jpg)
20 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #3
Insert into queue ( worker_id, process_at, payload ) values ( 0, ’2012-01-01 01:01:00’, ’{ json}’ )
Update queue set status=‘working’ where id in ( 2,3 )
Update queue set status=‘finished’ where id = 2 Insert into queue ( worker_id, process_at, payload ) values ( 0, ’2012-01-01 01:01:00’, ’{ json}’)
Multiple write operations Batched update / read operations
id status process_at payload 1 ‘finished’ 2012-01-01 01:01:00 { json }
id status process_at payload 1 ‘finished’ 2012-01-01 01:01:00 { json }
2 ‘pending’ 2012-01-01 01:01:00 { json }
id status process_at payload 1 ‘finished’ 2012-01-01 01:01:00 { json }
2 ‘pending’ 2012-01-01 01:01:00 { json }
3 ‘pending’ 2012-01-01 01:01:00 { json }
id status process_at payload 1 ‘finished’ 2012-01-01 01:01:00 { json }
2 ‘working’ 2012-01-01 01:01:00 { json }
3 ‘working’ 2012-01-01 01:01:00 { json }
id status process_at payload 1 ‘finished’ 2012-01-01 01:01:00 { json }
2 ‘finished’ 2012-01-01 01:01:00 { json }
3 ‘working’ 2012-01-01 01:01:00 { json }
id status process_at payload 1 ‘finished’ 2012-01-01 01:01:00 { json }
2 ‘finished’ 2012-01-01 01:01:00 { json }
3 ‘finished’ 2012-01-01 01:01:00 { json }
RedisQueue.push( 2, ‘2012-01-01 01:01:00’ )
RedisQueue.push( 3, ‘2012-01-01 01:01:00’ )
RedisQueue.pop( ‘2012-01-01 01:01:00’ , 10 )
Update queue set status=‘finished’ where id = 2
![Page 21: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/21.jpg)
21 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #3
queue_name = ‘queue’ + scheduled_time
rpush( queue_name, id_of_mysql_insert )
zadd( ‘q_set’, scheduled_time, queue_name )
queue_name = redis.zrangebyscore('q_set', 0, current_time, :limit => [0,1] )
Item = lpop( queue_name )
If item.nil? Zrem( ‘q_set’, queue_name )
• Redis Sorted Sets O(log N ) complexity – Zadd/ zrangebyscore /zrem – Used to store the name of the queue and
when it should be processed • Redis Queues O(1) complexity
– Rpush / lpop – User to store the items that need to be
processed
RedisQueue.push
RedisQueue.pop
future
Queues Sorted Set
now
![Page 22: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/22.jpg)
22 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #3
0"
2,000"
4,000"
6,000"
8,000"
10,000"
12,000"
14,000"
16,000"
00:00"
00:15"
00:30"
00:45"
01:00"
01:15"
01:30"
01:45"
02:00"
02:15"
02:30"
02:45"
03:00"
03:15"
03:30"
03:45"
04:00"
04:15"
04:30"
04:45"
05:00"
Transac2on
s"per"se
cond
"
Time"[mm:ss]"
Mode"3,"Producers:"50,"Consumers:"50"Pop"on"Pop"off"
Total"popped"ON:"500000"DD"Total"popped"OFF:"500000"
![Page 23: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/23.jpg)
23 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Implementation #3
0"
5,000"
10,000"
15,000"
20,000"
25,000"
30,000"
00:00"
00:15"
00:30"
00:45"
01:00"
01:15"
01:30"
01:45"
02:00"
02:15"
02:30"
02:45"
03:00"
03:15"
03:30"
03:45"
04:00"
04:15"
04:30"
04:45"
05:00"
Transac0on
s"per"se
cond
"
Time"[mm:ss]"
Mode"3,"Producers:"100,"Consumers:"50"Pop"on"Pop"off"
Total"popped"ON:"1000000"BB"Total"popped"OFF:"1000000"
![Page 24: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/24.jpg)
24 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Comparison
0"
5,000"
10,000"
15,000"
20,000"
25,000"
30,000"
00:00"
00:15"
00:30"
00:45"
01:00"
01:15"
01:30"
01:45"
02:00"
02:15"
02:30"
02:45"
03:00"
03:15"
03:30"
03:45"
04:00"
04:15"
04:30"
04:45"
05:00"
Transac0on
s"per"se
cond
"
Time"[mm:ss]"
Mode"1,"Producers:"50,"Consumers:"50"Pop"on"Pop"off"
0"
5,000"
10,000"
15,000"
20,000"
25,000"
30,000"
00:00"
00:15"
00:30"
00:45"
01:00"
01:15"
01:30"
01:45"
02:00"
02:15"
02:30"
02:45"
03:00"
03:15"
03:30"
03:45"
04:00"
04:15"
04:30"
04:45"
05:00"
Transac0on
s"per"se
cond
"
Time"[mm:ss]"
Mode"2,"Producers:"50,"Consumers:"50"Pop"on"Pop"off"
0"
5,000"
10,000"
15,000"
20,000"
25,000"
30,000"
00:00"
00:15"
00:30"
00:45"
01:00"
01:15"
01:30"
01:45"
02:00"
02:15"
02:30"
02:45"
03:00"
03:15"
03:30"
03:45"
04:00"
04:15"
04:30"
04:45"
05:00"
Transac0on
s"per"se
cond
"
Time"[mm:ss]"
Mode"3,"Producers:"50,"Consumers:"50"Pop"on"Pop"off"
![Page 25: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/25.jpg)
25 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Summarize Results
• Simplest Option • 1 Moving part • Easy to diagnose • Tried and tested
• Prone to deadlocking • Contention • Slowest solution
Implementation #1
• Less deadlocks • Easy to diagnose • Removed Contention • Big speed boost
• Still deadlocks ( rare ) • Yet to be proven in
production
Implementation #2
+
![Page 26: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/26.jpg)
26 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Summarize Results
• Fastest • No Contention • Predictable • Tried and tested • Dynamic queues
• Most complicated • Recovery scripts • Multiple moving parts • Transactions are hard
Implementation #3
• Currently limited by speed of Mysql • Try a distributed key-value store
– Recovery? – Eventual consistency?
Future Considerations
+
+
![Page 27: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/27.jpg)
27 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Two parts to this story • Queuing Strategies
• Optimizing hardware
![Page 28: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/28.jpg)
28 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Hardware optimisation
Photograph and Logo © 2010 Time Out Group Ltd.
• Observed ‘time outs’ App ó RIAK DB
• Developed sophisticated balancing mechanisms to code around them, but they still occurred
• Especially under load
![Page 29: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/29.jpg)
29 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Nature of the problem • Delayed responses of up to 60 seconds! • Our live environment contains:
– 2 x 9 App & RIAK Nodes – HP DL385 G6 – 2 x AMD Opteron 2431 (6 cores)
• We built a dedicated test environment to get to the bottom of this:
– 3 x App & RIAK Nodes – 2 x Intel Xeon (8 cores)
Looking for contention…
![Page 30: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/30.jpg)
30 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Contention options • CPU
• Disk IO
• Network IO
Less than 60%
utilisation
?
?
• Got SSD (10x), Independent OEM • RIAK (SSD) / Logs/OS (HDD)
• RIAK I/O hungry • Use second NICs/RIAK VLAN
![Page 31: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/31.jpg)
31 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Memory contention / NUMA • Looking at the 60% again
– Non-Uniform Memory Access (NUMA) is a computer memory design used in Multiprocessing, where the memory access time depends on the memory location relative to a processor. - Wikipedia
• In the 1960s CPUs became faster then memory • Race for larger cache memory & Cache algorithms • Multi processors accessing the same memory leads to
contention and significant performance impact • Dedicate memory to processors/cores/threads • BUT, - most memory data is required by more then one
process. => cache coherent access (ccNUMA) • Linux threading allocation is challenged • Cache-coherence attracts significant overheads, especially
for processes in quick succession!
![Page 32: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/32.jpg)
32 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Gain control! - NUMACTL • Processor affinity – Bind a particular process type to a specific processor • Instruct memory usage to use different memory banks • For example: numactl --cpunodebind 1 –interleave all erl• Get it here: apt-get install numactl
• => No timeouts • => 20%+ speed increase when running App & RIAK • => Full use of existing hardware
![Page 33: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/33.jpg)
33 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
How about load testing ? • Our interactive voting platform required load testing • Requiring 10,000’s connections / second • Mixture of Http / Https • Session based requests
– Login a user – Get a list of candidates – Get the balance – Vote for a candidate if credit available
![Page 34: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/34.jpg)
34 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Load testing - lessons learned WAN
FW
LAN
LB
Servs
ASA5520 limited at 3-4k new connections per second ⇒ Replaced with ASA5585
(Spec 50k/s, Tested 20k/s)
HAProxy on 2xDL120 ⇒ # of Linux procs 1 -> 4 ⇒ Added conn. Throttle 4k/
server
6 x DL360 G6 ⇒ Apache Cipher reduction ⇒ K/A consumed all threads
-> reduced & disabled ⇒ Ulimit per proc 1k -> 65k
nn x AWS ⇒ Tsung SSL
SessionID bug
![Page 35: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/35.jpg)
35 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Load testing Tools • ab ( apache bench )
– Easy to use – Lots of documentation – Hard to distribute ( although we did find “bees with machine guns” )
• https://github.com/newsapps/beeswithmachineguns )
– We experienced Inconsistent results with our setup – Struggled to create the complex sessions we required
• httperf – Easy to use – Lots of documentation – Hard to distribute ( no master / slave setup )
![Page 36: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/36.jpg)
36 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Load testing Tools • Write our own
– Will do exactly what we want – Time
• Tsung – Very configurable – Scalable – Easier to distribute – Already used in the department – Steep learning curve – Setting up a large cluster requires effort
![Page 37: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/37.jpg)
37 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Tsung • What is Tsung?
– Open-source multi-protocol distributed load testing tool – Written in erlang – Can support multiple protocols: HTTP / SOAP / XMPP / etc.
– Support for sessions – Master slave setup for distributed load testing
– Very configurable – Scalable – Easier to distribute – Already used in the department – Steep learning curve – Setting up a large cluster requires effort
![Page 38: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/38.jpg)
38 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Distributed Tsung • Although Tsung provided us almost everything we needed • We still had to setup lots of instances manually • This was time consuming / error prone • We needed a tool to alleviate and automate this • So we built……
![Page 39: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/39.jpg)
39 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Ion Storm • Tool to setup a Tsung cluster on multiple EC2 instances • With co-ordinated start stop functionality • Written in ruby, using the rightscale gem:
rightaws.rubyforge.org • Which uploads the results to S3 after each run
![Page 40: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/40.jpg)
40 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Performance • From a cluster of 20 machines we achieved
– 20K HTTPS / Sec – 50K HTTP / Sec – 12K Session based request ( mixture of api calls ) / Sec
• Be warned though – Can be expensive to run through EC2 – Limited to 20 EC2 instances unless you speak to Amazon nicely – Have a look at spot instances
![Page 41: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/41.jpg)
41 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Open Sourced! • Designed and built by two Velti engineers
– Ben Murphy
– David Townsend
• Try it out:
[email protected]:mitadmin/ionstorm.git
![Page 42: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/42.jpg)
42 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Two parts to this story • Queuing Strategies
• Optimizing hardware
![Page 43: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/43.jpg)
43 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
David Dawson +44 7900 005 759 [email protected]
Marcus Kern +44 7932 661 527 [email protected]
If you’d like to work with or for Velti please contact the Velti :
Questions?
Thank You
![Page 44: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/44.jpg)
44 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Building a wallet • Fast
– Over 1,000 credits / sec – Over 10,000 debits / sec ( votes )
• Scalable – Double hardware == Double performance
• Robust / Recoverable – Transactions can not be lost – Wallet balances recoverable in the event of multi-server failure
• Auditable – Complete transaction history
![Page 45: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/45.jpg)
45 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Building a wallet - attempt #1 • Use RIAK Only
– Keep things simple – Less moving parts
• A wallet per user containing:
– Previous Balance – Transactions with unique IDs – Rolling Balance – Credits ( facebook / itunes ) – Debits ( votes )
Key = dave@mig
Previous Balance = 2
1-abcd-1234 (+5) = 7 1-abcd-1235 (+2) = 9 1-abcd-1236 (-1) = 8
Purchase of Credits A Vote
![Page 46: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/46.jpg)
46 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Building a wallet - attempt #1 • RIAK = Eventual Consistency
– In the event of siblings – Deterministic due to unique transactions ID’s – Merge the documents and store
Key = dave@mig
Previous Balance = 2
1-abcd-1234 (+5) = 7 1-abcd-1235 (+2) = 9
Key = dave@mig
Previous Balance = 2
1-abcd-1234 (+5) = 7 1-abcd-1236 (-1) = 6
Key = dave@mig
Previous Balance = 2
1-abcd-1234 (+5) = 7 1-abcd-1235 (+2) = 9 1-abcd-1236 (-1) = 8
![Page 47: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/47.jpg)
47 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Building a wallet - attempt #1
• Compacting the wallet – Periodically – In event it grows to large
Key = dave@mig
Previous Balance = 2
1-abcd-1234 (+5) = 7 1-abcd-1235 (+2) = 9 1-abcd-1236 (-1) = 8
… 1-abcd-9999 (+1) = 78
Key = dave@mig
Previous Balance = 78
Compactor
![Page 48: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/48.jpg)
48 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Building a wallet - attempt #1
• Our experiences – Open to abuse – As wallet grows, performance decreases – Risk of sibling explosion – User can go over drawn
![Page 49: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/49.jpg)
49 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Building a wallet - attempt #2
• Introduce REDIS – REDIS stores the balance – RIAK stores individual transactions
Credit (2)
Key: dave@mig Value: 78
Key: dave@mig Value: 80
Debit (1) Key: dave@mig Value:79
Key = dave@mig:1-abcd-1235 Value: +2
Key = dave@mig:1-abcd-1236 Value: -1
Key = dave@mig:1-abcd-1234 Value: +1
![Page 50: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/50.jpg)
50 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Building a wallet - attempt #2
• Keeping it all in sync – Periodically compare REDIS and RIAK
• Disaster Recovery
– Rebuild all balances in REDIS – Using transactions from RIAK
![Page 51: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/51.jpg)
51 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Building a wallet - attempt #2
• Our experiences – It works – Fast 10,000 votes / sec ( 6 x HP DL385 ) – Used wallet recovery ( Data Center Power Fail )
• The future – Possible use of levelDB backend for RIAK – Faster wallet recovery
![Page 52: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/52.jpg)
52 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Battle Stories #2
• Building a wallet • Optimizing your hardware stack
• Building a robust queue – final version
![Page 53: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/53.jpg)
53 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Building a Queue • Fast
– > 1000 msg /sec
• Scalable – Double the machines, double the capacity
• Recoverable – In the event of a failure, all messages can be recovered
![Page 54: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/54.jpg)
54 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Design • Queues stored in memory ( volatile )
– Hand rolled our own using ETS ( erlang ) – We needed to add complex behavior such as scheduling – Overflow protection by paging to disk
• Copy of the data and state stored in a shared data store – RIAK ticked all the boxes – Scalable – Robust – Fast
![Page 55: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/55.jpg)
55 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Previously • We explored RIAK to store and recover the queues using:
– Index’s ( levelDB ) • Latencies too unpredictable • Performance was less than half of bitcask
– Key Filtering ( bitcask ) • Write overhead too expensive as we had to update the key not the value ( delete and insert ) • Real world performance under load was not great
– Map Reduce across all key ( bitcask ) • Great for small data sets • Forget it as your data set get’s into the 10 of millions
![Page 56: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/56.jpg)
56 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
New Approach • With a little help from the Basho guys we came up with a new
approach
• Predictable keys + Snapshots ( bitcask ) – Simple – Smallish impact on performance – It worked – And it scales
![Page 57: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/57.jpg)
57 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Our Architecture
Client_Node
Q Erlang Node
Riak Node 1
Q Erlang Node
Riak Node 2
Q Erlang Node
Riak Node 3
Router Node Operator Node
• Each Node has it’s own Queue • Each Node lives on it’s own physical machine • RIAK runs as a cluster on all of the nodes
Basic SMS Gateway topology
![Page 58: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/58.jpg)
58 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Predictable Key • Key: “node : date : counter “
– node: the name of the originating node for the request e.g “client_node” – date: e.g. “2012-01-01” – counter: number of message since the beginning of the day. “3000”
• Value: <message : current_node >
– message: the original request e.g. “send sms” – current_node: the current node the message is located e.g. “router_node”
![Page 59: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/59.jpg)
59 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Snapshot • Every 1000 messages • Take a snapshot of the counter
– Key: “ client_node : 2012-01-01 : snapshot ” – Value: 5000
• This is then used to help determine an upper limit for the recovery – Which will be discussed in more detail in a couple of slides
![Page 60: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/60.jpg)
60 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Queue – incoming node
RIAK Cluster
Local Memory Queue
Predictable Key Generator Request
Receiver <message>
Generate Key
< key > = “node : date : counter “
Persist Push
Request
<key> <message : current_node>
Sender
Pop
<key> <message : current_node>
![Page 61: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/61.jpg)
61 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Queue – intermediate node
RIAK Cluster
Local Memory Queue
Request Receiver
Persist Push
Request
<key> <message : current_node>
Sender Pop
<key> <message : current_node>
<key> <message : previous_node>
![Page 62: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/62.jpg)
62 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Queue – outgoing node
RIAK Cluster
Local Memory Queue
Request Receiver
Persist
Push Request
<key> <message : current_node>
Sender
Pop <key> <message : current_node>
<key> <message : previous_node>
<key> <message : current_node>
delete
![Page 63: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/63.jpg)
63 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Recovery • Take all originating nodes e.g. “client_node” • Using the current date e.g. “2012-01-01” • Use the snapshot to get the last current count recorded e.g. “3000”
– Key: “ client_node : 2012-01-01 : snapshot ” – Value: 3000
• Rebuild by walking the keys from: – from the value: 1 – to the current count + ( 2 x snapshot interval ): 5000 – Across all originating nodes and dates < 5 days
![Page 64: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/64.jpg)
64 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Testing • Benchmarking with 3 x HP365’s ( AMD )
– Production has 18 x HP360’s
• Sustained 2000 req/sec ( 8 x RIAK ops per request ) – Linear scaling in testing
• Recovered 5 million messages in < 1 hour after crashing a node – Whilst processing 500 req/sec sustained
![Page 65: Real-world queuing strategies - Nosql Search Roadshow](https://reader031.fdocuments.us/reader031/viewer/2022020706/61fc876a8d33c02b785e3b50/html5/thumbnails/65.jpg)
65 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen
Production • Currently live and used for our SMS Gateway • No noticeable drop in performance when under peak loads • Plan to be used in our other products • Hopefully our final soloution