Enterprise Grade Streaming under 2ms on Hadoop
hadoop-summit
Enterprise Grade Streaming Under 2ms On Hadoop
@vijaysbhat
2
3
VS.
4
5
6
7
X (predictor): Spend amount, geo
Y (response)
Simple Velocity Advanced
8
9
10
11
Hard Metrics   Goal
Latency        < 40 ms; ideally < 16 ms
Throughput     2000 events/second
Durability     No loss; every message gets exactly one response
Availability   99.5% uptime (downtime of 1.83 days/year); ideally 99.999% uptime (downtime of 5.26 minutes/year)
Scalability    Can add resources and still meet latency requirements
Integration    Transparently connected to existing systems: hardware, messaging, HDFS

Soft Metrics   Goal
Open Source    All components licensed as open source
Extensibility  Rules can be updated; model is regularly refreshed
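The downtime figures in the Availability row follow directly from the uptime percentages. A minimal sketch of that arithmetic, assuming an average year of 365.25 days (the slide does not state which year length was used); the function name is illustrative only:

```python
# Assumption: an average year of 365.25 days; the slide does not say which
# year length its downtime figures are based on.
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365.25 * MINUTES_PER_DAY


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (1.0 - uptime_percent / 100.0) * MINUTES_PER_YEAR


if __name__ == "__main__":
    baseline_days = downtime_minutes_per_year(99.5) / MINUTES_PER_DAY
    ideal_minutes = downtime_minutes_per_year(99.999)
    print(f"99.5%   uptime: {baseline_days:.2f} days/year")
    print(f"99.999% uptime: {ideal_minutes:.2f} minutes/year")
```

Under that assumption the two targets work out to roughly 1.83 days and 5.26 minutes per year, matching the table.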
12
13
Onyx
14
Enterprise Readiness
Roadmap
Performance
Community
15
16
17
18
19
20
21
YARN
22
23
24
Failure Handling
25
26
Performance
• Avg. 0.25 ms at 70k records/sec with 600 GB RAM

Thread Local on ~54M events; percentiles in ms

Throughput  Count       Avg (ms)  90%  95%  99%  99.9%  4 9's  5 9's  6 9's
70k/sec     54,126,122  0.19      1    1    1    2      2      5      6
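For readers who want to produce a similar summary from their own measurements, here is a minimal sketch of tail-latency percentiles using the nearest-rank method. The slides do not say how the percentiles were computed, so the method, the function name, and the sample data are assumptions for illustration only.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0.0 < pct <= 100.0:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100.0)  # 1-based rank
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Hypothetical latencies in ms, not the measurements from the talk.
    latencies = [1] * 90 + [2] * 9 + [5]
    print("avg:", sum(latencies) / len(latencies))
    for pct in (90, 95, 99, 99.9):
        print(f"p{pct}:", percentile_nearest_rank(latencies, pct))
```

Reporting "4 9's" or "6 9's" percentiles only makes sense with a large sample: at the 99.9999th percentile, a run of about 54M events leaves only about 54 samples above the cut.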
27
Durability
• Two physically independent pipelines on the same cluster processing identical data
• For the same tuple, we find the best-case time between the two pipelines
  – 39 records out of 5.2M exceeded 16ms
  – 173 out of 5.2M exceeded 16ms in one pipeline but succeeded in the other
• 99.99925% success rate – “Five Nines”
• Average latency of 0.0981ms
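The best-case comparison on this slide can be sketched as follows: for each tuple seen by both pipelines, keep the faster of the two latencies, then count how many still miss the 16 ms deadline. The data layout (a mapping from tuple id to latency in ms) and all names are assumptions for illustration, not the talk's actual implementation.

```python
from typing import Dict, Hashable

DEADLINE_MS = 16.0  # the "ideal" latency target from the metrics slide


def best_case_latencies(
    pipeline_a: Dict[Hashable, float], pipeline_b: Dict[Hashable, float]
) -> Dict[Hashable, float]:
    """For every tuple id present in both pipelines, keep the faster latency."""
    shared_ids = pipeline_a.keys() & pipeline_b.keys()
    return {tid: min(pipeline_a[tid], pipeline_b[tid]) for tid in shared_ids}


def success_rate(latencies_ms: Dict[Hashable, float],
                 deadline_ms: float = DEADLINE_MS) -> float:
    """Fraction of tuples whose latency does not exceed the deadline."""
    if not latencies_ms:
        raise ValueError("no latencies to evaluate")
    on_time = sum(1 for ms in latencies_ms.values() if ms <= deadline_ms)
    return on_time / len(latencies_ms)


def success_percent(failures: int, total: int) -> float:
    """Success rate in percent, given a failure count and a total count."""
    return 100.0 * (1.0 - failures / total)


if __name__ == "__main__":
    # Hypothetical latencies in ms keyed by tuple id.
    a = {"t1": 0.1, "t2": 20.0, "t3": 30.0}
    b = {"t1": 0.2, "t2": 0.3, "t3": 25.0}
    best = best_case_latencies(a, b)
    print(best["t2"], success_rate(best))
    # The slide's figure, treating "5.2M" as exactly 5,200,000 records.
    print(round(success_percent(39, 5_200_000), 5))
```

In the toy data, tuple `t2` misses the deadline in one pipeline but is rescued by the other, which is the situation the slide describes for 173 records; only tuples slow in both pipelines (like `t3`) count as failures.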
28
@vijaysbhat
linkedin.com/in/vijaysbhat