Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY...
![Page 1: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/1.jpg)
Flying Faster with Heron
KARTHIK RAMASAMY @KARTHIKZ
#TwitterHeron
![Page 2: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/2.jpg)
TALK OUTLINE
I. OVERVIEW
II. MOTIVATION
III. HERON
IV. OPERATIONAL EXPERIENCES
V. HERON PERFORMANCE
![Page 3: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/3.jpg)
OVERVIEW
![Page 4: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/4.jpg)
TWITTER IS REAL TIME
Real time trends: emerging break out trends in Twitter (in the form of #hashtags)
Real time conversations: real time sports conversations related to a topic (recent goal or touchdown)
Real time recommendations: real time product recommendations based on your behavior & profile
Real time search: real time search of tweets
ANALYZING BILLIONS OF EVENTS IN REAL TIME IS A CHALLENGE!
![Page 5: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/5.jpg)
TWITTER STORM
Streaming platform for analyzing real-time data as it arrives, so you can react to data as it happens.
GUARANTEED MESSAGE PROCESSING
HORIZONTAL SCALABILITY
ROBUST FAULT TOLERANCE
CONCISE CODE - FOCUS ON LOGIC
![Page 6: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/6.jpg)
STORM TERMINOLOGY
TOPOLOGY
Directed acyclic graph
Vertices=computation, and edges=streams of data tuples
SPOUTS
Sources of data tuples for the topology
Examples - Event Bus/Kafka/Kestrel/MySQL/Postgres
BOLTS
Process incoming tuples and emit outgoing tuples
Examples - filtering/aggregation/join/arbitrary function
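The three terms above can be modelled in a few lines. This is a minimal sketch, not the Storm API: the `Topology` class, its method names and the component names are all hypothetical, and the example runs in a single process purely to show how spouts feed bolts along the edges of a directed acyclic graph.

```python
from collections import Counter, deque


class Topology:
    """Hypothetical in-process model of a topology (not the Storm API)."""

    def __init__(self):
        self.spouts = {}  # spout name -> iterable of source tuples
        self.bolts = {}   # bolt name -> function(tuple) -> list of output tuples
        self.edges = {}   # component name -> names of downstream bolts

    def add_spout(self, name, source):
        self.spouts[name] = source
        self.edges.setdefault(name, [])

    def add_bolt(self, name, func, inputs):
        self.bolts[name] = func
        self.edges.setdefault(name, [])
        for upstream in inputs:
            self.edges[upstream].append(name)

    def run(self):
        """Push every spout tuple through the graph; return what sink bolts emit."""
        results = []
        pending = deque((dst, tup)
                        for name, source in self.spouts.items()
                        for tup in source
                        for dst in self.edges[name])
        while pending:
            bolt, tup = pending.popleft()
            for out in self.bolts[bolt](tup):
                if self.edges[bolt]:
                    pending.extend((dst, out) for dst in self.edges[bolt])
                else:
                    results.append(out)
        return results


counts = Counter()


def parse(tweet):
    # Bolt: split one tweet into words.
    return tweet.lower().split()


def count(word):
    # Bolt: keep a running count per word and emit the updated pair.
    counts[word] += 1
    return [(word, counts[word])]


topo = Topology()
topo.add_spout("tweet-spout", ["to be or not to be"])
topo.add_bolt("parse-tweet-bolt", parse, inputs=["tweet-spout"])
topo.add_bolt("word-count-bolt", count, inputs=["parse-tweet-bolt"])
results = topo.run()
```

A real topology runs each component as many parallel tasks on different machines; the sketch keeps only the graph shape.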
![Page 7: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/7.jpg)
STORM TOPOLOGY
[Diagram: a topology with SPOUT 1, SPOUT 2 and BOLT 1 through BOLT 5]
![Page 8: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/8.jpg)
WORD COUNT TOPOLOGY
TWEET SPOUT → PARSE TWEET BOLT → WORD COUNT BOLT
Live stream of Tweets
LOGICAL PLAN
![Page 9: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/9.jpg)
WORD COUNT TOPOLOGY
TWEET SPOUT TASKS → PARSE TWEET BOLT TASKS → WORD COUNT BOLT TASKS
When a parse tweet bolt task emits a tuple, which word count bolt task should it send it to?
![Page 10: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/10.jpg)
STREAM GROUPINGS
SHUFFLE GROUPING: Random distribution of tuples
FIELDS GROUPING: Group tuples by a field or multiple fields
ALL GROUPING: Replicates tuples to all tasks
GLOBAL GROUPING: Sends the entire stream to one task
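Each grouping is a rule for choosing destination task indices. The sketch below is an illustration under assumptions, not Storm's implementation: the function names are hypothetical, tuples are plain dicts, `zlib.crc32` stands in for whatever hash the runtime uses, and the global grouping picks task 0.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng=random):
    # Random distribution: any task may receive the tuple.
    return [rng.randrange(num_tasks)]


def fields_grouping(tup, num_tasks, fields):
    # Tuples with equal values in `fields` always reach the same task.
    key = "|".join(str(tup[f]) for f in fields)
    return [zlib.crc32(key.encode("utf-8")) % num_tasks]


def all_grouping(tup, num_tasks):
    # Replicate the tuple to every task.
    return list(range(num_tasks))


def global_grouping(tup, num_tasks):
    # Send the entire stream to a single task (task 0 in this sketch).
    return [0]
```

For word count, the fields grouping on the word is what makes the counts correct: every occurrence of a given word reaches the same word count task, so no two tasks hold partial counts for it.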
![Page 11: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/11.jpg)
WORD COUNT TOPOLOGY
TWEET SPOUT TASKS → (SHUFFLE GROUPING) → PARSE TWEET BOLT TASKS → (FIELDS GROUPING) → WORD COUNT BOLT TASKS
![Page 12: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/12.jpg)
MOTIVATION
![Page 13: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/13.jpg)
STORM ARCHITECTURE
MASTER NODE: Nimbus
ZK CLUSTER
SLAVE NODE: SUPERVISOR (W1 W2 W3 W4)
SLAVE NODE: SUPERVISOR (W1 W2 W3 W4)
TOPOLOGY SUBMISSION, ASSIGNMENT MAPS
Multiple functionality: scheduling/monitoring
Single point of failure
Storage contention
No resource reservation and isolation
![Page 14: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/14.jpg)
STORM WORKER
JVM PROCESS
EXECUTOR1: TASK1, TASK2, TASK3
EXECUTOR2: TASK4, TASK5
Complex hierarchy
Difficult to tune
Hard to debug
![Page 15: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/15.jpg)
DATA FLOW IN STORM WORKERS
[Diagram: Kernel TCP Receive Buffer → Global Receive Thread → In Queues → User Logic Threads → Out Queues → Send Threads → Outgoing Message Buffer → Global Send Thread → Kernel TCP Send Buffer]
Queue Contention
Multiple Languages
![Page 16: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/16.jpg)
OVERLOADED ZOOKEEPER
[Diagram: STORM workers (W) and supervisors S1, S2, S3 all writing to zk]
Scaled up
Handled up to 1200 workers per cluster
![Page 17: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/17.jpg)
OVERLOADED ZOOKEEPER
Analyzing zookeeper traffic
KAFKA SPOUT (67%): Offset/partition is written every 2 secs
STORM RUNTIME (33%): Workers write heart beats every 3 secs
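A quick back-of-the-envelope calculation shows why periodic writes add up. The helper name is hypothetical, and the sketch assumes one write per writer per interval; the worker count is the 1200-worker figure from the earlier slide.

```python
def writes_per_second(num_writers, interval_secs):
    """Steady-state write rate when each writer writes once per interval."""
    return num_writers / interval_secs


# 1200 workers, each writing a heart beat every 3 secs.
heartbeat_rate = writes_per_second(1200, 3)
print(f"{heartbeat_rate:.0f} heart beat writes/sec")
```

The same function applies to the Kafka spout offsets (one write per partition every 2 secs), but the slides do not give a partition count, so no number is filled in here.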
![Page 18: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/18.jpg)
OVERLOADED ZOOKEEPER
[Diagram: zk, supervisors S1, S2, S3, STORM workers (W), heart beat daemons (H) and KV stores]
Heart beat daemons
5000 workers per cluster
![Page 19: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/19.jpg)
STORM - DEPLOYMENT
shared pool
storm cluster
![Page 20: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/20.jpg)
STORM - DEPLOYMENT
shared pool
storm cluster
joe’s topology
isolated pools
![Page 21: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/21.jpg)
STORM - DEPLOYMENT
shared pool
storm cluster
joe’s topology
isolated pools
jane’s topology
![Page 22: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/22.jpg)
STORM - DEPLOYMENT
shared pool
storm cluster
joe’s topology
isolated pools
jane’s topology
dave’s topology
![Page 23: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/23.jpg)
STORM ISSUES
LACK OF BACK PRESSURE
Drops tuples unpredictably
EFFICIENCY
Serialization program consumes 75 cores at 30% CPU
Topology consumes 600 cores at 20-30% CPU
NO BATCHING
Tuple oriented system - implicit batching by 0MQ
![Page 24: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/24.jpg)
EVOLUTION OR REVOLUTION?
FUNDAMENTAL ISSUES - REQUIRE EXTENSIVE REWRITING
Several queues for moving data
Inflexible and requires longer development cycle
USE EXISTING OPEN SOURCE SOLUTIONS
Issues working at scale/lacks required performance
Incompatible API and long migration process
Fix Storm or develop a new system?
![Page 25: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/25.jpg)
HERON
![Page 26: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/26.jpg)
HERON DESIGN GOALS
FULLY API COMPATIBLE WITH STORM
Directed acyclic graph
Topologies, spouts and bolts
USE OF MAINSTREAM LANGUAGES
C++/Java/Python
TASK ISOLATION
Ease of debuggability/resource isolation/profiling
![Page 27: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/27.jpg)
HERON ARCHITECTURE
TOPOLOGY SUBMISSION → Scheduler → Topology 1, Topology 2, Topology 3, ... Topology N
![Page 28: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/28.jpg)
TOPOLOGY ARCHITECTURE
Topology Master
ZK CLUSTER: Logical Plan, Physical Plan and Execution State
Sync Physical Plan
CONTAINER: Stream Manager, Metrics Manager, I1 I2 I3 I4
CONTAINER: Stream Manager, Metrics Manager, I1 I2 I3 I4
![Page 29: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/29.jpg)
TOPOLOGY MASTER
Solely responsible for the entire topology
ASSIGNS ROLE
MONITORING
METRICS
![Page 30: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/30.jpg)
TOPOLOGY MASTER
Topology Master
ZK CLUSTER
Logical Plan, Physical Plan and Execution State
PREVENTS MULTIPLE TMs FROM BECOMING MASTERS
ALLOWS OTHER PROCESSES TO DISCOVER TM
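The slide does not name the mechanism. The Heron paper describes the TM creating an ephemeral node at a well-known ZooKeeper location, which gives both properties at once. The sketch below imitates that behaviour with an in-memory stand-in; `FakeCoordinator`, the path and the addresses are all hypothetical, and no real ZooKeeper client is involved.

```python
class FakeCoordinator:
    """In-memory stand-in for a coordination service with ephemeral nodes."""

    def __init__(self):
        self._nodes = {}  # path -> (owning session, data)

    def create_ephemeral(self, path, session, data):
        # Creation fails if the node exists, so only one TM can win.
        if path in self._nodes:
            return False
        self._nodes[path] = (session, data)
        return True

    def get(self, path):
        # Other processes read the node to discover the current TM.
        entry = self._nodes.get(path)
        return entry[1] if entry else None

    def close_session(self, session):
        # Ephemeral nodes vanish when their owner's session ends.
        self._nodes = {p: v for p, v in self._nodes.items() if v[0] != session}


zk = FakeCoordinator()
TM_PATH = "/tmaster/wordcount"  # hypothetical well-known location

first = zk.create_ephemeral(TM_PATH, "tm-a", "host-a:9000")
second = zk.create_ephemeral(TM_PATH, "tm-b", "host-b:9000")
discovered = zk.get(TM_PATH)

zk.close_session("tm-a")  # tm-a dies; its node disappears
takeover = zk.create_ephemeral(TM_PATH, "tm-b", "host-b:9000")
```

Here `first` succeeds, `second` is rejected while tm-a is alive, and `takeover` succeeds once tm-a's session is gone.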
![Page 31: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/31.jpg)
STREAM MANAGER
Routing Engine
ROUTES TUPLES
BACK PRESSURE
ACK MGMT
![Page 32: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/32.jpg)
STREAM MANAGER
[Diagram: a topology with spout S1 and bolts B2, B3, B4]
![Page 33: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/33.jpg)
STREAM MANAGER
[Diagram: four Stream Managers connected to each other, each serving local instances of S1, B2, B3 and B4]
O(n²) vs O(k²) connections (n instances, k Stream Managers)
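The gap between the two bounds is easy to see with numbers. This sketch counts links for a full mesh among all instances versus a full mesh among Stream Managers plus one local link per instance; the function name and the sizes (1000 instances, 50 containers) are hypothetical.

```python
def full_mesh_links(nodes):
    """Number of pairwise connections in a full mesh of `nodes` endpoints."""
    return nodes * (nodes - 1) // 2


n_instances = 1000       # hypothetical topology size
k_stream_managers = 50   # hypothetical: one Stream Manager per container

direct = full_mesh_links(n_instances)
via_stream_managers = full_mesh_links(k_stream_managers) + n_instances

print(direct, via_stream_managers)
```

Because k is usually much smaller than n, routing through Stream Managers cuts the connection count by orders of magnitude.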
![Page 34: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/34.jpg)
STREAM MANAGER
TCP back pressure
[Diagram: four Stream Managers, each serving local instances of S1, B2, B3 and B4]
SLOWS UPSTREAM AND DOWNSTREAM INSTANCES
![Page 35: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/35.jpg)
STREAM MANAGER
Spout back pressure
[Diagram: four Stream Managers, each serving local instances of S1, B2, B3 and B4; the S1 spout instances are highlighted]
![Page 36: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/36.jpg)
STREAM MANAGER
Stage by stage back pressure
[Diagram: four Stream Managers, each serving local instances of S1, B2, B3 and B4; the S1 and B2 instances are highlighted]
![Page 37: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/37.jpg)
STREAM MANAGER
Back pressure advantages
PREDICTABILITY
Tuple failures are more deterministic
SELF ADJUSTS
Topology goes as fast as the slowest component
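A toy simulation makes the two advantages concrete. It is a sketch under stated assumptions, not Heron's implementation: one spout feeds one slower bolt through a bounded buffer, overflow is modelled as dropped tuples, and spout back pressure pauses the spout between a high and a low water mark (the Heron paper describes such water marks; the rates and thresholds here are hypothetical).

```python
def simulate(spout_rate, bolt_rate, steps, high_water=None, low_water=0,
             capacity=100):
    """Return (dropped, max_queue). high_water=None disables back pressure."""
    queue = 0
    dropped = 0
    max_queue = 0
    throttled = False
    for _ in range(steps):
        if high_water is not None:
            if queue >= high_water:
                throttled = True    # clamp down the spout
            elif queue <= low_water:
                throttled = False   # buffer drained, release the spout
        if not throttled:
            queue += spout_rate
            if queue > capacity:    # buffer overflow: tuples are lost
                dropped += queue - capacity
                queue = capacity
        queue -= min(queue, bolt_rate)  # the bolt consumes what it can
        max_queue = max(max_queue, queue)
    return dropped, max_queue


no_bp = simulate(spout_rate=10, bolt_rate=4, steps=100)
with_bp = simulate(spout_rate=10, bolt_rate=4, steps=100,
                   high_water=50, low_water=10)
print("without back pressure (dropped, max queue):", no_bp)
print("with back pressure    (dropped, max queue):", with_bp)
```

Without back pressure the buffer fills and tuples are lost; with it, nothing is dropped and the spout settles to the pace of the bolt.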
![Page 38: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/38.jpg)
HERON INSTANCE
Does the real work!
RUNS ONE TASK
EXPOSES API
COLLECTS METRICS
![Page 39: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/39.jpg)
HERON INSTANCE
[Diagram: Stream Manager and Metrics Manager ↔ Gateway Thread ↔ data-in queue, data-out queue, metrics-out queue ↔ Task Execution Thread]
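The two-thread layout can be sketched with standard library queues. This is an illustration only: `run_instance` is a hypothetical name, the calling thread plays the gateway role, and the network links to the Stream Manager and Metrics Manager are replaced by a plain list in and a return value out.

```python
import queue
import threading


def run_instance(incoming, task):
    """Run `task` on every incoming tuple; return (emitted tuples, metric)."""
    data_in, data_out, metrics_out = queue.Queue(), queue.Queue(), queue.Queue()
    stop = object()  # sentinel marking the end of a stream

    def task_execution():
        # Task execution thread: runs user code, never touches the network.
        processed = 0
        while True:
            tup = data_in.get()
            if tup is stop:
                break
            for out in task(tup):
                data_out.put(out)
            processed += 1
        metrics_out.put(("tuples_processed", processed))
        data_out.put(stop)

    worker = threading.Thread(target=task_execution)
    worker.start()

    # Gateway role: hand incoming tuples to the task, then drain its output.
    for tup in incoming:
        data_in.put(tup)
    data_in.put(stop)

    emitted = []
    while True:
        out = data_out.get()
        if out is stop:
            break
        emitted.append(out)
    worker.join()
    return emitted, metrics_out.get()


emitted, metric = run_instance(["flying faster", "heron"], str.split)
```

Keeping user logic on its own thread means slow user code cannot block communication with the rest of the topology.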
![Page 40: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/40.jpg)
OPERATIONAL EXPERIENCES
![Page 41: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/41.jpg)
HERON DEPLOYMENT
Aurora Services: Aurora Scheduler running Topology 1, Topology 2, Topology 3, ... Topology N
ZK CLUSTER
Observability: Heron Tracker, Heron VIZ, Heron Web
![Page 42: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/42.jpg)
HERON SAMPLE TOPOLOGIES
![Page 43: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/43.jpg)
SAMPLE TOPOLOGY DASHBOARD
![Page 44: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/44.jpg)
HERON @TWITTER
Large amount of data produced every day
Large cluster
Several topologies deployed
Several billion messages every day
1 stage to 10 stages
3x reduction in cores and memory
STORM is decommissioned
![Page 45: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/45.jpg)
HERON PERFORMANCE
![Page 46: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/46.jpg)
HERON PERFORMANCE
Settings

| COMPONENTS | EXPT #1 | EXPT #2 | EXPT #3 | EXPT #4 |
| --- | --- | --- | --- | --- |
| Spout | 25 | 100 | 200 | 300 |
| Bolt | 25 | 100 | 200 | 300 |
| # Heron containers | 25 | 100 | 200 | 300 |
| # Storm workers | 25 | 100 | 200 | 300 |
![Page 47: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/47.jpg)
HERON PERFORMANCE
Word count topology - Acknowledgements enabled
[Chart: Throughput (million tuples/min) vs. spout parallelism (25, 100, 200, 500), Storm vs. Heron: 10-14x]
[Chart: Latency (ms) vs. spout parallelism (25, 100, 200, 500), Storm vs. Heron: 5-15x]
![Page 48: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/48.jpg)
HERON PERFORMANCE
Word count topology - CPU usage
[Chart: # cores used vs. spout parallelism (25, 100, 200, 500), Storm vs. Heron: 2-3x]
![Page 49: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/49.jpg)
HERON PERFORMANCE
Throughput and CPU usage with no acknowledgements - Word count topology
[Chart: Throughput (million tuples/min) vs. spout parallelism (25, 100, 200, 500), Storm vs. Heron]
[Chart: # cores used vs. spout parallelism (25, 100, 200, 500), Storm vs. Heron]
![Page 50: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/50.jpg)
HERON EXPERIMENT
RTAC topology
CLIENT EVENT SPOUT → (SHUFFLE GROUPING) → DISTRIBUTOR BOLT → (FIELDS GROUPING) → USER COUNT BOLT → (FIELDS GROUPING) → AGGREGATOR BOLT
![Page 51: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/51.jpg)
HERON PERFORMANCE
CPU usage - RTAC Topology
[Chart: # cores used, Storm vs. Heron, acknowledgements enabled]
[Chart: # cores used, Storm vs. Heron, no acknowledgements]
![Page 52: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/52.jpg)
HERON PERFORMANCE
Latency with acknowledgements enabled - RTAC Topology
[Chart: latency (ms), Storm vs. Heron]
![Page 53: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/53.jpg)
CURIOUS TO LEARN MORE…
Twitter Heron: Stream Processing at Scale
Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel*,1, Karthik Ramasamy, Siddarth Taneja
@sanjeevrk, @challenger_nik, @Louis_Fumaosong, @vikkyrk, @cckellogg,
@saileshmittal, @pateljm, @karthikz, @staneja
Twitter, Inc., *University of Wisconsin – Madison
ABSTRACT Storm has long served as the main platform for real-time analytics at Twitter. However, as the scale of data being processed in real-time at Twitter has increased, along with an increase in the diversity and the number of use cases, many limitations of Storm have become apparent. We need a system that scales better, has better debug-ability, has better performance, and is easier to manage – all while working in a shared cluster infrastructure. We considered various alternatives to meet these needs, and in the end concluded that we needed to build a new real-time stream data processing system. This paper presents the design and implementation of this new system, called Heron. Heron is now the de facto stream data processing engine inside Twitter, and in this paper we also share our experiences from running Heron in production. In this paper, we also provide empirical evidence demonstrating the efficiency and scalability of Heron. ACM Classification H.2.4 [Information Systems]: Database Management—systems
Keywords Stream data processing systems; real-time data processing.
1. INTRODUCTION Twitter, like many other organizations, relies heavily on real-time streaming. For example, real-time streaming is used to compute the real-time active user counts (RTAC), and to measure the real-time engagement of users to tweets and advertisements. For many years, Storm [16, 20] was used as the real-time streaming engine inside Twitter. But, using Storm at our current scale was becoming increasingly challenging due to issues related to scalability, debug-ability, manageability, and efficient sharing of cluster resources with other data services.
A big challenge when working with Storm in production is the issue of debug-ability. When a topology misbehaves – which could be for a variety of reasons including load changes, misbehaving user code, or failing hardware – it is important to quickly determine the root-causes for the performance degradation. In Storm, work from multiple components of a topology is bundled into one operating system process, which makes debugging very challenging. Thus, we needed a cleaner mapping from the logical units of computation to each physical process. The importance of such clean mapping for debug-ability is really crucial when responding to pager alerts for a failing topology, especially if it is a topology that is critical to the underlying business model.
In addition, Storm needs dedicated cluster resources, which requires special hardware allocation to run Storm topologies. This approach leads to inefficiencies in using precious cluster resources, and also limits the ability to scale on demand. We needed the ability to work in a more flexible way with popular cluster scheduling software that allows sharing the cluster resources across different types of data processing systems (and not just a stream processing system). Internally at Twitter, this meant working with Aurora [1], as that is the dominant cluster management system in use.
With Storm, provisioning a new production topology requires manual isolation of machines, and conversely, when a topology is no longer needed, the machines allocated to serve that topology now have to be decommissioned. Managing machine provisioning in this way is cumbersome. Furthermore, we also wanted to be far more efficient than the Storm system in production, simply because at Twitter’s scale, any improvement in performance translates into significant reduction in infrastructure costs and also significant improvements in the productivity of our end users.
We wanted to meet all the goals outlined above without forcing a rewrite of the large number of applications that have already been written for Storm; i.e. compatibility with the Storm and Summingbird APIs was essential. (Summingbird [8], which provides a Scala-idiomatic way for programmers to express their computation and constraints, generates many of the Storm topologies that are run in production.)1
After examining various options, we concluded that we needed to design a new stream processing system to meet the design goals outlined above. This new system is called Heron. Heron is API-compatible with Storm, which makes it easy for Storm users to migrate to Heron. All production topologies inside Twitter now run on Heron. Besides providing us significant performance improvements and lower resource consumption over Storm, Heron also has big advantages in terms of debug-ability, scalability, and manageability.
In this paper, we present the design of Heron, and also present results from an empirical evaluation of Heron. We begin by briefly describing related work in the next section. Then, in Section 3, we describe Storm and motivate the need for Heron.
1 Work done while consulting for Twitter.
SIGMOD’15, May 31–June 4, 2015, Melbourne, Victoria, Australia. ACM 978-1-4503-2758-9/15/05. http://dx.doi.org/10.1145/2723372.2723374
Storm @Twitter
Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel*, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy
@ankitoshniwal, @staneja, @amits, @karthikz, @pateljm, @sanjeevrk, @jason_j, @krishnagade, @Louis_Fumaosong, @jakedonham, @challenger_nik, @saileshmittal, @squarecog
Twitter, Inc., *University of Wisconsin – Madison
ABSTRACT This paper describes the use of Storm at Twitter. Storm is a real-time fault-tolerant and distributed stream data processing system. Storm is currently being used to run various critical computations in Twitter at scale, and in real-time. This paper describes the architecture of Storm and its methods for distributed scale-out and fault-tolerance. This paper also describes how queries (aka. topologies) are executed in Storm, and presents some operational stories based on running Storm at Twitter. We also present results from an empirical evaluation demonstrating the resilience of Storm in dealing with machine failures. Storm is under active development at Twitter and we also present some potential directions for future work.
1. INTRODUCTION Many modern data processing environments require processing complex computation on streaming data in real-time. This is particularly true at Twitter where each interaction with a user requires making a number of complex decisions, often based on data that has just been created.
Storm is a real-time distributed stream data processing engine at Twitter that powers the real-time stream data management tasks that are crucial to provide Twitter services. Storm is designed to be:
1. Scalable: The operations team needs to easily add or remove nodes from the Storm cluster without disrupting existing data flows through Storm topologies (aka. standing queries).
2. Resilient: Fault-tolerance is crucial to Storm as it is often deployed on large clusters, and hardware components can fail. The Storm cluster must continue processing existing topologies with a minimal performance impact.
3. Extensible: Storm topologies may call arbitrary external functions (e.g. looking up a MySQL service for the social graph), and thus need a framework that allows extensibility.
4. Efficient: Since Storm is used in real-time applications, it must have good performance characteristics. Storm uses a number of techniques, including keeping all its storage and computational data structures in memory.
5. Easy to Administer: Since Storm is at the heart of user interactions on Twitter, end-users immediately notice if there are (failure or performance) issues associated with Storm. The operational team needs early warning tools and must be able to quickly point out the source of problems as they arise. Thus, easy-to-use administration tools are not a “nice to have feature,” but a critical part of the requirement.
We note that Storm traces its lineage to the rich body of work on stream data processing (e.g. [1, 2, 3, 4]), and borrows heavily from that line of thinking. However a key difference is in bringing all the aspects listed above together in a single system. We also note that while Storm was one of the early stream processing systems, there have been other notable systems including S4 [5], and more recent systems such as MillWheel [6], Samza [7], Spark Streaming [8], and Photon [19]. Stream data processing technology has also been integrated as part of traditional database product pipelines (e.g. [9, 10, 11]).
Many earlier stream data processing systems have led the way in terms of introducing various concepts (e.g. extensibility, scalability, resilience), and we do not claim that these concepts were invented in Storm, but rather recognize that stream processing is quickly becoming a crucial component of a comprehensive data processing solution for enterprises, and Storm
SIGMOD’14, June 22–27, 2014, Snowbird, Utah, USA. Copyright © 2014 ACM 978-1-4503-2376-5/14/06…$15.00. http://dx.doi.org/10.1145/2588555.2595641
![Page 54: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/54.jpg)
CONCLUSION
SIMPLIFIED ARCHITECTURE
Easy to debug, profile and support
HIGH PERFORMANCE
7-10x increase in throughput
5-10x improvement in latency
EFFICIENCY
3-5x decrease in resource usage
![Page 55: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/55.jpg)
#ThankYou FOR LISTENING
![Page 56: Flying Faster with Heron - QCon San Francisco · Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron. BEGIN END OVERVIEW I! MOTIVATION II (HERON PERFORMANCE V K OPERATIONAL](https://reader030.fdocuments.us/reader030/viewer/2022011912/5f9ce0d511b9b56ee405d206/html5/thumbnails/56.jpg)
QUESTIONS
and
ANSWERS
Go ahead. Ask away.