From Rivulets to Rivers: Elastic Stream Processing in Heron

Post on 05-Apr-2017



Bill Graham, Twitter - @billgraham
Ashvin Agrawal, Microsoft
Avrilia Floratou, Microsoft

Prediction is very difficult, especially if it’s about the future.
- Niels Bohr

We cannot direct the wind, but we can adjust the sails.
- Dolly Parton

Outline

Heron Overview

Elastic Scaling Opportunities

Current Implementation

Work in Progress – Auto-scaling

Heron

A realtime, distributed, fault-tolerant stream processing engine.

About Heron

- Developed by Twitter in 2014
- Open sourced in May 2016
- Storm API compatible
- Isolation at all levels: topology, container, task (process-based)
- At-least-once and at-most-once semantics
- Backpressure
- Low resource overhead (< 10%)

Logical Topology

[Figure: logical topology graph with Spout 1, Spout 2 and Bolt 1 through Bolt 5.]

Physical Execution

[Figure: physical execution of the same topology, showing the instances of Spout 1, Spout 2 and Bolt 1 through Bolt 5.]

Packing Plan

How to distribute instances onto containers?

IPacking.pack()
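To make the packing question concrete, the following is a minimal Python sketch of one possible strategy: assigning instances to containers in round-robin order. It is not Heron's IPacking implementation; the function name `round_robin_pack`, the dict-based plan and the container numbering starting at 1 are assumptions made for this illustration.

```python
from collections import defaultdict


def round_robin_pack(parallelism, num_containers):
    """Assign component instances to containers in round-robin order.

    parallelism: dict mapping component name -> number of instances.
    num_containers: number of worker containers (numbered from 1).
    Returns a dict mapping container id -> list of (component, index).
    Hypothetical helper for illustration; not Heron's IPacking API.
    """
    if num_containers < 1:
        raise ValueError("need at least one container")
    plan = defaultdict(list)
    slot = 0
    # Sort component names so the result is deterministic.
    for component in sorted(parallelism):
        for index in range(parallelism[component]):
            container_id = 1 + slot % num_containers
            plan[container_id].append((component, index))
            slot += 1
    return dict(plan)


plan = round_robin_pack({"spout1": 2, "bolt1": 4}, num_containers=3)
for container_id in sorted(plan):
    print(container_id, plan[container_id])
```

Round-robin spreads instances evenly by count but ignores per-instance CPU and memory requirements, which a real packing algorithm would likely need to take into account.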

Topology Submission

[Figure: topology submission. HeronClient runs "heron submit"; HeronScheduler uses the PackingPlan to allocate containers. Container 0 hosts the TopologyMaster; Containers 1 to 3 each host a Stream Manager and a set of spout (S) and bolt (B) instances.]

Stages shown:

- Containers Allocated
- Processes Initialize
- Instances Register
- Stream Manager Registers
- Data Flows

Data Rate Variations

Parallelism Challenges

Anticipating component parallelism is difficult

Changing parallelism is costly - O(hour)

- code change, review, merge, build, kill, submit

Tuning for load spikes or valleys is manual - O(day)

Under-provisioning leads to back pressure, which leads to support costs

Over-provisioning is the norm

Over-provisioning

[Figure: CPU Used compared with CPU Requested; the chart labels read 25% and 40%.]

Elastic Scaling Opportunity

Reduce administration cost

Reduce support cost

Reduce hardware cost

Provide better SLA

Ordinary Topology Management Process

[Figure: process flow divided into User Tasks and Heron System Tasks. Steps shown: Kill Topology, Release Resources, Submit Topology, Create Packing, Acquire Resources, Monitor / Estimate, Build State, Start Topology, Install Topology. A callout marks the Time Consuming Tasks.]

Low-cost Topology “update”


Optimized Topology Scale-up Process

[Figure: process flow divided into User Tasks and Heron System Tasks. Steps shown: Kill Topology, Submit Topology, Create Packing, Acquire Resources, Monitor / Estimate, Build State, Start Topology, Install Topology, Update Topology, Pause Topology, Un-Pause Topology, Add / Reduce Resources, Prepare Components.]

heron “update” …

Minimizes Disruption

Aggressively Prunes Containers

Aims to Maintain Uniform Component Distribution

$ heron update my_cluster/user/dev MyTopology \
    --component-parallelism=bolt1:20 \
    --component-parallelism=bolt2:40

Available in 0.14.5

Execution Time O(mins)

Customizable Through IRepacking.repack()
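The goals listed above (minimal disruption, container pruning, even distribution) can be illustrated with a small Python sketch. This is not Heron's IRepacking implementation: the function `repack`, the `max_per_container` limit and the list-of-names plan format are assumptions made for this example, and the heuristics are one possible reading of the slide. Existing instances stay where they are, new instances go to the least-loaded container, and removals come from the least-loaded container so that it can be pruned once empty.

```python
from collections import Counter


def repack(plan, new_parallelism, max_per_container):
    """Return a new plan for the requested component parallelism.

    plan: dict container id -> list of component names, one per instance.
    new_parallelism: dict component -> desired instance count.
    max_per_container: capacity limit used when adding instances.
    Hypothetical sketch; not Heron's IRepacking implementation.
    """
    # Copy the plan so that untouched instances keep their container.
    new_plan = {cid: list(instances) for cid, instances in plan.items()}
    current = Counter(c for instances in plan.values() for c in instances)

    for component, target in sorted(new_parallelism.items()):
        if target < 0:
            raise ValueError("parallelism must not be negative")
        delta = target - current[component]
        for _ in range(max(delta, 0)):
            # Scale up: use the least-loaded container with free capacity,
            # and open a new container only when all are full.
            open_ids = [cid for cid, inst in new_plan.items()
                        if len(inst) < max_per_container]
            if open_ids:
                cid = min(open_ids, key=lambda c: (len(new_plan[c]), c))
            else:
                cid = max(new_plan, default=0) + 1
                new_plan[cid] = []
            new_plan[cid].append(component)
        for _ in range(max(-delta, 0)):
            # Scale down: remove from the least-loaded container hosting
            # the component, so that small containers empty out first.
            hosts = [cid for cid, inst in new_plan.items()
                     if component in inst]
            cid = min(hosts, key=lambda c: (len(new_plan[c]), c))
            new_plan[cid].remove(component)

    # Prune containers that no longer host any instance.
    return {cid: inst for cid, inst in new_plan.items() if inst}
```

For example, `repack({1: ["bolt1", "bolt1"], 2: ["bolt1"]}, {"bolt1": 2}, 3)` removes the single instance from container 2 and drops that container from the plan, leaving container 1 untouched.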

Current Limitations

Automated state transition not yet supported

- Component scaling event notification: IUpdatable.update()

- Example: KafkaSpout queue partition mappings

Fields group routing might change

- Workaround: pause the topology for longer than the cache flush interval before scaling
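The routing limitation can be illustrated with a short Python sketch. It assumes, for illustration only, that fields grouping picks a task by hashing the key modulo the component parallelism; the helper names `route` and `moved_keys` and the use of CRC32 are assumptions, and Heron's actual hash function may differ. Under that assumption, changing the parallelism sends many keys to a different task, so any per-key state cached in the old task should be flushed first.

```python
import zlib


def route(key, parallelism):
    """Pick a task index for a key by hashing it modulo the parallelism.

    Illustrative stand-in for fields grouping; Heron's actual hash
    function may differ.
    """
    return zlib.crc32(key.encode("utf-8")) % parallelism


def moved_keys(keys, old_parallelism, new_parallelism):
    """Return the keys whose target task changes after scaling."""
    return [k for k in keys
            if route(k, old_parallelism) != route(k, new_parallelism)]


keys = ["user-%d" % i for i in range(1000)]
moved = moved_keys(keys, old_parallelism=4, new_parallelism=5)
print("%d of %d keys change task" % (len(moved), len(keys)))
```

Pausing the topology until the caches have flushed means no task holds partial per-key results when the routing changes.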

Algorithmic Auto-Scaling

[Figure: process flow divided into User Tasks and Heron System Tasks. Steps shown: Submit Topology, Create Packing, Acquire Resources, Monitor / Estimate, Build State, Start Topology, Install Topology, Update Topology, Pause Topology, Un-Pause Topology, Add / Reduce Resources, Prepare Components.]

Auto-Scaling

Heron should automatically identify variations in the incoming load and react to them.

Dhalion periodically observes the state of the topology and determines whether resources should be scaled up or down.

Heron uses Dhalion to adjust to external shocks.

Dhalion is a framework that provides self-regulating capabilities to Heron and will be open-sourced in the near future.

Using Dhalion to Auto-Scale

[Figure: Dhalion policy pipeline. Metrics feed Symptom Detection, Symptoms feed Diagnosis Generation, and the Diagnosis drives Resolution.]

- Symptom Detection: Pending Packets Detector, Backpressure Detector, Processing Rate Skew Detector
- Diagnosis Generation: Resource Underprovisioning Diagnoser, Resource Overprovisioning Diagnoser, Data Skew Diagnoser, Slow Instances Diagnoser
- Resolution (Resolver Invocation): Bolt Scale Up Resolver, Bolt Scale Down Resolver, Restart Instances Resolver, Data Skew Resolver

Dhalion scales the topology resources up and down as needed while still keeping the topology in a steady state where backpressure is not observed.
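The detect, diagnose and resolve phases can be sketched as a single policy cycle. The Python below is a minimal illustration, not Dhalion's API: the function names, the metric names (`backpressure_ms`, `pending_packets`), the threshold of 1000 packets and the one-instance scaling step are all assumptions chosen for the example.

```python
def detect_symptoms(metrics):
    """Symptom detection: turn per-component metrics into symptom names.

    metrics: dict component -> {"backpressure_ms": ..., "pending_packets": ...}
    Metric names and thresholds are assumptions made for this sketch.
    """
    symptoms = {}
    for component, m in metrics.items():
        found = set()
        if m["backpressure_ms"] > 0:
            found.add("backpressure")
        if m["pending_packets"] > 1000:
            found.add("pending_packets_high")
        elif m["pending_packets"] == 0:
            found.add("pending_packets_zero")
        symptoms[component] = found
    return symptoms


def diagnose(symptoms):
    """Diagnosis generation: map symptoms to at most one diagnosis each."""
    diagnoses = {}
    for component, found in symptoms.items():
        if {"backpressure", "pending_packets_high"} <= found:
            diagnoses[component] = "underprovisioned"
        elif found == {"pending_packets_zero"}:
            diagnoses[component] = "overprovisioned"
    return diagnoses


def resolve(diagnoses, parallelism):
    """Resolution: return the parallelism chosen by the resolvers."""
    updated = dict(parallelism)
    for component, diagnosis in diagnoses.items():
        if diagnosis == "underprovisioned":
            # Scale up by one instance.
            updated[component] = parallelism[component] + 1
        elif diagnosis == "overprovisioned":
            # Scale down by one instance, never below one.
            updated[component] = max(1, parallelism[component] - 1)
    return updated


def run_policy_once(metrics, parallelism):
    """One policy cycle: detect symptoms, diagnose, then resolve."""
    return resolve(diagnose(detect_symptoms(metrics)), parallelism)
```

Running such a cycle periodically and applying the returned parallelism through a topology update is one way the scale-up and scale-down resolvers could act on a running topology.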

Initial Results

[Figure: Normalized Throughput (y-axis, -0.20 to 1.40) against Time in minutes (x-axis, 0 to 119), with series labeled Spout and Splitter Bolt. Annotations mark a Scale Down event, a Scale Up event and the phases S1, S2 and S3.]

Dhalion is able to adjust the topology resources on-the-fly when workload spikes occur.

Our policy eventually reaches a healthy state where backpressure is not observed and the overall throughput is maximized.

Future Plans

Use Dhalion to enforce throughput and latency SLOs and to auto-tune Heron topologies.

Open-source Dhalion and the auto-scaling policy as part of Heron.

Combine scaling with stateful stream processing.

Get Involved

http://github.com/twitter/heron

http://heronstreaming.io

@heronstreaming

Questions?