Streaming benchmark

Streaming Benchmark Vinaya M S Insight Data Science


Streaming Benchmark

Vinaya M S, Insight Data Science

Cluster nodes (instance type: count): m4.large: 3, r3.large: 4, r3.large: 4, r3.large: 1

Program

Read tweets

Parse tweets

Count characters

Store (r_ts, w_ts)
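The four program steps can be sketched in plain Java, without Storm or Flink. This is only an illustration: the class names `ProcessedTweet` and `TweetPipeline`, the injected clock, and the use of `trim()` as a stand-in for parsing are all assumptions, not the benchmark's actual code.

```java
import java.util.function.LongSupplier;

// Hypothetical record for one processed tweet; fields mirror r_ts / w_ts.
final class ProcessedTweet {
    final String text;
    final int charCount;
    final long readTs;   // r_ts: when the tweet was read (ms)
    final long writeTs;  // w_ts: when the result was handed to storage (ms)

    ProcessedTweet(String text, int charCount, long readTs, long writeTs) {
        this.text = text;
        this.charCount = charCount;
        this.readTs = readTs;
        this.writeTs = writeTs;
    }
}

final class TweetPipeline {
    // Read -> parse -> count characters -> stamp for storage.
    static ProcessedTweet process(String rawTweet, LongSupplier clock) {
        long readTs = clock.getAsLong();                        // read step
        String text = rawTweet.trim();                          // stand-in for real parsing (assumption)
        int charCount = text.codePointCount(0, text.length());  // count characters
        long writeTs = clock.getAsLong();                       // taken just before the store step
        return new ProcessedTweet(text, charCount, readTs, writeTs);
    }

    public static void main(String[] args) {
        ProcessedTweet t = process("  hello storm  ", System::currentTimeMillis);
        System.out.println(t.charCount + " chars, latency " + (t.writeTs - t.readTs) + " ms");
    }
}
```

Passing the clock in as a parameter keeps the sketch easy to check with fixed timestamps.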

How many tweets are processed per second?

Insert read_ts; insert write_ts

100 tweets/second

Subset of 100 tweets/second

Latency:

$$\text{Latency} = \frac{\sum_{\text{winStart}}^{\text{winEnd}} (\text{write\_ts} - \text{read\_ts})}{\text{total\_tweets\_processed}}$$

winStart: window start time; winEnd: window end time
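A minimal sketch of the per-window latency calculation in plain Java. It assumes timestamps in milliseconds, a half-open window [winStart, winEnd), and that a tweet belongs to the window its write timestamp falls in. None of these choices is stated on the slide, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class WindowLatency {
    // Each entry is {readTs, writeTs} in milliseconds.
    // Returns the mean of (writeTs - readTs) over tweets written in [winStart, winEnd).
    static double meanLatencyMillis(List<long[]> stamps, long winStart, long winEnd) {
        long total = 0;
        long count = 0;
        for (long[] s : stamps) {
            long readTs = s[0];
            long writeTs = s[1];
            if (writeTs >= winStart && writeTs < winEnd) { // window membership by write time (assumption)
                total += writeTs - readTs;
                count++;
            }
        }
        return count == 0 ? Double.NaN : (double) total / count;
    }

    // Tweets written in the window, divided by the window length in seconds.
    static double tweetsPerSecond(List<long[]> stamps, long winStart, long winEnd) {
        long count = 0;
        for (long[] s : stamps) {
            if (s[1] >= winStart && s[1] < winEnd) {
                count++;
            }
        }
        return count * 1000.0 / (winEnd - winStart);
    }

    public static void main(String[] args) {
        List<long[]> stamps = Arrays.asList(
                new long[]{0, 40}, new long[]{100, 180},
                new long[]{900, 960}, new long[]{1500, 2100});
        System.out.println(meanLatencyMillis(stamps, 0, 1000)); // window covers the first three tweets
        System.out.println(tweetsPerSecond(stamps, 0, 1000));
    }
}
```

The same pass over the stored (r_ts, w_ts) pairs yields both latency and throughput for a window, which is why the two are reported together.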

RESULTS

What about Storm?

• The total number of tweets processed in Storm is higher.
• ~41,000 in 160 sec (Flink: ~18,500 in 160 sec)
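As a rough worked figure, the totals above convert to average rates over the 160-second run. The inputs are approximate, so the results are too:

$$\text{Storm: } \frac{41000}{160} \approx 256 \text{ tweets/s}, \qquad \text{Flink: } \frac{18500}{160} \approx 116 \text{ tweets/s}, \qquad \frac{41000}{18500} \approx 2.2$$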

Storm:

Maximise per-window throughput/latency?

• Needs performance tuning.

Few tunings I considered

§ Number of compute bolts

§ Consumer spouts.

§ Java heap size. Plays an important role.

§ Tried 2 of the groupings.

Challenges:

• Visualizing the Storm data distribution.

• NTP server sync.

• SQL queries.

About Me

3 years experience.

MS - Computer Science

Enjoy: Table Tennis, Cooking

Thank you 😊

Recommendation

Storm
• Low-level APIs:
• builder.setBolt("parse-bolt", new ParseTweetBolt(), parallel * 3).shuffleGrouping("tweet-spout");

• Better understanding of the architecture is required.

• Configuration tuning

Flink
• Hides details:
• flatMap(new ParseTweetBolt())

• Storm: Round-robin scheduling.

• Flink: Each task slot in a task manager can run one pipeline of parallel tasks.
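The round-robin idea can be shown with a toy assignment in plain Java. This is not Storm's scheduler code. The class name `RoundRobinSketch` and the model of numbered tasks and workers are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class RoundRobinSketch {
    // Hands out task ids 0..taskCount-1 in turn: task i goes to worker i % workerCount.
    static List<List<Integer>> assign(int taskCount, int workerCount) {
        List<List<Integer>> workers = new ArrayList<>();
        for (int w = 0; w < workerCount; w++) {
            workers.add(new ArrayList<>());
        }
        for (int t = 0; t < taskCount; t++) {
            workers.get(t % workerCount).add(t);
        }
        return workers;
    }

    public static void main(String[] args) {
        // 7 tasks over 3 workers
        System.out.println(assign(7, 3)); // prints [[0, 3, 6], [1, 4], [2, 5]]
    }
}
```

With this kind of placement, consecutive tasks land on different workers, so the load is spread evenly but a tuple may cross worker boundaries between stages.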

Storm: