Storm distributed cache workshop

Post on 22-Jan-2018

Storm Distributed Cache Workshop

How to efficiently distribute mutable BLOBs into Apache Storm

Problem (Apache Storm < v1.x)

Topology Resources:

● Dictionaries, ML Models, Geolocation Data, etc...

Typically packaged in topology JAR:

● Immutable: Any change requires re-packaging & re-deployment

● Fine for small files

● Large files negatively impact topology startup time

Solution (Apache Storm v1.x)

Storm Distributed Cache:

● Allows sharing of files (BLOBs) among topologies

● Files can change over the lifetime of the topology

● Files can be updated from command line or programmatically

● Allows for files from several KB to several GB in size

● Allows for compression (e.g. Zip, Tar, Gzip)

Storm Distributed Cache

Two Implementations:

● LocalFSBlobStore:

○ Stores data on Nimbus local file system

○ Supports a replication factor (not needed for the HDFS-backed implementation, since HDFS replicates data itself)

● HdfsBlobStore:

○ Stores data on HDFS file system

Nimbus in High Availability

HA Nimbus:

● Increases overall availability of Nimbus

● Nimbus hosts can join/leave at any time

● Leverages Distributed Cache API

● JAR, Config and Serialized Topology are uploaded to the Distributed Cache

● Replication guarantees availability of all files

Storm Distributed Cache (Create)

Storm Distributed Cache (Submit)
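A topology declares the BLOBs it needs at submission time through the `topology.blobstore.map` configuration entry, which maps each BLOB key to per-BLOB settings such as `localname` and `uncompress`. The following is a minimal sketch of how that entry could be assembled in plain Java. The class `BlobStoreMapSketch`, the method `withBlob` and the BLOB key `wordsToTrack` are hypothetical names chosen for illustration; only the file name `wordsToTrack.list` appears in the workshop material.

```java
import java.util.HashMap;
import java.util.Map;

/** Builds the part of a topology configuration that maps BLOB keys to local file names. */
class BlobStoreMapSketch {

    // Storm configuration key that lists the BLOBs a topology needs.
    static final String TOPOLOGY_BLOBSTORE_MAP = "topology.blobstore.map";

    /** Returns a config map requesting one BLOB under the given local name. */
    static Map<String, Object> withBlob(String blobKey, String localName, boolean uncompress) {
        Map<String, Object> blobSettings = new HashMap<>();
        blobSettings.put("localname", localName);   // file name seen by the worker
        blobSettings.put("uncompress", uncompress);  // true for compressed archives

        Map<String, Map<String, Object>> blobMap = new HashMap<>();
        blobMap.put(blobKey, blobSettings);

        Map<String, Object> conf = new HashMap<>();
        conf.put(TOPOLOGY_BLOBSTORE_MAP, blobMap);
        return conf;
    }

    public static void main(String[] args) {
        // "wordsToTrack" is a hypothetical BLOB key used only for this sketch.
        System.out.println(withBlob("wordsToTrack", "wordsToTrack.list", false));
    }
}
```

In a real topology the entries of the returned map would presumably be merged into the Storm `Config` object before submission; that step is left out here so the sketch has no dependency on the Storm libraries.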

Storm Distributed Cache (Update)

It is possible for the cached files to be updated while topologies are running. In current versions it is the user's responsibility to check whether a new file is available.
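Because the topology itself has to notice updates, one simple approach is to compare the modification time of the local copy against the value seen at the previous check. The sketch below assumes that an updated BLOB shows up on the worker with a new modification time; `BlobChangeDetector` and `hasChanged` are hypothetical names and are not part of the Storm API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

/** Detects whether a locally cached file has changed since the last check. */
class BlobChangeDetector {
    private final Path file;
    private FileTime lastSeen; // null until the first check

    BlobChangeDetector(Path file) {
        this.file = file;
    }

    /** Returns true on the first call and whenever the modification time differs from the last call. */
    boolean hasChanged() throws IOException {
        FileTime current = Files.getLastModifiedTime(file); // follows symbolic links
        if (current.equals(lastSeen)) {
            return false;
        }
        lastSeen = current;
        return true;
    }
}
```

A bolt could call `hasChanged()` on a periodic signal, such as a tick, and reload its data only when the method returns true, which keeps the per-tuple path free of file system access.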

Storm Distributed Cache (Reading BLOBs)
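Once a BLOB is available to a worker under its local name, it can be read like any other file. The following sketch loads a word list into a set, assuming a format of one word per line with blank lines ignored; that format, the class `WordListReader` and the method `read` are assumptions made for illustration and may differ from the workshop's example project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;

/** Reads a word list (assumed format: one word per line) from a local file. */
class WordListReader {

    /** Returns the distinct, trimmed, non-empty lines of the file in their original order. */
    static Set<String> read(Path file) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }
}
```

Reading the whole file into an in-memory set suits small lists such as `wordsToTrack.list`; a multi-gigabyte BLOB would call for streaming or memory-mapped access instead.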

Hands-On

Infrastructure

[Diagram: Twitter producer + Apache Kafka; Aggregate (WordCount) + DistCache Topology]

Storm DistCache Topology

[Diagram: Apache Kafka, Kafka Spout, Sentence Splitter, Counter, Aggregate (WordCount); Storm Distributed Cache + wordsToTrack.list; Tick Stream (Signal)]

Example

Check out the project:

● https://github.com/rrafanell/storm-distcache-example

● Follow the steps described in the README

Requirements:

● Oracle JDK 1.8 or similar

● Maven

● Docker

Code Inspection

Example (Starting the Infrastructure)

Storm UI: http://localhost:8080

Example (Configuring The Twitter-producer)

Example (Running The Twitter-producer)

Example (Uploading BLOBs)

Example (Checking BLOBs)

Example (Running the Topology)

Example (Updating the BLOBs & reloading on-the-fly)

Example (Shutting down the Infrastructure)

THANK YOU!

Local FS Blob Store