Punch clock for debugging Apache Storm

Post on 19-Feb-2017



Punch clock for Apache Storm

<just an idea>

Punch clock (a.k.a. time clock)

● You have a card per person.

● The person punches IN with the card when he/she enters the office.

● The person punches OUT with the card when he/she leaves the office.

● The punch clock records the time of entry/exit on the card.

Motivation: To find out …

1. When did the person enter / exit the office?

2. Who is still in the office?

Change of Context …

“Apache Storm”: Tuples going In & Out of Spouts/Bolts

Motivation: Debugging Apache Storm*

* Debugging Storm Transactional Topologies

Debugging Transactional Topologies

1. Spout emits a batch of data (tuples) which forms a

2. Every Bolt in the topology processes that batch of data

Motivation: To find out …

1. When did the batch enter/exit the Spout/Bolt?

2. Which batch is still in the Spout/Bolt? i.e. are any batches STUCK?

a. On which host are they stuck?

b. In which Spout/Bolt are they stuck?

Possible Solution(s): Add a log statement before and after the critical section.

log.info("Inserting data into database ..."); // ← entering

datasource.insert(table, tuples); // ← the real work

log.info("Inserted data into database."); // ← exiting

Cons: The logs are distributed over multiple hosts, so they need to be aggregated, which takes a bit of work (Elasticsearch + Kibana?).

Possible Solution(s):

Use Riemann: http://riemann.io/index.html

This was suggested by my friend Angad. I have not looked at it, though.

My Idea: A batch of tuples punches IN and punches OUT in a bolt / spout.

Punch In - Put into a hashmap (or any other suitable data structure)

Punch Out - Remove from the hashmap (or any other suitable data structure)
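The put / remove idea above could be sketched roughly as follows. This is only a minimal illustration, not a finished implementation: the PunchClock class body, the stillIn helper, and the choice of ConcurrentHashMap are assumptions made for the example (only getInstance, punchIn and punchOut appear on the slides).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-JVM punch clock: punch card id -> punch-in time in milliseconds. */
final class PunchClock {
    private static final PunchClock INSTANCE = new PunchClock();

    // Assumption: a concurrent map, since several executor threads may share one worker JVM.
    private final Map<String, Long> punchedIn = new ConcurrentHashMap<>();

    private PunchClock() {}

    static PunchClock getInstance() {
        return INSTANCE;
    }

    /** Punch In: record the entry time; a repeated punch-in keeps the first time. */
    void punchIn(String punchCardId) {
        punchedIn.putIfAbsent(punchCardId, System.currentTimeMillis());
    }

    /** Punch Out: remove the card; returns the ms spent inside, or -1 if it never punched in. */
    long punchOut(String punchCardId) {
        Long enteredAt = punchedIn.remove(punchCardId);
        return enteredAt == null ? -1L : System.currentTimeMillis() - enteredAt;
    }

    /** Cards that have been punched in for at least minAgeMillis: candidates for STUCK batches. */
    Map<String, Long> stillIn(long minAgeMillis) {
        long now = System.currentTimeMillis();
        Map<String, Long> result = new TreeMap<>();
        punchedIn.forEach((id, enteredAt) -> {
            if (now - enteredAt >= minAgeMillis) {
                result.put(id, enteredAt);
            }
        });
        return result;
    }
}
```

putIfAbsent is used so that punching in once per tuple of the same batch does not reset the entry time. Whatever is left in the map answers "who is still in the office?".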

My Idea: A batch of tuples punches In and punches Out in a spout.

In the emitBatch method of the Transactional Spout:

PunchClock.getInstance().punchIn(punchCardId); // ← Punch In

collector.emit(tuples); // ← Emit tuple(s)

PunchClock.getInstance().punchOut(punchCardId); // ← Punch Out

My Idea: A batch of tuples punches IN and punches OUT in a bolt.

In the prepare method of the Transactional Bolt:

punchCardId = "Bolt__" + Thread.currentThread().getId() + "__" + System.currentTimeMillis(); // ← Create punch card for txn

In the execute method of the Transactional Bolt:

PunchClock.getInstance().punchIn(punchCardId); // ← Punch In

In the finishBatch method of the Transactional Bolt:

PunchClock.getInstance().punchOut(punchCardId); // ← Punch Out

My Idea: Is it intrusive?

Yes, but it's a simple put / remove call to a hashmap.

Compared to logging, it's cheaper.

Punch Clocks

● Spouts / Bolts are housed in a Storm worker JVM.

● One Punch Clock per JVM.

● Since we have multiple JVMs, we have multiple Punch Clocks.

● Batches move across Storm workers and we have multiple JVMs, so:

○ We need to aggregate the data across Punch Clocks.

○ Expose Punch Clock via JMX.
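Exposing a punch clock via JMX might look something like the sketch below, which uses only the standard javax.management API. The names PunchClockJmx, PunchCardsMXBean, Cards and the ObjectName string are all hypothetical, and the Cards class is a small stand-in for the punch clock so that the example is self-contained. Aggregating across worker JVMs is not shown; each worker would additionally need remote JMX enabled so that a collector can poll it.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class PunchClockJmx {
    /** Management view; declared public because JMX rejects non-public MXBean interfaces by default. */
    public interface PunchCardsMXBean {
        int getPunchedInCount();      // exposed as attribute "PunchedInCount"
        String[] getPunchedInCards(); // exposed as attribute "PunchedInCards"
    }

    /** Stand-in for the per-JVM punch clock: card id -> punch-in time in milliseconds. */
    static final class Cards implements PunchCardsMXBean {
        private final Map<String, Long> punchedIn = new ConcurrentHashMap<>();

        void punchIn(String id) { punchedIn.putIfAbsent(id, System.currentTimeMillis()); }

        void punchOut(String id) { punchedIn.remove(id); }

        @Override public int getPunchedInCount() { return punchedIn.size(); }

        @Override public String[] getPunchedInCards() {
            return punchedIn.keySet().stream().sorted().toArray(String[]::new);
        }
    }

    /** Registers the cards with this JVM's platform MBean server under the given name. */
    static ObjectName register(Cards cards, String name) {
        try {
            ObjectName objectName = new ObjectName(name);
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            if (server.isRegistered(objectName)) {
                server.unregisterMBean(objectName); // replace a previous registration
            }
            server.registerMBean(cards, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException("could not register punch clock MBean", e);
        }
    }

    /** Reads one attribute the way a local JMX client would. */
    static Object read(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException("could not read " + attribute, e);
        }
    }
}
```

After calling PunchClockJmx.register(cards, "punchclock:type=PunchClock"), a tool such as JConsole attached to the worker JVM should list the cards that are still punched in.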


thank you


