L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 ·...
Transcript of L13:Distributed Stream Processingcds.iisc.ac.in/wp-content/uploads/L13.Storm_.pdf · 2018-01-04 ·...
Indian Institute of ScienceBangalore, India
भारतीय विज्ञान संस्थान
बंगलौर, भारत
Department of Computational and Data Sciences
©Yogesh Simmhan & Partha Talukdar, 2016This work is licensed under a Creative Commons Attribution 4.0 International LicenseCopyright for external content used with attribution is retained by their original authors
L13:Distributed Stream Processing
Yogesh Simmhan3 0 M a r , 2 0 1 6
SE256:Jan16 (2:1)
CDS.IISc.in | Department of Computational and Data Sciences
Announcements Cluster is down until Mar 31, 6PM Assignment B deadline extended till Apr 3, by midnight
‣ NO Extension will be given!
Assignment B submission instructions posted‣ There will not be any demo. We should be able to run the
program ourselves. The report should be self-contained.Equal weightage will be given to the program and the report.
‣ You will be evaluated on correctness of the program and its outputs, its speed/scalability, accuracy of the results, and analysis of the algorithm/results/scalability in the report.
Final Exam on Apr 27, Wed in the AM Project topics posted
‣ Topic proposal by Apr 4‣ Project due by Apr 28‣ Project demos on Apr 30
30-Mar-16 2
CDS.IISc.in | Department of Computational and Data Sciences
Stream are Commonplace (too)
Web & Social Networks‣ Twitter, Facebook, Internet packets
Cybersecurity‣ Telecom call logs, financial transactions, Malware
Internet of Things‣ Smart Transport/Power/Water networks
‣ Smart watch/phone/TV/…
30-Mar-16 3
CDS.IISc.in | Department of Computational and Data Sciences
IISc Smart Campus: Water Management Plan pumping operations for reliability
‣ Avoid underflow/overflow of water‣ 12 hrs to fill a large OHT, scarcity in summer weeks
Provide safer water‣ Leakages, contamination from decades old network
Reduce water usage for sustainability‣ IISc average: 400 Lit/day, Global standard: 135 Lit/day‣ Lack of visibility on usage footprint, sources‣ Opportunities for water harvesting, recycling
Lower the cost‣ Reduce cost for water use & electricity for pumping
30-Mar-16 4
Over Head Tank (OHT)
Ground Level Reservoir (GLR)
BWSSB Main Inlet
OHT 8
GLR 13
Inlet 4+3
IISc Campus• 440 Acres, 8 Km Perimeter• 50 buildings: Office, Hotel,
Residence, Stores• 10,000 people• Water Use: 40 Lakh Lit/Day• 10MW Power Consumed
30-Mar-16 5
CDS.IISc.in | Department of Computational and Data SciencesOver Head Tanks (OHT)
TPH (near Mechanical) JNT Auditorium Chemical Stores Opposite to CENSE
Opposite to NESARA
Behind old C-Mess
Opposite to Cense (new)
E Type Quarters 30-Mar-16 6
CDS.IISc.in | Department of Computational and Data Sciences
Custom Level + Quality Sensor
Fig. 7. System block diagram
Fig. 8. Scatter plot of measured vs actual distance
Fifty readings are taken every 60cm, and histograms are
plotted at each distance. Finally, all the histograms are merged
into a single scatter plot as shown in figure 8. The spots in this
graph indicate the concentration of readings in an area. Clearly,
accuracy is better at shorter distances. We determined that our
sensor has an accuracy within ±1.5% of full scale, with improved
accuracy at shorter distances. Further experiments at a
swimming pool indicated the sensors performed with the same
accuracy as that observed with the Aluminium board. A picture
of the installation of the sensor on a tank is shown in figure 8.
The system is installed in a Ground Level Reservoir (GLR)
in the campus, with a repeater placed on a nearby Overhead Tank
(OHT) and an access point setup a few hundred metres away.
The data collected every minute from the tank is encapsulated
into packets and transmitted to the repeater, which then forwards
it to the access point. The AP unpacks the received packet and
uploads the level data online to a Google App Engine
application. Figure 10 shows the collected data for half a day. A
few glitches in the data can be seen that can be smoothed after
acquisition. Jitter in readings is only 1-2cm.
Fig. 9. Installation on a tank
Fig. 10. Level data collected from a tank
VII. CONCLUSIONS
We have described an IoT system consisting of custom
sensors interconnected via a sub-GHz based wireless network.
The sensor sub-system uses ultrasonic ranging to determine
water levels in large tanks in the distribution system. Optimised
circuitry and algorithm allowed achieving a range of up to 10m
with an accuracy within ±1.5% of full scale, using low cost
transducers. Initial deployment results are encouraging and as a
next step, we will instrument all the tanks. A further addition is
to include control valves so that we can create a smart
distribution system which distributes just enough water to each
tank to satisfy the local demands.
ACKNOWLEDGMENT
We thank Mr. Alok Rawat for the design of the enclosure.
We thank Mr. Sheetal Kumar and Ms. Anjana of Dept. of Civil
Engineering, Indian Institute of Science, for discussions. We
thank the Robert Bosch Centre for Cyber Physical Systems at
Indian Institute of Science, Bangalore, for funding the project.
REFERENCES
[1] UNDP. Human Development Report, Beyond Scarcity: Power,
Poverty and the Global Water Crisis. United Nations
Development Programme, 2006.
[2] UNDP. Human Development Report, Sustaining Human
Progress: Reducing Vulnerability and Building Resilience.
United Nations Development Programme, 2014.
30-Mar-16 7
CDS.IISc.in | Department of Computational and Data Sciences
Backend
30-Mar-16
ECE BuildingW
W
8
CDS.IISc.in | Department of Computational and Data Sciences
Aadhaar Enrolment
Input is stream of identity enrolment packets
Output is a UIDAI ID (success) or rejection
Each task tagged with Latency (ms)
Each edge tagged with Selectivity‣ Input:output rate, probability of path taken
30-Mar-16Benchmarking Fast Data Platforms for the ‘Aadhaar’ Biometric Database, Shukla, et al, Workshop on Big Data Benchmarking (WBDB), 2015 9
CDS.IISc.in | Department of Computational and Data Sciences
Aadhaar Authentication
30-Mar-16Benchmarking Fast Data Platforms for the ‘Aadhaar’ Biometric Database, Shukla, et al, Workshop on Big Data Benchmarking (WBDB), 2015 10
CDS.IISc.in | Department of Computational and Data Sciences
Stream Processing Continuous DataflowsHow do you “compose” analytics that run
continuously over streaming data?
Application defined as Directed Acyclic Graph (DAG)‣ Vertices are Tasks
‣ Edges are streams
‣ Streams carry tuples/events (name:value) or messages(opaque)
‣ Tasks process one or more messages/tuples and generate zero or more messages/tuples
‣ Message routing?
30-Mar-16 11
CDS.IISc.in | Department of Computational and Data Sciences
Event Processing Programming Models Query Based
‣ Complex Event processing‣ SQL like languages
Programming APIs
Queries or the Programs run on a continuous stream, unlike Hadoop where your data is static for the Batch processor
Need to address diverse streams – Unbounded sequence of events
ExamplesVideo Camera framesTweetsLaser scans from a robotLog data
© Programming Models for IoT and Streaming Data, Qiu
CDS.IISc.in | Department of Computational and Data Sciences
Distributed Stream Processing Systems
Aurora – Early Research System
Borealis – Early Research System
Apache Storm
Apache S4
Apache Samza
Google MillWheel
Amazon Kinesis
LinkedIn Databus
Facebook Puma/Ptail/Scribe/ODS
Azure Stream Analytics
© Programming Models for IoT and Streaming Data, Qiu
CDS.IISc.in | Department of Computational and Data Sciences
Apache StormSee Nathan Marz’s Slides
http://nathanmarz.com/blog/history-of-apache-storm-and-lessons-learned.html
30-Mar-16 14
CDS.IISc.in | Department of Computational and Data Sciences
Reliable Processing
© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 15
CDS.IISc.in | Department of Computational and Data Sciences
Reliable Processing
© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 16
CDS.IISc.in | Department of Computational and Data Sciences
Storm Cluster View
© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 17
CDS.IISc.in | Department of Computational and Data Sciences
Fault Tolerance
© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 18
CDS.IISc.in | Department of Computational and Data Sciences
Fault Tolerance
© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 19
CDS.IISc.in | Department of Computational and Data Sciences
Fault Tolerance
© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 20
CDS.IISc.in | Department of Computational and Data Sciences
Fault Tolerance
© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 21
CDS.IISc.in | Department of Computational and Data Sciences
Fault Tolerance
© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 22
CDS.IISc.in | Department of Computational and Data Sciences
Parallelism
© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 23
CDS.IISc.in | Department of Computational and Data Sciences
Parallelism
© Mahender Immadi & Thirupathi Guduru, Cerner30-Mar-16 24
CDS.IISc.in | Department of Computational and Data Sciences
Other Streaming Platforms
Apache Spark Streaming
IBM InfoSphere Streams
Yahoo S4
30-Mar-16 25
CDS.IISc.in | Department of Computational and Data Sciences
Reading
Ankit Toshniwal, et al. Storm@twitter. In ACM SIGMOD, 2014
Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters, Zaharia, et al, USENIX HotCloud, 2012, https://www.usenix.org/conference/hotcloud12/workshop-program/presentation/zaharia
Leonardo Neumeyer, et al, S4: Distributed Stream Computing Platform. In ICDMW 2010
30-Mar-16 26