Big Data Hadoop Apex App for Device to Mobile, GPS Tracking with DataTorrent
Venkatesh Kottapalli, Software Engineer
Vikram Patil, Software Engineer
Agenda:
● Introduction to Apex
● Use cases for GPS Tracking
● General requirements for GPS Tracking App
● Application Architecture using Apache Apex
● Further App Details
● Resources
Apache Apex - Stream Processing
Easily Operable - Exposes a simple API for developing Operators (the building blocks of an application) and Applications
Highly Scalable - Scales statically (at launch) as well as dynamically (at runtime)
Highly Performant - Can reach single-digit millisecond end-to-end latency
Fault Tolerant - Automatically recovers from failures without manual intervention
Stateful - Guarantees that no state will be lost
Apex Malhar library - Open source library of ready-to-use operators
YARN Native - Uses the Hadoop YARN framework for resource negotiation
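The scalability point above rests on running several parallel instances (partitions) of an operator. As a minimal sketch, assuming tuples are routed by a hash of a hypothetical device ID, the plain-Java class below shows key-based routing so that all readings from one device reach the same instance. It is an illustration only, not the Apex `Partitioner` API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: routes device tuples to a fixed number of parallel partitions by key. */
class DevicePartitioner {
    private final int partitionCount;

    DevicePartitioner(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    /** The same device ID always maps to the same partition; floorMod keeps the index non-negative. */
    int partitionFor(String deviceId) {
        return Math.floorMod(deviceId.hashCode(), partitionCount);
    }

    /** Splits a batch of device IDs into one bucket per partition, preserving arrival order. */
    List<List<String>> route(List<String> deviceIds) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String id : deviceIds) {
            buckets.get(partitionFor(id)).add(id);
        }
        return buckets;
    }
}
```

In this sketch, changing `partitionCount` remaps most keys, which illustrates why dynamic repartitioning also has to move per-key state between instances.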
Apache Apex Platform Overview
An Apex Application is a DAG (Directed Acyclic Graph)
A DAG is composed of vertices (Operators) and edges (Streams).
A Stream is a sequence of data tuples that connects operators at end-points called Ports
An Operator receives tuples from its input streams, performs computations and emits results on its output streams (input operators read from external sources, and output operators write to external systems)
● Each operator is either the user's business logic or a built-in operator from our open source library
● An operator may have multiple instances that run in parallel
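To make the DAG vocabulary concrete, here is a toy, single-threaded model in plain Java: an output port fans tuples out over streams to the connected input ports, and each operator is a vertex holding some business logic. All class names are hypothetical; in Apex itself, ports are `DefaultInputPort`/`DefaultOutputPort` fields on an operator, and streams are declared with `dag.addStream(...)` inside `populateDAG`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Toy output port: a stream delivers every emitted tuple to each connected input port. */
class ToyOutputPort<T> {
    private final List<Consumer<T>> connectedInputs = new ArrayList<>();

    /** Adds an edge (stream) from this port to a downstream input port. */
    void connect(Consumer<T> inputPort) {
        connectedInputs.add(inputPort);
    }

    void emit(T tuple) {
        for (Consumer<T> input : connectedInputs) {
            input.accept(tuple);
        }
    }
}

/** Toy operator (vertex): applies business logic to each input tuple and emits the result. */
class ToyMapOperator<I, O> {
    final ToyOutputPort<O> output = new ToyOutputPort<>();
    private final Function<I, O> logic;
    final Consumer<I> input; // the input port

    ToyMapOperator(Function<I, O> logic) {
        this.logic = logic;
        this.input = tuple -> output.emit(logic.apply(tuple));
    }
}

/** Toy sink operator: collects tuples, standing in for an output operator. */
class ToyCollector<T> {
    final List<T> collected = new ArrayList<>();
    final Consumer<T> input = collected::add;
}
```

Wiring `source → lengthOp → sink` with two `connect` calls and emitting `"gps"` and `"device"` leaves `[3, 6]` in the collector.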
Apex - Native Hadoop Integration
• YARN is the resource manager
• HDFS is used for storing persistent state (e.g., operator checkpoints)
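Statefulness and fault tolerance rely on periodically saving operator state to durable storage and restoring it after a failure. The sketch below illustrates that idea with Java serialization and an in-memory map standing in for HDFS; `FixCounter` and `CheckpointStore` are hypothetical names, and Apex's actual serializer and storage layer are not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical operator state: number of GPS fixes seen per device. */
class FixCounter implements Serializable {
    private static final long serialVersionUID = 1L;
    final HashMap<String, Integer> counts = new HashMap<>();

    void process(String deviceId) {
        counts.merge(deviceId, 1, Integer::sum);
    }
}

/** In-memory stand-in for durable storage such as HDFS: serialized snapshots keyed by window ID. */
class CheckpointStore {
    private final TreeMap<Long, byte[]> snapshots = new TreeMap<>();

    void save(long windowId, Serializable state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        snapshots.put(windowId, bytes.toByteArray());
    }

    /** Deserializes the most recent snapshot, or returns null if no checkpoint exists yet. */
    Object restoreLatest() {
        Map.Entry<Long, byte[]> latest = snapshots.lastEntry();
        if (latest == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(latest.getValue()))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Work done after the last checkpoint is not in the snapshot, so a streaming platform has to replay those tuples from upstream to avoid losing them; this sketch only covers the snapshot and restore steps.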
Use Cases:
● Track fleet vehicles in transit for route safety or to detect any kind of fraud.
● Bus tracking for government and private transport operators to adjust routes dynamically according to traffic conditions.
● Track wild animals using GPS-enabled collars or devices.
● Track inventory of items including cars, refrigerators, expensive retail goods, etc.
● Location-based transportation apps, e.g., Uber, Lyft
● Location-based gaming apps, e.g., Pokémon Go
● Location-based utility apps, e.g., Find My Friends
General Requirements:
● Accept data from millions of devices through TCP sockets or over the MQTT protocol.
● Once data is ingested, it needs to be processed in real time to identify trends or events.
● Based on event priority, the customer needs to be informed, and historical data needs to be stored for analysis or further review.
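The priority requirement can be sketched as a router that archives every event and notifies the customer only about important ones. The priority levels, event type names and notification threshold below are assumptions for illustration; the slides do not specify them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical priority levels, declared in ascending order so compareTo ranks them. */
enum Priority { LOW, MEDIUM, HIGH }

/** A detected event for one device (event type names are illustrative). */
class DeviceEvent {
    final String deviceId;
    final String type;
    final Priority priority;

    DeviceEvent(String deviceId, String type, Priority priority) {
        this.deviceId = deviceId;
        this.type = type;
        this.priority = priority;
    }
}

/** Archives every event and raises a notification only for events at or above a threshold. */
class EventRouter {
    final List<DeviceEvent> archive = new ArrayList<>();
    final List<DeviceEvent> notifications = new ArrayList<>();
    private final Priority notifyThreshold;

    EventRouter(Priority notifyThreshold) {
        this.notifyThreshold = notifyThreshold;
    }

    void route(DeviceEvent event) {
        archive.add(event); // keep all events for historical analysis
        if (event.priority.compareTo(notifyThreshold) >= 0) {
            notifications.add(event); // inform the customer right away
        }
    }
}
```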
Overall Application Architecture:
App Details:
● HTTP REST API support
● WebSocket support for clients to receive real-time updates from the app.
● Receive device data from millions of devices over TCP sockets at a configured time interval.
[Device data = location and device identification + (temperature / pressure / battery status, etc.)]
● Device data parsing and processing to make it actionable in real time.
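As an illustration of parsing plus real-time processing, the sketch below assumes a hypothetical comma-separated layout `deviceId,latitude,longitude,batteryPercent` (the actual wire format is not given) and checks each reading against a circular geofence using the haversine great-circle distance, one common way to compute the distance between two GPS coordinates.

```java
/** Hypothetical text layout for one reading: "deviceId,latitude,longitude,batteryPercent". */
class DeviceReading {
    final String deviceId;
    final double latitude;
    final double longitude;
    final int batteryPercent;

    DeviceReading(String deviceId, double latitude, double longitude, int batteryPercent) {
        this.deviceId = deviceId;
        this.latitude = latitude;
        this.longitude = longitude;
        this.batteryPercent = batteryPercent;
    }

    /** Parses one line; throws IllegalArgumentException (incl. NumberFormatException) on bad input. */
    static DeviceReading parse(String line) {
        String[] f = line.split(",");
        if (f.length != 4) {
            throw new IllegalArgumentException("expected 4 fields but got " + f.length + ": " + line);
        }
        return new DeviceReading(f[0].trim(), Double.parseDouble(f[1].trim()),
                Double.parseDouble(f[2].trim()), Integer.parseInt(f[3].trim()));
    }
}

/** Circular geofence; distances use the haversine formula on a spherical Earth. */
class Geofence {
    private static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius in meters
    private final double centerLat;
    private final double centerLon;
    private final double radiusMeters;

    Geofence(double centerLat, double centerLon, double radiusMeters) {
        this.centerLat = centerLat;
        this.centerLon = centerLon;
        this.radiusMeters = radiusMeters;
    }

    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        // min() guards against rounding pushing the argument of asin slightly above 1
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    boolean contains(DeviceReading r) {
        return distanceMeters(centerLat, centerLon, r.latitude, r.longitude) <= radiusMeters;
    }
}
```

A reading that falls outside the fence could then be turned into a high-priority event for notification.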
GPS Data Processing App
WebSocket App
HTTP Server App
Communication between apps
● Any configuration updates made by the end user are received by the HTTP load receiver and published to a Kafka topic, which is consumed by the GPS tracking app; the configuration is then updated in memory in real time.
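The in-memory configuration update can be sketched as follows, assuming a hypothetical `key=value` message format. Messages are passed in directly rather than read with a Kafka client; in the app they would arrive from the Kafka topic. Each update swaps in a new immutable snapshot through an `AtomicReference`, so threads processing tuples never observe a half-applied change.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Holds the tracking app's live configuration; each update replaces the whole snapshot atomically. */
class LiveConfig {
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(Collections.emptyMap());

    /** Applies one update message of the hypothetical form "key=value". */
    void onMessage(String message) {
        int eq = message.indexOf('=');
        if (eq <= 0) {
            return; // ignore malformed messages rather than stop the pipeline
        }
        String key = message.substring(0, eq).trim();
        String value = message.substring(eq + 1).trim();
        // The update function has no side effects, so it is safe if updateAndGet retries it.
        current.updateAndGet(old -> {
            Map<String, String> next = new HashMap<>(old);
            next.put(key, value);
            return Collections.unmodifiableMap(next);
        });
    }

    String get(String key, String defaultValue) {
        return current.get().getOrDefault(key, defaultValue);
    }
}
```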
Data Persistence
● Cassandra Output Operator
● Cassandra Input Operator
● Event Archival
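Apex operators receive `beginWindow(long)` and `endWindow()` callbacks at streaming-window boundaries. A minimal sketch of event archival built on that idea buffers the events of one window and writes them as a single batch; the `BiConsumer` batch writer is a hypothetical stand-in for the Cassandra output operator, and this plain-Java class does not implement the Apex `Operator` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/**
 * Hypothetical archival operator: buffers events for the current streaming window and
 * hands them to a batch writer (a stand-in for Cassandra writes) when the window ends.
 */
class WindowedArchiver<T> {
    private final List<T> buffer = new ArrayList<>();
    private final BiConsumer<Long, List<T>> batchWriter;
    private long currentWindowId = -1;

    WindowedArchiver(BiConsumer<Long, List<T>> batchWriter) {
        this.batchWriter = batchWriter;
    }

    void beginWindow(long windowId) {
        currentWindowId = windowId;
    }

    void process(T event) {
        buffer.add(event);
    }

    /** Flushes the window's events in one batch; empty windows produce no write. */
    void endWindow() {
        if (!buffer.isEmpty()) {
            batchWriter.accept(currentWindowId, new ArrayList<>(buffer)); // copy so the buffer can be reused
            buffer.clear();
        }
    }
}
```

Passing the window ID to the writer is one way to make writes idempotent: after a failure, a writer that records the last stored window can skip windows that are replayed.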
Resources
● http://apex.apache.org/
● Learn more: http://apex.apache.org/docs.html
● Subscribe: http://apex.apache.org/community.html
● Download: http://apex.apache.org/downloads.html
● Follow @ApacheApex: https://twitter.com/apacheapex
● Meetups: http://www.meetup.com/pro/apacheapex/
● More examples: https://github.com/DataTorrent/examples
● SlideShare: http://www.slideshare.net/ApacheApex/presentations
● YouTube: https://www.youtube.com/results?search_query=apache+apex
● Free Enterprise License for Startups: https://www.datatorrent.com/product/startup-accelerator/
Q&A
Thank you