Deep Dive into Apache Apex App Development
![Page 1: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/1.jpg)
Deep Dive Into Apache Apex Application
Chaitanya Chebolu
![Page 2: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/2.jpg)
Application Development Model
▪ A Stream is a sequence of data tuples
▪ A typical Operator takes one or more input streams, performs computations & emits one or more output streams
• Each Operator is YOUR custom business logic in Java, or a built-in operator from our open source library
• An Operator has many instances that run in parallel, and each instance is single-threaded
▪ Directed Acyclic Graph (DAG) is made up of operators and streams
[Figure: Directed Acyclic Graph (DAG). A chain of operators connected by streams; tuples flow through a Filtered Stream and an Enriched Stream to an Output Stream.]
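The stream and operator model above can be sketched in plain Java. This is a toy model only, not the Apex API: a stream is represented as a `List`, and the names `ToyStreams`, `filterOperator`, `enrichOperator` and `run` are made up for illustration. It uses Java 8 lambdas for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Toy model only: a "stream" is a list of tuples, an "operator" maps one stream to another. */
final class ToyStreams {
  private ToyStreams() {}

  /** Operator that keeps only tuples passing the predicate, producing a filtered stream. */
  static <T> List<T> filterOperator(List<T> input, Predicate<T> keep) {
    List<T> out = new ArrayList<>();
    for (T tuple : input) {
      if (keep.test(tuple)) {
        out.add(tuple);
      }
    }
    return out;
  }

  /** Operator that transforms each tuple, producing an enriched stream. */
  static <T, R> List<R> enrichOperator(List<T> input, Function<T, R> enrich) {
    List<R> out = new ArrayList<>();
    for (T tuple : input) {
      out.add(enrich.apply(tuple));
    }
    return out;
  }

  /** Chains the two operators: input stream -> filtered stream -> enriched stream. */
  static List<String> run(List<Integer> input) {
    List<Integer> filtered = filterOperator(input, n -> n % 2 == 0);
    return enrichOperator(filtered, n -> "even:" + n);
  }
}
```

Calling `ToyStreams.run(Arrays.asList(1, 2, 3, 4))` keeps the even numbers and tags them. A real Apex stream is unbounded and processed tuple by tuple, which this batch-style sketch does not capture.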
![Page 3: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/3.jpg)
Typical application example
![Page 4: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/4.jpg)
DAG Types
• Logical Plan
● Logical representation of computation
● Defines operators, streams and dataflow
• Physical Plan
● Deployable plan on cluster
● Contains partition information of operators
● Has ready-to-deploy serialized operator instances
[Figure: Logical DAG with operators O1, O2, O3, O4, O5. Physical DAG with partitions O1P1, O1P2, O1P3 and O2P1, O2P2, O2P3, followed by U (unifier), O3, O4, O5.]
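To make the logical versus physical distinction concrete, here is a toy expansion in plain Java. It is not how the Apex planner works internally. It assumes a linear chain of operators and a made-up rule (insert a `U` node after a partitioned operator whose downstream operator has a different partition count), chosen only because it reproduces the node names in the figure. `ToyPlanner` and `physicalNodes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy planner: expands logical operators into named physical instances. Not the Apex planner. */
final class ToyPlanner {
  private ToyPlanner() {}

  /**
   * @param partitions logical operator name -> partition count, in dataflow order
   *                   (assumption: the logical DAG is a simple linear chain)
   * @return physical node names in dataflow order
   */
  static List<String> physicalNodes(LinkedHashMap<String, Integer> partitions) {
    List<Map.Entry<String, Integer>> ops = new ArrayList<>(partitions.entrySet());
    List<String> nodes = new ArrayList<>();
    for (int i = 0; i < ops.size(); i++) {
      String name = ops.get(i).getKey();
      int count = ops.get(i).getValue();
      if (count == 1) {
        nodes.add(name);                      // unpartitioned: one instance
      } else {
        for (int p = 1; p <= count; p++) {
          nodes.add(name + "P" + p);          // partitioned: one instance per partition
        }
      }
      boolean hasNext = i + 1 < ops.size();
      // Assumed rule: merge partition outputs when the next operator's count differs.
      if (hasNext && count > 1 && ops.get(i + 1).getValue() != count) {
        nodes.add("U");
      }
    }
    return nodes;
  }
}
```

With O1 and O2 set to three partitions each and O3 to O5 left at one, the result lists the same ten nodes as the Physical DAG figure.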
![Page 5: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/5.jpg)
Operator Lifecycle
➔ All operators in the DAG go through this lifecycle
➔ Managed by Apex Platform
➔ Governed by control tuples
![Page 6: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/6.jpg)
Operator Lifecycle (contd...)
➔ setup
◆ Start of operator lifecycle
◆ Do any initialization here
➔ beginWindow
◆ Marks the start of a window
➔ endWindow
◆ Marks the end of a window
➔ teardown
◆ Do any finalization here
◆ End of operator lifecycle
![Page 7: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/7.jpg)
Operator Lifecycle (contd...)
➔ emitTuples
◆ Called for Input Adapters
◆ Called in an infinite while loop by the platform
➔ process
◆ Called for Generic Operators and Output Adapters
◆ Associated with a port
◆ Called for every incoming tuple
![Page 8: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/8.jpg)
Operator Lifecycle (contd...)
➔ OutputPort::emit
◆ Special method, not part of the operator lifecycle
◆ To be called by operator code
◆ Emits the tuples to the next operator
◆ Bound by window (emit only between beginWindow and endWindow)
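The call order described across the lifecycle slides can be illustrated with a small plain-Java model. This is a sketch, not the Apex runtime: `ToyOperator`, `WindowCounter` and `ToyEngine` are hypothetical names, the engine is a simple loop on one thread, and a list stands in for the output port.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an operator; method names follow the slides. */
interface ToyOperator {
  void setup();
  void beginWindow(long windowId);
  void process(String tuple);
  void endWindow();
  void teardown();
}

/** Counts tuples per window and emits the count when the window ends. */
final class WindowCounter implements ToyOperator {
  final List<String> calls = new ArrayList<>();     // records the lifecycle order
  final List<Integer> emitted = new ArrayList<>();  // stands in for an output port
  private int count;

  @Override public void setup() { calls.add("setup"); }
  @Override public void beginWindow(long windowId) { calls.add("beginWindow"); count = 0; }
  @Override public void process(String tuple) { calls.add("process"); count++; }
  @Override public void endWindow() { calls.add("endWindow"); emitted.add(count); }
  @Override public void teardown() { calls.add("teardown"); }
}

/** Toy driver: one operator instance, one thread, one window at a time. */
final class ToyEngine {
  private ToyEngine() {}

  static void run(ToyOperator op, List<List<String>> windows) {
    op.setup();                               // once, at the start of the lifecycle
    long windowId = 0;
    for (List<String> window : windows) {
      op.beginWindow(windowId++);
      for (String tuple : window) {
        op.process(tuple);                    // once per incoming tuple
      }
      op.endWindow();                         // emission happens inside the window
    }
    op.teardown();                            // once, at the end of the lifecycle
  }
}
```

Feeding two windows, `["a", "b"]` and `["c"]`, yields one count per window. In Apex the window boundaries arrive as control tuples rather than as nested lists; the nesting here is only a convenience.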
![Page 9: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/9.jpg)
Defining DAG
[Figure: LOGS → Reader → Parser → Counter → Output → HDFS. Reader is the Input Operator (Adapter), Parser and Counter are Generic Operators, Output is the Output Operator (Adapter).]
![Page 10: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/10.jpg)
APIs : Application
• MyApplication implements StreamingApplication
ᵒ Provide implementation for populateDAG
ᵒ Stitch the DAG
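"Stitching the DAG" means registering operators and connecting them with named streams. The sketch below models that idea in plain Java using the Reader, Parser, Counter and Output operators from the Defining DAG slide. `ToyDag` is a hypothetical class, not the Apex `DAG` interface, and the stream names `lines`, `records` and `counts` are invented. The acyclicity check uses Kahn's algorithm, which is a choice made here and not something the slides describe.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy DAG: operators are named nodes, streams are named directed edges. */
final class ToyDag {
  private final Map<String, List<String>> downstream = new LinkedHashMap<>();
  private final Map<String, Integer> inDegree = new LinkedHashMap<>();

  ToyDag addOperator(String name) {
    if (downstream.containsKey(name)) {
      throw new IllegalArgumentException("duplicate operator: " + name);
    }
    downstream.put(name, new ArrayList<String>());
    inDegree.put(name, 0);
    return this;
  }

  ToyDag addStream(String streamName, String from, String to) {
    if (!downstream.containsKey(from) || !downstream.containsKey(to)) {
      throw new IllegalArgumentException("unknown operator in stream: " + streamName);
    }
    downstream.get(from).add(to);
    inDegree.put(to, inDegree.get(to) + 1);
    return this;
  }

  /** Kahn's algorithm: operators in dataflow order, or an exception if there is a cycle. */
  List<String> topologicalOrder() {
    Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
    Deque<String> ready = new ArrayDeque<>();
    for (Map.Entry<String, Integer> e : remaining.entrySet()) {
      if (e.getValue() == 0) {
        ready.add(e.getKey());
      }
    }
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String op = ready.remove();
      order.add(op);
      for (String next : downstream.get(op)) {
        int left = remaining.get(next) - 1;
        remaining.put(next, left);
        if (left == 0) {
          ready.add(next);
        }
      }
    }
    if (order.size() != downstream.size()) {
      throw new IllegalStateException("graph is not acyclic");
    }
    return order;
  }
}

/** Toy counterpart of populateDAG: registers operators, then connects them. */
final class ToyApplication {
  private ToyApplication() {}

  static ToyDag populateDag() {
    return new ToyDag()
        .addOperator("Reader").addOperator("Parser")
        .addOperator("Counter").addOperator("Output")
        .addStream("lines", "Reader", "Parser")
        .addStream("records", "Parser", "Counter")
        .addStream("counts", "Counter", "Output");
  }
}
```

`ToyApplication.populateDag().topologicalOrder()` returns the operators in dataflow order, and adding a stream from Output back to Reader makes the check fail.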
![Page 11: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/11.jpg)
APIs : InputOperator
• SampleInputOperator implements InputOperator
ᵒ Define output ports
ᵒ Define emitTuples method
ᵒ Define beginWindow, endWindow, setup, teardown
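A rough plain-Java picture of an input adapter: it has no input port, and the driver calls `emitTuples` repeatedly so the operator can pull from an external source and emit on its output port. All names here (`ToyOutPort`, `ToyLineReader`, `ToyInputDriver`, `MAX_PER_CALL`) are hypothetical, the source is an in-memory list, and the driver loop stops once the source is empty, whereas the slides describe the platform loop as infinite.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for an output port: collects whatever the operator emits. */
final class ToyOutPort<T> {
  final List<T> received = new ArrayList<>();

  void emit(T tuple) {
    received.add(tuple);
  }
}

/** Toy input adapter: reads lines from an in-memory source and emits them. */
final class ToyLineReader {
  private static final int MAX_PER_CALL = 2;  // assumption: keep each call short

  final ToyOutPort<String> output = new ToyOutPort<>();
  private final Queue<String> source = new ArrayDeque<>();

  void setup(List<String> lines) { source.addAll(lines); }

  void beginWindow(long windowId) { }

  /** Emits a small batch and returns, so the driver regains control quickly. */
  void emitTuples() {
    for (int i = 0; i < MAX_PER_CALL && !source.isEmpty(); i++) {
      output.emit(source.remove());
    }
  }

  void endWindow() { }

  void teardown() { source.clear(); }

  boolean exhausted() { return source.isEmpty(); }
}

/** Stands in for the platform loop; returns the number of emitTuples calls made. */
final class ToyInputDriver {
  private ToyInputDriver() {}

  static int drain(ToyLineReader reader, List<String> lines) {
    reader.setup(lines);
    reader.beginWindow(0);
    int calls = 0;
    while (!reader.exhausted()) {
      reader.emitTuples();
      calls++;
    }
    reader.endWindow();
    reader.teardown();
    return calls;
  }
}
```

Keeping each `emitTuples` call short is a design choice in this sketch: a call that blocks for a long time would delay the `endWindow` that follows it.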
![Page 12: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/12.jpg)
APIs : GenericOperator, OutputOperator
• SampleOperator extends BaseOperator
ᵒ Define input ports, output ports
ᵒ Define process methods
ᵒ Optional : Define beginWindow, endWindow, setup, teardown
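The port-centric shape of a generic operator and an output adapter can be sketched without the Apex library. In this toy version each input port carries its own `process` method, and an output port forwards emitted tuples to whichever input port it is connected to. `ToyInPort`, `ToyEmitPort`, `ToyParser` and `ToyCollector` are hypothetical names, and the `key=value` line format is an invented example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical input port: each incoming tuple is handed to process. */
interface ToyInPort<T> {
  void process(T tuple);
}

/** Hypothetical output port: forwards emitted tuples to one connected input port. */
final class ToyEmitPort<T> {
  private ToyInPort<T> sink;

  void connect(ToyInPort<T> sink) {
    this.sink = sink;
  }

  void emit(T tuple) {
    if (sink != null) {
      sink.process(tuple);
    }
  }
}

/** Toy generic operator: one input port, one output port. */
final class ToyParser {
  final ToyEmitPort<String> output = new ToyEmitPort<>();

  final ToyInPort<String> input = new ToyInPort<String>() {
    @Override
    public void process(String line) {
      int eq = line.indexOf('=');
      if (eq > 0) {
        output.emit(line.substring(0, eq));  // emit the key; malformed lines are dropped
      }
    }
  };
}

/** Toy output adapter: input port only, writes to an external sink (here a list). */
final class ToyCollector {
  final List<String> written = new ArrayList<>();

  final ToyInPort<String> input = new ToyInPort<String>() {
    @Override
    public void process(String tuple) {
      written.add(tuple);
    }
  };
}
```

Wiring is then a single call, `parser.output.connect(collector.input)`, after which every line passed to `parser.input.process(...)` flows through to the collector.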
![Page 13: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/13.jpg)
Application Specification (Java)
DAG API (compositional)
![Page 14: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/14.jpg)
Writing an Operator
![Page 15: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/15.jpg)
Writing an Operator
![Page 16: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/16.jpg)
Operator Library
• RDBMS: Vertica, MySQL, Oracle, JDBC
• NoSQL: Cassandra, HBase, Aerospike, Accumulo, Couchbase / CouchDB, Redis, MongoDB, Geode
• Messaging: Kafka, Solace, Flume, ActiveMQ, Kinesis, NiFi
• File Systems: HDFS / Hive, NFS, S3
• Parsers: XML, JSON, CSV, Avro, Parquet
• Transformations: Filters, Rules, Expression, Dedup, Enrich
• Analytics: Dimensional Aggregations (with state management for historical data + query)
• Protocols: HTTP, FTP, WebSocket, MQTT, SMTP
• Other: Elasticsearch, Script (JavaScript, Python, R), Solr, Twitter
![Page 17: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/17.jpg)
Building Apache Apex
• Java : 1.7.x
• mvn : 3.0 +
• git : 1.7 +
• Apache Hadoop : How to : Single node cluster
Apache Apex Core
ᵒ git clone git@github.com:apache/apex-core.git
ᵒ cd apex-core/
ᵒ git checkout master
ᵒ mvn clean install -DskipTests
Apache Apex Malhar
ᵒ git clone git@github.com:apache/apex-malhar.git
ᵒ cd apex-malhar/
ᵒ git checkout master
ᵒ mvn clean install -DskipTests
DataTorrent RTS community edition
![Page 18: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/18.jpg)
Monitoring Console
Logical View
Physical View
![Page 19: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/19.jpg)
Real-Time Dashboards
![Page 20: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/20.jpg)
Q&A
![Page 21: Deep Dive into Apache Apex App Development](https://reader036.fdocuments.us/reader036/viewer/2022081517/5871368b1a28abf0568b5d91/html5/thumbnails/21.jpg)
Resources
• http://apex.apache.org/
• Learn more: http://apex.apache.org/docs.html
• Subscribe - http://apex.apache.org/community.html
• Download - http://apex.apache.org/downloads.html
• Follow @ApacheApex - https://twitter.com/apacheapex
• Meetups - http://www.meetup.com/pro/apacheapex/
• More examples: https://github.com/DataTorrent/examples
• Slideshare: http://www.slideshare.net/ApacheApex/presentations
• https://www.youtube.com/results?search_query=apache+apex
• Free Enterprise License for Startups - https://www.datatorrent.com/product/startup-accelerator/