Apache Apex Introduction with PubMatic
Uploaded by apache-apex
Transcript of Apache Apex Introduction with PubMatic
Apache Apex Architecture
2
Apex Platform Overview
3
Apache Malhar Library
4
Native Hadoop Integration
• YARN is the resource manager
• HDFS used for storing any persistent state
5
Application Programming Model
Directed Acyclic Graph (DAG)
• A Stream is a sequence of data tuples
• An Operator takes one or more input streams, performs computations, and emits one or more output streams
– Each Operator is your custom business logic in Java, or a built-in operator from our open source library
– An Operator has many instances that run in parallel, and each instance is single-threaded
• A Directed Acyclic Graph (DAG) is made up of operators and streams
[Figure: a DAG in which Operators are connected by Streams of Tuples: Filtered Stream, Enriched Stream, and Output Stream]
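The programming model above can be illustrated with a minimal plain-Java sketch. It is not the Apex API: the `Operator` interface, the `DagSketch` class, and the filter and enrich rules are hypothetical stand-ins, chosen only to show tuples flowing through a filter operator and an enrich operator, as in the figure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for illustration; this is NOT the Apex Operator interface.
interface Operator<I, O> {
    // Process one input tuple and emit zero or more output tuples.
    List<O> process(I tuple);
}

final class DagSketch {
    // Filter operator (assumed rule): keep only positive numbers.
    static final Operator<Integer, Integer> FILTER =
        t -> t > 0 ? Collections.singletonList(t) : Collections.<Integer>emptyList();

    // Enrich operator (assumed rule): attach an "even"/"odd" label to each tuple.
    static final Operator<Integer, String> ENRICH =
        t -> Collections.singletonList(t + ":" + (t % 2 == 0 ? "even" : "odd"));

    // Push every tuple of an input stream through one operator, in order.
    static <I, O> List<O> run(Operator<I, O> op, List<I> stream) {
        List<O> out = new ArrayList<>();
        for (I tuple : stream) {
            out.addAll(op.process(tuple));
        }
        return out;
    }

    // Wire a linear DAG: input -> FILTER -> (filtered stream) -> ENRICH -> output.
    static List<String> pipeline(List<Integer> input) {
        List<Integer> filteredStream = run(FILTER, input);
        return run(ENRICH, filteredStream);
    }
}
```

For the input stream 3, -1, 4, 0, `DagSketch.pipeline` emits "3:odd" and "4:even". In a real Apex application the platform, not a method call, moves tuples between operators.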
6
Application Specification
Apex Engine
Core Features
8
Partitioning and Scaling Out
• Operators can be dynamically scaled
• Flexible stream splits
• Parallel partitioning
• MxN partitioning
• Unifiers
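A rough sketch of how partitioning and a unifier fit together, in plain Java rather than the Apex API. The class `PartitionSketch`, the hash-based routing, and the word-count workload are assumptions made for illustration. Each partition counts only the keys routed to it, and the unifier merges the partial counts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of partitioning and unifying; NOT the Apex API.
final class PartitionSketch {
    // Route a key to one of n partitions by hash, so a key always reaches the same instance.
    static int partitionFor(String key, int n) {
        return Math.floorMod(key.hashCode(), n);
    }

    // Each partition keeps its own partial count of the keys routed to it.
    static List<Map<String, Integer>> countPerPartition(List<String> keys, int n) {
        List<Map<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            partitions.add(new HashMap<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, n)).merge(key, 1, Integer::sum);
        }
        return partitions;
    }

    // The unifier merges the partial results of all partitions into one result.
    static Map<String, Integer> unify(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((k, v) -> total.merge(k, v, Integer::sum));
        }
        return total;
    }
}
```

With the keys a, b, a, c, a, b split over three partitions, the unified result is a=3, b=2, c=1, the same as an unpartitioned count.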
9
Advanced Windowing Support
• Application window
• Sliding window and tumbling window
• Checkpoint window
• No artificial latency
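To make the difference between tumbling and sliding windows concrete, here is a small plain-Java sketch. It is not Apex code: `WindowSketch` and its parameters are hypothetical, and the input array is assumed to hold one aggregated value per elementary window.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of window aggregation; NOT the Apex API.
final class WindowSketch {
    // Sum over windows of `size` elementary windows, advancing by `slide` each time.
    static List<Integer> slidingSums(int[] perWindow, int size, int slide) {
        List<Integer> sums = new ArrayList<>();
        for (int start = 0; start + size <= perWindow.length; start += slide) {
            int sum = 0;
            for (int i = start; i < start + size; i++) {
                sum += perWindow[i];
            }
            sums.add(sum);
        }
        return sums;
    }

    // A tumbling window is the special case where windows do not overlap (slide == size).
    static List<Integer> tumblingSums(int[] perWindow, int size) {
        return slidingSums(perWindow, size, size);
    }
}
```

For the values 1, 2, 3, 4, 5, 6 and a window size of 3, the tumbling sums are 6 and 15, while sliding by one gives 6, 9, 12, 15.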
10
Stateful Fault Tolerance
• Supported out of the box
– Application state
– Application master state
– No data loss
• Automatic recovery
• Lunch test
• Buffer server
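The idea of checkpointed state plus replay from a buffer can be sketched as follows. This is a simplified model, not the Apex engine: `CheckpointSketch`, the in-memory checkpoint fields, and the list standing in for buffered upstream data are all assumptions for illustration.

```java
import java.util.List;

// Simplified, hypothetical model of checkpoint and recovery; NOT the Apex engine.
final class CheckpointSketch {
    long sum = 0;                 // live operator state
    long checkpointedSum = 0;     // last saved copy of the state
    int checkpointedWindow = -1;  // last window covered by the checkpoint

    void processWindow(int[] tuples) {
        for (int t : tuples) {
            sum += t;
        }
    }

    void checkpoint(int windowId) {
        checkpointedSum = sum;
        checkpointedWindow = windowId;
    }

    void restore() {
        sum = checkpointedSum;
    }

    // Process windows, crash after `failAfterWindow`, then restore and replay.
    static long runWithFailure(List<int[]> windows, int checkpointEvery, int failAfterWindow) {
        CheckpointSketch op = new CheckpointSketch();
        for (int w = 0; w <= failAfterWindow; w++) {
            op.processWindow(windows.get(w));
            if ((w + 1) % checkpointEvery == 0) {
                op.checkpoint(w);
            }
        }
        op.sum = -1;   // simulated crash: live state is lost
        op.restore();  // recover state from the last checkpoint
        // Replay every window after the checkpoint from the buffered data.
        for (int w = op.checkpointedWindow + 1; w < windows.size(); w++) {
            op.processWindow(windows.get(w));
        }
        return op.sum;
    }
}
```

With windows {1, 2}, {3}, {4, 5}, {6}, a checkpoint every two windows, and a crash after window 2, the recovered sum is 21, the same as a run without failure.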
11
Processing Semantics
• At least once
• At most once
• Exactly once
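One way to picture the three semantics is to look at which windows reach a sink when a failure occurs. The sketch below is a simplified model of the observable behavior only, not how Apex implements these modes: `SemanticsSketch`, its `Mode` enum, and the duplicate-skipping rule for exactly once are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of processing semantics; NOT the Apex implementation.
final class SemanticsSketch {
    enum Mode { AT_LEAST_ONCE, AT_MOST_ONCE, EXACTLY_ONCE }

    // Windows 0..failedAt-1 reach the sink, then the operator crashes while
    // processing window `failedAt`. Returns every window id the sink receives.
    static List<Integer> delivered(Mode mode, int total, int checkpointed, int failedAt) {
        List<Integer> sink = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        for (int w = 0; w < failedAt; w++) {
            sink.add(w);
            seen.add(w);
        }
        // At most once skips ahead and loses the failed window; the other two
        // modes go back to the window after the last checkpoint and replay.
        int resumeFrom = (mode == Mode.AT_MOST_ONCE) ? failedAt + 1 : checkpointed + 1;
        for (int w = resumeFrom; w < total; w++) {
            if (mode == Mode.EXACTLY_ONCE && seen.contains(w)) {
                continue; // already delivered before the crash, so skip the duplicate
            }
            sink.add(w);
        }
        return sink;
    }
}
```

With five windows, a checkpoint after window 1, and a crash during window 3, the sink sees 0, 1, 2, 2, 3, 4 under at least once (window 2 is duplicated), 0, 1, 2, 4 under at most once (window 3 is lost), and 0, 1, 2, 3, 4 under exactly once.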
12
Data Locality
• Stream locality for placement of operators
– Rack local: distributed deployment
– Node local: data does not traverse the NIC
– Container local: data doesn’t need to be serialized
– Thread local: operators run in the same thread
• Data locality
13
Dynamic Updates
• Dynamic topology updates
– Properties of operators can be changed
– New operators can be added
14
Resources
• Apache Apex Community Page
• Apache Apex LinkedIn Group
15
Help Us Name the Apex Mascot
• Poll on Meetup Page