Apache Apex Introduction with PubMatic

Apache Apex Architecture

Transcript of Apache Apex Introduction with PubMatic

Page 1: Apache Apex Introduction with PubMatic

Apache Apex Architecture

Page 2: Apache Apex Introduction with PubMatic

Apex Platform Overview

Page 3: Apache Apex Introduction with PubMatic

Apache Malhar Library

Page 4: Apache Apex Introduction with PubMatic

Native Hadoop Integration

• YARN is the resource manager

• HDFS used for storing any persistent state

Page 5: Apache Apex Introduction with PubMatic

Application Programming Model

Directed Acyclic Graph (DAG)

• A Stream is a sequence of data tuples

• An Operator takes one or more input streams, performs computations, and emits one or more output streams

• Each Operator is your custom business logic in Java, or a built-in operator from our open source library

• An Operator has many instances that run in parallel, and each instance is single-threaded

• A Directed Acyclic Graph (DAG) is made up of operators and streams
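The operator-and-stream model above can be sketched in plain Java. This is an illustration only: `DagSketch`, `Operator`, `connect`, `run`, and `example` are hypothetical names invented here, not Apex classes, and the filter and enrich steps are made-up business logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java illustration only; none of these types are Apex classes.
final class DagSketch {

    /** An operator consumes one tuple at a time and may emit zero or more tuples downstream. */
    interface Operator<I, O> {
        void process(I tuple, Consumer<O> output);
    }

    /** A stream between two operators: the output of the first feeds the input of the second. */
    static <A, B, C> Operator<A, C> connect(Operator<A, B> upstream, Operator<B, C> downstream) {
        return (tuple, output) -> upstream.process(tuple, mid -> downstream.process(mid, output));
    }

    /** Pushes every input tuple through the pipeline and collects the output stream. */
    static <I, O> List<O> run(List<I> input, Operator<I, O> pipeline) {
        List<O> out = new ArrayList<>();
        for (I tuple : input) {
            pipeline.process(tuple, out::add);
        }
        return out;
    }

    /** Hypothetical pipeline: drop non-positive values (filter), then label the rest (enrich). */
    static List<String> example(List<Integer> values) {
        Operator<Integer, Integer> filter = (v, output) -> {
            if (v > 0) {
                output.accept(v);
            }
        };
        Operator<Integer, String> enrich = (v, output) -> output.accept("v=" + v);
        return run(values, connect(filter, enrich));
    }
}
```

Each operator here handles a single tuple at a time, which mirrors the single-threaded instance model described above; parallel instances are left out of this sketch.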

[Figure: DAG of Operators connected by Streams of Tuples: a Filtered Stream, an Enriched Stream, and an Output Stream]

Page 6: Apache Apex Introduction with PubMatic

Application Specification

Page 7: Apache Apex Introduction with PubMatic

Apex Engine

Core Features

Page 8: Apache Apex Introduction with PubMatic

Partitioning and Scaling Out

• Operators can be dynamically scaled

• Flexible Streams split

• Parallel partitioning

• MxN partitioning

• Unifiers
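As a rough illustration of partitioning with a unifier, the sketch below splits a stream of keys across N partitions by hash, lets each partition count its own keys, and then merges the partial counts. Hash-based routing is an assumption made for the example, and `PartitionSketch` and its methods are hypothetical names, not the Apex partitioner or unifier API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; not the Apex partitioning API.
final class PartitionSketch {

    /** Routes a key to one of n partitions by hash, so equal keys always reach the same partition. */
    static int partitionFor(String key, int n) {
        return Math.floorMod(key.hashCode(), n);
    }

    /** Each partition counts only the keys routed to it. */
    static List<Map<String, Integer>> countPerPartition(List<String> keys, int n) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            partials.add(new HashMap<>());
        }
        for (String key : keys) {
            partials.get(partitionFor(key, n)).merge(key, 1, Integer::sum);
        }
        return partials;
    }

    /** The unifier merges the partial results of all partitions into a single result. */
    static Map<String, Integer> unify(List<Map<String, Integer>> partials) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((k, v) -> merged.merge(k, v, Integer::sum));
        }
        return merged;
    }
}
```

Because routing depends only on the key, the unified result is the same for any partition count, which is what makes scaling the number of partitions safe in this toy model.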

Page 9: Apache Apex Introduction with PubMatic

Advanced Windowing Support

• Application window

• Sliding window and tumbling window

• Checkpoint window

• No artificial latency
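The difference between tumbling and sliding windows can be shown with a small sketch that sums per-interval values. It assumes windows are counted in fixed numbers of elements; `WindowSketch`, `tumbling`, and `sliding` are hypothetical names, not Apex API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; not the Apex windowing API.
final class WindowSketch {

    /** Sliding: sums every group of `size` values, advancing by `slide` positions each time. */
    static List<Integer> sliding(List<Integer> values, int size, int slide) {
        List<Integer> sums = new ArrayList<>();
        // Only complete windows are emitted; a trailing partial window is dropped.
        for (int start = 0; start + size <= values.size(); start += slide) {
            int sum = 0;
            for (int i = start; i < start + size; i++) {
                sum += values.get(i);
            }
            sums.add(sum);
        }
        return sums;
    }

    /** Tumbling: non-overlapping windows, i.e. a sliding window that advances by its own size. */
    static List<Integer> tumbling(List<Integer> values, int size) {
        return sliding(values, size, size);
    }
}
```

For the input 1 to 6, tumbling windows of size 2 cover each value exactly once, while sliding windows of size 3 advancing by 1 overlap, so each value can contribute to several sums.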

Page 10: Apache Apex Introduction with PubMatic

Stateful Fault Tolerance

• Supported out of the box

– Application state

– Application master state

– No data loss

• Automatic recovery

• Lunch test

• Buffer server
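A minimal sketch of checkpoint-based recovery, under simplifying assumptions: operator state is a single running sum, a checkpoint is taken every few windows, and the list of input windows stands in for an upstream buffer that can replay windows after the last checkpoint. `RecoverySketch` and its members are hypothetical names, not Apex API.

```java
import java.util.List;

// Plain-Java illustration only; not how the Apex engine is implemented.
final class RecoverySketch {

    /** A stateful operator: keeps a running sum of all tuples seen so far. */
    static final class SumOperator {
        long sum;

        void process(int tuple) {
            sum += tuple;
        }
    }

    /**
     * Processes the windows in order, checkpointing after every `checkpointEvery` windows.
     * If `failAfterWindow` is >= 0, the operator state is lost right after that window
     * (0-based) is processed; the state is then restored from the last checkpoint and the
     * windows since that checkpoint are replayed from the buffered input.
     */
    static long run(List<List<Integer>> windows, int checkpointEvery, int failAfterWindow) {
        SumOperator op = new SumOperator();
        long checkpointedSum = 0;     // last saved state
        int checkpointedWindow = -1;  // last window covered by the saved state
        boolean failed = false;
        int w = 0;
        while (w < windows.size()) {
            for (int tuple : windows.get(w)) {
                op.process(tuple);
            }
            if (!failed && w == failAfterWindow) {
                failed = true;
                op = new SumOperator();       // simulated crash: in-memory state is gone
                op.sum = checkpointedSum;     // restore from the checkpoint
                w = checkpointedWindow + 1;   // replay everything after the checkpoint
                continue;
            }
            if ((w + 1) % checkpointEvery == 0) {
                checkpointedSum = op.sum;
                checkpointedWindow = w;
            }
            w++;
        }
        return op.sum;
    }
}
```

In this model a run with a failure ends with the same state as a run without one, which is the "no data loss" property the slide refers to.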

Page 11: Apache Apex Introduction with PubMatic

Processing Semantics

• At least once

• At most once

• Exactly once
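The three semantics can be contrasted with a simplified model of what a sink sees after a failure. The assumptions are made up for illustration: windows before the failed one were already delivered, the operator crashes before emitting the failed window, and exactly-once is approximated by replaying from the checkpoint while the sink skips window ids it has already seen. `SemanticsSketch` is a hypothetical name, and the slides do not describe how Apex implements these modes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; a toy model, not the Apex recovery implementation.
final class SemanticsSketch {

    enum Mode { AT_LEAST_ONCE, AT_MOST_ONCE, EXACTLY_ONCE }

    /**
     * Returns the window ids delivered to the sink, in delivery order.
     * Windows 0..failedWindow-1 are delivered, then the operator crashes before
     * emitting failedWindow. lastCheckpoint is the last window covered by saved state.
     */
    static List<Integer> delivered(int totalWindows, int lastCheckpoint, int failedWindow, Mode mode) {
        List<Integer> sink = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        for (int w = 0; w < failedWindow; w++) {
            sink.add(w);
            seen.add(w);
        }
        // At-most-once skips ahead past the failed window; the other modes replay from the checkpoint.
        int resumeAt = (mode == Mode.AT_MOST_ONCE) ? failedWindow + 1 : lastCheckpoint + 1;
        for (int w = resumeAt; w < totalWindows; w++) {
            if (mode == Mode.EXACTLY_ONCE && !seen.add(w)) {
                continue; // already delivered before the crash, so drop the replayed copy
            }
            sink.add(w);
        }
        return sink;
    }
}
```

With 5 windows, a checkpoint covering window 1, and a crash at window 3, at-least-once delivers window 2 twice, at-most-once never delivers window 3, and exactly-once delivers every window a single time.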

Page 12: Apache Apex Introduction with PubMatic

Data Locality

• Stream locality for placement of operators

– Rack local: distributed deployment

– Node local: data does not traverse the NIC

– Container local: data doesn't need to be serialized

– Thread local: operators run in the same thread

Page 13: Apache Apex Introduction with PubMatic

Dynamic Updates

• Dynamic topology updates

– Properties of operators can be changed

– New operators can be added

Page 14: Apache Apex Introduction with PubMatic

Resources

• Apache Apex Community Page

• Apache Apex LinkedIn Group