Eron Wright - Introducing Flink on Mesos
-
Upload
flink-forward -
Category
Data & Analytics
-
view
192 -
download
6
Transcript of Eron Wright - Introducing Flink on Mesos
![Page 2: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/2.jpg)
2of 15
What is Apache Mesos?
• A popular cluster manager (similar to YARN)• Makes available CPU, memory, & disk resources• Unique capabilities for storage services• Emerging as a foundation for data-centric, converged
infrastructure
• Provides a programming model for using cluster resources• A Mesos program is called a “framework”
• Packaged into an open-source distribution called DCOS• Prescribes best practices related to Mesos frameworks,
related services, etc.
![Page 3: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/3.jpg)
3of 15
Why Flink on Mesos?
• Flink works best on a cluster manager– Easy to scale each job independently– Externalize scheduling logic (fairness, quota, …)– Good job isolation
• Flink can benefit from unique Mesos capabilities– Disk resources– Dynamic resource management– Unique management features (e.g. inverse offers for controlled downscaling & maintenance)
![Page 4: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/4.jpg)
Demo
![Page 5: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/5.jpg)
Flink Master Process
![Page 6: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/6.jpg)
6of 15
Introduction
Flink Master Process
• The Flink Master Process is:– The “Application Master” for a single Flink cluster– A Mesos framework!
• Hosts numerous components:– Job Manager– Resource Manager (acts as Mesos scheduler)– Artifact Server (HTTP server for Mesos fetcher)
• Responsible for TM scaling and recovery– Handles JobManager scale change requests– Stores task state in ZooKeeper
host1host2
Master
JM
RM
HTTPD
TM TM
Mesos
![Page 7: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/7.jpg)
7of 15
How it Works
Flink Master Process
• Offer handling:– Uses Netflix Fenzo as an optimizer– Gathers offers until all tasks launched
• Recovery:– Stores intentional state in ZooKeeper– Master uses leader election– Mesos allows some time for recovery before killing
tasks
• Monitoring:– Detects task failure; launches replacement
automatically.
host1host2
Master
TM TM
4. Launch
Mesos2. Resource Offers
1. Register
5. Fetch (HTTP)
6. Status update
3. Optimize
![Page 8: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/8.jpg)
8of 15
Configuration
Flink Master Process (Con’t)
• Framework Info– mesos.resourcemanager.framework.secret– mesos.resourcemanager.framework.principal– mesos.resourcemanager.framework.role
• Mesos Master Info– mesos.master: (IP address or ZK lookup info)– mesos.failover-timeout
Note: no port configuration is necessary; Mesos automatically assigns ports.
![Page 9: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/9.jpg)
Dispatcher
![Page 10: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/10.jpg)
10of 15
Introduction
Dispatcher
• A highly-available service for launching Flink clusters.
• A Mesos framework!
• Accessed via REST by the CLI
• DCOS compatibility: – HTTP-based– Accessible via the Admin Router– (future) JWT authentication
• Aligned with FLIP-6
host1
1D
1C
1B
1A
host2
2D
2C
2B
2A
host3
3D
3C
3B
3A
host4
4D
4C
4B
4ADispatcher
Master
TM TM
TMTM
Master
CLI
TM
Mesos
![Page 11: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/11.jpg)
11of 15
Framework Hierarchy
Dispatcher (Con’t)
• Nesting of frameworks is a common Mesos pattern. Here, Marathon launches the dispatcher, which launches the Flink Master Process, etc.
• Architecturally, it avoids a dependency on the Marathon API. For example, Aurora could be used here in place of Marathon.
Dispatcher
Master
Marathon
TM
(Task)
(Task)
(Task)
![Page 12: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/12.jpg)
12of 15
Launching a Session
Dispatcher (Con’t)
• Use: mesos-session.sh
• CLI uploads files to dispatcher via HTTP– Flink Configuration– Supplemental files (--ship)– Keytabs– Certificates
• Dispatcher adds additional elements:– Configuration
› ZooKeeper Namespace
– Flink JAR– …
host1
1D
1C
1B
1A
host2
2D
2C
2B
2A
host3
3D
3C
3B
3A
host4
4D
4C
4B
4ADispatcher
Master
TM TM
CLI
HTTP(S)
TM
HTTP(S)
Mesos
![Page 13: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/13.jpg)
13of 15
Dispatcher Deployment Modes
Dispatcher (Con’t)
• Dispatcher is usable in two ways
• Remote Mode:– Recommended for detached execution
• Local Mode:– Recommended for simple, interactive sessions
(e.g. flink shell)
3C
3B
3A
4C
4B
4ADispatcher
Master
Master
CLI
HTTP(S)
3C
3B
3A
4C
4B
4A
Master
CLI + Dispatcher
Local Mode Remote Mode
![Page 14: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/14.jpg)
Summary
![Page 15: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/15.jpg)
15of 15
Future Directions
• Dynamic Scaling– Add/remove Task Managers in response to scale changes over a job’s lifetime– Support Mesos maintenance procedures (e.g. inverse offers)
• Dispatcher Evolution (FLIP-6)– Generalize to support all deployment scenarios, unified CLI – Provide a centralized Web UI (incl. job history)– Authentication Support (e.g. OAuth 2.0)
• Docker Image Support– Tracking the “Mesos unified containerizer”
• Mesos Disk Support– Allocate multiple disks for Task Manager temp space– Scale up the I/O
![Page 16: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/16.jpg)
16of 15
Project Status
• Targeted for: Flink 1.2
• Contributors:– Eron Wright (Dell EMC)– Maximilian Michels (data Artisans)
• Design Doc: – Mesos Integration on Google Docs
• JIRAs:– FLINK-1984 – Integrate Flink with Apache Mesos
• Code:– https://github.com/EronWright/flink/tree/feature-FLINK-1984-T2
![Page 17: Eron Wright - Introducing Flink on Mesos](https://reader031.fdocuments.us/reader031/viewer/2022030317/5871387f1a28abf0568b63a7/html5/thumbnails/17.jpg)