Building Enterprise Grade Applications in Yarn with Apache Twill
Transcript of Building Enterprise Grade Applications in Yarn with Apache Twill
![Page 1: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/1.jpg)
Build Enterprise Grade Applications in YARN with Apache Twill
Poorna Chandra [email protected]
Big Data App Meetup, July 27, 2016
![Page 2: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/2.jpg)
Agenda
● Hadoop YARN
● Challenges in building enterprise applications
● Apache Twill
● Architecture
● Features
● Real World Enterprise Use Case - CDAP
● Roadmap
● Q & A
![Page 3: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/3.jpg)
First: The NEWS
to the Apache Twill Community!!!
Apache Twill is now a Top-Level Project of the ASF
Announcement: https://s.apache.org/Rzsf
![Page 4: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/4.jpg)
Apache Hadoop® YARN
● MapReduce NextGen, aka MRv2
● Separates resource management from job scheduling/monitoring
● A new ResourceManager manages the global assignment of compute resources to applications
● Introduces a per-application ApplicationMaster that communicates with the ResourceManager for compute resource management
● Enables more than MR jobs on the cluster, such as Apache Spark
![Page 5: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/5.jpg)
How a YARN Application Works
![Page 6: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/6.jpg)
YARN is powerful, but...
● Every application needs to write boilerplate code
  ○ Negotiate resources from RM
  ○ Talk to NM to run jobs
  ○ Monitor running jobs
● Every application needs to handle
  ○ High availability
  ○ Long running applications
    ■ Security aspects - delegation token expiry
  ○ Easy scalability
![Page 7: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/7.jpg)
Simplification with Apache Twill
● Provides an abstraction over YARN that reduces the complexity of developing large-scale distributed applications
● Adds simplicity to the power of YARN
  ○ Java thread-like programming model
● Reduces boilerplate code
● Offers common needs for distributed enterprise-grade application development
  ○ Lifecycle management
  ○ High Availability
  ○ Scalability
  ○ Service discovery
![Page 8: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/8.jpg)
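Hello World in Twill: Define a TwillRunnable
import org.apache.twill.api.AbstractTwillRunnable;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HelloWorldRunnable extends AbstractTwillRunnable {
  private static final Logger LOG = LoggerFactory.getLogger(HelloWorldRunnable.class);

  @Override
  public void run() {
    LOG.info("Hello World. My first distributed application.");
  }
}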
![Page 9: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/9.jpg)
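Hello World in Twill: Launch it!
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.twill.api.TwillController;
import org.apache.twill.api.TwillRunnerService;
import org.apache.twill.yarn.YarnTwillRunnerService;

public class HelloWorld {
  public static void main(String[] args) throws Exception {
    // "localhost:2181" is the ZooKeeper connection string.
    TwillRunnerService twillRunner =
        new YarnTwillRunnerService(new YarnConfiguration(), "localhost:2181");
    // Assumes a Twill release that exposes start(); older releases used startAndWait().
    twillRunner.start();
    // prepare() returns a TwillPreparer; its start() launches the app and returns the controller.
    TwillController controller = twillRunner.prepare(new HelloWorldRunnable()).start();
    // Block until the application terminates.
    controller.awaitTerminated();
    //...
  }
}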
![Page 10: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/10.jpg)
Major Features
● Service Discovery
● Placement Policy
● Elastic Scaling
● Command Messages
● State Recovery
![Page 11: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/11.jpg)
Service Discovery
![Page 12: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/12.jpg)
Placement Policy
● Placement policy can be used to address
  ○ Performance
  ○ Availability
  ○ Resource conflict
● Exposes container placement policy from YARN
● Allows Twill to allocate containers on specific racks and hosts, based on the DISTRIBUTED deployment mode
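As a rough illustration of what a distributed placement aims for, the sketch below assigns each instance to a different host. It is a plain-Java model under stated assumptions: the class name, the `place` method, and the host names are hypothetical and are not part of Twill's or YARN's API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not Twill's or YARN's placement code.
final class DistributedPlacementSketch {

  /**
   * Assigns each instance ID to a distinct host, in host order.
   * Throws if there are fewer hosts than instances, because a strictly
   * distributed placement cannot be satisfied in that case.
   */
  static Map<Integer, String> place(int instances, List<String> hosts) {
    if (instances > hosts.size()) {
      throw new IllegalArgumentException("Not enough hosts for distributed placement");
    }
    Map<Integer, String> assignment = new LinkedHashMap<>();
    for (int id = 0; id < instances; id++) {
      assignment.put(id, hosts.get(id));
    }
    return assignment;
  }

  public static void main(String[] args) {
    List<String> hosts = Arrays.asList("host-a", "host-b", "host-c");
    System.out.println(place(2, hosts));
  }
}
```

Keeping instances on separate hosts is what lets a placement policy help with availability: losing one host removes at most one instance.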
![Page 13: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/13.jpg)
Elastic Scaling
● Ability to add or reduce the number of YARN containers that run the application
● Scale your application based on load
● No need to restart the application
● The Twill API TwillController.changeInstances is used to accomplish this task
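One way to picture "scale based on load" is a small function that turns a load figure into a target instance count, which an application could then hand to TwillController.changeInstances. The sketch below is only that calculation. The class name, the per-instance capacity, and the min/max bounds are assumptions made for illustration and do not come from Twill.

```java
// Hypothetical helper: computes how many instances a given load would need.
final class ScalingSketch {

  /**
   * Returns the number of instances needed to serve {@code load}, assuming each
   * instance handles {@code perInstanceCapacity} units, clamped to [min, max].
   */
  static int targetInstances(int load, int perInstanceCapacity, int min, int max) {
    int needed = (load + perInstanceCapacity - 1) / perInstanceCapacity; // ceiling division
    return Math.max(min, Math.min(max, needed));
  }

  public static void main(String[] args) {
    // 250 requests/s at 100 requests/s per instance, with between 1 and 10 instances.
    System.out.println(targetInstances(250, 100, 1, 10));
  }
}
```

Clamping to a minimum keeps at least one instance alive under no load, and the maximum guards against requesting more containers than the cluster should grant.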
![Page 14: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/14.jpg)
Command Messages
![Page 15: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/15.jpg)
State Recovery
![Page 16: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/16.jpg)
Real World Enterprise Usages - CDAP
● Cask Data Application Platform (CDAP) - http://cdap.io
  ○ Open source application and integration framework for big data
  ○ Simplifies and enhances data application development and management
    ■ APIs for simplification, portability and standardization
      ● Works across a wide range of Hadoop versions and all common distros
    ■ Built-in system services, such as metrics and log aggregation, dataset management, and a distributed transaction service for common big data application needs
  ○ Extensions to enhance user experience
    ■ Hydrator - Interactive data pipeline construction
    ■ Tracker - Metadata discovery and data lineage
![Page 17: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/17.jpg)
Apache Twill in CDAP
● CDAP runs different types of processes on YARN
  ○ Long running daemons
  ○ REST services
  ○ Real-time transactional streaming framework
  ○ Workflow execution
● CDAP only interacts with Twill
  ○ Greatly simplifies the CDAP code base
  ○ Takes only minutes to add support for a new type of work to run on YARN
● Twill supports common needs
  ○ Service discovery
  ○ Leader election
  ○ Elastic scaling
  ○ Security
![Page 18: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/18.jpg)
CDAP Architecture
![Page 19: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/19.jpg)
Service Discovery
● CDAP exposes all functionality through REST
● Almost all CDAP HTTP services run in YARN
  ○ No fixed host and port
  ○ Bind to an ephemeral port
  ○ Announce the host and port through Twill
    ■ Unique service name for a given service type
● Router inspects the request URI to derive a service name
  ○ Uses the Twill discovery service client to locate the actual host and port
  ○ Proxies the request and response
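The announce-then-route flow above can be modeled with an in-memory registry. This is a sketch under stated assumptions: the class and method names are hypothetical, the registry stands in for the real discovery service, and deriving the service name from the first path segment is an assumed convention rather than CDAP's actual routing rule.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a discovery service; not Twill's API.
final class RouterDiscoverySketch {
  private final Map<String, InetSocketAddress> registry = new ConcurrentHashMap<>();

  /** A service announces the host and ephemeral port it bound to. */
  void announce(String serviceName, String host, int port) {
    registry.put(serviceName, InetSocketAddress.createUnresolved(host, port));
  }

  /** Derives a service name from the first path segment (assumed convention). */
  static String serviceNameFor(String requestPath) {
    String[] parts = requestPath.split("/");
    return parts.length > 1 ? parts[1] : "";
  }

  /** Looks up the endpoint the router would proxy the request to, if any. */
  Optional<InetSocketAddress> route(String requestPath) {
    return Optional.ofNullable(registry.get(serviceNameFor(requestPath)));
  }

  public static void main(String[] args) {
    RouterDiscoverySketch router = new RouterDiscoverySketch();
    router.announce("metrics", "node7", 43512);
    System.out.println(router.route("/metrics/query"));
  }
}
```

Because callers only know the service name, a service can restart on another host and port without any client configuration change.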
![Page 20: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/20.jpg)
Long Running Applications
● All CDAP services on YARN are long running
  ○ Transaction server, metrics and log processing, real-time data ingestion, …
● Many user applications are long running too
  ○ Real-time streaming, HTTP service, application daemon
● Secure cluster, specifically a Kerberos-enabled cluster
  ○ All Hadoop services use delegation tokens
    ■ NN, RM, HBase Master, Hive, KMS, ...
  ○ YARN containers don’t have the keytab, hence can’t update the tokens
![Page 21: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/21.jpg)
Long Running Applications in Twill
● Twill provides support for updating delegation tokens
  ○ TwillRunner.scheduleSecureStoreUpdate
● Update delegation tokens from the launcher process (kinit process)
  ○ Acquires new delegation tokens periodically
  ○ Serializes tokens to HDFS
  ○ Notifies all running applications about the update
    ■ Through a command message
  ○ Each runnable refreshes delegation tokens by reading from HDFS
    ■ Requires a non-expired HDFS delegation token
● A new launcher process will discover all Twill apps from ZooKeeper (ZK)
  ○ Can run HA launcher processes using leader election support from Twill
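The refresh cycle described above (acquire, serialize, notify, re-read) can be sketched as a small plain-Java model. Every type here is hypothetical: `SharedStore` stands in for the token file on HDFS, `RunnableInstance` for a running container, and `Launcher` for the process that holds the keytab. None of them are Twill or Hadoop classes, and the token string is a placeholder for a real token request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the refresh flow; not Twill or Hadoop code.
final class TokenRefreshSketch {

  /** Stand-in for the token file the launcher serializes to HDFS. */
  static final class SharedStore {
    private final AtomicReference<String> token = new AtomicReference<>();
    void write(String newToken) { token.set(newToken); }
    String read() { return token.get(); }
  }

  /** Stand-in for a running container that reloads tokens when notified. */
  static final class RunnableInstance {
    private final SharedStore store;
    private String currentToken;
    RunnableInstance(SharedStore store) { this.store = store; }
    /** Invoked when the "tokens updated" command message arrives. */
    void onTokensUpdated() { currentToken = store.read(); }
    String currentToken() { return currentToken; }
  }

  /** Stand-in for the launcher process, the only party holding the keytab. */
  static final class Launcher {
    private final SharedStore store;
    private final List<RunnableInstance> running = new ArrayList<>();
    private int generation = 0;
    Launcher(SharedStore store) { this.store = store; }
    void register(RunnableInstance instance) { running.add(instance); }
    /** One refresh cycle: acquire, serialize, then notify every running instance. */
    void refreshOnce() {
      String fresh = "token-" + (++generation); // placeholder for a real token request
      store.write(fresh);
      for (RunnableInstance instance : running) {
        instance.onTokensUpdated();
      }
    }
  }
}
```

In a real launcher the cycle would run on a timer shorter than the token lifetime, for example via the JDK's `ScheduledExecutorService.scheduleAtFixedRate`, so that containers never hold an expired token.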
![Page 22: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/22.jpg)
Scalability
● Many components in CDAP are linearly scalable, such as
  ○ Streaming data ingestion (through REST endpoint)
  ○ Log processing
    ■ Reads from Kafka, writes to HDFS
  ○ Metrics processing
    ■ Reads from Kafka, writes to timeseries table
  ○ User real-time streaming DAG
  ○ User HTTP service
● Twill supports adding/reducing YARN containers for a given TwillRunnable
  ○ No need to restart application
  ○ Guarantees a unique instance ID is assigned
    ■ Application can use it for partitioning
● Dynamic scaling using service discovery
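To make "application can use it for partitioning" concrete, the sketch below decides whether a given instance owns a key by hashing the key modulo the instance count. The class and method names are hypothetical, the hashing scheme is one simple choice among many, and the instance ID and instance count are taken as plain parameters here rather than read from the runtime.

```java
// Hypothetical helper; instanceId and instanceCount would come from the runtime.
final class InstancePartitionSketch {

  /** Returns true if the instance with the given ID owns the partition for this key. */
  static boolean owns(String key, int instanceId, int instanceCount) {
    // floorMod keeps the result in [0, instanceCount) even for negative hash codes.
    int partition = Math.floorMod(key.hashCode(), instanceCount);
    return partition == instanceId;
  }

  public static void main(String[] args) {
    // With 3 instances, exactly one of the IDs 0, 1, 2 owns any given key.
    for (int id = 0; id < 3; id++) {
      System.out.println("instance " + id + " owns user-42: " + owns("user-42", id, 3));
    }
  }
}
```

Because ownership depends on the instance count, changing the number of instances reshuffles keys. Applications that need stable assignment across scaling events would need a scheme such as consistent hashing instead.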
![Page 23: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/23.jpg)
High Availability
● In a production environment, high availability is important
● Twill provides a couple of ways to achieve it
  ○ Running multiple instances of the same TwillRunnable
  ○ Using dynamic service discovery to route requests
  ○ Twill automatically restarts a TwillRunnable container if it is killed or exits abnormally
    ■ A killed container is removed from service discovery
    ■ A restarted container is added to service discovery
  ○ Built-in leader election support for active-passive redundancy
    ■ The Tephra service uses this, as it requires exactly one active server
  ○ Placement policy to make sure that instances run on different hosts
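Active-passive redundancy through leader election can be modeled in a few lines. The sketch below is an in-memory model only: participants receive increasing sequence numbers and the lowest one is the leader, loosely in the spirit of ZooKeeper-style election recipes. The class and its methods are hypothetical and are not Twill's or ZooKeeper's API.

```java
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical in-memory model of active-passive election; not Twill's or ZooKeeper's API.
final class LeaderElectionSketch {
  private final TreeMap<Long, String> participants = new TreeMap<>();
  private long nextSequence = 0;

  /** A participant joins and receives a monotonically increasing sequence number. */
  synchronized long join(String name) {
    long seq = nextSequence++;
    participants.put(seq, name);
    return seq;
  }

  /** A participant leaves, for example because its container was killed. */
  synchronized void leave(long seq) {
    participants.remove(seq);
  }

  /** The participant with the lowest sequence number is the active leader. */
  synchronized Optional<String> leader() {
    return participants.isEmpty()
        ? Optional.empty()
        : Optional.of(participants.firstEntry().getValue());
  }

  public static void main(String[] args) {
    LeaderElectionSketch election = new LeaderElectionSketch();
    long first = election.join("tx-server-1");
    election.join("tx-server-2");
    election.leave(first); // the active server dies; the standby takes over
    System.out.println(election.leader());
  }
}
```

The standby does no work until it becomes the lowest remaining participant, which is the property a service needs when it requires exactly one active server.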
![Page 24: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/24.jpg)
Apache Twill in Enterprise
● CDAP, which uses Twill, is used by large enterprises in production
● Apache Twill runs on different cluster types
  ○ AWS, Azure, bare metal, VMs
● Compatible with a wide range of Hadoop versions
  ○ Vanilla Hadoop 2.0 - 2.7
  ○ HDP 2.1 - 2.3
  ○ CDH 5
  ○ MapR 4.1 - 5.1
![Page 25: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/25.jpg)
Roadmap
● Generalize to run on more frameworks
  ○ Apache Mesos, Kubernetes
● Smarter container management
  ○ Run simple runnable in AM
  ○ Multiple runnables in one container
● Fine-grained control of container lifecycle
  ○ When to start, stop and restart on failure
● Smaller footprint
  ○ Optional Kafka, optional ZooKeeper
![Page 26: Building Enterprise Grade Applications in Yarn with Apache Twill](https://reader030.fdocuments.us/reader030/viewer/2022021420/587065791a28ab48378b4e31/html5/thumbnails/26.jpg)
Thank you!
● Apache Twill is Open Source
  ○ http://twill.apache.org
  ○ [email protected]
  ○ @ApacheTwill
● Contributions are welcome!