Writing YARN Applications (Hadoop Summit 2012)
© Hortonworks Inc. 2011
Writing Application Frameworks on Apache Hadoop YARN
Hitesh Shah
Page 1
Hitesh Shah - Background
• Member of Technical Staff at Hortonworks Inc.
• Committer for Apache MapReduce and Ambari
• Earlier, spent 8+ years at Yahoo! building various infrastructure pieces all the way from data storage platforms to high throughput online ad-serving systems.
Page 2 - Architecting the Future of Big Data
Agenda
• YARN Architecture and Concepts
• Writing a New Framework
YARN Architecture
• Resource Manager
  – Global resource scheduler
  – Hierarchical queues
• Node Manager
  – Per-machine agent
  – Manages the life-cycle of containers
  – Container resource monitoring
• Application Master
  – Per-application
  – Manages application scheduling and task execution
  – E.g. MapReduce Application Master
YARN Architecture
[Slide diagram]
YARN Concepts
• Application ID
  – Application Attempt IDs
• Container
  – ContainerLaunchContext
• ResourceRequest
  – Host/Rack/Any match
  – Priority
  – Resource constraints
• Local Resource
  – File/Archive
  – Visibility: public/private/application
What you need for a new Framework
• Application Submission Client
  – For example, the MR Job Client
• Application Master
  – The core framework library
• Application History (optional)
  – History of all previously run instances
• Auxiliary Services (optional)
  – Long-running application-specific services running on the NodeManager
Use Case: Distributed Shell
• Take a user-provided script or application and run it on a set of nodes in the cluster
• Input:
  – User script to execute
  – Number of containers to run on
  – Variable arguments for each different container
  – Memory requirements for the shell script
  – Output location/dir
[Slide diagram: DS AppMaster on one NodeManager, shell-script containers running on other NodeManagers]
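The inputs listed above can be sketched as a tiny option parser for a Distributed Shell-style client. This is an illustrative stand-in only: the class name and flag spellings below are hypothetical, not the actual DistributedShell client's options.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the Distributed Shell inputs described in the slide.
public class DistributedShellOptions {
    public String script;               // user script to execute
    public int numContainers = 1;       // number of containers to run on
    public String shellArgs = "";       // variable arguments per container
    public int containerMemoryMb = 128; // memory for the shell script
    public String outputDir;            // output location/dir

    // Parse "--flag value" pairs into an options object.
    public static DistributedShellOptions parse(String[] args) {
        Map<String, String> kv = new HashMap<String, String>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            kv.put(args[i], args[i + 1]);
        }
        DistributedShellOptions o = new DistributedShellOptions();
        o.script = kv.get("--script");
        if (kv.containsKey("--num_containers"))
            o.numContainers = Integer.parseInt(kv.get("--num_containers"));
        if (kv.containsKey("--shell_args"))
            o.shellArgs = kv.get("--shell_args");
        if (kv.containsKey("--container_memory"))
            o.containerMemoryMb = Integer.parseInt(kv.get("--container_memory"));
        o.outputDir = kv.get("--output_dir");
        return o;
    }
}
```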
Client: RPC calls
• Uses the ClientRMProtocol
• Get a new Application ID from the RM
• Application Submission
• Application Monitoring
• Kill the Application?
Client
• Registration with the RM
  – New Application ID
• Application Submission
  – User information
  – Scheduler queue
  – Define the container for the Distributed Shell AppMaster via the ContainerLaunchContext
• Application Monitoring
  – AppMaster host details with tokens if needed, tracking URL
  – Application status (submitted/running/finished)
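The monitoring step above boils down to polling the application report until a terminal state is reached. A minimal sketch, with a hypothetical fetcher interface standing in for the real ClientRMProtocol call:

```java
// Illustrative client-side monitoring loop; the enum and interface below are
// simplified stand-ins, not the real YARN API types.
public class AppMonitorSketch {
    public enum AppState { SUBMITTED, RUNNING, FINISHED, FAILED, KILLED }

    // Stand-in for "fetch an ApplicationReport from the RM and read its state".
    public interface ReportFetcher { AppState fetchState(); }

    // Poll until a terminal state, up to maxPolls attempts.
    public static AppState waitForCompletion(ReportFetcher fetcher, int maxPolls) {
        AppState state = AppState.SUBMITTED;
        for (int i = 0; i < maxPolls; i++) {
            state = fetcher.fetchState();
            if (state == AppState.FINISHED || state == AppState.FAILED
                    || state == AppState.KILLED) {
                return state;
            }
            // a real client would Thread.sleep(...) between polls
        }
        return state;
    }
}
```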
Defining a Container
• ContainerLaunchContext class
  – Can run a shell script, a Java process, or launch a VM
• Command(s) to run
• Local resources needed for the process to run
  – Dependent jars, native libs, data files/archives
• Environment to set up
  – Java classpath
• Security-related data
  – Container tokens
Application Master: RPC calls
• AMRM and CM protocols
• Register AM with RM
• Ask RM to allocate resources
• Launch tasks on allocated containers
• Manage tasks to final completion
• Inform RM of completion
Application Master
• Setup RPC to handle requests from Client and/or tasks launched on Containers
• Register and send regular heartbeats to the RM
• Request resources from the RM.
• Launch user shell script on containers as and when allocated.
• Monitor status of the user script on remote containers and manage failures by retrying if needed.
• Inform RM of completion when application is done.
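The failure-handling point above amounts to simple retry bookkeeping in the AM: when a container running the user script fails, re-request a container unless the retry budget is exhausted. A minimal sketch (all names hypothetical):

```java
// Illustrative per-task retry policy for an AM; not actual YARN API.
public class RetryPolicySketch {
    private final int maxRetries;
    private int failures = 0;

    public RetryPolicySketch(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Called when a container reports a non-zero exit status.
    // Returns true if the AM should ask the RM for a replacement container.
    public boolean onFailure() {
        failures++;
        return failures <= maxRetries;
    }
}
```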
AMRM#allocate
• Request:
  – Containers needed
    – Not a delta protocol
    – Locality constraints: Host/Rack/Any
    – Resource constraints: memory
    – Priority-based assignments
  – Containers to release (extra/unwanted)
    – Only non-launched containers
• Response:
  – Allocated containers
    – Launch or release
  – Completed containers
    – Status of completion
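Because allocate is not a delta protocol, the AM resends its full outstanding ask on every heartbeat and shrinks it as containers are granted. A toy simulation of that bookkeeping (the classes below are hypothetical stand-ins for the real AMRMProtocol types):

```java
import java.util.List;

// Toy model of the AM's outstanding-ask bookkeeping around allocate().
public class AllocateLoopSketch {
    public static class Ask {
        public String location;   // host, rack, or "*" for any
        public int priority;
        public int memoryMb;
        public int numContainers; // outstanding count, resent in full each heartbeat
        public Ask(String location, int priority, int memoryMb, int numContainers) {
            this.location = location;
            this.priority = priority;
            this.memoryMb = memoryMb;
            this.numContainers = numContainers;
        }
    }

    // One heartbeat: the AM sends the complete ask list; the RM grants some
    // containers, and the AM decrements (and drops) satisfied asks.
    public static int heartbeat(List<Ask> asks, int granted) {
        int launched = 0;
        for (Ask a : asks) {
            int take = Math.min(a.numContainers, granted - launched);
            a.numContainers -= take;
            launched += take;
        }
        asks.removeIf(a -> a.numContainers == 0);
        return launched;
    }
}
```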
YARN Applications
• Data processing:
  – OpenMPI on Hadoop
  – Spark (UC Berkeley)
    – Shark (Hive-on-Spark)
  – Real-time data processing
    – Storm (Twitter)
    – Apache S4
  – Graph processing
    – Apache Giraph
• Beyond data:
  – Deploying Apache HBase via YARN (HBASE-4329)
  – HBase co-processors via YARN (HBASE-4047)
References
• Doc on writing new applications:
  – WritingYarnApplications.html (available at http://hadoop.apache.org/common/docs/r2.0.0-alpha/)
Questions?
Thank You!
Hitesh Shah ([email protected])
Appendix: Code Examples
Client: Registration
ClientRMProtocol applicationsManager;
YarnConfiguration yarnConf = new YarnConfiguration(conf);
InetSocketAddress rmAddress = NetUtils.createSocketAddr(
    yarnConf.get(YarnConfiguration.RM_ADDRESS));
applicationsManager = (ClientRMProtocol) rpc.getProxy(
    ClientRMProtocol.class, rmAddress, appsManagerServerConf);
GetNewApplicationRequest request =
    Records.newRecord(GetNewApplicationRequest.class);
GetNewApplicationResponse response =
    applicationsManager.getNewApplication(request);
Client: App Submission
ApplicationSubmissionContext appContext =
    Records.newRecord(ApplicationSubmissionContext.class);
ContainerLaunchContext amContainer =
    Records.newRecord(ContainerLaunchContext.class);
amContainer.setLocalResources(localResources); // Map<String, LocalResource>
amContainer.setEnvironment(env); // Map<String, String>
String command = "${JAVA_HOME}" + "/bin/java" + " MyAppMaster" + " arg1 arg2";
amContainer.setCommands(Collections.singletonList(command));
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(amMemory);
amContainer.setResource(capability);
appContext.setAMContainerSpec(amContainer);
SubmitApplicationRequest appRequest =
    Records.newRecord(SubmitApplicationRequest.class);
appRequest.setApplicationSubmissionContext(appContext);
applicationsManager.submitApplication(appRequest);
Client: App Monitoring
• Get Application Status
GetApplicationReportRequest reportRequest =
Records.newRecord(GetApplicationReportRequest.class);
reportRequest.setApplicationId(appId);
GetApplicationReportResponse reportResponse =
applicationsManager.getApplicationReport(reportRequest);
ApplicationReport report = reportResponse.getApplicationReport();
• Kill the application
KillApplicationRequest killRequest =
Records.newRecord(KillApplicationRequest.class);
killRequest.setApplicationId(appId);
applicationsManager.forceKillApplication(killRequest);
AM: Ask RM for Containers
ResourceRequest rsrcRequest = Records.newRecord(ResourceRequest.class);
rsrcRequest.setHostName("*"); // hostname, rack, or wildcard
rsrcRequest.setPriority(pri);
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(containerMemory);
rsrcRequest.setCapability(capability);
rsrcRequest.setNumContainers(numContainers);
List<ResourceRequest> requestedContainers = new ArrayList<ResourceRequest>();
requestedContainers.add(rsrcRequest);
List<ContainerId> releasedContainers = new ArrayList<ContainerId>();
AllocateRequest req = Records.newRecord(AllocateRequest.class);
req.setResponseId(rmRequestID);
req.addAllAsks(requestedContainers);
req.addAllReleases(releasedContainers);
req.setProgress(currentProgress);
AllocateResponse allocateResponse = resourceManager.allocate(req);
AM: Launch Containers
AMResponse amResp = allocateResponse.getAMResponse();
ContainerManager cm = (ContainerManager) rpc.getProxy(
    ContainerManager.class, cmAddress, conf);
List<Container> allocatedContainers = amResp.getAllocatedContainers();
for (Container allocatedContainer : allocatedContainers) {
ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
ctx.setContainerId(allocatedContainer.getId());
ctx.setResource(allocatedContainer.getResource());
// set env, command, local resources, …
StartContainerRequest startReq = Records.newRecord(StartContainerRequest.class);
startReq.setContainerLaunchContext(ctx);
cm.startContainer(startReq);
}
AM: Monitoring Containers
• Running Containers
GetContainerStatusRequest statusReq =
    Records.newRecord(GetContainerStatusRequest.class);
statusReq.setContainerId(containerId);
GetContainerStatusResponse statusResp =
cm.getContainerStatus(statusReq);
• Completed Containers
AMResponse amResp = allocateResponse.getAMResponse();
List<ContainerStatus> completedContainers =
    amResp.getCompletedContainerStatuses();
for (ContainerStatus containerStatus : completedContainers) {
// containerStatus.getContainerId()
// containerStatus.getExitStatus()
// containerStatus.getDiagnostics()
}
AM: I am done
FinishApplicationMasterRequest finishReq =
    Records.newRecord(FinishApplicationMasterRequest.class);
finishReq.setAppAttemptId(appAttemptID);
finishReq.setFinishApplicationStatus(FinalApplicationStatus.SUCCEEDED); // or FAILED
finishReq.setDiagnostics(diagnostics);
resourceManager.finishApplicationMaster(finishReq);