YARN Ready - Integrating to YARN using Slider Webinar
Transcript of YARN Ready - Integrating to YARN using Slider Webinar
© Hortonworks Inc. 2014
YARN Ready – Apache Slider
Provisioning, Managing, and Monitoring YARN Applications
Sumit Mohanty
@smohanty (@hortonworks)
Steve Loughran
@steveloughran (@hortonworks)
Agenda
• Long running applications on YARN
• Introduction to Slider
• Writing a Slider Application
• Key Slider Features
• Conclusion
• Q/A
Applications on YARN
YARN runs code across the cluster
[Diagram: YARN Resource Manager ("The RM") and three YARN Node Managers, each shown with HDFS]
• Servers run YARN Node Managers (NMs)
• NMs heartbeat to the Resource Manager (RM)
• RM schedules work over the cluster
• RM allocates containers to apps
• NMs start containers
• NMs report container health
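The allocation behaviour described in these bullets can be illustrated with a toy model. This is only a sketch: the `NodeManager` and `ResourceManager` classes, the slot-counting capacity, and the "most free slots first" placement rule are all assumptions made for illustration, not the YARN API or its real scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for a YARN Node Manager (hypothetical, not the YARN API)."""
    name: str
    capacity: int                      # how many containers this node can host
    containers: list = field(default_factory=list)

    def free_slots(self) -> int:
        return self.capacity - len(self.containers)


class ResourceManager:
    """Toy RM: places each container on the node with the most free slots."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, app: str, count: int):
        placed = []
        for i in range(count):
            node = max(self.nodes, key=lambda n: n.free_slots())
            if node.free_slots() == 0:
                break                  # cluster is full; stop allocating
            container_id = f"{app}-c{i}"
            node.containers.append(container_id)
            placed.append((container_id, node.name))
        return placed


rm = ResourceManager([NodeManager("nm1", 2), NodeManager("nm2", 1)])
placements = rm.allocate("demo", 4)    # asks for 4 containers, only 3 slots exist
```

Here the request is only partially satisfied because the toy cluster runs out of slots; a real RM would instead queue the outstanding request.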
Client creates App Master
[Diagram: Client, YARN Resource Manager ("The RM"), three YARN Node Managers with HDFS, and the Application Master]
AM asks for containers
[Diagram: YARN Resource Manager, three YARN Node Managers with HDFS, the Application Master, and three Containers]
YARN notifies AM of failures
[Diagram: YARN Resource Manager, three YARN Node Managers with HDFS, the Application Master, and three Containers]
Long Running Applications
Management
• Application instance must be managed
  – (Install/Configure/Start)
  – Restart
  – Reconfigure/Rolling update
  – Stop/Graceful stop
  – Status
  – Activate/deactivate/rebalance
• Upgrade
  – Long running applications need to provide upgrade support, preferably rolling upgrade
Registration and Discovery
• Application must declare itself
  – URLs
  – Host/port
  – Config (client config)
• Application must be discoverable
  – Registry
  – Name-based lookups
  – Regularly updated
• Client support
  – Callback if “data” changes; thick clients
  – Configurable gateway; thin clients
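The declare/discover/callback pattern listed above can be sketched with a small in-memory registry. Everything here is hypothetical: `ToyRegistry`, its method names, and the example host and port values are invented for illustration and do not correspond to any Slider or YARN interface.

```python
from collections import defaultdict


class ToyRegistry:
    """In-memory sketch of name-based registration with change callbacks."""

    def __init__(self):
        self._records = {}
        self._watchers = defaultdict(list)

    def register(self, name, record):
        # The application declares (or updates) its host, port, config, etc.
        self._records[name] = dict(record)
        for callback in self._watchers[name]:
            callback(name, dict(record))   # notify "thick" clients of the change

    def lookup(self, name):
        # Name-based lookup; returns None if nothing is registered.
        return self._records.get(name)

    def watch(self, name, callback):
        self._watchers[name].append(callback)


registry = ToyRegistry()
seen = []
registry.watch("hbase-cluster1", lambda name, rec: seen.append(rec["port"]))
registry.register("hbase-cluster1", {"host": "node1.example.com", "port": 60010})
registry.register("hbase-cluster1", {"host": "node2.example.com", "port": 60011})
```

A thin client would skip `watch` and call `lookup` (or go through a gateway) each time it needs the current endpoint.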
Monitoring
• Metrics
  – Instantaneous metrics (JMX)
  – Time-series metrics (Ganglia)
  – Configure Ganglia or other metrics stores
• Alerts
  – Based on JMX/port scan/container status
  – Configure Nagios or other alerting mechanism
Logs and Events
• Logs
  – Continuous log gathering
  – Single view for logs across all containers
• Lifecycle Events
  – Integration with Application Timeline Server
In addition to …
• Security
  – Configured for security
  – Token renewal
• High Availability
  – On a highly available cluster (NN, RM HA)
  – Itself highly available (multi-master)
• Packaging
• Configurability
• …
Apache Slider
Why?
• Many mature applications exist
• Full YARN integration takes effort
• Running under YARN delivers access to all the data in HDFS, and the CPU power alongside it
• As the Hadoop stack evolves, there is more to integrate with
• Management tools (e.g. Ambari) exist to monitor applications in-cluster
Slider is an in-incubation project with one goal:
Make it possible and easy to deploy and manage existing applications on a YARN cluster
Status: Currently in Tech Preview
GA with the next HDP release, tentatively November
Slider view of an Application
• An application is a set of components
• A component is a daemon/launched exe
  – configuration
  – scripts, data files, etc.
• Component may have one or more instances
• Component instances are managed
  – By extension, so is the app instance
• Example
  – HBase Application (3 components)
    – HBase Master
    – HBase RegionServer
    – HBase REST service
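The application/component/instance relationship above can be written down as a small data model. This is a sketch only: the `Component` and `Application` classes are hypothetical, this is not Slider's package or resource format, and the instance counts for the HBase example are assumed values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A daemon or launched executable; may run as several instances."""
    name: str
    instances: int = 1


@dataclass
class Application:
    """An application is a named set of components."""
    name: str
    components: dict = field(default_factory=dict)

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def total_instances(self) -> int:
        return sum(c.instances for c in self.components.values())


# The HBase example from the slide; instance counts are assumptions.
hbase = Application("hbase")
hbase.add(Component("HBase Master", instances=1))
hbase.add(Component("HBase RegionServer", instances=3))
hbase.add(Component("HBase REST service", instances=1))
```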
YARN Containers with Slider
[Diagram: Slider Client, YARN Resource Manager, and YARN Node Managers with HDFS; an AppMaster container holds the Slider AppMaster, and a Component container holds the Slider Agent and the Application]
Application by Slider
[Diagram: Slider App Package and Slider CLI; YARN Resource Manager ("The RM"); two YARN Node Managers with HDFS, each hosting an Agent and a Component]
Similar to any YARN application
1. CLI starts an instance of the AM
2. AM requests containers
3. Containers activate with an Agent
4. Agent gets application definition
5. Agent registers with AM
6. AM issues commands
7. Agent reports back status, configuration, etc.
8. AM publishes endpoints, configurations
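Steps 4 to 8 can be walked through with a toy simulation. This is a sketch under stated assumptions: `ToyAppMaster`, `ToyAgent`, the command names `INSTALL` and `START`, the status strings, and the memcached port value are all hypothetical and are not taken from Slider's actual Agent/AM protocol.

```python
class ToyAppMaster:
    """Hypothetical AM: hands out commands and publishes reported endpoints."""

    def __init__(self, app_definition):
        self.app_definition = app_definition
        self.agents = {}       # agent id -> last reported status
        self.published = {}    # agent id -> endpoint made visible to clients

    def register(self, agent_id):
        # Step 5: the agent registers. Step 6: the AM issues commands.
        self.agents[agent_id] = "REGISTERED"
        return ["INSTALL", "START"]          # assumed command names

    def report(self, agent_id, status, endpoint):
        # Step 7: the agent reports back. Step 8: the AM publishes the endpoint.
        self.agents[agent_id] = status
        self.published[agent_id] = endpoint


class ToyAgent:
    """Hypothetical agent running inside a container."""

    def __init__(self, agent_id, host):
        self.agent_id = agent_id
        self.host = host
        self.executed = []

    def run(self, am):
        definition = am.app_definition       # step 4: get application definition
        for command in am.register(self.agent_id):
            self.executed.append(command)    # pretend to execute each command
        am.report(self.agent_id, "STARTED",
                  f"{self.host}:{definition['port']}")


am = ToyAppMaster({"name": "memcached", "port": 11211})
agent = ToyAgent("agent-1", "node1.example.com")
agent.run(am)
```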
[Diagram labels: Application Registry; App Master/Agent Provider. Arrows numbered 1 to 8 mark the steps listed above.]
Slider AppMaster/Agent/Client
• AppMaster
  – Common YARN interactions
  – Common *-client interactions
  – Publishing needs
• Agent
  – Configure and start
  – Re-configure and restart
  – Heartbeats & failure detection
  – Port allocations and publishing
  – Custom commands if any (e.g. graceful-stop)
• Client
  – App life cycle commands (flex, status, …)
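The "heartbeats & failure detection" item can be illustrated with a timeout-based monitor. This is a minimal sketch, assuming a fixed timeout and caller-supplied timestamps; `HeartbeatMonitor` and the 30-second value are hypothetical and say nothing about how Slider itself detects failed agents.

```python
class HeartbeatMonitor:
    """Flags agents whose last heartbeat is older than a timeout."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}

    def heartbeat(self, agent_id: str, now: float) -> None:
        # Record the time of the most recent heartbeat from this agent.
        self._last_seen[agent_id] = now

    def failed_agents(self, now: float) -> list:
        # An agent is considered failed once it has been silent too long.
        return sorted(
            agent_id
            for agent_id, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=30)
monitor.heartbeat("agent-1", now=0)
monitor.heartbeat("agent-2", now=0)
monitor.heartbeat("agent-1", now=25)      # agent-2 stays silent
overdue = monitor.failed_agents(now=40)
```

Passing `now` explicitly keeps the sketch deterministic; a real monitor would read a clock.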
Memcached on YARN: Sample Slider App
Other Application Packages
• Reference doc for Memcached Application
  – http://slider.incubator.apache.org/docs/slider_specs/hello_world_slider_app.html
• Slider GitHub repo has other app packages
  – Accumulo
  – HBase
  – Storm
  – Memcached-windows
Other Capabilities
App Packaging Capabilities
• Dynamic port allocation and sharing
• Inter-component dependency
  – Specify the start order of components
• Exports
  – Construct arbitrary name value pairs
  – E.g. URLs (org.apache.slider.monitor: http://${HBASE_MASTER_HOST}:${site.hbase-site.hbase.master.info.port}/master-status)
• Default HDFS and ZK isolation
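The export example above is a template whose `${...}` placeholders are filled in at run time. The substitution idea can be sketched as follows; the `resolve` helper, the regular expression, and the host and port values are assumptions for illustration, not Slider's own resolution code.

```python
import re

# Matches ${name} and captures the name between the braces.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(template: str, values: dict) -> str:
    """Replace each ${key} with values[key]; unknown keys are left untouched."""
    def substitute(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


template = ("http://${HBASE_MASTER_HOST}:"
            "${site.hbase-site.hbase.master.info.port}/master-status")
values = {
    "HBASE_MASTER_HOST": "node1.example.com",            # assumed host
    "site.hbase-site.hbase.master.info.port": 60010,     # assumed port
}
url = resolve(template, values)
```

Leaving unknown placeholders intact, rather than raising an error, makes unresolved values easy to spot in the published output; that is a design choice of this sketch.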
Application Registry
• A common problem (not specific to Slider)
  – https://issues.apache.org/jira/browse/YARN-913
• Currently
  – Apache Curator based
  – Register URLs pointing to actual data
  – AM doubles up as a webserver for published data
• Plan
  – Registry should be stand-alone
  – Slider is a consumer as well as publisher
  – Slider focuses on declarative solution for Applications to publish data
  – Allows integration of Applications independent of how they are hosted
Plan: YARN Service Registry
# YARN-wide registry in Zookeeper
# Services listed by (user, service class, name)
/yarnRegistry/users/sumit/slider/cluster1
# Ephemeral liveness node
/yarnRegistry/users/sumit/slider/cluster1/live
# service entry lists bindings: URLs, IPC (host, port), ZK
# individual components have own (ephemeral) entries & endpoints
/yarnRegistry/users/sumit/slider/cluster1/components/appmaster
# ZK R/W API, REST read-only API
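The path layout shown above follows a (user, service class, name) pattern. A sketch of building those paths is below; the helper function names are hypothetical, and the layout is taken only from the examples on this slide, which describe a plan rather than a finished API.

```python
REGISTRY_ROOT = "/yarnRegistry/users"


def service_path(user: str, service_class: str, name: str) -> str:
    """Path of a service entry, keyed by (user, service class, name)."""
    return f"{REGISTRY_ROOT}/{user}/{service_class}/{name}"


def liveness_path(user: str, service_class: str, name: str) -> str:
    """Path of the ephemeral liveness node under a service entry."""
    return service_path(user, service_class, name) + "/live"


def component_path(user: str, service_class: str, name: str,
                   component: str) -> str:
    """Path of an individual component's own entry."""
    return service_path(user, service_class, name) + f"/components/{component}"


cluster = service_path("sumit", "slider", "cluster1")
live = liveness_path("sumit", "slider", "cluster1")
appmaster = component_path("sumit", "slider", "cluster1", "appmaster")
```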
Security
• Applications validated to work in Kerberos secured cluster
  – Secure cluster created and keytabs available to application components
  – Security parameters specified in application configuration
  – User obtains TGT (kinit) prior to Slider application creation
  – E.g. HBase 0.98.4
• Agent-AM SSL communication
  – One-Way by default
  – Two-Way can be enabled
• Work initiated on ticket renewal for long running applications
  – YARN, HDFS
Failure Handling
• Application Component Failure
  – Component instance restarted
• AppMaster failure
  – YARN restarts the AppMaster, Slider reconstructs states, registry
  – App lifecycle commands are temporarily unavailable
• NodeManager failure
  – App remains unaffected
• ResourceManager/NodeManager failures with HA
  – App remains unaffected
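Restarting a failed component instance can be thought of as reconciling desired instance counts against what is currently live. The sketch below illustrates that idea only; `containers_to_request` and the example counts are hypothetical, and Slider's real AppMaster logic (placement history, state reconstruction) is not modelled.

```python
def containers_to_request(desired: dict, live: dict) -> dict:
    """Return, per component, how many replacement containers to ask for."""
    shortfall = {}
    for component, wanted in desired.items():
        missing = wanted - live.get(component, 0)
        if missing > 0:
            shortfall[component] = missing
    return shortfall


# Assumed state: one RegionServer instance has just failed.
desired = {"HBase Master": 1, "HBase RegionServer": 3}
live = {"HBase Master": 1, "HBase RegionServer": 2}
requests = containers_to_request(desired, live)
```

The same comparison would cover a flex operation: raising a desired count produces a shortfall in exactly the same way as a failure does.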
Windows and Linux Support
• Feature set parity on both platforms
• Similar packaging constructs
  – Typically, only path spec needs to change
• Both Linux and Windows Server as a platform for
  – Client (host slider-client)
  – Cluster (host hadoop cluster)
Join in: Bring your favorite Applications to YARN
Slider-ifying a new application
1. Grab slider: http://slider.incubator.apache.org/downloads/
2. Look at App Package docs: http://slider.incubator.apache.org/docs/slider_specs/
3. Look at source code examples under app-packages
4. Start with memcached/memcached-windows
YARN API vs. Slider
• Native YARN app
  – Your own AppMaster is in charge: container placement, fault handling
  – You can implement an IPC API for callers to manipulate the application
  – AppMaster can send out event notifications
  Ideal for large-scale distributed algorithms, with specific placement and scheduling needs
• Slider App
  – Slider AppMaster handles YARN integration with best-effort placement history, fault handling (recreate component instance)
  – Simple API/Web UI for cluster manipulation, endpoint listing
  – Lots of failure and security testing
  – You only need to write the App package (& test)
  Long-lived applications where failures can be addressed by restarting elsewhere, with flexing decisions by admins
Everyone is welcome
• Useful Links
  – Website
    – http://slider.incubator.apache.org/
  – Dev Mailing Lists
    – [email protected]
  – JIRA
    – https://issues.apache.org/jira/browse/SLIDER
• Current and Upcoming Releases
  – Slider 0.30 (May)
  – Slider 0.40 (July)
  – Slider 0.50 (planned)
Q/A
http://slider.incubator.apache.org/
Next Steps
1. Review YARN Slider Resources
2. Review webinar recording or attend the next webinar
3. Attend Office Hours
4. Sign up for a 2 day class
5. Attend the next YARN webinar
Resources
Set up HDP 2.1 environment
• Leverage Sandbox: Hortonworks.com/Sandbox
Get Started with YARN
• http://hortonworks.com/get-started/YARN
Technical Preview
• http://hortonworks.com/blog/apache-slider-technical-preview-now-available/
Apache
• http://slider.incubator.apache.org/
Dev Mailing Lists
• [email protected]
JIRA
• https://issues.apache.org/jira/browse/SLIDER
Hortonworks Office Hours
YARN Office Hours
Dial in and chat with YARN experts
Next Office Hour: Thursday August 14 @ 10-11am PDT. Register:
https://hortonworks.webex.com/hortonworks/onstage/g.php?t=a&d=628190636
We plan Office Hours for September 11th and October 9th @ 10am PT (2nd Thursdays)
Invitations will go out to those that attended or reviewed YARN webinars
And from Hortonworks University
Hortonworks Course: Developing Custom YARN Applications
Format: Online
Duration: 2 Days
When: September – date tbd
Cost: No Charge to Hortonworks Partners
Space: Very Limited
Interested? Please contact Lisa
Next in the Series!
Join us for the full series of YARN Ready webinars:
YARN Native July 24 @ 9am PT (recording link)
Tez August 21 @ 9am PT (registration link)
Additional webinar topics are being added – watch the blog or visit Hortonworks.com/webinars:
September: Ambari and Scalding
October: Spark
http://hortonworks.com/webinars
Thank you.