www.edureka.co/hadoop-admin
Setting High Availability in Hadoop Cluster
What will you learn today?
Hadoop: A synonym for Big Data
Hadoop High Availability
Hands-On: Achieving NameNode and YARN high availability
Hands-On: Securing HDFS through ACL
Hadoop as a Data Warehouse
What is Hadoop?
Apache Hadoop is an open-source, scalable, and reliable solution that stores large data sets and allows distributed processing of them across clusters of computers using a simple programming model
A closer look at Apache Hadoop
Apache Hadoop includes the following modules:
Hadoop Distributed File System (HDFS): A distributed file system
Hadoop Common: The common utilities that support the other Hadoop modules
Hadoop YARN: A framework for job scheduling and cluster resource management
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets
High Availability
Maintaining High Availability
In distributed computing, failure is the norm, so YARN must provide an acceptable level of availability
NameNode - No Horizontal Scale
NameNode - No High Availability
[Diagram: the Client gets block locations from the NameNode (NS, Block Management) and reads data from the DataNodes]
NameNode: Single Point of Failure
Secondary NameNode:
"Not a hot standby" for the NameNode
Connects to NameNode every hour*
Housekeeping, backup of NameNode metadata
Saved metadata can be used to rebuild a failed NameNode
[Diagram: the NameNode, a single point of failure, sends its metadata to the Secondary NameNode. Secondary NameNode: "You give me metadata every hour, I will make it secure"]
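The housekeeping step can be pictured as a checkpoint: the saved image of the namespace is merged with the edits logged since the last checkpoint. The sketch below is a toy model only; `apply_edit`, `checkpoint`, and the record layout are illustrative names, not Hadoop APIs. In Hadoop 2 the hourly interval corresponds to `dfs.namenode.checkpoint.period`, whose default is 3600 seconds.

```python
# Toy model of a checkpoint: fold the edit log into the saved image.
# All names and record shapes here are assumptions for illustration.

def apply_edit(namespace, edit):
    """Apply one edit-log record to an in-memory namespace (path -> metadata)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = edit[2]
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(fsimage, edits):
    """Return a new image with all pending edits merged in."""
    merged = dict(fsimage)  # leave the old image untouched
    for edit in edits:
        apply_edit(merged, edit)
    return merged


fsimage = {"/data/a.txt": {"blocks": 2}}
edits = [
    ("create", "/data/b.txt", {"blocks": 1}),
    ("delete", "/data/a.txt"),
]
new_fsimage = checkpoint(fsimage, edits)
print(new_fsimage)
```

A failed NameNode can be rebuilt from the last merged image, but any edits made after that checkpoint are not in it, which is why this is not a hot standby.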
Hadoop 2.0 Cluster Architecture: High Availability
[Diagram: Hadoop 2.0 cluster with high availability]
HDFS (NameNode High Availability): Client, Active NameNode, Standby NameNode, Secondary Name Node, DataNodes
Shared edit logs: all namespace edits are logged to shared NFS storage by a single writer (fencing)
Standby NameNode: reads the edit logs and applies them to its own namespace
YARN (Next Generation MapReduce): Resource Manager; each worker node runs a Node Manager with a Container and an App Master next to its DataNode
http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
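The single-writer rule and the standby's log tailing can be sketched as follows. This is a simplified model under stated assumptions: the classes are hypothetical, and the epoch counter is only a stand-in for fencing (NFS-based setups rely on a configured fencing method such as `sshfence` instead).

```python
class SharedEditLog:
    """Toy shared edit log that accepts writes from one writer at a time."""

    def __init__(self):
        self.records = []
        self.writer_epoch = 0  # highest epoch granted so far

    def become_writer(self):
        """Grant a new epoch; any earlier writer is fenced from now on."""
        self.writer_epoch += 1
        return self.writer_epoch

    def append(self, epoch, edit):
        if epoch != self.writer_epoch:
            raise PermissionError("fenced: stale writer epoch")
        self.records.append(edit)


class StandbyNamespace:
    """Tails the shared log and applies new edits to its own namespace."""

    def __init__(self):
        self.paths = set()
        self.applied = 0  # number of log records already applied

    def catch_up(self, log):
        for op, path in log.records[self.applied:]:
            if op == "mkdir":
                self.paths.add(path)
            elif op == "rm":
                self.paths.discard(path)
        self.applied = len(log.records)


log = SharedEditLog()
old_active = log.become_writer()
log.append(old_active, ("mkdir", "/user"))

standby = StandbyNamespace()
standby.catch_up(log)

new_active = log.become_writer()  # failover: the old active is fenced
try:
    log.append(old_active, ("rm", "/user"))
    fenced = False
except PermissionError:
    fenced = True

log.append(new_active, ("mkdir", "/tmp"))
standby.catch_up(log)
print(fenced, sorted(standby.paths))
```

Fencing matters because two NameNodes writing to the same log at once would corrupt the namespace; the old active must be cut off before the new one starts writing.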
HDFS HIGH AVAILABILITY
[Diagram: HDFS high availability with ZooKeeper-based failover]
Active NN and Standby NN: shared NN state in shared storage with a single writer (fencing)
Failover Controller (Active) and Failover Controller (Standby): each monitors its NN's health and sends heartbeats to the ZooKeeper (ZK) nodes
DN 1, DN 2, ... DN n: block reports go to both the Active and the Standby NN; update commands come from one
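The role of the failover controllers can be sketched with a toy election. `LockService` is a stand-in for ZooKeeper and `FailoverController` is a hypothetical class; the real controllers use ZooKeeper sessions and ephemeral znodes, which this simplified model only imitates.

```python
class LockService:
    """Stand-in for ZooKeeper: one 'active' lock tied to its holder's session."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, node):
        if self.holder is None:
            self.holder = node
        return self.holder == node

    def session_lost(self, node):
        # The lock disappears when its owner's session ends.
        if self.holder == node:
            self.holder = None


class FailoverController:
    def __init__(self, name, lock):
        self.name = name
        self.lock = lock
        self.healthy = True  # result of the local NameNode health check
        self.state = "standby"

    def tick(self):
        """One monitoring round: check health, then keep or try to take the lock."""
        if not self.healthy:
            self.lock.session_lost(self.name)
            self.state = "failed"
        elif self.lock.try_acquire(self.name):
            self.state = "active"
        else:
            self.state = "standby"


lock = LockService()
fc1 = FailoverController("nn1", lock)
fc2 = FailoverController("nn2", lock)

for fc in (fc1, fc2):
    fc.tick()
states_before = (fc1.state, fc2.state)

fc1.healthy = False  # the active NameNode fails its health check
for fc in (fc1, fc2):
    fc.tick()
states_after = (fc1.state, fc2.state)
print(states_before, states_after)
```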
[Diagram: Resource Manager high availability]
Components: Resource Manager (Active) and Resource Manager (Passive), each with a ZKFC; ZooKeeper RM state store
1. Active node stores all state in ZKStore
2. Failure
3. ZKFC detects failure
4. Failover: standby node becomes active
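The reason the active node writes its state to the store is that the standby can then pick up where it left off. A minimal sketch, assuming a dict-backed `ZKStateStore` in place of ZooKeeper and a hypothetical `ResourceManager` class:

```python
class ZKStateStore:
    """Stand-in for the ZooKeeper-backed state store (a plain dict here)."""

    def __init__(self):
        self.apps = {}

    def save(self, app_id, state):
        self.apps[app_id] = state

    def load_all(self):
        return dict(self.apps)


class ResourceManager:
    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.active = False
        self.apps = {}

    def submit(self, app_id):
        if not self.active:
            raise RuntimeError("standby RM does not accept applications")
        self.apps[app_id] = "RUNNING"
        self.store.save(app_id, "RUNNING")  # step 1: persist state

    def transition_to_active(self):
        self.apps = self.store.load_all()  # recover state on failover
        self.active = True


store = ZKStateStore()
rm1 = ResourceManager("rm1", store)
rm2 = ResourceManager("rm2", store)

rm1.transition_to_active()
rm1.submit("app_1")

rm1.active = False  # step 2: the active RM fails
rm2.transition_to_active()  # step 4: failover to the standby
print(rm2.apps)
```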
[Diagram: NameNode high availability with Journal Nodes and ZooKeeper]
Zookeeper Service: three zookeeper nodes
Shared Edits: three Journal Nodes; the Active NameNode writes, the Standby NameNode reads
ZookeeperFC (Active NameNode): monitors liveness and health; monitors and maintains the active lock
ZookeeperFC (Standby NameNode): monitors liveness and health; monitors and tries to take the active lock
DataNodes
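With three Journal Nodes, an edit counts as written once a majority has stored it, so the cluster tolerates the loss of one Journal Node. A minimal sketch of that majority rule, with a hypothetical `JournalNode` class and `quorum_write` helper:

```python
class JournalNode:
    """Toy Journal Node that stores edits while it is up."""

    def __init__(self, up=True):
        self.up = up
        self.edits = []

    def write(self, txid, edit):
        if not self.up:
            return False
        self.edits.append((txid, edit))
        return True


def quorum_write(journals, txid, edit):
    """Return True if a strict majority of Journal Nodes stored the edit."""
    acks = sum(1 for jn in journals if jn.write(txid, edit))
    return acks > len(journals) // 2


journals = [JournalNode(), JournalNode(), JournalNode(up=False)]
ok_with_one_down = quorum_write(journals, 1, "mkdir /user")

journals[1].up = False
ok_with_two_down = quorum_write(journals, 2, "mkdir /tmp")
print(ok_with_one_down, ok_with_two_down)
```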
Hands-On: Achieving HDFS and YARN High Availability
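As a starting point for the hands-on, the core `hdfs-site.xml` properties for NameNode HA with Journal Nodes can be generated as below. The property names are the standard HDFS HA ones; the nameservice `mycluster`, the host names, and the ports 8020 and 8485 are assumptions for illustration. A real deployment also needs `dfs.ha.fencing.methods` and, in `core-site.xml`, `ha.zookeeper.quorum`.

```python
import xml.etree.ElementTree as ET


def ha_properties(nameservice, namenodes, journal_hosts):
    """Build hdfs-site.xml properties for one HA nameservice.

    namenodes maps a NameNode ID to its host; every host name passed in
    below is made up for this example.
    """
    journal_uri = (
        "qjournal://"
        + ";".join(f"{host}:8485" for host in journal_hosts)
        + f"/{nameservice}"
    )
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        "dfs.namenode.shared.edits.dir": journal_uri,
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
        "dfs.ha.automatic-failover.enabled": "true",
    }
    for nn_id, host in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
    return props


def to_xml(props):
    """Render the properties in Hadoop's <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


props = ha_properties(
    "mycluster",
    {"nn1": "host1.example.com", "nn2": "host2.example.com"},
    ["jn1.example.com", "jn2.example.com", "jn3.example.com"],
)
xml_text = to_xml(props)
print(xml_text)
```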
Hands-On: Securing HDFS through ACL
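HDFS ACLs are managed with `hdfs dfs -setfacl` and `hdfs dfs -getfacl`. The sketch below only builds the command lines and checks the entry syntax; the helper names, the path `/data/sales`, and the user `alice` are hypothetical. In Hadoop 2 releases, ACL support typically has to be switched on first with `dfs.namenode.acls.enabled`.

```python
import re

# user/group entries may name a principal; mask/other entries may not.
_ACL_ENTRY = re.compile(
    r"(default:)?((user|group):[^:,\s]*|(mask|other):):[r-][w-][x-]"
)


def setfacl_command(path, entries, recursive=False):
    """Build the argv for `hdfs dfs -setfacl -m <acl_spec> <path>`."""
    for entry in entries:
        if not _ACL_ENTRY.fullmatch(entry):
            raise ValueError(f"bad ACL entry: {entry}")
    cmd = ["hdfs", "dfs", "-setfacl"]
    if recursive:
        cmd.append("-R")
    cmd += ["-m", ",".join(entries), path]
    return cmd


def getfacl_command(path):
    """Build the argv for `hdfs dfs -getfacl <path>`."""
    return ["hdfs", "dfs", "-getfacl", path]


cmd = setfacl_command("/data/sales", ["user:alice:rw-", "group::r--"])
print(" ".join(cmd))
print(" ".join(getfacl_command("/data/sales")))
```

The resulting lists can be passed to `subprocess.run` on a machine where the `hdfs` client is installed and configured.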
What to do with Big Data?
Hadoop: The Perfect Data Warehouse
BI Tools: Tableau, Cognos, QlikView, Pentaho
Query Engines: Hive SQL, Impala SQL, Others …
Metadata: HCatalog
HDFS Files: Free Text, Images/Videos, Logs, Transactions, Sensors
What is a Data Warehouse good at?
Among others, a data warehouse is the foundation for a successful business intelligence program
The Data Warehousing Institute
www.tdwi.org
Thank You …
Questions/Queries/Feedback
Recording and presentation will be made available to you within 24 hours