HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous...
Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous Hadoop Clusters
Sanket Reddy Chintapalli
Advisor: Dr. Xiao Qin
Presentation Overview
● Synopsis
● MapReduce Programming Model Overview
● HDFS Overview
● Motivation
● Design
● Software Description
● Hardware Description
● Results
● Conclusion
Synopsis
● Data placement strategy
● Heterogeneous clusters
● Computing power
● Calculating the computing ratio
● WordCount and Grep
MapReduce Model
● Hadoop 1.0 and Hadoop 2.0
● Master - Slave Model
● JobTracker and TaskTracker (Hadoop 1.0)
● YARN (Hadoop 2.0)
● Resource Manager (YARN)
● Application Manager (YARN)
● Node Manager (YARN)
● MapReduce Flow
MapReduce Model
MapReduce Model - 1.0
MapReduce Model - YARN - 2.0
MapReduce Model - Flow
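The map, shuffle, and reduce stages of the flow can be illustrated with a small in-memory word count. This is only a single-process sketch of the programming model, not Hadoop code; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, which is why the placement of those blocks matters for the rest of this presentation.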
HDFS
● Namenode
● Datanode
● Replication
● Federated Namenodes
HDFS Architecture
HDFS Federated Namenodes
● Scalability
● Performance
● Isolation - overload
Motivation
Software Description
● Hadoop 2.3.0
● Maven
● Eclipse
● Protocol Buffers
Hardware Description
Design
Run WordCount and Grep Applications on individual nodes
Design
Calculate Computing Power of Individual Nodes for a specific application
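One way to turn per-node measurements into a computing ratio is sketched below. The slides do not give the formula, so this assumes that computing power is the inverse of the time a node needs to run the application on the same input, and that the ratio is each node's share of the total power. The function name `computing_ratios` and the node names in the usage note are hypothetical.

```python
def computing_ratios(runtimes_sec):
    """Return each node's share of the cluster's total computing power.

    Assumption (not from the slides): power = 1 / runtime, where runtime is
    the time the node took to run one application on an identical input.
    """
    if not runtimes_sec:
        raise ValueError("at least one node runtime is required")
    if any(t <= 0 for t in runtimes_sec.values()):
        raise ValueError("runtimes must be positive")

    # A node that finishes in half the time counts as twice as powerful.
    power = {node: 1.0 / t for node, t in runtimes_sec.items()}
    total = sum(power.values())

    # Normalize so the ratios sum to 1.
    return {node: p / total for node, p in power.items()}
```

For example, `computing_ratios({"node-a": 100, "node-b": 200})` assigns roughly two thirds of the power to `node-a` and one third to `node-b`. Because the ratio is measured per application, WordCount and Grep may yield different ratios for the same hardware.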
Design
● Evaluate the default Hadoop distribution by running Grep and WordCount together on all nodes
● Run the CRBalancer to balance the nodes
● Finally, re-run the applications to observe the effect of the data placement strategy
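The balancing step implies that each node should end up holding a share of the data that matches its computing power. A minimal sketch of that idea follows, assuming blocks are split in proportion to per-node ratios and that fractional blocks are resolved with largest-remainder rounding. Both the rounding rule and the name `target_block_counts` are illustrative assumptions, not details taken from the slides.

```python
import math


def target_block_counts(ratios, total_blocks):
    """Split total_blocks across nodes in proportion to their ratios.

    Assumption: leftover blocks after rounding down go to the nodes with
    the largest fractional remainders, so the counts sum to total_blocks.
    """
    total_ratio = sum(ratios.values())
    if total_ratio <= 0:
        raise ValueError("ratios must sum to a positive value")

    # Exact (possibly fractional) share of blocks for each node.
    exact = {node: total_blocks * r / total_ratio for node, r in ratios.items()}
    targets = {node: math.floor(share) for node, share in exact.items()}

    # Hand out the blocks lost to rounding, largest remainder first.
    leftover = total_blocks - sum(targets.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - targets[n], reverse=True)
    for node in by_remainder[:leftover]:
        targets[node] += 1
    return targets
```

With ratios of 3, 2, and 1 for a fast, medium, and slow node and 10 blocks in total, the targets come out as 5, 3, and 2 blocks respectively.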
Design - Algorithm
CRBalancer Strategy
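The transcript does not preserve the details of the CRBalancer strategy, so the following is only a hedged sketch of how a balancer of this kind could plan block transfers: nodes holding more blocks than their target give blocks to nodes holding fewer. The greedy pairing and the name `plan_block_moves` are assumptions for illustration and do not reflect the actual CRBalancer code.

```python
def plan_block_moves(current, target):
    """Return (source, destination, block_count) moves that turn the
    current per-node block counts into the target counts.

    Assumption: a simple greedy pairing of over-full and under-full nodes.
    """
    if set(current) != set(target):
        raise ValueError("current and target must cover the same nodes")
    if sum(current.values()) != sum(target.values()):
        raise ValueError("current and target must hold the same number of blocks")

    # Mutable [node, amount] pairs, in name order for deterministic output.
    surplus = [[n, current[n] - target[n]] for n in sorted(current) if current[n] > target[n]]
    deficit = [[n, target[n] - current[n]] for n in sorted(current) if current[n] < target[n]]

    moves = []
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        count = min(surplus[i][1], deficit[j][1])
        moves.append((surplus[i][0], deficit[j][0], count))
        surplus[i][1] -= count
        deficit[j][1] -= count
        if surplus[i][1] == 0:
            i += 1  # this source has reached its target
        if deficit[j][1] == 0:
            j += 1  # this destination has reached its target
    return moves
```

If three nodes each hold 4 blocks and the targets are 6, 4, and 2, the plan is a single move of 2 blocks from the third node to the first. A production balancer would also have to respect replica placement and network bandwidth, which this sketch ignores.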
Implementation
● CRBalancer
● CRBalancingPolicy
● CRNamenodeConnector
Results - WordCount
Results - Grep
Questions?