HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous...

Analysis of Data Placement Strategy based on Computing Power of Nodes onHeterogeneous Hadoop Clusters

Sanket Reddy Chintapalli Advisor - Dr. Xiao Qin

Presentation Overview

● Synopsis● Mapreduce Programming Model Overview● HDFS Overview● Motivation● Design● Software Description● Hardware Description● Results● Conclusion

Synopsis

● Data placement strategy● Heterogeneous Clusters● Computing Power● Calculating Computing Ratio● WordCount and Grep

MapReduce Model

● Hadoop 1.0 and Hadoop 2.0● Master - Slave Model● JobTracker and TaskTracker Hadoop 1.0● YARN Hadoop 2.0● Resource Manager YARN● Application Manager YARN● Node Manager YARN● MapReduce Flow

Mapreduce Model

Mapreduce Model - 1.0

Mapreduce Model - YARN - 2.0

Mapreduce Model - Flow

● Namenode● Datanode● Replication● Federated Namenodes

HDFS Architecture

HDFS Federated Namenodes

● Scalability● Performance● Isolation - overload

Motivation

Software Description

● Hadoop 2.3.0● Maven● Eclipse● Protocol Buffers

Hardware Description

Design

Run WordCount and Grep Applications on individual nodes

Design

Calculate Computing Power of Individual Nodes fora specific application

Design

● Evaluate Hadoop Distribution by running grep and wordcount together on all nodes

● Run the CRBalancer to balance the nodes● Finally re-run the applications to note the ramifications

of the data placement strategy.

Design - Algorithm

CRBalancer Strategy

Implementation

● CRBalancer ● CRBalancingPolicy● CRNamenodeConnector

Results - WordCount

Results - Grep

Questions ??

HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous...

Technology

Transcript of HDFS-HC2: Analysis of Data Placement Strategy based on Computing Power of Nodes on Heterogeneous...

JerrinJoseph Hadoop ppt - · PDF fileRelies on principles of Distributed File System. HDFS have a Master-Slave ... Non-Posted Write by Hadoop HDFS OPERATION. HDFS ... Architecture

Hadoop with Python - Amazon Web Services · CHAPTER 1 Hadoop Distributed File System (HDFS) The Hadoop Distributed File System (HDFS) is a Java-based dis‐ tributed, scalable, and

Hadoop with Python - apphosting.io · 2016-10-11 · Hadoop Distributed File System (HDFS) The Hadoop Distributed File System (HDFS) is a Java-based dis‐ tributed, scalable, and

Session2 - Hadoop Distributed File System...Hadoop Distributed File System (HDFS) What For Today!!! HDFS Features & Design Goals HDFS Operation Principle Data Locality, Rack Awareness

Hadoop HDFS and Oracle

Implementing Hadoop distributed ﬁle system (hdfs) Cluster ...

Apache Hadoop YARN, NameNode HA, HDFS Federation

Strata + Hadoop World 2012: HDFS: Now and Future

Introduction to Hadoop and HDFS

Hadoop Training #2: MapReduce & HDFS

HDFS: Hadoop Distributed FS

Introduction to hadoop and hdfs

Hadoop World: Hadoop Development at Facebook: Hive and HDFS

Map reduce & HDFS with Hadoop

Hdfs 2016-hadoop-summit-dublin-v1

HADOOP Interacting with HDFS - Meetupfiles.meetup.com/18978602/University Program - Interacting With HDFS.pdf · HADOOP Interacting with HDFS 1 For University Program on Apache Hadoop

HDFS and Hadoop

MASS HDFS: Multi-Agent Spatial Simulation Hadoop ...depts.washington.edu/dslab/MASS/reports/JasShih_sp17.pdfMASS HDFS: Multi-Agent Spatial Simulation Hadoop Distributed File System

Analysis of Data Placement Strategy based on Computing ...xqin/software/hdfs-hc/hdfs-hc2-report.pdf · Hadoop is an open source application rst developed by Yahoo inspired from the

Hadoop Interacting with HDFS