Posted on 10-May-2015
Performance Issues on Hadoop Clusters
Jiong Xie
Advisor: Dr. Xiao Qin
Committee Members:
Dr. Cheryl Seals
Dr. Dean Hendrix
University Reader:
Dr. Fa Foster Dai
04/11/23 1
Overview of My Research
• Data placement on heterogeneous clusters [HCW '10]
• Prefetching data from disk to memory [Submitted to IPDPS]
• Reducing network congestion [To Be Submitted]
Themes: data movement, data locality, data shuffling
Data-Intensive Applications
Data-Intensive Applications (cont.)
Background
• MapReduce programming model is growing in popularity
• Hadoop is used by Yahoo, Facebook, and Amazon.
Hadoop Overview: MapReduce Runtime System
(J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI ’04, pages 137–150)
Hadoop Distributed File System
(http://lucene.apache.org/hadoop)
Motivations
• MapReduce provides
  – Automatic parallelization & distribution
  – Fault tolerance
  – I/O scheduling
  – Monitoring & status updates
Existing Hadoop Clusters
• Observation 1: Cluster nodes are dedicated
  – Data locality issues
  – Data transfer time
• Observation 2: The number of nodes increases
  – Scalability issues
  – Shuffling overhead goes up
Proposed Solutions
[Figure: MapReduce dataflow from Input through Map and Reduce tasks to Output, annotated with P1: Data placement, P2: Prefetching, and P3: Preshuffling]
Solutions
• P1: Data placement (offline, distributed data, heterogeneous nodes)
• P2: Prefetching (online, data preloading)
• P3: Preshuffling (intermediate data movement, reducing traffic)
Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters
Motivational Example
[Figure: processing progress over time (min) on three nodes]
• Node A (fast): 1 task/min
• Node B (slow): 2x slower than Node A
• Node C (slowest): 3x slower than Node A
The Native Strategy
[Figure: timeline (min) of loading, transferring, and processing phases on Nodes A, B, and C under the native strategy; task labels: 3 tasks, 2 tasks, 6 tasks]
Our Solution: Reducing Data Transfer Time
[Figure: timeline (min) of loading, transferring, and processing phases on Nodes A', B', and C' with capability-aware placement; task labels: 3 tasks, 2 tasks, 6 tasks]
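The arithmetic behind this example can be checked with a minimal sketch, assuming each node processes only its locally stored tasks at the rates given in the motivational example and ignoring loading and transfer time. The `completion_times` and `makespan` helpers and the near-even split used for comparison are hypothetical.

```python
from fractions import Fraction

# Processing rates in tasks per minute, taken from the motivational
# example: Node A does 1 task/min, B is 2x slower, C is 3x slower.
RATES = {"A": Fraction(1), "B": Fraction(1, 2), "C": Fraction(1, 3)}


def completion_times(assignment):
    """Minutes each node spends processing its local tasks.

    Hypothetical helper: loading and transfer time are ignored, so
    only the processing phase of the timeline is modeled.
    """
    return {node: Fraction(tasks) / RATES[node] for node, tasks in assignment.items()}


def makespan(assignment):
    """The job ends when the last node finishes."""
    return max(completion_times(assignment).values())


# Tasks placed in proportion to speed (6:3:2).
proportional = {"A": 6, "B": 3, "C": 2}
# A hypothetical near-even split of the same 11 tasks, for comparison.
even = {"A": 4, "B": 4, "C": 3}

print(makespan(proportional), makespan(even))  # prints: 6 9
```

With a 6:3:2 split every node finishes at minute 6, while the even split keeps Node C busy until minute 9. Under the native strategy a fast node that runs out of local data would also spend time transferring blocks from slower nodes, a cost this sketch leaves out.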
Challenges
• Does the distribution strategy depend on the application?
• Initialization of data distribution
• Data skew problems
  – New data arrival
  – Data deletion
  – Data updates
  – Newly joining nodes
Measure Computing Ratios
• Computing ratio
• Fast machines process large data sets
[Figure: over time, Node A processes 1 task/min, Node B is 2x slower, and Node C is 3x slower]
Measuring Computing Ratios
Node     Response time (s)   Ratio   # of file fragments   Speed
Node A   10                  1       6                     Fastest
Node B   20                  2       3                     Average
Node C   30                  3       2                     Slowest
1. Run an application and collect each node's response time.
2. Set the ratio of the node with the shortest response time to 1.
3. Normalize the ratios of the other nodes to that node.
4. Calculate the least common multiple (LCM) of these ratios.
5. Determine the amount of data processed by each node (the LCM divided by the node's ratio, in file fragments).
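Steps 2 to 5 can be sketched as follows, using the response times from the table as the output of step 1. The function names are hypothetical, and the sketch assumes every response time is an integer multiple of the fastest one; non-integer ratios would need rounding first.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def fragments_per_node(response_times):
    """Steps 2-5: ratios, their LCM, and file fragments per node.

    Assumes integer ratios, as in the table above; step 1 (running the
    application) is represented by the measured response times passed in.
    """
    fastest = min(response_times.values())
    ratios = {}
    for node, t in response_times.items():
        # Steps 2-3: the fastest node gets ratio 1; the rest are normalized to it.
        if t % fastest:
            raise ValueError(f"{node}: {t}/{fastest} is not an integer ratio")
        ratios[node] = t // fastest
    # Step 4: least common multiple of all ratios.
    common = reduce(lcm, ratios.values())
    # Step 5: a node with ratio r processes common / r fragments.
    fragments = {node: common // r for node, r in ratios.items()}
    return ratios, fragments


ratios, fragments = fragments_per_node({"A": 10, "B": 20, "C": 30})
print(ratios)     # {'A': 1, 'B': 2, 'C': 3}
print(fragments)  # {'A': 6, 'B': 3, 'C': 2}
```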
Initialize Data Distribution
[Figure: the NameNode splits File1 into blocks 1–9 and a–c and distributes them across DataNodes A, B, and C]
• Input files are split into 64 MB blocks
• Blocks are placed with a round-robin data distribution algorithm
• Portions: 3:2:1
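The slide names a round-robin algorithm and 3:2:1 portions but not the exact order in which nodes are visited. This minimal sketch assumes Nodes A:B:C = 3:2:1 and that, in each round, a node takes as many consecutive blocks as its portion; replication is ignored, and `weighted_round_robin` is a hypothetical helper.

```python
from itertools import cycle


def weighted_round_robin(blocks, portions):
    """Assign blocks to nodes in rounds; within each round a node takes
    as many consecutive blocks as its portion (hypothetical helper)."""
    # One round of slots, e.g. A, A, A, B, B, C for portions 3:2:1.
    slots = [node for node, share in portions.items() for _ in range(share)]
    placement = {node: [] for node in portions}
    for block, node in zip(blocks, cycle(slots)):
        placement[node].append(block)
    return placement


blocks = list("123456789abc")  # the 12 blocks of File1 in the figure
placement = weighted_round_robin(blocks, {"A": 3, "B": 2, "C": 1})
print({node: "".join(bs) for node, bs in placement.items()})
# prints: {'A': '123789', 'B': '45ab', 'C': '6c'}
```

The resulting counts (6, 4, 2) follow the 3:2:1 portions exactly because 12 is a multiple of the round size of 6.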
Data Redistribution
1. Get the network topology, computing ratios, and utilization.
2. Build and sort two lists: the under-utilized node list L1 and the over-utilized node list L2.
3. Select the source and destination nodes from the lists.
4. Transfer data.
5. Repeat steps 3 and 4 until the lists are empty.
[Figure: the NameNode moves blocks from over-utilized nodes (list L2) to under-utilized nodes (list L1) among Nodes A, B, and C to restore portions 3:2:1]
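The redistribution loop can be sketched as follows, assuming the target block count of each node has already been derived from its computing ratio (step 1) and ignoring network topology. `plan_redistribution` and the example counts are hypothetical.

```python
def plan_redistribution(current, target):
    """Plan (source, destination, n_blocks) moves so every node reaches
    its target block count.

    Hypothetical sketch: network topology is ignored and only block
    counts are balanced.
    """
    if sum(current.values()) != sum(target.values()):
        raise ValueError("current and target must hold the same number of blocks")
    # Step 2: under-utilized list L1 and over-utilized list L2, sorted so
    # the node with the largest imbalance comes first.
    l1 = sorted(((target[n] - c, n) for n, c in current.items() if c < target[n]), reverse=True)
    l2 = sorted(((c - target[n], n) for n, c in current.items() if c > target[n]), reverse=True)
    moves = []
    while l1 and l2:  # Step 5: repeat until the lists are empty.
        # Step 3: select the destination (from L1) and the source (from L2).
        need, dst = l1.pop(0)
        extra, src = l2.pop(0)
        # Step 4: transfer as many blocks as both nodes allow.
        n = min(need, extra)
        moves.append((src, dst, n))
        # A node that is still imbalanced goes back into its sorted list.
        if need > n:
            l1.append((need - n, dst))
            l1.sort(reverse=True)
        if extra > n:
            l2.append((extra - n, src))
            l2.sort(reverse=True)
    return moves


# Target portions 3:2:1 for 12 blocks are 6:4:2.
print(plan_redistribution({"A": 3, "B": 5, "C": 4}, {"A": 6, "B": 4, "C": 2}))
# prints: [('C', 'A', 2), ('B', 'A', 1)]
```

Handling the largest imbalance first keeps the number of separate transfers small; a real balancer would also weigh rack locality from the network topology.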
Experimental Environment
Five nodes in a Hadoop heterogeneous cluster
Node     CPU model          CPU clock           L1 cache (KB)
Node A   Intel Core 2 Duo   2 × 1 GHz = 2 GHz   204
Node B   Intel Celeron      2.8 GHz             256
Node C   Intel Pentium 3    1.2 GHz             256
Node D   Intel Pentium 3    1.2 GHz             256
Node E   Intel Pentium 3    1.2 GHz             256
Benchmarks
• Grep: a tool that searches for a regular expression in a text file
• WordCount: a program that counts the words in a text file
• Sort: a program that lists its inputs in sorted order
Response Time of Grep and WordCount on Each Node
• The computing ratio is application-dependent
• The computing ratio is independent of data size
Computing Ratio for Two Applications
Computing ratios of the five nodes with respect to the Grep and WordCount applications
Computing node   Ratio for Grep   Ratio for WordCount
Node A           1                1
Node B           2                2
Node C           3.3              5
Node D           3.3              5
Node E           3.3              5
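The LCM step in "Measuring Computing Ratios" assumes integer ratios, and 3.3 is not one. A minimal sketch of one alternative, swapped in here as an assumption rather than taken from the slides: give each node a share of blocks proportional to 1/ratio and round with the largest-remainder method. `blocks_per_node` and the block totals are hypothetical.

```python
import math
from fractions import Fraction


def blocks_per_node(ratios, total_blocks):
    """Split total_blocks in proportion to 1 / ratio, since a node with
    ratio r is r times slower than the fastest node.

    Largest-remainder rounding keeps the counts summing to total_blocks.
    """
    # Fraction(str(r)) turns 3.3 into exactly 33/10.
    speed = {n: 1 / Fraction(str(r)) for n, r in ratios.items()}
    total_speed = sum(speed.values())
    exact = {n: total_blocks * s / total_speed for n, s in speed.items()}
    counts = {n: math.floor(q) for n, q in exact.items()}
    leftover = total_blocks - sum(counts.values())
    # Give any remaining blocks to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for n in by_remainder[:leftover]:
        counts[n] += 1
    return counts


grep = {"A": 1, "B": 2, "C": 3.3, "D": 3.3, "E": 3.3}
wordcount = {"A": 1, "B": 2, "C": 5, "D": 5, "E": 5}
print(blocks_per_node(grep, 159))  # {'A': 66, 'B': 33, 'C': 20, 'D': 20, 'E': 20}
print(blocks_per_node(wordcount, 100))
```

The same node gets a different share under each application, which is why the computing ratio has to be measured per application.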
Six Data Placement Decisions
Impact of data placement on performance of Grep
Impact of data placement on performance of WordCount
Summary of Data Placement
P1: Data Placement Strategy
• Motivation: Fast machines process large data sets
• Problem: Data locality issues in heterogeneous clusters
• Contributions: Distribute data according to computing capability
  – Measure computing ratios
  – Initialize data placement
  – Redistribute data
Predictive Scheduling and Prefetching for Hadoop Clusters
Prefetching
• Goal: Improving performance
• Approach
  – Best-effort guarantee of data locality
  – Keeping data close to computing nodes
  – Reducing CPU stall time
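To illustrate how prefetching can reduce CPU stall time, here is a simple timing model, an assumption rather than a measurement: each task must first read its block from disk, and with prefetching the next block is read while the current task computes. The function names and timings are hypothetical.

```python
def sequential_time(read, compute):
    """Without prefetching, the CPU waits for every read before computing."""
    return sum(read) + sum(compute)


def prefetch_time(read, compute):
    """With one-block-ahead prefetching, reading block i+1 overlaps the
    computation of task i, so a stall happens only when that read takes
    longer than the computation it overlaps."""
    if len(read) != len(compute):
        raise ValueError("need one read time per task")
    total = read[0]  # the first read cannot be overlapped
    for i, c in enumerate(compute):
        next_read = read[i + 1] if i + 1 < len(read) else 0
        total += max(c, next_read)
    return total


# Hypothetical timings (seconds) for three map tasks.
read = [4, 4, 4]
compute = [6, 6, 6]
print(sequential_time(read, compute), prefetch_time(read, compute))  # prints: 30 22
```

In this model only the first read stalls the CPU; the benefit shrinks when reads are much shorter than computation and grows as they approach it.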
Challenges
• What to prefetch?
• How to prefetch?
• What is the size of blocks to be prefetched?
Dataflow in Hadoop
[Figure: Hadoop dataflow between HDFS (Blocks 1 and 2), map and reduce tasks, and each node's local file system]
1. Submit job
2. Schedule
3. Read input
4. Run map
5. Heartbeat
6. Next task
7. Read new file
Dataflow in Hadoop
[Figure: Hadoop dataflow with prefetching between HDFS (Blocks 1 and 2), map and reduce tasks, and each node's local file system]
1. Submit job
2. Schedule + more tasks + metadata
3. Read input
4. Run map
5. Heartbeat
5.1. Read new file
6. Next task
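A minimal sketch of the worker-thread side of prefetching, assuming the scheduler's extra task metadata (step 2) tells a node which block it will process next. The `Prefetcher` class and the `read_block` callback are hypothetical stand-ins for HDFS reads; a real implementation would also bound the cache and handle read errors.

```python
import queue
import threading

_MISSING = object()  # cache sentinel, so None can be a valid block value


class Prefetcher:
    """Hypothetical worker thread that loads blocks of predicted tasks
    into memory before those tasks are assigned."""

    def __init__(self, read_block):
        self._read_block = read_block  # stand-in for an HDFS block read
        self._pending = queue.Queue()
        self._cache = {}
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def predict(self, block_id):
        """Queue a block that the scheduler predicts this node will need."""
        self._pending.put(block_id)

    def _run(self):
        while True:
            block_id = self._pending.get()
            try:
                if block_id is None:  # shutdown signal from close()
                    return
                data = self._read_block(block_id)
                with self._lock:
                    self._cache[block_id] = data
            finally:
                self._pending.task_done()

    def wait(self):
        """Block until every queued prefetch has finished."""
        self._pending.join()

    def get(self, block_id):
        """Return prefetched data, or read synchronously on a cache miss."""
        with self._lock:
            data = self._cache.pop(block_id, _MISSING)
        return self._read_block(block_id) if data is _MISSING else data

    def close(self):
        self._pending.put(None)
        self._worker.join()


reads = []


def fake_read(block_id):
    reads.append(block_id)
    return f"contents of {block_id}"


prefetcher = Prefetcher(fake_read)
prefetcher.predict("block-2")  # predicted next task for this node
prefetcher.wait()
print(prefetcher.get("block-2"))  # served from memory, no second read
prefetcher.close()
```

When the next task arrives (step 6), its input is already in memory, so the map can start without waiting on disk.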
Prefetching Processing
Software Architecture
Grep Performance
9.5% (1 GB), 8.5% (2 GB)
WordCount Performance
8.9% (1 GB), 8.1% (2 GB)
Large/Small Files on a Node
9.1% (Grep), 8.3% (WordCount)
18% (Grep), 24% (WordCount)
Experimental Setting
Large/Small Files in a Cluster
Summary
P2: Predictive Scheduler and Prefetching
• Goal: Moving data before tasks are assigned
• Problem: Synchronizing tasks and data
• Contributions: Preloading the required data before the task is assigned
  – Predictive scheduler
  – Prefetching mechanism
  – Worker thread
Adaptive Preshuffling in Hadoop Clusters
Preshuffling
• Observation 1: Too much data moves from map workers to reduce workers
  – Solution 1: Map nodes apply pre-shuffling functions to their local output
• Observation 2: No reduce can start until a map is complete
  – Solution 2: Intermediate data is pipelined between mappers and reducers
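One way to picture Solution 1 is local aggregation of map output before it leaves the node, similar in spirit to a Hadoop combiner; whether the pre-shuffling functions work exactly this way is an assumption. The sketch uses WordCount, and `partition` uses `zlib.crc32` as a deterministic stand-in for Hadoop's hash partitioning (Python's built-in string `hash` varies between runs). All function names are hypothetical.

```python
import zlib
from collections import Counter, defaultdict


def partition(key, num_reducers):
    """Deterministic stand-in for Hadoop's hash partitioning."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_wordcount(lines):
    """Plain map output: one (word, 1) record per occurrence."""
    return [(word, 1) for line in lines for word in line.split()]


def preshuffle(records, num_reducers):
    """Aggregate map output locally, then group it by destination
    reducer, so each reducer receives one record per distinct word."""
    combined = Counter()
    for word, count in records:
        combined[word] += count
    buckets = defaultdict(dict)
    for word, count in combined.items():
        buckets[partition(word, num_reducers)][word] = count
    return dict(buckets)


records = map_wordcount(["the cat sat", "the cat ran", "the end"])
buckets = preshuffle(records, num_reducers=2)
sent = sum(len(b) for b in buckets.values())
print(len(records), "records before,", sent, "after")  # 8 records before, 5 after
```

The saving grows with how often keys repeat within one map's output, which is why WordCount-style jobs benefit most.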
Preshuffling
• Goal: Minimize data shuffling during the reduce phase
• Approach
  – Pipelining
  – Overlapping map and data movement
  – Grouping map and reduce
• Challenges
  – Synchronizing map and reduce
  – Data locality
Dataflow in Hadoop
[Figure: Hadoop dataflow including the shuffle between map and reduce tasks, HDFS, and each node's local file system]
Map side:
1. Submit job
2. Schedule
3. Read input
4. Run map
5. Heartbeat
6. Next task
Reduce side:
2. New task
3. Request data (HTTP GET)
4. Send data
5. Write data to HDFS
PreShuffle
[Figure: data requests between map and reduce tasks, with an in-memory buffer]
Pipelining – A new design
[Figure: HDFS Blocks 1 and 2 feed map tasks whose output is pipelined to reduce tasks]
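A minimal, single-process sketch of the push-based pipeline, assuming threads stand in for map and reduce tasks and bounded queues stand in for the reducers' in-memory buffers. In stock Hadoop, reducers fetch each completed map's output over HTTP; here mappers push every record as soon as it is produced. `run_pipelined` and the length-based partitioner are hypothetical.

```python
import queue
import threading
from collections import Counter

_DONE = object()  # end-of-stream marker sent by every mapper


def mapper(lines, buffers, partition):
    """Push each (word, 1) record to its reducer's buffer as soon as it
    is produced, instead of waiting for the map to finish."""
    for line in lines:
        for word in line.split():
            buffers[partition(word)].put((word, 1))
    for buf in buffers:
        buf.put(_DONE)


def reducer(buf, num_mappers, result):
    """Aggregate records from the in-memory buffer while maps still run."""
    counts = Counter()
    finished = 0
    while finished < num_mappers:
        item = buf.get()
        if item is _DONE:
            finished += 1
        else:
            word, n = item
            counts[word] += n
    result.update(counts)


def run_pipelined(splits, num_reducers=2, buffer_size=4):
    # Bounded queues model the reducers' in-memory buffers; a full buffer
    # blocks the mapper until the reducer catches up (back-pressure).
    buffers = [queue.Queue(maxsize=buffer_size) for _ in range(num_reducers)]

    def partition(word):
        # Hypothetical partitioner: by word length, so results are deterministic.
        return len(word) % num_reducers

    results = [{} for _ in range(num_reducers)]
    threads = [threading.Thread(target=reducer, args=(buf, len(splits), res))
               for buf, res in zip(buffers, results)]
    threads += [threading.Thread(target=mapper, args=(split, buffers, partition))
                for split in splits]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


splits = [["the cat sat", "on the mat"], ["a cat ran"]]
print(run_pipelined(splits))
```

Reducers start aggregating while mappers are still running, which is the overlap of map and data movement the design targets; the bounded buffer keeps memory use fixed at the cost of occasionally pausing a fast mapper.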
WordCount Performance
230 seconds vs 180 seconds
WordCount Performance
Sort Performance
Summary
P3: Preshuffling
• Goal: Minimize data shuffling during the reduce phase
• Problem: Task distribution and synchronization
• Contributions: Preshuffling algorithm
  – Push data instead of the traditional pull
  – In-memory buffer
  – Pipeline
Conclusion
[Figure: MapReduce dataflow from Input through Map and Reduce tasks to Output, annotated with P1, P2, and P3]
• P1: Data placement (offline, distributed data, heterogeneous nodes)
• P2: Prefetching (online, data preloading, single node)
• P3: Preshuffling (intermediate data movement, reducing traffic)
Future Work
• Extend pipelining
  – Implement the pipelining design
• Small-files issue
  – HAR files
  – SequenceFile
  – CombineFileInputFormat
• Extend data placement
Thanks! And Questions?
Run Time Affected by Network Conditions
Experimental results by Yixian Yang
Traffic Volume Affected by Network Conditions
Experimental results by Yixian Yang