Transcript of "On the Duality of Data-intensive File System Design: Reconciling HDFS and PVFS" (…wtantisi/files/hadooppvfs-pdl11-talk.pdf)
-
On the Duality of Data-intensive File System Design: Reconciling HDFS and PVFS
Wittawat Tantisiriroj, Swapnil Patil, Garth Gibson (CMU)
Seung Woo Son, Samuel J. Lang, Robert B. Ross (ANL)
-
Wittawat Tantisiriroj, November 2011, http://www.pdl.cmu.edu/
Motivation
Applications are becoming more data-intensive
  Large input data sets (e.g., the entire web)
  Distributed, parallel application execution
Many new distributed file systems
  Commodity hardware and networks
  Purpose-built for anticipated workloads
  New, diverse semantics (non-POSIX)
-
File systems for data-intensive workloads
[Diagram: file-system landscape, with labels GFS, S3, and MPI applications]
Can existing parallel file systems support new workloads?
-
Why use parallel file systems?
Handle a wide variety of workloads
  Highly concurrent reads and writes
  Small-file support, scalable metadata
  Low-capacity-overhead reliability solutions
Standard Unix FS interface
  Support for POSIX semantics
-
In this work: can we use an existing parallel file system for Hadoop workloads?
  PVFS: a parallel file system
  HDFS: the file system designed for Hadoop
Goal: no modification to Hadoop or PVFS
-
Outline
→ Shim layer & vanilla PVFS
  Extra performance via shim layer
  Replication
  Evaluation
-
PVFS shim layer under Hadoop
[Diagram: on the client, a Hadoop application runs on the Hadoop framework, which talks to the file system API; beneath the API sits either the HDFS client library (talking to HDFS servers) or the PVFS shim layer (via JNI), which calls the unmodified PVFS client library (C) and talks to unmodified PVFS servers]
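The shim's job is to translate calls from Hadoop's file system API into calls on the native PVFS client library via JNI. As a rough analogy only (the real shim is Java/JNI against PVFS), here is a minimal Python sketch that crosses a language boundary with ctypes, with libc standing in for the native client library:

```python
import ctypes

# Open a handle to the process's global symbols; on POSIX this
# includes libc, which stands in here for the PVFS client library.
libc = ctypes.CDLL(None)

# Declare the native signature before calling, much as JNI stubs do.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def native_strlen(s: str) -> int:
    """Forward a call across the language boundary, the way the shim
    forwards Hadoop file-system calls to the C client library."""
    return libc.strlen(s.encode())

print(native_strlen("hello"))  # 5
```

The point is only that the managed-language side stays thin: argument marshalling plus a direct call, with all file-system logic left in the unmodified native library.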
-
Preliminary evaluation: text search (grep)
  A common workload in Hadoop applications
  Searches for a rare pattern in 100-byte records
  50GB dataset, 50 nodes
  Each node serves as both a storage and a compute node
-
[Chart: Grep benchmark completion time (seconds), y-axis 0–600, comparing vanilla PVFS and HDFS]
Vanilla PVFS* is disappointing: 8.5X slower than HDFS
* with 64MB chunk size to match HDFS
-
Outline
  Shim layer & vanilla PVFS
→ Extra performance via shim layer: prefetching, file layout information
  Replication
  Evaluation
-
Prefetching is useful for Hadoop
Typical Hadoop workload:
  Small reads (less than 128KB)
  Sequential through the entire chunk
Prefetching can be implemented in the shim layer
  In Hadoop, a file becomes immutable after it is closed
  No cache-coherence mechanism is needed
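Because a closed Hadoop file never changes, a readahead buffer needs no invalidation logic. A minimal sketch of the idea (hypothetical names; the real shim prefetches PVFS chunks, not an in-memory stream):

```python
import io

class PrefetchingReader:
    """Serve many small sequential reads from one large readahead
    buffer, issuing few big reads to the underlying store."""

    def __init__(self, raw, readahead=4 * 1024 * 1024):
        self.raw = raw                # underlying file-like object
        self.readahead = readahead    # bytes fetched per backend read
        self.buf = b""
        self.buf_pos = 0
        self.backend_reads = 0        # counted for illustration

    def read(self, n):
        if self.buf_pos >= len(self.buf):
            # Buffer exhausted: fetch the next large region. It is safe
            # to cache without coherence checks -- the file is immutable.
            self.buf = self.raw.read(self.readahead)
            self.buf_pos = 0
            self.backend_reads += 1
        data = self.buf[self.buf_pos:self.buf_pos + n]
        self.buf_pos += len(data)
        return data

# 1 MB of data consumed in 128KB requests: a single backend read.
reader = PrefetchingReader(io.BytesIO(b"x" * (1 << 20)))
for _ in range(8):                    # 8 x 128KB = the whole 1 MB
    reader.read(128 * 1024)
print(reader.backend_reads)  # 1
```

With small sequential reads the backend sees one large request per readahead window instead of dozens of small ones, which is exactly the access pattern the slide describes.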
-
[Chart: Grep benchmark completion time (seconds), y-axis 0–600, comparing vanilla PVFS, PVFS w/ prefetching, and HDFS]
Improving, but still 2X slower than HDFS
-
Collocation in Hadoop
Hadoop ships computation to where the data is located
  Helps reduce network traffic
Requires file layout information
  Describes where chunks are located
  Already available at the PVFS client for fast I/O
  The shim layer exposes this information to Hadoop
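For a round-robin striped file, the chunk-to-server mapping the shim would hand to Hadoop's scheduler can be computed directly from the layout parameters. A toy sketch (hypothetical names; real PVFS layouts come from the file's distribution metadata):

```python
def chunk_location(offset, chunk_size, servers):
    """Return (chunk index, server) for a byte offset in a file
    striped round-robin across `servers` in chunk_size units."""
    chunk = offset // chunk_size
    return chunk, servers[chunk % len(servers)]

servers = ["node1", "node2", "node3"]
CHUNK = 64 * 1024 * 1024  # 64MB, matching the HDFS chunk size used here

# The scheduler can now place the task for this split on node2.
print(chunk_location(70 * 1024 * 1024, CHUNK, servers))  # (1, 'node2')
```

Given this mapping, the job scheduler can assign each map task to a node that already stores the task's input chunk, turning remote reads into local ones.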
-
[Chart: Grep benchmark completion time (seconds), y-axis 0–600, comparing vanilla PVFS, PVFS w/ prefetching, PVFS w/ prefetching and layout, and HDFS]
Now, comparable performance
-
Outline
  Shim layer & vanilla PVFS
  Extra performance via shim layer
→ Replication
  Evaluation
-
Replication vs. RAID
HDFS: software-based replication
  1 local copy on the writer's disk, 2 copies on random nodes
PVFS: hardware-based RAID
  Expensive RAID controllers
Can we use PVFS with replication instead?
  Get rid of the RAID controller
  Implemented in the shim layer
-
Data layout schemes
  HDFS: 1 copy on the writer's disk, 2 copies on random nodes
  PVFS: 3 copies striped in the file (the straightforward way)
  PVFS Local: 1 copy on the writer's disk, 2 copies striped
    (needs support from PVFS developers)

[Diagram: placement of chunks 0–4 of one file under each scheme]

  Scheme        Writer       Server1     Server2     Server3
  HDFS          0 1 2 3 4    0 2 3 4     1 2 4       0 1 3
  PVFS          0 1 2 4      0 1 3 4     0 2 3 4     1 2 3
  PVFS Local    0 1 2 3 4    0 1 3 4     0 2 3       1 2 4
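The three placement policies can be sketched as simple placement functions. This is an illustrative model only (HDFS's random choices and PVFS's actual distribution code are more involved):

```python
import random

def hdfs_placement(chunk, writer, servers, rng):
    """1 copy on the writer, 2 copies on distinct random other nodes."""
    others = [s for s in servers if s != writer]
    return [writer] + rng.sample(others, 2)

def pvfs_placement(chunk, servers, copies=3):
    """All copies striped round-robin; the writer gets no special copy."""
    return [servers[(chunk + i) % len(servers)] for i in range(copies)]

def pvfs_local_placement(chunk, writer, servers):
    """1 copy on the writer, 2 copies striped over the other nodes."""
    others = [s for s in servers if s != writer]
    return [writer] + [others[(chunk + i) % len(others)] for i in range(2)]

servers = ["writer", "s1", "s2", "s3"]
rng = random.Random(0)
for c in range(5):
    print(c,
          pvfs_placement(c, servers),
          pvfs_local_placement(c, "writer", servers),
          hdfs_placement(c, "writer", servers, rng))
```

The key contrast: only the HDFS and PVFS Local policies guarantee a copy on the writer's own disk, which is what later makes local reads possible.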
-
Shim layer with 3 extensions
[Diagram: Hadoop application → Hadoop framework → file system API → PVFS shim layer (via JNI: prefetching, file layout info, replication) → unmodified PVFS client library (C)]
-
Outline
  Shim layer & vanilla PVFS
  Extra performance via shim layer
  Replication
→ Evaluation: micro-benchmark, Hadoop benchmark
-
Micro-benchmark
Cluster configuration:
  1 master node + 50 slave nodes
  Pentium Xeon 2.8 GHz, dual quad-core
  16 GB memory
  One 7,200 rpm SATA 160 GB disk
  10 Gigabit Ethernet
Uses the file system API directly, without Hadoop
-
Write benchmark
1. 5, 10, …, or 50 clients write to 50 data servers
[Diagram: Client 1 through Client 50 each write one file (/dataset/01 … /dataset/50) to HDFS/PVFS]
-
HDFS utilizes idle resources better
  HDFS's pipelined replication improves parallelism
  Better than the client-driven replication in PVFS
[Chart: aggregate write throughput (MB/s), y-axis 0–2,500, vs. number of clients (5–50), comparing HDFS, PVFS Local, and PVFS]
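The difference between pipelined and client-driven replication is where the outbound bandwidth cost lands. A back-of-the-envelope model (assumed numbers, just to show the shape of the argument):

```python
def client_outbound(chunk_mb, replicas, pipelined):
    """MB the writer must push onto the network for one chunk.

    Pipelined (HDFS): the client sends one copy to the first
    datanode, which forwards it to the second, and so on -- the
    replication traffic is spread across many nodes' links.
    Client-driven (the PVFS shim): the client itself sends every
    replica, so its single uplink carries the whole load.
    """
    return chunk_mb * (1 if pipelined else replicas)

CHUNK_MB = 64  # 64MB chunks, as in the experiments
print(client_outbound(CHUNK_MB, 3, pipelined=True))   # 64
print(client_outbound(CHUNK_MB, 3, pipelined=False))  # 192
```

With 3 replicas, the client-driven writer pushes 3x the data through its own link, while the pipeline recruits otherwise idle server-to-server links, which matches the parallelism advantage seen in the chart.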
-
HDFS is surprisingly tied to disk performance
  Limited by synchronous file creation (a new file for each chunk in HDFS)
  Head-of-line blocking while flushing the write-back buffer
[Chart: aggregate write throughput (MB/s), y-axis 0–2,500, vs. number of clients (5–50), comparing HDFS, PVFS Local, and PVFS]
-
PVFS Local improves write throughput by 20%
  Reduces network traffic by 33%: 2 remote copies instead of 3
[Chart: aggregate write throughput (MB/s), y-axis 0–2,500, vs. number of clients (5–50), comparing HDFS, PVFS Local, and PVFS]
-
Read benchmark
1. 50 clients write to 50 data servers
2. 5, 10, …, or 50 clients read back the same file each client created
[Diagram: Client 1 through Client 50 each write, then read back, one file (/dataset/01 … /dataset/50) on HDFS/PVFS]
-
Local copy improves read throughput by 60%
  A larger improvement than for writes: no network traffic at all
[Chart: aggregate read throughput (MB/s), y-axis 0–2,500, vs. number of clients (5–50), comparing HDFS, PVFS Local, and PVFS]
-
Hadoop benchmark setting
Cluster configuration:
  2 master nodes + 50 slave nodes
  Pentium Xeon 2.8 GHz, dual quad-core
  16 GB memory
  One 7,200 rpm SATA 160 GB disk
  10 Gigabit Ethernet
Uses Hadoop for map-reduce processing
-
Hadoop benchmark
Dataset: 50GB of 100-byte records
Benchmarks:
  Write: generate the dataset
  Read: read with no computation
  Grep: search for a rare pattern
  Sort: sort all records
-
Effect of the Hadoop job scheduler
  Masks the inefficiencies of PVFS without a local copy
  With collocation, most reads become local reads
  Blocks can be processed out of order
[Chart: completion time (seconds), y-axis 0–150, for the write, read, grep, and sort benchmarks, comparing HDFS, PVFS Local, and PVFS]
-
Scientific Hadoop applications
  Sampling (B. Fu): read a 71GB astronomy dataset to determine partitions
  FoF (B. Fu): cluster & join astronomical objects of the same 71GB dataset
  Twitter (B. Meeder): process a raw 24GB Twitter dataset into a different format
-
Results
PVFS performance is comparable to HDFS
[Chart: completion time (seconds), y-axis 0–800, for the sampling, fof, and twitter applications, comparing HDFS, PVFS Local, and PVFS]
-
Summary
PVFS can be tuned to deliver comparable performance for Hadoop applications
  Simple shim layer in Hadoop
  Local copy optimization
Little modification necessary
  Only for the local copy optimization