
  • On the Duality of Data-intensive File System Design: Reconciling HDFS and PVFS

    Wittawat Tantisiriroj, Swapnil Patil, Garth Gibson (CMU)

    Seung Woo Son, Samuel J. Lang, Robert B. Ross (ANL)

  • Motivation

    Applications are becoming more data-intensive
    Large input data sets (e.g., the entire web)
    Distributed, parallel application execution

    Many new distributed file systems
    Commodity hardware and networks
    Purpose-built for anticipated workloads
    New, diverse semantics (non-POSIX)

  • File systems for data-intensive workloads

    [Figure: example systems, labeled MPI, GFS, and S3]

    Can existing parallel file systems support new workloads?

  • Why use parallel file systems?

    Handle a wide variety of workloads
    Highly concurrent reads and writes
    Small-file support, scalable metadata
    Reliability solutions with low capacity overhead
    Standard Unix FS interface
    Support for POSIX semantics

  • In this work

    Can we use an existing parallel file system for Hadoop workloads?
    PVFS: a parallel file system
    HDFS: the file system designed for Hadoop

    Goal: no modification to Hadoop or PVFS

  • Outline

    → Shim layer & vanilla PVFS
    Extra performance via shim layer
    Replication
    Evaluation

  • PVFS shim layer under Hadoop

    [Architecture diagram]
    Client side: Hadoop application → Hadoop framework → file system API,
    backed by either the HDFS client library or the PVFS shim layer (via JNI)
    on top of the unmodified PVFS client library (C)
    Server side: HDFS servers or unmodified PVFS servers
    (a sketch of the JNI boundary follows below)
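    The slide only states that a Java shim calls the unmodified PVFS client library through JNI. Below is a minimal sketch of what that JNI boundary could look like; the class name PvfsNative, the library name pvfsshim, and every method here are assumptions made for illustration, not part of PVFS or Hadoop. On the Hadoop side, such a shim would plug in through Hadoop's file system API (an org.apache.hadoop.fs.FileSystem implementation registered for a scheme such as pvfs://).

      // Hypothetical JNI wrapper over the unmodified PVFS client library (C).
      // All names here are assumptions for illustration; the talk only states
      // that the shim reaches PVFS through JNI.
      public final class PvfsNative {
        static {
          System.loadLibrary("pvfsshim");  // assumed native glue over the PVFS client library
        }

        // Thin wrappers: each call is forwarded to a PVFS client-library call in C.
        public static native long open(String path, int flags);
        public static native int read(long fd, byte[] buf, long fileOffset, int len);
        public static native int write(long fd, byte[] buf, long fileOffset, int len);
        public static native void close(long fd);

        // Data servers holding the stripe unit that contains fileOffset
        // (used later to expose layout information to Hadoop).
        public static native String[] getStripeServers(String path, long fileOffset);

        private PvfsNative() {}
      }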

  • Preliminary evaluation: text search (grep)

    A common workload in Hadoop applications
    Searches for a rare pattern in 100-byte records
    50 GB dataset, 50 nodes
    Each node serves as both a storage node and a compute node

  • Grep benchmark

    [Bar chart: completion time in seconds, vanilla PVFS* vs. HDFS]
    Vanilla PVFS* is disappointing: 8.5x slower than HDFS
    * with a 64 MB chunk size to match HDFS

  • Outline

    Shim layer & vanilla PVFS
    → Extra performance via shim layer: prefetching, file layout information
    Replication
    Evaluation

  • Prefetching is useful for Hadoop

    Typical Hadoop workload:
    small reads (less than 128 KB), sequential through the entire chunk

    Prefetching can be implemented in the shim layer
    In Hadoop, a file becomes immutable after it is closed,
    so no cache-coherence mechanism is needed (sketch below)
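    A minimal sketch of what readahead in the shim's input stream could look like, reusing the hypothetical PvfsNative wrapper from the architecture sketch above; the 4 MB readahead size is illustrative and not a value stated in the talk.

      import java.io.IOException;

      // Readahead buffer for the shim's input stream (sketch). Because a Hadoop
      // file is immutable once closed, the buffer never has to be invalidated,
      // so no cache-coherence protocol is needed.
      public class PvfsReadaheadStream {
        private static final int READAHEAD = 4 * 1024 * 1024;  // illustrative readahead size

        private final long fd;                        // descriptor from PvfsNative.open()
        private final byte[] buffer = new byte[READAHEAD];
        private long bufferStart = -1;                // file offset of buffer[0]
        private int bufferLen = 0;                    // valid bytes currently buffered

        public PvfsReadaheadStream(long fd) {
          this.fd = fd;
        }

        // Serve one of Hadoop's typical small (<128 KB) sequential reads.
        public int read(long pos, byte[] dst, int off, int len) throws IOException {
          if (pos < bufferStart || pos >= bufferStart + bufferLen) {
            // Miss: fetch one large region, then serve many small reads from memory.
            bufferStart = pos;
            bufferLen = PvfsNative.read(fd, buffer, pos, READAHEAD);
            if (bufferLen <= 0) {
              return -1;                              // end of file
            }
          }
          int available = (int) Math.min(len, bufferStart + bufferLen - pos);
          System.arraycopy(buffer, (int) (pos - bufferStart), dst, off, available);
          return available;
        }
      }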

  • Grep benchmark

    [Bar chart: completion time in seconds, vanilla PVFS vs. PVFS w/ prefetching vs. HDFS]
    Prefetching improves PVFS, but it is still 2x slower than HDFS

  • Collocation in Hadoop

    Hadoop ships computation to where the data is located,
    which helps reduce network traffic

    This requires file layout information
    (a description of where the chunks are located)

    That information is already available at the PVFS client for fast I/O;
    the shim layer exposes it to Hadoop (sketch below)
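    A minimal sketch of how the shim could expose PVFS striping to Hadoop, assuming the hypothetical PvfsNative.getStripeServers() call above and a fixed stripe size; Hadoop's extension point for this is FileSystem.getFileBlockLocations(), which the job scheduler consults when placing map tasks.

      import java.io.IOException;
      import org.apache.hadoop.fs.BlockLocation;

      // Sketch: build Hadoop BlockLocation objects from PVFS stripe placement so
      // the Hadoop scheduler can collocate map tasks with the servers that hold
      // the data.
      public final class PvfsLayoutInfo {

        public static BlockLocation[] blockLocations(String path, long start, long len,
                                                     long stripeSize) throws IOException {
          int units = (int) ((len + stripeSize - 1) / stripeSize);
          BlockLocation[] locations = new BlockLocation[units];
          for (int i = 0; i < units; i++) {
            long offset = start + i * stripeSize;
            long length = Math.min(stripeSize, start + len - offset);
            // Servers holding this stripe unit, as reported by the PVFS client (assumed call).
            String[] hosts = PvfsNative.getStripeServers(path, offset);
            locations[i] = new BlockLocation(hosts, hosts, offset, length);
          }
          return locations;
        }

        private PvfsLayoutInfo() {}
      }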

  • Grep benchmark

    [Bar chart: completion time in seconds, vanilla PVFS vs. PVFS w/ prefetching
    vs. PVFS w/ prefetching and layout vs. HDFS]
    With prefetching and layout information, PVFS performance is now comparable to HDFS

  • Outline

    Shim layer & vanilla PVFS
    Extra performance via shim layer
    → Replication
    Evaluation

  • Replication vs. RAID

    HDFS: software-based replication
    (1 local copy on the writer's disk, 2 copies on random nodes)
    PVFS: hardware-based RAID, which needs expensive RAID controllers

    Can we use PVFS with replication instead?
    Get rid of the RAID controller; implement replication in the shim layer

  • Data layout schemes

    HDFS: 1 copy on the writer's disk, 2 copies on random nodes
    PVFS: 3 copies striped across the file (the straightforward way)
    PVFS Local: 1 copy on the writer's disk, 2 copies striped
    (the local-copy placement needs support from the PVFS developers)

    [Figure: placement of chunks 0-4 across the writer and three servers;
    a sketch of client-driven replication follows below]

                  Writer       Server1      Server2      Server3
    HDFS          4 3 2 1 0    4 3 2 0      4 2 1        3 1 0
    PVFS          4 2 1 0      4 3 1 0      4 3 2 0      3 2 1
    PVFS Local    4 3 2 1 0    4 3 1 0      3 2 0        4 2 1
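    A minimal sketch of what client-driven replication in the shim could look like under the PVFS Local idea: the writer keeps one copy on its own server and writes two more elsewhere. The per-replica file naming and the PvfsNative calls are assumptions for illustration; the talk only states that replication is implemented in the shim layer and that the local-copy placement needs PVFS support.

      import java.io.IOException;

      // Sketch of client-driven replication: the client itself writes every
      // buffer three times (unlike HDFS, where the first server pipelines the
      // copies onward).
      public final class ReplicatedPvfsWriter {
        private static final int REPLICAS = 3;

        private final long[] fds = new long[REPLICAS];
        private long offset = 0;

        public ReplicatedPvfsWriter(String path) throws IOException {
          for (int r = 0; r < REPLICAS; r++) {
            // Hypothetical per-replica files; replica 0 would be laid out on the
            // writer's own server (the "PVFS Local" placement), replicas 1 and 2
            // striped across the remaining servers.
            fds[r] = PvfsNative.open(path + ".r" + r, 0);
          }
        }

        public void write(byte[] buf, int len) throws IOException {
          for (long fd : fds) {
            PvfsNative.write(fd, buf, offset, len);  // 1 local + 2 remote copies with PVFS Local
          }
          offset += len;
        }

        public void close() {
          for (long fd : fds) {
            PvfsNative.close(fd);
          }
        }
      }

    Because the client sends every remote copy itself, this scheme has no server-side pipeline, which is exactly the contrast with HDFS's pipelined replication examined in the write benchmark below.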

  • Shim layer with 3 extensions

    [Architecture diagram]
    Hadoop application → Hadoop framework → file system API →
    PVFS shim layer (via JNI), extended with prefetching, file layout
    information, and replication → unmodified PVFS client library (C)

  • Outline

    Shim layer & vanilla PVFS
    Extra performance via shim layer
    Replication
    → Evaluation: micro-benchmarks, Hadoop benchmarks

  • Micro-benchmark

    Cluster configuration:
    1 master node + 50 slave nodes
    Pentium Xeon 2.8 GHz dual quad-core, 16 GB memory,
    one 7200 rpm 160 GB SATA disk, 10 Gigabit Ethernet

    Uses the file system API directly, without Hadoop

  • Write benchmark

    1. 5, 10, ..., or 50 clients write to 50 data servers
    (each client i writes its own file /dataset/i; sketch below)

    [Figure: clients 1-50 each writing /dataset/01 ... /dataset/50 to HDFS/PVFS]
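    A minimal sketch of one write-benchmark client, assuming each client writes its own /dataset/NN file through the standard Hadoop file system API (which resolves to HDFS or the PVFS shim depending on configuration); the file size and write size below are illustrative, not values from the talk.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      // Sketch of one client in the write benchmark: client i streams a large
      // file into /dataset/i through whichever file system the configuration selects.
      public class WriteBenchmarkClient {
        public static void main(String[] args) throws Exception {
          int clientId = Integer.parseInt(args[0]);
          long fileBytes = 1L << 30;               // illustrative per-client file size (1 GB)
          byte[] buffer = new byte[64 * 1024];     // illustrative write size (64 KB)

          FileSystem fs = FileSystem.get(new Configuration());
          Path path = new Path(String.format("/dataset/%02d", clientId));

          long start = System.nanoTime();
          try (FSDataOutputStream out = fs.create(path, true)) {
            for (long written = 0; written < fileBytes; written += buffer.length) {
              out.write(buffer);
            }
          }
          double seconds = (System.nanoTime() - start) / 1e9;
          System.out.printf("client %d: wrote %d bytes in %.1f s%n", clientId, fileBytes, seconds);
        }
      }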

  • HDFS utilizes idle resources better

    HDFS's pipelined replication improves parallelism,
    better than the client-driven replication in PVFS

    [Line chart: aggregate write throughput (MB/s) vs. number of clients (0-50)
    for HDFS, PVFS Local, and PVFS]

  • HDFS is surprisingly tied to disk performance

    Limited by synchronous file creation:
    HDFS creates a new file for each chunk,
    causing head-of-line blocking while flushing the write-back buffer

    [Line chart: aggregate write throughput (MB/s) vs. number of clients (0-50)
    for HDFS, PVFS Local, and PVFS]

  • PVFS Local improves write throughput by 20%

    It reduces network traffic by 33%
    (2 remote copies instead of 3 remote copies)

    [Line chart: aggregate write throughput (MB/s) vs. number of clients (0-50)
    for HDFS, PVFS Local, and PVFS]

  • Read benchmark

    1. 50 clients write to 50 data servers
    2. 5, 10, ..., or 50 clients read back the same files they created

    [Figure: clients 1-50 each reading back /dataset/01 ... /dataset/50 from HDFS/PVFS]

  • The local copy improves read throughput by 60%

    A larger improvement than for writes,
    since local reads generate no network traffic at all

    [Line chart: aggregate read throughput (MB/s) vs. number of clients (0-50)
    for HDFS, PVFS Local, and PVFS]

  • Hadoop benchmark setting

    Cluster configuration:
    2 master nodes + 50 slave nodes
    Pentium Xeon 2.8 GHz dual quad-core, 16 GB memory,
    one 7200 rpm 160 GB SATA disk, 10 Gigabit Ethernet

    Uses Hadoop for map-reduce processing

  • Hadoop benchmark

    Dataset: 50 GB of 100-byte records

    Benchmarks:
    Write: generate the dataset
    Read: read it with no computation
    Grep: search for a rare pattern
    Sort: sort all records

  • Effect of the Hadoop job scheduler

    The scheduler masks the inefficiencies of PVFS without a local copy:
    with collocation, most reads become local reads,
    and blocks can be processed out of order

    [Bar chart: completion time in seconds for write, read, grep, and sort
    with HDFS, PVFS Local, and PVFS]

  • Scientific Hadoop applications

    Sampling (B. Fu): read a 71 GB astronomy dataset to determine partitions
    FoF (B. Fu): cluster & join astronomical objects of the same 71 GB dataset
    Twitter (B. Meeder): process a raw 24 GB Twitter dataset into a different format

  • Results

    PVFS performance is comparable to HDFS

    [Bar chart: completion time in seconds for sampling, fof, and twitter
    with HDFS, PVFS Local, and PVFS]

  • Summary

    PVFS can be tuned to deliver comparable performance for Hadoop applications
    A simple shim layer in Hadoop
    A local-copy optimization

    Little modification is necessary, and only for the local-copy optimization