Copyright © 2015 NTT DATA Corporation

Akira Ajisaka, NTT DATA Corporation, 9/30/2015

HDFS 2015: Past, Present, and Future

Apache: Big Data Europe 2015


Self introduction

Akira Ajisaka (NTT DATA)

Apache Hadoop Committer

130+ commits in 2015

Working on usability

80+ documentation patches

"Open-Source Professional Services" team

Has deployed and supported Hadoop clusters totaling 10k+ nodes over 7 years

NTT ranks 6th in the world in contributions to Apache Hadoop [1]

[1] The Activities of Apache Hadoop Community 2014: http://ajisakaa.blogspot.com/2015/02/the-activities-of-apache-hadoop.html


About

Similar to the "YARN 2015" presentation by @tshooter

HDFS is being developed faster than YARN

A summary of new HDFS features is needed

[Chart: Resolved issues in 2015 (cumulative), HDFS vs. YARN, January to September 2015; y-axis 0 to 1400]


Agenda

Past

Present

Future


Past


Past releases

2.X is the release branch

1.X and 0.23.X are no longer maintained

[Figure: Past releases timeline, 2009 to 2015. branch-1 (branch-0.20): 0.20.1, 0.20.205, 1.0.0, 1.1.0, 1.2.1 (stable); 0.21.0, 0.22.0; 0.23.0 to 0.23.11 (final); branch-2: 2.0.0-alpha, 2.1.0-beta, 2.2.0 (GA), 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0; trunk. Milestones: New append, Security, NameNode Federation and YARN, NameNode HA.]


Hadoop 2.2 (2013-10-15)

NameNode High-Availability

No Single Point of Failure

Federation

Multiple NameNodes, multiple namespaces

Improve scalability

Snapshots

Read-only point-in-time copies (copy-on-write)

NFSv3 mount

Hadoop 2.3 (2014-02-20)

Heterogeneous Storages (Phase 1)

In-memory caching

Introduce memory locality

Make efficient use of memory in DNs

[Figure: DFSClient, NameNode, and a DataNode with DISK and Memory]

1. DFSClient asks the NameNode to cache a file

2. NameNode asks the DataNode to cache the file's blocks

If the blocks are cached locally, the client reads them directly from memory and skips checksum calculation
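The read path above can be modeled in a few lines. This is a toy sketch, not HDFS code: `CachedBlockReader` and its methods are hypothetical, CRC32 stands in for HDFS's per-chunk checksums, and it assumes (as we understand the HDFS-4949 design) that a block is verified once when the DataNode loads it into memory, which is what lets later reads of the cached copy skip checksum calculation.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical model of a cached-block read; not the actual DataNode/DFSClient code.
class CachedBlockReader {
    private final Map<Long, byte[]> disk = new HashMap<>();        // block files on DISK
    private final Map<Long, Long> storedCrc = new HashMap<>();     // checksum written with each block
    private final Map<Long, byte[]> memoryCache = new HashMap<>(); // blocks pinned in DN memory
    int checksumVerifications = 0;                                 // counted for illustration only

    void writeToDisk(long blockId, byte[] data) {
        disk.put(blockId, data.clone());
        storedCrc.put(blockId, crc(data));
    }

    // Step 2: the DataNode loads the block into memory, verifying it once on the way in.
    void cache(long blockId) {
        memoryCache.put(blockId, readFromDiskVerified(blockId));
    }

    byte[] read(long blockId) {
        byte[] cached = memoryCache.get(blockId);
        if (cached != null) {
            return cached.clone(); // cached locally: read from memory, no checksum calculation
        }
        return readFromDiskVerified(blockId); // not cached: read from DISK and verify
    }

    private byte[] readFromDiskVerified(long blockId) {
        byte[] data = disk.get(blockId);
        if (data == null) throw new IllegalArgumentException("unknown block " + blockId);
        checksumVerifications++;
        if (crc(data) != storedCrc.get(blockId)) {
            throw new IllegalStateException("checksum mismatch in block " + blockId);
        }
        return data.clone();
    }

    private static long crc(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        CachedBlockReader dn = new CachedBlockReader();
        dn.writeToDisk(1L, "block data".getBytes(StandardCharsets.UTF_8));
        dn.read(1L);  // verified read from DISK
        dn.cache(1L); // verified once while caching
        dn.read(1L);  // served from memory, no verification
        System.out.println("checksum verifications: " + dn.checksumVerifications);
    }
}
```

In `main`, the uncached read and the caching step each verify the block. The final read is served from memory without verification.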


Hadoop 2.4 (2014-04-07)

Rolling Upgrades

No hours of cluster downtime for upgrades

ACLs

More fine-grained permissions

Similar to POSIX ACL

-rw-rw-r-- 3 tester hadoop 129 2015-09-15 12:00 /user/tester/test.txt

$ hdfs dfs -setfacl -m group:hive:rw- /user/tester/test.txt

Gives the hive group write permission (rw-)
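The following simplified sketch of POSIX-style ACL evaluation shows why the `setfacl` command grants write access. It checks the owner entry first, then named users, then the owning group and named groups (filtered through the mask), and finally "other". `AclCheck` is hypothetical. It assumes the mask was recalculated to `rw-` after `setfacl -m`, and it ignores superusers, default ACLs, and the other checks a NameNode performs.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified POSIX-style ACL check. Permission bits: r=4, w=2, x=1.
// Ignores superusers, default ACLs, sticky bits, and everything else a real NameNode checks.
class AclCheck {
    static final int READ = 4, WRITE = 2, EXECUTE = 1;

    final String owner, owningGroup;
    final int ownerPerm, groupPerm, otherPerm, mask;
    final Map<String, Integer> namedUsers;  // e.g. user:alice:r--
    final Map<String, Integer> namedGroups; // e.g. group:hive:rw-

    AclCheck(String owner, String owningGroup, int ownerPerm, int groupPerm, int otherPerm,
             int mask, Map<String, Integer> namedUsers, Map<String, Integer> namedGroups) {
        this.owner = owner; this.owningGroup = owningGroup;
        this.ownerPerm = ownerPerm; this.groupPerm = groupPerm; this.otherPerm = otherPerm;
        this.mask = mask; this.namedUsers = namedUsers; this.namedGroups = namedGroups;
    }

    private static boolean grants(int perm, int requested) { return (perm & requested) == requested; }

    boolean permits(String user, Set<String> userGroups, int requested) {
        if (user.equals(owner)) return grants(ownerPerm, requested);   // 1. owner entry
        Integer named = namedUsers.get(user);
        if (named != null) return grants(named & mask, requested);     // 2. named user (masked)
        boolean matchedGroup = false;                                  // 3. group class (masked)
        if (userGroups.contains(owningGroup)) {
            matchedGroup = true;
            if (grants(groupPerm & mask, requested)) return true;
        }
        for (Map.Entry<String, Integer> e : namedGroups.entrySet()) {
            if (userGroups.contains(e.getKey())) {
                matchedGroup = true;
                if (grants(e.getValue() & mask, requested)) return true;
            }
        }
        if (matchedGroup) return false;       // a matching group entry exists but none grants it
        return grants(otherPerm, requested);  // 4. other
    }

    public static void main(String[] args) {
        // -rw-rw-r-- tester hadoop, after: setfacl -m group:hive:rw- (mask assumed to be rw-)
        AclCheck acl = new AclCheck("tester", "hadoop", READ | WRITE, READ | WRITE, READ,
                READ | WRITE, Map.of(), Map.of("hive", READ | WRITE));
        System.out.println("hive member can write: " + acl.permits("bob", Set.of("hive"), WRITE));
    }
}
```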


Hadoop 2.5 (2014-08-11)

Extended Attributes (XAttrs)

Similar to extended attributes in Linux

Currently used by transparent encryption

-rw-r--r-- 3 tester hadoop 129 2015-09-15 12:00 /user/tester/test.txt

Set XAttrs

$ hdfs dfs -setfattr -n user.locale -v jp /user/tester/test.txt

$ hdfs dfs -setfattr -n user.city -v tokyo /user/tester/test.txt

Get XAttrs

$ hdfs dfs -getfattr -d /user/tester/test.txt

# file: /user/tester/test.txt

user.locale="jp"

user.city="tokyo"
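The namespaced key/value model behind `-setfattr` and `-getfattr` can be sketched with one map per file. `XAttrStore` is hypothetical. The namespace list (`user`, `trusted`, `system`, `security`, `raw`) follows the HDFS extended attributes documentation as we understand it. The real NameNode also enforces per-namespace permissions and size limits, which are omitted here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory XAttr store that mimics the -setfattr / -getfattr -d example above.
class XAttrStore {
    // XAttr names carry a namespace prefix; client applications typically use "user".
    private static final Set<String> NAMESPACES = Set.of("user", "trusted", "system", "security", "raw");
    private final Map<String, Map<String, String>> byPath = new LinkedHashMap<>();

    void setXAttr(String path, String name, String value) {
        int dot = name.indexOf('.');
        if (dot <= 0 || !NAMESPACES.contains(name.substring(0, dot))) {
            throw new IllegalArgumentException("XAttr name needs a namespace prefix: " + name);
        }
        byPath.computeIfAbsent(path, p -> new LinkedHashMap<>()).put(name, value);
    }

    // Output shaped like the -getfattr -d example above (insertion order).
    String dump(String path) {
        StringBuilder out = new StringBuilder("# file: ").append(path).append('\n');
        for (Map.Entry<String, String> e : byPath.getOrDefault(path, Map.of()).entrySet()) {
            out.append(e.getKey()).append("=\"").append(e.getValue()).append("\"\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        XAttrStore store = new XAttrStore();
        store.setXAttr("/user/tester/test.txt", "user.locale", "jp");
        store.setXAttr("/user/tester/test.txt", "user.city", "tokyo");
        System.out.print(store.dump("/user/tester/test.txt"));
    }
}
```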


Hadoop 2.6 (2014-11-18)

Hot swap volumes

Recover from disk failures without stopping DNs

Integrate Apache HTrace (incubating)

Trace RPCs inside HDFS

Finding bottlenecks becomes easier

[Figure: spans along a time axis. Span A (trace id: 12345, parent: root) on node 1, Span B (trace id: 12345, parent: A) on node 2, and Spans C and D on node 3, connected by RPCs. Parent-child relations are easy to find.]
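Because every span records its trace id and its parent, the call tree can be rebuilt after the fact. The sketch below uses a hypothetical `Span` class that holds only the fields from the figure (it is not the HTrace API) and renders one trace as an indented tree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical span record holding only the fields shown in the figure (not the HTrace API).
class Span {
    final long traceId;
    final String id, parentId, node; // parentId == null marks the root span

    Span(long traceId, String id, String parentId, String node) {
        this.traceId = traceId; this.id = id; this.parentId = parentId; this.node = node;
    }
}

class TraceTree {
    // Rebuilds the parent-child tree of one trace and renders it with indentation.
    static String render(List<Span> spans, long traceId) {
        Map<String, List<Span>> children = new HashMap<>();
        List<Span> roots = new ArrayList<>();
        for (Span s : spans) {
            if (s.traceId != traceId) continue; // spans of other traces are ignored
            if (s.parentId == null) roots.add(s);
            else children.computeIfAbsent(s.parentId, k -> new ArrayList<>()).add(s);
        }
        StringBuilder out = new StringBuilder();
        for (Span root : roots) append(out, root, children, 0);
        return out.toString();
    }

    private static void append(StringBuilder out, Span s, Map<String, List<Span>> children, int depth) {
        out.append("  ".repeat(depth)).append(s.id).append(" @ ").append(s.node).append('\n');
        for (Span child : children.getOrDefault(s.id, List.of())) append(out, child, children, depth + 1);
    }

    public static void main(String[] args) {
        List<Span> spans = List.of(
            new Span(12345, "A", null, "node 1"),
            new Span(12345, "B", "A", "node 2"));
        System.out.print(render(spans, 12345));
    }
}
```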


Hadoop 2.6 (2014-11-18) (cont'd)

Heterogeneous Storages (Phase 2)

Archival Storage

Memory as storage tier

Transparent Encryption


Heterogeneous Storages

Problem

SSDs are getting cheaper

Want to store hot data on SSDs to achieve higher throughput

Solution: Introduce storage types and storage policies

Storage types: DISK, SSD, ARCHIVE, ...

Policies: ONE_SSD, HOT, WARM, COLD, ...

Example: A -> ONE_SSD, B -> HOT

[Figure: DN1, DN2, and DN3, each with one SSD and three DISKs; replicas of A (ONE_SSD) and B (HOT) are spread across the three DataNodes]

Hadoop 2.6
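Each policy above picks a storage type for each replica. The sketch below encodes the per-replica rules as we understand them from the HDFS Archival Storage documentation (for example, ONE_SSD puts one replica on SSD and the rest on DISK). The enums are hypothetical stand-ins, and the fallback types used when a preferred type has no space are omitted.

```java
import java.util.ArrayList;
import java.util.List;

enum StorageType { RAM_DISK, SSD, DISK, ARCHIVE }

// Sketch of how a storage policy chooses a storage type for each replica.
// Fallback storage types (used when the preferred type is unavailable) are omitted.
enum StoragePolicy {
    HOT(null, StorageType.DISK),                          // all replicas on DISK
    WARM(StorageType.DISK, StorageType.ARCHIVE),          // 1 on DISK, the rest on ARCHIVE
    COLD(null, StorageType.ARCHIVE),                      // all replicas on ARCHIVE
    ONE_SSD(StorageType.SSD, StorageType.DISK),           // 1 on SSD, the rest on DISK
    ALL_SSD(null, StorageType.SSD),                       // all replicas on SSD
    LAZY_PERSIST(StorageType.RAM_DISK, StorageType.DISK); // 1 in memory, the rest on DISK

    private final StorageType first; // type of the first replica, or null if same as the rest
    private final StorageType rest;

    StoragePolicy(StorageType first, StorageType rest) {
        this.first = first;
        this.rest = rest;
    }

    List<StorageType> chooseStorageTypes(int replication) {
        List<StorageType> types = new ArrayList<>();
        for (int i = 0; i < replication; i++) {
            types.add(i == 0 && first != null ? first : rest);
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println("A (ONE_SSD): " + ONE_SSD.chooseStorageTypes(3));
        System.out.println("B (HOT):     " + HOT.chooseStorageTypes(3));
    }
}
```

With replication 3, A (ONE_SSD) gets one SSD replica and two DISK replicas, and B (HOT) gets three DISK replicas. This matches the example above.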


How to use

Configure the storage type of each DataNode disk

Set a storage policy on an HDFS path

The policy can be changed after data has been written

The Mover then migrates blocks to satisfy the policy, taking rack awareness into account

Hadoop 2.6

Heterogeneous Storages

<property>
  <name>dfs.datanode.data.dir</name>
  <value>[SSD]file:///data/ssd,[DISK]file:///data/hdd</value>
</property>

$ hdfs storagepolicies -setStoragePolicy -path <path> -policy <policy>
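As we understand the DataNode configuration, each comma-separated `dfs.datanode.data.dir` entry may start with a storage type in brackets such as `[SSD]`. An entry without one is treated as DISK. `DataDirParser` is a hypothetical sketch of that parsing rule, not the DataNode's own parser.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for dfs.datanode.data.dir values like "[SSD]file:///data/ssd,file:///data/d1".
class DataDirParser {
    private static final Pattern TAGGED = Pattern.compile("\\[(\\w+)\\](.+)");

    // Maps each directory URI to its storage type name.
    static Map<String, String> parse(String value) {
        Map<String, String> dirs = new LinkedHashMap<>();
        for (String raw : value.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) continue;
            Matcher m = TAGGED.matcher(entry);
            if (m.matches()) {
                dirs.put(m.group(2), m.group(1).toUpperCase(Locale.ROOT)); // "[ssd]" -> "SSD"
            } else {
                dirs.put(entry, "DISK"); // no [TYPE] prefix: assume DISK
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        System.out.println(parse("[SSD]file:///data/ssd,[DISK]file:///data/hdd,file:///data/d3"));
    }
}
```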


Archival Storage

DISK or ARCHIVE?

ARCHIVE is for cold data

eBay reduced cost/GB by 5x [1]

Use low-spec DNs for ARCHIVE

No need to split the cluster!

[1] Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature: http://www.slideshare.net/Hadoop_Summit/reduce-storage-costs-by-5x-using-the-new-hdfs-tiered-storage-feature

| | Regular Node | Archival Node |
|---|---|---|
| Drives | 12 HDDs | 60 HDDs |
| CPU | 32 cores | 4 cores |
| Memory | 128GB | 64GB |
| Runs NodeManager | Yes | No |

Hadoop 2.6


Transparent Encryption

Problem

Data on DataNode disks cannot be guarded against OS-level attacks

Solution

Provide end-to-end encryption

Encrypt/decrypt data transparently

No need to rewrite user application

Hadoop 2.6

[Figure: Client, DataNode, and DISK. Data sent over DataTransferProtocol can be encrypted, but the data written to DISK is NOT encrypted!]


Transparent Encryption: How to encrypt data

DEK (Data Encryption Key)

A unique key for each file in an EZ (Encryption Zone)

Stored in an XAttr of the file in encrypted form (EDEK)

[Figure: Client, NameNode, Key Management Server (KMS), and DataNode]

1. Create file in EZ (Client to NameNode)

2. Get EDEK (NameNode from KMS)

3. Store EDEK in metadata (NameNode)

4. EDEK returned (NameNode to Client)

5. Call to decrypt EDEK to DEK (Client to KMS)

6. Write encrypted data to DN using DEK (Client to DataNode)

Key Management Server

• Proxy to the underlying key provider

• ACLs on a per-key basis

• Bundled with the Hadoop package

Hadoop 2.6
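The six steps above form an envelope-encryption scheme. File data is encrypted with a per-file DEK, and the NameNode stores only the encrypted DEK (EDEK). The toy model below runs the whole flow in one process with `javax.crypto`. `EzSketch`, `Kms`, and the maps standing in for the NameNode and DataNodes are hypothetical. For simplicity the DEK is also wrapped with AES-CTR, which may differ from what a real KMS does.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Toy model of the EDEK flow in an encryption zone. "Kms", the NameNode map, and the
// DataNode map are hypothetical stand-ins; no real KMS or HDFS API is called.
class EzSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    static byte[] randomBytes(int n) {
        byte[] b = new byte[n];
        RANDOM.nextBytes(b);
        return b;
    }

    // AES in CTR mode; in CTR, encryption and decryption are the same keystream XOR.
    static byte[] aesCtr(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static final class Edek {
        final byte[] iv, encryptedDek;
        Edek(byte[] iv, byte[] encryptedDek) { this.iv = iv; this.encryptedDek = encryptedDek; }
    }

    // Stand-in for the Key Management Server: the only party that holds the zone key.
    static final class Kms {
        private final byte[] zoneKey = randomBytes(16);

        Edek generateEdek() { // step 2: a fresh DEK, handed out only in encrypted form
            byte[] iv = randomBytes(16);
            return new Edek(iv, aesCtr(Cipher.ENCRYPT_MODE, zoneKey, iv, randomBytes(16)));
        }

        byte[] decryptEdek(Edek edek) { // step 5: EDEK -> DEK for an authorized client
            return aesCtr(Cipher.DECRYPT_MODE, zoneKey, edek.iv, edek.encryptedDek);
        }
    }

    final Kms kms = new Kms();
    final Map<String, Edek> nameNodeXAttrs = new HashMap<>();   // step 3: EDEK kept with file metadata
    final Map<String, byte[]> fileIvs = new HashMap<>();
    final Map<String, byte[]> dataNodeBlocks = new HashMap<>(); // DNs and disks only see ciphertext

    void write(String path, byte[] data) {
        nameNodeXAttrs.put(path, kms.generateEdek());            // steps 1-3: create file in EZ
        byte[] dek = kms.decryptEdek(nameNodeXAttrs.get(path));  // steps 4-5: client gets EDEK, KMS decrypts
        byte[] iv = randomBytes(16);
        fileIvs.put(path, iv);
        dataNodeBlocks.put(path, aesCtr(Cipher.ENCRYPT_MODE, dek, iv, data)); // step 6
    }

    byte[] read(String path) {
        byte[] dek = kms.decryptEdek(nameNodeXAttrs.get(path));
        return aesCtr(Cipher.DECRYPT_MODE, dek, fileIvs.get(path), dataNodeBlocks.get(path));
    }

    public static void main(String[] args) {
        EzSketch ez = new EzSketch();
        ez.write("/ez/report.csv", "secret".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(ez.read("/ez/report.csv"), StandardCharsets.UTF_8));
    }
}
```

The NameNode map never holds a plaintext key, and the DataNode map never holds plaintext data. Only a client that the KMS agrees to serve can turn the EDEK back into a DEK.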


Transparent Encryption: Very low overhead

Simple benchmark with 3 slave nodes (m3.xlarge, 4-core Xeon E5-2670 v2)

Uses AES-NI

| | Encryption Off | Encryption On |
|---|---|---|
| 1GB Teragen | 17 sec | 18 sec |
| 1GB Terasort | 47 sec | 49 sec |

Known issue

Data is sometimes encrypted incorrectly (HADOOP-11343)

2.7.1 or 2.6.1 is recommended

Hadoop 2.6
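In CTR mode, each 16-byte AES block of a file is encrypted using the initial IV plus a block counter, so the IV for an arbitrary file offset has to be computed with 128-bit arithmetic. The title of HADOOP-11343 refers to overflow not being handled properly when this final IV is calculated. The following is a minimal sketch of the intended arithmetic, assuming the IV is an unsigned big-endian 128-bit integer. `CtrIv.ivForOffset` is hypothetical and is not the Hadoop codec's code.

```java
import java.math.BigInteger;

// Sketch: IV for a byte offset in AES-CTR = initial IV + (offset / 16),
// computed as unsigned 128-bit big-endian arithmetic with full carry propagation.
class CtrIv {
    static final int AES_BLOCK_SIZE = 16;
    private static final BigInteger TWO_POW_128 = BigInteger.ONE.shiftLeft(128);

    static byte[] ivForOffset(byte[] initIv, long offset) {
        if (initIv.length != AES_BLOCK_SIZE || offset < 0) throw new IllegalArgumentException();
        long counter = offset / AES_BLOCK_SIZE;
        BigInteger sum = new BigInteger(1, initIv)      // treat the IV as unsigned
                .add(BigInteger.valueOf(counter))
                .mod(TWO_POW_128);                      // wrap around like a 128-bit counter
        byte[] raw = sum.toByteArray();                 // minimal two's-complement form
        byte[] iv = new byte[AES_BLOCK_SIZE];
        int n = Math.min(raw.length, AES_BLOCK_SIZE);   // drop a possible leading sign byte
        System.arraycopy(raw, raw.length - n, iv, AES_BLOCK_SIZE - n, n);
        return iv;
    }

    public static void main(String[] args) {
        byte[] init = new byte[16];
        java.util.Arrays.fill(init, 8, 16, (byte) 0xFF); // low 64 bits all ones
        // Offset 16 is the second AES block, so the counter is 1 and must carry into byte 7.
        System.out.println(java.util.Arrays.toString(ivForOffset(init, 16)));
    }
}
```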


Present


Hadoop 2.7 (2015-04-21)

Quota per storage type

Truncate API

Files with variable-length blocks

Web UI for NFS gateway

NNTop: top-like tool for NameNode

List top users for each operation

Exposed via metrics

fsck -blockId option

Prints the file that a given block ID belongs to

INotify
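NNTop's idea of "top users for each operation" can be sketched as a sliding-window counter. `TopUsers` is hypothetical. It keeps raw call records in a deque and counts them on demand, whereas a NameNode would need a more compact structure. The window length is an assumed parameter.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical NNTop-like tracker: remembers recent calls and reports, per operation,
// the users with the most calls inside a sliding time window.
class TopUsers {
    private static final class Call {
        final long timeMillis; final String op; final String user;
        Call(long timeMillis, String op, String user) { this.timeMillis = timeMillis; this.op = op; this.user = user; }
    }

    private final Deque<Call> calls = new ArrayDeque<>(); // assumes calls are recorded in time order
    private final long windowMillis;

    TopUsers(long windowMillis) { this.windowMillis = windowMillis; }

    void record(long timeMillis, String op, String user) {
        calls.addLast(new Call(timeMillis, op, user));
    }

    List<String> top(String op, int n, long nowMillis) {
        // Drop calls that have fallen out of the window.
        while (!calls.isEmpty() && calls.peekFirst().timeMillis <= nowMillis - windowMillis) {
            calls.pollFirst();
        }
        Map<String, Integer> counts = new HashMap<>();
        for (Call c : calls) {
            if (c.op.equals(op)) counts.merge(c.user, 1, Integer::sum);
        }
        List<String> users = new ArrayList<>(counts.keySet());
        users.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // most calls first
            return byCount != 0 ? byCount : a.compareTo(b);              // ties: alphabetical
        });
        return new ArrayList<>(users.subList(0, Math.min(n, users.size())));
    }

    public static void main(String[] args) {
        TopUsers top = new TopUsers(60_000); // 1-minute window (an assumed value)
        top.record(1_000, "create", "alice");
        top.record(2_000, "create", "bob");
        top.record(3_000, "create", "bob");
        System.out.println(top.top("create", 10, 4_000));
    }
}
```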


INotify for HDFS

Problem

Some components cache HDFS metadata

Hive caches path names

Impala caches block locations

When should the cache be invalidated?

Solution

Introduce a mechanism similar to Linux inotify

Clients can monitor events without parsing NameNode logs or edit logs

Hadoop 2.7


INotify for HDFS: Technical Approach

Client polls NameNode periodically

Not a push model

Known issue

Truncate events are not reported (HDFS-8742)

Fixed in 2.8.0

[Figure: Client and NameNode]

1. Client polls for any events after #XX

2. NameNode returns the events after #XX

The client caches the highest event number it has seen

Hadoop 2.7
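The pull model above can be sketched as follows. The real client entry point is `HdfsAdmin#getInotifyEventStream()`. The classes below are hypothetical stand-ins that show only the bookkeeping: the NameNode numbers its events, and the client remembers the highest number it has seen and invalidates cached entries for the paths in newer events. For simplicity, each event is reduced to a type and a single path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the pull model: the NameNode keeps numbered events,
// and a client polls for everything after the highest number it has seen.
class InotifySketch {
    static final class Event {
        final long txid; final String type; final String path;
        Event(long txid, String type, String path) { this.txid = txid; this.type = type; this.path = path; }
    }

    static final class NameNodeLog {
        private final List<Event> events = new ArrayList<>();
        private long nextTxid = 1;

        void append(String type, String path) { events.add(new Event(nextTxid++, type, path)); }

        // 2. Return events after #lastTxid; nothing is pushed to clients.
        List<Event> eventsAfter(long lastTxid) {
            List<Event> result = new ArrayList<>();
            for (Event e : events) if (e.txid > lastTxid) result.add(e);
            return result;
        }
    }

    // A client that caches per-path metadata (as Hive or Impala might) and invalidates it.
    static final class CachingClient {
        final Map<String, String> cache = new HashMap<>();
        long lastTxid = 0; // highest event number seen so far

        // 1. Poll for any events after #lastTxid, then drop affected cache entries.
        void pollOnce(NameNodeLog nameNode) {
            for (Event e : nameNode.eventsAfter(lastTxid)) {
                cache.remove(e.path);
                lastTxid = Math.max(lastTxid, e.txid);
            }
        }
    }

    public static void main(String[] args) {
        NameNodeLog nn = new NameNodeLog();
        CachingClient client = new CachingClient();
        client.cache.put("/warehouse/t1", "cached block locations");
        nn.append("APPEND", "/warehouse/t1");
        client.pollOnce(nn); // in practice this would run periodically
        System.out.println("still cached: " + client.cache.containsKey("/warehouse/t1"));
    }
}
```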


Future


Many features are being developed

2.8 (not released)

Support OAuth2 in WebHDFS

RPC Congestion control

Feature branches

Erasure Coding (HDFS-7285)

Ozone: Object store (HDFS-7240)

BlockManager Scalability Improvements (HDFS-7836)

HTTP/2 support for DataTransferProtocol (HDFS-7966)

Implement an async pure C++ HDFS client (HDFS-8707)


RPC Congestion Control

Problem

NameNode RPC queue is FIFO

A DDoS can take down the entire cluster

Solution

Fair scheduling for the RPC queue (2.6.0)

Retriable exception with exponential backoff (2.8.0)

Enabled by default in 2.8
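As we understand the FairCallQueue design, a scheduler assigns each call a priority level based on the share of recent calls its user has made. A multiplexer then drains the per-level queues with weighted round-robin so heavy users cannot starve everyone else. The sketch below is hypothetical: the thresholds (12.5%, 25%, 50%) and weights (8, 4, 2, 1) are assumed values, and the caller has to trigger decay.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical fair call queue: users with a large share of recent calls are put in
// lower-priority queues, and a weighted round-robin multiplexer drains the queues.
class FairCallQueueSketch {
    static final double[] THRESHOLDS = {0.125, 0.25, 0.5}; // assumed share thresholds for levels 1..3
    static final int[] WEIGHTS = {8, 4, 2, 1};             // assumed calls served per turn, level 0 first

    private final Map<String, Double> recentCalls = new HashMap<>();
    private double totalCalls = 0;
    private final List<Queue<String>> queues = new ArrayList<>();
    private int level = 0, servedAtLevel = 0;

    FairCallQueueSketch() {
        for (int i = 0; i < WEIGHTS.length; i++) queues.add(new ArrayDeque<>());
    }

    // Larger share of recent calls -> higher level number -> lower priority.
    int priorityOf(String user) {
        double share = totalCalls == 0 ? 0 : recentCalls.getOrDefault(user, 0.0) / totalCalls;
        for (int i = THRESHOLDS.length - 1; i >= 0; i--) {
            if (share >= THRESHOLDS[i]) return i + 1;
        }
        return 0;
    }

    void enqueue(String user, String call) {
        queues.get(priorityOf(user)).add(call); // prioritize using history before this call
        recentCalls.merge(user, 1.0, Double::sum);
        totalCalls++;
    }

    // Called periodically: shrink old counts so that recent behaviour dominates.
    void decay(double factor) {
        recentCalls.replaceAll((user, count) -> count * factor);
        totalCalls *= factor;
    }

    // Weighted round-robin: serve up to WEIGHTS[i] calls from queue i, then move on.
    String poll() {
        for (int step = 0; step <= queues.size(); step++) {
            if (servedAtLevel < WEIGHTS[level] && !queues.get(level).isEmpty()) {
                servedAtLevel++;
                return queues.get(level).poll();
            }
            level = (level + 1) % queues.size();
            servedAtLevel = 0;
        }
        return null; // every queue is empty
    }

    public static void main(String[] args) {
        FairCallQueueSketch q = new FairCallQueueSketch();
        for (int i = 0; i < 90; i++) q.enqueue("heavy", "heavy-" + i);
        for (int i = 0; i < 10; i++) q.enqueue("user" + i, "light-" + i);
        System.out.println("heavy user's level: " + q.priorityOf("heavy"));
    }
}
```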

// Don't do this! Every iteration sends an RPC to the NameNode.
while (true) {
  dfs.exists(new Path("/data"));
}

Hadoop 2.8
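The tight loop above keeps hammering the NameNode. When a server rejects a call with a retriable exception, a well-behaved client waits before retrying and doubles the wait each time. `BackoffRetry` and `ServerBusyException` are hypothetical names, not Hadoop's own retry-policy classes, and the delays are assumed values.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical client-side retry loop for a server that rejects calls when its queue is full.
class BackoffRetry {
    // Stand-in for a retriable "server too busy" error.
    static class ServerBusyException extends RuntimeException {}

    static <T> T call(Supplier<T> rpc, int maxRetries, long baseDelayMillis,
                      long maxDelayMillis, LongConsumer sleeper) {
        for (int attempt = 0; ; attempt++) {
            try {
                return rpc.get();
            } catch (ServerBusyException e) {
                if (attempt >= maxRetries) throw e; // give up after maxRetries retries
                // Exponential backoff: base, 2*base, 4*base, ... capped at maxDelayMillis.
                long delay = Math.min(maxDelayMillis, baseDelayMillis << Math.min(attempt, 30));
                sleeper.accept(delay);
            }
        }
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) throw new ServerBusyException(); // first two attempts are rejected
            return "exists";
        }, 5, 100, 2_000, BackoffRetry::sleep);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Production clients usually also add random jitter to the delay so that many rejected clients do not retry in lockstep.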


Erasure Coding

Problem

Reduce costs of storage

Blocks are replicated to 3 DNs

3x storage overhead is costly

Solution

Use erasure coding

| | 3-replication | (6,3)-Reed-Solomon |
|---|---|---|
| Tolerates | 2 failures | 3 failures |
| Disk usage | 3x | 1.5x |
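The figures in the comparison follow from simple arithmetic. With r replicas, any r - 1 of them can be lost. With k data blocks and m parity blocks, any m of the k + m blocks can be lost. The 768 MB case (six 128 MB blocks) is only an illustrative example.

```latex
\begin{aligned}
\text{3-replication:} \quad & \text{disk usage} = 3\times, && \text{tolerated failures} = 3 - 1 = 2 \\
\text{RS}(k{=}6,\ m{=}3): \quad & \text{disk usage} = \tfrac{k+m}{k} = \tfrac{9}{6} = 1.5\times, && \text{tolerated failures} = m = 3 \\
768\ \text{MB of data:} \quad & 3 \times 768 = 2304\ \text{MB} \ \text{vs.}\ 1.5 \times 768 = 1152\ \text{MB}
\end{aligned}
```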


Erasure Coding: Write files using (6,3)-Reed-Solomon

Write data to 9 DNs in parallel

[Figure: ECClient splits incoming data into 6 data blocks and 3 parity blocks and writes them to DN1 to DN9 in parallel]
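As we understand the striped layout used here, the client cuts incoming data into fixed-size cells and writes them round-robin to the 6 data blocks, computing 3 parity cells for each row (stripe) of 6 cells. The sketch below only maps a file offset to the data block and the offset inside that block. `StripedLayout` is hypothetical, and the 1 KB cell size in `main` is for illustration only.

```java
// Sketch of a striped layout: the file is cut into fixed-size cells that are written
// round-robin across k data blocks; each row of k cells forms one stripe, and the
// parity cells for that stripe would be computed from those k cells.
class StripedLayout {
    final int dataBlocks; // k, e.g. 6 for (6,3)-Reed-Solomon
    final int cellSize;   // bytes per cell (a parameter here; the value is an assumption)

    StripedLayout(int dataBlocks, int cellSize) {
        this.dataBlocks = dataBlocks;
        this.cellSize = cellSize;
    }

    // Returns {index of the internal data block, offset inside that block}.
    long[] locate(long fileOffset) {
        long cell = fileOffset / cellSize;
        long stripe = cell / dataBlocks;
        long blockIndex = cell % dataBlocks;
        long offsetInBlock = stripe * cellSize + fileOffset % cellSize;
        return new long[] {blockIndex, offsetInBlock};
    }

    public static void main(String[] args) {
        StripedLayout layout = new StripedLayout(6, 1024); // 1 KB cells, for illustration only
        for (long offset : new long[] {0, 1024, 5130, 6149}) {
            long[] loc = layout.locate(offset);
            System.out.println("offset " + offset + " -> data block " + loc[0] + ", offset " + loc[1]);
        }
    }
}
```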


Erasure Coding: Read files

Read data from 6 DNs in parallel

[Figure: ECClient reads the blocks from DN1 to DN6 in parallel]


Erasure Coding: Read files when DN fails

Read data from any 6 of the 9 DNs in parallel

[Figure: one DN has failed (×); ECClient reads from 6 of the remaining DNs in parallel]
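Reading after a failure works because any 6 of the 9 blocks carry enough information to rebuild the rest. Reed-Solomon decoding over a finite field is too long to show here, so the sketch below substitutes the simplest erasure code: a single XOR parity cell, which can rebuild only one lost cell. `XorParity` is hypothetical and assumes all cells have equal length.

```java
import java.nio.charset.StandardCharsets;

// Simplified stand-in for Reed-Solomon: a single XOR parity cell per stripe.
// It can rebuild only ONE lost cell, whereas (6,3)-Reed-Solomon tolerates three.
class XorParity {
    static byte[] parity(byte[][] cells) {
        byte[] p = new byte[cells[0].length]; // all cells assumed to be the same length
        for (byte[] cell : cells) {
            for (int i = 0; i < p.length; i++) p[i] ^= cell[i];
        }
        return p;
    }

    // XOR of the parity with every surviving cell yields the missing cell.
    static byte[] reconstruct(byte[][] cells, int lostIndex, byte[] parity) {
        byte[] out = parity.clone();
        for (int c = 0; c < cells.length; c++) {
            if (c == lostIndex) continue; // this cell is unavailable (its DN failed)
            for (int i = 0; i < out.length; i++) out[i] ^= cells[c][i];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] cells = new byte[6][];
        for (int c = 0; c < 6; c++) cells[c] = ("cell-" + c).getBytes(StandardCharsets.US_ASCII);
        byte[] p = parity(cells);
        byte[][] survivors = cells.clone();
        survivors[3] = null; // simulate the DN holding cell 3 failing
        System.out.println(new String(reconstruct(survivors, 3, p), StandardCharsets.US_ASCII));
    }
}
```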


Erasure Coding: Current status

Suitable for cold data

No data locality

Very low cost/GB with archival storage

Now preparing for merge

Follow-on work

Intel ISA-L support for faster encoding

Support append/truncate/hflush/hsync

More encoding schemes

Pipeline error handling

Support contiguous layout (HDFS EC Phase 2)


Summary

Many features are still in development

I cannot predict when each feature will be available

If you want a feature, please join in contributing to it to speed up its development

There are many ways to contribute

Creating/Testing/Reviewing patches

Reporting bugs

Writing documents

Discussing architecture design

https://wiki.apache.org/hadoop/HowToContribute


References

Apache Hadoop Docs: http://hadoop.apache.org/docs/current/

In-memory caching (HDFS-4949)

In-memory Caching in HDFS: Lower Latency, Same Great Taste: http://www.slideshare.net/Hadoop_Summit/inmemory-caching-in-hdfs-lower-latency-same-great-taste-33921794

Heterogeneous Storages (HDFS-5682)

Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature: http://www.slideshare.net/Hadoop_Summit/reduce-storage-costs-by-5x-using-the-new-hdfs-tiered-storage-feature

Transparent Encryption (HDFS-6134)

Transparent Encryption in HDFS: http://www.slideshare.net/Hadoop_Summit/transparent-encryption-in-hdfs

INotify (HDFS-6634)

Keep Me in the Loop: Introducing HDFS Inotify: http://www.slideshare.net/Hadoop_Summit/keep-me-in-the-loop-inotify-in-hdfs


RPC congestion control (HADOOP-9640, HADOOP-10597, HDFS-8820)

Improving HDFS Availability with Hadoop RPC Quality of Service: http://www.slideshare.net/MingMa4/hadoop-rpcqoshadoopsummit2015

Erasure Coding (HDFS-7285)

HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency: http://www.slideshare.net/Hadoop_Summit/hdfs-erasure-code-storage-same-reliability-at-better-storage-efficiency