Lustre Filesystem for Cloud and Hadoop

* Some names and brands may be claimed as the property of others.

OpenFabrics Software User Group Workshop

Lustre* Filesystem for Cloud and Hadoop*

Robert Read, Intel

Lustre* for Cloud and Hadoop*

•  Brief Lustre History and Overview
•  Using Lustre with Hadoop
•  Intel® Cloud Edition for Lustre

March 15 – 18, 2015 #OFADevWorkshop 2

What is Lustre*

•  High performance parallel filesystem for Linux environments
•  Designed to use RDMA on high performance fabrics
•  Allows large number of users to share a file system
•  High speed and low latencies
•  Across local or wide area networks
•  Designed for reliable storage
•  Ideal for streaming IO and large, shared datasets
•  Evolving to bring parallel filesystem to new workloads

Quick History

•  Originally built to solve next gen IO problems for HPC
•  An open source (GPL) project from the beginning
•  Core team has survived numerous transitions and is now safely established in Intel’s HPDD.
•  Used in production systems since 2002
•  Intel Enterprise Edition for Lustre* since 2013
•  Intel Cloud Edition now available on AWS Marketplace

Lustre* Storage Components

•  Management Target (MGS / MGT)
   –  Lustre mount service
   –  Initial point of contact for clients
•  Metadata Targets (MDS / MDT)
   –  Namespace of file system
   –  File layouts, no data
   –  Scalable
•  Object Storage Targets (OSS / OST)
   –  File content stored as objects
   –  Striped across multiple targets
   –  Scales to 100s
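The striping of file content across object storage targets can be pictured with a small sketch. It assumes a plain round-robin (RAID-0 style) layout; the function name `locate` and the stripe size and count used below are illustrative choices, not part of any Lustre API.

```python
MIB = 1024 * 1024


def locate(offset, stripe_size, stripe_count):
    """Map a byte offset in a file to (object index, offset inside that object).

    Assumes round-robin striping: consecutive stripe-sized chunks of the
    file go to consecutive objects, wrapping around after stripe_count.
    """
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; stripe_size and stripe_count must be > 0")
    chunk = offset // stripe_size            # which stripe-sized chunk of the file
    object_index = chunk % stripe_count      # round-robin choice of object
    full_rounds = chunk // stripe_count      # complete passes over all objects so far
    object_offset = full_rounds * stripe_size + offset % stripe_size
    return object_index, object_offset


# Example: a file striped over 4 objects with a 1 MiB stripe size (hypothetical values).
first_chunk = locate(0, MIB, 4)
second_chunk = locate(MIB, MIB, 4)
wrapped = locate(4 * MIB + 5, MIB, 4)
```

With these hypothetical settings, sequential reads of a large file touch all four objects in turn, which is what lets many storage targets serve one file in parallel.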

Hadoop* Introduction

•  Open source framework for data-intensive computing
•  Parallelism hidden by framework
   –  Highly scalable: can be applied to large datasets (Big Data) and run on commodity clusters
•  Comes with its own user-space distributed file system (HDFS) based on the local storage of cluster nodes

Hadoop* with HDFS

•  HDFS locality has advantages, but…
   –  HDFS requires import/export to share data
   –  Compute nodes require local storage
   –  Hadoop nodes are both compute and IO nodes
   –  Hadoop nodes are single-purpose

[Figure: an HPC cluster on a Lustre filesystem and a Hadoop cluster on HDFS, connected by Import / Export.]

Hadoop* Adapter for Lustre*

•  Shared data repository for all compute resources

•  Use data in place (no import/export)

•  Dedicated Compute and IO nodes

[Figure: the Hadoop cluster and the HPC cluster both attached to one Lustre filesystem.]

HPC Adapter for MapReduce

•  Run Hadoop* on HPC cluster

•  Replace YARN with Slurm

•  Single compute cluster used for variety of workloads

[Figure: Hadoop running on the HPC cluster, which is attached to the Lustre filesystem.]

Optimized Shuffle: HDFS vs Lustre*

[Figure: MapReduce data flow for HDFS and for Lustre. Map side labels: Input, Map, Spill, Sort, Merge, Map O/P. Reduce side labels: Shuffle, Merge, Reduce, Output. Storage labels: Local Disk, HDFS, LUSTRE (Global Namespace). Annotations: “Spill File : Reducer → N : 1” and “Repetitive”.]

Performance Comparison

•  Tata Consultancy Services performed comparison in 2014
   –  “Performance comparison of Lustre* and HDFS for MR implementation of FSI workload using HDDP cluster hosted in the Intel BigData Lab in Swindon (UK) and Intel® Enterprise Edition for Lustre* software”
   –  Intel® EE for Lustre = 3 X HDFS for single job
   –  Intel® EE for Lustre = 5.5 X HDFS for concurrent workload
•  http://insidebigdata.com/2014/09/29/performance-comparison-intel-enterprise-edition-lustre-hdfs-mapreduce/

Degree of Concurrency = 1

Intel® EE for Lustre* = 3 X HDFS for optimal SC settings*

http://www.eofs.eu/fileadmin/lad2014/slides/21_Rekha_Singhal_LAD2014_Lustre_vs_HDFS_MR.pdf

Degree of Concurrency = 1

Intel® EE for Lustre* optimal SC gives 70% improvement over HDFS

http://www.eofs.eu/fileadmin/lad2014/slides/21_Rekha_Singhal_LAD2014_Lustre_vs_HDFS_MR.pdf

Intel Cloud Edition for Lustre*

•  Three Intel Lustre AMIs available on AWS Marketplace*:
   –  Community Version
   –  Global Support
   –  Global Support HVM
•  Lustre v2.5.3
•  Automated Lustre configuration
•  Lustre monitoring with Ganglia and LMT/ltop
•  Automated S3 data import
•  Global Support includes additional capabilities:
   –  Intel Lustre Support
   –  High Availability (requires VPC)
   –  Enhanced Networking (HVM + VPC)
   –  Higher performance instance types

Automated Lustre* Deployment

[Figure: CloudFormation, DynamoDB, the MGS, the MDS, and three OSS nodes, annotated with the numbered steps below.]

1.  CloudFormation creates a stack of AWS resources from a template
2.  MGS initializes itself
3.  MGS updates DB with NID
4.  MDS formats MDT, registers with MGS, updates DB
5.  OSSs format local targets, update DB
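The ordering of the deployment steps can be sketched as a small simulation. This is only an illustration under stated assumptions: a plain dict stands in for the DynamoDB table, the steps run sequentially in one process rather than on separate instances, and the function name `deploy`, the key names, the NID string, and the target names are all made up for the example. That the MDS and OSSs read the MGS NID back from the table is also an assumption.

```python
def deploy(num_oss):
    """Simulate the bootstrap order; returns the shared table and a step log."""
    db = {}    # stand-in for the shared DynamoDB table (assumption)
    log = []

    # Step 1: CloudFormation creates the stack of resources from a template.
    nodes = ["mgs", "mds"] + [f"oss{i}" for i in range(num_oss)]
    log.append(f"1 stack created: {len(nodes)} nodes")

    # Steps 2 and 3: the MGS initializes itself, then publishes its NID.
    db["mgs"] = {"nid": "10.0.0.10@tcp"}   # hypothetical NID value
    log.append("2 mgs initialized")
    log.append("3 mgs nid published")

    # Step 4: the MDS looks up the MGS, formats its MDT, registers, updates the table.
    mgs_nid = db["mgs"]["nid"]
    db["mds"] = {"target": "MDT0000", "mgs_nid": mgs_nid}
    log.append("4 mds registered")

    # Step 5: each OSS formats its local target and updates the table.
    for i in range(num_oss):
        db[f"oss{i}"] = {"target": f"OST{i:04x}", "mgs_nid": mgs_nid}
    log.append(f"5 {num_oss} oss targets formatted")

    return db, log


table, steps = deploy(3)
```

The point of the shared table in this sketch is that no node needs a hard-coded address: each one discovers the MGS from data written by an earlier step.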

Using Virtual Private Cloud

[Figure: VPC network diagram. Labels: VPC, Public Subnet, Lustre* Private Subnet, Compute Subnet, Public Network, AWS APIs, NAT, ELB, MGS & Console, MDS, OSS (multiple), Compute / Worker Client (multiple).]

Automated S3 Import

•  Option to import an S3 bucket into a new Lustre* filesystem
•  Initially only the file metadata is imported
•  File contents are retrieved from S3 on demand and stored on Lustre
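The metadata-first import can be illustrated with a toy model. It is a sketch only: a dict stands in for the S3 bucket, another dict stands in for data stored on Lustre, and the class name `LazyImportedFS` and its attributes are hypothetical, not the product's interface.

```python
class LazyImportedFS:
    """Toy model: metadata is imported up front, contents are fetched on first read."""

    def __init__(self, bucket):
        self._bucket = bucket    # stand-in for S3: key -> bytes (assumption)
        # Import step: record only metadata (here, just the size) for every key.
        self.metadata = {key: len(data) for key, data in bucket.items()}
        self._contents = {}      # stand-in for file data stored on Lustre
        self.fetches = 0         # number of retrievals from the bucket

    def read(self, key):
        if key not in self.metadata:
            raise FileNotFoundError(key)
        if key not in self._contents:
            # First access: retrieve from the bucket and keep a local copy.
            self._contents[key] = self._bucket[key]
            self.fetches += 1
        return self._contents[key]


fs = LazyImportedFS({"a.txt": b"hello", "b.txt": b"xy"})
data = fs.read("a.txt")   # triggers one fetch
data = fs.read("a.txt")   # served from the local copy
```

In this model a directory listing works immediately after import, while the cost of moving data is paid only for files that are actually read.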

Lustre* HA in the cloud

•  HA template is available
   –  Storage is managed independently of instances
•  Failure scenario
   –  AWS AutoScaling detects and terminates a failed instance
   –  AutoScaling creates a new instance
   –  New instance identifies orphaned storage and network interface
   –  “Adopts” the storage and NIC
   –  Target is brought back online and recovers
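The failure scenario relies on storage and network interfaces outliving the instance that used them. A minimal sketch of that idea follows, assuming a simple ownership map; the function `fail_over`, the resource names, and the instance ids are invented for illustration and do not correspond to AWS API calls.

```python
def fail_over(resources, failed, replacement):
    """Move every resource owned by a failed instance to its replacement.

    resources maps a resource name (volume or NIC) to the owning instance id,
    or to None while the resource is orphaned.
    """
    # The failed instance is terminated, so its resources become orphaned.
    for name, owner in resources.items():
        if owner == failed:
            resources[name] = None

    # The new instance identifies the orphaned storage and NIC and adopts them.
    adopted = [name for name, owner in resources.items() if owner is None]
    for name in adopted:
        resources[name] = replacement
    return sorted(adopted)


owners = {
    "ost0-volume": "i-old",
    "ost0-nic": "i-old",
    "ost1-volume": "i-other",
}
adopted = fail_over(owners, failed="i-old", replacement="i-new")
```

Because the NIC is adopted along with the volume in this model, the replacement node comes back under the same network identity.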

ICE-L Monitoring tools

Large File Benchmark

•  Using IOR benchmark
•  Comparing 3 Lustre cluster configurations
•  Increase the number of OSSs
   –  4 OSS
   –  8 OSS
   –  16 OSS
•  Configurations of MGS and MDS are fixed
•  1-32 clients

[Figure: instance configuration per node type.]

•  MGS: m1.medium, 94 MB/sec
•  MDS: m3.2xlarge, EBS Optimized, RAID0, 8x 40GB Standard, 110 MB/sec
•  OSS: m3.2xlarge, EBS Optimized, 8x 100GB Standard, 110 MB/sec
•  Client: m3.2xlarge, 110 MB/sec

IOR Sequential Read FPP

[Figure: IOR sequential read, file per process (FPP). X axis: N. Clients (1, 2, 4, 8, 16, 32). Y axis: MB/sec (0 to 1600). Series: 4 OSS, 8 OSS, 16 OSS. Annotations: “Client’s network bottleneck”, “OSS’s network bottleneck” (twice), “Close to the OSS network”.]

IOR Sequential Write FPP

[Figure: IOR sequential write, file per process (FPP). X axis: N. Clients (1, 2, 4, 8, 16, 32). Y axis: MB/sec (0 to 1600). Series: 4 OSS, 8 OSS, 16 OSS. Annotations: “Client’s network bottleneck”, “OSS’s network bottleneck” (twice), “Ops….”.]

Aggregate Performance During Run

[Figure: aggregate performance over time during the run. X axis: time. Annotations: “1920 MB/sec”, “Long tail”.]

LMT was used to record the OST metrics during the IOR run. A simple Python script then produced this graph of aggregate performance vs. time to analyze the problem.
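A script of this kind might look roughly like the sketch below. It is not the script used for the talk: the input format (tuples of timestamp, OST name, and MB/sec) is an assumption and does not reflect LMT's actual storage schema, and the function name and sample values are invented.

```python
from collections import defaultdict


def aggregate_by_time(samples):
    """Sum per-OST throughput at each timestamp.

    samples: iterable of (timestamp, ost_name, mb_per_sec) tuples
             (a hypothetical format, not LMT's real schema).
    Returns a list of (timestamp, total_mb_per_sec) sorted by timestamp.
    """
    totals = defaultdict(float)
    for timestamp, _ost_name, mb_per_sec in samples:
        totals[timestamp] += mb_per_sec
    return sorted(totals.items())


# Made-up samples: two OSTs, one of which finishes early.
samples = [
    (0, "OST0000", 100.0),
    (0, "OST0001", 120.0),
    (1, "OST0000", 90.0),
    (1, "OST0001", 0.0),
]
series = aggregate_by_time(samples)
```

Plotting such a series against time makes a long tail visible as a stretch where the total stays well below the peak because only a few targets are still busy.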

MDTEST on 16 OSS Cluster Configuration

[Figure: MDTEST results on the 16 OSS configuration. X axis: N. Threads / 32 clients (32, 64, 128, 256). Y axis: MOPS (0 to 30000). Series: Directory creation, Directory stat, Directory removal, File creation, File stat, File removal, Tree creation, Tree removal.]

Thank You