SDSC's Data Oasis


Page 1: SDSC's Data Oasis


SDSC's Data Oasis: Balanced performance and cost-effective Lustre file systems

Lustre User Group 2013 (LUG13)

Rick Wagner, San Diego Supercomputer Center
Jeff Johnson, Aeon Computing

April 18, 2013

Page 2: SDSC's Data Oasis

Data Oasis

• High performance, high capacity Lustre-based parallel file system
• 10GbE I/O backbone for all of SDSC's HPC systems, supporting multiple architectures
• Integrated by Aeon Computing using their EclipseSL
• Scalable, open platform design
• Driven by 100 GB/s bandwidth target for Gordon
• Motivated by $/TB and $/GB/s (see the cost sketch below)
  • $1.5M = 4 MDS + 64 OSS = 4 PB = 100 GB/s
• 6.4 PB capacity and growing
• Currently Lustre 1.8.7
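A minimal sketch of the two cost metrics named above, using only the $1.5M, 4 PB, and 100 GB/s figures from this slide; the variable names are for illustration.

```python
# Back-of-the-envelope cost metrics from this slide:
# $1.5M buys 4 MDS + 64 OSS delivering 4 PB raw and 100 GB/s aggregate.
total_cost_usd = 1_500_000
raw_capacity_tb = 4_000   # 4 PB expressed in TB
bandwidth_gbs = 100       # aggregate GB/s target

print(f"${total_cost_usd / raw_capacity_tb:,.0f} per TB raw")   # $375 per TB
print(f"${total_cost_usd / bandwidth_gbs:,.0f} per GB/s")       # $15,000 per GB/s
```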

Page 3: SDSC's Data Oasis

Data Oasis Heterogeneous Architecture

[Diagram: 64 OSS (Object Storage Servers), in 72 TB and 108 TB configurations, provide 100 GB/s performance and >4 PB raw capacity; 132 TB JBODs (Just a Bunch Of Disks) provide capacity scale-out. Redundant Arista 7508 10G switches for reliability and performance. Three distinct network architectures connect the clients: Gordon (IB cluster) via 64 Lustre LNET routers at 100 GB/s, Trestles (IB cluster) via a Mellanox 5020 bridge at 12 GB/s, and Triton (10G & IB cluster) via Juniper 10G switches at XX GB/s. Metadata servers: Gordon scratch, Trestles scratch, Triton scratch, and Gordon & Trestles project.]
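As a rough sanity check on the Gordon path in the diagram, the aggregate target spread over the router count gives a modest per-router load. A hedged sketch; the 100 GB/s and 64-router figures come from the diagram, while the QDR data rate used for comparison is an assumption.

```python
# Per-router load implied by the Gordon path: a 100 GB/s aggregate target
# carried by 64 Lustre LNET routers (both figures from the diagram).
aggregate_gbs = 100
num_routers = 64

per_router_gbs = aggregate_gbs / num_routers
print(f"{per_router_gbs:.2f} GB/s per LNET router")   # ~1.56 GB/s

# For comparison (assumed, not on the slide): one QDR IB link carries roughly
# 4 GB/s of data, so each router runs well below its InfiniBand line rate.
qdr_link_gbs = 4.0
print(f"Utilization of one QDR link: {per_router_gbs / qdr_link_gbs:.0%}")
```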

Page 4: SDSC's Data Oasis

File Systems

File System   Clusters            OSSes   JBODs   Capacity (raw)
Monkey        Gordon                 32       0   2.3 PB
Meerkat       Gordon & Trestles       8       8   1.9 PB
Puma          Trestles                8       0   576 TB
Dolphin       Triton                 16       0   1.2 PB
Rhino         Development             4       4   480 TB
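Summing the raw capacities in the table recovers the "6.4 PB capacity and growing" figure quoted on the overview slide. A small sketch using only the table's numbers.

```python
# Raw capacity per file system, in TB, copied from the table above.
file_systems = {
    "Monkey":  2_300,   # Gordon
    "Meerkat": 1_900,   # Gordon & Trestles
    "Puma":      576,   # Trestles
    "Dolphin": 1_200,   # Triton
    "Rhino":     480,   # Development
}

total_tb = sum(file_systems.values())
print(f"Total raw capacity: {total_tb / 1000:.2f} PB")   # ~6.46 PB
```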

Page 5: SDSC's Data Oasis

Data Oasis Servers

[Diagram: server building blocks. The active and backup MDS pairs use LSI controllers, RAID 10 (2x6), and Myri10GbE. OSS nodes use LSI controllers, RAID 6 (7+2) OSTs, and Myri10GbE; OSS+JBOD nodes use LSI controllers, RAID 6 (8+2) OSTs, and Myri10GbE. The OSTs are built from 2TB and 3TB drives.]
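The RAID layouts on this slide determine how much raw capacity survives as usable space. A minimal sketch of the parity overhead for the two group widths shown; the pairing of drive sizes to layouts in the last two lines is illustrative, since the transcript does not make it explicit.

```python
# Usable fraction of the two RAID 6 group widths shown on this slide.
for data_disks, parity_disks in [(7, 2), (8, 2)]:
    frac = data_disks / (data_disks + parity_disks)
    print(f"RAID 6 ({data_disks}+{parity_disks}): "
          f"{frac:.0%} of raw capacity is usable")

# Illustrative only: applied to the 2TB and 3TB drives mentioned on the
# slide (the transcript does not pin a drive size to a layout).
print(f"(7+2) x 2TB: {9 * 2} TB raw -> {7 * 2} TB usable")
print(f"(8+2) x 3TB: {10 * 3} TB raw -> {8 * 3} TB usable")
```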

Page 6: SDSC's Data Oasis

Trestles Architecture

[Diagram: 324 compute nodes on a QDR InfiniBand switch, with login nodes (2x), management nodes (2x), data movers (4x), and NFS servers (4x, shared with Gordon). An IB/Ethernet bridge switch and a pair of Arista 7508 10 GbE switches (2x MLAG) connect the cluster to the 4 PB Data Oasis Lustre PFS; the data movers reach the SDSC network and XSEDE & R&E networks. Link types: QDR 40 Gb/s, GbE, 10GbE.]

• QDR IB
• GbE management
• GbE public
• Round robin login
• Mirrored NFS
• Redundant front-end

Page 7: SDSC's Data Oasis

Gordon Network Architecture

[Diagram: 1,024 compute nodes and 64 I/O nodes on a dual-rail QDR InfiniBand 3D torus (rail 1 and rail 2), with login nodes (4x), management nodes (2x), data movers (4x), and NFS servers (4x). The I/O nodes connect through an Arista 10 GbE switch to the 4 PB Data Oasis Lustre PFS; management and public traffic use separate edge and core Ethernet; the data movers reach the SDSC network and XSEDE & R&E networks. Link types: QDR 40 Gb/s, GbE, 2x10GbE, 10GbE.]

• Dual-rail IB
• Dual 10GbE storage (see the bandwidth sketch below)
• GbE management
• GbE public
• Round robin login
• Mirrored NFS
• Redundant front-end
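A rough headroom check on the storage path, assuming all 64 I/O nodes drive their dual 10GbE links toward Data Oasis; the node count and link type come from the slides, and the conversion to GB/s ignores protocol overhead.

```python
# Aggregate Ethernet capacity of Gordon's storage path: 64 I/O nodes,
# each with dual 10GbE toward Data Oasis.
io_nodes = 64
links_per_node = 2
link_gbps = 10   # per 10GbE link, in Gb/s

raw_gbs = io_nodes * links_per_node * link_gbps / 8
print(f"Raw Ethernet capacity: {raw_gbs:.0f} GB/s")            # 160 GB/s
print(f"Headroom over the 100 GB/s target: {raw_gbs / 100:.1f}x")
```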

Page 8: SDSC's Data Oasis

Gordon Network Design Detail

[Diagram: Mellanox IS5030 QDR switches on rail 0 and rail 1; each group of 16 compute nodes plus a flash I/O node attaches to both rails. Each switch is connected to its 6 torus neighbors via 3 QDR links, and the flash I/O nodes reach the Lustre filesystem over dual 10GbE.]
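The "6 neighbors via 3 QDR links" note translates into a simple per-switch torus bandwidth figure. A hedged sketch; the ~4 GB/s data rate per QDR link is an assumption, only the link and neighbor counts come from the slide.

```python
# Per-switch torus bandwidth from "6 neighbors via 3 QDR links".
neighbors = 6
links_per_neighbor = 3
qdr_link_gbs = 4.0   # assumed ~4 GB/s data rate per QDR link

per_neighbor_gbs = links_per_neighbor * qdr_link_gbs
total_gbs = neighbors * per_neighbor_gbs
print(f"{per_neighbor_gbs:.0f} GB/s per neighbor, "
      f"{total_gbs:.0f} GB/s of torus links per switch")
```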

Page 9: SDSC's Data Oasis


Data Oasis Performance – Measured from Gordon

Page 10: SDSC's Data Oasis

Issues & The Future

• LNET "death spiral"
  • LNET tcp peers stop communicating, packets back up
• We need to upgrade to Lustre 2.x soon
  • Can't wait for MDS SMP improvements & DNE
• Design drawback: juggling data is a pain
• Client virtualization testing
  • SR-IOV very promising for o2ib clients
• Watching the Fast Forward program
  • Gordon's architecture ideally suited to burst buffers
• HSM
• Really want to tie Data Oasis to SDSC Cloud