
Transcript of “In ~2011, we discovered an exciting new failure mode in ...” (files.meetup.com/14556262/Austin Data...)

Page 1

•  In ~2011, we discovered an exciting new failure mode in large-scale systems.
•  We called this failure mode “Hadoop”.
•  Recipe for success: take a big HPC system
   –  A huge central filesystem
   –  Optimized for large, sequential access to multi-GB to TB size files
   –  With a highly tuned, low-level C interface
•  And on that, run software that:
   –  Assumes a small, massively distributed filesystem.
   –  Optimized for very “small” files.
   –  With an untuned, well… Java.
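The mismatch is easy to see in miniature. The sketch below is illustrative only (file counts, sizes, and paths are made up, and nothing in it is Lustre- or Hadoop-specific): it writes the same volume of data first as one large sequential stream, the pattern a big parallel filesystem is tuned for, and then as thousands of tiny files, the metadata-heavy pattern Hadoop-style workloads tend to generate against a shared filesystem.

```python
import os
import time

BLOCK = b"x" * 4096            # 4 KiB payload per write
TOTAL = 64 * 1024 * 1024       # 64 MiB in total, small enough to run quickly
COUNT = TOTAL // len(BLOCK)    # 16,384 writes either way

def one_big_file(path):
    """HPC-friendly pattern: one large file, written sequentially."""
    with open(path, "wb") as f:
        for _ in range(COUNT):
            f.write(BLOCK)

def many_small_files(directory):
    """Hadoop-ish pattern: one tiny file per record.
    Each create/close is a metadata operation on a shared filesystem."""
    os.makedirs(directory, exist_ok=True)
    for i in range(COUNT):
        with open(os.path.join(directory, f"part-{i:05d}"), "wb") as f:
            f.write(BLOCK)

for label, fn, arg in (("one big file    ", one_big_file, "big.dat"),
                       ("many small files", many_small_files, "parts")):
    start = time.time()
    fn(arg)
    print(f"{label}: {time.time() - start:.2f}s")
```

On a laptop's local disk the two patterns may look similar; on a shared parallel filesystem, every open/close in the second pattern is a round trip to a central metadata server, which is where the “failure mode” shows up.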

Page 2

To address these and other needs for the data-driven researcher, we built Wrangler.

Page 3

Goals of the Wrangler Project
•  To address the data problem in multiple dimensions
   –  Big (and small), reliable, secure
   –  Lots of data types: structured and unstructured
   –  Fast, but not just for large files and sequential access. Need high transaction rates and random access too.
•  To support a wide range of applications and interfaces
   –  Hadoop, but not *just* Hadoop.
   –  Traditional languages, but also R, GIS, DB, and other, perhaps less scalable things by HPC standards.
•  To support the full data lifecycle
   –  More than a scratch (aka purged) file system
   –  Metadata and collection management support


Page 4

[Slide figure: Wrangler system architecture, spanning TACC and Indiana]
•  40 Gb/s Ethernet; 100 Gbps public network; Globus; interconnect with 1 TB/s throughput
•  IB interconnect: 120 lanes (56 Gb/s), non-blocking
•  High Speed Storage System: 500+ TB, 1 TB/s, 250M+ IOPS
•  Access & Analysis System (TACC): 96 nodes, 128 GB+ memory, Haswell CPUs
•  Mass Storage Subsystem (TACC): 10 PB (replicated)
•  Access & Analysis System (Indiana): 24 nodes, 128 GB+ memory, Haswell CPUs
•  Mass Storage Subsystem (Indiana): 10 PB (replicated)
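A quick sanity check on the figure's numbers (back-of-envelope only: raw link rate, ignoring encoding and protocol overhead) shows 120 lanes at 56 Gb/s landing in the same ballpark as the quoted 1 TB/s storage throughput:

```python
# Back-of-envelope: aggregate InfiniBand bandwidth vs. the quoted 1 TB/s.
# Raw link rate only; deliverable bandwidth is lower after protocol overhead.
lanes = 120                              # from the diagram
gbits_per_lane = 56                      # FDR 4x link rate, as labeled
total_gbits = lanes * gbits_per_lane     # 6,720 Gb/s aggregate
total_tbytes = total_gbits / 8 / 1000    # bits -> bytes, giga -> tera
print(f"{total_gbits} Gb/s aggregate = {total_tbytes:.2f} TB/s")  # ~0.84 TB/s
```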

Page 5

[Slide photos]
•  Half of the final cabling on 4 D5s
•  6 of the 10 D5s
•  5 of the 35 storage JBODs and Lustre metadata server
•  12 of 120 compute Haswell nodes

Page 6

DSSD Storage
•  Not SSD; a custom interface provides NAND-flash performance without the overhead of the traditional “disk” interface.
•  Not flash storage tied to individual nodes
   –  Multiple nodes have direct access to the same storage via independent PCIe connections
   –  ~600 TB of usable space
   –  10s of GB/s of I/O to one node
   –  Accessible across the whole cluster
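The slides don't show DSSD's actual client API, so the sketch below makes the general point instead, as a stand-in: at flash latencies, per-operation software overhead (one syscall per block, block-layer bookkeeping) starts to dominate. It times 4 KiB random reads issued as individual pread() calls against plain memory loads from an mmap'd view of the same file; the file name and sizes are made up.

```python
import mmap
import os
import random
import time

PATH = "scratch.dat"                      # hypothetical local scratch file
SIZE = 64 * 1024 * 1024                   # 64 MiB test file
BLOCK = 4096                              # 4 KiB per access
N = 100_000                               # random accesses per method

with open(PATH, "wb") as f:               # create the test file
    f.write(os.urandom(SIZE))

offsets = [random.randrange(0, SIZE - BLOCK) for _ in range(N)]

# Method 1: one pread() syscall per block.
fd = os.open(PATH, os.O_RDONLY)
start = time.time()
for off in offsets:
    os.pread(fd, BLOCK, off)
os.close(fd)
print(f"pread per block: {time.time() - start:.2f}s")

# Method 2: map the file once, then read via plain memory access (no syscall).
with open(PATH, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    start = time.time()
    for off in offsets:
        m[off:off + BLOCK]                # slice = memory copy
    print(f"mmap per block:  {time.time() - start:.2f}s")

os.remove(PATH)
```

Both paths hit the page cache here, so the gap measures interface overhead specifically, which is the slide's argument for skipping the disk interface altogether.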

Page 7

Dynamic Storage Provisioning
•  The high speed storage will be visible in several ways:
   –  As a traditional filesystem, for traditional applications
   –  As an HDFS filesystem, for Hadoop, Spark, and other HDFS applications
   –  As a SQL database backend with high transaction rates
   –  As an object store with a native API, for novel data applications
•  In addition to our “traditional” HPC stack, we support R, databases, NoSQL databases, and the full Hadoop “suite”.
•  Long-term storage and cataloging leveraging the replicated 10 PB file system

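A minimal sketch of the “several views” idea, not Wrangler's actual configuration: the same records read by one Spark job through an ordinary POSIX mount and through an HDFS namespace. The paths and the namenode hostname below are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("two-views-demo").getOrCreate()

# View 1: a traditional (POSIX) filesystem path, for traditional tooling.
posix_df = spark.read.csv("file:///data/flash/project/records.csv", header=True)

# View 2: the same records served through an HDFS namespace for Hadoop/Spark.
hdfs_df = spark.read.csv("hdfs://namenode:8020/project/records.csv", header=True)

print(posix_df.count(), hdfs_df.count())   # same data, two access paths
spark.stop()
```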

Page 8

Moving Data “Effectively”
•  2x100 Gb/s Internet2 connection…
•  Data Dock capability for the “last mile problem”, where transfers are limited by other networks.

[Slide photo] 3,000 Rice genomes from the Philippines; yes, this is the actual box

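The “last mile” point is easy to put numbers on. Everything below is assumed for illustration; the slide gives neither the collection's size nor the constrained link speed.

```python
# Back-of-envelope: shipping a box of disks vs. a constrained uplink.
dataset_tb = 30          # assume ~10 GB per genome x 3,000 genomes
link_mbps = 100          # assume a 100 Mb/s "last mile" network
shipping_days = 7        # assume a week in transit for the box

megabits = dataset_tb * 8e6                            # TB -> Mb
transfer_days = megabits / link_mbps / 86_400          # seconds -> days
box_mbps = megabits / (shipping_days * 86_400)         # effective rate of the box

print(f"over the wire: {transfer_days:.0f} days")       # ~28 days
print(f"the box:       {box_mbps:.0f} Mb/s effective")  # ~397 Mb/s
```

Under these assumptions the box of disks outruns the constrained network severalfold, which is the whole case for a Data Dock.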