MeDiCI - How to Withstand a Research Data Tsunami

Posted on 21-Jan-2018

UQ’s MeDiCI: Data in the right place, at the right time.

Jake Carroll, Senior ICT Manager (Research), The Queensland Brain Institute, The University of Queensland, Australia

jake.carroll@uq.edu.au

This is a story of data locality, performance, namespace and financial complexity.

QBI, CAI, IMB, AIBN

100s of TBs of data generated per day - an eclectic mixture of life sciences, engineering, physics and nanotech data.

Every man, woman and child seems to build a (little) supercomputer to deal with their problems…

Compute + storage are tightly connected in each building.

Instrument outputs + scientific endeavors grow - budgets for storage and compute do not.

To add another complexity…

The MeDiCI Journey

Thus, the problem (or question) definition:

“How do we provide parallel access to scientific data through a multitude of protocols, give the illusion that the data is ‘next to’ the applications, and keep the right data near the right type of computational infrastructure, all within our budgetary constraints?”

[Diagram: Spectrum Scale AFM (cache) <-> Spectrum Scale AFM (home), parallel IO via the NSD protocol]
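For concreteness, here is a minimal sketch of how a cache/home pairing like this is typically wired up in Spectrum Scale AFM over the NSD protocol. The filesystem and fileset names (homefs, cachefs, projectX) and the chosen AFM mode are illustrative assumptions, not the actual MeDiCI configuration:

    # On the home cluster: enable the exported path as an AFM home
    mmafmconfig enable /gpfs/homefs/projectX

    # On the cache cluster: create an AFM fileset whose target is the home
    # path, reached over the NSD (native GPFS) protocol, then link it in
    mmcrfileset cachefs projectX --inode-space new \
        -p afmMode=independent-writer,afmTarget=gpfs:///gpfs/homefs/projectX
    mmlinkfileset cachefs projectX -J /gpfs/cachefs/projectX

Which afmMode you pick (single-writer, independent-writer, read-only or local-updates) decides which side may modify the data; the slides don't say which modes MeDiCI runs, so independent-writer above is only a guess.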

[Diagram: identity mapping across sites. Back at UQ, the user is uqjcarr1 on Scale cluster "A" using UQ creds; out at Polaris, the same person is someOtherName on Scale cluster "B" using other creds; mapped via mmname2uuid / mmuuid2Name.]

Turns out, all that code was missing from Spectrum Scale.
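As an illustration of the kind of glue this implies, here is a hypothetical sketch of a name/UUID mapping helper that translates a local username into a site-neutral identity and back. The map file, its format and the function names are invented for this example; this is not the actual mmname2uuid / mmuuid2Name code:

    #!/bin/bash
    # Hypothetical identity-mapping helper (illustrative only).
    # Map file lines: <canonical-uuid> <uq-username> <polaris-username>
    MAP=/etc/medici/identity.map

    name2uuid() {    # e.g. name2uuid uqjcarr1  -> canonical UUID
        awk -v n="$1" '$2 == n || $3 == n { print $1 }' "$MAP"
    }

    uuid2name() {    # e.g. uuid2name <uuid> polaris  -> someOtherName
        local col=2
        [ "$2" = "polaris" ] && col=3
        awk -v u="$1" -v c="$col" '$1 == u { print $c }' "$MAP"
    }

The hard part, of course, is doing this consistently for every user and group across two administrative domains - hence the need for proper support in Spectrum Scale itself.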

Network stumbles…

• We had, at best, 10GbE between our buildings and around the campus.

• Not made for the parallel IO aggression of Spectrum Scale AFM over the NSD protocol.

• Needed to spawn an entire mini-project to upgrade campus networks for big storage IO to 40/100G around the “ring” of nodes.

Recovery storms - AFM is a work in progress

• When you’re trying to recover tens of millions of files, AFM doesn’t always keep up (see the example after this list).

• IBM is working on it, for us (and others, globally).

• Scaling to hundreds of millions of files (if not billions) in a single fileset, or across multiple filesets, during sync/push/recovery is required.
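A quick way to watch a backlog like this is the per-fileset AFM state report, which shows the cache state, the gateway node and the queue length. A hedged example, reusing the assumed cachefs/projectX names from earlier:

    # Show AFM cache state and queue length for a single fileset
    mmafmctl cachefs getstate -j projectX

If the queue length keeps growing faster than it drains while tens of millions of files are being recovered, you are looking at the storm described above.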

Things we assumed users would do, as per our mental model.

User puts data in the cache from instruments, to send to a supercomputer at the remote site.

User processes data out at the remote site on said supercomputer.

Things people actually did, breaking our mental model.

User puts data in the cache from instruments. They start processing on a supercomputer locally.

Simultaneously, they start using the storage fabric to process other "bits" of the outputs of the run on the other supercomputer, for an additive workflow. [Culminating in the fabric becoming a means for both supercomputers to work on the same tasks at the same time.]

Same data namespace ended up everywhere.

That much was intentional.

As a result, a user could leverage *every bit of the compute*, everywhere, simultaneously, if their workflow is smart enough…
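One way to picture that additive workflow: because the same junction path is visible at both sites, a user can drive both machines against one logical copy of the data and let the fabric move the bytes. A sketch under assumptions: the path /gpfs/medici/projectX, the PBS-style qsub scheduler, the host name polaris and the pipeline script names are all invented for this example.

    # Instrument output lands in the local cache
    cp -r /instrument/run042 /gpfs/medici/projectX/run042

    # Start processing part of the run on the local cluster...
    qsub -v RUN=/gpfs/medici/projectX/run042 local_pipeline.pbs

    # ...and simultaneously process other parts of the same run on the remote
    # supercomputer: same namespace path, no explicit data copy
    ssh polaris "qsub -v RUN=/gpfs/medici/projectX/run042 remote_pipeline.pbs"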

[Diagram: IMB, QBI and RCC]

Turns out, we’re onto something

Thank you.

• UQ RCC, David Abramson for mentorship and a true sense of adventure.

• The Queensland Cyber Infrastructure Foundation (QCIF)

• My colleagues at UQ QBI, IMB, CAI, AIBN, ITS

• AIIA, ACS

• Justin Glen @ DDN