Spectrum Scale for Life Science Workload

files.gpfsug.org/presentations/2017/SC17/Spectrum...


Page 1

Spectrum Scale for Life Science Workload

Dr. Alf Wachsmann, Head of Scientific Computing

IBM Spectrum Scale User Group Meeting, Nov 12, 2017

Page 2

Helmholtz Association

Largest research organization in Germany

18 national research centers (6 with a focus on Health Research)

More than 38,000 staff

Budget: more than € 4 bn

Max Delbrück Center for Molecular Medicine: Basic Facts 2016

MDC Core Budget 2016: € 87.5 M
Extramural Funding 2016: € 30.7 M (shared 2nd place for DFG funding nationwide)
European Research Council (ERC) Grants: 15
Research Groups: 65
Staff (incl. guests), 25% international: 1,660
Postdocs (incl. guests), 50% international: 230
PhD Students (incl. guests), > 50% international: 360
Patent families: 95

Helmholtz centers labeled on the slide:

Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC)
Helmholtz Centre for Infection Research (HZI)
German Center for Neurodegenerative Diseases (DZNE)
German Cancer Research Centre (DKFZ)
Helmholtz Centre Munich – German Research Center for Environmental Health (HMGU)
(HZB)
(UFZ)

Page 3

RESEARCH AT MDC

cell and molecular biology, signaling pathways, developmental biology, physiology, structural biology, omics, systems biology, bioinformatics

Cancer

Diseases of the nervous system

Cardiovascular and metabolic disease

Medical Systems Biology

Page 4

INTERNAL DATAFLOWS

[Diagram: data flows between instruments, storage, compute cluster, and tape archive]

Data sources: Gene Sequencers, Mass Spectrometers, Microscopes
Compute Cluster (GPFS clients)
Fast storage for „hot data“ (GPFS)
Slower storage for „cool data“ (NFS, CIFS)
CIFS to Spectrum Scale CES
Tape Archive (IBM TS4500 with 9 TS1150 and Versity Storage Manager)
Network links: 40 Gb/s and 1 / 10 Gb/s
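To give a rough feel for what the link speeds in the diagram mean when moving instrument data, the following sketch computes ideal transfer times at full line rate. The 1 TB dataset size is a hypothetical figure chosen for illustration and does not come from the slides; real transfers will be slower because of protocol overhead and storage limits.

```python
def transfer_seconds(size_bytes, link_gbps):
    """Ideal transfer time in seconds at full line rate (no protocol overhead)."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


# Hypothetical dataset size (assumption, not from the slides): 1 TB = 10**12 bytes.
DATASET_BYTES = 10**12

# Link speeds named in the diagram: 1, 10 and 40 Gb/s.
for gbps in (1, 10, 40):
    secs = transfer_seconds(DATASET_BYTES, gbps)
    print(f"{gbps:>2} Gb/s: {secs / 60:.1f} min")
```

Under these assumptions, the same dataset takes about 40 times longer over a 1 Gb/s link than over a 40 Gb/s link.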

Page 5

I/O CHARACTERISTICS

• One storage system and one compute cluster for all of our use cases
• Applications with very different I/O characteristics
• Data of very different sizes

• Tried Lustre in my previous job for a similar workload: small-block I/O didn't work well
• Tried BeeGFS two years ago: it seems to work well, but not enough better than GPFS to justify switching

Page 6

GPFS FEATURES IN USE AT MDC

• MDC uses only a few of the GPFS features
– Quotas
– Snapshots

• Plan to use the rules engine to find „cool data“ and notify users
– Later, maybe migrate the data automatically

• Next year, will investigate changing the HSM to IBM Spectrum Protect (Tivoli Storage Manager) for better integration with GPFS
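The plan to find „cool data“ and notify users can be illustrated with a minimal sketch. This is not the GPFS rules engine: it is a plain-Python stand-in that walks a directory tree and groups files by owner when they have not been accessed for a given number of days. The function names, the 180-day threshold, and the notice format are all hypothetical choices made for this example.

```python
import os
import stat
import time
from collections import defaultdict


def find_cool_files(root, min_idle_days=180, now=None):
    """Return {owner_uid: [(path, size_bytes), ...]} for regular files under
    `root` whose last access time is older than `min_idle_days` days."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400
    cool = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_atime < cutoff:
                cool[st.st_uid].append((path, st.st_size))
    return cool


def format_notice(uid, files):
    """Build a one-line summary that could be sent to the owner (hypothetical format)."""
    total = sum(size for _path, size in files)
    return f"uid {uid}: {len(files)} cool file(s), {total} bytes"
```

A caller would loop over the returned dictionary and send each owner the result of `format_notice`. Note that access times are only as reliable as the mount options allow: file systems mounted with `noatime` or `relatime` do not update them on every read.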

Page 7

Page 8

HPC

RFP is out for ~70 compute nodes with a 40 Gb/s backbone and 5 Tesla P100 GPUs

GPFS is now ~830 TB usable

NFS space on 5 Oracle ZFS appliances (HA) and 2 Dell servers. Total of ~3.4 PB usable space