Spectrum Scale for Life Science Workload
Dr. Alf Wachsmann, Head of Scientific Computing
IBM Spectrum Scale User Group Meeting, Nov 12, 2017
Helmholtz Association
Largest research organization in Germany
18 national research centers (6 with a focus on Health Research)
More than 38,000 staff; budget of more than € 4 bn
Max Delbrück Center for Molecular Medicine (MDC) – Basic Facts 2016
MDC Core Budget 2016: € 87.5 M
Extramural Funding 2016: € 30.7 M (shared 2nd place for DFG funding nationwide)
European Research Council (ERC) Grants: 15
Research Groups: 65
Staff (incl. guests), 25% international: 1,660
Postdocs (incl. guests), 50% international: 230
PhD Students (incl. guests), > 50% international: 360
Patent families: 95

Helmholtz centers with a focus on Health Research include:
Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC)
Helmholtz Centre for Infection Research (HZI)
German Center for Neurodegenerative Diseases (DZNE)
German Cancer Research Centre (DKFZ)
Helmholtz Centre Munich – German Research Center for Environmental Health (HMGU)
Other centers labeled on the slide: (HZB), (UFZ)
RESEARCH AT MDC
cell and molecular biology, signaling pathways, developmental biology, physiology, structural biology, omics, systems biology, bioinformatics
Research areas: Cancer; Diseases of the nervous system; Cardiovascular and metabolic disease; Medical Systems Biology
INTERNAL DATAFLOWS
[Diagram: dataflow from lab instruments through the storage tiers]
• Data sources: Gene Sequencers, Mass Spectrometers, Microscopes
• Instruments send data over a 1 / 10 Gb/s network via CIFS to Spectrum Scale CES (a hypothetical ingest sketch follows below)
• Fast storage for "hot data": GPFS, attached to a 40 Gb/s network
• Compute cluster (GPFS clients) on the 40 Gb/s network
• Slower storage for "cool data": NFS, CIFS
• Tape archive: IBM TS4500 with 9 TS1150 drives and Versity Storage Manager
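To make the instrument-to-storage hop concrete, here is a minimal ingest sketch that pushes finished instrument runs onto a CIFS share exported by Spectrum Scale CES. All paths and the run-completion marker are illustrative assumptions, not the actual MDC setup.

```python
"""Hypothetical ingest helper: push finished instrument runs to a
Spectrum Scale CES (CIFS) share. Paths and the completion marker are
illustrative assumptions, not the actual MDC configuration."""
import shutil
import time
from pathlib import Path

LOCAL_RUNS = Path("/instrument_output")          # assumed instrument-local output folder
CES_SHARE = Path("/mnt/ces_hotdata")             # assumed mount point of the CES CIFS share
MARKER = "RunCompletionStatus.txt"               # assumed "run finished" marker file

def push_finished_runs() -> None:
    """Copy every finished run directory that is not yet on the share."""
    for run_dir in LOCAL_RUNS.iterdir():
        if not run_dir.is_dir() or not (run_dir / MARKER).exists():
            continue  # skip files and incomplete runs
        target = CES_SHARE / run_dir.name
        if target.exists():
            continue  # already transferred
        shutil.copytree(run_dir, target)
        print(f"transferred {run_dir.name}")

if __name__ == "__main__":
    while True:          # simple polling loop; a cron job or scheduler would also work
        push_finished_runs()
        time.sleep(600)  # check every 10 minutes
```

In practice something like this would run on the acquisition workstation next to the instrument, while the cluster reads the same data directly from the GPFS mount.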
I/O CHARACTERISTICS
• One storage system and one compute cluster for all of our use cases
• Applications with very different I/O characteristics
• Data of very different sizes
• Tried Lustre in my previous job for a similar workload: small-block I/O didn't work well (illustrated in the sketch below)
• Tried BeeGFS two years ago: it seems to work well, but not enough better than GPFS to justify a switch
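As an aside on what "small-block I/O" means here, the following micro-test sketch (not the benchmark used at MDC) compares many 4 KiB writes with large 8 MiB streaming writes on an assumed test directory; block sizes, total size, and the path are illustrative assumptions.

```python
"""Minimal small-block vs. large-block write micro-test (illustrative only).
Point TEST_DIR at a directory on the parallel file system under test."""
import os
import time
from pathlib import Path

TEST_DIR = Path("/gpfs/scratch/iotest")   # assumed test directory
SMALL_BLOCK = 4 * 1024                    # 4 KiB writes (typical "small-block" pattern)
LARGE_BLOCK = 8 * 1024 * 1024             # 8 MiB writes (streaming pattern)
TOTAL_BYTES = 256 * 1024 * 1024           # 256 MiB written per test

def timed_write(path: Path, block_size: int) -> float:
    """Write TOTAL_BYTES in block_size chunks, fsync, and return MiB/s."""
    buf = os.urandom(block_size)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(TOTAL_BYTES // block_size):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    path.unlink()  # clean up the test file
    return (TOTAL_BYTES / (1024 * 1024)) / elapsed

if __name__ == "__main__":
    TEST_DIR.mkdir(parents=True, exist_ok=True)
    print(f"small-block: {timed_write(TEST_DIR / 'small.bin', SMALL_BLOCK):7.1f} MiB/s")
    print(f"large-block: {timed_write(TEST_DIR / 'large.bin', LARGE_BLOCK):7.1f} MiB/s")
```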
GPFS FEATURES IN USE AT MDC
• MDC is using only very few of the GPFS features:
  – Quotas
  – Snapshots
• Plan to use the rules engine to find "cool data" and notify users (see the sketch after this list)
  – Later, maybe migrate the data automatically
• Will investigate changing HSM to IBM Spectrum Protect (Tivoli Storage Manager) next year for better integration with GPFS
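As a rough stand-in for the planned "find cool data and notify users" step (the plan in the talk is to use the GPFS rules/policy engine, not an external scan), a hedged sketch of the selection criterion could look like this; the scan root and the 180-day cutoff are assumptions.

```python
"""Illustrative stand-in for a "find cool data" run: walk a tree, collect
files whose last access time is older than a cutoff, and group them per
owner so users could be notified. Root path and threshold are assumptions;
the mechanism described in the talk is the GPFS rules (policy) engine."""
import os
import pwd
import time
from collections import defaultdict
from pathlib import Path

SCAN_ROOT = Path("/gpfs/projects")   # assumed GPFS directory tree
CUTOFF_DAYS = 180                    # assumed definition of "cool data"

def find_cool_data(root: Path, cutoff_days: int) -> dict[str, list[Path]]:
    """Return {username: [paths]} for files not accessed in cutoff_days."""
    cutoff = time.time() - cutoff_days * 86400
    cool: dict[str, list[Path]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_atime < cutoff:
                try:
                    owner = pwd.getpwuid(st.st_uid).pw_name
                except KeyError:
                    owner = str(st.st_uid)
                cool[owner].append(path)
    return cool

if __name__ == "__main__":
    for owner, files in sorted(find_cool_data(SCAN_ROOT, CUTOFF_DAYS).items()):
        print(f"{owner}: {len(files)} cool files")  # notification step left out
```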
HPC
RFP is out for ~70 compute nodes with a 40 Gb/s backbone and 5 Tesla P100 GPUs
GPFS is now ~830 TB usable
NFS space on 5 Oracle ZFS appliances (HA) and 2 Dell servers: a total of ~3.4 PB usable space