The past, present, and future of HPC in life sciences
Erich Birngruber, Ümit Seren
Gregor Mendel Institute for Molecular Plant Biology (GMI)
AHPC17
Who we are
- Basic research institute in plant sciences
- 9 independent research groups
- Employees 100 + 20 (scientific + admin)
- HPC Operations Team: 2 + 1 (engineer + lead)
Past: Beginnings as traditional HPC
Scientific computing at GMI
- Started in 2010
- SGI ICE-X since 2013 (MENDEL) (72 nodes, 144 today)
- SGI UV2000
- Rich software environment (EasyBuild, Lmod)
- Keeping up with current developments
Machine specs
3 generations of nodes:
- 72x 16c E5-2609, 192 GB mem
- 18x 20c E5-2680, 256 GB mem
- 54x 24c E5-2650, 256 GB mem, 230 GB SSD
UV2000: 96c E5-4617, 2 TB mem
IB FDR interconnect (1 fabric)
Storage: Lustre 300 TB, NetApp >1 PB
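As a quick sanity check on the node list above, the following sketch adds up the three node generations. It assumes that "16c", "20c" and "24c" mean cores per node and that memory is per node; the UV2000 is left out. The class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGeneration:
    count: int            # number of nodes in this generation
    cores_per_node: int   # assumed meaning of "16c", "20c", "24c"
    mem_gb_per_node: int  # assumed to be memory per node, in GB


# Figures copied from the node list; the UV2000 is not included.
GENERATIONS = [
    NodeGeneration(count=72, cores_per_node=16, mem_gb_per_node=192),
    NodeGeneration(count=18, cores_per_node=20, mem_gb_per_node=256),
    NodeGeneration(count=54, cores_per_node=24, mem_gb_per_node=256),
]


def cluster_totals(generations):
    """Return (nodes, cores, memory in GB) summed over all generations."""
    nodes = sum(g.count for g in generations)
    cores = sum(g.count * g.cores_per_node for g in generations)
    mem_gb = sum(g.count * g.mem_gb_per_node for g in generations)
    return nodes, cores, mem_gb


print(cluster_totals(GENERATIONS))  # (144, 2808, 32256)
```

The node total of 144 matches the "144 today" figure given for MENDEL.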
Present: GMI site specifics
- Services: customers are biologists
- On campus initial training
- Consulting and support (w/ ticket system, intranet wiki)
- Software installations
- Provided as modules: different versions, repeatability
- This is getting harder with the demand for more complex software
- Monitoring software usage
Present: Monitoring software usage
- Software in env modules
- 460 software packages in 1297 versions
- Monitoring module usage (load, unload)
- Reporting by user, job, project
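A minimal sketch of how such reporting could work, assuming each module load or unload is recorded as one event carrying user, job, project and module name. The record layout, the sample values and the function name are hypothetical; the slides do not describe the actual log format.

```python
from collections import Counter

# Hypothetical event records; field names and values are invented for illustration.
events = [
    {"user": "alice", "job": "1001", "project": "gwas", "module": "Python/3.6.1", "action": "load"},
    {"user": "alice", "job": "1001", "project": "gwas", "module": "Python/3.6.1", "action": "unload"},
    {"user": "bob", "job": "1002", "project": "pheno", "module": "Python/3.6.1", "action": "load"},
    {"user": "bob", "job": "1003", "project": "pheno", "module": "BWA/0.7.15", "action": "load"},
    {"user": "alice", "job": "1004", "project": "gwas", "module": "BWA/0.7.15", "action": "load"},
]


def count_loads(events, key):
    """Count "load" events per (value of key, module); key is "user", "job" or "project"."""
    return Counter(
        (event[key], event["module"])
        for event in events
        if event["action"] == "load"
    )


by_user = count_loads(events, "user")
by_project = count_loads(events, "project")
print(by_user[("alice", "Python/3.6.1")])   # 1
print(by_project[("pheno", "BWA/0.7.15")])  # 1
```

The same function covers all three reporting dimensions by switching the grouping key.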
Present: Monitoring system activity
Monitoring and metrics
The foundation for all future decisions
- Resource consumption
- Capacity planning
- Software, technology usage
- Auditing
Alerting
Present: Applications & Appliances
Phenobox (in development)
- Web-interface, API
- MySQL (DB)
- DSLR, RaspberryPi
- HPC (computer vision, storage)
GWA-Portal (https://gwas.gmi.oeaw.ac.at)
- Web-interface, API
- Elasticsearch (fulltext search)
- PostgreSQL (DB)
- Docker (Python microservices)
- HPC (analysis, storage)
Galaxy (https://galaxyproject.org/)
- Web-interface, API
- MySQL (DB)
- Visualization
- HPC (analysis, storage)
PacBio SMRT Link (https://github.com/PacificBiosciences/SMRT-Link)
- Web-interface, API
- MySQL (DB)
- HPC (analysis, storage)
Own developments: Phenobox, GWA-Portal
3rd party software: Galaxy, PacBio SMRT Link
Present: new developments
Deployment of OpenStack (IaaS):
- Cross-vendor open source project
- On-premises cloud
- Provision VMs and containers
- Deploy classic application services
- Enables self-service for customers
Consequences:
- More heterogeneous use-cases
- Customer base is increasing
- Non-human “customers” of HPC
- Services are more complex and distributed over subsystems
Present: Problem 1: maintenance
- VMs are difficult to maintain
- Wrong abstraction for the use-case
- What is the next step?
- Containers?
- Container Orchestration Engines?
- Provide Software as a Service (SaaS)?
Fact is: the field is evolving
Future: Problem 2: integration
Applications sit on different islands:
HPC vs. Cloud
Drawbacks:
- Hard to maintain (infra)
- Hard to debug (app)
Vision: converged compute platform.
Unified infrastructure to schedule all types of tasks
New challenges:
- Networking
- Storage
- IDM
- Accounting
What do others do?
Container Orchestration Engine (Google Kubernetes, Docker Swarm, Apache Mesos)
First steps:
- Containers for HPC
- Biocontainers http://biocontainers.pro
- Singularity http://singularity.lbl.gov
- Current status: test deployment
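To illustrate what running a containerized tool inside an HPC job could look like, here is a small sketch that assembles a `singularity exec` command line. The image name, the bind path and the helper function are hypothetical and do not describe the GMI test deployment; only the `singularity exec` subcommand and its `--bind` option are taken from the Singularity CLI.

```python
import shlex


def singularity_exec(image, command, binds=()):
    """Build the argument list for running a command inside a Singularity image."""
    argv = ["singularity", "exec"]
    for src, dest in binds:
        # Make a host directory visible inside the container.
        argv += ["--bind", f"{src}:{dest}"]
    argv.append(image)
    argv += list(command)
    return argv


# Hypothetical image file and bind path, for illustration only.
argv = singularity_exec(
    "samtools.simg",
    ["samtools", "--version"],
    binds=[("/lustre", "/lustre")],
)
print(" ".join(shlex.quote(arg) for arg in argv))
# singularity exec --bind /lustre:/lustre samtools.simg samtools --version
```

The resulting list can be passed to `subprocess.run` from within a batch job script.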
Contact / References:
Erich Birngruber <[email protected]>, @ebirn
Ümit Seren <[email protected]>, @timeu_s
GMI on Github:
https://github.com/Gregor-Mendel-Institute
Total recall: holistic metrics for broad systems performance and user experience visibility in a data-intensive computing environment
https://dl.acm.org/citation.cfm?id=2835001
Acknowledgements
Gregor Mendel Institute of Molecular Plant Biology
Dr. Bohr-Gasse 3
1030 Vienna, Austria