Research Computing University Of South Florida Providing Advanced Computing Resources for Research and Instruction through Collaboration

Research Computing
University Of South Florida

Providing Advanced Computing Resources for Research and Instruction through Collaboration

Mission

• Provide advanced computing resources required by a major research university
  o Software
  o Hardware
  o Training
  o Support

User Base

• 40 research groups
• 6 colleges
• 100 faculty
• 300 students

Hardware

• System was built on the condominium model and consists of 300 nodes and 2,400 processors
  o University provides infrastructure and some computational resources
  o Faculty funding provides bulk of computational resources

Software

• Over 50 scientific codes
  o Installation
  o Integration
  o Upgrades
  o Licensing

Support Personnel

• Provide all systems administration
• Software support
• One-on-one consulting
• System efficiency improvements
• Users are no longer just the traditional “number crunchers”

Current Projects

• Consolidating the last standalone cluster (of appreciable size)

• Advanced Visualization Center
  o Group of 19 faculty applied for funding
  o Personnel
  o Training
  o Large Resolution 3D display

Current Projects

• New computational resources
  o Approximately 100 nodes
  o GPU resources
  o Upgrade parallel file system

• Virtual Clusters
  o HPC for the other 90%

• FACC

Florida State University's Shared HPC

Building and Maintaining Sustainable Research Computing at FSU

Shared-FSU HPC Mission

• Support multidisciplinary research
• Provide a general access computing platform
• Encourage cost sharing by departments with dedicated computing needs
• Provide a broad base of support and training opportunities

Turn-key Research Solution

Participation is Voluntary

• University provides staffing
• University provides general infrastructure
  o Network fabrics
  o Racks
  o Power/Cooling
• Additional buy-in incentives
  o Leverage better pricing as a group
  o Matching funds
• Offer highly flexible buy-in options
  o Hardware purchase only
  o Short-term Service Level Agreements
  o Long-term Service Level Agreements
• Shoot for 50% of hardware costs covered by buy-in

Research Support @ FSU

• 500-plus users
• 33 Academic Units
• 5 Colleges

HPC Owner Groups

• 2007
  o Department of Scientific Computing
  o Center for Ocean-Atmosphere Prediction Studies
  o Department of Meteorology
• 2008
  o Gunzburger Group (Applied Mathematics)
  o Taylor Group (Structural Biology)
  o Department of Scientific Computing
  o Kostov Group (Chemical & Biomedical Engineering)
• 2009
  o Department of Physics (HEP, Nuclear, etc.)
  o Institute of Molecular Biophysics
  o Bruschweiler Group (National High Magnetic Field Laboratory)
  o Center for Ocean-Atmosphere Prediction Studies (with the Department of Oceanography)
  o Torrey Pines Institute of Molecular Studies
• 2010
  o Chella Group (Chemical Engineering)
  o Torrey Pines Institute of Molecular Studies
  o Yang Group (Institute of Molecular Biophysics)
  o Meteorology Department
  o Bruschweiler Group
  o Fajer Group (Institute of Molecular Biophysics)
  o Bass Group (Biology)

Research Support @ FSU

• Publications
  o Macromolecules
  o Bioinformatics
  o Systematic Biology
  o Journal of Biogeography
  o Journal of Applied Remote Sensing
  o Journal of Chemical Theory and Computation
  o Physical Review Letters
  o Journal of Physical Chemistry
  o Proceedings of the National Academy of Sciences
  o Biophysical Journal
  o Journal Chemical Theory Computation
  o Journal: J. Phys. Chem.
  o PLoS Pathogens
  o Journal of Virology
  o Journal of the American Chemical Society
  o The Journal of Chemical Physics
  o PLoS Biology
  o Ocean Modeling
  o Journal of Computer-Aided Molecular Design

FSU’s Shared-HPC

Stage 1: InfiniBand Connected Cluster

[Diagram labels: Sliger Data Center, Shared-HPC, pfs]

Single and Multiprocessor Usage: Year 1

FSU’s Shared-HPC

Stage 2: Alternative Backfilling

[Diagram labels: DSL Building, Sliger Data Center, Shared-HPC, pfs, Condor]

Backfilling Single Proc Jobs on Non-HPC Resources Using Condor

Condor Usage

• ~1000 processor cores available for single processor computations

• 2,573,490 processor hours used since Condor was made available to all HPC users in September

• Seven users have been using Condor from HPC

• Dominant users are Evolutionary Biology, Molecular Dynamics, and Statistics (same users that were submitting numerous single proc. jobs)

• Two workshops introduced it to HPC users
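
To put the Condor figure in perspective, the arithmetic below converts the 2,573,490 processor hours reported on this slide into core-years, and into the number of days the ~1000-core pool would need to deliver them at full occupancy. It is only a rough sketch: the 8,760-hour year and the 100% occupancy are illustrative assumptions, not figures from the presentation.

```python
# Rough conversion of the Condor usage figure into more familiar units.
PROCESSOR_HOURS_USED = 2_573_490   # from the slide
POOL_CORES = 1000                  # "~1000 processor cores" (approximate)
HOURS_PER_YEAR = 365 * 24          # assumption: non-leap year, 8,760 hours

# One core running continuously for a year delivers HOURS_PER_YEAR hours.
core_years = PROCESSOR_HOURS_USED / HOURS_PER_YEAR

# Days the whole pool would need if every core were busy all the time.
full_pool_days = PROCESSOR_HOURS_USED / POOL_CORES / 24

print(f"Core-years delivered: {core_years:.1f}")
print(f"Days of the full pool at 100% occupancy: {full_pool_days:.1f}")
```

In other words, the backfilled single-processor work amounts to several hundred core-years that did not have to compete with parallel jobs on the InfiniBand cluster.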

Single vs. Multi-processor Jobs: Year 2

Single vs. Multi-processor Jobs: Year 3

FSU’s Shared-HPC

Stage 3: Scalable SMP

[Diagram labels: DSL Building, Sliger Data Center, Shared-HPC, pfs, Condor, SMP]

FSU’s Shared-HPC

Stage 3: Scalable SMP

• One MOAB Queue for SMP or very large memory jobs
• Three “nodes”
  o M905 blade with 16 cores and 64 GB mem
  o M905 blade with 24 cores and 64 GB mem
  o 3Leaf system with up to 132 cores and 528 GB mem

[Diagram labels: DSL Building, DSL Data Center, Sliger Data Center, Shared-HPC, pfs, Condor, SMP, 2°fs, Vis]

Interactive Cluster

Functions

• Facilitates data exploration
• Provides venue for software not well suited for a batch scheduled environment
  o (e.g., some MATLAB, VMD, R, Python, etc.)
• Provides access to hardware not typically found on standard desktops/laptops/mobile devices (e.g., lots of memory, high-end GPUs)
• Provides licensing and configuration support for software applications and libraries

Interactive Cluster

Hardware Layout

• 8 high-end CPU based host nodes
  o Multi-core Intel or AMD processors
  o 4 to 8 GB of memory per core
  o 16X PCIe connectivity
  o QDR IB connectivity to Lustre storage
  o IP (read-only) connectivity to Panasas
  o 10 Gbps connectivity to campus network backbone
• One C410x external PCI chassis
  o Compact
  o IPMI management
  o Supports up to 16 NVIDIA Tesla M2050
    Up to 16.48 teraflops
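
The 16.48-teraflop figure is consistent with a fully populated chassis of 16 cards at roughly 1.03 teraflops each. The short check below makes that arithmetic explicit. The per-card numbers are assumptions taken from the commonly quoted peak ratings of the Tesla M2050 (about 1.03 TFLOPS single precision and 0.515 TFLOPS double precision), not from the slide, and peak ratings are upper bounds rather than delivered performance.

```python
# Peak throughput of a fully populated external GPU chassis.
GPUS_IN_CHASSIS = 16        # from the slide: "up to 16 NVIDIA Tesla M2050"
SP_TFLOPS_PER_GPU = 1.03    # assumption: quoted single-precision peak per card
DP_TFLOPS_PER_GPU = 0.515   # assumption: quoted double-precision peak per card

sp_peak = GPUS_IN_CHASSIS * SP_TFLOPS_PER_GPU
dp_peak = GPUS_IN_CHASSIS * DP_TFLOPS_PER_GPU

print(f"Single-precision peak: {sp_peak:.2f} TFLOPS")
print(f"Double-precision peak: {dp_peak:.2f} TFLOPS")
```

Under these assumptions the slide's number is a single-precision peak; double-precision workloads would see about half of it at best.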

[Diagram labels: DSL Building, DSL Data Center, Sliger Data Center, Shared-HPC, pfs, Condor, SMP, 2°fs, Vis, Db, Web]

Web/Database Hardware

Function

• Facilitates creation of data analysis pipelines/workflows
• Favored by external funding agencies
  o Demonstrated cohesive cyberinfrastructure
  o Fits well into required Data Management Plans (NSF)
• Intended to facilitate access to data on secondary storage or cycles on owner share of HPC

• Basic Software Install, no development support

• Bare Metal or VM

Web/Database Hardware: Examples

Web/Database Hardware: Examples

FSU Research CI

HPC

HTC

SMP

1° storage

2° storage

Vis and interactive

DB and Web

Florida State University's Shared HPC

• Universities are by design multifaceted and lack a singular focus of support

• Local HPC resources should also be multifaceted and have a broad basis of support

HPC Summit

University of Florida

HPC Center

HPC Summit

Short history

• Started in 2003
• 2004 Phase I:
  o CLAS – Avery – OIT
• 2005 Phase IIa:
  o COE – 9 investors
• 2007 Phase IIb:
  o COE – 3 investors
• 2009 Phase III:
  o DSR – 17 investors – ICBR – IFAS
• 2011 Phase IV:
  o 22 investors

HPC Summit

Budget

• Total budget
  o 2003-2004 $0.7 M
  o 2004-2005 $1.8 M
  o 2005-2006 $0.3 M
  o 2006-2007 $1.2 M
  o 2007-2008 $1.6 M
  o 2008-2009 $0.4 M
  o 2009-2010 $0.9 M
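
As a quick aid for reading the budget list, the sketch below totals the seven yearly figures and computes their average. It reads the first entry as fiscal year 2003-2004 and uses `Decimal` only to keep the tenths exact; nothing here goes beyond the numbers on the slide.

```python
from decimal import Decimal

# Yearly budget in millions of US dollars, as listed on the slide.
budget_musd = {
    "2003-2004": Decimal("0.7"),  # first fiscal year, read as 2003-2004
    "2004-2005": Decimal("1.8"),
    "2005-2006": Decimal("0.3"),
    "2006-2007": Decimal("1.2"),
    "2007-2008": Decimal("1.6"),
    "2008-2009": Decimal("0.4"),
    "2009-2010": Decimal("0.9"),
}

total = sum(budget_musd.values())
average = total / len(budget_musd)

print(f"Total over {len(budget_musd)} years: ${total} M")
print(f"Average per year: ${average:.2f} M")
```

The year-to-year swings line up with the phased purchases in the short history: larger budgets fall in the years leading into a new phase.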

HPC Summit

Hardware

• 4,500 cores
• 500 TB storage
• InfiniBand connected
• In three machine rooms
  o Connected by 20 Gbit/sec Campus Research Network

HPC Summit

System software

• Red Hat Enterprise Linux
  o Through free CentOS distribution
  o Upgrade once per year
• Lustre file system
  o Mounted on all nodes
  o Scratch only
  o Provide backup through CNS service
    Requires separate agreement between researcher and CNS

HPC Summit

Other software

• Moab scheduler (commercial license)
• Intel compilers (commercial license)
• Numerous applications
  o Open and commercial

HPC Summit

Operation

• Shared cluster
• Some hosted systems
• 300 users
• 90% - 95% utilization

HPC Summit

Investor Model

• Normalized Computing Unit
  o $400 per NCU
  o Is one core
  o In fully functional system (RAM, disk, shared file system)
  o For 5 years

HPC Summit

Investor Model

• Optional Storage Unit
  o $140 per OSU
  o 1 TB of file storage (RAID) on one of a few global parallel file systems (Lustre)
  o For 1 year

HPC Summit

Other options

• Hosted system
  o Buy all hardware, we operate
  o No sharing
• Pay as you go
  o Agree to pay monthly bill
  o Equivalent (almost) to $400 NCU prorated on a monthly basis
• Or rates are $0.009 (about 0.9 cents) per hour
  o Cheaper than Amazon Elastic Compute Cloud
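
The investor-model prices can be turned into an hourly rate with simple arithmetic. Dividing the $400 NCU price by five years of hours gives roughly $0.009 (about 0.9 cents) per core-hour, which is presumably what the pay-as-you-go rate on this slide refers to. The sketch below shows that calculation and a sample investment. The helper names and the example of 64 cores with 10 TB for five years are hypothetical, and the hourly rate assumes the core is available around the clock for the full term.

```python
HOURS_PER_YEAR = 365 * 24   # assumption: 8,760-hour year

NCU_PRICE_USD = 400         # from the slides: one core in a full system
NCU_TERM_YEARS = 5          # from the slides: NCU term
OSU_PRICE_USD = 140         # from the slides: 1 TB of storage for 1 year


def ncu_cost_per_core_hour(price=NCU_PRICE_USD, years=NCU_TERM_YEARS):
    """Effective price of one core-hour if the core is used for the full term."""
    return price / (years * HOURS_PER_YEAR)


def investment_cost(cores, terabytes, storage_years):
    """Hypothetical helper: NCUs bought once, OSUs renewed every year."""
    return cores * NCU_PRICE_USD + terabytes * storage_years * OSU_PRICE_USD


rate = ncu_cost_per_core_hour()
print(f"NCU rate: ${rate:.4f} per core-hour ({rate * 100:.2f} cents)")
print(f"64 cores + 10 TB for 5 years: ${investment_cost(64, 10, 5):,}")
```

Any comparison with a commercial cloud depends on utilization: the per-hour figure only holds if the purchased cores are actually kept busy.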

www.ccs.miami.edu

Mission Statement

• UM CCS is establishing nationally and internationally recognized research programs, focusing on those of an interdisciplinary nature, and actively engaging in computational research to solve the complex technological problems of modern society. We provide a framework for promoting collaborative and multidisciplinary activities across the University and beyond.

CCS overview

• Started in June 2007
• Faculty Senate approval in 2008
• Four Founding Schools: A&S, CoE, RSMAS, Medical
• Offices on all campuses
• ~30 FTEs
• Data Center at the NAP of the Americas

UM CCS Research Programs and Cores

Physical Science & Engineering

Computational Biology & Bioinformatics

Data Mining

Visualization

Computational Chemistry

Software Engineering

High Performance Computing

Social Systems Informatics

Quick Facts

• Over 1,000 UM users
• 5,200 cores of Linux-based cluster
• 1,500 cores of Power-based cluster
• ~2.0 PB of storage
• 4.0 PB of backup
• More at:
  o http://www.youtube.com/watch?v=JgUNBRJHrC4
  o www.ccs.miami.edu

High Performance Computing

• UM-wide resource provides academic community & research partners with comprehensive HPC resources:
  o Hardware & Scientific Software Infrastructure
  o Expertise in Designing & Implementing HPC Solutions
  o Designing & Porting Algorithms & Programs to Parallel Computing Models
• Open access to compute processing (first come, first served)
  o Peer Review for large projects – Allocation Committee
  o Cost Center for priority access
• HPC services
  o Storage Cloud
  o Visualization and Data Analysis Cloud
  o Processing Cloud