Major Systems at ANL
Bill Gropp
www.mcs.anl.gov/~gropp (standing in for Remy Evard)
Current User Facilities

Chiba City – Linux Cluster for Scalability
• OASCR funded. Installed in 1999.
• 512 CPUs, 256 nodes, Myrinet, 2 TB storage.
• Mission: address scalability issues in system software, open source software, and applications code.

Jazz – Linux Cluster for ANL Applications
• ANL funded. Installed in 2002. Achieved 1.1 TF sustained.
• 350 CPUs, Myrinet, 20 TB storage.
• Mission: support and enhance the ANL application community. 50 projects. On the DOE Science Grid.

TeraGrid – Linux Cluster for NSF Grid Users
• NSF funded as part of DTF and ETF.
• 128 IA-64 CPUs for computing, 192 IA-32 CPUs for visualization.
• Mission: production grids, grid application code, visualization services.
Current Testbeds

Advanced Architectures Testbed
• ANL LDRD funded. Established in 2002.
• Experimental systems: FPGAs, hierarchical architectures, ...
• Mission: explore programming models and hardware architectures for future systems.

Grid and Networking Testbeds
• I-WIRE: Illinois-funded dark fiber.
• Participation in a large number of Grid projects. Facilities at ANL include DataGrid, the Distributed Optical Testbed, and others.
• Mission: Grids and networks as an enabling technology for petascale science.

Visualization and Collaboration Facilities
• AccessGrid, ActiveMural, Linux CAVE, others.
Chiba City – the Argonne Scalable Cluster
http://www.mcs.anl.gov/chiba/
[Photo: one of the two rows of Chiba City]
• 256 computing nodes, 512 PIII CPUs.
• 32 visualization nodes.
• 8 storage nodes with 4 TB of disk.
• Myrinet interconnect.
• Mission: scalability and open source software testbed.
Systems Software Challenges
• Scale invariance
  • System services need to scale to arbitrarily large systems (e.g., I/O, scheduling, monitoring, process management, error reporting, diagnostics).
  • Self-organizing services provide one path to scale invariance.
• Fault tolerance
  • System services need to provide sustained performance in spite of hardware failures.
  • No single point of control; peer-to-peer redundancy (a toy sketch follows this list).
• Autonomy
  • System services should be self-configuring, auto-updating, and self-monitoring.
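To make "no single point of control" concrete, here is a toy sketch of peer-to-peer heartbeat monitoring: every node runs the same monitor and tracks every other node, so losing any one machine loses no central authority. This is not ANL's actual software; all names and thresholds are illustrative.

```python
# Toy sketch of peer-to-peer failure detection: every node runs the
# same monitor, so there is no central master to lose. Thresholds and
# names are illustrative, not taken from any ANL system.
import time

FAILURE_TIMEOUT = 3.0  # seconds of silence before a peer is suspected


class PeerMonitor:
    """One instance runs on each node; peers monitor each other."""

    def __init__(self, node_id, peers):
        self.node_id = node_id
        # Timestamp of the last heartbeat heard from each other node.
        self.last_seen = {p: time.time() for p in peers if p != node_id}

    def record_heartbeat(self, peer):
        """Called whenever a heartbeat message arrives from `peer`."""
        self.last_seen[peer] = time.time()

    def suspected_failures(self):
        """Return peers this node has not heard from recently.

        Suspicion here is purely local; a real service would exchange
        these lists so that a majority of peers must agree before a
        node is declared dead (peer-to-peer redundancy)."""
        now = time.time()
        return [p for p, t in self.last_seen.items()
                if now - t > FAILURE_TIMEOUT]
```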
Testbed Uses
• System Software
• MPI Process Management (see the sketch after this list)
• Parallel Filesystems
• Cluster Distribution Testing
• Network Research
• Virtual Node Tests
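For a flavor of the MPI process-management testing, here is a minimal startup smoke test. It is a sketch only: mpi4py is used for brevity (the testbed work itself was not necessarily written this way), and the script simply verifies that every launched process reports in.

```python
# Minimal MPI startup smoke test (sketch; mpi4py used for brevity).
# Run with, e.g.:  mpiexec -n 256 python mpi_smoke.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Every rank reports its hostname to rank 0.
hosts = comm.gather(MPI.Get_processor_name(), root=0)

if rank == 0:
    # If process management worked, we hear from all `size` ranks.
    assert len(hosts) == size, "some processes failed to start"
    print(f"{size} processes started on {len(set(hosts))} hosts")
```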
Testbed Software Development
• Largely based on the SSS (Scalable Systems Software) component architecture and interfaces.
• Existing resource management software did not meet our needs.
• The SSS component architecture allowed easy substitution of system software where required.
• Simple interfaces allow fast implementation of custom components (e.g., the resource manager); a hedged example follows this list.
• The open architecture allows extra components to be implemented for local requirements (e.g., file staging).
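The "simple interfaces" point is what makes custom components cheap: a component only has to speak the common message protocol and announce itself to the service directory. The sketch below shows the general shape of such a registration; the XML message format, host, port, and names are assumptions for illustration, not the actual SSS wire protocol.

```python
# Illustrative sketch of an SSS-style component announcing itself to
# a service directory. The message schema, host, and port below are
# hypothetical; the real SSS protocol defines its own XML formats.
import socket

SERVICE_DIRECTORY = ("sd.cluster.example", 5942)  # hypothetical address


def register(component, host, port):
    """Tell the service directory where this component can be reached."""
    msg = (f'<register component="{component}" '
           f'host="{host}" port="{port}"/>')
    with socket.create_connection(SERVICE_DIRECTORY) as sock:
        sock.sendall(msg.encode())
        return sock.recv(4096).decode()  # directory's acknowledgement


if __name__ == "__main__":
    # e.g., a locally required file-staging component (hypothetical)
    print(register("file-stager", "node042", 7001))
```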
Chiba City Implementation
[Diagram: SSS component architecture; components marked * in the original figure]
• Meta services: Meta Scheduler, Meta Monitor, Meta Manager
• Components: Service Directory*, Event Manager*, Communication Library*, Scheduler*, Job Queue Manager*, Process Manager*, Node State Manager*, Node Configuration & Build Manager*, Hardware Infrastructure Manager*, Allocation Management*, Accounting, Usage Reports, System & Job Monitor, Checkpoint/Restart, Validation & Testing
Software Deployment Testing
• Beta software run in production:
  • Testbed software stack
  • Configuration management tools
  • Global process manager
• Cluster distribution installation testing
• Friendly users provide useful feedback during the development process.
The ANL LCRC Computing Cluster (Jazz)
http://www.lcrc.anl.gov
• 350 computing nodes: 2.4 GHz Pentium IV (50% with 2 GB RAM, 50% with 1 GB RAM), 80 GB local scratch disk, Linux.
• 10 TB global working disk: 8 dual 2.4 GHz Pentium IV servers, 10 TB SCSI JBOD disks, PVFS file system.
• 10 TB home disk: 8 dual 2.4 GHz Pentium IV servers, 10 TB Fibre Channel disks, GFS between the servers, NFS to the nodes.
• Network: Myrinet 2000 to all systems, Fast Ethernet to the nodes, GigE aggregation, 1 Gb link to ANL.
• Support: 4 front-end nodes (2x 2.4 GHz PIV) and 8 management systems.
LCRC enables analysis of complex systems
• Regional Aerosol Impacts
• 3D Numerical Reactor
• Spatio-Temporal Chaos
• Catalysis in Nanoporous Materials

LCRC enables studies of system dynamics
• Neocortical Seizure Simulation
• Aerodynamic Drag for Heavy Vehicles
• Sediment Transport
• Lattice Quantum Chromodynamics
Jazz Usage – Capacity and Load
[Chart: percent of Jazz allocated over time, 0–100%]
• We've reached the practical capacity limit given the job mix.
• There are always jobs in the queue. Wait time varies enormously, averaging ~1 hour.
Jazz Usage – Accounts
[Chart: new and total user accounts, 10-Dec-02 through 25-Nov-03, 0–300 accounts]
• Constant growth of ~15 new users a month.
Jazz Usage – Projects
[Chart: new and total projects, February through September, 0–45 projects]
• Steady addition of ~6 new projects a month.
FY2003 LCRC Usage by Domain – A wide range of lab missions
[Chart: FY2003 CPU hours by domain, 0–200,000 CPU hours. Domains: Physics, Biology/Biosciences, Climate Modeling, Applied Mathematics, Nanotechnology/Nanosciences, Software Development/Tools, Chemistry, Other Engineering Applications, Grid Development & Applications, Environmental Science, Nuclear Engineering, Support, Material Sciences, Geology, Startup]
Jazz Usage by Domain over Time
[Chart: monthly CPU hours by domain, April through October, 0–180,000 CPU hours. Domains: Applied Math, Biosciences, Chemistry, Climate, Computer Science, Engineering, Environmental, Geology, Grid Computing, Material Sciences, Nanosciences, Nuclear Engineering, Physics, Software Tools, System Support, Startup]
Jazz Usage – Large Projects (>5,000 hours)
[Chart: CPU hours by project. Projects: Chaos (MCS), Aerosols (ER), Climate (MCS), QMC (PHY), Sediment (MCS), Neocortex Simulation (MCS), Protein (NE), Startup Projects, Ptools, Nanocatalysis (CNM), Lattice QCD (HEP), Compnano (CNM), Heights EUV (ET), COLUMBUS (CHM), Numerical Reactor (NE), PETSc, Foam (MCS)]
ETF Hardware Deployment, Fall 2003
http://www.teragrid.org
[Diagram: hardware at the five ETF sites]
• ANL: 96 Pentium4 + 64 2p Madison, Myrinet; 96 GeForce4 graphics pipes; 20 TB storage.
• Caltech: 32 Pentium4 + 52 2p Itanium2 + 20 2p Madison, Myrinet; 100 TB DataWulf.
• NCSA: 256 2p Itanium2 + 670 2p Madison, Myrinet; 230 TB FCS SAN.
• SDSC: 128 2p Itanium2 + 256 2p Madison, Myrinet; 500 TB FCS SAN; 1.1 TF Power4 Federation.
• PSC: 750 4p Alpha EV68, Quadrics; 128p EV7 Marvel.
• Also shown: 16 2p (ER) Itanium2 with Quadrics; 4p vis; 75 TB storage.
ETF ANL: 1.4 TF Madison/Pentium IV, 20 TB, Viz
[Diagram: ANL TeraGrid system layout]
• Visualization: 0.9 TF Pentium IV; 96 nodes, each 2p 2.4 GHz with 4 GB RAM, 73 GB disk, and a Radeon 9000; 96 visualization streams; viz devices and network viz; viz I/O over Myrinet and/or GbE (250 MB/s/node * 96 nodes).
• Compute: 0.5 TF Madison; 64 nodes, each 2p Madison with 4 GB memory and 2x73 GB disk; 250 MB/s/node * 64 nodes.
• Storage: 20 TB on storage nodes (2p 2.4 GHz, 4 GB RAM, 8 with 2x FC); storage I/O over Myrinet and/or GbE.
• Interactive nodes: login and FTP on 4 2p PIV nodes and 4 4p Madison nodes.
• Fabric: Myrinet fabric plus GbE fabrics; 30 Gbps to the TeraGrid network.