JASMIN/CEMS and EMERALD
Scientific Computing Developments at STFC
Peter Oliver, Martin Bly
Scientific Computing Department, October 2012
19th October 2012 JASMIN/CEMS and EMERALD - HEPiX Fall 2012, Beijing
Outline
• STFC
• Compute and Data
• National and International Services
• Summary
• Isaac Newton Group of Telescopes, La Palma
• UK Astronomy Technology Centre, Edinburgh
• Polaris House, Swindon, Wiltshire
• Chilbolton Observatory, Stockbridge, Hampshire
• Daresbury Laboratory, Daresbury Science and Innovation Campus, Warrington, Cheshire
• Joint Astronomy Centre, Hawaii
• Rutherford Appleton Laboratory, Harwell Oxford Science and Innovation Campus
What we do…
• The nuts and bolts that make it work
• Enable scientists, engineers and researchers to develop world-class science, innovation and skills
SCARF
• Providing resources for STFC facilities, staff and their collaborators
   – ~2700 cores
   – Infiniband interconnect
   – Panasas filesystem
   – Managed as one entity
   – ~50 peer-reviewed publications/year
• Additional capacity added each year for general use
• Facilities such as CLF add capacity using their own funds
• National Grid Service partner
   – Local access using MyProxy-SSO: users log in with their federal ID and password
   – UK e-Science Certificate access
• NSCCS (National Service for Computational Chemistry Software)
• Providing national and international compute, training and support
• EPSRC Mid-Range Service
   – SGI Altix UV SMP system, 512 CPUs, 2TB shared memory
   – Large-memory SMP chosen over a traditional cluster as this best suits the computational chemistry applications
• Supports over 100 active users
   – ~70 peer-reviewed papers per year
   – Over 40 applications installed
• Authentication using NGS technologies
• Portal to submit jobs
   – Access for less computationally aware chemists
Tier-1 Architecture
[Diagram: CPU farm and CASTOR storage pools (ATLAS, CMS, LHCb, Gen) with SJ5 and OPN network links]
• >8000 processor cores
• >500 disk servers (10PB)
• Tape robot (10PB)
• >37 dedicated T10000 tape drives (A/B/C)
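As a rough sanity check, the disk figures above imply the average capacity of a Tier-1 disk server. This is a simple division using the slide's numbers; actual server sizes in the farm vary by generation.

```python
# Average capacity per Tier-1 disk server, implied by the slide's
# figures (10 PB spread over >500 servers). Illustrative average only.
disk_pb = 10
servers = 500
tb_per_server = disk_pb * 1000 / servers  # ~20 TB per server on average
print(tb_per_server)
```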
E-infrastructure South
• Consortium of UK universities: Oxford, Bristol, Southampton, UCL
• Formed the Centre for Innovation, with STFC as a partner
• Two new services (£3.7M)
   – IRIDIS – Southampton – x86-64
   – EMERALD – STFC – GPGPU cluster
• Part of a larger investment in e-infrastructure
   – A Midland Centre of Excellence (£1M), led by Loughborough University
   – West of Scotland Supercomputing Centre for Academia and Industry (£1.3M), led by the University of Strathclyde
   – E-Infrastructure Interconnectivity (£2.58M), led by the University of Manchester
   – MidPlus: A Centre of Excellence for Computational Science, Engineering and Mathematics (£1.6M), led by the University of Warwick
• Providing resources to the consortium and partners
• Consortium of UK universities: Oxford, Bristol, Southampton, UCL, STFC
• Largest production GPU facility in the UK
   – 372 NVIDIA Tesla M2090 GPUs
• Scientific applications
   – Still under discussion; computational chemistry front runners:
      • AMBER, NAMD, GROMACS, LAMMPS
   – Eventually 100s of applications covering all sciences
EMERALD
EMERALD HARDWARE I
• 6 racks
• 15 x SL6500 chassis
   – 4 x GPU compute nodes, each 2 x CPUs and 3 x NVIDIA M2090 GPUs = 8 CPUs & 12 GPUs per chassis, power ~3.9kW
   – SL6500 scalable line chassis, 4 x 1200W power supplies, 4 fans
   – 4 x 2U, half-width SL390s servers
• SL390s nodes
   – 2 x Intel E5649 (2.53GHz, 6 cores, 80W)
   – 3 x NVIDIA M2090 GPGPUs (512 CUDA cores each)
   – 48GB DDR3 memory
   – 1 x 146GB 15k SAS drive
   – HP QDR Infiniband & 10GbE ports
   – Dual 1Gb network ports
EMERALD HARDWARE II
• 12 x SL6500 chassis
   – 2 x GPU compute nodes, each 2 x CPUs and 8 x NVIDIA M2090 GPUs = 4 CPUs & 16 GPUs per chassis, power ~4.6kW
   – SL6500 scalable line chassis, 4 x 1200W power supplies, 4 fans
   – 2 x 4U, half-width SL390s servers
• SL390s nodes
   – 2 x Intel E5649 (2.53GHz, 6 cores, 80W)
   – 8 x NVIDIA M2090 GPGPUs (512 CUDA cores each)
   – 96GB DDR3 memory
   – 1 x 146GB 15k SAS drive
   – HP QDR Infiniband & 10Gb Ethernet
   – Dual 1Gb network ports
EMERALD
• System applications
   – Red Hat Enterprise Linux 6.x
   – Platform LSF
   – CUDA toolkit, SDK and libraries
   – Intel and Portland compilers
• Scientific applications
   – Still under discussion; computational chemistry front runners:
      • AMBER, NAMD, GROMACS, LAMMPS
   – Eventually 100s of applications covering all sciences
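To make the software stack concrete, a GPU job on a Platform LSF cluster like this might be submitted with a script along the following lines. This is a hypothetical sketch: the queue name, module name and AMBER invocation are illustrative assumptions, not taken from the slides.

```shell
#!/bin/bash
# Hypothetical LSF job script for one 3-GPU SL390s node.
# Queue name, module name and input files are ASSUMED for illustration.
#BSUB -q emerald          # assumed queue name
#BSUB -n 12               # one full node: 2 x 6-core E5649
#BSUB -J amber_gpu_test   # job name
#BSUB -o %J.out           # stdout, %J expands to the job ID
#BSUB -e %J.err           # stderr

# Load the CUDA toolkit mentioned on the slide (module name assumed)
module load cuda

# Run AMBER's GPU MPI engine across the node's 3 M2090s (illustrative)
mpirun -np 3 pmemd.cuda.MPI -O -i mdin -p prmtop -c inpcrd
```

The same pattern extends to the 8-GPU nodes by changing the core and rank counts.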
EMERALD
• Managing a GPU cluster
   – GPUs are more power efficient and give more Gflops/Watt than x86_64 servers
   – Reality… true… but each 4U chassis draws ~1.2kW per U of rack space
      • A full rack requires 40+ kW
      • Hard to cool: additional in-row coolers, cold-aisle containment
   – Uneven power demand stresses the air-conditioning and power infrastructure
      • A 240-GPU job takes the cluster from 31kW idle to 80kW instantly
• Measured GPU parallel MPI job (HPL) using 368 GPUs: ~1.4 Gflops/W
• Measured X5675 cluster parallel MPI job (HPL): ~0.5 Gflops/W
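The power-density and efficiency figures above can be sanity-checked with a few lines of arithmetic. The HPL Rmax value below is an assumed illustrative number chosen to be consistent with the quoted ~1.4 Gflops/W; it is not stated on the slides.

```python
# Power density: a 16-GPU SL6500 chassis draws ~4.6 kW in 4U of rack space.
chassis_kw = 4.6
chassis_u = 4
kw_per_u = chassis_kw / chassis_u       # ~1.15 kW per U, matching "~1.2 kW/U"
rack_kw = kw_per_u * 42                 # a full 42U rack -> ~48 kW ("40+ kW")

# Efficiency: Gflops/W = HPL Rmax / measured power draw under load.
load_watts = 80_000                     # measured cluster draw (slide figure)
rmax_gflops = 112_000                   # ASSUMED HPL result, illustrative only
gflops_per_watt = rmax_gflops / load_watts

print(round(kw_per_u, 2), round(rack_kw, 1), round(gflops_per_watt, 1))
```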
CEDA data storage & services
• Curated data archive
• Archive management services
• Archive access services (HTTP, FTP, helpdesk, …)
Data-intensive scientific computing
• Global and regional datasets & models
• High spatial and temporal resolution
• Private cloud
Flexible access to high-volume & complex data for the climate & earth observation communities
• Online workspaces
• Services for sharing & collaboration
JASMIN/CEMS
• Deadline (or funding gone!): 31st March 2012 for “doing science”
• Government procurement: £5M tender to order in < 4 weeks
• Machine-room upgrades and the large cluster competed for time
• Bare floor to operation in 6 weeks
• 6 hours from power-off to 4.6PB of ActiveStor 11 mounted at RAL
• “Doing science” by 14th March
• 3 satellite-site installs in parallel (Leeds 100TB, Reading 500TB, ISIC 600TB)
[Timeline: Oct 2011 BIS funds … tender … order … build … network … complete 8-Mar-2012]
JASMIN/CEMS
JASMIN/CEMS at RAL
[Floor plan: 12 JASMIN racks arranged in an enclosed cold aisle with 30kW in-row cooling units between them]
- 12 racks with mixed servers and storage
- 15kW/rack peak (180kW total)
- Enclosed cold aisle + in-aisle cooling
- 600kg/rack (7.2 tonnes total)
- Distributed 10Gb network (1 Terabit/s bandwidth)
- Single 4.6PB global file system
- Two VMware vSphere pools of servers with dedicated image storage
- 6 weeks from bare floor to working 4.6PB
JASMIN / CEMS Infrastructure
Configuration:
Storage: 103 Panasas ActiveStor 11 shelves (2,208 x 3TB drives total)
Computing: “cloud” of hundreds of virtual machines hosted on 20 Dell R610 servers
Networking: 10Gb Gnodal throughout; “lightpath” dedicated links to UK and EU supercomputers
Physical: 12 racks; enclosed aisle, in-row chillers
Capacity: 4.6PB usable at RAL (6.6PB raw), equivalent to 920,000 DVDs (a 1.47km-high tower of DVDs)
High performance: 1.03Tb/s total storage bandwidth, equivalent to copying 1,500 DVDs per minute
Single namespace: one single file system, managed as one system
Status: the largest Panasas system in the world and one of the largest storage deployments in the UK
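The DVD equivalences above can be checked with a few lines of arithmetic, assuming 5GB per DVD and a 1.6mm per-disc stack height, which are the figures the slide's numbers imply:

```python
PB = 1e15  # bytes
GB = 1e9

dvd_bytes = 5 * GB        # assumed DVD capacity
dvd_thickness_m = 0.0016  # assumed per-disc height in the stack

usable = 4.6 * PB
dvds = usable / dvd_bytes                 # -> 920,000 DVDs
tower_km = dvds * dvd_thickness_m / 1000  # -> ~1.47 km

# Bandwidth: 1.03 Tb/s sustained for one minute, expressed in DVDs.
bits_per_min = 1.03e12 * 60
dvds_per_min = bits_per_min / 8 / dvd_bytes  # -> ~1,545, i.e. ~1,500/min

print(int(dvds), round(tower_km, 2), int(dvds_per_min))
```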
JASMIN/CEMS Networking
• Gnodal 10Gb networking
   – 160 x 10Gb ports in a 4 x GS4008 switch stack
• Compute
   – 23 Dell servers for VM hosting (VMware vCenter + vCloud) and HPC access to storage
   – 8 Dell servers for compute
   – Dell EqualLogic iSCSI arrays (VM images)
   – All 10Gb connected
• 10Gb network already upgraded to add 80 more Gnodal 10Gb ports for compute expansion
What is Panasas Storage?
• “A complete hardware and software storage solution”
• Ease of management
   – Single management console for 4.6PB
• Performance
   – Parallel access via DirectFlow, NFS, CIFS
   – Fast parallel reconstruction
• ObjectRAID
   – All files stored as objects
   – RAID level set per file
   – Vertical, horizontal and network parity
• Distributed parallel file system
   – Parts (objects) of files on every blade
   – All blades transmit/receive in parallel
• Global namespace
• Battery UPS
   – Enough to shut down cleanly
• 1 x 10Gb uplink per shelf
   – Performance scales with size
[Photo: Panasas shelf showing the director blade and storage blades]
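The idea behind ObjectRAID's per-file striping with parity can be illustrated with a short sketch. This is a conceptual toy only, not Panasas's actual on-disk format: a file is split into component objects spread across blades, plus an XOR parity object, so any single lost object can be rebuilt from the survivors.

```python
# Conceptual sketch of RAID-5-style per-file striping with XOR parity,
# in the spirit of ObjectRAID. Not Panasas's real algorithm.
from functools import reduce

def stripe(data: bytes, n: int):
    """Split data into n equal-size padded objects plus one parity object."""
    size = -(-len(data) // n)  # ceiling division
    objs = [data[i*size:(i+1)*size].ljust(size, b"\0") for i in range(n)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*objs))
    return objs, parity

def rebuild(objs, parity, lost: int):
    """Reconstruct the lost object by XOR-ing the survivors with parity."""
    survivors = [o for i, o in enumerate(objs) if i != lost] + [parity]
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))

data = b"JASMIN stores parts of every file on every blade"
objs, parity = stripe(data, 4)
assert rebuild(objs, parity, 2) == objs[2]  # lose blade 2, rebuild its object
```

Because every blade holds part of every file, reads and rebuilds proceed in parallel across all blades, which is why reconstruction speed and bandwidth scale with system size.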
PanActive Manager
Panasas in Operation
• Reliability
   – 1133 blades
   – 206 power supplies
   – 103 shelf network switches
   – 1442 components in total
• Soak testing revealed 27 faults; in operation, 7 faults
   – No loss of service
   – ~0.6% failure per year, compared to ~5% per year for commodity storage
• Performance
   – Random IO: 400MB/s per host
   – Sequential IO: 1GB/s per host
• External performance
   – 10Gb connected
   – Sustained 6Gb/s
Infrastructure Solutions / Systems Management
• Backups
   – System and user data
• SVN
   – Codes and documentation
• Monitoring
   – Ganglia, Cacti, power management
• Alerting
   – Nagios
• Security
   – Intrusion detection, patch monitoring
• Deployment
   – Kickstart, LDAP, inventory database
• VMware
   – Server consolidation, extra resilience
   – 150+ virtual servers supporting all e-Science activities
   – Development cloud
e-Infrastructures
• Lead role in national and international e-infrastructures
• Authentication
   – Lead and develop the UK e-Science Certificate Authority
      • ~30,000 certificates issued in total; ~3,000 currently active
   – Easy integration with the UK Access Management Federation
• Authorisation
   – Use existing EGI tools
• Accounting
   – Lead and develop EGI APEL accounting
      • 500M records, 400GB of data
      • ~282 sites publish records
      • ~12GB/day loaded into the main tables
      • Detailed records usually kept for 13 months, with summary data back to 2003
• Integrated into existing HPC-style services
e-Infrastructures
• Lead role in national and international e-infrastructures
• User management
   – Lead and develop the NGS UAS service
      • Common portal for project owners
      • Manage project and user allocations
      • Display trends, make decisions (policing)
• Information: what services are available?
   – Lead and develop the EGI information portal GOCDB
      • 2180 registered GOCDB users belonging to 40 registered NGIs
      • 1073 registered sites hosting a total of 4372 services
      • 12663 downtime entries entered via GOCDB
• Training & support
   – Training Marketplace: a tool developed to promote training opportunities, resources and materials
   – SeIUCCR Summer Schools: supporting 30 students on a one-week course (120 applicants)
Summary
• High-performance computing and data
   – SCARF, NSCCS, JASMIN, EMERALD, GridPP Tier-1
• Managing e-infrastructures
   – Authentication, authorisation, accounting
   – Resource discovery
   – User management, help and training
Information
• Website: http://www.stfc.ac.uk/SCD
• Contact: Pete Oliver, peter.oliver at stfc.ac.uk
Questions?