CERN openlab Board of Sponsors Update on Computing at CERN
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
CERN openlab Board of Sponsors Update on Computing at CERN
Frédéric Hemmer, IT Department Head
CERN, 2 July 2010
Storage & Data Management (I)
• LHC is just starting to take data
– Modest experience in distributing and analyzing the data worldwide
• Already some concerns wrt. performance and scalability
– Initial assumptions in computing models that the network was the limitation
– Data Management software “home grown” and too complicated
• Concerns about the long term sustainability
• However
– Network, (global) file systems, storage have evolved
Storage & Data Management (II)
• Just starting to assemble ideas
– IT PoW 11/2009
http://indico.cern.ch/getFile.py/access?sessionId=16&resId=0&materialId=2&confId=68463
– WLCG Data Management Jamboree
http://indico.cern.ch/conferenceDisplay.py?confId=82919
• Other communities have similar or even larger scale problems
– Biology, SKA, etc...
– EIROLabs
• European Commission is launching calls centred around data infrastructure
– Opportunities for further collaborations
Tier 0 – Tier 1 – Tier 2
Tier-0 (CERN):
• Data recording
• Initial data reconstruction
• Data distribution
Tier-1 (11 centres):
• Permanent storage
• Re-processing
• Analysis
Tier-2 (~130 centres):
• Simulation
• End-user analysis
WLCG Status
• WLCG running increasingly high workloads:
– ~1 million jobs/day
• Real data processing and re-processing
• Physics analysis
• Simulations
– ~100 k CPU-days/day
• Unprecedented data rates
– Tier-0 data traffic: > 4 GB/s input, > 13 GB/s served
– Data export during data taking: according to expectations on average
– Traffic on OPN up to 70 Gb/s! (ATLAS reprocessing campaigns)
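The rates quoted on this slide mix bits and bytes, so a small conversion helps compare them. The following is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes), 8 bits per byte and, purely for illustration, that a quoted rate is sustained over a full day. The function names are hypothetical and do not come from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def gbit_to_gbyte_per_s(gbit_per_s):
    """Convert a rate in Gb/s (gigabits) to GB/s (gigabytes)."""
    return gbit_per_s / 8


def daily_volume_tb(gbyte_per_s):
    """Volume in TB moved in one day at a constant rate in GB/s (assumed sustained)."""
    return gbyte_per_s * SECONDS_PER_DAY / 1000


opn_peak_gbyte_s = gbit_to_gbyte_per_s(70)  # OPN peak of 70 Gb/s, in GB/s
tier0_input_tb_day = daily_volume_tb(4)     # Tier-0 input at 4 GB/s
tier0_served_tb_day = daily_volume_tb(13)   # Tier-0 served at 13 GB/s

print(f"OPN peak: {opn_peak_gbyte_s:.2f} GB/s")
print(f"Tier-0 input: {tier0_input_tb_day:.1f} TB/day")
print(f"Tier-0 served: {tier0_served_tb_day:.1f} TB/day")
```

Under these assumptions the 70 Gb/s OPN peak corresponds to 8.75 GB/s, and 4 GB/s held for a full day would amount to roughly 346 TB. The slide gives the Tier-0 figures as lower bounds ("> 4 GB/s"), so the daily volumes are indicative only.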
The CERN “Tier-0” in numbers
• Data Centre Operations (Tier 0)
– 24x7 operator support and System Administration services to support 24x7 operation of all IT services.
– Hardware installation & retirement
• ~7,000 hardware movements/year; ~1000 disk failures/year
– Management and Automation framework for large scale Linux clusters
• Assets
Servers                  8,076
Processors               13,802
Cores                    50,855
HEPSpec06                359,431
Disks                    53,728
Raw disk capacity (TB)   45,331
Memory modules           48,794
RAID controllers         3,518
Processor models (pie chart):
Xeon 5150    2%
Xeon 5160    10%
Xeon E5335   7%
Xeon E5345   14%
Xeon E5405   6%
Xeon E5410   16%
Xeon L5420   8%
Xeon L5520   33%
Xeon 3GHz    4%
Disk vendors (pie chart):
Fujitsu           3%
Hitachi           23%
HP                0%
Maxtor            0%
Seagate           15%
Western Digital   59%
Other             0%
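A few ratios can be derived from the asset counts on this slide. The sketch below is illustrative only: it assumes the "~1000 disk failures/year" figure refers to the same population as the 53,728 disks listed under Assets, which the slide does not state explicitly. The variable names are hypothetical.

```python
# Figures copied from the slide
servers = 8_076
cores = 50_855
hepspec06 = 359_431
disks = 53_728
disk_failures_per_year = 1_000  # "~1000 disk failures/year" (approximate)

cores_per_server = cores / servers
hepspec06_per_core = hepspec06 / cores
# Assumption: failures are counted against the same disk population
annual_disk_failure_rate = disk_failures_per_year / disks
failures_per_day = disk_failures_per_year / 365

print(f"Cores per server: {cores_per_server:.2f}")
print(f"HEPSpec06 per core: {hepspec06_per_core:.2f}")
print(f"Annual disk failure rate: {annual_disk_failure_rate:.2%}")
print(f"Disk failures per day: {failures_per_day:.1f}")
```

With these inputs the average server has about 6.3 cores, and the implied annual disk failure rate is just under 2%, or close to three failed disks per day.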
Some concerns
• Complex, aging infrastructure with many players
– (Too) much diversity
• Too many hardware movements
– Operations teams overloaded
– How to handle remote operations
• Virtualization
– Handling 10x increase in host numbers
– Many technologies
– Security (signatures)
• Systems Management
– Many home made developments
– Significant cost in software development/maintenance
Tier-0 Power needs estimates
[Chart not reproduced in this transcript; labelled “May 2010”]
Plans for 2010-2012
• Consolidation of the existing 513 capacity
– 600 kW of backed-up power
– 3.5 MW of Physics capacity
• Provision of “container” style of capacity
– Incremental addition of ~400 kW units
• Investigations of remote capacity usage
– As a logical extension of 513, not as Grid/Cloud capacity
– Already experimenting with 100 kW in the Geneva area
– But is this really (economically) feasible?
• Many challenges
– Timing, operation models, costs
Other topics/concerns/questions
• Computer Security
– On-site – off-site
• Identity Management & Single Sign on
– Including multi-factor authentication
• Cloud Computing
– Can we make use of it? Economically?
• Critical data protection – backups/restores
– Some systems at the limit – O(10**9) files
• Content Management Systems & Enterprise Search
• ITIL
– Slow progress – as expected
• Software licenses (cost and models)
– Common concern with EIROs
• Wireless coverage and deployment
– (unreasonable) expectations from users
• Further progress in automation
– Use of industry solutions?
• Global File Systems
– Are there alternatives to AFS?
• Wide area/high speed networking evolution
Collaboration with Institutions: UNOSAT
BACKGROUND INFORMATION
CERN IT Department
Outline
• General Services
• Collaborative Tools
– CDS/Invenio
– Indico
• Networking
– Internal
– Wireless
– External
– CIXP
• Computer Security
– CNIC/PLC
– Spam
– Intrusion Detection
• Grid Computing
– Tier-0
– WLCG
– Data Storage
• EC Projects
• Openlab
– Competency Centres
– Workshops
– Summer Student program
• UNOSAT
IT Organization 2010
Director of Research and Computing: Sergio Bertolucci
Department Head: Frédéric Hemmer
Deputy Head: David Foster
Planning Officer: Alan Silverman
WLCG: Ian Bird
EU Projects: Bob Jones
Communication Systems (CS): Jean-Michel Jouanigot
Departmental Infrastructure (DI): Alan Silverman
Database Services (DB): Tony Cass
Experiment Support (ES): Jamie Shiers
Computing Facilities (CF): Wayne Salter
Operating Systems & Information Services (OIS): Christian Isnard
Data & Storage Services (DSS): Alberto Pace
User & Document Services (UDS): Tim Smith
Platform & Engineering Services (PES): Helge Meinhard
Grid Technology (GT): Markus Schulz
CERN openlab
CERN School of Computing
Computer Security
Department Heads Office (DHO)
IT Staff Breakdown
Cat 2: 73%
Cat 3: 18%
Cat 5A: 1%
Cat 5B: 7%
Cat 5C: 0%
Total: 230 Staff (13-Jan-2010)
43 Fellows
21 Technical Students
10 Project Associates
1 Doctoral Student
General Services
• Data Centre Operations (Tier 0)
– 24x7 operator support and System Administration services to support 24x7 operation of all IT services.
– Hardware installation & retirement (~7,000 hardware movements/year)
– Management and Automation framework for large scale Linux clusters
– Installed Capacity
• 6’300 systems, 39’000 processing cores
– CPU servers, disk servers, infrastructure servers
• 13’900 TB usable on 42’600 disk drives
• 34’000 TB on 45’000 tape cartridges
– (56’000 slots), 160 tape drives
– Tenders in progress or planned (estimates)
• 2’400 systems, 16’000 processing cores
– 19’000 TB usable on 20’000 disk drives
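To see what the tenders would mean relative to the installed base, the figures on this slide can be combined as below. This is a rough sketch that assumes the tendered equipment is purely additive (no retirements), which the slide does not say. The dictionary keys are hypothetical labels.

```python
# Figures copied from the slide; keys are illustrative labels
installed = {"systems": 6_300, "cores": 39_000, "disk_tb": 13_900, "disk_drives": 42_600}
tendered = {"systems": 2_400, "cores": 16_000, "disk_tb": 19_000, "disk_drives": 20_000}

# Assumption: tenders add to the installed base and nothing is retired
projected = {key: installed[key] + tendered[key] for key in installed}
growth = {key: tendered[key] / installed[key] for key in installed}

tb_per_drive_installed = installed["disk_tb"] / installed["disk_drives"]
tb_per_drive_tendered = tendered["disk_tb"] / tendered["disk_drives"]
tape_slot_occupancy = 45_000 / 56_000  # cartridges / available slots

for key in installed:
    print(f"{key}: {installed[key]} -> {projected[key]} (+{growth[key]:.0%})")
print(f"Usable TB per drive: {tb_per_drive_installed:.2f} installed, "
      f"{tb_per_drive_tendered:.2f} tendered")
print(f"Tape slot occupancy: {tape_slot_occupancy:.0%}")
```

On these numbers the tendered disk capacity alone exceeds the installed usable capacity, with roughly three times the usable TB per drive.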
General Services (II)
• E-Mail and Distribution Lists
– Up to 2 M incoming messages/day, 99% detected as spam
– 18’000 mailboxes (~ 68% owned by physics community)
• Web Services
– 8’725 Web sites (~45% owned by physics community, 30% AFS-based)
• Active Directory, CERN Certification Authority & CERN Authentication
– Central authentication service for Linux and Windows computers and applications
– Online X509 Certificate Authority
• Windows Services
– 60 TB of DFS workspaces
– ~ 6’000 active PCs managed by CMF
• Windows Terminal Servers and Custom Servers
– 120 ‘custom servers’ (not for public use) hosted for various departments
– including 62 Windows Terminal Servers
OIS services significant numbers
• 15’000 Linux systems (Quattor managed or updating from linuxsoft.cern.ch)
– 5’700 in the Computer Center
– 3’100 elsewhere at CERN
– 6’200 outside CERN
• 6’000 active NICE PCs, >1’500 Macs.
• Infrastructure of 300+ servers,
– including 120 ‘custom servers’ hosted for various departments
– including 62 Windows Terminal Servers.
• 60 TB DFS workspace including 30 TB for Media Archive in collaboration with UDS, 15 TB Home Directories, 15 TB Project workspaces.
• 18’000 mailboxes, ~ 8’000 e-groups, 3.6 TB of mail data, 40 production mail servers, ~ 2 M incoming messages/day, ~ 99% detected as spam
• Fax service, 3’000 faxes/month, 1’700 users.
• 8’725 Web sites including 845 SharePoint sites, 35 production Web servers, 5.6 M hits/day, 2.2 TBytes/day transferred in June 2009
• CERN Certification Authority: 5’000 user certificates, 9’000 host certificates issued.
• CERN Authentication usage increased: 50’000 authentications/day, 180 applications registered.
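Some per-unit figures follow directly from the mail and web numbers above. The sketch below assumes decimal units (1 TB = 10^6 MB = 10^9 kB) and treats the approximate figures on the slide ("~ 2 M", "~ 99%") as exact, so the results are orders of magnitude rather than measurements. The variable names are hypothetical.

```python
# Figures copied from the slide (approximate values treated as exact)
incoming_per_day = 2_000_000
spam_fraction = 0.99
mailboxes = 18_000
mail_data_tb = 3.6
web_hits_per_day = 5_600_000
web_tb_per_day = 2.2

legitimate_per_day = incoming_per_day * (1 - spam_fraction)
mail_mb_per_mailbox = mail_data_tb * 1_000_000 / mailboxes
web_kb_per_hit = web_tb_per_day * 1_000_000_000 / web_hits_per_day

print(f"Legitimate messages/day: {legitimate_per_day:.0f}")
print(f"Mail data per mailbox: {mail_mb_per_mailbox:.0f} MB")
print(f"Average transfer per web hit: {web_kb_per_hit:.0f} kB")
```

In other words, about 20,000 of the 2 million daily messages are legitimate, the average mailbox holds about 200 MB, and the average web hit transfers a little under 400 kB.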
General Services (III)
• Database and Application Deployment Services
– Mainly based on Oracle software
– AIS DBs and Applications, EDMS, Accelerator DBs, IT DBs, CASTOR DBs, Physics databases (Calibration, Alignment, etc...), Public J2EE Service, etc...
• 120 General Purpose Databases, 240 TB of NAS storage
• 130 Web/Application Servers with 700 virtual hosts
• 50 Terabytes of worldwide replicated Physics databases
• Engineering and Software Development Services
– Mechanical and electronic CAE, field calculations, structural analysis, simulations, mathematics, etc
• 50 packages, 1000 users
– Twiki Service
• 6000 users, 36’000 pages updated per month
– Version Control Services (CVS/SVN)
• 2000 users, 200 projects
• Audiovisual Service: support, record and archive official committees and events
• VideoConference Service: provide video conferencing in rooms across site
• Conference Management System (Indico)
– Distributed and used worldwide
• CDS-Invenio, a Digital Library Open Source Software produced, used and maintained at CERN
– free support via mailing lists
– commercial-like support via a maintenance contract
Invenio
• Comprehensive solution for the management of document repositories of moderate to large size
• Currently installed and in use by over a dozen scientific institutions worldwide. Some examples:
– MeIND - HBZ NRW, Cologne, Germany
– EPFL Infoscience - Lausanne, Switzerland
– Aristotle University of Thessaloniki – Greece
– Dipòsit de Documents, Universitat Aut. de Barcelona, Spain
– RomDoc - UPB-CTTPI, Bucharest, Romania
– Repozytorium Eny Politechnika - Wroclaw Univ. of Tech., Poland
– Pacific Rim Library - Hong Kong
– Academic Repository of Rwanda – KIST, Kigali, Rwanda
– Being deployed: AstroParticle Data System, NASA/Smithsonian, ILO
Invenio in Africa
• A digital library workshop initiated by UNESCO was organized in Sep 2009 in Kigali to train librarians on how to use CDS-Invenio; it was attended by librarians from Rwanda, Cameroon, Ghana and Mozambique
• The Academic Repository of Rwanda was set up and additional training was provided by CERN
• A fruitful collaboration is on-going with the Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
• The Invenio solution is under consideration for the Documentation Centre on Genocide documents and for the Parliament of Ghana documentation
Networking
• Design, implementation and support of CERN’s internal and external networking infrastructure in support of desktop, technical and scientific computing
– Several 10 Gbps backbones in a multi-manufacturer environment:
• GPN, TN, ENs, LCG and External networks
• Switching capacity of the internal LCG network is 4.8 Tbps
– Deployment and maintenance of network star points and wireless network services
• More than 400 star points and ~50 000 UTP sockets
• ~450 wireless base stations
– Management of the CERN Firewall and provision of Internet connectivity
• >10 Gbps Internet connectivity
• Internal and external firewalling
– Development and management of tools for network and telecom monitoring, user request provisioning and issue tracking
• Central database for automatic network equipment configuration
• About 19 000 network and telecom connectivity requests per year
Fibre cut during STEP’09: Redundancy meant no interruption
Telephony and CIXP
• Provision and support of telephony services
– Telephone exchange network of 10 000 lines
• IP telephony, Audio conferencing, Switchboard, Call centres
– GSM Mobile services
• Dedicated VPN of more than 4300 subscriptions
• Including LHC Tunnel coverage
– VHF network for the fire brigade
• Integrated operation services for both network and telecom services
– Several support contracts using same software tools
• Management of the CERN Internet Exchange (CIXP)
– Around 40 clients (telco operators, institutions)
Computer Security
Tier 0 at CERN: Acquisition, First pass processing, Storage & Distribution
WLCG Grid Computing
Tier-0 (CERN):
• Data recording
• Initial data reconstruction
• Data distribution
Tier-1 (11 centres):
• Permanent storage
• Re-processing
• Analysis
Tier-2 (~130 centres):
• Simulation
• End-user analysis
Data & Storage Services
• The data management challenges
– Storing 15’000’000 gigabytes of data every year
– Ensure that any file, including the smallest kilobyte, is available, anywhere from the internet, within a short time (small latency)
– Cope with ever-changing storage technologies
• Long term data preservation
– All past data must be kept readable for the future
• Castor
– Software for data management at CERN (Castor) and for partner data centres
– 1000 servers, 2000 cores, 10000 disks
• AFS distributed filesystem operations (30 TB, 500 Million files)
• Backups (~2 PB, 1 Billion files)
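The storage figures on this slide are easier to compare once expressed as rates and average file sizes. A minimal sketch, assuming decimal units (1 PB = 10^6 GB), a 365-day year, and data arriving evenly over the year, which is a simplification since the LHC does not take data continuously. The variable names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

yearly_gb = 15_000_000  # "15'000'000 gigabytes of data every year"
yearly_pb = yearly_gb / 1_000_000
# Assumption: data spread evenly over the whole year
avg_mb_per_s = yearly_gb * 1000 / SECONDS_PER_YEAR

# Average file sizes implied by the AFS and backup figures
afs_avg_file_kb = 30e12 / 500e6 / 1e3  # 30 TB over 500 million files
backup_avg_file_mb = 2e15 / 1e9 / 1e6  # ~2 PB over 1 billion files

print(f"Yearly volume: {yearly_pb:.0f} PB")
print(f"Average ingest rate: {avg_mb_per_s:.0f} MB/s")
print(f"Average AFS file: {afs_avg_file_kb:.0f} kB")
print(f"Average backed-up file: {backup_avg_file_mb:.0f} MB")
```

The yearly volume is 15 PB, or close to 480 MB/s if averaged over a full year. The implied average file sizes (about 60 kB on AFS, about 2 MB in backups) illustrate why the file count, not only the volume, is a scaling concern.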
Data transfers
[Figure labels: “2009: STEP09 + preparation for data”; “Final readiness test (STEP’09)”; “Preparation for LHC startup”; “LHC physics data”; “Real data – from 30/3”; “Nearly 1 petabyte/week”]
Castor traffic: > 4 GB/s input, > 13 GB/s served
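The "nearly 1 petabyte/week" label can be turned into an average rate for comparison with the Castor traffic figures. A minimal sketch, assuming decimal units (1 PB = 10^6 GB) and an even spread over the week. The names are hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800

weekly_pb = 1.0  # "Nearly 1 petabyte/week", taken as exactly 1 PB
avg_gb_per_s = weekly_pb * 1_000_000 / SECONDS_PER_WEEK

castor_input_gb_per_s = 4  # lower bound quoted for Castor input
share_of_castor_input = avg_gb_per_s / castor_input_gb_per_s

print(f"Average transfer rate: {avg_gb_per_s:.2f} GB/s")
print(f"Share of the 4 GB/s Castor input figure: {share_of_castor_input:.0%}")
```

One petabyte per week averages out to about 1.65 GB/s, which is well below the quoted "> 4 GB/s" Castor input rate.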
Readiness of the computing
• Has meant very rapid data distribution and analysis
– Data is processed and available at Tier 2s within hours!
[Figure panels: CMS, ATLAS, LHCb]
Today WLCG is:
• Running increasingly high workloads:
– Jobs in excess of 650k / day; anticipate millions / day soon
– CPU equiv. ~100k cores
• Workloads are:
– Real data processing
– Simulations
– Analysis – more and more (new) users
• Data transfers at unprecedented rates
[Figure: e.g. CMS: no. users doing analysis]
Grid Computing Now
Impact of the LHC Computing Grid in Europe
• LCG has been the driving force for the European multi-science Grid EGEE (Enabling Grids for E-sciencE)
• EGEE is now a global effort, and the largest Grid infrastructure worldwide
• Co-funded by the European Commission (Cost: ~170 M€ over 6 years, funded by EU ~100 M€)
• EGEE already used for >100 applications, including…
Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …
>250 sites, 48 countries, >50,000 CPUs, >20 PetaBytes, >10,000 users, >150 VOs, >150,000 jobs/day
Health-e-Child
Similarity Search
Temporal Modelling
Visual Data Mining
Genetics Profiling
Treatment Response
Inferring Outcome
Biomechanical Models
Tumor Growth Modelling
Semantic Browsing
Personalised Simulation
Surgery Planning
RV and LV Automatic Modelling
Measurement of Pulmonary Trunk
Grid related collaborating projects and other IT-EC projects
Active projects, 2009–2011 (timeline chart not reproduced):
1) BalticGrid-II
2) D4Science
3) D4Science-II (Sep 2011)
4) EGEE-III
5) EGI_DS
6) enviroGRIDS (up to Mar 2013)
7) ETICS 2
8) GridTalk
9) Health-e-Child
10) OpenAIRE (up to Nov 2012)
11) PARTNER (up to Sep 2012)
12) SEE-GRID-SCI
• ~39 2-yr project-funded posts (~26 EGEE posts) – on project and IT group-related activities
• Total requested EC contribution (except PARTNER project): 12.1 M EUR
Collaboration with Industry: CERN openlab
• A science – industry partnership to drive R&D and innovation
• Started in 2002, now in phase 3
Motto: “you make it – we break it”
• Evaluates state-of-the-art technologies in a very complex environment and improves them
• Test in a research environment today what will be used in industry tomorrow
• Training:
– CERN School of Computing
– openlab student programme
– Topical seminars
CERN openlab phase III
• Covers 2009-2011
• Status
– Partners: HP, Intel, Oracle and Siemens
• Topics
– Global wireless coverage for CERN (HP Procurve)
– Power-efficient solutions (Intel)
– Performance Tuning (Oracle)
– Control systems and PLC security (Siemens)
– Advanced storage systems and/or global file system (partner to be identified)
– 100 Gb/s networking (partner to be identified)
openlab people: students in 2009
Collaboration with Institutions: UNOSAT