Academic cloud experiences cern v4
Clouds at CERN
Tim Bell
Academic Cloud Experiences, 29th April 2013
CERN was founded in 1954: 12 European States
“Science for Peace”
Today: 20 Member States
Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Italy, the Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom
Candidate for Accession: Romania
Associate Members in Pre-Stage to Membership: Israel, Serbia
Applicant States for Membership or Associate Membership: Brazil, Cyprus (awaiting ratification), Pakistan, Russia, Slovenia, Turkey, Ukraine
Observers to Council: India, Japan, Russia, Turkey, United States of America; European Commission and UNESCO
~2300 staff
~1000 other paid personnel
> 11000 users
Budget (2013): ~1000 MCHF
Is the Higgs boson the source of mass of our fundamental particles?
Why is the universe made of matter
and not equal amounts of matter/antimatter?
Dark Matter and Dark Energy?
We do not know the composition of 95% of the universe
[Figure: temperature of the universe, WMAP satellite]
Blue tubes contain the two beam pipes and magnets at 1.8 K
ATLAS detector during construction in 2005
Number of candidates (vertical axis)
Mass of the candidates (horizontal axis)
We observe an excess of candidates with a mass of 125 proton masses
Search for Higgs decays to 4 “leptons” (electrons or muons)
Also observed in the CMS experiment
July 4, 2012
The Worldwide LHC Computing Grid
Tier-0 (CERN): data recording, reconstruction and distribution
Tier-1: permanent storage, re-processing, analysis
Tier-2: simulation, end-user analysis
> 2 million jobs/day
~250,000 cores
173 PB of storage
Nearly 160 sites, 35 countries
10 Gb links
WLCG: an international collaboration to distribute and analyse LHC data
Integrates computer centres worldwide that provide computing and storage resources into a single infrastructure accessible by all LHC physicists
IT Infrastructure Challenges
Staff numbers fixed
Materials budget decreasing
Increasing number of users of CERN’s facilities
Legacy tools are high maintenance and brittle
Additional data centre in Budapest now online, doubling potential capacity, with a 200 Gbit/s network
How do we scale from our current 11,000 servers within these constraints?
Approach
Remodel IT services on Cloud layered models: IaaS, PaaS, SaaS
Move to commonly used open source tools: Puppet, OpenStack, Foreman, Koji, Oz, Kibana, …
Implement clouds at scale: IT aims for 15,000 hypervisors with 150,000 VMs by 2015
Exploit ecosystem solutions such as LBaaS, DBaaS, MQaaS rather than build our own
Clouds in High Energy Physics
Long-term preservation of software and data of HEP experiments
Utilize special computing resources attached to the detectors
Simplify the management of heterogeneous in-house resources
Use commercial clouds for exceptional computing demands
Distributed cloud computing using HEP and non-HEP clouds
Service Models
Pets are given names like pussinboots.cern.ch
They are unique, lovingly hand raised and cared for
When they get ill, you nurse them back to health
Cattle are given numbers like vm0042.cern.ch
They are almost identical to other cattle
When they get ill, you get another one
Future application architectures tend towards Cattle but Pet support is needed for some specific zones of the cloud
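The pets and cattle distinction above can be sketched as a remediation policy. This is only an illustration: the hostname pattern and the function names `service_model` and `remediation` are assumptions made for the example, not part of the CERN tooling.

```python
import re

# Hypothetical naming rule: cattle hostnames are numbered, e.g. "vm0042.cern.ch".
CATTLE_PATTERN = re.compile(r"^vm\d+\.cern\.ch$")


def service_model(hostname: str) -> str:
    """Classify a host as 'cattle' (numbered) or 'pet' (individually named)."""
    return "cattle" if CATTLE_PATTERN.match(hostname) else "pet"


def remediation(hostname: str) -> str:
    """Pick the action to take when a host is reported unhealthy."""
    if service_model(hostname) == "cattle":
        return "replace"  # delete the VM and boot an identical one
    return "repair"       # keep the machine and nurse it back to health


for host in ("pussinboots.cern.ch", "vm0042.cern.ch"):
    print(host, service_model(host), remediation(host))
```

In practice the service model would more likely come from metadata on the VM than from its name; the name-based rule is used here only because the slide distinguishes the two by name.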
Refine Service Levels?
Hippos are cattle with bulk storage. Useful where Cassandra or MongoDB ensures redundancy
Canaries are cattle at high risk to give early warning of failures. Deploy early, fail fast and fix
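A minimal sketch of the canary idea, assuming a change is applied to canaries first and the rest of the herd is touched only if every canary succeeds. The function `staged_rollout` and the `apply_change` callback are hypothetical names for this example.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    canaries: Iterable[str],
    herd: Iterable[str],
    apply_change: Callable[[str], bool],
) -> List[str]:
    """Apply a change to canaries first; touch the herd only if all canaries pass.

    Returns the hosts that were changed successfully.
    """
    changed: List[str] = []
    for host in canaries:
        if not apply_change(host):
            # Fail fast: a broken canary stops the rollout before the herd is touched.
            return changed
        changed.append(host)
    for host in herd:
        if apply_change(host):
            changed.append(host)
    return changed


# Example run with a change that always succeeds.
print(staged_rollout(["canary01"], ["vm0001", "vm0002"], lambda host: True))
```

The early return is the design point: the cost of a bad change is limited to the machines that were deliberately put at risk.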
Infrastructure Overview
[Diagram: OpenStack components (Horizon, Keystone, Nova, Compute, Network, Scheduler, Glance, Cinder) alongside CERN services (Microsoft Active Directory, CERN DB on Demand, CERN Network Database, account management system, CERN Block Storage provider)]
Dashboard using Horizon
Timelines
Deploy as stable release becomes available in EPEL
Keep up to date but not too close
Benefit from continuous integration testing of other companies
[Timeline, 2012 to 2013, with today's date marked between April and May 2013]
Folsom: Sep 27, 2012
Ibex: Feb, 2013
Grizzly: Apr 4, 2013
Grizzly Service: May, 2013
Havana: Oct, 2013
Havana Service: Nov/Dec, 2013
Status CERN IT OpenStack Cloud
Running Folsom: around 500 hypervisors on KVM and Hyper-V; high availability using load balancing; 75 users creating around 50 new VMs/day
Experiment farms: CMS currently running 1,300 hypervisors with 50,000 cores using Essex; ATLAS starting to ramp up to a similar size
Other HEP sites moving to private cloud: Brookhaven, IN2P3, FutureGrid, NeCTAR, IHEP, …
Next Steps (I)
Move to Grizzly: target end May 2013
Enable Kerberos and X.509 authentication: avoids users having to enter passwords
Recycle existing hardware and scale using cells: can recycle around 100 batch machines to hypervisors/week
Cells
We’re not alone …
Already 6 sites running more than 10,000 hypervisors according to the latest OpenStack user survey
Next Steps (II)
Block Storage for Hippos and Pets: Cinder with Ceph, NetApp or GlusterFS
Heat for Orchestration and auto-scaling
Load Balancing as a Service
Bare-Metal to bring all servers under OpenStack
Move Ceilometer into production: accounting by project; move to wall-clock, vCPU metering
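Wall-clock vCPU metering by project can be illustrated with a short calculation. This is a sketch under stated assumptions, not the Ceilometer API: the `VmUsage` record, the function `vcpu_hours_by_project` and the sample data are all hypothetical. Usage is taken as the number of vCPUs multiplied by the hours the VM existed, regardless of how busy it was.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, NamedTuple


class VmUsage(NamedTuple):
    """One VM lifetime record (hypothetical format, for illustration only)."""
    project: str
    vcpus: int
    started: datetime
    stopped: datetime


def vcpu_hours_by_project(records: Iterable[VmUsage]) -> Dict[str, float]:
    """Sum wall-clock vCPU-hours per project (vCPUs x hours the VM existed)."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        hours = (record.stopped - record.started).total_seconds() / 3600
        totals[record.project] += record.vcpus * hours
    return dict(totals)


# Made-up sample data: two VMs for one project, one VM for another.
records = [
    VmUsage("atlas", 4, datetime(2013, 4, 29, 8), datetime(2013, 4, 29, 10)),
    VmUsage("atlas", 2, datetime(2013, 4, 29, 9), datetime(2013, 4, 29, 12)),
    VmUsage("cms", 8, datetime(2013, 4, 29, 0), datetime(2013, 4, 29, 1)),
]
print(vcpu_hours_by_project(records))  # {'atlas': 14.0, 'cms': 8.0}
```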
Cost Model
CERN computing is funded from CERN central budgets, no billing but quotas
[Diagram: IT resource manager, experiment resource managers, project management]
Quota Management
What to do when quota is exceeded? No credit card
If capacity is not used? Spot market on low SLA conditions
Fair share across the cloud? Worked for supercomputers but heavy for clouds at scale
Bursting to public clouds an option? IT provisioned or experiment decision
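With quotas instead of billing, the simplest answer to an exceeded quota is to refuse the request. The sketch below assumes a per-project core limit; the `ProjectQuota` class and its fields are hypothetical and only show the hard-limit case, not spot-market or fair-share behaviour.

```python
from dataclasses import dataclass


@dataclass
class ProjectQuota:
    """Hard core quota for one project (hypothetical model, for illustration)."""
    name: str
    core_limit: int
    cores_used: int = 0

    def request(self, cores: int) -> bool:
        """Grant the request only if it fits in the remaining quota."""
        if cores <= 0:
            raise ValueError("cores must be positive")
        if self.cores_used + cores > self.core_limit:
            return False  # no billing, so overage cannot be paid for
        self.cores_used += cores
        return True


quota = ProjectQuota(name="example-experiment", core_limit=100)
print(quota.request(80))  # True: fits within the limit
print(quota.request(40))  # False: would exceed the limit
print(quota.cores_used)   # 80
```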
Cloud of clouds: the next big step
What is required to get to a cloud of clouds? Federated identity; image conversion and sharing; API standardisation; SLAs; security models
Many initiatives investigating this at different levels: public/private bursting; private/private sharing (as the grid); homogeneous and heterogeneous
We will see intensive efforts in this area over the coming year
Conclusions
Clouds provide a framework for re-engineering how IT is delivering responsive services to the physicists
OpenStack and the ecosystem provide a suitable solution with flexibility and opportunity to contribute as well as benefit from work of others
Migration via recycling bare-metal machines into hypervisors provides a smooth transition
Cloud of clouds has potential to replace grid computing models in the future
Questions?
BACKUP SLIDES
Job Opportunities
Science is getting more and more global
CERN: x staff, x fellows