Big data from the LHC commissioning: practical lessons from big science - Simon Metson (Cloudant)
Big Data from the LHC Commissioning
Practical Lessons from Big Science
Simon Metson / @drsm79
Hello!
Bristol University / Cloudant
[Chart: time at places I’ve worked, 0-100% across 2002-2013, and the languages used over that period: Python, Perl, Bash, C++, Java, Javascript, Fortran]
The formula
G * E
The formula
G * E (G is fixed; E is usually fixed)
The formula
Grant * Effectiveness
The life of LHC data
1. Detected by experiment
2. “Online” filtering (hardware and software)
3. Transferred to CERN main campus, archived & reconstructed
4. Transferred to T1 sites, archived, reconstructed & skimmed
5. Transferred to T2 sites, reconstructed, skimmed, filtered & analysed
6. Written into locally analysable files, put on laptops
7. Turned into a plot in a paper
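The same flow, sketched in Python purely as an illustration: the stage names follow the list above, but the retention fractions and the starting volume are invented round numbers, not CMS figures.

```python
# Illustrative sketch only: stage names follow the list above; the
# keep_fraction values and the starting volume are made-up placeholders.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    keep_fraction: float  # fraction of incoming data kept at this step (assumed)

PIPELINE = [
    Stage("online filtering (HLT)", 0.01),
    Stage("T0: archive & reconstruct", 1.0),
    Stage("T1: archive, reconstruct & skim", 0.5),
    Stage("T2: reconstruct, skim, filter & analyse", 0.1),
    Stage("local files on laptops", 0.01),
]

def propagate(volume_tb: float) -> None:
    """Print how an input volume shrinks as it moves down the tiers."""
    for stage in PIPELINE:
        volume_tb *= stage.keep_fraction
        print(f"{stage.name:<42} ~{volume_tb:,.3f} TB")

propagate(1_000_000.0)  # illustrative: ~1 EB coming off the detector before filtering
```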
The life of LHC data
1. Detected by experiment
2. “Online” filtering (hardware and software)
3. Transferred to CERN main campus, archived & reconstructed
4. Transferred to T1 sites, archived, reconstructed & skimmed
5. Transferred to T2 sites, reconstructed, skimmed, filtered & analysed
6. Written into locally analysable files, put on laptops
7. Turned into a plot in a paper
Dig big tunnels
Chain up series of “atom smashers”
Put sensitive cameras in awkward places
Record events
Process data on high end machines
http://www.chilton-computing.org.uk
The life of LHC data
1. Detected by experiment
2. “Online” filtering (hardware and software)
3. Transferred to CERN main campus, archived & reconstructed
4. Transferred to T1 sites, archived, reconstructed & skimmed
5. Transferred to T2 sites, reconstructed, skimmed, filtered & analysed
6. Written into locally analysable files, put on laptops
7. Turned into a plot in a paper
CMS online data flow
We have a big digital camera
It takes photos of this [image courtesy of James Jackson]
which come out like this [image courtesy of James Jackson]
CMS online data flow
We have a big digital camera
Which goes into lots of computers (the HLT)
CMS online data flow
We have a big digital camera
Which goes into lots of computers (the HLT)
Which goes into lots of disk (the Storage Manager)
CMS data flow
We have a big digital camera
Which goes into lots of computers (the HLT)
Which goes into lots of disk (the Storage Manager)
Write to HLT at ~200 GB/s
Write to Storage Manager at ~2 GB/s
Write to T0 at ~2 GB/s
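A rough back-of-envelope check on those rates; the live data-taking time below is an assumed figure, picked only to show the arithmetic.

```python
# Rough arithmetic from the rates on the slide; the live-time figure is an
# illustrative assumption, not an official number.
hlt_in_gb_s = 200.0          # detector -> HLT, ~200 GB/s
storage_manager_gb_s = 2.0   # HLT -> Storage Manager, ~2 GB/s
t0_gb_s = 2.0                # Storage Manager -> T0 (CERN main campus), ~2 GB/s

print(f"HLT keeps roughly 1/{hlt_in_gb_s / storage_manager_gb_s:.0f} of the incoming data rate")

live_seconds = 5e6                         # assumed live data taking per year
yearly_pb = t0_gb_s * live_seconds / 1e6   # GB -> PB
print(f"~{yearly_pb:.0f} PB written out per year at that rate")
```

With those assumptions the output works out to roughly the 10 PB/year figure on the next slide.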
The life of LHC data
1. Detected by experiment
2. “Online” filtering (hardware and software)
3. Transferred to CERN main campus, archived & reconstructed
4. Transferred to T1 sites, archived, reconstructed & skimmed
5. Transferred to T2 sites, reconstructed, skimmed, filtered & analysed
6. Written into locally analysable files, put on laptops
7. Turned into a plot in a paper
10 PB of data / year
The life of LHC data
1. Detected by experiment
2. “Online” filtering (hardware and software)
3. Transferred to CERN main campus, archived & reconstructed
4. Transferred to T1 sites, archived, reconstructed & skimmed
5. Transferred to T2 sites, reconstructed, skimmed, filtered & analysed
6. Written into locally analysable files, put on laptops
7. Turned into a plot in a paper
1PB/week
Why transfer so much data?
To process all the data taken in one year on one computer would take ~64,000 years
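Re-expressing the headline figures from the slides (nothing new is assumed here beyond the 1 PB/week and ~64,000 year numbers themselves):

```python
# Back-of-envelope only: both headline figures come from the slides,
# the arithmetic just re-expresses them in other units.
SECONDS_PER_WEEK = 7 * 24 * 3600

single_machine_years = 64_000            # "~64,000 years" on one computer
for machines in (1, 1_000, 64_000):      # spread the same work over N machines
    print(f"{machines:>7,} machines -> ~{single_machine_years / machines:,.1f} years")

# "1 PB/week" expressed as a sustained transfer rate (1 PB = 1e6 GB):
print(f"1 PB/week is a sustained ~{1e6 / SECONDS_PER_WEEK:.1f} GB/s")
```

Which is why the data is shipped out to tens of sites rather than processed in one place.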
The life of LHC data
1. Detected by experiment
2. “Online” filtering (hardware and software)
3. Transferred to CERN main campus, archived & reconstructed
4. Transferred to T1 sites, archived, reconstructed & skimmed
5. Transferred to T2 sites, reconstructed, skimmed, filtered & analysed
6. Written into locally analysable files, put on laptops
7. Turned into a plot in a paper
Analysis
• Each analysis is ~unique
• Query language is C++
• Runs on distributed system and local resources
• Series of “cut” selections to identify interesting events
• Data in the final plot may be substantially reduced from the original dataset
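A toy illustration of what a cut flow looks like, not real analysis code: the event fields, thresholds and values are invented, and real analyses do this in C++ over far larger datasets.

```python
# Toy cut flow: invented event fields and thresholds, purely to show the
# shape of "a series of cut selections".
events = [
    {"n_muons": 2, "muon_pt": 31.0, "missing_et": 12.0},
    {"n_muons": 1, "muon_pt": 8.5,  "missing_et": 40.0},
    {"n_muons": 2, "muon_pt": 55.0, "missing_et": 3.0},
]

cuts = [
    ("two muons",        lambda e: e["n_muons"] >= 2),
    ("muon pT > 20 GeV", lambda e: e["muon_pt"] > 20.0),
    ("low missing ET",   lambda e: e["missing_et"] < 20.0),
]

selected = events
for name, passes in cuts:
    selected = [e for e in selected if passes(e)]
    print(f"after '{name}': {len(selected)} events remain")
# Most events get cut; only the survivors end up in the final plot.
```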
Workflow ladder
(rungs ordered by scale; the number of users falls as you climb)
• Private datasets (0.1-10 GB), simple computation
• Shared datasets (0.1-10 GB), simple computation
• Shared datasets (10-100 GB), simple computation
• Shared datasets (10-500 GB), complex computation
• Shared datasets (>500 GB), complex computation
• Large datasets (>100 TB), simple computation
• Large datasets (>100 TB), complex computation
Where the work happens, from bottom to top of the ladder:
• Work on laptop/desktop machine, store resulting datasets to Grid storage
• Work on departmental resources, store resulting datasets to Grid storage
• Use Grid compute and storage exclusively
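One way to read the ladder as a rule of thumb, sketched in Python; the rung boundaries below are my reading of the slide, not an official policy.

```python
# Sketch only: the size/complexity boundaries are a reading of the ladder
# above, not a CMS rule.
def where_to_work(dataset_gb: float, complex_computation: bool) -> str:
    """Pick a rung on the workflow ladder for a given job."""
    if complex_computation or dataset_gb > 500:
        return "use Grid compute and storage exclusively"
    if dataset_gb > 10:
        return "departmental resources, store results to Grid storage"
    return "laptop/desktop, store results to Grid storage"

print(where_to_work(5, complex_computation=False))        # small private dataset
print(where_to_work(50, complex_computation=False))       # shared, 10-100 GB
print(where_to_work(200_000, complex_computation=True))   # >100 TB, complex
```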
The life of LHC simulated data
1. Simulated by experimentalists at T0/T1/T2 sites
2. Transferred to T1 sites, archived, possibly reconstructed & skimmed
3. Transferred to T2 sites, reconstructed, skimmed, filtered & analysed
4. Written into locally analysable files, put on laptops
5. Turned into a plot in a paper
Most events get cut
“We are going to die, and that makes us the lucky ones. Most people are never going to die because they are never going to be born.”
- Richard Dawkins
Adoption & Use
Setup
• Maybe a bit different to other people
• Many sites (>100), with 100s of TB of storage and 10,000s of worker nodes
• Global system
• Why not at one site?
• Politics, power budget, cost
The grid
We Have a “Big Data” Problem
We Have a Big “Data Problem”
Do what you do best, outsource the rest
What's interesting is that big data isn't interesting any more
NIH (Not Invented Here)
Define and refine workflows
Our situation
• Expert users, who are not interested in infrastructure
• Will work around things they perceive as unnecessary limitations
Disruptive users
How to engage disruptive users?
Open access
1PB/week
Open access
Our situation
• Limited resources for integration/testbed style activities
• Strange organisation
Data temperature
There is no such thing as now
Keep things as local as possible
Defining monitoring is difficult
Small files are bad, m'kay
Compartmentalise metadata
Recognise, embrace and communicate failures
People are harder than computers
People are important
The formula
Consequences
• Automate all the things
• Learn to love a configuration management system
• Make sure everyone in the team knows how to interact with it
• Simple human solutions go a long way
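A minimal sketch of the property that makes configuration management tooling worth loving: idempotent steps that anyone on the team can safely re-run. The file path and setting below are invented for the example; this is not the team's actual tooling.

```python
# Minimal illustration of idempotent configuration steps: running this
# twice leaves the file unchanged after the first run. Path and setting
# are invented for the example.
import os

def ensure_line_in_file(path: str, line: str) -> None:
    """Append a config line only if it is not already there (idempotent)."""
    existing = ""
    if os.path.exists(path):
        with open(path) as handle:
            existing = handle.read()
    if line not in existing.splitlines():
        with open(path, "a") as handle:
            handle.write(line + "\n")

ensure_line_in_file("/tmp/transfer_agent.conf", "max_parallel_transfers = 10")
```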
Build good abstractions
Encourage collaboration
Workflow ladder
(rungs ordered by scale; the number of users falls as you climb)
• Private datasets (0.1-10 GB), simple computation
• Shared datasets (0.1-10 GB), simple computation
• Shared datasets (10-100 GB), simple computation
• Shared datasets (10-500 GB), complex computation
• Shared datasets (>500 GB), complex computation
• Large datasets (>100 TB), simple computation
• Large datasets (>100 TB), complex computation
Where the work happens, from bottom to top of the ladder:
• Work on laptop/desktop machine, store resulting datasets to Grid storage
• Work on departmental resources, store resulting datasets to Grid storage
• Use Grid compute and storage exclusively
Summary