Post on 30-Dec-2015
Distributed Monte Carlo Instrument Simulations at ISIS
Tom Griffin, ISIS Facility & University of Manchester
• What is Distributed Computing
• The software we use
• VITESS Specifics
• McStas Specifics
• Conclusions
Introduction
What do I mean by ‘Distributed Grid’?
• A way of speeding up large, compute-intensive tasks
• Break large jobs into smaller chunks
• Send these chunks out to (distributed) machines
• Distributed machines do the work
• Collate and merge the results
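The split / send / work / merge cycle above can be sketched in a few lines. This is only an illustration, not the ISIS system: a local thread pool stands in for the distributed machines, and the "simulation" is a toy Monte Carlo estimate of pi. The function names are invented for this sketch.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_chunk(n_samples, seed):
    """Count random points that fall inside the unit quarter-circle."""
    rng = random.Random(seed)  # independent random stream per chunk
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def run_distributed(total_samples, n_chunks):
    """Split the job, farm the chunks out, then merge the partial counts."""
    chunk_size = total_samples // n_chunks
    seeds = range(1, n_chunks + 1)  # a different seed for every chunk
    # The thread pool stands in for the remote machines of a real grid.
    with ThreadPoolExecutor(max_workers=4) as pool:
        partial_hits = list(pool.map(simulate_chunk, [chunk_size] * n_chunks, seeds))
    # Merge step: the partial counts simply add up.
    return 4.0 * sum(partial_hits) / (chunk_size * n_chunks)


pi_estimate = run_distributed(total_samples=200_000, n_chunks=8)
print(pi_estimate)
```

Because every chunk carries its own seed, the chunks are statistically independent and the merged answer is reproducible from run to run.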
Spare Cycles Concept
• Typical PC usage is about 10%
• Most PCs not used at all after 5pm
• Even with ‘heavily used’ (Outlook, Word, IE) PCs, the CPU is still grossly underutilised
• Everyone wants a fast PC!
• Can we use (“steal?”) their unused CPU cycles?
• SETI@home, World Community Grid (www.worldcommunitygrid.org)
• Toolkit e.g. COSM
  • Low level toolkit – source code level integration
  • So time consuming work, for each application
• Entropia DC Grid
  • Trial run at ISIS two years ago. Some success
  • Company bought out and in limbo (?)
• United Devices Grid MP
  • What we’re currently using
  • Quite expensive
• Condor
  • Free (academic research project)
  • In our experience 2 yrs ago, not reliable with Windows
Possible Software Implementations
The United Devices System
• Server hardware
  • We use two, dual Xeon servers + 280 client licenses
  • Could (will) easily cope with more clients
• Software
  • Servers run RedHat Linux Advanced Server / DB2
  • Clients available for Windows, Linux, SPARCs and Macs
• Programming
  • MGSI – Web Services interface – XML, SOAP
  • Accessed with C++ and Java classes etc
• Management Console
  • Web browser based
  • Can manage services, jobs, devices etc
• CPU Intensive
• Low to moderate memory use
• Not too much file output
• Coarse grained
• Command line / batch driven
• Licensing issues?
Suitable / Unsuitable Applications
• Program (e.g. McStas)
• Job (e.g. wish_simulation)
• Jobstep
• Workunit (sent to a Device)
• Data Set
• Data
Objects within the Grid
1) Think about how to split your data and merge results
2) Wrap and upload your executable
3) Write the application service
   • Pre and Post processing
4) Use the Grid
• Fairly easy to write
• Interface to grid via Web Services
• So far used: C++, Java, Perl, Fortran, C#
How to write Grid Programs
• Executable + any dlls etc
• Standard data files
• Compression
• Encryption
• Capture screen output
• Set Environmental Variables
• Command Line
Wrapping Your Executable
• Pre-processing
1) Partition data
2) Package data partitions
3) Log in to the Grid server
4) Create a Job and Job Step
5) Create a Data Set
6) Create Datas and upload data packages
7) Create Workunits
8) Set the Job running
• Post-Processing
1) Retrieve results
2) Merge results
Application Service
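The order of the pre- and post-processing steps can be illustrated with a small sketch. The `InMemoryGrid` class below is a hypothetical stand-in that only records what it is asked to do and runs the work locally; it is not the Grid MP MGSI interface, and none of its method names come from that product.

```python
class InMemoryGrid:
    """Hypothetical stand-in for a grid server; it just records what it is told."""

    def __init__(self):
        self.log = []
        self.results = {}

    def call(self, action, *details):
        self.log.append((action,) + details)

    def run_job(self, packages, worker):
        # A real grid would send each workunit to a device; here we run locally.
        self.call("run job")
        self.results = {name: worker(data) for name, data in packages.items()}


def pre_process(grid, data, n_parts):
    """Pre-processing steps 1 to 7, in the order given on the slide."""
    parts = [data[i::n_parts] for i in range(n_parts)]       # 1) partition data
    packages = {f"part{i}": p for i, p in enumerate(parts)}  # 2) package partitions
    grid.call("log in")                                      # 3)
    grid.call("create job and job step")                     # 4)
    grid.call("create data set")                             # 5)
    for name in packages:                                    # 6)
        grid.call("create data and upload", name)
    for name in packages:                                    # 7)
        grid.call("create workunit", name)
    return packages


def post_process(grid):
    """Retrieve the per-workunit results and merge them (here: add them up)."""
    return sum(grid.results.values())


grid = InMemoryGrid()
packages = pre_process(grid, data=list(range(1, 101)), n_parts=4)
grid.run_job(packages, worker=sum)                           # 8) set the job running
total = post_process(grid)
print(total)
```

In a real application service each `grid.call` would be a web-service request, and the merge would be whatever is appropriate for the application's output files.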
• Two scenarios:
• Single large simulation run
• Split the neutrons into smaller numbers and execute separately
• Merge results in some way
• Many smaller runs
• Parameter scan
Monte Carlo Speed-up Ideas
• Easy mode of operation: fixed executables + data files
• Executables held on server
• Split command line into bits – divide Ncount
• Vary the random seed
• Create data packages
• Upload data packages
VITESS – Splitting It
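A minimal sketch of the splitting step, assuming the command line can be treated as a template with a neutron count and a seed to fill in. The slides mention `-n` and `--Z`; that `--Z` carries the random seed is an assumption of this sketch, and `module` is a placeholder rather than a real VITESS module name.

```python
def split_count(total, n_chunks):
    """Divide `total` neutrons into n_chunks near-equal integer parts."""
    base, extra = divmod(total, n_chunks)
    # The first `extra` chunks take one more neutron so the parts sum to total.
    return [base + 1 if i < extra else base for i in range(n_chunks)]


def chunk_command_lines(template, total, n_chunks, first_seed=1):
    """Fill a command-line template once per chunk with its own count and seed."""
    counts = split_count(total, n_chunks)
    return [
        template.format(ncount=count, seed=first_seed + i)
        for i, count in enumerate(counts)
    ]


# Hypothetical template; the real option names come from the saved command file.
template = "module -n{ncount} --Z{seed}"
commands = chunk_command_lines(template, total=30_000_000, n_chunks=20)
print(commands[0])
print(len(commands))
```

Giving every chunk a distinct seed matters: chunks that share a seed would trace the same neutrons and add no new statistics.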
• Use GUI to create instrument – Save As Command
• “Parameter directory” set to “.”
VITESS – Running It
• Submit program parses bat file
• Substitutes ‘V’ and ‘P’
• Removes ‘header’ and ‘footer’
• Creates many new bat files with different ‘--Z’s and
• Submit program creates many bat files
VITESS – Running It
C:\My_GRID\VITESSE\VITESSE\build>Vitess-Submit.exe example_job example.bat req_files 20
logging in to https://bruce.nd.rl.ac.uk:18443/mgsi/rpc_soap.fcgi as tom....
Adding Vitesse dataset....
Adding Vitesse datas....
3e+007 neutrons split into 20 chunks, of -n1500000 neutrons
Total number of Vitesse 'runs' = 20
Uploading data for run #1...
Uploading data for run #2.....
Uploading data for run #19...
Uploading data for run #20...
Adding Vitesse datas to system....
Adding job....
Adding jobstep....
Turning on automatic workunit generation....
Closing jobstep....
All done
Your job_id is 4878
• Download the ‘chunks’
• Merge Data files
• DetectedNeutrons.dat : concatenate
• vpipes : trajectories & count rate
• Two classes of files
• 1D
  - Values: sum & divide by num chunks
  - Errors: square, sum and divide
• 2D – Sum / num of chunks
VITESS – Merging It
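The 1D merge rule can be sketched as follows. The slide gives the error recipe only as "square, sum and divide"; this sketch assumes the usual propagation for an average of independent chunks, i.e. the square root of the summed squares divided by the number of chunks. The data layout (lists of value/error pairs) is also an assumption, not the VITESS file format.

```python
import math


def merge_1d(chunks):
    """Merge per-chunk 1D histograms given as lists of (value, error) pairs.

    Values: sum over chunks, divide by the number of chunks.
    Errors: square, sum, take the root, divide by the number of chunks
    (taking the root is this sketch's assumption).
    """
    n = len(chunks)
    merged = []
    for bins in zip(*chunks):  # the same bin taken from every chunk
        value = sum(v for v, _ in bins) / n
        error = math.sqrt(sum(e * e for _, e in bins)) / n
        merged.append((value, error))
    return merged


chunk_a = [(10.0, 3.0), (20.0, 4.0)]
chunk_b = [(14.0, 4.0), (22.0, 3.0)]
merged = merge_1d([chunk_a, chunk_b])
print(merged)
```

Averaging rather than summing the values is appropriate when every chunk is already normalised to the same source flux, so each chunk is an independent estimate of the same count rate.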
• Many times faster: linear increase
• Needs verification runs (x3)
• Typically 11 times faster, potentially 30+
• A 12-hour run completes in 1 hour!
• Very large simulations reach random limits
VITESS – Advantages and Problems
VITESS – Some Results
[Figure: Comparison. Count rate (neutrons s⁻¹, 0 to 12) against time-of-flight (63.0 to 64.4 ms) for a 1 CPU simulation (66 hours) and a GRID simulation (6 hours).]
[Further run times shown on the slide: 176 hours; 59 hours; 6 hrs 20 mins.]
• Different executable for every run
• Executable must be uploaded at run time
• Split –n into chunks
• or run many instances (parameter scan)
• Create data (+ executable) packages
• Upload packages
McStas – Splitting It
• Use McGui to create and compile executable
• Create input file for Submit program
McStas – Running It
• Large run
  • Submit program breaks up –n#####
  • Uploads new command line + data + executable
• Parameter Scan
  • Send each run to a separate machine
McStas – Running It
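For the parameter-scan case, a sketch of how one command line per run might be generated. The executable name and the parameter names are hypothetical. The `-n` (count), `-s` (seed) and `-d` (output directory) options and the `name=value` parameter syntax follow the usual conventions of compiled McStas instruments, but should be checked against the installed version.

```python
from itertools import product


def scan_commands(executable, ncount, scan):
    """Build one command line per point of a parameter scan.

    `scan` maps a parameter name to the list of values to try.
    """
    names = list(scan)
    commands = []
    for run, values in enumerate(product(*(scan[n] for n in names)), start=1):
        params = " ".join(f"{n}={v}" for n, v in zip(names, values))
        # Each run gets its own seed and output directory so results never collide.
        commands.append(f"{executable} -n {ncount} -s {run} -d run_{run:03d} {params}")
    return commands


# Hypothetical executable and parameter names, for illustration only.
commands = scan_commands(
    "instrument.exe",
    ncount=1_000_000,
    scan={"slit_width": [0.01, 0.02], "sample_angle": [0, 30, 60]},
)
for line in commands:
    print(line)
```

Each generated line can then be packaged with the executable and its data files and sent to a separate machine; no merge is needed, since every run is a complete result in its own right.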
• Many output files: separate merge program
• PGPLOT and Matlab implemented
• Very similar
• PGPLOT
  • 1D – intensities: sum and divide. Errors: square, sum and divide. Events: Sum
  • 2D – intensities: sum and divide. Errors: square, sum and divide. Events: Sum
• Matlab
  • 1D – Same maths, different format
  • 2D – Virtually the same
• ‘Metadata’ leave untouched
McStas – Merging It
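A sketch of the 2D case, where each monitor carries three arrays that merge differently: intensities are averaged, errors are combined in quadrature and divided by the number of chunks (the square root is an assumption of this sketch, as the slide says only "square, sum and divide"), and event counts are summed. The dictionary keys are hypothetical and do not reflect the PGPLOT or Matlab file layouts.

```python
import math


def merge_2d(chunks):
    """Merge 2D monitor data from several chunks.

    Each chunk is a dict with equally shaped nested lists under the keys
    'intensity', 'error' and 'events' (key names are hypothetical).
    """
    n = len(chunks)
    rows = len(chunks[0]["intensity"])
    cols = len(chunks[0]["intensity"][0])

    def combine(key, fn):
        # Apply fn to the list of per-chunk values of every pixel.
        return [
            [fn([c[key][r][k] for c in chunks]) for k in range(cols)]
            for r in range(rows)
        ]

    return {
        "intensity": combine("intensity", lambda v: sum(v) / n),
        "error": combine("error", lambda v: math.sqrt(sum(e * e for e in v)) / n),
        "events": combine("events", sum),
    }


chunk_a = {"intensity": [[2.0, 4.0]], "error": [[3.0, 0.0]], "events": [[10, 20]]}
chunk_b = {"intensity": [[4.0, 8.0]], "error": [[4.0, 0.0]], "events": [[12, 18]]}
merged_2d = merge_2d([chunk_a, chunk_b])
print(merged_2d)
```

Events are summed rather than averaged because they count the neutrons actually traced, which grows with every chunk added.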
• Security: Do we trust users?
• 100 times faster[?]
• Linux version much faster than Windows [?]
• How do we merge certain fields?
  • values = '1.44156e+006 10459.9 30748';
  • statistics = 'X0=3.5418; dX=1.52975; Y0=0.000822474; dY=1.0288;';
• Some issue related to randomness of moderator file
McStas – Advantages and Problems
• Expansion
  • Proposal accepted for an additional 400 licenses
  • Giving us a total of 480
• Change in licensing model
Future Developments - Expansion
[Diagram: upgrade path. Stages: Upgrade to 280 licences; Upgrade to unlimited licences for 1 year; MP Insight; Unlimited licences forever; 480 permanent licences. Status legend: Completed; Funded; Seeking funding. Costs shown: $50k, $45k, $50k, $83k.]
• Bottom Line: Costs
  • Setup, server licenses, 80 client licenses + support – $18k – CMSD
• Total ≈ $250k
• Both run well under Grid MP
• Submit & Retrieve programs: a few hours’ work
• Merge program: a bit more
• Needs to merge more output formats [?]
• Issues with very large simulations
• More info on Grid MP at www.ud.com
Conclusions