Distributed Grid Computing at ISIS using the Grid MP System

Tom Griffin, ISIS Facility & University of Manchester / UMIST

What do I mean by ‘Distributed Grid’?
• A way of speeding up large, compute-intensive tasks
• Break large jobs into smaller chunks
• Send these chunks out to (distributed) machines
• Distributed machines do the work
• Collate and merge the results
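The split / distribute / merge pattern above can be sketched in a few lines of C++. This is only an illustration run in a single process: the function names are hypothetical, the "work" is a simple sum standing in for a real computation, and nothing here talks to a Grid server.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Break a large job into at most n_chunks contiguous chunks of near-equal size.
std::vector<std::vector<int>> split_into_chunks(const std::vector<int>& items,
                                                std::size_t n_chunks) {
    std::vector<std::vector<int>> chunks;
    if (items.empty() || n_chunks == 0) return chunks;
    n_chunks = std::min(n_chunks, items.size());
    const std::size_t base = items.size() / n_chunks;   // minimum chunk length
    const std::size_t extra = items.size() % n_chunks;  // first `extra` chunks get one more
    std::size_t pos = 0;
    for (std::size_t i = 0; i < n_chunks; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        auto first = items.begin() + static_cast<std::ptrdiff_t>(pos);
        auto last = first + static_cast<std::ptrdiff_t>(len);
        chunks.emplace_back(first, last);
        pos += len;
    }
    return chunks;
}

// Stand-in for the work one distributed machine does on its chunk (here: a sum).
long process_chunk(const std::vector<int>& chunk) {
    return std::accumulate(chunk.begin(), chunk.end(), 0L);
}

// Collate and merge the partial results returned by the machines.
long merge_results(const std::vector<long>& partials) {
    return std::accumulate(partials.begin(), partials.end(), 0L);
}

// Hypothetical driver: in a real grid each chunk would go to a different machine.
long run_distributed(const std::vector<int>& items, std::size_t n_machines) {
    std::vector<long> partials;
    for (const auto& chunk : split_into_chunks(items, n_machines)) {
        partials.push_back(process_chunk(chunk));
    }
    return merge_results(partials);
}
```

Calling run_distributed(data, 3) gives the same answer as processing data in one piece; only the partitioning changes, which is the property a job needs before it is worth distributing.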

Spare Cycles Concept
• Typical PC usage is about 10%
• Most PCs not used at all after 5pm
• Even with ‘heavily used’ (Outlook, Word, IE) PCs, the CPU is still grossly underutilised
• Everyone wants a fast PC!
• Can we use (“steal?”) their unused CPU cycles?
• SETI@home, World Community Grid (www.worldcommunitygrid.org)

Possible Software Implementations
• Toolkit e.g. COSM
  • Low level toolkit – source code level integration
  • So time-consuming work for each application
• Entropia DC Grid
  • Trial run at ISIS two years ago. Some success
  • Company bought out and in limbo (?)
• United Devices Grid MP
  • What we’re currently using
  • Quite expensive
• Condor
  • Free (academic research project)
  • In our experience 2 yrs ago, not reliable with Windows

The United Devices System
• Server hardware
  • We use two dual-Xeon servers + 280 client licenses
  • Could (will) easily cope with more clients
• Software
  • Servers run RedHat Linux Advanced Server / DB2
  • Clients available for Windows, Linux, SPARCs and Macs
• Programming
  • MGSI – Web Services interface – XML, SOAP
  • Accessed with C++ and Java classes etc
• Management Console
  • Web browser based
  • Can manage services, jobs, devices etc

Visual Introduction to the Grid

Installing and Deploying the System
• Servers
  • Complete set up in under 3 hours
  • Virtually self maintaining
• Clients
  • Windows only so far
  • MSI Installer
  • approx 20 seconds
  • SMS
  • MP Agent User
  • Install to other OSs looks straightforward

Suitable / Unsuitable Applications
• CPU Intensive
• Low to moderate memory use
• Not too much file output
• Coarse grained
• Command line / batch driven
• Licensing issues?

Objects within the Grid
• Program
• Job
• Jobstep
• Data Set
• Data
• Workunit
• Client

How to write Grid Programs
1) Think about how to split your data and merge results
2) Wrap and upload your executable
3) Write the application service
  • Pre and Post processing
4) Use the Grid

• Fairly easy to write
• Interface to grid via Web Services
• So far used: C++, Java, Perl, C# (any .Net language)

Wrapping Your Executable
• Executable + any dlls etc
• Standard data files
• Compression
• Encryption
• Capture screen output
• Set Environmental Variables
• Command Line

Application Service
• Pre-processing
  1) Partition data
  2) Package data partitions
  3) Log in to the Grid server
  4) Create a Job and Job Step
  5) Create a Data Set
  6) Create Data objects and upload data packages
  7) Create Workunits
  8) Set the Job running
• Post-processing
  1) Retrieve results
  2) Merge results

Example Application: HMC
Hybrid Monte Carlo method of global optimisation to solve molecular crystal structures from powder diffraction data
Parametric problem
• e.g. vary parameters such as acceptance ratio, to scan a 3D grid
• each run completely independent of any other
• Send one run to each machine on the grid
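As a rough sketch of the parametric scan described above, the C++ below enumerates every point of a 3D parameter grid and turns each point into one independent command line. The struct fields, the executable name "hmc.exe" and the argument order are assumptions made for illustration; the slides do not give the real HMC command-line format.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One point in the 3D parameter grid. Field names are placeholders:
// the slides name only "acceptance ratio" as an example parameter.
struct RunParams {
    double acceptance_ratio;
    double param_b;  // hypothetical second scanned parameter
    double param_c;  // hypothetical third scanned parameter
};

// Enumerate every combination of the three parameter lists.
// Each combination is an independent run, so each can go to a different machine.
std::vector<RunParams> make_grid(const std::vector<double>& ratios,
                                 const std::vector<double>& bs,
                                 const std::vector<double>& cs) {
    std::vector<RunParams> runs;
    runs.reserve(ratios.size() * bs.size() * cs.size());
    for (double r : ratios)
        for (double b : bs)
            for (double c : cs)
                runs.push_back(RunParams{r, b, c});
    return runs;
}

// Build one command line per run. "hmc.exe" and the argument layout are
// assumed purely for illustration.
std::string to_command_line(const RunParams& p) {
    std::ostringstream out;
    out << "hmc.exe " << p.acceptance_ratio << ' ' << p.param_b << ' ' << p.param_c;
    return out.str();
}
```

With 2 × 3 × 2 parameter values this yields 12 runs, and because no run depends on another they can be handed out in any order.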

Running HMC on the Grid
• Unchanged exe
• User edits or creates an appropriate settings file
• User runs “my” HMC submit program
  • Splits bat file into one line per machine
  • Uploads chunks to the Grid server
• Grid server distributes Workunits to clients
• User monitors the job with their web browser
• Clients return results to the Grid server
• User runs HMC retrieve program
  • Downloads results

More on HMC Submit…
• Split the batch file into lines
• Create a dataset (to hold our data)
• Package data (command line and zmatrix files etc)
• Associate data with dataset
• Upload data packages to Grid server
• Create Workunits from the dataset
• Create a Job to hold the Workunits
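The first step of the submit program, splitting the batch file into one command line per Workunit, might look something like the sketch below. It is not the author's code: the function name is hypothetical, and it assumes the batch file holds one complete command per line, with blank lines to be ignored.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split the text of a batch file into individual command lines.
// Assumption: one complete command per line; blank lines carry no work.
std::vector<std::string> split_batch_into_commands(const std::string& batch_text) {
    std::vector<std::string> commands;
    std::istringstream in(batch_text);
    std::string line;
    while (std::getline(in, line)) {
        // Tolerate Windows line endings (CR LF) by dropping a trailing CR.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        // Skip lines that are empty or contain only spaces and tabs.
        if (line.find_first_not_of(" \t") == std::string::npos) continue;
        commands.push_back(line);
    }
    return commands;
}
```

Each returned string would then be packaged together with the files that command needs (zmatrix files etc.) and uploaded as one data package.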

Yet more…
• Program written in C++
• Uses C++ classes to ‘hide’ SOAP calls

dsHMC.data_set_gid = mgsi->createDataSet(dsHMC);

ud::uuid MgsiClient::createDataSet(const DataSet &data_set) throw(MgsiException)
{
    SOAPMethod request("createDataSet", "urn://ud.com/mgsi");
    request.AddParameter("authkey") << authkey;
    request.AddParameter("data_set") << data_set;
    const SOAPResponse &response = call(request,
        const_cast<SOAPParameter *>(&request.GetParameter((size_t)0)));

    ud::uuid retval;
    response.GetReturnValue() >> retval;
    return retval;
}

• Auto generated by ‘Axis C++’ from WSDL file
• Also a C++ HTTPS file transfer program

Performance
• Linear: 50 devices ≈ 50 times faster
• Affected by size of Workunit
  – Overhead for distribution is ≈ 1 minute
  – Risk of device being switched off
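To see why Workunit size matters, here is a deliberately simple model of the speed-up. It is an assumption, not a measurement from the talk: every Workunit takes the same compute time, each one pays a fixed distribution overhead (≈ 1 minute, per the slide), Workunits are processed in rounds of one per device, and no device is switched off mid-run.

```cpp
#include <cassert>
#include <cmath>

// Estimated speed-up of the grid over one machine under a simplified model:
//   serial time = workunits * minutes_per_workunit
//   grid time   = rounds * (minutes_per_workunit + overhead_minutes)
// where rounds = ceil(workunits / devices).
// The model ignores devices being switched off and uneven Workunit sizes.
double estimated_speedup(int workunits, int devices,
                         double minutes_per_workunit,
                         double overhead_minutes = 1.0) {
    if (workunits <= 0 || devices <= 0 || minutes_per_workunit <= 0.0) {
        return 0.0;  // no meaningful estimate for empty or invalid input
    }
    const int rounds = (workunits + devices - 1) / devices;  // integer ceiling
    const double serial_minutes = workunits * minutes_per_workunit;
    const double grid_minutes = rounds * (minutes_per_workunit + overhead_minutes);
    return serial_minutes / grid_minutes;
}
```

Under this model, 50 ten-hour Workunits on 50 devices come out very close to the 50× figure, whereas 50 one-minute Workunits reach only about 25× because the overhead is as large as the work itself.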

Example 2: MD Manager
• Molecular Dynamics simulation(s)
• Program written in C#
  • C# classes generated from WSDL (and modified) to hide SOAP
  • Wrote generic C# HTTP file transfer classes
• ‘Interactive’ program
• Typical runtime ~10 hours per single simulation
• Need to investigate ‘grids’ of simulations

[Diagram: a 3×3 grid of simulations labelled A to I, with axes Temperature and Pressure]
• But in 3-dimensions
• and with ‘ordering restrictions’
• plus a post processing stage

Who Else Does This?
• Johnson & Johnson
• Novartis
• GSK
• National Physical Laboratory
• Accelrys
• IBM
• World Community Grid
  • http://www.worldcommunitygrid.org/
  • Currently the Human Proteome Folding project

Problems Encountered & Support
• Technical Problems
  • Mercifully few!
  • Main issue has been RAM thresholding (now resolved)
  • Encryption of certain files causes a problem
• Support
  • So far been very good
  • Responses to queries always next day (time difference) and always insightful
• Ease of setup / maintenance
  • Installed and fully running in ~3 hours
  • Next to no maintenance required, other than backup

‘Social’ Issues
• Easiest thing to blame
• Too abstract for some users (no big box)
• Stealing my cycles
• Expansion leads to political problems

Future Developments - Expansion
• Expansion
  • Proposal accepted for an additional 400 licenses
  • Giving us a total of 480
  • Change in licensing model
[Diagram: expansion roadmap with stages “Upgrade to 280 licences”, “Upgrade to unlimited licences for 1 year”, “MP Insight”, “Unlimited licences forever” and “480 permanent licences”; status legend: Completed, Funded, Seeking funding; cost labels: $50k, $45k, $50k, $83k]
• Bottom Line: Costs
  • Setup, server licenses, 80 client licenses + support – $18k – CMSD
  • Total ≈ $250k

Summary
• Grid is here and running smoothly
• Easy to use
• Excellent performance
• Vast amount of compute power available
• Future looks good