Using Grid Computing David Groep, NIKHEF 2002-07-15.


The Grid, But Why?

Physics @ CERN
• LHC particle accelerator
• operational in 2007
• 5-10 Petabyte per year
• 150 countries
• > 10000 users
• lifetime ~ 20 years

Trigger chain (event rate and data rate entering each stage):
• level 1 - special hardware: 40 MHz (40 TB/sec)
• level 2 - embedded: 75 kHz (75 GB/sec)
• level 3 - PCs: 5 kHz (5 GB/sec)
• data recording & offline analysis: 100 Hz (100 MB/sec)
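Dividing each data rate by its event rate shows that every stage handles events of about 1 MB. Dividing successive event rates gives each trigger level's rejection factor. The small Python check below uses only the slide's numbers. It assumes decimal units (1 TB = 10^12 bytes), and the stage table and function names are illustrative.

```python
# Event rate (Hz) and data rate (bytes/s) entering each stage, as on the slide.
# Assumption: decimal units, i.e. 40 TB/sec = 40e12 bytes per second.
STAGES = [
    ("level 1 - special hardware", 40e6, 40e12),
    ("level 2 - embedded", 75e3, 75e9),
    ("level 3 - PCs", 5e3, 5e9),
    ("data recording & offline analysis", 100.0, 100e6),
]

def event_sizes(stages):
    """Bytes per event at each stage: data rate divided by event rate."""
    return [data_rate / event_rate for _, event_rate, data_rate in stages]

def rejection_factors(stages):
    """For each trigger level: events coming in per event passed on to the next stage."""
    return [stages[i][1] / stages[i + 1][1] for i in range(len(stages) - 1)]

if __name__ == "__main__":
    for (name, _, _), size in zip(STAGES, event_sizes(STAGES)):
        print(f"{name}: {size / 1e6:.1f} MB per event")
    # zip stops after the three trigger levels, which are the ones that reject events.
    for (name, _, _), factor in zip(STAGES, rejection_factors(STAGES)):
        print(f"{name}: keeps 1 event in {factor:.0f}")
    print(f"overall: 1 event in {STAGES[0][1] / STAGES[-1][1]:.0f} is recorded")
```

The three levels together keep only 1 collision in 400,000, so the remaining 100 MB/sec is what has to be stored and analysed offline.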

CPU & Data Requirements

[Figure: Estimated CPU capacity required at CERN, 1998-2010 (y-axis: K SI95, 0 to 5,000; series: LHC experiments, other experiments), with a Moore's-law reference curve; annotation: "Jan 2000: 3.5K SI95"]

Moore's law: some measure of the capacity technology advances provide for a constant number of processors or investment

< 50% of the main analysis capacity will be at CERN
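The Moore's-law curve in the figure is exponential growth at constant investment. Below is a minimal sketch of such a curve and its inverse. It takes 3.5K SI95 in Jan 2000 as the starting point and assumes capacity doubles every 18 months. The slide does not state a doubling period, so the 1.5-year value is an assumption chosen only for illustration.

```python
import math

def moore_capacity(c0: float, t0: float, t: float, doubling_years: float = 1.5) -> float:
    """Capacity at time t for constant investment, doubling every `doubling_years`.

    c0 is the capacity at time t0, in whatever unit c0 uses (here K SI95).
    The default 1.5-year doubling period is an assumption, not a slide value.
    """
    return c0 * 2 ** ((t - t0) / doubling_years)

def year_reached(c0: float, t0: float, target: float, doubling_years: float = 1.5) -> float:
    """Time at which the curve reaches `target`: the inverse of moore_capacity."""
    return t0 + doubling_years * math.log2(target / c0)

if __name__ == "__main__":
    for year in (2000, 2003, 2007, 2010):
        print(year, f"{moore_capacity(3.5, 2000, year):.1f} K SI95")
```

If the estimated requirement curve rises faster than this projection, that gap is what the slide's conclusion points at: the extra capacity has to come from outside CERN.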

More Reasons Why

ENVISAT
• 3500 MEuro programme cost
• 10 instruments on board
• 200 Mbps data rate to ground
• 400 Tbytes data archived/year
• ~100 ‘standard’ products
• 10+ dedicated facilities in Europe
• ~700 approved science user projects

And More …

Bio-informatics
• For access to data
– Large network bandwidth to access computing centers
– Support of data bank replicas (easier and faster mirroring)
– Distributed data banks
• For interpretation of data
– Grid-enabled algorithms: BLAST on distributed data banks, distributed data mining

Common Ground

• Large amounts of data
• Distributed, ad-hoc user community
• Problems are distributable
• Need for resources grows faster than the market
• Network grows faster than the application needs
• Willingness to share resources …
• … if security and integrity are guaranteed


The One-Liner

• Resource sharing and coordinated problem solving in dynamic multi-institutional virtual organisations

What is Grid computing?

• Dependable, consistent and pervasive access
• Combining resources from various organizations
• ‘Virtual Organizations’: user-based view on the Grid
• Technical challenges:
– transparent decisions for the user
– uniformity in access methods
– secure & crack-resistant
– authentication, authorization, accounting (AAA) & quota

Grid Middleware

• Globus Project started 1997
• De facto standard
• Reference implementation of Grid Forum standards
• Large community effort
• Basis of several projects, including EU DataGrid
• Toolkit ‘bag-of-services’ approach
• Successful test beds, with single sign-on, etc.

In The Beginning

• Distributed Computing: synchronous processing
• High-Throughput Computing: asynchronous processing
• On-Demand Computing: dynamic resources
• Data-Intensive Computing: databases
• Collaborative Computing: science

Ian Foster and Carl Kesselman, editors, “The Grid: Blueprint for a New Computing Infrastructure,” Morgan Kaufmann, 1999

Grid Architecture

Applications
Application Toolkits: DUROC, MPICH-G2, Condor-G, VLAM-G
Grid Services: GRAM, GridFTP, MDS, ReplicaSrv
Grid Security Infrastructure (GSI)
Grid Fabric: Condor, MPI, PBS, Internet, Linux, SUN

Make all resources talk standard protocols

Promote interoperability of application toolkits, similar to the interoperability of networks through Internet standards

OGSA: new directions

• Looks superficially like ‘web services’
• Based on common standards:
– WSDL
– SOAP
– UDDI
• Adds:
– Transient services
– State of distributed activities
– Workflow, videoconferencing, distributed data analysis
• Management of service instances
• Grid Security Infrastructure

Looking for Resources

• Resource brokerage based on matchmaking (Condor)
• Information services mesh:
– Meta-computing directory
– Replica catalogues

DataGrid http://marianne.in2p3.fr/
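Matchmaking in the Condor style can be sketched as a two-way filter followed by a ranking. The job states requirements on the resource, plus a rank used to prefer among acceptable resources. Each resource states which jobs it accepts. The sketch below is plain Python, not Condor's ClassAd language, and the attribute names (os, memory_mb, free_cpus, vo) and site names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Ad = Dict[str, object]  # an advertisement: attribute name -> value

@dataclass
class Job:
    ad: Ad                              # what the job says about itself
    requirements: Callable[[Ad], bool]  # must hold for a resource's ad
    rank: Callable[[Ad], float]         # higher value = more preferred

@dataclass
class Resource:
    name: str
    ad: Ad                              # as published in the information services
    requirements: Callable[[Ad], bool]  # site policy: which jobs it accepts

def match(job: Job, resources: List[Resource]) -> Optional[Resource]:
    """Best-ranked resource whose ad satisfies the job and vice versa, or None."""
    candidates = [r for r in resources
                  if job.requirements(r.ad) and r.requirements(job.ad)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: job.rank(r.ad))

# Hypothetical example: three sites, one of which only accepts "atlas" jobs.
RESOURCES = [
    Resource("site-a", {"os": "Linux", "memory_mb": 512, "free_cpus": 10},
             lambda j: j.get("vo") == "atlas"),
    Resource("site-b", {"os": "Linux", "memory_mb": 256, "free_cpus": 40},
             lambda j: True),
    Resource("site-c", {"os": "Solaris", "memory_mb": 1024, "free_cpus": 100},
             lambda j: True),
]

JOB = Job({"vo": "atlas"},
          requirements=lambda m: m["os"] == "Linux" and m["memory_mb"] >= 256,
          rank=lambda m: m["free_cpus"])
```

Here `match(JOB, RESOURCES)` picks site-b. Site-c fails the job's OS requirement, and among the two Linux sites site-b has more free CPUs. Because the check is two-way, a site can refuse a job that would otherwise rank highest.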


Submitting a Job

Locating a Replica

• Grid Data Mirror Package
• Moves data across sites
• Replicates both files and individual objects
• Catalogue used by the Broker
• Replica Location Service (Giggle)
• Read-only copies “owned” by the Replica Manager

http://cmsdoc.cern.ch/cms/grid
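One way the broker could use the replica catalogue is to look up which sites hold a read-only copy of each logical file and then prefer the site that needs the fewest files copied in. The sketch below shows that idea. The catalogue contents, file names and site names are hypothetical. Real catalogues and the Replica Location Service are distributed services, not an in-memory dictionary.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical replica catalogue: logical file name -> sites holding a read-only copy.
CATALOGUE: Dict[str, Set[str]] = {
    "lfn:run1234.raw": {"cern", "nikhef"},
    "lfn:run1235.raw": {"cern"},
    "lfn:run1236.raw": {"nikhef", "ral"},
}

def sites_with_all(lfns: List[str], catalogue: Dict[str, Set[str]]) -> Set[str]:
    """Sites that already hold a replica of every requested file."""
    holders = [catalogue.get(lfn, set()) for lfn in lfns]
    return set.intersection(*holders) if holders else set()

def missing_files(site: str, lfns: List[str], catalogue: Dict[str, Set[str]]) -> List[str]:
    """Files that would have to be copied to `site` before the job can run there."""
    return [lfn for lfn in lfns if site not in catalogue.get(lfn, set())]

def choose_site(lfns: List[str], candidates: Iterable[str],
                catalogue: Dict[str, Set[str]]) -> str:
    """Candidate site needing the fewest copies; ties broken alphabetically."""
    return min(sorted(candidates), key=lambda s: len(missing_files(s, lfns, catalogue)))
```

For a job reading run1234 and run1235, `choose_site` returns cern, the only site holding both files. This is the "move the job to the data" choice, and the catalogue makes it possible.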

Sending Your Data

• Tape robots, disks, etc. share a GridFTP interface
• Supports single sign-on and confidentiality
• Optimized for high-speed (>1 Gbit/s) networks
• In the future: automatic optimizations, bandwidth reservations, directory-enabled networking, …

Grid-enabled Databases?

• Spitfire: uniform access to persistent storage on the Grid
• Multiple roles support
• Compatible with GSI (single sign-on) through CoG
• Uses standard components: JDBC, SOAP, XML
• Supports various back-end databases

http://hep-proj-spitfire.web.cern.ch/hep-proj-spitfire/

DataGrid Test Bed 1

• DataGrid TB1:
– 14 countries
– 21 major sites
– Growing rapidly
• Submitting jobs:
– Log in only once, run everywhere
– Cross administrative boundaries in a secure and trusted way
– Mutual authorization

DutchGrid Platform

[Map of DutchGrid sites: Amsterdam, Utrecht, KNMI, Delft, Leiden, Nijmegen, Enschede]

• DutchGrid:
– Test bed coordination
– PKI security
• Participation by:
– NIKHEF: FOM, VU, UvA, Utrecht, Nijmegen
– KNMI, SARA
– AMOLF
– DAS-2 (ASCI): TU Delft, Leiden, VU, UvA, Utrecht
– Telematics Institute


And now for some Technical Details

For Users

Start using the grid

• All the necessary “client tools” are on all Linux and Solaris systems
• You just need:
– Credentials/tokens for the Grid (see next slides)
– Authorization to use resources (you get all NIKHEF resources by default)
– Information on which resources to use effectively

Your Grid Credentials

• You will use resources across several domains
– You may not care about security and authorization
– But the remote site admin will!
• All communications are authenticated using X.509 “Public Key” certificates
• The same technology used to secure credit card transactions on the web (https://……)
• Uniquely binds name/affiliation to a digital token

Certification Authorities

• CAs act as trusted third parties
• Remote sites trust the CA for a proper binding
• They will not do authentication again, so only authorization is left
• CAs are highly valuable: crack one to impersonate others on the Grid (and abuse resources)
• Registration Authorities do in-person ID checks
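The division of labour above can be sketched in a few lines. The site accepts the name in a certificate only if a CA it trusts issued it. What remains for the site is an authorization decision: does this name map to a local account? The distinguished names, the trusted-CA set and the account table below are hypothetical. Signature and validity checking, which a real implementation performs, is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

@dataclass(frozen=True)
class Certificate:
    subject: str  # distinguished name (DN) bound to the holder by the CA
    issuer: str   # DN of the CA that signed the certificate

# Hypothetical configuration of one remote site.
TRUSTED_CAS: FrozenSet[str] = frozenset({"/O=dutchgrid/CN=DutchGrid CA"})
LOCAL_ACCOUNTS: Dict[str, str] = {
    "/O=dutchgrid/O=users/O=nikhef/CN=Jane Doe": "jdoe",
}

def authenticated(cert: Certificate, trusted_cas: FrozenSet[str]) -> bool:
    # A real implementation verifies the CA's signature and the validity
    # period; this sketch only checks that the issuer is a trusted CA.
    return cert.issuer in trusted_cas

def authorize(cert: Certificate, trusted_cas: FrozenSet[str],
              accounts: Dict[str, str]) -> Optional[str]:
    """Local account for the certificate holder, or None if access is refused."""
    if not authenticated(cert, trusted_cas):
        return None  # the site does not trust the CA that vouched for this name
    return accounts.get(cert.subject)  # the only decision left to the site
```

This also shows why a compromised CA is so damaging. Anyone able to issue certificates under a trusted issuer name passes the first check for any subject they choose.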

CAs in DataGrid

• 10 national CAs (one per EU country)
• Each one has a detailed policy and practice statement
• NIKHEF operates the CA for DutchGrid (see http://www.dutchgrid.nl/ca)
• Get a “certificate” from the DutchGrid CA before you can start using the Grid
• It’s valuable: protect it with a pass phrase
• One cert valid for all DataGrid sites

The Proxy

• A ‘proxy certificate’ is a limited-lifetime delegation without a pass phrase to protect it
• Implements single sign-on for the Grid
• Valid for 12 hours (by default)
• Use it to:
– Run your jobs
– Get access to your data
• Get it by running grid-proxy-init
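Because the proxy expires, long jobs need a proxy with enough time left, since a job that still has to authenticate after expiry will be refused. The sketch below models only the lifetime arithmetic, using the 12-hour default from the slide. It does not create or read real proxy files, and the function names and example creation time are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_LIFETIME = timedelta(hours=12)  # default proxy validity stated on the slide

# Hypothetical moment at which grid-proxy-init was run.
EXAMPLE_CREATED = datetime(2002, 7, 15, 9, 0, tzinfo=timezone.utc)

def proxy_expiry(created: datetime, lifetime: timedelta = DEFAULT_LIFETIME) -> datetime:
    """Moment after which the proxy is no longer accepted."""
    return created + lifetime

def time_left(created: datetime, now: datetime,
              lifetime: timedelta = DEFAULT_LIFETIME) -> timedelta:
    """Remaining validity, clamped to zero once the proxy has expired."""
    return max(proxy_expiry(created, lifetime) - now, timedelta(0))

def covers_job(created: datetime, now: datetime, job_duration: timedelta,
               lifetime: timedelta = DEFAULT_LIFETIME) -> bool:
    """True if a job started now would finish before the proxy expires."""
    return time_left(created, now, lifetime) >= job_duration
```

Ten hours after the example proxy was created, only two hours are left. A three-hour job started at that point should wait for a fresh proxy.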


Now see for yourself

Getting a Certificate

• Initialize your environment for the Grid
• Use the Globus local guide from http://www.dutchgrid.nl/Support/
• Send the result to [email protected]; you will be contacted by phone
• Put the certificate (sent by mail) in your $HOME/.globus/usercert.pem
• Or use the Web at http://certificate.nikhef.nl/userhelp.html

Using the Grid

• Request authorization: [email protected]
• Look what is out there using grid-info-search or http://marianne.in2p3.fr/datagrid/giis/giis-browse.html
• Try some local hosts:
– bilbo, kilogram, triangel

kilogram:davidg:1009$ globus-job-run dommel.wins.uva.nl /usr/ucb/quota -v

Disk quotas for random (uid 12xxx):

Filesystem usage quota limit timeleft files quota limit timeleft

/home/random 13067 1500000 2000000 0 0 0

kilogram:davidg:1010$

• Start running your analysis/MC/other jobs

GridFTP

• Universal high-performance file transfer
• Extends the FTP protocol with:
– Single sign-on (GSI, GSSAPI, RFC 2228)
– Parallel streams for speed-up
– Striped access (transfer from multiple sites at once for higher throughput)
• Clients: gsincftp, globus-url-copy
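Parallel streams and striping both come down to cutting one file into byte ranges that are moved independently and then reassembled by offset. The protocol itself does this with an extended block mode in which every block carries its own offset. Below is a minimal sketch of the bookkeeping only. The function names are illustrative and are not part of gsincftp or globus-url-copy.

```python
from typing import Dict, List, Tuple

def split_ranges(size: int, streams: int) -> List[Tuple[int, int]]:
    """Split bytes [0, size) into contiguous (offset, length) ranges, one per stream.

    Lengths differ by at most one byte; empty ranges are dropped when there
    are more streams than bytes.
    """
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # first `extra` streams get one more byte
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges

def reassemble(chunks: Dict[int, bytes]) -> bytes:
    """Join chunks that may have arrived in any order, keyed by their offset."""
    return b"".join(chunks[off] for off in sorted(chunks))
```

Each range would go over its own TCP stream, or to a different server when striping. Because chunks are keyed by offset, arrival order does not matter.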

What’s Next?

• Some of the nice user features to come:
– Finding data files by characteristics (give me all golden decays)
– Moving your job to where the data is
– Automatic partitioning of jobs
– Support for truly interactive work
– Better network utilisation (faster access to data)
– ………
• If you are in the DataGrid project, ask your WP leader for authorization in TB1