Outline - Linux Clusters Institute · • HPC Application Computing through a ... IBM Power4 (TACC)...


Cluster Computing through an Application-oriented Computational Chemistry Grid

Kent Milfeld and Chona Guiang, Sudhakar Pamidighantam, Jim Giuliani

Supported by the NSF NMI Program under Award #04-38312 http://www.GridChem.org

April 24, 2005

Outline

• Computational Chemistry Grid Overview
• HPC Application Computing through a Client Interface
• Architecture for the GridChem Client
• Supporting the Virtual Organization


The Big Picture

Computational Chemistry Grid (CCG) “GridChem”

• A collection of “grid-enabled” resources to routinely run chemical physics applications

• Integrates a desktop environment into an infrastructure for a specific community of users
– computational chemists with large/small scale needs
– experimental chemists who occasionally need simulation capabilities to verify experimental results

• Establishes a distributed infrastructure for open scientific research
– a virtual organization


The Components

• Applications
– Electronic Structure (HΨ = EΨ)
– Atoms, Small Molecules, Clusters
– Compute Intensive, 0.1-100 hr runs
– High Speed Infrastructure Not Needed
– 4-16 CPUs/job

• Resources
– Cluster Systems
– Batch Support
– Secure Access (grid-enabled)

• Desktop Environment: the GridChem Client
– Client Interface to the Grid
– Multi-platform support (XP, OS X, Linux)
– Responsive

Applications
• GridChem supports some apps already
– Gaussian 98/03, GAMESS, MolPro
• Schedule of integration of additional software
– NWChem
– ACES-2
– Crystal
– Q-Chem
– NBO
– WIEN2k
– MCCCS Towhee
• homegrown computational chemistry codes developed at LSU


Computational Chemistry Resource Providers

TACC

CCT

NCSA

CCS

OSC

Resources

Over 400 processors and 3,525,000 CPU hours available annually

System (Site)             Procs Avail   Total CPU Hours/Year
IBM Power4 (TACC)                  16                140,000
Intel Cluster (LSU)                32                280,000
SGI Origin2000 (NCSA)             128              1,000,000
Intel Cluster (NCSA)               64                560,000
HP Integrity Superdome             33                290,000
Intel Cluster (UKy)                96                840,000
Intel Cluster (OSC)                36                315,000
HP Intel Cluster (OSC)             12                100,000
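The headline figures on the Resources slide can be checked against the individual rows. The short sketch below sums the processor and CPU-hour columns; the comparison with 8,760 hours per year is our own assumption about how the per-system figures might have been estimated, not something stated on the slide.

```python
# Rows copied from the Resources slide: (system, processors, CPU hours per year).
RESOURCES = [
    ("IBM Power4 (TACC)", 16, 140_000),
    ("Intel Cluster (LSU)", 32, 280_000),
    ("SGI Origin2000 (NCSA)", 128, 1_000_000),
    ("Intel Cluster (NCSA)", 64, 560_000),
    ("HP Integrity Superdome", 33, 290_000),
    ("Intel Cluster (UKy)", 96, 840_000),
    ("Intel Cluster (OSC)", 36, 315_000),
    ("HP Intel Cluster (OSC)", 12, 100_000),
]

# Assumption for comparison only: wall-clock hours in a non-leap year.
HOURS_PER_YEAR = 24 * 365

total_procs = sum(procs for _, procs, _ in RESOURCES)
total_hours = sum(hours for _, _, hours in RESOURCES)


def hours_per_processor(procs, hours):
    """Annual CPU hours offered per processor on one system."""
    return hours / procs


if __name__ == "__main__":
    print(f"processors: {total_procs}, CPU hours/year: {total_hours:,}")
    for name, procs, hours in RESOURCES:
        share = hours_per_processor(procs, hours) / HOURS_PER_YEAR
        print(f"{name:24s} {share:6.1%} of a full year per processor")
```

The sums come to 417 processors and 3,525,000 CPU hours, consistent with the "over 400 processors" headline.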


Desktop
Use your workstation!

• Computing Resource: Gigaflop Processors
• Data Storage: GigaBytes for Storage (596 MB for Internet Files)
• Graphics: Intel: GMA 900 Graphics,…; Mac: ATI Radeon 9200 with 32MB DDR

Desktop -- GridChem
• Java Based Client
– “Same” Look and Feel on “every” machine.
– Consistent Environment
• No Globus Installation
• Designed for “Application Services” through a Server; but
• Can be used as Stand-Alone Client for job submission


3-tier Architecture

GridChem Client | GC Middleware Server | HPC Resources

• Authentication: Kerberos, MyProxy, SSH modules
• Kerb. & MyProxy Servers
• Input Preparation: Molecular Editor, Application Options, Site Info (stat./dyn.)
• Site Preference, Job Submission, Resource Management, File Management
• Molecular Editor + Visualization
• inputs, resource specs. & temp. cert. transfer
• J2EE, MySQL, Site Monitors
• Resource specs. into Local Batch Specs
• Batch Script, Queue Submission
• Application Execution + Output & File Transfers
• Remote Storage

User Sees Two-Tier Architecture
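The step labeled "Resource specs. into Local Batch Specs" can be pictured as a small translation function that turns a site-independent request into a "Batch Script" for "Queue Submission". The sketch below assumes a PBS-style batch system. The `ResourceSpec` fields, the `to_pbs_script` name, the two-CPUs-per-node default and the sample Gaussian command line are all illustrative assumptions, not taken from the GridChem code.

```python
from dataclasses import dataclass


@dataclass
class ResourceSpec:
    """Hypothetical site-independent job request sent by the client."""
    job_name: str
    cpus: int
    wall_hours: float
    queue: str
    command: str


def to_pbs_script(spec, cpus_per_node=2):
    """Translate a generic request into a PBS batch script (illustrative only)."""
    nodes = -(-spec.cpus // cpus_per_node)      # ceiling division
    ppn = min(spec.cpus, cpus_per_node)
    hours, minutes = divmod(round(spec.wall_hours * 60), 60)
    lines = [
        "#!/bin/sh",
        f"#PBS -N {spec.job_name}",
        f"#PBS -q {spec.queue}",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -l walltime={hours:02d}:{minutes:02d}:00",
        "cd $PBS_O_WORKDIR",
        spec.command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    spec = ResourceSpec("water_opt", 8, 1.5, "normal",
                        "g03 < water.com > water.log")
    print(to_pbs_script(spec))
```

A site running LoadLeveler or LSF would need its own translator with the same input, which is one reason to keep the client-side request independent of any one batch system.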

Infrastructure (now)
• Server, GridChem Client*
– MyProxy (X.509 Certs), Kerberos (Security)
– CGI Scripts, GSI (for Data Movement & Job Launch)
• Job Monitoring
– Perl filters, MySQL
• Support
– PCS, Portable Consulting Service
• EOT
– OSU

*


Infrastructure (future)
• Server
– Condor + ? (Job Launch)
– GSI (Data Transfer, uberftp)
– Information Repository (IGRID, GPIR,…?)
• Support
– GridPort 3.0 Monitoring, Consulting (PCS), Accounting…

Infrastructure (future)
• GridClient
– 3rd-party file transfers (Trebuchet)
– More intelligence in input construction
– Increase Application Space
– Web Start
– Indirect DB access for preferences
– Advanced visualization support (Molden,…)
• Resource Sites
– Condor
– Globus Utils


GridChem Client


GridChem Nanocad Molecular Editor

Lexical Analysis And Parsing
• Follows the progress of the calculation while it is still running or after it has completed.
• Plots the energy, gradient, etc. versus iteration number.
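A minimal sketch of the parsing idea: scan an output file for energy lines and return (iteration, energy) pairs ready for plotting. It assumes Gaussian-style "SCF Done:" lines. The regular expression, the function name and the sample log (with made-up energies) are illustrative and are not the GridChem parser itself.

```python
import re

# Matches lines such as " SCF Done:  E(RHF) =  -75.98  A.U. after 11 cycles".
SCF_LINE = re.compile(r"SCF Done:\s+E\(([^)]+)\)\s*=\s*(-?\d+\.\d+)")


def energy_series(log_text):
    """Return [(iteration, energy), ...] in the order the energies appear."""
    energies = [float(m.group(2)) for m in SCF_LINE.finditer(log_text)]
    return list(enumerate(energies, start=1))


# Made-up sample output; a real log contains many other lines in between.
SAMPLE_LOG = """\
 SCF Done:  E(RHF) =  -75.9804512345     A.U. after   11 cycles
 ...
 SCF Done:  E(RHF) =  -75.9853398871     A.U. after    9 cycles
"""

if __name__ == "__main__":
    for step, energy in energy_series(SAMPLE_LOG):
        print(step, energy)
```

Because the function only reads text, it can be re-run on a partially written log to follow a job that is still executing.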


Monitoring

Job-Status-Monitor = JSM

Diagram components: User, Login, Service, Filter, Batch, cron, DB, Storage, HTTP (HTML, PHP,…)


Consulting

Diagram components: Database, User Interface, Consultant Interface, Report Problem, Problem Form, Personalized Monitor Page, Consultant Monitor Page, Front-line triage, Ticket Transfer & Response Page, Ticket Owner & CCs, Further Interaction, Email Notification, Notifications


User Submission

Consultant’s View


Post Processing

2D and 3D orbital isosurfaces

Using NCSAChem/Slice/Molden/Cartona/Free Software

Molecule Viewers

Electron Structure: Orbitals or electron densities,…

5-dehydro-m-xylylene triradical


Transcription Regulator (molecular modeling)

Summary

• GridChem Client
• Resources
• Community
• “Better Living Through Chemistry”


References

• www.gridchem.org
• www.grids-center.org
• www.gridlab.org/

GridChem Job Management


Molecular Viewers
• Rasmol: berkeley - cambridge - umass - umass - GW - pps - mrc. Get rasmol ftp - get rasmol xerxes - get rasmol (Bernstein) - Martz raswin.hlp - get rasmenu addon - Molsee (rasmol aid) - Chime animation of Rasmol - Chime support - molsurfer (embl)
• Kinemage: kinemage (download) - kinemage (download) - mage (download) - mage setup - kinemages kinemages?
• Other viewers: Cn3D (download) - vmd (openGL) - zoomseq for VMD - moviemol (OGL) - Whatif (embl) - swisview download (Win,Mac) - swisview download (Win,Mac) - Moil trajectory - Moil (sgi suny) - Visualize - xmol (no further devpt) - viewmol (C download) - viewmol (C download) - viewmol (C download) - viewmol (C) - webspace - raster - raster - Pov - CACAO - Molecule - weblab (msi) - Accord - xbs (unix) - WebMol Java - molden - molden - molmol (unix,win) - Chem3D - MolWin download - tessel - gOpenMol MD trajectory (Win,Linux,IRIX) - gOpenMol ccl - povray (Win) - mol2pov - mol2pov - molsoft webviewer - molsoft icmlite download (win,sgi,linux) - Chem3D (Win) download - Molecules-3D - ORTEP (win) - Interchem (sgi) - Mehmacc - NLMlist

http://www.clarku.edu/faculty/mlei/chem_link.htm

Consulting
• Present grid and local consulting systems are people intensive, cumbersome and possibly expensive.
• Need a single, inexpensive, web-based system. Characteristics:
– Easy to Learn and Access
– Minimal Maintenance
– Efficient Tracking


Consulting

PCS* (www.gridchem.org/consult)

             User              Ticket                Consultant
Components:  Browser & Email   DB (MySQL,…) & PHP    Browser & Email
Operation:   entry             DB & notice           email
             email             DB & notice           response

Features:
• Server Organized**
• Login protected
• Memory Sensitive
• Simple to Use & Preferences

*Developer: Patrick Hurley @TACC

**Email should be the messenger, not the organizational framework (structure) for handling problems.


Consulting


User Submission


Creating Users

User Preferences


Monitoring

• Admin (machine view)
– No. of nodes (up/down)
– Size of jobs & distrib.
– Node Load

• User (job view)
– No. of nodes available
– Number of my jobs
– Job (percent or absolute time left)

• When things go wrong
– Job ID
– Start-time, End-time
– Job nodes list with load, swap, memory usage
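The "percent or absolute time left" item in the user view is simple arithmetic on a job's start time and wall-clock limit. A minimal sketch follows; the function name and the sample times are illustrative assumptions.

```python
from datetime import datetime, timedelta


def time_left(start, wall_limit, now):
    """Return (absolute time left, percent of the wall-clock limit left)."""
    remaining = max(start + wall_limit - now, timedelta(0))
    percent = 100.0 * (remaining / wall_limit)
    return remaining, percent


if __name__ == "__main__":
    start = datetime(2005, 4, 24, 8, 0)
    limit = timedelta(hours=10)
    now = datetime(2005, 4, 24, 10, 30)
    remaining, percent = time_left(start, limit, now)
    print(f"{remaining} left ({percent:.0f}% of the limit)")
```

A job that has run past its limit reports zero time left rather than a negative value.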

Monitoring

Batch

User

Login

Text

filter or native cmd: showq, qstat, llq, bjobs
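A "filter" in this picture turns the text printed by a native batch command into uniform job records that a monitor can store or display. The sketch below handles only the default PBS `qstat` listing and assumes six whitespace-separated columns per job row; the function name, the scheduler keys and the sample text are illustrative, and real output varies by site and version.

```python
# Native status command per batch system, as listed on the slide.
NATIVE_COMMANDS = {
    "maui": "showq",
    "pbs": "qstat",
    "loadleveler": "llq",
    "lsf": "bjobs",
}


def filter_qstat(text):
    """Convert default `qstat` text output into a list of job dictionaries."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Job rows have six columns and a job id that starts with a digit;
        # this skips the header and the dashed separator line.
        if len(fields) == 6 and fields[0][0].isdigit():
            job_id, name, user, time_use, state, queue = fields
            jobs.append({"id": job_id, "name": name, "user": user,
                         "time_use": time_use, "state": state, "queue": queue})
    return jobs


# Made-up sample in the shape of a default qstat listing.
SAMPLE_QSTAT = """\
Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
101.master          water_opt        alice           00:12:40 R normal
102.master          benzene_freq     bob             0        Q normal
"""

if __name__ == "__main__":
    for job in filter_qstat(SAMPLE_QSTAT):
        print(job["id"], job["user"], job["state"])
```

One filter per native command, all emitting the same record shape, lets the rest of the monitor stay independent of the batch system.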


Monitoring

Batch

User

HTTP (HTML, PHP,…)

Login

Service

Filter

WEB

The Admin Monitor


Security
• Goals
– Single Sign-On
– Unattended Program Execution
• Direct Access to Compute Resources (SSH)
• Authentication via X.509 User Certificates
• Managing User Credentials
– behind the scenes.

SSH Security
• Generate key-pair (ssh-keygen) (private key is passphrase protected)
• Put id_dsa.pub or id_rsa.pub in remote host account’s authorized_keys
• SSH passphrase needed when using ssh or scp
• Remote host: authorized_keys file contains public keys allowing access to holders of the corresponding private key.
• SSH Key Passphrase is not transmitted over the wire
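The role of authorized_keys can be sketched as a lookup: the remote host accepts a login only if the public key offered by the client appears in that file (the real protocol then also requires proof that the client holds the matching private key). The code below is a simplified illustration. The function names are hypothetical, the key strings are placeholders rather than real keys, and the parsing ignores corner cases such as quoted options containing spaces.

```python
def key_blob(public_key_line):
    """Return (key type, base64 blob), ignoring leading options and the comment."""
    fields = public_key_line.split()
    for i, field in enumerate(fields):
        if field.startswith("ssh-") and i + 1 < len(fields):
            return field, fields[i + 1]
    return None


def is_authorized(public_key_line, authorized_keys_text):
    """True if the public key is listed in the authorized_keys content."""
    wanted = key_blob(public_key_line)
    if wanted is None:
        return False
    for line in authorized_keys_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        if key_blob(line) == wanted:
            return True
    return False


# Placeholder strings standing in for id_rsa.pub and a remote authorized_keys.
PUBLIC_KEY = "ssh-rsa AAAAB3NzaC1yc2EAAAAPLACEHOLDERKEY user@desktop"
AUTHORIZED_KEYS = """\
# keys allowed to log in to this account
ssh-dss AAAAB3NzaC1kc3MAAAAOTHERPLACEHOLDER someone@else
from="*.example.org" ssh-rsa AAAAB3NzaC1yc2EAAAAPLACEHOLDERKEY user@desktop
"""

if __name__ == "__main__":
    print(is_authorized(PUBLIC_KEY, AUTHORIZED_KEYS))
```

Only the public half of the key pair ever needs to be copied to the remote account; the private key and its passphrase stay on the client.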

Client


Security
X.509 User Certificates

• De facto standard for Grid authentication
• Public Key approach
– User keeps private key.
– CA digitally signs the public key (with its private key) and inserts additional info (e.g., expiration date) to produce a user certificate.
• Many Grids support X.509 certificates for
– GSI-SSH authentication
– Globus grid services authentication
• E.g., globus-job-submit, globus-url-copy, uberftp
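The "CA digitally signs the public key" step can be illustrated with textbook RSA on deliberately tiny numbers. This is a toy sketch only: the key (n = 61 × 53) is far too small to be secure, the certificate is a plain dictionary rather than the X.509 encoding, and the function names and sample values are all assumptions made for illustration.

```python
import hashlib

# Toy CA key pair: n = 61 * 53, public exponent e, private exponent d.
CA_N, CA_E, CA_D = 3233, 17, 2753


def digest(subject_dn, user_public_key, expires):
    """Hash the certificate fields and reduce the result into the toy modulus."""
    data = f"{subject_dn}|{user_public_key}|{expires}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N


def ca_sign(subject_dn, user_public_key, expires):
    """CA side: sign the field digest with the CA private exponent."""
    signature = pow(digest(subject_dn, user_public_key, expires), CA_D, CA_N)
    return {"subject": subject_dn, "public_key": user_public_key,
            "expires": expires, "signature": signature}


def verify(cert):
    """Relying side: check the signature using only the CA public key."""
    expected = digest(cert["subject"], cert["public_key"], cert["expires"])
    return pow(cert["signature"], CA_E, CA_N) == expected


if __name__ == "__main__":
    cert = ca_sign("/C=US/O=Example Grid/CN=Jane Chemist",
                   "user-public-key-placeholder", "2006-04-24")
    print(verify(cert))
```

Anyone holding the CA's public key can run `verify`, but only the CA, which holds the private exponent, can produce a signature that passes.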

Security
X.509 User Certificates

• Generate key pair.
• Send public key to certificate authority.
• Authority certifies you and associates your distinguished name, DN, (common name) with the certificate.
• Sites put your local login name and DN in a “map” file.
• Grid utilities, gsi-ssh, uberftp, etc. use the certificate (or proxy certificate for MyProxy) to authenticate as “DN” to the site, and hence can run as the local site login (as directed in the map file).
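The "map" file lookup in the last two bullets can be sketched as follows. The example assumes the common Globus grid-mapfile layout, one quoted DN followed by a local login name per line; the DNs, login names and function names are made up for illustration.

```python
import shlex


def parse_gridmap(text):
    """Parse lines of the form: "<distinguished name>" <local login>."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        parts = shlex.split(line)         # honors the quotes around the DN
        if len(parts) == 2:
            dn, local_login = parts
            mapping[dn] = local_login
    return mapping


def local_login_for(dn, mapping):
    """Return the site login an authenticated DN runs as, or None if unmapped."""
    return mapping.get(dn)


# Made-up map file content.
SAMPLE_MAP = '''\
# DN -> local account
"/C=US/O=Example Grid/CN=Jane Chemist" jchem
"/C=US/O=Example Grid/CN=Sam Spectro" sspec
'''

if __name__ == "__main__":
    gridmap = parse_gridmap(SAMPLE_MAP)
    print(local_login_for("/C=US/O=Example Grid/CN=Jane Chemist", gridmap))
```

An authenticated DN with no entry in the map file has no local account to run as, so the site refuses the request.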