The Ibis e-Science Software Framework · 2011-11-28
The Ibis e-Science Software Framework
Frank J. Seinstra, Jason Maassen, Niels Drost
Jungle Computing Research & Applications Group Department of Computer Science
VU University, Amsterdam, The Netherlands
Mini-course: General Overview
● 10:00 – 11:00: Introduction to Ibis
● 11:00 – 12:00: Ibis as ‘Master Key’
● 12:00 – 13:00: Lunch (free)
● 13:00 – 14:00: Ibis as ‘Glue’
● 14:00 – 15:00: Ibis today, tomorrow, and beyond
● Time for discussion during and after the course
● Note: second day (hands-on session) on Monday, December 12
2 Ibis Mini-Course November 2011
Introduction: Overview
● Ibis: ‘Problem Solving’ vs. ‘System Fighting’
● Disclaimers
● Jungle Computing
● Domain Example #1: Computational Astrophysics
● Domain Example #2: Multimedia Content Analysis
● The Ibis Software Framework
● Requirements and Tools
● The 3 Common Uses of Ibis
● Ibis as ‘Master Key’, Ibis as ‘Glue’, Ibis as ‘HPC solution’
A Random Example: Supernova Detection
● DACH 2008, Japan
● Distributed multi-cluster system
● Heterogeneous
● Distributed database (image pairs)
● Large vs small databases/images
● Partial replication
● Image-pair comparison given (in C)
● Find all supernova candidates
● Task 1: As fast as possible
● Task 2: Idem, under system crashes
‘Problem Solving’ vs. ‘System Fighting’
● All participating teams struggled (1 month) ● Middleware instabilities… ● Connectivity problems… ● Load balancing…
● But not the Ibis team ● Winner (by far) in both categories ● Note: many Japanese teams with years of experience
● Hardware, middleware, network, C-code, image data… ● Focus on ‘problem solving’, not ‘system fighting’
● incl. ‘opening’ of black-box C-code
Ibis Results: Awards & Prizes
● 1st Prize: SCALE 2008
● Most Visionary Research Award: AAAI-VC 2007
● 1st Prize: DACH 2008 - BS
● 1st Prize: DACH 2008 - FT
● WebPie: A Web-Scale Parallel Inference Engine (J. Urbani, S. Kotoulas, J. Maassen, N. Drost, F.J. Seinstra, F. van Harmelen, and H.E. Bal)
● 3rd Prize: ISWC 2008
● 1st Prize: SCALE 2010
● Finalist: EYR3 2011
● Many domains; data/compute intensive, real-time...
Disclaimer #1
● During this mini-course I (= Frank) will take the position of an enthusiastic user / domain expert… ● Because I am / used to be:
● Multimedia Content Analysis (UvA, 1996 - 2006) ● 2005/2006:
● Ibis provided all tools to allow me to continue my research ● Was immediately successful: award at AAAI 2007 ● Became Ibis ‘evangelist’: “If I can do it, anyone can”
● Now leader of JCRA group ● Direction, focus, scope ● Outreach to application domains / developers
Disclaimer #2
● The Ibis project started in 2002 ● Dozens of people have worked on it and/or co-implemented it ● Many more have used it ● Hence:
● Ibis reflects many important lessons learned over +/- 10 years
● In this tutorial we can only ‘scratch the surface’ ● Attempt to give best possible impression of what Ibis has to offer ● Never complete – focus on:
● most important lessons learned ● best match with ‘you’
Disclaimer #3
● Ibis is not the be-all and end-all solution ● A modular toolbox
● Many (integrated) solutions to fundamental problems ● Some Ibis-tools are low-level
● but higher level APIs, models, GUIs often available ● Further research required ● Further development / engineering required
● Note: the code itself is (in principle) less important than the knowledge it contains
Disclaimer #4
● Ibis does not (explicitly) provide solutions for ● World peace, cold fusion, stock market prediction, … :-) ● … ● Security
● Well, there is no Ibis-specific security ● Do integrate existing solutions, e.g. ssh tunnels, grid certificates
● Co-allocation ● Well, there is a JavaGAT call, but adaptors must implement it ● Do provide support for malleability and migration
● It’s all a matter of ‘scope’: ● Please help us define this properly
Disclaimer #5
● Outreach to Application Domains ● We are happy to support any user, but time is against us
● We must do research primarily ● Currently we do ‘cherry-picking’ ● Future: through NLeSC, or…?
● Outreach to Expert HPDC Developers ● Ibis is open source; contributions always welcome
● So far: mostly JavaGAT adaptors (e.g. from D-Grid) ● Future: through NLeSC, or…?
● Roadmap to ‘Ibis Coding Community’ required
Jungle Computing
● ‘Worst case’ computing as required by end-users ● Distributed ● Heterogeneous ● Hierarchical (incl. multi-/many-cores)
Why Jungle Computing?
● Scientists often forced to use a wide variety of resources simultaneously to solve computational problems, e.g. due to:
● Desire for scalability ● Distributed nature of (input) data ● Software heterogeneity (e.g.: mix of C/MPI and CUDA) ● Ad hoc hardware availability ● …
● Note: most users do not need ‘worst case’ jungle ● Ibis aims to apply to any subset
Example Application Domains
● Computational Astrophysics (Leiden)
● AMUSE: multi-model / multi-kernel simulations
● “Simulating the Universe on an Intercontinental Grid” - Portegies Zwart et al (IEEE Computer, Aug 2010)
● Climate Modeling (Utrecht)
● CPL: multi-model / multi-kernel simulations
● Atmosphere, ocean, source rock formation, …
- hardware: (potentially) very diverse
- high resolution => speed & scalability
- …
Domain Example #1: Computational Astrophysics
Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA (two weeks ago)
Domain Example #1: Computational Astrophysics
AMUSE
radiative transport
gravitational dynamics
hydro-dynamics
stellar evolution
● The AMUSE system (Leiden University) ● Early Star Cluster Evolution, including gas
● Gravitational dynamics (N-body): GPU / GPU-cluster ● Stellar evolution: Beowulf cluster / Cloud ● Hydro-dynamics, Radiative transport: Supercomputer
Domain Example #1: Computational Astrophysics
Demonstrated live at SC’11, Nov 12-18, 2011, Seattle, USA (two weeks ago)
Multimedia Content Analysis (MMCA)
● Aim:
● Automatic extraction of ‘semantic concepts’ from image sets and video streams
● Depending on specific problem & size of data set: ● May take hours, days, weeks, months, years…
● Need for user-friendly programming tools
● Shield domain-experts from all complexities of parallel, distributed, heterogeneous, and hierarchical computing
● Familiar (sequential) programming model(s)
Solution: a tool that makes parallel & distributed computing transparent to the user
- familiar programming
- easy execution
[Figure labels: User; Jungle Computing Systems]
Multimedia Content Analysis (MMCA)
● Applications in (a.o): ● Remote Sensing ● Security / Surveillance ● Medical Imaging ● Document Analysis ● Multimedia Systems ● Astronomy
● Application types: ● Real-time vs. off-line ● Fine-grained vs. coarse-grained ● Data-intensive / compute-intensive / information-intensive
Multimedia Content Analysis (MMCA)
Domain Example #2: Color-based Object Recognition by a Grid-connected Robot Dog
Seinstra et al (IEEE Multimedia, Oct-Dec 2007) Seinstra et al (AAAI’07: Most Visionary Research Award)
Successful…
● …but many fundamental problems unsolved! ● Scaling up to very large systems ● Platform independence ● Middleware independence ● Connectivity (a.o. firewalls, …) ● Fault-tolerance ● …
● Software support tool(s) urgently needed! ● Jungle-aware + transparent + efficient ● No progress until ‘discovery’ of Ibis
The Ibis Software Framework
● Offers all functionality to efficiently & transparently implement & run Jungle Computing applications
● Designed for dynamic / hostile environments
● Modular and flexible
● Allows replacement of Ibis components by external ones, including native code
● Open source ● Download: http://www.cs.vu.nl/ibis/
General Requirements
● Resource independence ● Transparent / easy deployment
● Middleware independence & interoperability ● Jungle-aware middleware
● Jungle-aware communication ● Robust connectivity ● System-support for malleability and fault-tolerance ● Globally unique naming
● Transparent parallelism & application-level fault-tolerance ● Easy integration with external software (legacy codes)
● MPI, CUDA, C++, Python, Java, Fortran, …
Ibis Software Stack
[Figure: Ibis software stack, annotated with the requirements it addresses: Resource Independence (Java); Transparent / Easy Deployment; Jungle-aware Middleware; Middleware Independence & Interoperability; Transparent Parallelism & Application-level FT; Jungle-aware Communication, Unique naming, Malleability & FT; Robust Connectivity. Other labels: Constellation; External software]
JavaGAT
● Java Grid Application Toolkit
● High-level API for developing (Grid) applications independently of the underlying (Grid) middleware
● Use (Grid) services: file copy, resource discovery, job submission, …
● Overcomes problems, incl.:
● Functionality may not work on all sites, or for all users, …
● Middleware version differences & complex codes…
● Note: SAGA API standardized by OGF
● Simple API for Grid Applications (a.o. with LSU)
● SAGA on top of JavaGAT (and v.v.)
Zorilla (not discussed; future)
● A prototype P2P middleware
● A Zorilla system consists of a collection of nodes, connected by a P2P network
● Each node is independent & implements all middleware functionality
● No central components
● Supports fault-tolerance and malleability
● Easily combines resources in multiple administrative domains
Ibis Portability Layer (IPL)
● Java-centric ‘run-anywhere’ communication library
● Sent along with your application
● “MPI for the Grid” (quote 2005)
● Supports fault-tolerance and malleability
● Resource tracking (JEL model)
● Open-world / Closed-world
● Efficient
● Highly optimized object serialization
● Can use optimized native libraries (e.g. MPI, MX)
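The resource-tracking bullet above can be pictured with a small in-memory registry. This is only a sketch under assumptions: it reads ‘JEL’ as join / elect / leave, and the class `MembershipSketch` and its methods are hypothetical names made up for illustration, not part of the IPL API.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory registry for illustration; this is NOT the IPL or JEL API.
class MembershipSketch {
    private final Set<String> members = new LinkedHashSet<>();
    private final Map<String, String> elections = new HashMap<>();

    // A resource joins the running computation (open world: allowed at any time).
    synchronized void join(String id) {
        members.add(id);
    }

    // A resource leaves or crashes; any election it had won becomes vacant again.
    synchronized void leave(String id) {
        members.remove(id);
        elections.values().removeIf(winner -> winner.equals(id));
    }

    // The first current member to ask wins the named election; later callers see the same winner.
    synchronized String elect(String electionName, String candidate) {
        if (!members.contains(candidate)) {
            throw new IllegalArgumentException(candidate + " has not joined");
        }
        return elections.computeIfAbsent(electionName, k -> candidate);
    }

    synchronized int size() {
        return members.size();
    }

    public static void main(String[] args) {
        MembershipSketch pool = new MembershipSketch();
        pool.join("node-A");
        pool.join("node-B");
        System.out.println(pool.elect("master", "node-A")); // node-A
        System.out.println(pool.elect("master", "node-B")); // node-A (already elected)
        pool.leave("node-A");
        System.out.println(pool.elect("master", "node-B")); // node-B (seat was vacated)
    }
}
```

In a closed-world run the set of members would be fixed before the computation starts; the sketch models the open-world case, where `join` and `leave` may happen at any moment.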
SmartSockets
● Robust connection setup
● Always sets up a connection, in 30 different scenarios
● Problems: firewalls, Network Address Translation (NAT), non-routed networks, multi-homing, …
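One way to picture robust connection setup is as an ordered list of fallbacks. The sketch below is a toy model under assumptions: the strategy names (direct, reverse, relay) and the `Host` type are illustrative choices, and the code does not use the SmartSockets library or claim to reproduce its algorithm.

```java
// Hypothetical model for illustration; it does not use the SmartSockets library.
class ConnectSketch {
    // A host that may or may not accept incoming connections (e.g. a firewall or NAT in front).
    static final class Host {
        final String name;
        final boolean acceptsIncoming;

        Host(String name, boolean acceptsIncoming) {
            this.name = name;
            this.acceptsIncoming = acceptsIncoming;
        }
    }

    // Picks the first strategy that can work, roughly in order of increasing cost.
    static String chooseStrategy(Host source, Host target) {
        if (target.acceptsIncoming) {
            return "direct";  // the source simply opens the connection
        }
        if (source.acceptsIncoming) {
            return "reverse"; // the target is asked to connect back to the source
        }
        return "relay";       // neither side is reachable: route traffic via a third party
    }

    public static void main(String[] args) {
        Host open = new Host("cluster-head", true);
        Host walled = new Host("compute-node", false);
        System.out.println(chooseStrategy(open, walled));   // reverse
        System.out.println(chooseStrategy(walled, open));   // direct
        System.out.println(chooseStrategy(walled, walled)); // relay
    }
}
```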
Ibis Programming Models
● IPL-based programming models, a.o.:
● Satin: a divide-and-conquer model
● MPJ: the MPI binding for Java
● RMI: object-oriented Remote Procedure Call
● Jorus: a ‘user transparent’ parallel model for multimedia applications
● Note:
● In most cases intended as ‘proof-of-concept’ (= scientific)
● Not discussed in this mini-course
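For readers unfamiliar with the divide-and-conquer style, here is a minimal sketch. It uses the JDK's fork/join framework as a stand-in, so it runs on a single JVM only; it is not Satin code and says nothing about Satin's actual API. The class name `FibTask` is made up for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Divide-and-conquer with the JDK fork/join framework; this is NOT Satin code.
class FibTask extends RecursiveTask<Long> {
    private final int n;

    FibTask(int n) {
        this.n = n;
    }

    @Override
    protected Long compute() {
        if (n < 2) {
            return (long) n;              // base case: small enough to solve directly
        }
        FibTask left = new FibTask(n - 1);
        FibTask right = new FibTask(n - 2);
        left.fork();                      // divide: run one half asynchronously
        long r = right.compute();         // compute the other half in this thread
        return left.join() + r;           // conquer: combine the partial results
    }

    public static void main(String[] args) {
        long result = new ForkJoinPool().invoke(new FibTask(20));
        System.out.println(result); // 6765
    }
}
```

The programmer only states how a problem splits and how results combine; where each sub-task runs is left to the runtime.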
“The Future of Ibis”
● Ibis/Constellation:
● Generalized programming framework for ‘all’ Jungle Computing applications
● Automatically maps any application activity (task) onto any appropriate executor (HW)
● By way of ‘contexts’, for example:
● Activity’s context: “I need a GPU”
● Executor’s context: “I represent a GPU”
● Note:
● Activities may represent any type of task:
● Also legacy codes, scripts, 3rd party software, …
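The context idea can be illustrated with a few lines of matching logic. This is a minimal sketch, assuming contexts are plain strings compared for equality; the types `Activity` and `Executor` and the method `match` are hypothetical and are not the Constellation API.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical types for illustration; they are NOT the Constellation API.
class ContextSketch {
    // A unit of work, labeled with the context it needs (e.g. "GPU").
    static final class Activity {
        final String name;
        final String needs;

        Activity(String name, String needs) {
            this.name = name;
            this.needs = needs;
        }
    }

    // A piece of hardware, labeled with the context it represents.
    static final class Executor {
        final String name;
        final String offers;

        Executor(String name, String offers) {
            this.name = name;
            this.offers = offers;
        }
    }

    // Returns the first executor whose context equals the activity's context, if any.
    static Optional<Executor> match(Activity activity, List<Executor> executors) {
        return executors.stream()
                .filter(e -> e.offers.equals(activity.needs))
                .findFirst();
    }

    public static void main(String[] args) {
        List<Executor> executors = List.of(
                new Executor("node-1-cpu", "CPU"),
                new Executor("node-2-gpu", "GPU"));
        Activity nbody = new Activity("n-body kernel", "GPU");
        System.out.println(match(nbody, executors).map(e -> e.name).orElse("no match")); // node-2-gpu
    }
}
```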
Ibis as ‘Master Key’ (or ‘Passepartout’)
● Use JavaGAT to access ‘any’ system
● Develop/run applications independently of available middlewares
● JavaGAT ‘adaptors’ required for each middleware
● ‘Intelligent dispatching’ even allows for transparent use of multiple middlewares
● Example: file copy
● JavaGAT vs. Globus
● Simple, portable, …
● SAGA API standardized
package org.gridlab.gat.io.cpi.rftgt4; import java.net.MalformedURLException; import java.net.URL; import java.rmi.RemoteException; import java.security.cert.X509Certificate; import java.util.Calendar; import java.util.HashMap;
import java.util.LinkedList; import java.util.List; import java.util.Map; import java.util.Vector; import javax.xml.namespace.QName; import javax.xml.rpc.ServiceException;
import javax.xml.rpc.Stub; import javax.xml.soap.SOAPElement; import org.apache.axis.message.addressing.EndpointReferenceType; import org.apache.axis.types.URI.MalformedURIException; import org.globus.axis.util.Util; import org.globus.delegation.DelegationConstants;
import org.globus.delegation.DelegationException; import org.globus.delegation.DelegationUtil; import org.globus.gsi.GlobusCredential; import org.globus.gsi.GlobusCredentialException; import org.globus.gsi.gssapi.GlobusGSSCredentialImpl; import org.globus.gsi.jaas.JaasGssUtil; import org.globus.rft.generated.BaseRequestType;
import org.globus.rft.generated.CreateReliableFileTransferInputType; import org.globus.rft.generated.CreateReliableFileTransferOutputType; import org.globus.rft.generated.DeleteRequestType; import org.globus.rft.generated.DeleteType; import org.globus.rft.generated.OverallStatus; import org.globus.rft.generated.RFTFaultResourcePropertyType; import org.globus.rft.generated.RFTOptionsType;
import org.globus.rft.generated.ReliableFileTransferFactoryPortType; import org.globus.rft.generated.ReliableFileTransferPortType; import org.globus.rft.generated.Start; import org.globus.rft.generated.TransferRequestType; import org.globus.rft.generated.TransferType; import org.globus.transfer.reliable.client.BaseRFTClient; import org.globus.transfer.reliable.service.RFTConstants;
import org.globus.wsrf.NotificationConsumerManager; import org.globus.wsrf.NotifyCallback; import org.globus.wsrf.ResourceException; import org.globus.wsrf.WSNConstants; import org.globus.wsrf.container.ContainerException; import org.globus.wsrf.container.ServiceContainer; import org.globus.wsrf.core.notification.ResourcePropertyValueChangeNotificationElementType;
import org.globus.wsrf.encoding.DeserializationException; import org.globus.wsrf.encoding.ObjectDeserializer; import org.globus.wsrf.impl.security.authentication.Constants; import org.globus.wsrf.impl.security.authorization.Authorization; import org.globus.wsrf.impl.security.authorization.HostAuthorization; import org.globus.wsrf.impl.security.authorization.IdentityAuthorization; import org.globus.wsrf.impl.security.authorization.SelfAuthorization; import org.globus.wsrf.impl.security.descriptor.ClientSecurityDescriptor;
import org.globus.wsrf.impl.security.descriptor.ContainerSecurityDescriptor; import org.globus.wsrf.impl.security.descriptor.GSISecureMsgAuthMethod; import org.globus.wsrf.impl.security.descriptor.GSITransportAuthMethod; import org.globus.wsrf.impl.security.descriptor.ResourceSecurityDescriptor; import org.globus.wsrf.impl.security.descriptor.SecurityDescriptorException; import org.globus.wsrf.security.SecurityManager; import org.gridlab.gat.CouldNotInitializeCredentialException;
import org.gridlab.gat.CredentialExpiredException; import org.gridlab.gat.GATContext; import org.gridlab.gat.GATInvocationException; import org.gridlab.gat.GATObjectCreationException; import org.gridlab.gat.Preferences; import org.gridlab.gat.URI; import org.gridlab.gat.io.cpi.FileCpi;
import org.gridlab.gat.security.globus.GlobusSecurityUtils; import org.ietf.jgss.GSSCredential; import org.ietf.jgss.GSSException; import org.oasis.wsn.Subscribe; import org.oasis.wsn.TopicExpressionType; import org.oasis.wsrf.faults.BaseFaultType; import org.oasis.wsrf.lifetime.SetTerminationTime;
import org.oasis.wsrf.properties.GetMultipleResourcePropertiesResponse; import org.oasis.wsrf.properties.GetMultipleResourceProperties_Element; import org.oasis.wsrf.properties.ResourcePropertyValueChangeNotificationType; class RFTGT4NotifyCallback implements NotifyCallback { RFTGT4FileAdaptor transfer; OverallStatus status;
public RFTGT4NotifyCallback(RFTGT4FileAdaptor transfer) { super(); this.transfer = transfer; this.status = null; }
@SuppressWarnings("unchecked") public void deliver(List topicPath, EndpointReferenceType producer, Object messageWrapper) { try { ResourcePropertyValueChangeNotificationType message = ((ResourcePropertyValueChangeNotificationElementType) messageWrapper) .getResourcePropertyValueChangeNotification(); this.status = (OverallStatus) message.getNewValue().get_any()[0]
.getValueAsType(RFTConstants.OVERALL_STATUS_RESOURCE, OverallStatus.class); if (status.getFault() != null) { transfer.setFault(getFaultFromRP(status.getFault())); } // RunQueue.getInstance().add(this.resourceKey); } catch (Exception e) {
} transfer.setStatus(status); } private BaseFaultType getFaultFromRP(RFTFaultResourcePropertyType fault) { if (fault == null) { return null;
} if (fault.getDelegationEPRMissingFaultType() != null) { return fault.getDelegationEPRMissingFaultType(); } else if (fault.getRftAuthenticationFaultType() != null) { return fault.getRftAuthenticationFaultType(); } else if (fault.getRftAuthorizationFaultType() != null) {
return fault.getRftAuthorizationFaultType(); } else if (fault.getRftDatabaseFaultType() != null) { return fault.getRftDatabaseFaultType(); } else if (fault.getRftRepeatedlyStartedFaultType() != null) { return fault.getRftRepeatedlyStartedFaultType(); } else if (fault.getTransferTransientFaultType() != null) { return fault.getTransferTransientFaultType();
} else if (fault.getRftTransferFaultType() != null) { return fault.getRftTransferFaultType(); } else { return null; } } }
@SuppressWarnings("serial") public class RFTGT4FileAdaptor extends FileCpi { public static final Authorization DEFAULT_AUTHZ = HostAuthorization .getInstance(); Integer msgProtectionType = Constants.SIGNATURE; static final int TERM_TIME = 20;
static final String PROTOCOL = "https"; private static final String BASE_SERVICE_PATH = "/wsrf/services/"; public static final int DEFAULT_DURATION_HOURS = 24; public static final Integer DEFAULT_MSG_PROTECTION = Constants.SIGNATURE; public static final String DEFAULT_FACTORY_PORT = "8443"; private static final int DEFAULT_GRIDFTP_PORT = 2811; NotificationConsumerManager notificationConsumerManager; EndpointReferenceType notificationConsumerEPR;
EndpointReferenceType notificationProducerEPR; String securityType;
String factoryUrl; GSSCredential proxy; Authorization authorization; String host; OverallStatus status; BaseFaultType fault; String locationStr; ReliableFileTransferFactoryPortType factoryPort;
public RFTGT4FileAdaptor(GATContext gatContext, Preferences preferences, URI location) throws GATObjectCreationException { super(gatContext, preferences, location); if (!location.isCompatible("gsiftp") && !location.isCompatible("gridftp")) { throw new GATObjectCreationException("cannot handle this URI");
} String globusLocation = System.getenv("GLOBUS_LOCATION"); if (globusLocation == null) { throw new GATObjectCreationException("$GLOBUS_LOCATION is not set"); } System.setProperty("GLOBUS_LOCATION", globusLocation);
System.setProperty("axis.ClientConfigFile", globusLocation + "/client-config.wsdd"); this.host = location.getHost(); this.securityType = Constants.GSI_SEC_MSG; this.authorization = null; this.proxy = null;
try { proxy = GlobusSecurityUtils.getGlobusCredential(gatContext, preferences, "globus", location, DEFAULT_GRIDFTP_PORT); } catch (CouldNotInitializeCredentialException e) { // TODO Auto-generated catch block e.printStackTrace(); } catch (CredentialExpiredException e) {
// TODO Auto-generated catch block e.printStackTrace(); } this.notificationConsumerManager = null; this.notificationConsumerEPR = null; this.notificationProducerEPR = null;
this.status = null; this.fault = null; factoryPort = null; this.factoryUrl = PROTOCOL + "://" + host + ":" + DEFAULT_FACTORY_PORT + BASE_SERVICE_PATH + RFTConstants.FACTORY_NAME; locationStr = setLocationStr(location); }
String setLocationStr(URI location) { if (location.getScheme().equals("any")) { return "gsiftp://" + location.getHost() + ":" + location.getPort() + "/" + location.getPath(); } else { return location.toString(); }
} protected boolean copy2(String destStr) throws GATInvocationException { EndpointReferenceType credentialEndpoint = getCredentialEPR(); TransferType[] transferArray = new TransferType[1]; transferArray[0] = new TransferType();
transferArray[0].setSourceUrl(locationStr); transferArray[0].setDestinationUrl(destStr); RFTOptionsType rftOptions = new RFTOptionsType(); rftOptions.setBinary(Boolean.TRUE); // rftOptions.setIgnoreFilePermErr(false); TransferRequestType request = new TransferRequestType();
request.setRftOptions(rftOptions); request.setTransfer(transferArray); request.setTransferCredentialEndpoint(credentialEndpoint); setRequest(request); while (!transfersDone()) { try {
Thread.sleep(1000); } catch (InterruptedException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } } return transfersSucc(); }
public void copy(URI dest) throws GATInvocationException { String destUrl = setLocationStr(dest); if (!copy2(destUrl)) { throw new GATInvocationException( "RFTGT4FileAdaptor: file copy failed"); }
} public void subscribe(ReliableFileTransferPortType rft) throws GATInvocationException { Map<Object, Object> properties = new HashMap<Object, Object>(); properties.put(ServiceContainer.CLASS, "org.globus.wsrf.container.GSIServiceContainer"); if (this.proxy != null) {
ContainerSecurityDescriptor containerSecDesc = new ContainerSecurityDescriptor(); SecurityManager.getManager(); try { containerSecDesc.setSubject(JaasGssUtil .createSubject(this.proxy)); } catch (GSSException e) { throw new GATInvocationException(
"RFTGT4FileAdaptor: ContainerSecurityDescriptor failed, " + e); } properties.put(ServiceContainer.CONTAINER_DESCRIPTOR, containerSecDesc); } this.notificationConsumerManager = NotificationConsumerManager
.getInstance(properties); try { this.notificationConsumerManager.startListening(); } catch (ContainerException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: NotificationConsumerManager failed, " + e);
} List<Object> topicPath = new LinkedList<Object>(); topicPath.add(RFTConstants.OVERALL_STATUS_RESOURCE); ResourceSecurityDescriptor securityDescriptor = new ResourceSecurityDescriptor(); String authz = null; if (authorization == null) { authz = Authorization.AUTHZ_NONE;
} else if (authorization instanceof HostAuthorization) { authz = Authorization.AUTHZ_NONE; } else if (authorization instanceof SelfAuthorization) { authz = Authorization.AUTHZ_SELF; } else if (authorization instanceof IdentityAuthorization) { // not supported throw new GATInvocationException(
"RFTGT4FileAdaptor: identity authorization not supported"); } else { // throw an sg throw new GATInvocationException( "RFTGT4FileAdaptor: set authorization failed"); } securityDescriptor.setAuthz(authz);
Vector<Object> authMethod = new Vector<Object>(); if (this.securityType.equals(Constants.GSI_SEC_MSG)) { authMethod.add(GSISecureMsgAuthMethod.BOTH); } else { authMethod.add(GSITransportAuthMethod.BOTH); } try { securityDescriptor.setAuthMethods(authMethod);
} catch (SecurityDescriptorException e) {
throw new GATInvocationException( "RFTGT4FileAdaptor: setAuthMethods failed, " + e); } RFTGT4NotifyCallback notifyCallback = new RFTGT4NotifyCallback(this); try { notificationConsumerEPR = notificationConsumerManager .createNotificationConsumer(topicPath, notifyCallback,
securityDescriptor); } catch (ResourceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: createNotificationConsumer failed, " + e); } Subscribe subscriptionRequest = new Subscribe();
subscriptionRequest.setConsumerReference(notificationConsumerEPR); TopicExpressionType topicExpression = null; try { topicExpression = new TopicExpressionType( WSNConstants.SIMPLE_TOPIC_DIALECT, RFTConstants.OVERALL_STATUS_RESOURCE); } catch (MalformedURIException e) {
throw new GATInvocationException( "RFTGT4FileAdaptor: create TopicExpressionType failed, " + e); } subscriptionRequest.setTopicExpression(topicExpression); try { rft.subscribe(subscriptionRequest);
} catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: subscription failed, " + e); } } protected EndpointReferenceType getCredentialEPR()
throws GATInvocationException { this.status = null; URL factoryURL = null; try { factoryURL = new URL(factoryUrl); } catch (MalformedURLException e) { throw new GATInvocationException(
"RFTGT4FileAdaptor: set factoryURL failed, " + e); } try { factoryPort = BaseRFTClient.rftFactoryLocator .getReliableFileTransferFactoryPortTypePort(factoryURL); } catch (ServiceException e) { throw new GATInvocationException(
"RFTGT4FileAdaptor: set factoryPort failed, " + e); } setSecurityTypeFromURL(factoryURL); return populateRFTEndpoints(factoryPort); } protected void setRequest(BaseRequestType request) throws GATInvocationException {
CreateReliableFileTransferInputType input = new CreateReliableFileTransferInputType(); if (request instanceof TransferRequestType) { input.setTransferRequest((TransferRequestType) request); } else { input.setDeleteRequest((DeleteRequestType) request); }
Calendar termTimeDel = Calendar.getInstance(); termTimeDel.add(Calendar.MINUTE, TERM_TIME); input.setInitialTerminationTime(termTimeDel); CreateReliableFileTransferOutputType response = null; try { response = factoryPort.createReliableFileTransfer(input); } catch (RemoteException e) {
throw new GATInvocationException( "RFTGT4FileAdaptor: set createReliableFileTransfer failed, " + e); } EndpointReferenceType reliableRFTEndpoint = response .getReliableTransferEPR(); ReliableFileTransferPortType rft = null;
try { rft = BaseRFTClient.rftLocator .getReliableFileTransferPortTypePort(reliableRFTEndpoint); } catch (ServiceException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: getReliableFileTransferPortTypePort failed, " + e);
} setStubSecurityProperties((Stub) rft); subscribe(rft); Calendar termTime = Calendar.getInstance(); termTime.add(Calendar.MINUTE, TERM_TIME); SetTerminationTime reqTermTime = new SetTerminationTime(); reqTermTime.setRequestedTerminationTime(termTime);
try { rft.setTerminationTime(reqTermTime); } catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: setTerminationTime failed, " + e); } try { rft.start(new Start());
} catch (RemoteException e) { throw new GATInvocationException( "RFTGT4FileAdaptor: start failed, " + e); } } private void setSecurityTypeFromURL(URL url) {
if (url.getProtocol().equals("http")) { securityType = Constants.GSI_SEC_MSG; } else { Util.registerTransport(); securityType = Constants.GSI_TRANSPORT; } }
private void setStubSecurityProperties(Stub stub) { ClientSecurityDescriptor secDesc = new ClientSecurityDescriptor(); if (this.securityType.equals(Constants.GSI_SEC_MSG)) { secDesc.setGSISecureMsg(this.getMessageProtectionType()); } else {
secDesc.setGSITransport(this.getMessageProtectionType()); } secDesc.setAuthz(getAuthorization()); if (this.proxy != null) { // set proxy credential
secDesc.setGSSCredential(this.proxy); } stub._setProperty(Constants.CLIENT_DESCRIPTOR, secDesc); } public Integer getMessageProtectionType() {
return (this.msgProtectionType == null) ? RFTGT4FileAdaptor.DEFAULT_MSG_PROTECTION : this.msgProtectionType; } public Authorization getAuthorization() { return (authorization == null) ? DEFAULT_AUTHZ : this.authorization; }
private EndpointReferenceType populateRFTEndpoints( ReliableFileTransferFactoryPortType factoryPort) throws GATInvocationException { EndpointReferenceType[] delegationFactoryEndpoints = fetchDelegationFactoryEndpoints(factoryPort); EndpointReferenceType delegationEndpoint = delegate(delegationFactoryEndpoints[0]); return delegationEndpoint; }
private EndpointReferenceType delegate( EndpointReferenceType delegationFactoryEndpoint) throws GATInvocationException { GlobusCredential credential = null; if (this.proxy != null) { credential = ((GlobusGSSCredentialImpl) this.proxy) .getGlobusCredential(); } else {
try { credential = GlobusCredential.getDefaultCredential(); } catch (GlobusCredentialException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } }
int lifetime = DEFAULT_DURATION_HOURS * 60 * 60; ClientSecurityDescriptor secDesc = new ClientSecurityDescriptor(); if (this.securityType.equals(Constants.GSI_SEC_MSG)) { secDesc.setGSISecureMsg(this.getMessageProtectionType()); } else { secDesc.setGSITransport(this.getMessageProtectionType());
} secDesc.setAuthz(getAuthorization()); if (this.proxy != null) { secDesc.setGSSCredential(this.proxy); }
// Get the public key to delegate on. X509Certificate[] certsToDelegateOn = null; try { certsToDelegateOn = DelegationUtil.getCertificateChainRP( delegationFactoryEndpoint, secDesc); } catch (DelegationException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e);
} X509Certificate certToSign = certsToDelegateOn[0]; // FIXME remove when there is a DelegationUtil.delegate(EPR, ...) String protocol = delegationFactoryEndpoint.getAddress().getScheme(); String host = delegationFactoryEndpoint.getAddress().getHost(); int port = delegationFactoryEndpoint.getAddress().getPort();
String factoryUrl = protocol + "://" + host + ":" + port + BASE_SERVICE_PATH + DelegationConstants.FACTORY_PATH; // send to delegation service and get epr. EndpointReferenceType credentialEndpoint = null; try { credentialEndpoint = DelegationUtil.delegate(factoryUrl,
credential, certToSign, lifetime, false, secDesc); } catch (DelegationException e) { throw new GATInvocationException("RFTGT4FileAdaptor: " + e); } return credentialEndpoint; } public EndpointReferenceType[] fetchDelegationFactoryEndpoints(
ReliableFileTransferFactoryPortType factoryPort) throws GATInvocationException { GetMultipleResourceProperties_Element request = new GetMultipleResourceProperties_Element(); request .setResourceProperty(new QName[] { RFTConstants.DELEGATION_ENDPOINT_FACTORY }); GetMultipleResourcePropertiesResponse response;
try { response = factoryPort.getMultipleResourceProperties(request); } catch (RemoteException e) { e.printStackTrace(); throw new GATInvocationException( "RFTGT4FileAdaptor: getMultipleResourceProperties, " + e); }
SOAPElement[] any = response.get_any(); EndpointReferenceType epr1 = null; try { epr1 = (EndpointReferenceType) ObjectDeserializer.toObject(any[0], EndpointReferenceType.class); } catch (DeserializationException e) {
throw new GATInvocationException( "RFTGT4FileAdaptor: ObjectDeserializer, " + e); } EndpointReferenceType[] endpoints = new EndpointReferenceType[] { epr1 }; return endpoints; }
synchronized void setStatus(OverallStatus status) { this.status = status; } public int transfersActive() { if (status == null) { return 1;
} return status.getTransfersActive(); } public int transfersFinished() { if (status == null) { return 0; }
return status.getTransfersFinished(); } public int transfersCancelled() { if (status == null) { return 0; }
return status.getTransfersCancelled(); } public int transfersFailed() { if (status == null) { return 0; }
return status.getTransfersFailed(); } public int transfersPending() { if (status == null) { return 1; }
return status.getTransfersPending(); } public int transfersRestarted() { if (status == null) { return 0; }
return status.getTransfersRestarted(); } public boolean transfersDone() { return (transfersActive() == 0 && transfersPending() == 0 && transfersRestarted() == 0); }

    public boolean transfersSucc() {
        return (transfersDone() && transfersFailed() == 0
                && transfersCancelled() == 0);
    }

    /*
     * private BaseFaultType getFaultFromRP(RFTFaultResourcePropertyType fault) {
     *     if (fault == null) { return null; }
     *
     *     if (fault.getRftTransferFaultType() != null) {
     *         return fault.getRftTransferFaultType();
     *     } else if (fault.getDelegationEPRMissingFaultType() != null) {
     *         return fault.getDelegationEPRMissingFaultType();
     *     } else if (fault.getRftAuthenticationFaultType() != null) {
     *         return fault.getRftAuthenticationFaultType();
     *     } else if (fault.getRftAuthorizationFaultType() != null) {
     *         return fault.getRftAuthorizationFaultType();
     *     } else if (fault.getRftDatabaseFaultType() != null) {
     *         return fault.getRftDatabaseFaultType();
     *     } else if (fault.getRftRepeatedlyStartedFaultType() != null) {
     *         return fault.getRftRepeatedlyStartedFaultType();
     *     } else if (fault.getTransferTransientFaultType() != null) {
     *         return fault.getTransferTransientFaultType();
     *     } else {
     *         return null;
     *     }
     * }
     */

    /*
     * private BaseFaultType deserializeFaultRP(SOAPElement any) throws Exception {
     *     return getFaultFromRP((RFTFaultResourcePropertyType) ObjectDeserializer
     *             .toObject(any, RFTFaultResourcePropertyType.class));
     * }
     */

    void setFault(BaseFaultType fault) {
        this.fault = fault;
    }
}

package tutorial;

import org.gridlab.gat.GAT;
import org.gridlab.gat.GATContext;
import org.gridlab.gat.URI;
import org.gridlab.gat.io.File;

public class RemoteCopy {

    public static void main(String[] args) throws Exception {
        GATContext context = new GATContext();

        // args[0] is the source location, args[1] the destination
        URI src = new URI(args[0]);
        URI dest = new URI(args[1]);

        // Create a GAT file object for the source and copy it to the destination
        File file = GAT.createFile(context, src);
        file.copy(dest);

        // Shut down JavaGAT before the program exits
        GAT.end();
    }
}
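
The RemoteCopy program above names no middleware; the copy is carried out by whichever adaptor can handle the given locations (the RFTGT4FileAdaptor listing is one such adaptor). The sketch below illustrates that idea with the Java standard library only, assuming adaptors are simply tried in order until one succeeds. It is not JavaGAT code: `CopyAdaptor`, `AdaptorSketch`, `copyWithFirstWorking`, `forScheme`, the adaptor names, and the example host are all hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these types are part of JavaGAT.
interface CopyAdaptor {
    String name();

    // Returns normally on success, throws if this adaptor cannot do the copy.
    void copy(URI src, URI dest);
}

class AdaptorSketch {

    // Try each adaptor in order and return the name of the first that succeeds.
    static String copyWithFirstWorking(List<CopyAdaptor> adaptors, URI src, URI dest) {
        List<String> failures = new ArrayList<>();
        for (CopyAdaptor adaptor : adaptors) {
            try {
                adaptor.copy(src, dest);
                return adaptor.name();
            } catch (RuntimeException e) {
                // Remember why this adaptor failed, then try the next one.
                failures.add(adaptor.name() + ": " + e.getMessage());
            }
        }
        throw new IllegalStateException(
                "no adaptor could copy " + src + " to " + dest + " " + failures);
    }

    // A toy adaptor that only accepts source URIs with one particular scheme.
    static CopyAdaptor forScheme(final String scheme) {
        return new CopyAdaptor() {
            public String name() {
                return scheme + "-adaptor";
            }

            public void copy(URI src, URI dest) {
                if (!scheme.equals(src.getScheme())) {
                    throw new UnsupportedOperationException(
                            "unsupported scheme " + src.getScheme());
                }
                // A real adaptor would transfer the data here.
            }
        };
    }

    public static void main(String[] args) {
        List<CopyAdaptor> adaptors = Arrays.asList(
                forScheme("gsiftp"), forScheme("ssh"), forScheme("file"));
        URI src = URI.create("ssh://host.example.org/tmp/in.dat");
        URI dest = URI.create("file:///tmp/out.dat");
        System.out.println(copyWithFirstWorking(adaptors, src, dest)); // prints "ssh-adaptor"
    }
}
```

The application code stays the same whichever adaptor ends up doing the work, which is the point of the ‘Master Key’ role.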
Ibis as ‘Glue’
● Use IPL + SmartSockets, generally for wide-area communication
● Linking up separate ‘activities’ of an application
    ● Activities: often largely ‘independent’ tasks implemented in any popular language or model (e.g. C/MPI, CUDA, Fortran, Java…)
    ● Each typically running on a single GPU/node/Cluster/Cloud/…
● Automatically circumvent connectivity problems
● Example: [Figure: two panels, ‘With SmartSockets’ and ‘No SmartSockets’]
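
As a rough illustration of what ‘linking up activities’ means, the sketch below connects two activities with plain `java.net` sockets on the loopback interface. It deliberately uses neither IPL nor SmartSockets; `GlueSketch`, `startActivityA`, `callActivityA`, `demo`, and the message format are hypothetical names chosen for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: two 'activities' exchanging one request and one reply.
class GlueSketch {

    // Activity A: accepts one connection, reads a request line, sends a reply line.
    static Thread startActivityA(final ServerSocket server) {
        Thread t = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                String request = in.readLine();
                out.println("result-for:" + request);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.start();
        return t;
    }

    // Activity B: connects to activity A, sends a request, returns the reply.
    static String callActivityA(int port, String request) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(request);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs both activities in one JVM; port 0 lets the OS pick a free port.
    static String demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread activityA = startActivityA(server);
            String reply = callActivityA(server.getLocalPort(), "frame-42");
            activityA.join();
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "result-for:frame-42"
    }
}
```

A direct connection like this only works when activity B can reach activity A's address and port. Between cluster sites that is often not the case, which is the kind of connectivity problem the slide says is circumvented automatically.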
Ibis as ‘HPC Solution’
● Use Ibis as replacement for e.g. C++/MPI code
● Benefits:
    ● (better) portability
    ● malleability (open world)
    ● fault-tolerance
    ● (run-time) task migration
● Downside:
    ● requires recoding
● Comparable speedups:
    ● Not discussed here
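
To make the malleability and fault-tolerance bullets a little more concrete, here is a minimal sketch of a task pool in which one worker drops out and its task is picked up by the others. It uses only `java.util.concurrent` and is not Ibis code; `TaskPoolSketch` and `sumOfSquares` are hypothetical names, and the ‘crash’ is simulated by the worker handing its task back before it leaves.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a task pool that survives the loss of one worker.
class TaskPoolSketch {

    // Sums the squares of 0..nTasks-1 using nWorkers threads.
    // Worker 0 "crashes" on the first task it takes; the task is re-queued.
    static long sumOfSquares(int nTasks, int nWorkers) {
        if (nWorkers < 2) {
            throw new IllegalArgumentException("need at least one surviving worker");
        }
        final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < nTasks; i++) {
            queue.add(i);
        }
        final CountDownLatch remaining = new CountDownLatch(nTasks);
        final AtomicLong sum = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(nWorkers);

        for (int w = 0; w < nWorkers; w++) {
            final int id = w;
            pool.execute(() -> {
                try {
                    while (remaining.getCount() > 0) {
                        Integer task = queue.poll(10, TimeUnit.MILLISECONDS);
                        if (task == null) {
                            continue; // nothing available right now, check again
                        }
                        if (id == 0) {
                            // Simulated failure: a real runtime would detect the
                            // crash and re-queue the task on the worker's behalf.
                            queue.add(task);
                            return;
                        }
                        sum.addAndGet((long) task * task);
                        remaining.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            remaining.await(); // wait until every task has been completed once
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return sum.get();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10, 3)); // prints 285
    }
}
```

The result does not depend on which worker handles which task, or on whether worker 0 ever takes one, so workers can leave without affecting correctness.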
MMCA: Situation in 2004/2005
● Code pre-installed at each cluster site
● Unstable / faulty communication
● Connectivity problems
● Execution on each cluster ‘by hand’
[Figure: diagram with components labeled Parallel Horus, Client, and Server; stack labels: C++/MPI, Sockets + SSH Tunneling, SSH]
Phase 1: Ibis as ‘Master Key’ (2006)
● Code pre-installed at each cluster site
● Unstable / faulty communication
● Connectivity problems
● Execution on each cluster ‘by hand’
[Figure: diagram with components labeled Parallel Horus, Client, and Server; stack labels: C++/MPI, Sockets + SSH Tunneling, JavaGAT + IbisDeploy]
Phase 2: Ibis as ‘Glue’ (2006/2007)
● Code pre-installed at each cluster site
● Unstable / faulty communication
● Connectivity problems
● Execution on each cluster ‘by hand’
[Figure: diagram with components labeled Parallel Horus, Client, and Server; stack labels: C++/MPI, IPL + SmartSockets, JavaGAT + IbisDeploy]
Phase 3: Ibis as ‘HPC Solution’ (2008)
● Code pre-installed at each cluster site
● Unstable / faulty communication
● Connectivity problems
● Execution on each cluster ‘by hand’
[Figure: diagram with components labeled Parallel Jorus, Client, and Server; stack labels: Ibis/Java, IPL + SmartSockets, JavaGAT + IbisDeploy]
‘Master Key’ + ‘Glue’ + ‘HPC’
● Step-wise conversion to 100% Ibis / Java
    ● Phase 1: JavaGAT as ‘Master Key’
    ● Phase 2: IPL + SmartSockets as ‘Glue’
    ● Phase 3: Ibis as ‘HPC Solution’
    ● After each phase a fully functional, working solution was available!
● Eventual result:
    ● ‘wall-socket computing from a memory stick’
    ● Remember: the ‘Promise of the Grid’?
    ● Awards at AAAI 2007 and CCGrid 2008
100% Ibis Implementation (2008++)
Seinstra, Maassen, Drost et al. (SCALE 2008 @ CCGrid 2008: First Prize Winner)
Bal, Maassen, Drost, Seinstra et al. (IEEE Computer, Aug 2010)
Conclusions
● Ibis enables problem solving (avoids system fighting)
● Successfully applied in many domains
    ● Astronomy, multimedia analysis, climate modeling, remote sensing, semantic web, medical imaging, …
    ● Data intensive, compute intensive, real-time…
● One of the key technologies selected by NLeSC
● Open source, download:
    ● www.cs.vu.nl/ibis/
Conclusions (2)
● Jungle Computing is hard
● High-Performance Jungle Computing even harder
● While research into efficient & transparent Jungle-aware programming models has only just begun…
● …Ibis provides the basic functionality to efficiently & transparently overcome most Jungle Computing complexities