Widening the number of e-Infrastructure users
with Science Gateways and Identity Federations
Giuseppe Andronico ([email protected]), INFN - Italy
Workshop on Science Applications and Infrastructure in Clouds and Grids, Oxford, 15-16 March 2012
Outline
• Introduction and driving considerations
• The Science Gateway paradigm:
  - Architecture
  - Authentication and Authorisation Schema
  - Access workflow
  - Grid transaction model
  - The Authentication process
• Use cases and statistics
• The forthcoming Cloud Engine
• Summary and conclusions
Path to technology uptake
The Rogers “bell-shape” curve - Rogers, E. M. (1962), “Diffusion of Innovations”, Glencoe: Free Press.
IT acceptance model – the Web
Davis, F. D. (1989), "Perceived usefulness, perceived ease of use, and user acceptance of information technology", MIS Quarterly 13(3): 319–340
Development of web browsers
The World Wide Web
The evolution leap in web browsers
The eResearch2020 report (http://www.eresearch2020.eu/eResearch%20Brochure%20EN.pdf)
• Some barriers in the adoption of Grids:
  - Changes on Grids mean changes on applications
  - Time required to adapt usual workflows
  - Lack of structure to support anonymous access
  - Download and installation of applications
  - Interface
  - Slow to get to compared to other resources
  - Difficult to use in the beginning
  - Time spent to get the application compiled and running
Using Grids is not straightforward
Users have to cope with complex security procedures, execution scripts, job description languages, command-line-based interfaces and lack of standards.
This makes the learning curve very steep and keeps non-IT experts away.
Another consideration…
[Figure: # of users vs. VRCs]
There is a huge number of non-IT experts out there who do not belong to any constituted Virtual Research Community.
How can we attract them?
I have a dream…
Can we increase the number of potential grid users by a factor of 1,000?
… or even by a factor of 25,000 and more?
A new paradigm: the Science Gateway
“A Science Gateway is a community-developed set of tools, applications, and data that is integrated via a portal or a suite of applications, usually in a graphical user interface, that is further customized to meet the needs of a specific community.”
(TeraGrid)
IT acceptance model – the Grid
Davis, F. D. (1989), "Perceived usefulness, perceived ease of use, and user acceptance of information technology", MIS Quarterly 13(3): 319–340
Development of Science Gateway
Requirement for sustainability
Primary requirement: building Science Gateways should be like playing with Lego bricks
[Figure: bricks labelled Sc. Gtwy A, Sc. Gtwy B, Sc. Gtwy C, Sc. Gtwy D, Sc. Gtwy E]
• Standards
• Simplicity
• Ease of use
• Re-usability
Our reference model
[Diagram labels:]
• Users from different organisations having different roles and privileges: Administrator, Power User, Basic User
• Science Gateway
• Embedded Applications: App. 1, App. 2, ..., App. N
• Standard-based (SAGA) middleware-independent Grid Engine
AuthN & AuthZ Schema
[Diagram labels: Authentication; Authorisation; Science Gateway; GrIDP (“catch-all”); IDPCT (“catch-all”); ...; IDP_y; LDAP]
1. Register to a Service
2. Sign in
Social Networks’ Bridge IdP
The Grid IDentity Pool (GrIDP) (http://gridp.ct.infn.it)
This is a “catch-all” Identity Federation
eduGAIN (www.edugain.org)
All the Science Gateways are registered as Service Providers of eduGAIN
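The split between authentication (done by an Identity Provider) and authorisation (done by the gateway against its own registry) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `Gateway` class, its method names and the in-memory dictionary standing in for LDAP are hypothetical and are not the framework's actual code. Only the role names and the two steps ("Register to a Service", "Sign in") come from the slides.

```python
from dataclasses import dataclass, field

# Role names taken from the reference-model slide.
ROLES = ("Basic User", "Power User", "Administrator")


@dataclass
class Gateway:
    """Toy Science Gateway: IdPs authenticate, a local registry authorises."""
    trusted_idps: set
    # Stand-in for the LDAP registry: (idp, user_id) -> set of role names.
    registry: dict = field(default_factory=dict)

    def register(self, idp, user_id, roles):
        """Step 1: register an identity with the service and assign roles."""
        unknown = set(roles) - set(ROLES)
        if unknown:
            raise ValueError(f"unknown roles: {sorted(unknown)}")
        self.registry[(idp, user_id)] = set(roles)

    def sign_in(self, idp, user_id):
        """Step 2: accept an identity asserted by an IdP and return its roles."""
        if idp not in self.trusted_idps:
            raise PermissionError(f"identity provider not trusted: {idp}")
        roles = self.registry.get((idp, user_id))
        if roles is None:
            raise PermissionError("authenticated but not registered to this service")
        return roles

    def can_use(self, idp, user_id, required_role):
        """Return True only if the user signs in successfully and holds the role."""
        try:
            return required_role in self.sign_in(idp, user_id)
        except PermissionError:
            return False
```

In this sketch a user known to an IdP but never registered with the gateway is authenticated yet not authorised, which is the reason registration and sign-in are separate steps.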
Catania Grid Engine
[Diagram labels: Science GW 1, Science GW 2, Science GW 3; Liferay Portlets; Science GW Interface; Job Engine; Data Engine; Users Track & Monit.; Users Tracking DB; SAGA/JSAGA API; eTokenServer; Grid MWs]
[Status labels in the diagram: DONE (three components), By mid April (one), By end of April (one)]
Job Engine - Architecture
[Diagram labels: Job Queue; Worker Threads (WT) for Job Submission; Worker Threads (WT) for Status Checking; Job Submission; Job Check Status / Get Output; Users Tracking DB; Monitoring Module; Grid Infrastructure(s)]
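The architecture above (a job queue served by one pool of worker threads for submission and another for status checking, with results written to a tracking store) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Job Engine's code: the function name `run_job_engine`, its parameters, and the dictionary used in place of the users tracking DB are all hypothetical, and each job's status is checked only once, whereas a real engine would presumably poll until the job finishes.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker thread to exit


def run_job_engine(jobs, submit, check_status, n_submit_workers=4, n_check_workers=2):
    """Push jobs through a submission pool, then through a status-checking pool.

    `submit(job)` must return a job id; `check_status(job_id)` must return a status.
    Both are supplied by the caller and stand in for middleware calls.
    """
    job_queue = queue.Queue()   # jobs waiting to be submitted
    submitted = queue.Queue()   # (job, job_id) pairs waiting for a status check
    tracking_db = {}            # stand-in for the users tracking DB
    db_lock = threading.Lock()

    def submit_worker():
        while True:
            job = job_queue.get()
            if job is _STOP:
                break
            submitted.put((job, submit(job)))

    def check_worker():
        while True:
            item = submitted.get()
            if item is _STOP:
                break
            job, job_id = item
            status = check_status(job_id)
            with db_lock:
                tracking_db[job_id] = {"job": job, "status": status}

    submitters = [threading.Thread(target=submit_worker) for _ in range(n_submit_workers)]
    checkers = [threading.Thread(target=check_worker) for _ in range(n_check_workers)]
    for thread in submitters + checkers:
        thread.start()

    for job in jobs:
        job_queue.put(job)
    for _ in submitters:          # one sentinel per submission worker
        job_queue.put(_STOP)
    for thread in submitters:
        thread.join()

    for _ in checkers:            # all submissions are queued; stop the checkers
        submitted.put(_STOP)
    for thread in checkers:
        thread.join()
    return tracking_db
```

Decoupling the two pools through queues means slow status checks never block new submissions, which is one plausible way to obtain the parallelism shown in the diagram.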
Job Engine - Features
The Job Engine has been designed with the following features in mind:
Feature | Description | Status
Middleware Independent | Capacity to submit jobs to resources running different middleware | DONE
Easiness | Create code to run applications on the grid in a very short time | DONE
Scalability | Manage a huge number of parallel job submissions, fully exploiting the HW of the machine where the Job Engine is installed | DONE
Performance | Have a good response time | DONE
Accounting & Auditing | Register every grid operation performed by the users | DONE
Fault Tolerance | Hide middleware failures from end users | ALMOST DONE
Workflow | Provide a way to easily create and run workflows | IN PROGRESS
Job Engine – Scalability
• 40,000 jobs submitted in parallel!
• Submission time scales linearly with the number of jobs
• >10,000 jobs an hour
[Plots: Time to submit 10,000 jobs (h); Job submission time (h)]
Job Engine – Middleware interoperability
• Both sequential and MPI-enabled jobs successfully executed
• Tests with Globus planned
Job Engine – Accounting & Auditing
A powerful accounting & auditing system is included in the Job Engine
It is fully compliant with the EGI VO Portal Policy and the EGI Grid Security Traceability and Logging Policy
The following values are stored in the DB for each job submitted:
• User ID
• Job Submission timestamp
• Job Done timestamp
• Application name
• Job ID
• Robot certificate ID
• VO name
• Execution site (name, latitude, longitude)
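To make the per-job record concrete, here is a hedged Python sketch of a tracking table that holds exactly the values listed above. The table name, column names and helper functions are assumptions chosen for illustration, and SQLite from the Python standard library is used as a stand-in; the slides do not say which schema or database the Job Engine uses for this purpose.

```python
import sqlite3

# Hypothetical schema: one row per submitted job, one column per value on the slide.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_tracking (
    job_id               TEXT PRIMARY KEY,
    user_id              TEXT NOT NULL,
    application          TEXT NOT NULL,
    robot_certificate_id TEXT NOT NULL,
    vo_name              TEXT NOT NULL,
    site_name            TEXT,
    site_latitude        REAL,
    site_longitude       REAL,
    submitted_at         TEXT NOT NULL,
    done_at              TEXT
)
"""


def open_tracking_db(path=":memory:"):
    """Open (or create) the tracking database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_submission(conn, job_id, user_id, application,
                      robot_certificate_id, vo_name, site, submitted_at):
    """Store a newly submitted job; `site` is a (name, latitude, longitude) tuple."""
    name, latitude, longitude = site
    conn.execute(
        "INSERT INTO job_tracking (job_id, user_id, application, robot_certificate_id,"
        " vo_name, site_name, site_latitude, site_longitude, submitted_at)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (job_id, user_id, application, robot_certificate_id,
         vo_name, name, latitude, longitude, submitted_at),
    )
    conn.commit()


def record_done(conn, job_id, done_at):
    """Set the Job Done timestamp; return True if the job was known."""
    cur = conn.execute(
        "UPDATE job_tracking SET done_at = ? WHERE job_id = ?", (done_at, job_id)
    )
    conn.commit()
    return cur.rowcount == 1


def jobs_for_user(conn, user_id):
    """Audit query: every job a given user submitted, oldest first."""
    return conn.execute(
        "SELECT job_id, application, submitted_at, done_at FROM job_tracking"
        " WHERE user_id = ? ORDER BY submitted_at",
        (user_id,),
    ).fetchall()
```

Storing the user ID next to the robot certificate ID is what lets an operation performed with a shared robot certificate be traced back to an individual user.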
Catania Science Gateways in numbers
[Plot: Overall usage (arb. units), Dec to Feb, axis from 0 to 350]
[Plot: Registered users, 09/2011 to 02/2012, axis from 0 to 180]
Data Engine – Requirements
• A file browser shows Grid files in a tree
• The file system exposed by the Science Gateway is virtual
• Easy transfers from/to the Grid (through the SG at the moment) are done in a few clicks
• Users do not need to care about how and where their files are really located
Data Engine – Usage Workflow
[Diagram components: eTokenServer; User Track. DB; DOGS DB]
1. Sign in
2. Upload request
3. Proxy request
4. Proxy transfer
5. File Upload
6. Update DB
7. Upload on Grid
7. Tracking
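The numbered workflow can be read as a single upload routine. The Python sketch below follows the step numbers from the slide, but which component performs each step is an interpretation, and the function `upload`, its parameters, and the plain dictionary and list standing in for the DOGS DB and the users tracking DB are hypothetical names introduced only for illustration.

```python
def upload(session, path, payload, get_proxy, put_on_grid, dogs_db, tracking_db):
    """Run steps 2-7 for a user who has already signed in (step 1).

    `get_proxy(user)` stands in for the eTokenServer and returns a proxy credential.
    `put_on_grid(proxy, path, data)` stands in for the storage transfer and
    returns the physical location of the stored file.
    """
    if not session.get("signed_in"):                  # 1. sign in is a precondition
        raise PermissionError("sign in first")
    user = session["user"]                            # 2. upload request
    proxy = get_proxy(user)                           # 3-4. proxy request and transfer
    staged = bytes(payload)                           # 5. file upload to the gateway
    dogs_db[path] = {"owner": user, "size": len(staged)}   # 6. update DB
    location = put_on_grid(proxy, path, staged)       # 7. upload on Grid
    tracking_db.append((user, "upload", path, location))   # 7. tracking
    return location
```

Passing the proxy provider and the transfer function as parameters keeps the sketch independent of any particular middleware.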
DOGS: Data On Grid Services – Back-end implementation
• JSAGA API used to transfer data from/to storage elements
• Hibernate to manage the VFS collecting information on files stored on the Grid; any changes/actions in the user view affect the VFS
• MySQL as the underlying RDBMS
• An additional component has been developed in order to keep track of each transaction in the users tracking DB
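The idea of a virtual file system that maps the tree a user sees onto physical Grid locations, while logging every transaction, can be sketched as follows. This is an in-memory Python illustration under stated assumptions: the class `VirtualFileSystem`, its methods and the example URLs are hypothetical, and a dictionary replaces the Hibernate/MySQL persistence named on the slide.

```python
import posixpath


class VirtualFileSystem:
    """Maps virtual paths shown to the user onto physical storage locations."""

    def __init__(self):
        self._entries = {}   # virtual path -> physical URL on a storage element
        self.log = []        # stand-in for the users tracking DB

    def add(self, user, virtual_path, physical_url):
        """Register a file that has been stored on the Grid."""
        path = posixpath.normpath(virtual_path)
        self._entries[path] = physical_url
        self.log.append((user, "add", path))

    def rename(self, user, old_path, new_path):
        """Change only the virtual name; the physical copy is not touched."""
        old_path = posixpath.normpath(old_path)
        new_path = posixpath.normpath(new_path)
        self._entries[new_path] = self._entries.pop(old_path)
        self.log.append((user, "rename", old_path, new_path))

    def resolve(self, virtual_path):
        """Return the physical location behind a virtual path."""
        return self._entries[posixpath.normpath(virtual_path)]

    def listdir(self, directory):
        """List the immediate children of a virtual directory, as a file browser would."""
        prefix = posixpath.normpath(directory).rstrip("/") + "/"
        names = set()
        for path in self._entries:
            if path.startswith(prefix):
                names.add(path[len(prefix):].split("/", 1)[0])
        return sorted(names)
```

Because a rename changes only the mapping, users can reorganise their view without any data moving between storage elements.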
DOGS: Data On Grid Services – Front-end implementation
• A portlet has been created with access provided only to federated users with given roles and privileges
• The portlet view component includes elFinder, a web-based file manager developed in JavaScript using jQuery UI for a dynamic and user-friendly interface (http://elrte.org/elfinder)
Data Engine in action (1/2)
Data Engine in action (2/2)
«Share» to be added soon
Summary of standards adopted
The framework for Science Gateways developed at Catania is fully web-based and adopts official worldwide standards and protocols, through their most common implementations. These are:
• The JSR 168 and JSR 286 standards (also known as the "portlet 1.0" and "portlet 2.0" standards)
• The OASIS Security Assertion Markup Language (SAML) standard and its Shibboleth and SimpleSAMLphp implementations
• The Lightweight Directory Access Protocol (LDAP) and its OpenLDAP implementation
• The Cryptographic Token Interface Standard (PKCS#11) and its Cryptoki implementation
• The Open Grid Forum (OGF) Simple API for Grid Applications (SAGA) standard and its JSAGA implementation
INDICATE Review, Roberto Barbera, Lyon, 20/09/2011
http://www.indicate-project.eu
http://indicate-gw.consorzio-cometa.it
Science Gateways in action: e-Culture Science Gateway @ INDICATE
Use the HTTPS interface of Storage Elements
Important for large-size files
Thanks to the collaboration with
Science Gateways in action: GATE @ EUMEDGRID
Science Gateways in action: MrBayes @ GISELA
Science Gateways in action: GridEEG @ DECIDE
The CHAIN Application Database (www.chain-project.eu/applications)
Project-specific Science Gateways can be accessed from the CHAIN AppDB
The forthcoming Cloud Engine
[Diagram labels: Cloud App 1, Cloud App 2, ..., Cloud App N; Cloud Gateway; Cloud Engine; Users Track & Monit.; Users Tracking DB; OCCI API; Cloud MW; AWS]
Virtual execution environment: CLEVER
• Host Management Layer: Host Manager (HM)
  - Performs physical resources monitoring and VEs allocation
• Cluster Management Layer: Cluster Manager (CM)
  - Monitors the overall state of the cluster, “coordinates” HMs
• External components: XMPP Server and Distributed Database
  - XMPP advantages: host presence, open standard
  - Central failure point does not exist: fault tolerance mechanism with multiple CM instances
Summary and conclusions
• e-Infrastructures can be very beneficial platforms (especially for cultural heritage), provided they are really «easy to use»
• Science Gateways with support for Identity Federations and Social Networks can revolutionize the way Grid infrastructures are used, hugely widening their potential user base, especially non-IT experts and the “citizen scientist”
• The adoption of standards (JSR 286, SAGA, SAML, etc.) represents a concrete investment towards sustainability
• By design, the components (the “portlets”, our “Lego bricks”) of our Science Gateways have maximum re-usability and, indeed, they have already been adopted in/by several projects (CHAIN, DECIDE, EarthServer, EUMEDGRID-Support, GISELA, INDICATE, etc.)
• If you want to integrate your applications in our Science Gateways, or simply enable your websites with our authentication tools, please contact me at [email protected]
Thank you