Extensible Framework for Data Access & Integration. Malcolm Atkinson, Director. 10th November 2004.
Extensible Framework for Data Access & Integration
Malcolm Atkinson, Director
www.nesc.ac.uk
10th November 2004
Database Growth
PDB Content Growth
Wellcome Trust: Cardiovascular Functional Genomics
Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands
Shared data
Public curated data
BRIDGES
IBM
Biochemical Pathway Simulator
Closing the information loop – between lab and computational model.
(Computing Science, Bioinformatics, Beatson Cancer Research Labs)
DTI Bioscience Beacon Project Harnessing Genomics Programme
Slide from Muffy Calder, Glasgow
Now largest EU project in the Life Sciences – see http://www.cancerresearchuk.org/news/pressreleases/scottishscientists_22july04
Walter Kolch
eDiaMoND – Compute
Mammograms have different appearances, depending on image settings and acquisition systems
StandardMammoFormat
StandardMammoFormat
Temporal mammography
Computer Aided Detection
3D View
Provided by eDiaMoND project: Prof. Sir Mike Brady et al.
Automatic registration technology
Rigid registration of MR and CT images of the head
Inter-subject image warping
Provided by IXI project: Prof. Derek Hill et al.
Move Computation to Data
Code scale
Depends on wet-ware
No noticeable rate of improvement
Data scale
Grows at Moore’s Law or Moore’s Law²
Analysis of data
Extracts & derivatives used
Often smaller – more value for current investigation
Implies move code to data
SQL, XQuery, Java code, …
Extensibility mechanisms used by OGSA-DAIers
Java mobility (e.g. DataCutter), database procedures, …
Increasingly necessary
Application control or higher-level service decisions
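The "move code to data" point can be illustrated with a minimal sketch. It assumes an in-memory SQLite table standing in for a large remote resource; the table name `readings`, its columns, and both function names are hypothetical and are not taken from the slides or from OGSA-DAI. One function ships every row to the client, the other ships the SQL to the data so that only the derived value travels back.

```python
import sqlite3

# Hypothetical data set: an in-memory table standing in for a large remote resource.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sample_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(i, float(i % 10)) for i in range(10_000)],
)


def mean_by_moving_data(conn):
    """Move data to code: fetch every row, then compute on the client side."""
    rows = conn.execute("SELECT value FROM readings").fetchall()
    return sum(v for (v,) in rows) / len(rows), len(rows)


def mean_by_moving_code(conn):
    """Move code to data: send the query, receive only the derived value."""
    rows = conn.execute("SELECT AVG(value) FROM readings").fetchall()
    return rows[0][0], len(rows)


# Each call returns (mean, number of rows that crossed the interface).
moved_data_mean, rows_shipped_data = mean_by_moving_data(conn)
moved_code_mean, rows_shipped_code = mean_by_moving_code(conn)
```

Both calls give the same mean, but the second returns a single row to the caller, which is the "often smaller" extract the slide refers to.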
Integration is Everything
Motivation
No business or research team is satisfied with one data resource
Data Curation Expertise Human Centred
Integration Human centred
Domain-specialist driven
Dynamic specification of combination function
Iterative processes
Revised request minutes later
Revised request after months of thought
Sources inevitably heterogeneous
Time-varying content, structure & policies
Robust, stable steerable integration services
Higher-level services over multiple resources
Fundamental requirements for (re)negotiation
Federation or Virtualisation preceding integration, or kit of integration tools to be interwoven with an application?
OGSA
Infrastructure Architecture
Grid or Web Service Infrastructure
Data Intensive Applications for Science X
Compute, Data & Storage Resources
Distributed
Simulation, Analysis & Integration Technology for Science X
Data Intensive X Scientists
Virtual Integration Architecture
Generic Virtual Data Access and Integration Layer
Structured Data Integration
Structured Data Access
Structured Data: Relational, XML, Semi-structured
Transformation
Registry
Job Submission
Data Transport
Resource Usage
Banking
Brokering
Workflow
Authorisation
OGSA-DAI
Database (Xindice, MySQL, Oracle, DB2)
Request to Registry for sources of data about “x”
Registry responds with Factory handle
Request to Factory for access to database
Factory creates GridDataService
Factory returns handle of GDS to client
Client queries GDS with SQL, XPath, XQuery etc
GDS interacts with database
Query results returned as XML, OR delivered to consumer as XML
Diagram labels: SOAP/HTTP, service creation, API interactions, Analyst, Registry (GDSR), Factory (GDSF), Grid Data Service (GDS), Consumer
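The registry, factory and GDS sequence above can be sketched as a small in-process model. This is only an illustration under stated assumptions: the class and method names (`Registry.lookup`, `Factory.create_service`, `GridDataService.perform`) are hypothetical and are not the OGSA-DAI API, the "handles" are plain Python objects rather than remote service handles reached over SOAP/HTTP, and SQLite stands in for the databases named on the slide.

```python
import sqlite3
from xml.etree import ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS: wraps one database connection."""

    def __init__(self, conn):
        self._conn = conn

    def perform(self, sql):
        """Run the query against the database and return the rows as XML text."""
        cursor = self._conn.execute(sql)
        columns = [d[0] for d in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_el, name).text = str(value)
        return ET.tostring(root, encoding="unicode")


class Factory:
    """Hypothetical factory: creates a GridDataService for its database."""

    def __init__(self, conn):
        self._conn = conn

    def create_service(self):
        return GridDataService(self._conn)


class Registry:
    """Hypothetical registry: maps a topic to the factory that serves it."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def lookup(self, topic):
        return self._factories[topic]


# Set-up: one database about "proteins" (made-up demo data), published via a factory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proteins (name TEXT, length INTEGER)")
conn.execute("INSERT INTO proteins VALUES ('demo', 42)")
registry = Registry()
registry.register("proteins", Factory(conn))

# Client side, following the steps on the slide.
factory = registry.lookup("proteins")   # registry responds with a factory handle
gds = factory.create_service()          # factory creates a GDS and returns its handle
xml_result = gds.perform("SELECT name, length FROM proteins")  # client queries the GDS
```

The sketch covers only the "results returned as XML" path; delivering results to a third-party consumer would need an extra destination argument, which is left out here.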
OGSA-DAI
OGSA-DAI Downloads R4
690 downloads since May 04
- Actual user downloads, not search engine crawlers
- Does not include downloads as part of GT3.2 releases
Total of 838 registered users
R1.0 (Jan 03) 104
R1.5 (Feb 03) 108
R2.0 (Apr 03) 250
R2.5 (Jun 03) 291
R3.0 (Jul 03) 792
R3.1 (Feb 04) 630
Total 2865 (all releases, including the 690 R4 downloads)
Downloads by Country – OGSA-DAI R4.0
China 26%
United Kingdom 21%
United States 13%
Unknown 7%
Japan 5%
Germany 5%
Italy 5%
France 3%
Austria 2%
Australia 2%
Taiwan 2%
Multiple tasks / request
[Diagram labels: Client API, Requestor Stub, Data Set, Ident / Type / Value]
Be Direct
Double handling costs too much
Memory cycles, bus capacity, cache disruption, …
Double handling via discs pathologically bad
Data translation expensive
Avoid: deliver as stored, …
Compose
Stream
Main memory is not big enough
Stream or use disk
Couple generator & consumer directly
Stream from RAM to RAM
Requires coupled computation execution
Breaks down boundaries and merges data, execution & transport requirements.
Demands smart workflow enactment service & foundation services
Models for process transformation and optimisation
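Coupling generator and consumer directly can be sketched with Python generators. This is a minimal single-process illustration, not a model of any particular workflow enactment service; the function names and the squared-integer "records" are hypothetical. Each record is handed from producer to filter to consumer as soon as it exists, so no stage builds an intermediate list in memory or on disk.

```python
from typing import Iterable, Iterator


def generate_records(count: int) -> Iterator[int]:
    """Producer: yields one record at a time instead of building a full list."""
    for i in range(count):
        yield i * i


def keep_even(records: Iterable[int]) -> Iterator[int]:
    """Intermediate stage: passes even records through as they arrive."""
    for record in records:
        if record % 2 == 0:
            yield record


def consume(records: Iterable[int]) -> int:
    """Consumer: folds the stream into a running total, holding one record at a time."""
    total = 0
    for record in records:
        total += record
    return total


# The three stages are composed directly; records stream through one by one.
stream_total = consume(keep_even(generate_records(1_000)))
```

Because the stages run in lock-step, the producer only advances when the consumer asks for the next record, which is the "coupled computation execution" the slide says this approach requires.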
Take Home Message
Data Access & Integration
Two Models
kit of parts
Virtualisation
Ubiquitous Needs
Pervasive and growing number and diversity of data collections
Opportunity and power to integrate and mine
OGSA-DAI Pioneering
Talk by Amrey Krause - 5:15 Today
Growing Community
Implementation
Standards
Users
Join the party of users, contributors & researchers