System Development & Operations
NSF DataNet site visit to MIT
February 8, 2010
2/8/2010 NSF Site Visit to MIT DataSpace
DataSpace High-Level Architecture
• Global Network (Web)
– MIT Node
– Other USA Nodes
– International Nodes
• Local Network
– Metadata Repository for Scientific Data
– Multiple Scientific Data Repositories (DataSpace Native Architecture)
– Interface to Legacy Scientific Data Repositories
• DataSpace Services
– Distributed Data Management Services: Security, Replication, Administration, Policy Management, Workflow Services
– Data Curation Services: Process, Catalog, Annotate, Preserve
– Basic Data User Services: Discovery, Quality, Conversion, Integration
– Additional Data User Services: Data Analytics, Data Visualization
• 3rd Party Specialized Data Services
Basic Workflow
• Scientist: provides data and preliminary metadata
• Curator: processes and ingests data, completes metadata and policies (e.g. retention)
• User: searches (meta)data, accesses/integrates data, analyzes/visualizes data (via DataSpace data services or 3rd party data services)
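A minimal sketch of the basic workflow in the diagram, in which a scientist provides data with preliminary metadata, a curator ingests it and completes metadata and policies, and a user then searches the metadata. The class names, field names, and dataset identifiers are all hypothetical and serve only as illustration; this is a toy in-memory model, not the DataSpace implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical record for one deposited dataset."""
    identifier: str
    payload: bytes
    metadata: dict = field(default_factory=dict)
    policies: dict = field(default_factory=dict)


class Repository:
    """Toy in-memory stand-in for a node (an assumption, not the real system)."""

    def __init__(self):
        self._pending = {}   # submitted by scientists, not yet curated
        self._catalog = {}   # curated and discoverable

    def submit(self, identifier, payload, preliminary_metadata):
        # Scientist step: provide data and preliminary metadata.
        self._pending[identifier] = Dataset(
            identifier, payload, dict(preliminary_metadata)
        )

    def curate(self, identifier, extra_metadata, policies):
        # Curator step: complete metadata, attach policies, ingest.
        dataset = self._pending.pop(identifier)
        dataset.metadata.update(extra_metadata)
        dataset.policies.update(policies)
        self._catalog[identifier] = dataset
        return dataset

    def search(self, **criteria):
        # User step: only curated datasets are searchable.
        return sorted(
            d.identifier
            for d in self._catalog.values()
            if all(d.metadata.get(k) == v for k, v in criteria.items())
        )


repo = Repository()
repo.submit("ctd-001", b"depth,temp\n", {"domain": "oceanography"})
repo.submit("seq-002", b"ACGT\n", {"domain": "genomics"})
repo.curate("ctd-001", {"instrument": "CTD"}, {"retention_years": 10})

print(repo.search(domain="oceanography"))
print(repo.search(domain="genomics"))
```

In this sketch the second dataset stays invisible to search until a curator ingests it, which mirrors the separation of the scientist and curator roles on the slide.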
PLATFORM ARCHITECTURE
DataSpace
Platform Architecture
[Figures: platform architecture diagrams for Version 0.1 and Version 1.0]
Federated Architecture
Multiple Implementations
Federated Model
• Data can be widely distributed; Web-based services can be centralized or federated
– e.g. a centralized, domain-specific search service that harvests metadata from relevant archives (“Google for biological oceanography”)
– e.g. real-time data integration across small sets of archives identified via subject search
• DataSpace will develop some services but, more importantly, will create an ecosystem that others can contribute to (e.g. technology & scientific companies, universities, researchers, labs)
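The centralized search example above can be illustrated with a small sketch: metadata is harvested from several archives into one subject index, while the data itself stays where it is. The archive names, record fields, and functions here are hypothetical, and a real harvester would pull records over the network rather than from an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical archives, each exposing its own metadata records.
ARCHIVES = {
    "archive-a": [
        {"id": "a1", "title": "Plankton counts",
         "subjects": ["biological oceanography"]},
        {"id": "a2", "title": "Seismic survey",
         "subjects": ["geophysics"]},
    ],
    "archive-b": [
        {"id": "b1", "title": "Chlorophyll profiles",
         "subjects": ["biological oceanography"]},
    ],
}


def harvest(archives):
    """Build a central subject index; only metadata is copied."""
    index = defaultdict(list)
    for archive_name, records in archives.items():
        for record in records:
            for subject in record["subjects"]:
                index[subject.lower()].append(
                    (archive_name, record["id"], record["title"])
                )
    return index


def search(index, subject):
    """Case-insensitive subject lookup against the harvested index."""
    return sorted(index.get(subject.lower(), []))


index = harvest(ARCHIVES)
hits = search(index, "Biological Oceanography")

# The set of archives found by subject search is the small set that a
# real-time integration step would then query directly.
archives_to_integrate = sorted({archive for archive, _, _ in hits})

print(hits)
print(archives_to_integrate)
```

The same index supports both examples on the slide: the hits answer a domain-specific search, and the archive names they point to identify where integration would need to reach.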
Development Methodology
• Behavior-Driven Development model
• Continuous Integration process
– iterative research prototyping and production implementation phases
• Small centralized development team to start
• Institutional partners add developers in years 1-2
• Transparent, open source process
• Close collaboration with Data Conservancy
OPERATIONS
DataSpace
Local Operations – MIT Example
• Scientists
– data production, early-stage curation
– lots of domain expertise, little or no curation expertise
• Libraries
– outreach and recruitment (e.g. HMI study)
– later-stage data curation, ingest
– some domain expertise, lots of curation expertise
• IS&T
– identifying, operating hardware & system
– enterprise systems management expertise
– lots of IT expertise, some curation expertise
Project-Wide Operations
• Platform governance
– distributed open source software model
– transparent decision-making process
• Service model(s) for each institutional partner
– including all data curation activities
– including CI templates (e.g. hardware, cloud)
– associated cost model for each service model
Project-Wide Operations
• Ongoing usability studies with researchers, students, public audiences
• Develop certification strategy for trusted digital repositories (TDRs) using DataSpace (.arc domain)
Data Curation Lifecycle Highlights
• Deposit workflows for researchers based on locally-produced data (interactive and batch)
• Data Curators
– outreach, marketing, data recruitment
– metadata creation and data ontology application
– curatorial policies developed, applied
– tailored preservation strategies (local, consortial, outsourced)
Direct access to data creators and boots-on-the-ground support services
Data Curation Lifecycle Highlights
• Novel distributed, standards-based policy management strategy based on emerging Semantic Web standards and TRAC
• Semantic Web standards (e.g. RDF) to support improved data integration and interoperability
• Separation of access layer (discovery, use) from curation layer, in support of broad federation, distributed tool development
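As a rough illustration of how RDF-style statements could carry curation policies, the sketch below stores policies as (subject, predicate, object) triples and queries them by pattern. It uses plain Python tuples rather than an RDF library, and the namespace, predicate names, and dataset identifiers are hypothetical; the slides do not specify a vocabulary.

```python
# Hypothetical namespace; not a vocabulary defined by DataSpace.
EX = "http://example.org/dataspace/"

# Each policy statement is a (subject, predicate, object) triple.
triples = {
    (EX + "dataset/ctd-001", EX + "retentionYears", 10),
    (EX + "dataset/ctd-001", EX + "accessLevel", "public"),
    (EX + "dataset/seq-002", EX + "retentionYears", 5),
    (EX + "dataset/seq-002", EX + "accessLevel", "restricted"),
}


def match(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    found = [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]
    # Sort on a string form of the object so mixed types stay comparable.
    return sorted(found, key=lambda t: (t[0], t[1], str(t[2])))


# Which datasets does the access layer expose publicly?
public = [s for s, _, _ in match(triples, p=EX + "accessLevel", o="public")]

# Which datasets must the curation layer retain for at least 10 years?
long_term = [
    s for s, _, years in match(triples, p=EX + "retentionYears")
    if years >= 10
]

print(public)
print(long_term)
```

Because every policy is an independent statement about a dataset, an access service and a curation service can each query only the predicates they care about, which fits the separation of layers described above.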