Post on 13-Jan-2016
Ocean Observatories Initiative
Data Management (DM) Subsystem Overview
Michael Meisinger
September 29, 2009
OOI CI Kick-Off Meeting, Sept 9-11, 2009
Outline
• Subsystem Architecture Overview
• Scope of Release 1
• Selected Components
– Data Distribution based on the Exchange
– Data Store as a Service
Data Distribution w/ Exchange
• Context of DM within CI
• The Exchange handles data distribution
[Diagram: DM in the context of the CI subsystems. Labels: Common Operating Infrastructure; Data Management (Science); Sensing & Acquisition; Data Management (Information Distribution); Analysis & Synthesis; Identity Management; State Management; Governance Framework; Resource Management; Planning & Prosecution; Exchange; Service Framework; Presentation Framework; Common Execution Infrastructure]
Data Processing and Availability
• Multiple aspects of data management
• Data processing and analysis at various levels of abstraction
• Data distribution critical to global scientific research
Requirements
• Focus on high-risk requirements
The CI shall implement an OOI-standard metadata model for resources
The OOI-standard metadata model shall support a description of physical resource behavior
The OOI-standard metadata model shall support a description of physical resource content
The OOI-standard metadata model shall support a syntactic description for the content of an information resource
The OOI-standard metadata model shall support a semantic description for the content of an information resource
The OOI-standard metadata model shall support tracking of resource provenance
The OOI-standard metadata model shall support tracking of quality
The OOI-standard metadata model shall support tracking of context
The OOI-standard metadata model shall support tracking of correspondence
The OOI-standard metadata model shall support tracking of citation
The OOI-standard metadata model shall support tracking of lineage
The OOI-standard metadata model shall be extensible
The CI shall provide semantic services to support ontological representations and relationships
The semantic services shall utilize domain-specific vocabularies
A user interface to define vocabulary terms shall be provided
The vocabularies shall be extensible
The semantic services shall recommend new terms to enter into the vocabulary
The semantic services shall implement an ontological language
The semantic services shall implement an ontological engine
The CI shall provide persistent archive services
The persistent archive services shall be data format agnostic
The persistent archive services shall be subject to policy
The persistent archive services shall preserve all associations between data and metadata
The persistent archive services shall ingest data independent of delivery order
The persistent archive services shall guarantee the integrity of archived data
The persistent archive services shall support distributed data repositories
The persistent archive services shall support federation
The persistent archive services shall support data versioning
The persistent archive services shall acknowledge requests for data and provide an estimate for response time
[Side labels grouping the requirements: Common data and meta-data models; Archival, syntax, semantics, UI…]
Scope of Release 1
• Common data and metadata model
– Resource metadata, behavior, lifecycle, content, provenance, lineage, citation, quality, context, correspondence
– Extensible vocabularies and ontologies
– Data formats (syntax and semantics)
• Dynamic data distribution services
– Pub/sub, topics, processing chaining, sequestration
• Data catalog and repository
– Discovery, metadata management
• Persistent archive services
– Repository management, common repository framework, ingestion services, long-term archival
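The pub/sub, topic, and chaining terms in the distribution services item can be illustrated with a very small in-process sketch. It is only an illustration under stated assumptions: the `Exchange` class, the `chain` helper, and the topic names are hypothetical and do not reflect the actual OOI Exchange, which is a distributed messaging service.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[Any], None]


class Exchange:
    """Hypothetical in-process stand-in for a topic-based exchange."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Register a handler that is called for every message on the topic.
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, message: Any) -> None:
        # Deliver the message to all current subscribers of the topic.
        for handler in list(self._subscribers.get(topic, [])):
            handler(message)


def chain(exchange: Exchange, source: str, target: str,
          transform: Callable[[Any], Any]) -> None:
    """Processing chaining: consume from one topic, publish the result to another."""
    exchange.subscribe(source, lambda msg: exchange.publish(target, transform(msg)))
```

A consumer subscribes to the derived topic only; it does not need to know which process produced the data or where the raw data lives.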
DM Functional Components
[Diagram: Data Management Services Network. Labels: Science Data Services; Information Distribution Services; “Standardized” data products; Ingestion; Transformation; Exchange; Preservation; Inventory; Presentation; Metadata; Observed data; Data Products; Sensing & Acquisition SN; Analysis & Synthesis SN]
DX Prototype
• The Data Exchange (DX) prototype barely touches Ingestion/Transformation/Exchange/Preservation in the context of a data distribution model
• DX strongly informs further refinements of the DM architecture and technology choices
Information Container Model
• Encapsulates all kinds of information resources, such as: scientific data, user identities, process definitions, virtual machine images, etc.
• Multiple levels of meta-data
• Separation of concerns between Information services
[Diagram: Information Container Model
– Information Container: Meta-data (L1), which describes the Information Block
– Information Block: Meta-data (L2), which describes the Information Content
– Information Content: optional Header (Meta-data L3), which describes the Body (Content)
– Examples of content: Process Spec, Science Data
– Information Services (Ingestion, Transformation) operate on the Information Model]
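The nesting on this slide can be sketched as plain data classes. This is a minimal illustration, not the OOI design: the class names, the field names, and the `wrap` helper are hypothetical, chosen only to mirror the three meta-data levels (L1, L2, L3).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class InformationContent:
    body: bytes                                  # Body (Content)
    header: Optional[Dict[str, Any]] = None      # optional Header (meta-data L3)


@dataclass
class InformationBlock:
    content: InformationContent
    metadata: Dict[str, Any] = field(default_factory=dict)   # meta-data L2


@dataclass
class InformationContainer:
    block: InformationBlock
    metadata: Dict[str, Any] = field(default_factory=dict)   # meta-data L1


def wrap(body: bytes,
         header: Optional[Dict[str, Any]] = None,
         block_meta: Optional[Dict[str, Any]] = None,
         container_meta: Optional[Dict[str, Any]] = None) -> InformationContainer:
    """Build a container around raw content, one meta-data level per layer."""
    content = InformationContent(body=body, header=header)
    block = InformationBlock(content=content, metadata=dict(block_meta or {}))
    return InformationContainer(block=block, metadata=dict(container_meta or {}))
```

Because each level only describes the level below it, a service that works on registration data can read the container meta-data without ever parsing the body.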
Ingestion
• Provides basic mechanisms for identifying the data streams and formats, parsing the content and identifying the associated meta-data, adding version information, and registering the streams with an ISN Repository
[Diagram: Ingestion components: Data Format Detector, Data Parser, Metadata Extractor, Versioning, Registrar; connected to the Exchange]
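The ingestion steps named above (detect format, parse, extract meta-data, version, register) can be sketched as a small pipeline. Everything here is an assumption made for illustration: the function names, the two toy formats (CSV and JSON), the meta-data fields, and the in-memory `Registrar` standing in for the repository.

```python
import hashlib
import json
from typing import Any, Dict, List


def detect_format(raw: bytes) -> str:
    """Data Format Detector: a naive guess based on the first byte."""
    text = raw.lstrip()
    return "json" if text.startswith((b"{", b"[")) else "csv"


def parse(raw: bytes, fmt: str) -> List[Dict[str, Any]]:
    """Data Parser: turn the raw stream into a list of records."""
    if fmt == "json":
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    lines = raw.decode("utf-8").strip().splitlines()
    names = lines[0].split(",")
    return [dict(zip(names, line.split(","))) for line in lines[1:]]


def extract_metadata(records: List[Dict[str, Any]], fmt: str) -> Dict[str, Any]:
    """Metadata Extractor: derive simple descriptive meta-data."""
    fields = sorted({name for record in records for name in record})
    return {"format": fmt, "fields": fields, "record_count": len(records)}


class Registrar:
    """Versioning + Registrar: keeps every registered version per stream."""

    def __init__(self) -> None:
        self.repository: Dict[str, List[Dict[str, Any]]] = {}

    def register(self, stream_id: str, raw: bytes,
                 metadata: Dict[str, Any]) -> Dict[str, Any]:
        versions = self.repository.setdefault(stream_id, [])
        entry = dict(metadata,
                     version=len(versions) + 1,
                     checksum=hashlib.sha256(raw).hexdigest())
        versions.append(entry)
        return entry


def ingest(stream_id: str, raw: bytes, registrar: Registrar) -> Dict[str, Any]:
    """Run the ingestion steps in order and return the registered entry."""
    fmt = detect_format(raw)
    records = parse(raw, fmt)
    metadata = extract_metadata(records, fmt)
    return registrar.register(stream_id, raw, metadata)
```

Ingesting the same stream twice yields two entries with increasing version numbers, which is the behavior the Versioning component is meant to suggest.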
Ingestion Service Data Model
• Relationship between the constituents of the Ingestion Service and the Information Container Model
[Diagram: Ingestion Services (Data Format Detector, Data Parser, Meta-Data Extractor, Versioning, Registrar) operate on the Information Model: Information Container, Meta-data (L1) describing the Information Block, and the items Registration Information, Version, Ownership, Authorship, Policies, Annotations]
Transformation Service Data Model
• Relationship between the constituents of the Transformation Service and the Information Container Model
[Diagram: Transformation Services (Format Conversion, Data Parser, Mediation, Meta-Data Extraction, V&V) operate on the Information Model: Information Block, Meta-data (L2) describing the Information Content, Header (Meta-data L3) describing the Body. Syntax relies on a Standard; Semantics rely on an Ontology]
Preservation Service Data Model
• Relationship between the constituents of the Preservation Service and the Information Repository Model
[Diagram: Preservation Services (Replication, History, Backup, Archive) operate on the Information Repository. Labels: Information Services; Information Representations (Syntax relying on a Standard, Semantics relying on an Ontology); Information Entities; Information Container; Meta-data (L0); Data Product; Process Definition; Model Process Repository; Instrument Process Repository; Process Definition Repository; Data Product Repository; Resource Repository; Distribution Strategy; Resource Reference (locates Resource); Organizational Architecture; Deployment Architecture]
Scientific Data Transport
• As DAP evolves, Unidata’s CDM may be its successor*
– OpenDAP
– netCDF
– HDF5
[Diagram: DAP data model. Labels: Data Representation; Meta-data Representation; Type; Domain; Protocol; Dataset; Variables (Var Name, Var Type, Var Value); Attributes (Attr Name, Attr Type, Attr Value); semantic meta-data; syntactic metadata; DAP Data Type; DAP Atomic Type; String; Structure; Array (unidimensional); DDS (Dataset Descriptor Structure); DAS (Dataset Attribute Structure) with Variable Attributes and Global Attributes; DataDDS (Data Dataset Descriptor Structure), which exposes and provides the data]
* Comparison available at: http://wiki.opendap.org/twiki/bin/view/Developers/ModelSummary
• Currently DAP as canonical form
Data Store as Service
• The Exchange makes data transport possible, and the physical location of data becomes transparent to the application
• Storage mechanisms are abstracted to improve flexibility
• Ability to choose the best technology for the available platform that fits the intended purpose
• Multiple different storage “back-ends” possible
• Attribute Store prototype as the predecessor to a storage architecture
Attribute Store
[Diagram: the Attribute Store consists of a Command Processor and a Repository. The Command Processor implements the Command Set (READ, WRITE, DELETE, QUERY, SEARCH); commands operate on Entities (Key, Value). Specifications: Lookup (matches) and Representational (describes, understands); a specification is an Atom or a Composite. Labels: Map, String, List, Wildcard, Regexp]
• Generic repository of information organized around key + value pairs
• Intended to provide fast, reliable data storage and retrieval for lightweight data elements (not a full-blown SQL engine)
• Decomposition:
– Command Processor – interfaces with other OOI entities and abstracts from Repository technology
– Repository – stores the actual content using the best technology available for the selected platform
– Specification – describes the Repository and how to store/retrieve/match elements to/from the Repository
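The decomposition above can be sketched as follows. This is an illustrative sketch under assumptions, not the prototype's code: the `Repository` protocol and its method names are hypothetical, and the `Atom`, `Wildcard`, and `Regexp` classes are one possible reading of the Lookup specification (exact key, shell-style pattern, and regular expression, respectively).

```python
import fnmatch
import re
from typing import Dict, List, Optional, Protocol


class Repository(Protocol):
    """What the Command Processor needs from any storage back-end."""

    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...
    def keys(self) -> List[str]: ...


class InMemoryRepository:
    """One possible back-end: a plain dictionary."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def keys(self) -> List[str]:
        return list(self._data)


class Atom:
    """Lookup specification matching exactly one key."""

    def __init__(self, key: str) -> None:
        self.key = key

    def matches(self, key: str) -> bool:
        return key == self.key


class Wildcard:
    """Lookup specification using shell-style patterns such as 'sensor.*'."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern

    def matches(self, key: str) -> bool:
        return fnmatch.fnmatchcase(key, self.pattern)


class Regexp:
    """Lookup specification using a regular expression over the whole key."""

    def __init__(self, pattern: str) -> None:
        self._regex = re.compile(pattern)

    def matches(self, key: str) -> bool:
        return self._regex.fullmatch(key) is not None


class CommandProcessor:
    """Hides the repository technology behind a small command interface."""

    def __init__(self, repository: Repository) -> None:
        self._repository = repository

    def write(self, key: str, value: str) -> None:
        self._repository.set(key, value)

    def search(self, lookup) -> Dict[str, Optional[str]]:
        # Return every (key, value) pair whose key matches the specification.
        return {key: self._repository.get(key)
                for key in self._repository.keys() if lookup.matches(key)}
```

Swapping `InMemoryRepository` for another class with the same three methods changes the storage technology without touching the Command Processor or its callers.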
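Attribute Store - Design
[Sequence diagram: Fundamental Interaction Pattern. The Application sends COMMAND(arguments) to the Attribute Store and receives a RESPONSE]
[Sequence diagram: Internal Interaction Pattern for the WRITE command. Participants: Application, Command Processor, Repository. The Application sends WRITE(key, newvalue); the Command Processor issues query(key) and receives Key or KEY_NOT_FOUND; if the key exists it issues get(key) and receives OldValue, otherwise it performs Assign(OldValue, NewValue); it then issues set(key, newvalue) and receives OK or ERROR; the Application receives OldValue or FAILURE]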
Command Set
• WRITE
– Arguments (input): Key, NewValue
– Response (output): OldValue, FAILURE
– Semantics: Locate pair (key, *). If it exists, assign the current value to OldValue; otherwise assign NewValue to OldValue. Set/create pair (key, NewValue) and return OldValue, or failure when creation failed.
• READ
– Arguments (input): Key
– Response (output): Value, INVALID KEY, FAILURE
– Semantics: Locate pair (key, *) and return the value associated with the key (if found). Return invalid key when there is no pair with that key, or failure when the read could not be performed.
• DELETE
– Arguments (input): Key
– Response (output): SUCCESS, INVALID KEY, FAILURE
– Semantics: Locate pair (key, *) and delete it. Return invalid key when the pair could not be found, or failure when the pair could not be deleted.
• QUERY
– Arguments (input): Regexp
– Response (output): [Keylist]
– Semantics: Search for keys matching the regexp pattern and return a list of them. The list is empty when there are no matching keys.
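The command semantics above can be sketched against an in-memory dictionary. This is a minimal sketch, not the prototype's implementation: the class name, method names, and string constants are hypothetical, the FAILURE response is omitted because a dictionary has no storage failures to report, and QUERY is assumed to match the pattern anywhere in the key.

```python
import re
from typing import Dict, List

INVALID_KEY = "INVALID KEY"
SUCCESS = "SUCCESS"


class AttributeStore:
    """In-memory illustration of the WRITE, READ, DELETE and QUERY commands."""

    def __init__(self) -> None:
        self._pairs: Dict[str, str] = {}

    def write(self, key: str, new_value: str) -> str:
        # OldValue is the current value if the pair exists, otherwise NewValue.
        old_value = self._pairs.get(key, new_value)
        self._pairs[key] = new_value
        return old_value

    def read(self, key: str) -> str:
        # Return the stored value, or INVALID KEY when no pair has that key.
        if key not in self._pairs:
            return INVALID_KEY
        return self._pairs[key]

    def delete(self, key: str) -> str:
        # Delete the pair, or report INVALID KEY when it does not exist.
        if key not in self._pairs:
            return INVALID_KEY
        del self._pairs[key]
        return SUCCESS

    def query(self, pattern: str) -> List[str]:
        # Assumption: the regexp may match anywhere in the key (re.search).
        regex = re.compile(pattern)
        return sorted(key for key in self._pairs if regex.search(key))
```

Note the consequence of the WRITE rule: the first write of a key returns the value just written, so a caller can tell "created" from "replaced" only when the old and new values differ.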
Data Representation
[Diagrams: data models of four encoding candidates
– SDXF: SDXF Element; Atomic Chunk; Composite Chunk (Structure); Header and Data. Header: Chunk Identifier (16 bit), Flags (8 bit), Length. Flags: Data Type (3 bit), Compressed (1 bit), Encrypted (1 bit), Short Chunk (1 bit), Array (1 bit), Reserved (1 bit). Data types: Pending structure, String, Bit string, Numeric (16 bit int), Structure, Float, UTF-8, Reserved
– XDR: XDR Data Object; Element; Unique Identifier; Composite. Data types: Integer (32 bit), Unsigned Int (32 bit), Enum (integers), Hyper int (64 bit), Float (32 bit), Double (64 bit), Quadruple (128 bit), Boolean, String, Opaque, void. Other labels: Structure, Union, Fixed sized, Variable sized, Array, Container, Typedef, Optional Data, Constant
– Extprot: Extprot Data; Element; Unique Identifier; Composite. Data types: Boolean, Byte (8 bit), Signed Int (16 bit), Long (64 bit), Double Float (64 bit), String. Other labels: List, Tuple, Array, Disjoint Sum (tagged union), Polymorphic Messages
– Thrift (Facebook Thrift): Thrift Object; Element; Unique Identifier; Composite. Data types: Bool, Byte, Signed Int 16, Signed Int 32, Signed Int 64, Double, String, void. Other labels: Structure, Struct, Exception, Container, List, Set, Map, Service, Methods, Interfaces, Transport, Implementation, TSocket, TFileTransport, Protocol, Encoding of Types, STOP, Version, Bidirectional Sequenced Messaging]
• Data Representation/Encoding Standards
– Processing
– Transport
– Storage
• Many choices… with overlapping capabilities
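To make one of these choices concrete, the SDXF chunk header fields shown on this slide (16-bit chunk identifier, 8-bit flags, length) can be packed and unpacked as below. The slide gives only the field widths; the bit positions inside the flags byte, the 24-bit big-endian length, and the `ChunkHeader` and `unpack_header` names are assumptions made for this sketch and should be checked against the SDXF specification before use.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkHeader:
    chunk_id: int              # Chunk Identifier (16 bit)
    data_type: int             # Data Type (3 bit), values 0..7
    compressed: bool = False   # Compressed (1 bit)
    encrypted: bool = False    # Encrypted (1 bit)
    short_chunk: bool = False  # Short Chunk (1 bit)
    array: bool = False        # Array (1 bit)
    length: int = 0            # Length (assumed to be 24 bit)

    def flags(self) -> int:
        # Assumed layout: data type in the top three bits, then one bit per
        # flag, with the lowest bit reserved (always zero here).
        return ((self.data_type & 0b111) << 5
                | int(self.compressed) << 4
                | int(self.encrypted) << 3
                | int(self.short_chunk) << 2
                | int(self.array) << 1)

    def pack(self) -> bytes:
        if not 0 <= self.length < 1 << 24:
            raise ValueError("length does not fit in 24 bits")
        return (struct.pack(">HB", self.chunk_id, self.flags())
                + self.length.to_bytes(3, "big"))


def unpack_header(raw: bytes) -> ChunkHeader:
    """Decode the first six bytes of a chunk into a ChunkHeader."""
    chunk_id, flags = struct.unpack(">HB", raw[:3])
    return ChunkHeader(
        chunk_id=chunk_id,
        data_type=flags >> 5,
        compressed=bool(flags & 0x10),
        encrypted=bool(flags & 0x08),
        short_chunk=bool(flags & 0x04),
        array=bool(flags & 0x02),
        length=int.from_bytes(raw[3:6], "big"),
    )
```

A fixed six-byte header like this keeps parsing cheap, which is one reason compact binary encodings are attractive for transport even when a richer format is used for storage.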
Technology Mapping
Functional Component | Technology | TRL
Dataset Catalog | THREDDS | 7
Semantic Framework | VSTO Semantic Framework | 8
Semantic Query | ESG Faceted Search | 8
Data Integration with Applications | NetCDF lib | 8
Data Integration with Applications | Matlab lib for OpenDAP access | 8
Dataset Management & Distribution | OpenDAP Hyrax Server | 7
Dataset Preservation | iRODS | 7
General Purpose Database | MySQL cluster | 9
Data Grid File Transfer | GridFTP | 9
Dataset Access Protocol | DAP | 9
Dataset File Format | NetCDF | 9
Metadata Conventions | CF Metadata (Climate & Forecast) | 9
Dataset Aggregation Language | NcML | 9
Query language for RDF | SPARQL | 8
Knowledge Discovery Model | URIQA | 9
Oceanographic Vocabularies & Mappings | MMI | 7
External Data Presentation | OGC Services | 9 & 7
Thanks!
DM Components
• Base is DM FDR presentation
• Data Distribution based on the Exchange
– Data Exchange architecture after services OV2 slide as example for a data distribution (vs storage model, the older model); real architecture has not been chosen; DX strongly informs. Covers Ingestion, Transformation, Preservation in the context of a data distribution model
– DAP as canonical form for transport of data. For given streams there are canonical forms (e.g. DAP), but not for the system in general (i.e. a database). That’s why we chose the new model. Be aware that the underlying data model of DAP is in evolution. Unidata CDM. Insert a few references to these models.
– Reference to encoding formats, FIPA header
– Query against the past (e.g. archive query) or the future (e.g. subscriptions). Pointer to SQLstream prototype
• Data Store as a Service
– Attribute store as the predecessor to a storage architecture
– Model, commands
• FIPA provides valuable models for
– Communication patterns
– Message structure
[Diagram: message structure based on FIPA. Message labels: Header; Message ID; Version; Message Type (predefined or user defined); Message Parameters (predefined or user defined); End of Collection; End of Message. FIPA ACL Message Parameters: Performative (communicative act); Conversation Participants: Sender (identity), Receiver (identity set), Reply-To (identity); Content; Content Description: Language, Encoding, Ontology; Control: Protocol, Conversation ID, Reply-With, In-Reply-To, Reply-By. Further labels: Syntax, Semantics]
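The FIPA ACL message parameters listed on this slide can be sketched as a data class. The parameter set follows the labels above; the `AclMessage` class, its Python field names, and the `make_reply` helper are hypothetical illustrations and not part of any FIPA library or of the OOI design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AclMessage:
    # Type of communicative act
    performative: str
    # Conversation participants
    sender: str
    receivers: List[str]
    reply_to: Optional[str] = None
    # Content and its description
    content: str = ""
    language: Optional[str] = None
    encoding: Optional[str] = None
    ontology: Optional[str] = None
    # Control of the conversation
    protocol: Optional[str] = None
    conversation_id: Optional[str] = None
    reply_with: Optional[str] = None
    in_reply_to: Optional[str] = None
    reply_by: Optional[str] = None

    def make_reply(self, performative: str, content: str,
                   sender: str) -> "AclMessage":
        """Build a reply that stays in the same conversation."""
        return AclMessage(
            performative=performative,
            sender=sender,
            # Replies go to Reply-To when it is set, otherwise to the sender.
            receivers=[self.reply_to or self.sender],
            content=content,
            language=self.language,
            ontology=self.ontology,
            protocol=self.protocol,
            conversation_id=self.conversation_id,
            in_reply_to=self.reply_with,
        )
```

The control parameters are what let a receiver correlate a response with its request: `in_reply_to` echoes the request's `reply_with`, and `conversation_id` is carried over unchanged.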
Subsystem
• Data and Information Access
– Search & Navigation
– External observatory access (IOOS, Neptune Canada, …)
• Transformation and Mediation
– Attribution & Association
– Aggregation
– Syntactical Transformation
– Ontology-based mediation between vocabularies
• Dynamic Data/Information Distribution
– Persistent Archive
– Information Catalog & Repository
[Diagram: Data Management in the context of the other subsystems. Labels: Sensing & Acquisition; Data Management; Planning & Prosecution; Analysis & Synthesis; Common Execution Infrastructure; Common Operating Infrastructure; Capability Container]