Post on 13-Aug-2015
METADATA: a modern approach
Daniele Bailo
CHARACTERS
Leading Actor
Digital Data
Sequence of (digital) symbols
- With a meaning
- Can be stored
- Can be transmitted
- Can be computed
Guest Star
Metadata
Data about Data (really?)
Functions
- Manage data (discovery, selection, etc.)
Issues (selection of)
- What is metadata to me can be data to others
- Many standards
- Ontologies
Actor
Broker(ing system)
Intermediary software
Functions
- Accesses several systems on your behalf
- Collects data for you (integration)
Issues (selection of)
- Performance
- Works better with metadata
Actor
The Triad
A set of 3 elements to fully manage data
Functions
- PID: persistent identifier
- Metadata: discovery & selection
- DO: data of interest
<PID, metadata, DO>
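A minimal sketch of the triad as a data structure, to make the three roles concrete. The class name `Triad`, the helper `select`, the PID strings and the metadata keys are all hypothetical; the slides do not prescribe any particular layout.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Triad:
    """One <PID, metadata, DO> record (illustrative layout only)."""
    pid: str                   # persistent identifier
    metadata: Dict[str, Any]   # used for discovery & selection
    do: bytes                  # the data of interest (digital object)


def select(triads, **criteria):
    """Return the triads whose metadata match every given key/value pair."""
    return [t for t in triads
            if all(t.metadata.get(k) == v for k, v in criteria.items())]


# Made-up example records.
records = [
    Triad("pid:example/1", {"region": "Irpinia", "type": "waveform"}, b"\x00\x01"),
    Triad("pid:example/2", {"region": "Etna", "type": "waveform"}, b"\x02\x03"),
]

hits = select(records, region="Irpinia")
print([t.pid for t in hits])
```

Selection here runs on metadata only; the digital object itself is never inspected, which is the point of keeping the three elements separate.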
Technical support staff
Database
Collection of (organized) data
Alias
- Repository, Data Center, etc.
Superpowers
- DBMS (allows definition, creation, querying, update, and administration of databases)
Technical support staff
APIs: Application Programming Interfaces
Standard procedures or instructions to access a service (or function)
Alias
- Web service, RESTful service, [thin layer], etc.
Needs
- Standards for requests
- Standards for responses
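A small sketch of what "standards for requests" and "standards for responses" can mean in practice: one agreed way to encode a query, and one agreed set of fields every response must carry. The endpoint URL, the parameter names and the required fields are assumptions made for illustration, and a canned response body stands in for the network call.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://repository.example/api/datasets"  # hypothetical endpoint
REQUIRED_FIELDS = {"pid", "metadata"}                  # assumed response contract


def build_request(region, data_format):
    """Encode a query as a URL following one agreed parameter convention."""
    return BASE_URL + "?" + urlencode({"region": region, "format": data_format})


def parse_response(body):
    """Decode a JSON response and check that the agreed fields are present."""
    payload = json.loads(body)
    missing = REQUIRED_FIELDS - set(payload)
    if missing:
        raise ValueError(f"response lacks required fields: {sorted(missing)}")
    return payload


url = build_request("Irpinia", "A")
body = '{"pid": "pid:example/1", "metadata": {"region": "Irpinia"}}'
print(url)
print(parse_response(body)["pid"])
```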
Themes
1. Optimization of resources
2. Single point of access… to several databases and services
3. OPEN ACCESS obligations: Berlin Declaration, DPC…
4. Interoperation for data re-use: new multidisciplinary science
5. Citation and data provenance
Comments?
Questions?
SCENARIOS
1. Friendship based discovery
2. Manual discovery
3. Advanced manual discovery
4. Brokering (canonical form)
5. Metadata driven canonical brokering
6. Metadata driven canonical brokering with contextualization
PREMISE
Structured data (standards)
#0 Friendship based discovery
1. Data stored on USB pendrives, CDs, etc.
2. Phone calls
3. Emails
Issues
Works well in masonry clubs
#1 Manual discovery
= data Format A – repository A
= data Format B – repository B
= data Format C – repository C
Data from Irpinia
1. User discovers data
2. Repositories do not have web services
3. No metadata (or metadata embedded in the file or directory structure)
4. Manual match & mapping
Issues
Performance, efficiency, error-prone, partial datasets
#2 Advanced manual discovery
= data Format A – repository A
= data Format B – repository B
= data Format C – repository C
Data from Irpinia
1. User discovers data
2. Repositories have access interfaces (APIs, WS…)
3. Minimal metadata set
4. Manual match & mapping
Issues
- Performance, efficiency, error-prone
- Some standardization in place
#4 Brokering (canonical form)
= data Format A – repository A
= data Format B – repository B
= data Format C – repository C
Data from Irpinia
1. Broker discovers data
2. Repositories have access interfaces (APIs, WS…)
3. Minimal metadata set
4. Minimal match & mapping
5. Multidisciplinary (ontologies)
Issues
- Single AP
- Development and maintenance
- “Hardcoded” metadata
[Figure: datasets in repositories A, B and C, each exposing an API; Broker; API; metadata canonical form]
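A minimal sketch of brokering through a canonical form: each repository format gets one converter into a shared record layout, and the broker searches only the canonical records. The native record layouts, the field names and the sample data are invented for illustration. The converters are written by hand, which is what the “hardcoded” issue above refers to.

```python
# One hand-written converter per repository format (hypothetical layouts).
def from_format_a(rec):
    return {"id": rec["ID"], "region": rec["area"], "source": "A"}


def from_format_b(rec):
    return {"id": rec["identifier"], "region": rec["location"]["region"], "source": "B"}


def from_format_c(rec):
    ident, region = rec.split(";")
    return {"id": ident, "region": region, "source": "C"}


CONVERTERS = {"A": from_format_a, "B": from_format_b, "C": from_format_c}

# Made-up repository contents standing in for the three APIs.
REPOSITORIES = {
    "A": [{"ID": "a1", "area": "Irpinia"}, {"ID": "a2", "area": "Etna"}],
    "B": [{"identifier": "b1", "location": {"region": "Irpinia"}}],
    "C": ["c1;Vesuvio", "c2;Irpinia"],
}


def broker_search(region):
    """Convert every native record to canonical form, then filter once."""
    results = []
    for fmt, native_records in REPOSITORIES.items():
        to_canonical = CONVERTERS[fmt]
        for rec in native_records:
            canonical = to_canonical(rec)
            if canonical["region"] == region:
                results.append(canonical)
    return results


print([r["id"] for r in broker_search("Irpinia")])
```

The user asks one question at one place; supporting a fourth repository means writing a fourth converter.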
#5 Metadata driven canonical Brokering
= data Format A – repository A
= data Format B – repository B
= data Format C – repository C
Data from Irpinia
1. Broker discovers data
2. Access interfaces
3. Full metadata set
4. Advanced match & mapping
5. Multidisciplinary (ontologies)
Issues
- Single AP
- Stored graph metadata
- Huge metadata superset
[Figure: datasets in repositories A, B and C, each exposing an API; Broker; API; metadata catalog]
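One possible reading of "metadata driven", sketched under assumptions: the mapping from each repository's native fields to the canonical form is stored as data in a catalog, so the broker needs no per-repository code. The catalog layout (a key path per canonical field) and all names below are hypothetical.

```python
# Hypothetical catalog: for each repository, the path of keys leading to
# each canonical field inside a native record.
CATALOG = {
    "A": {"id": ["ID"], "region": ["area"]},
    "B": {"id": ["identifier"], "region": ["location", "region"]},
}


def extract(record, path):
    """Follow a list of keys into a nested record."""
    for key in path:
        record = record[key]
    return record


def to_canonical(repo, record):
    """Build a canonical record using only the mapping held in the catalog."""
    return {name: extract(record, path) for name, path in CATALOG[repo].items()}


example = to_canonical("B", {"identifier": "b1", "location": {"region": "Irpinia"}})
print(example)
```

Adding a repository then means adding a catalog entry rather than new software, at the cost of maintaining a catalog rich enough to describe every source.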
#6 Metadata driven canonical brokering with contextualization
= data Format A – repository A
= data Format B – repository B
= data Format C – repository C
Data from Irpinia
1. Map & match only contextualization metadata
2. Pointers to detailed metadata
[Figure: datasets in repositories A, B and C, each exposing an API; Broker; API; metadata catalog]
#6 Metadata driven canonical brokering with contextualization
= data Format A – repository A
= data Format B – repository B
= data Format C – repository C
1. Map & match only contextualization metadata
2. Pointers to detailed metadata
3. Export metadata in any standard
3-layer metadata model
- Discovery (DC) and (CKAN, eGMS)
- Contextual (CERIF metadata model)
- Detailed (community specific)
[Figure: datasets behind APIs; arrows labelled "Generate" and "Point to" link the three layers]
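A rough sketch of how the three layers might relate, assuming that the contextual record is the one the broker stores, that a discovery record can be generated from it, and that detailed metadata is only pointed to. The class, its fields and the reference string are hypothetical and do not reproduce the CERIF data model; the output keys borrow Dublin Core element names.

```python
from dataclasses import dataclass


@dataclass
class ContextualRecord:
    """Contextual layer (illustrative fields, not the CERIF schema)."""
    pid: str
    title: str
    creator: str
    region: str
    detailed_ref: str  # pointer to community-specific metadata, not a copy


def to_discovery(rec):
    """Generate a flat discovery record using Dublin Core element names."""
    return {
        "identifier": rec.pid,
        "title": rec.title,
        "creator": rec.creator,
        "coverage": rec.region,
    }


rec = ContextualRecord(
    pid="pid:example/1",
    title="Waveforms, Irpinia",
    creator="Example Observatory",
    region="Irpinia",
    detailed_ref="repository-a:detailed/a1",  # made-up reference
)
discovery = to_discovery(rec)
print(discovery)
print(rec.detailed_ref)
```

Only the contextual fields are mapped and matched; the detailed layer stays in its community format and is reached through the pointer when needed.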
Question
There is a missing actor.
WHO?
[Figure: datasets behind APIs, described by three metadata layers: Discovery (DC) and (CKAN, eGMS); Contextual (CERIF metadata model); Detailed (community specific)]
<PID, metadata, DO>
1. PID uniquely identifies a Digital Object
2. Metadata provides a description of the Object
3. DO is the Digital Object… to be defined
Request: data from Irpinia
Response: <PID, metadata, DO>
Wrapping up
We need
1. Metadata describing data
2. APIs & web services
3. Defined WS output format
4. PID system
5. Brokering system
6. Metadata catalogue supporting
   1. Ontologies
   2. Contextualization
Q&A
#3 Metadata driven canonical brokering
= data Format A – repository A
= data Format B – repository B
= data Format C – repository C
Data from Irpinia
1. Broker discovers data
2. Repositories have access interfaces (APIs, WS…)
3. Significant metadata set
4. Good match & mapping
Issues
- Development and maintenance
- Single AP
- “Hardcoded” metadata
[Figure: datasets in repositories A, B and C, each exposing an API; Broker; API; metadata catalog]
#4 Metadata driven canonical brokering
= any data format
= metadata format A
= metadata format B
Data from Irpinia
Issues
1. Predefined tools for matching and mapping
2. Writing software: n conversion algorithms to canonical form
3. Ontologies
4. Multidisciplinary, but many formats
5. Good data discovery
6. Not all metadata used
[Figure: datasets; Broker; catalog]
#1 Conventional brokering
= data Format A
= data Format B
= data Format C
Data from Irpinia
[Figure: datasets in formats A, B and C; Broker]
Issues
1. Writing software: n*(n-1) conversion algorithms
2. does not scale in costs of development and maintenance
3. matching and mapping
4. works within a restricted research domain
5. “Complex” data discovery
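The scaling issue in point 1 can be checked with a few lines of arithmetic. Direct brokering needs one converter for every ordered pair of formats, n*(n-1), while brokering with a canonical form needs n, one per format, as the canonical-form slides state. The function names and the sample values of n are only illustrative.

```python
def pairwise_converters(n):
    """Converters needed when every format is translated directly to every other."""
    return n * (n - 1)


def canonical_converters(n):
    """Converters needed when every format is translated to one canonical form."""
    return n


for n in (3, 10, 50):
    print(n, pairwise_converters(n), canonical_converters(n))
```

With 3 formats the difference is 6 versus 3 converters; with 50 formats it is 2450 versus 50, which is why the conventional approach does not scale in development and maintenance costs.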
#2 Brokering with canonical form
= data Format A
= data Format B
= data Format C
Data from Irpinia
[Figure: datasets in formats A, B and C; Broker]
Issues
1. Writing software: n conversion algorithms to canonical form
2. works within a restricted research domain
3. matching and mapping
4. “Complex” data discovery
= canonical Format A
#3 Metadata driven simple brokering
= any data format
= metadata format A
= metadata format B
Data from Irpinia
Issues
1. Good data discovery
2. Predefined tools for matching and mapping
3. Multidisciplinary, but many formats
4. Writing software: n*(n-1) conversion algorithms
5. Ontologies
[Figure: datasets; Broker; METADATA]
#2 Metadata driven canonical brokering
= any data format
= metadata format A
= metadata format B
Data from Irpinia
Issues
1. Predefined tools for matching and mapping
2. Writing software: n conversion algorithms to canonical form
3. Ontologies
4. Multidisciplinary, but many formats
5. Good data discovery
[Figure: datasets; Broker; catalog; METADATA]