DTL FOCUS MEETING ON DATA INTEGRATION, STANDARDS AND FAIR PRINCIPLES IN PROTEOMICS
Luiz Olavo Bonino - [email protected]
August 2016
WHAT IS FAIR DATA?
FAIR Data aims to support existing communities in their attempts to enable valuable scientific data and knowledge to be published and utilised in a ‘FAIR’ manner.
Findable - (meta)data is uniquely and persistently identifiable. Should have basic machine readable descriptive metadata.
Accessible - data is reachable and accessible by humans and machines using standard formats and protocols.
Interoperable - (meta)data is machine readable and annotated with resolvable vocabularies/ontologies.
Reusable - (meta)data is sufficiently well-described to allow (semi)automated integration with other compatible data sources.
FAIR DATA PRINCIPLES
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
To be Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
http://www.nature.com/articles/sdata201618
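A few of the principles above can be turned into a rough self-check. The sketch below is only an illustration: the `MetadataRecord` class and the `fair_gaps` function are hypothetical names, and treating an http(s) IRI as a stand-in for a "globally unique and persistent identifier" is an assumption, not a rule from the principles.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetadataRecord:
    """Hypothetical, simplified metadata record used only for this sketch."""
    identifier: str = ""
    title: str = ""
    description: str = ""
    license: str = ""
    provenance: str = ""
    indexed_in: List[str] = field(default_factory=list)


def fair_gaps(record: MetadataRecord) -> List[str]:
    """Return the codes of the principles the record visibly fails."""
    gaps = []
    # F1 (assumption): an http(s) IRI is used as a proxy for a global, persistent id.
    if not record.identifier.startswith(("http://", "https://")):
        gaps.append("F1")
    # F2: some descriptive metadata must be present.
    if not (record.title and record.description):
        gaps.append("F2")
    # F4: the record is registered in at least one searchable resource.
    if not record.indexed_in:
        gaps.append("F4")
    # R1.1: a usage license is stated.
    if not record.license:
        gaps.append("R1.1")
    # R1.2: provenance is recorded.
    if not record.provenance:
        gaps.append("R1.2")
    return gaps
```

An empty list means only that these five surface checks pass; it says nothing about the remaining principles, which need human or community judgement.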
THE FAIR COMPONENTS
FAIR Data Principles
FAIR Data Protocol
FAIR Data Resources
FAIR Data Core Technologies
FAIR Data Systems/Tools
Normative
Artefact
Software
Raw data (many formats)
FAIR download (in local format)
Processed data (primary storage format)
FAIR transformation
FAIR (meta)data (RDF, XML etc.)
High-Performance Analysis
Provenance
Initial transformation
Analysis transformation
FAIR DATA RESOURCE
Datasets expressed using one of the prescribed standards of the FAIR Data Protocol, with metadata complying with the protocol and license. The original dataset is transformed into a FAIR format, and proper metadata and a license are added to produce a FAIR Data Resource. The original and the FAIR version can co-exist, each one fulfilling its own purpose.
FAIR transformation
FAIR Data Resource
Bring Your Own Data - BYOD
• Goals:
  • Learn how to make data linkable “hands-on” with experts
  • Create a “telling story” to demonstrate its use
  • Make FAIR Data at the source
• Composition:
  • Data owners – specialists on given datasets
  • Data interoperability experts
  • Domain experts
Source: Marco Roos
BYOD Planning
Preparation
Identify Plan
Driving question
Datasets
Attendees' profile
Output data access
Tentative dates
Tentative venue
Costs
Funds
Coordination
Set date
Invite attendees
Set venue
Catering
Lodging
Financial planning
Publicity
Working document
Preparatory calls
Data hosting
Software hosting
Documentation hosting
BYOD Planning
Execution
Day One
Introduction
SW, LD, Ontology intro
Use case intro
Workgroups division
Working sessions
WWW/TTTALA
Day Two
Progress report
Working sessions
Groups reports
WWW/TTTALA
Day Three
Data integration
Answer driving question
Explore data
Demo improvement
Final report
WWW/TTTALA
MAIN TASKS
Retrieve original data
Dataset identification and analysis
Definition of the semantic model
Data transformation
License assignment
Metadata definition
FAIR Data resource (data, metadata, license) deployment
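The "data transformation" task can be sketched in a few lines. This is a minimal illustration, not the tooling used at a BYOD: it maps CSV columns to predicates and emits N-Triples lines. The `BASE` namespace, the function names and the column-to-predicate mapping are all hypothetical, and the sketch assumes the identifier column already holds IRI-safe values and that cell values contain no line breaks.

```python
import csv
import io

BASE = "http://example.org/sample/"  # hypothetical namespace for minted subjects
DCT = "http://purl.org/dc/terms/"


def literal(value):
    """Quote a string as a plain literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def rows_to_ntriples(csv_text, id_column, column_to_predicate):
    """Turn each CSV row into one triple per mapped, non-empty column."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[id_column]}>"
        for column, predicate in column_to_predicate.items():
            value = row.get(column) or ""
            if value:  # skip empty cells instead of emitting empty literals
                triples.append(f"{subject} <{predicate}> {literal(value)} .")
    return triples


if __name__ == "__main__":
    raw = "id,name\nP1,Sample one\n"
    semantic_model = {"name": DCT + "title"}  # the "semantic model" in miniature
    for line in rows_to_ntriples(raw, "id", semantic_model):
        print(line)
```

The mapping dictionary plays the role of the semantic model: deciding which predicate each column means is the part that needs the data owners and domain experts in the room.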
BYOD Planning
Follow-Up
D+15
Report difficulties
Clarifications
Next steps
D+45
Report difficulties
Clarifications
Next steps
Implementation
Expand FAIRification
Implement solution
Scale-up solution
Deploy
DTL’s BYOD Roadmap
• Rare diseases biobanks companies (Sept 2016)
• Rare diseases patient registry companies (Sept 2016)
• Rare diseases + WikiPathways (Oct/Nov 2016)
• ENSEMBL
• Plants
• Metabolomics
• Human data
• Proteomics, …
https://wiki.dtls.nl/index.php/BYOD_meetings
FAIR DATA POINT
A FAIR Data Point is a particular class of FAIR Data System that provides access to published datasets. The datasets can be external or internal to the FAIR Data Point. The source data can be a regular (non-FAIR) dataset or a FAIR Data Resource. If the source data is non-FAIR, the FAIR Data Point needs to make the necessary FAIR transformations on the fly.
This is the detailed information about the genomic dataset
Dataset & Data Record
Metadata
FAIR DATA POINT
Ok, now that I know what you have, give me the data.
reads
Dataset & Data Record
Metadata
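The two-step exchange on these slides (first ask what is on offer, then ask for the data) can be mimicked with an in-memory stand-in. Everything here is hypothetical: the class, its method names, and the placeholder "transformation", which only mints an identifier per record. A real FAIR Data Point serves these steps over HTTP.

```python
class FairDataPoint:
    """In-memory stand-in for a FAIR Data Point (illustrative only)."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._entries = {}

    def publish(self, dataset_id, metadata, records, is_fair):
        """Register a dataset, noting whether its source is already FAIR."""
        self._entries[dataset_id] = {
            "metadata": dict(metadata),
            "records": list(records),
            "is_fair": is_fair,
        }

    def metadata(self, dataset_id):
        """Step 1: 'what do you have?' returns the description only."""
        return dict(self._entries[dataset_id]["metadata"])

    def data(self, dataset_id):
        """Step 2: 'give me the data', transforming on the fly if needed."""
        entry = self._entries[dataset_id]
        if entry["is_fair"]:
            return list(entry["records"])
        return [self._to_fair(dataset_id, r) for r in entry["records"]]

    def _to_fair(self, dataset_id, record):
        # Placeholder transformation: attach a resolvable-looking identifier.
        fair = dict(record)
        fair["@id"] = f"{self.base_url}/{dataset_id}/{record['id']}"
        return fair
```

Keeping `metadata` and `data` as separate calls mirrors the point of the slides: a client can decide whether a dataset is relevant, and under which license, before it touches the data itself.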
METADATA LAYERS

| Layer | Description | URL | Example | Standard |
|---|---|---|---|---|
| FDP (Data repository) | Information about the FDP as a data repository | http://myfdp/ | PID, title, description, license, owner, API version, etc. | OAI-PMH (extended) |
| Catalog | Information about the catalog of datasets offered | http://myfdp/catalog | PID, title, description, publisher, etc. | W3C DCAT #Catalog |
| Dataset | Information about each of the offered datasets | http://myfdp/[datasetID]/ | AccessURL, downloadURL, format, mediaType, etc. | W3C DCAT #Dataset, #Distribution |
| Data record | Information about the actual data, types, identifiers, etc. | http://myfdp/[datarecordID] | data types, domain, range, predicates, etc. | RML; community/domain standards, e.g. DICOM, VCF |
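The URL column of the metadata layers suggests a simple addressing scheme. The helper below is a sketch that follows those example patterns literally; the function name and the layer keywords are hypothetical, and a deployed FAIR Data Point may lay out its paths differently.

```python
def layer_url(base, layer, resource_id=None):
    """Build the URL for one metadata layer, following the slide's examples."""
    base = base.rstrip("/") + "/"
    if layer == "fdp":
        return base                      # http://myfdp/
    if layer == "catalog":
        return base + "catalog"          # http://myfdp/catalog
    if layer in ("dataset", "datarecord"):
        if not resource_id:
            raise ValueError(f"layer '{layer}' needs a resource_id")
        # Per the slide, dataset URLs end with a slash; data record URLs do not.
        suffix = "/" if layer == "dataset" else ""
        return base + resource_id + suffix
    raise ValueError(f"unknown layer: {layer}")
```

A client would typically walk these layers top-down: repository, then catalog, then dataset, and only then the data record description.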
FDP METADATA
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix lang: <http://id.loc.gov/vocabulary/iso639-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://fdp.biotools.nl:8080/fdp> a dct:Agent ;
    rdfs:label "FAIR Data Point of the Plant Breeding Group, Wageningen UR"^^xsd:string ;
    dct:description "This FDP provides metadata on plant-specific genotype/phenotype data sets"^^xsd:string ;
    dct:hasPart "catalog-01"^^xsd:string ;
    dct:identifier "FDP-WUR-PB"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "FAIR Data Point of the Plant Breeding Group, Wageningen UR"^^xsd:string ;
    dct:version "1.0"^^xsd:string .
CATALOG METADATA
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix lang: <http://id.loc.gov/vocabulary/iso639-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://fdp.biotools.nl:8080/catalog/catalog-01> a dcat:Catalog ;
    rdfs:label "Plant Breeding Data Catalog"^^xsd:string ;
    dct:description "Plant Breeding Data Catalog"^^xsd:string ;
    dct:hasPart <breedb> ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "Plant Breeding Data Catalog"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:dataset <breedb> .
DATASET METADATA
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix lang: <http://id.loc.gov/vocabulary/iso639-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://fdp.biotools.nl:8080/dataset/breedb> a dcat:Dataset ;
    rdfs:label "BreeDB tomato passport data"^^xsd:string ;
    dct:description "BreeDB tomato passport data"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "BreeDB tomato passport data"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:distribution <breedb-sparql>,
        <breedb-sqldump> .
METADATA DISTRIBUTION
<http://fdp.biotools.nl:8080/distribution/breedb-sparql> a dcat:Distribution ;
    rdfs:label "SPARQL endpoint for BreeDB tomato passport data"^^xsd:string ;
    dct:description "SPARQL endpoint for BreeDB tomato passport data"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "SPARQL endpoint for BreeDB tomato passport data"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:accessURL <http://virtuoso.biotools.nl:8888/sparql> .

<http://fdp.biotools.nl:8080/distribution/breedb-sqldump> a dcat:Distribution ;
    rdfs:label "SQL dump of the BreeDB tomato passport data"^^xsd:string ;
    dct:description "SQL dump of the BreeDB tomato passport data"^^xsd:string ;
    dct:issued "2015-11-24"^^xsd:date ;
    dct:language lang:en ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:modified "2015-11-24"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-4368-8058> ;
    dct:title "SQL dump of the BreeDB tomato passport data"^^xsd:string ;
    dct:version "1.0"^^xsd:string ;
    dcat:downloadURL <http://virtuoso.biotools.nl:8888/DAV/home/breedb/breedb.sql> .
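The two distributions differ in how the data is reached: one offers a `dcat:accessURL` (a service endpoint), the other a `dcat:downloadURL` (a file). A client might sort them as sketched below. The dictionary is hand-copied from the distribution metadata above rather than parsed from RDF, and the function name is hypothetical.

```python
# Hand-transcribed subset of the distribution metadata shown above.
DISTRIBUTIONS = {
    "breedb-sparql": {
        "dct:license": "http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0",
        "dcat:accessURL": "http://virtuoso.biotools.nl:8888/sparql",
    },
    "breedb-sqldump": {
        "dct:license": "http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0",
        "dcat:downloadURL": "http://virtuoso.biotools.nl:8888/DAV/home/breedb/breedb.sql",
    },
}


def retrieval_options(distributions):
    """List (name, kind, url) per distribution, preferring direct downloads."""
    options = []
    for name in sorted(distributions):
        props = distributions[name]
        if "dcat:downloadURL" in props:
            options.append((name, "download", props["dcat:downloadURL"]))
        elif "dcat:accessURL" in props:
            options.append((name, "access", props["dcat:accessURL"]))
    return options
```

In practice the metadata would be parsed with an RDF library instead of being copied by hand; the dictionary only keeps the sketch free of dependencies.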