DTL Partners Event - FAIR Data Tech overview - Day 1
Uploaded by luiz-olavo-bonino-da-silva-santos
Mark Wilkinson, UP Madrid
Michel Dumontier, Stanford / Maastricht U.
Jan 2014
SWT: 1st Skunkworks hackathon, Maastricht - NL
W3C DCAT - FAIR Profiles
Apr 2014 Sept 2014
FAIR Data Principles @ FORCE 11
Mar 2015 Apr 2015 Aug 2015
DFDET: ODEX4all project
SWT: 2nd Skunkworks hackathon, Hinxton - UK
FAIR Profiles, Beacons, Molgenis
DFDET: Released first beta version of ORKA
Sept 2015 Feb 2016 Jun 2016
The FAIR Guiding Principles paper on Scientific Data *
SWT: Skunkworks @ Biohackathon
Final version of the Principles
First attempt FAIR Projection
DFDET: Starts the work on FAIR Data Point
DFDET + SWT: FAIR Data Point paper **
* http://www.nature.com/articles/sdata201618
** http://www.iste.co.uk/index.php?f=a&ACTION=View&id=1073
Mar 2016
SWT + DFDET + others: Starts work group on FAIR metrics for data and services
Sept 2016
DFDET: Starts the work on the FAIR Data Search Engine and on the FAIRifier.
FAIR Data Point incorporates RML
Oct 2016 Nov 2016
SWT + DFDET: FAIR Technologies paper
LDP, LDF, RML, FAIR Projectors
DFDET: First FAIR Hackathons with Molgenis, Castor EDC, RDRF and OSSE
FAIR Data Point
DFDET: FAIR Hackathons with Mendeley and Quaero Systems
FAIR Data Point
DFDET: FAIR Data Point v. 1.0
FAIR Data Point
DFDET + SWT: FAIR Data workshop @ ECCB 2016, The Hague - NL
FAIR transformation
Analysis transformation
FAIRNESS LEVELS
Every level is drawn with the same four layers:
PID
Metadata (intrinsic)
'provenance' (user defined)
Data (elements)
Levels:
Non-FAIR
Findable, Usable for Humans
FAIR metadata
FAIR data - restricted access
FAIR data - Open Access
FAIR data - Open Access / Functionally Linked
WHAT?
HOW?
WHO?
BYODs, FAIR hackathons, FAIR Data Points, FAIR Search Engine, FAIRifier, FAIR Data Model Registry, Data FAIRport, …
Training, capacity building;
FAIR DATA TOOLS
FAIR DATA POINT
A particular class of FAIR Data System that provides access to datasets in a FAIR manner. The datasets can be external or internal to the FAIR Data Point. The source data can be a non-FAIR dataset or a FAIR Data Resource. If the source data is non-FAIR, the FAIR Data Point needs to make the necessary FAIR transformations on the fly.
FAIR DATA POINT
Q: Who are you? Can I trust you?
FDP: Here is information about myself.
    FDP Metadata: Who is responsible? FDP license? Description?
Q (reads FDP Metadata): Ok, now that I know you, tell me what you have to offer.
FDP: Here is information about my catalog of datasets.
    Catalog Metadata
Q (reads Catalog Metadata): Tell me more about your genomic dataset.
FDP: This is the information about the genomic dataset.
    Dataset Metadata: License? Publisher? Last modified date? Theme?
Q (reads Dataset Metadata): In which forms is the dataset available?
FDP: This is the information about the dataset distributions.
    Distribution Metadata: Access or download URL? Format? Size? Media type?
Q (reads Dataset Metadata): Tell me more about the dataset content.
FDP: This is the information about the data record of the dataset.
    Data record Metadata: Types? Domain? Range?
Q (reads Dataset, distribution, data record metadata): Ok, now that I know what you have, give me the data.
FDP: Here is my data.
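The question-and-answer exchange above amounts to walking the metadata layers top-down before touching any data. A minimal sketch of that walk follows, assuming a hypothetical in-memory dictionary in place of real HTTP requests. The URLs and titles are taken from the slides; the dictionary layout, the key names and the `walk` function are illustrative assumptions, not the FAIR Data Point API.

```python
# Hypothetical in-memory stand-in for the metadata a FAIR Data Point serves.
# URLs and titles come from the slides; the layout and key names are assumptions.
FDP = "http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp"
CATALOG = FDP + "/biobank"
DATASET = CATALOG + "/77350-collection1"
DISTRIBUTION = DATASET + "/distributionTurtle"

METADATA = {
    FDP: {"layer": "FDP", "title": "DTL FAIR Data Point",
          "children": [CATALOG]},
    CATALOG: {"layer": "Catalog", "title": "Rd connect's biobank catalog",
              "children": [DATASET]},
    DATASET: {"layer": "Dataset", "title": "Galliera Genetic Bank",
              "children": [DISTRIBUTION]},
    DISTRIBUTION: {"layer": "Distribution",
                   "title": "Ring14 biobank turtle distribution",
                   "children": []},
}


def walk(url, fetch=METADATA.get):
    """Yield (layer, title) pairs top-down, following child links depth-first."""
    record = fetch(url)
    if record is None:
        return
    yield record["layer"], record["title"]
    for child in record["children"]:
        yield from walk(child, fetch)


layers = list(walk(FDP))
for layer, title in layers:
    print(f"{layer}: {title}")
```

Passing a different `fetch` callable (for example one that dereferences the URL and parses the returned RDF) would let the same walk run against a live endpoint; that part is left out here because it depends on the deployed API.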
FAIR DATA POINT - ARCHITECTURE
FAIR Data Point metadata: Title, Responsible institution(s), Contact, FAIR API version, License, …
FDP METADATA
<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp>
    dct:alternative "DTL FDP"@en ;
    dct:description "The DTL FAIR Data Point hosts the FAIR Data versions of datasets that have been made FAIR during BYODs as well as other relevant life sciences datasets"@en ;
    dct:subject "FAIR Data" , "Life Sciences" ;
    dct:title "DTL FAIR Data Point"@en ;
    <http://www.re3data.org/schema/3-0#api> <http://dtls.nl/fdp#api=1> ;
    <http://www.re3data.org/schema/3-0#catalog>
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/comparativeGenomics> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/patient-registry> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/textmining> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/transcriptomics> ;
    <http://www.re3data.org/schema/3-0#institution> <http://dtls.nl> ;
    <http://www.re3data.org/schema/3-0#institutionCountry> <http://lexvo.org/id/iso3166/NL> ;
    <http://www.re3data.org/schema/3-0#lastUpdate> "2016-10-27"^^xsd:date ;
    <http://www.re3data.org/schema/3-0#software> "FAIR Data Point" ;
    <http://www.re3data.org/schema/3-0#startDate> "2016-10-27"^^xsd:date ;
    a <http://www.re3data.org/schema/3-0#Repository> ;
    rdfs:label "DTL FAIR Data Point"@en ;
    <http://xmlns.com/foaf/0.1/landingpage> <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/swagger-ui.html> .
FAIR Data Point metadata
    Catalog metadata: Title, Theme taxonomy, Issued date, …
CATALOG METADATA
<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank>
    dct:hasVersion "1.0" ;
    dct:identifier "biobank" ;
    dct:issued "2016-02-01"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2016-08-01"^^xsd:date ;
    dct:title "Rd connect's biobank catalog"@en ;
    a dcat:Catalog ;
    rdfs:label "Rd connect's biobank catalog"@en ;
    dcat:dataset <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1> ;
    dcat:themeTaxonomy <http://dbpedia.org/resource/Biobank> ,
        <http://edamontology.org/topic_3337> .
FAIR Data Point metadata
    Catalog 1 metadata
        Dataset metadata: Title, Publisher, License, Theme(s), Version, …
DATASET METADATA
<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1>
    dct:creator <http://orcid.org/0000-0002-1215-167X> ;
    dct:hasVersion "1.0" ;
    dct:identifier "77350-collection1" ;
    dct:issued "2016-02-01"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2016-08-01"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-1215-167X> ;
    dct:title "Galliera Genetic Bank"@en ;
    <http://rdf.biosemantics.org/ontologies/fdp-o#dataRecord> <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/datarecord/77350-collection1-datarecord-1> ;
    a dcat:Dataset ;
    rdfs:label "Galliera Genetic Bank"@en ;
    rdfs:seeAlso <http://catalogue.rd-connect.eu/web/galliera-genetic-bank/bb_home> ;
    dcat:distribution
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/csv> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/distributionTurtle> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/ldf> ;
    dcat:keyword "Galliera Genetic Bank" , "biobank" ;
    dcat:landingPage <http://ggb.galliera.it> ;
    dcat:theme <http://dbpedia.org/resource/Biobank> ,
        <http://edamontology.org/topic_3337> ,
        <http://www.orpha.net/ORDO/Orphanet_1023> …
FAIR Data Point metadata
    Catalog 1 metadata
        Dataset 1 metadata
            Distribution metadata: Title, Media type, Download/access URL, License, …
DISTRIBUTION METADATA
<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/distributionTurtle>
    dct:description "Ring14 biobank turtle distribution"@en ;
    dct:hasVersion "1.0" ;
    dct:identifier "distributionTurtle" ;
    dct:issued "2016-02-01"^^xsd:date ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:modified "2016-07-07"^^xsd:date ;
    dct:title "Ring14 biobank turtle distribution"@en ;
    a dcat:Distribution ;
    rdfs:label "Ring14 biobank turtle distribution"@en ;
    dcat:downloadURL <http://semlab1.liacs.nl:8080/rdc-demo-dataset/RING_14_dummy-Biobank.ttl> ;
    dcat:mediaType "text/turtle" .

<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/ldf>
    dct:description "Ring14 biobank linked data fragment distribution"@en ;
    dct:hasVersion "1.0" ;
    dct:identifier "ldf" ;
    dct:license <http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0> ;
    dct:title "Ring14 biobank linked data fragment distribution"@en ;
    a dcat:Distribution ;
    rdfs:label "Ring14 biobank linked data fragment distribution"@en ;
    dcat:accessURL <http://dev-vm.fair-dtls.surf-hosted.nl:5050/ring14-biosample> .
FAIR Data Point metadata
    Catalog metadata
        Dataset metadata
            Distribution metadata
            Data record metadata: Type, Domain, Range, …
DATA RECORD METADATA
<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/datarecord/77350-collection1-datarecord-1>
    dct:hasVersion "1.0" ;
    dct:identifier "77350-collection1-datarecord-1" ;
    dct:issued "2016-02-01"^^xsd:date ;
    dct:language lang:en ;
    dct:modified "2016-08-01"^^xsd:date ;
    dct:publisher <http://orcid.org/0000-0002-1215-167X> ;
    dct:title "Galliera Genetic Bank datarecord metadata" ;
    <http://rdf.biosemantics.org/ontologies/fdp-o#refersTo> <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank/77350-collection1/csv> ;
    <http://rdf.biosemantics.org/ontologies/fdp-o#rmlMapping> <https://git.lumc.nl/biosemantics/ring14-fdp-metadata/raw/bd01b84fb792ae3860fdda646e9cb96a1a11205c/rml/biobank/RING_14_biobank_mapping.ttl> ;
    a <http://rdf.biosemantics.org/ontologies/fdp-o#DataRecord> ;
    rdfs:label "Galliera Genetic Bank datarecord metadata" .

<#ring14-biobank-id-resource>
    rml:logicalSource <#inputFile> ;
    rr:subjectMap [
        rr:template "http://rdf.biosemantics.org/dataset/ring14/resource/identifier/{Sample ID}" ;
        rr:class <http://rdf.biosemantics.org/ontologies/rd-connect/21f6df30_1f72_45fb_bfc1_2b3d1af1410a>
    ] ;
FAIR Data Point metadata
    Catalog 1 metadata
        Dataset 1 metadata
            Distribution 1.a metadata
            Data record metadata
            Distribution 1.b metadata
        Dataset 2 metadata
            Distribution 2.a metadata
            Data record metadata
            Distribution 2.b metadata
        Dataset 3 metadata
            Distribution 3.a metadata
            Data record metadata
    Catalog 2 metadata
FAIR DATA POINT
METADATA LAYERS
Layer | Description | Example | Standard
FDP (Data repository) | Information about the FDP as a data repository | PID, title, description, license, owner, API version, etc. | RE3Data
Catalog | Information about the catalog of datasets offered | PID, title, description, publisher, etc. | W3C DCAT #Catalog
Dataset | Information about each of the offered datasets | Publisher, issue date, theme, etc. | W3C DCAT #Dataset,
Distribution | Information about how the dataset is distributed | AccessURL, downloadURL, format, mediaType, etc. | W3C DCAT #Distribution
Data record | Information about the actual data, types, identifiers, etc. | Data items types, identifiers, domain, range, etc. | RML
OAI-PMH
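The layer table above can also be read as a checklist of what each metadata record is expected to carry. The sketch below encodes it that way. The layer names and standards are taken from the table; the snake_case field names are illustrative renderings of the "Example" column and the `missing_fields` helper is hypothetical, so none of this should be read as an official schema.

```python
# Layer names and standards are from the table; the field names are assumed,
# illustrative renderings of the "Example" column, not an official schema.
LAYERS = {
    "FDP": {"standard": "RE3Data",
            "fields": ["title", "description", "license"]},
    "Catalog": {"standard": "W3C DCAT #Catalog",
                "fields": ["title", "description", "publisher"]},
    "Dataset": {"standard": "W3C DCAT #Dataset",
                "fields": ["publisher", "issued", "theme"]},
    "Distribution": {"standard": "W3C DCAT #Distribution",
                     "fields": ["media_type"]},
    "Data record": {"standard": "RML",
                    "fields": ["types", "identifiers"]},
}


def missing_fields(layer, record):
    """Return the expected fields of `layer` that `record` does not provide."""
    return [name for name in LAYERS[layer]["fields"] if name not in record]


# A dataset record with no theme yet (values copied from the dataset example).
example = {"publisher": "http://orcid.org/0000-0002-1215-167X",
           "issued": "2016-02-01"}
gaps = missing_fields("Dataset", example)
print(gaps)
```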
DEMO FAIR DATA POINT
http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/swagger-ui.html
http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/
API
GUI
FAIR DATA POINT
EXISTING DATA REPOSITORIES
EXTENDING EXISTING DATA REPOSITORIES
+
Development started in October 2016
metadata index
retrieves metadata
search interfaces (GUI and API)
Development started in October 2016
Based on OpenRefine
FAIRIFICATION PROCESS
Retrieve original data
Dataset identification and analysis
Definition of the semantic model
Data transformation
License assignment
Metadata definition
FAIR Data resource (data, metadata, license) deployment
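The process steps above can be sketched as a small pipeline. This is a simplified illustration under stated assumptions, not the FAIRifier implementation: the input rows are made up, the predicate IRI is a placeholder, and the function names are hypothetical. Only the subject template and the license IRI are copied from the metadata examples in the slides.

```python
import csv
import io

# Made-up two-row input standing in for a non-FAIR source file.
RAW = "Sample ID,Type\nS1,blood\nS2,saliva\n"

# Semantic model: the subject template is from the RML snippet in the slides;
# the predicate IRI is a placeholder assumption.
MODEL = {
    "subject": "http://rdf.biosemantics.org/dataset/ring14/resource/identifier/{Sample ID}",
    "predicates": {"Type": "http://example.org/sampleType"},
}
LICENSE = "http://rdflicense.appspot.com/rdflicense/cc-by-nc-nd3.0"


def retrieve(raw):
    """Retrieve the original data as a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows, model):
    """Apply the semantic model: one (subject, predicate, object) triple per mapped column."""
    triples = []
    for row in rows:
        subject = model["subject"]
        for column, value in row.items():
            subject = subject.replace("{" + column + "}", value)
        for column, predicate in model["predicates"].items():
            triples.append((subject, predicate, row[column]))
    return triples


def fairify(raw, model, license_iri, title):
    """Bundle transformed data, license and metadata into one resource."""
    rows = retrieve(raw)
    data = transform(rows, model)
    metadata = {"title": title, "records": len(rows)}
    return {"data": data, "license": license_iri, "metadata": metadata}


resource = fairify(RAW, MODEL, LICENSE, "Demo biobank samples")
for triple in resource["data"]:
    print(triple)
```

The returned dictionary mirrors the "data, metadata, license" triple named in the last step; actual deployment to a FAIR Data Point is out of scope for this sketch.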
FAIRIFICATION
FAIR Data Resource
submit generate
Generic semantic model
FAIRIFIER
Transform non-FAIR datasets into FAIR Data Resources (dataset in FAIR format, license and metadata)
Data munging
Semantic modeling
License definition
Metadata definition and extraction
Data publication
FAIR DATA MODEL REGISTRY
FAIRIFICATION - NEW DATASET TYPE
FAIR Data Resource
submit, generate
FAIR Data Model Registry
store
Non-FAIR - FAIR mapping
FAIRIFICATION - RECURRING DATASET TYPE
FAIR Data Resource
submit, generate
FAIR Data Model Registry
query
Non-FAIR - FAIR mapping
retrieve
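The two diagrams differ only in how the Non-FAIR to FAIR mapping is obtained: it is stored for a new dataset type and queried and retrieved for a recurring one. A minimal sketch of that store-or-reuse logic follows. The `ModelRegistry` class, its method names and the example mapping are hypothetical; the slides do not describe the registry's actual interface.

```python
class ModelRegistry:
    """Hypothetical in-memory registry of Non-FAIR to FAIR mappings."""

    def __init__(self):
        self._mappings = {}

    def store(self, dataset_type, mapping):
        self._mappings[dataset_type] = mapping

    def query(self, dataset_type):
        """Return the stored mapping, or None for an unknown dataset type."""
        return self._mappings.get(dataset_type)


def get_mapping(dataset_type, registry, build_mapping):
    """Reuse a registered mapping if one exists, otherwise build and store it.

    Returns (mapping, reused) so the caller can tell the two cases apart.
    """
    mapping = registry.query(dataset_type)
    if mapping is not None:
        return mapping, True
    mapping = build_mapping()
    registry.store(dataset_type, mapping)
    return mapping, False


registry = ModelRegistry()


def build():
    # Placeholder mapping for illustration only.
    return {"Sample ID": "http://example.org/identifier"}


first, first_reused = get_mapping("biobank-csv", registry, build)
second, second_reused = get_mapping("biobank-csv", registry, build)
print(first_reused, second_reused)
```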
FAIRPORT
A particular class of FAIR Data System to provide support for data interoperability;
Supports publication and access to FAIR data;
Fosters an ecosystem of applications and services;
Federated architecture: different FAIRports (and other FAIR Data Systems) are interconnectable;
Supports citations of datasets and data items;
Provides metrics for data usage and citation.
Allow third-party annotation on existing knowledge bases
Capture the provenance of the annotator and the original statement
DEMO: http://dev-vm.fair-dtls.surf-hosted.nl:8080/#/
ANNOTATIONS GO TO NANOPUB STORE
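The annotation idea above pairs a third-party comment with the provenance of the annotator and a pointer to the original statement, and keeps it outside the knowledge base. A rough sketch of such a record follows. The `Annotation` fields, the list used as a store, the example triple and the placeholder ORCID are all assumptions made for illustration; this is not the ORKA data model or the nanopublication format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    statement: tuple   # the original (subject, predicate, object) being annotated
    comment: str       # what the annotator says about it
    annotator: str     # who made the annotation (provenance)
    created: str       # when it was made (provenance)


# Stand-in for the nanopub store: annotations live here, not in the knowledge base.
NANOPUB_STORE = []


def annotate(statement, comment, annotator, created):
    """Record an annotation without modifying the original statement."""
    annotation = Annotation(statement, comment, annotator, created)
    NANOPUB_STORE.append(annotation)
    return annotation


# Placeholder statement and annotator identifier, for illustration only.
original = ("http://example.org/gene/1",
            "http://example.org/associatedWith",
            "http://example.org/disease/2")
note = annotate(original, "Evidence for this link looks weak.",
                "http://orcid.org/0000-0000-0000-0000", "2016-11-01")
print(note.annotator, note.created)
```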
TOOLS ROADMAP (Dec 16, Jan 17, Feb 17, Mar 17)
FAIR Data Point
    Version 1: Metadata editor, release metadata, POST, FAIR accessor
    Version 1.1: Reintroduce OAI-PMH compliance
    Version 1.2: Update notification
FAIR Data Search Engine
    Beta 1: Crawler, metadata index and search GUI
    Beta 2: Improved search GUI, search API
FAIRifier
    Beta 1: OpenRefine + RDF plugin, publication to FAIR Data Point
    Beta 2: Metadata definition and extraction (RML), license picker
FAIR Data Model Registry
    Alpha 1: Start of the integration work
ORKA
    Beta 1: Definition of 2-3 use cases
    Beta 2: Extended with features required by the use cases
Data FAIRport
    Alpha 1: Start of the integration work
TECHNOLOGY TRANSFER EVENTS
EXTENDING EXISTING DATA REPOSITORIES
+
FAIR HACKATHON - GOALS
Align solutions with FAIR Data Point specifications:
Metadata content
API
Data
FAIR HACKATHON OUTCOME
FAIR data model for solutions content;
Architecture of the required adjustments/extensions;
Technical specification of the adjustments/extensions;
Proof-of-concept of the adjusted solution.
FAIR HACKATHONS
RDRF
MOLGENIS FAIR HACKATHON
DTL’S FAIR HACKATHONS ROADMAP
EUDAT (pilot project ongoing)
EGA (July 6-8 2016)
Molgenis (Oct 19-20 2016)
Patient registry solution providers (Oct 25-27 2016)
Mendeley (Nov 18 2016)
Quaero Systems (Nov 24 2016)
tranSMART (TBD)
phenotypeDB (TBD)
Euretos Knowledge Platform (TBD)
NIH, Australian National Data Services, Brazilian open government data, …
BRING YOUR OWN DATA - BYOD
Goals:
■ Learn how to make data linkable “hands-on” with experts
■ Create a “telling story” to demonstrate its use
■ Make FAIR Data at the source
Composition:
■ Data owners – specialists on given datasets
■ Data interoperability experts
■ Domain experts
Source: Marco Roos
Domain Expert, Data Owner, FAIR Data Expert
BYOD
BYOD Planning
Preparation, Execution, Follow Up
Preparation
Identify, Plan
Datasets; Attendees' profile; Output data access; Tentative dates; Tentative venue; Costs; Funds
Coordination; Set date; Invite attendees; Set venue; Catering; Lodging; Financial planning
Publicity; Working document; Preparatory calls; Data hosting; Software hosting; Documentation hosting
Execution
Day One: Introduction; SW, LD, Ontology intro; Use case intro; Workgroups division; Working sessions; WWW/TTTALA
Day Two: Progress report; Working sessions; Groups reports; WWW/TTTALA
Day Three: Data integration; Answer driving question; Explore data; Demo improvement; Final report; WWW/TTTALA
Follow-Up
D+15: Report difficulties; Clarifications; Next steps
D+45: Report difficulties; Clarifications; Next steps
Implementation: Expand FAIRification; Implement solution; Scale-up solution; Deploy
[Figure: BYOD, FAIR Hackathon, Core FAIR Technology and FAIR Data E.T., shown with the projects BBMRI 2.0, FAIRdICT, RDConnect, ODEX4all, myFAIR and Elixir Excellerate]
RELATED PROJECTS
ODEX4all
FAIR-dICT
myFAIR
QUESTIONS?