Force11 JDDCP workshop presentation, @ Force2015, Oxford
EU Lead: Mark Wilkinson
Fundacion BBVA Chair in Biological Informatics
Isaac Peral Distinguished Researcher, CBGP-UPM
USA Lead: Michel Dumontier
Associate Professor, Biomedical Informatics, Stanford
FAIRport Project Lead: Barend Mons
Professor, Leiden University Medical Centre
FAIRport Skunkworks
“Skunkworks”
Team Update
Objectives and Outcomes
(...so far...)
What is a FAIRport?
● Findable - (meta)data should be uniquely and persistently identifiable
● Accessible - identifiers should provide a mechanism for (meta)data
access, including authentication, access protocol, license, etc.
● Interoperable - (meta)data should be machine-accessible, using a
machine-parseable syntax and, where possible, shared common
vocabularies.
● Reusable - there should be sufficient machine-readable metadata that it is
possible to “integrate like-with-like”, and that component data objects can
be precisely and comprehensively cited post-integration.
“Skunkworks”
“...a group within an organization given a high
degree of autonomy and unhampered by
bureaucracy, tasked with working on advanced
or secret projects.” -- Wikipedia: http://en.wikipedia.org/wiki/Skunk_Works
“Skunkworks” FAIRport group
Objective (ongoing) - explore existing technologies and attempt to build
prototype FAIRport code components using, whenever possible, existing
standards. Once desirable FAIR behaviors have been achieved, hand-off
to a professional coding team to ensure production-quality outcomes.
● Self-selected “hackers”
● Self-identified tasks (next few slides)
● Led to a series of Web meetings, and a joint Hackathon, with
participants at venues in Netherlands and USA.
Typical Problem
I’m looking for microarray data of human liver cells on a
time-course following liver transplant.
What repositories *could* contain this data?
● GEO? EUDat? NPG Scientific Data?
● What fields in those repositories would I need to
search, using what vocabularies, to find what I
need?
“Skunkworks” - initial observations
There are a lot of repositories out there!
General Purpose: Dryad, EUDat, Figshare, DataVerse, etc.
Special Purpose: PDB, UniProt, NCBI, EnsEMBL
Lack of rich, machine-readable descriptions of the contents of these
repositories hinders us from (for example):
● knowing where we can look for certain types of data
● knowing if two repositories contain records about the same thing
● Cross-referencing or “joining” across repositories to integrate
disparate data about the same thing
● Knowing which repository I could/should deposit my data to (and how)
If we wanted to enable this kind of FAIR discovery and
integration over myriad repositories, what infrastructure
(existing/new) would we need?
Challenge
Task:
harmonized cross-repository meta-descriptors
Though self-selected as a FAIRport Skunkworks task, this significantly
overlaps with the Force11 Data Citation Implementation Working Group
Team 4 - “Common repository interfaces”.
...so we joined forces :-)
Exemplar use-cases:
A piece of software that can generate a “sensible” query form/interface for
any repository
A piece of software that can generate a “sensible” and comprehensive
data submission form for any repository
Task:
harmonized cross-repository meta-descriptors
Prior Art?
“DCAT is an RDF vocabulary designed to facilitate interoperability
between data catalogs published on the Web…. By using DCAT to
describe datasets in data catalogs, publishers increase discoverability
and enable applications easily to consume metadata from multiple
catalogs. It further enables decentralized publishing of catalogs and
facilitates federated dataset search across sites. Aggregated DCAT
metadata can serve as a manifest file to facilitate digital preservation.”
http://www.w3.org/TR/vocab-dcat/
W3C Recommendation 16 January 2014
DCAT Data Catalog Vocabulary
DCAT is an RDF Schema that defines core metadata elements describing
dataset collections and the datasets within those collections. e.g.
:dataset-001
    a dcat:Dataset ;
    dct:title "Imaginary dataset" ;
    dcat:keyword "accountability", "transparency", "payments" ;
    dct:issued "2011-12-05"^^xsd:date ;
    dct:modified "2011-12-05"^^xsd:date ;
    dct:temporal <http://reference.data.gov.uk/id/quarter/2006-Q1> ;
    dct:spatial <http://www.geonames.org/6695072> ;
    dct:publisher :finance-ministry ;
    dct:language <http://id.loc.gov/vocabulary/iso639-1/en> ;
    dcat:distribution :dataset-001-csv .
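As a rough illustration of how an application might consume a description like this, the sketch below holds part of the example record as plain (subject, predicate, object) tuples in Python and looks up values by predicate. The tuple representation and the helper names `values_of` and `datasets_with_keyword` are assumptions made for this sketch; a real client would parse the RDF with an RDF library.

```python
# Hypothetical, stdlib-only sketch: part of the DCAT example as plain triples.
TRIPLES = [
    (":dataset-001", "rdf:type", "dcat:Dataset"),
    (":dataset-001", "dct:title", "Imaginary dataset"),
    (":dataset-001", "dcat:keyword", "accountability"),
    (":dataset-001", "dcat:keyword", "transparency"),
    (":dataset-001", "dcat:keyword", "payments"),
    (":dataset-001", "dct:issued", "2011-12-05"),
    (":dataset-001", "dct:publisher", ":finance-ministry"),
    (":dataset-001", "dcat:distribution", ":dataset-001-csv"),
]


def values_of(triples, subject, predicate):
    """Return every object asserted for (subject, predicate), in input order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def datasets_with_keyword(triples, keyword):
    """Return the subjects typed dcat:Dataset that carry the given keyword."""
    datasets = {s for s, p, o in triples
                if p == "rdf:type" and o == "dcat:Dataset"}
    return sorted(s for s in datasets
                  if keyword in values_of(triples, s, "dcat:keyword"))


if __name__ == "__main__":
    print(values_of(TRIPLES, ":dataset-001", "dcat:keyword"))
    print(datasets_with_keyword(TRIPLES, "payments"))
```

The point is only that shared, well-known predicates such as dcat:keyword make this kind of lookup uniform across catalogs.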
So the core metadata of a repository’s collections
could be described in DCAT...
...if the repositories used DCAT…
...generally speaking, they don’t...
...and we need more than just core metadata to enable
cross-repository search anyway…
So DCAT itself isn’t the solution to our problem
because, among other things, it does not
provide sufficiently rich descriptors
What exactly *is* our problem?
[Diagram]
Data Schema (e.g. XMLS, RDFS) --Defines--> Data Record (e.g. XML, RDF)
Metadata Record (e.g. DCAT-compliant RDF) --Describes--> Data Record
DCAT RDFS Schema --Defines--> Metadata Record
If everyone was using all elements of the DCAT schema
to define their core metadata
then (that part of) the problem would be solved at this point
We could use THIS to build queries about THIS
What exactly *is* our problem?
REALITY
[Diagram: three repositories, each with its own stack]
Data Record:      XML                | RDF                          | ACEDB
Data Schema:      XMLS               | RDFS                         | ACEDB
Metadata Record:  DCAT RDF           | UniProt RDF                  | DragonDB Form
Metadata Schema:  DCAT RDFS Schema   | UniProt RDFS Metadata Schema | DragonDB Form Metadata Schema
● Repositories don’t all use DCAT Schema
● Those that use DCAT Schema use only parts of it
● Those that don’t use DCAT use a myriad of alternatives (some very loosely defined)
● And don’t necessarily use all elements of those alternatives either
So how are we going to do RICH queries over all of these?
We need a way to describe the descriptors...
The DCAT WG suggested the same thing
They said there was a need for “DCAT Profiles”
A DCAT Profile is a specification for data catalogs that adds additional
constraints to DCAT. Additional constraints in a profile MAY include:
● A minimum set of required metadata fields
● Classes and properties for additional metadata fields not covered in DCAT
● Controlled vocabularies or URI sets as acceptable values for properties
● Requirements for specific access mechanisms (RDF syntaxes, protocols) to the catalog's RDF
description
http://www.w3.org/TR/vocab-dcat/
A DCAT Profile is:
A generic way to describe what metadata fields a repository has
and what the constraints on those fields are
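To make "fields plus constraints on those fields" concrete, here is a minimal Python sketch of a profile reduced to two of the constraint kinds listed by the DCAT WG: a minimum set of required fields, and controlled vocabularies for values. The `Profile` class, the `violations` function, and the field names in the usage note are hypothetical, invented for this sketch; nothing here is part of DCAT.

```python
# Hypothetical sketch: a "profile" as required fields plus controlled vocabularies.
from dataclasses import dataclass, field


@dataclass
class Profile:
    required: set = field(default_factory=set)        # minimum set of required fields
    vocabularies: dict = field(default_factory=dict)  # field name -> set of acceptable values


def violations(profile, record):
    """List the ways a metadata record (field name -> value) breaks the profile."""
    problems = []
    # Required fields that the record does not supply.
    for name in sorted(profile.required - set(record)):
        problems.append(f"missing required field: {name}")
    # Supplied values that fall outside the controlled vocabulary.
    for name, allowed in sorted(profile.vocabularies.items()):
        if name in record and record[name] not in allowed:
            problems.append(f"value not in vocabulary for {name}: {record[name]}")
    return problems
```

For example, with `Profile(required={"title", "format"}, vocabularies={"format": {"csv", "rdf"}})`, a record `{"format": "pdf"}` yields one "missing required field" problem and one "value not in vocabulary" problem.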
But the DCAT WG also suggested...
DCAT Profiles don’t exist!
http://www.w3.org/TR/vocab-dcat/
“FAIR Profiles”
At the Hackathon, the “Skunkers” decided to invent the DCAT Profile technology.
Since they are intended to allow descriptions of
● Descriptor metadata fields not included in DCAT...
● ...in many cases, Descriptors with ZERO metadata fields from DCAT...
● ...and in many cases, Descriptors that are not even in RDF...
We call them “FAIR Profiles” rather than DCAT profiles
(However, clear acknowledgements to the
DCAT Working Group for conceiving of the idea!)
What the FAIR profile technology accomplishes
[Diagram: one FAIR Profile is layered on top of each repository's metadata schema]
FAIR Profile of DCAT Schema --> DCAT RDFS Schema
FAIR Profile of UniProt Metadata Schema --> UniProt RDFS Metadata Schema
FAIR Profile of DragonDB Metadata Schema --> DragonDB Form Metadata Schema
Though they are potentially describing very different things
(from Web FORM fields to OWL Ontologies!)
all FAIR Profiles are written using the same vocabulary and structure, defined by...
The FAIR Profile Schema
(the thing the Skunkworks team invented)
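[Diagram: the full stack for one repository]
Repo. Data Schema (e.g. XMLS, RDFS) --Defines--> Repo. Data Record (e.g. XML, RDF)
Repository Metadata Record --Describes--> Repo. Data Record
Repository Metadata Schema --Defines--> Repository Metadata Record
Repository’s FAIR Profile --Describes--> Repository Metadata Schema
FAIR Profile Schema --Defines--> Repository’s FAIR Profile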
“All problems in computer science can be solved by another level of indirection... But that usually will create another problem.”
-- David Wheeler, inventor of the subroutine
Diomidis Spinellis. Another level of indirection. In Andy Oram and Greg Wilson, editors, Beautiful Code: Leading Programmers Explain How They Think, chapter 17, pages 279–291. O'Reilly and Associates, Sebastopol, CA, 2007.
Desiderata for FAIR Profile Schema
● Must describe legacy data (i.e. not just DCAT or other “modern” data)
● Must describe a multitude of data formats (XML, RDF, Key/Value, etc.)
● Must be capable of describing OWL-DL-governed data (still rare, but
increasingly used… Classes, property-restrictions, etc.)
● Must be capable of describing any kind of value constraint, e.g. arbitrary CV,
rdf:range, or equivalent OWL construct
● Must be hierarchical (i.e. the value-constraint of a field can be set as an
entirely separate FAIR Profile)
● Must be modular, identifiable, shareable, and reusable (to stem the
proliferation of new formats)
● Must use standard technologies, and re-use existing vocabularies if possible
● Must be extremely lightweight
● Must NOT require the participation of the repository host (no buy-in required)
FAIR Profile Schema
A very lightweight meta-meta-descriptor, in RDFS language
[Diagram]
FAIR Profile --hasClass--> FP Class --hasProperty--> FP Property --allowedValues--> Property Restriction Definition
FP Class --classType--> External Ontology or RDFS Class (optional)
FP Property --propertyType--> External Ontology or RDFS Predicate (optional)
Requirement Status? Cardinality? Other Constraint?
http://github.com/DataFairPort/DataFairPort/blob/Master/Schema/DCATProfile.rdfs
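One way to picture the shape of this schema is as three nested record types. The Python sketch below mirrors the hasClass / hasProperty / allowedValues / classType / propertyType links as dataclass fields. It is an in-memory analogy only, not the RDFS schema itself: the class and field names are my own, and the predicate URI under example.org is a placeholder, while the DCAT Distribution class and the EDAM concept scheme URI are the ones used in this talk's demo.

```python
# Hypothetical in-memory analogy of the FAIR Profile Schema (not the RDFS itself).
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FPProperty:
    label: str
    property_type: Optional[str] = None    # propertyType: external predicate URI (optional)
    allowed_values: Optional[str] = None   # allowedValues: URI of a restriction definition


@dataclass
class FPClass:
    label: str
    class_type: Optional[str] = None       # classType: external class URI (optional)
    properties: List[FPProperty] = field(default_factory=list)   # hasProperty


@dataclass
class FairProfile:
    label: str
    classes: List[FPClass] = field(default_factory=list)         # hasClass


def external_predicates(profile):
    """Collect the external predicate URIs referenced anywhere in the profile."""
    return [p.property_type
            for c in profile.classes
            for p in c.properties
            if p.property_type is not None]


EXAMPLE = FairProfile(
    label="example profile",
    classes=[
        FPClass(
            label="CoreMicroarrayDistributionMetadata",
            class_type="http://www.w3.org/ns/dcat#Distribution",
            properties=[
                FPProperty(
                    label="format",
                    property_type="http://example.org/vocab#format",  # placeholder URI
                    allowed_values="http://biordf.org/DataFairPort/ConceptSchemes/EDAM_Microarray_Data_Format",
                ),
                FPProperty(label="free-text note"),  # no external predicate, no restriction
            ],
        )
    ],
)
```

Both `class_type` and `property_type` default to `None` because the schema marks the links to external ontologies as optional.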
Property Restriction Definition
(XSD, FAIR Profile, SKOS)
Describes the constraints on the possible
values for a predicate in the target
Repository’s metadata Schema
NOTE: we cannot use rdfs:range because
we are meta-modelling! The predicate is a
CLASS at the meta-model level, so use of
rdfs:range is not appropriate.
The possible values are:
● An XSD Datatype
● Another FAIR Profile (i.e. hierarchical profiles)
● A SKOS View on a set of ontology terms from
one or more ontologies
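The three kinds of restriction can be sketched as a small tagged union with one check function. This is an assumption-laden Python illustration: the class names, the `check_value` function, and the tiny table of supported XSD datatypes are all invented here, and the nested-profile branch only checks that the value is a structured record rather than validating it.

```python
# Hypothetical sketch of the three restriction kinds and a value check.
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class XsdDatatype:
    name: str              # e.g. "xsd:date"


@dataclass(frozen=True)
class SkosScheme:
    concepts: frozenset    # URIs of the concepts in the SKOS view


@dataclass(frozen=True)
class NestedProfile:
    profile_uri: str       # URI of another FAIR Profile


def _is_xsd_date(value):
    """True if value is a string that parses as an ISO calendar date."""
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# Only two datatypes are covered; this is a sketch, not an XSD implementation.
XSD_CHECKS = {
    "xsd:date": _is_xsd_date,
    "xsd:string": lambda value: isinstance(value, str),
}


def check_value(value, restriction):
    """Return True if value satisfies the restriction."""
    if isinstance(restriction, XsdDatatype):
        checker = XSD_CHECKS.get(restriction.name)
        if checker is None:
            raise ValueError(f"datatype not covered by this sketch: {restriction.name}")
        return checker(value)
    if isinstance(restriction, SkosScheme):
        return value in restriction.concepts
    if isinstance(restriction, NestedProfile):
        # The value must itself be a structured record; checking it against
        # the nested profile is left out of this sketch.
        return isinstance(value, dict)
    raise TypeError(f"unknown restriction: {restriction!r}")
```

A caller would pick the restriction object from the profile's allowedValues link and pass each candidate field value through `check_value`.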
A FAIR Profile (an RDF document that follows the FAIR Profile Schema)
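[Diagram: a Metadata Record (e.g. DCAT-compliant RDF) is defined by the DCAT RDFS Schema; that schema is described by a FAIR Profile, which is defined by the FAIR Profile Schema. “This!” marks the FAIR Profile.]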
A FAIR Profile
[Legend shown on each of the following slides: FAIR Profile --hasClass--> FP Class --hasProperty--> FP Property --allowedValues--> Property Restriction Definition; FP Class --classType--> External Class; FP Property --propertyType--> External Predicate]
FAIR Profiles are FAIR!
(Identifiable, Re-usable, and Shareable)
A FAIR Profile: the CoreMicroarrayDistributionMetadata Descriptor Class
● The Class follows the “DCAT Distribution” Class model
● It uses only 3 properties from the “DCAT Distribution” Class model... let’s look at one of them in detail
CoreMicroarrayDistributionMetadata Descriptor: Property #1
● This Meta-Descriptor element is a ‘FAIR Profile Property’ Class
● This is its label within that organization’s metadata descriptor
● This is the URL of the Predicate used by that descriptor
● This is the “range” of that Predicate within the organization’s descriptor
CoreMicroarrayDistributionMetadata Descriptor: Property #2
Let’s look at a different property from the CoreMicroarrayDistributionMetadata Class
● This is the label for that property
● The URL of the predicate of this Property
● In the Metadata Descriptor, this property is constrained by the set of ontology terms defined in the SKOS Concept Scheme EDAM_Microarray_Data_Format
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description xmlns:ns1="http://www.w3.org/2002/07/owl#"
      rdf:about="http://biordf.org/DataFairPort/ConceptSchemes/EDAM_Microarray_Data_Format">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Ontology"/>
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#ConceptScheme"/>
    <ns1:imports rdf:resource="http://purl.bioontology.org/ontology/EDAM"/>
  </rdf:Description>
  <rdf:Description
      xmlns:ns1="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:ns2="http://www.w3.org/2004/02/skos/core#"
      rdf:about="http://edamontology.org/format_1641">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/>
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <ns1:label>affymetrix-exp</ns1:label>
    <ns2:broader rdf:resource="http://edamontology.org/format_2056"/>
    <ns2:inScheme rdf:resource="http://biordf.org/DataFairPort/ConceptSchemes/EDAM_Microarray_Data_Format"/>
  </rdf:Description>
  <rdf:Description
      xmlns:ns1="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:ns2="http://www.w3.org/2004/02/skos/core#"
      rdf:about="http://edamontology.org/format_2056">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/>
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <ns1:label>Microarray experiment data format</ns1:label>
    <ns2:broader rdf:resource="http://biordf.org/DataFairPort/ConceptSchemes/EDAM_Microarray_Data_Format"/>
    <ns2:inScheme rdf:resource="http://biordf.org/DataFairPort/ConceptSchemes/EDAM_Microarray_Data_Format"/>
  </rdf:Description>
</rdf:RDF>
http://biordf.org/DataFairPort/ConceptSchemes/EDAM_Microarray_Data_Format
This is a “SKOSified” view of the EDAM Ontology
Jupp, et al., “Taking a view on bio-ontologies” ceur-ws.org/Vol-897/session4-paper22.pdf
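To show how a client could turn such a SKOS view into a pick-list, the sketch below reads a cut-down copy of the two concepts above with Python's standard XML parser and returns each concept's label. It assumes the RDF/XML is written in the flat rdf:Description style used on the slide; the function name `concepts_in_scheme` is invented here, and a general-purpose client would use a proper RDF parser instead.

```python
# Hypothetical sketch: list the concepts of a SKOS scheme from flat RDF/XML.
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
SCHEME = "http://biordf.org/DataFairPort/ConceptSchemes/EDAM_Microarray_Data_Format"

# Cut-down copy of the two concepts from the slide.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:skos="{SKOS}">
  <rdf:Description rdf:about="http://edamontology.org/format_1641">
    <rdf:type rdf:resource="{SKOS}Concept"/>
    <rdfs:label>affymetrix-exp</rdfs:label>
    <skos:inScheme rdf:resource="{SCHEME}"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://edamontology.org/format_2056">
    <rdf:type rdf:resource="{SKOS}Concept"/>
    <rdfs:label>Microarray experiment data format</rdfs:label>
    <skos:inScheme rdf:resource="{SCHEME}"/>
  </rdf:Description>
</rdf:RDF>"""


def concepts_in_scheme(rdf_xml, scheme_uri):
    """Map concept URI -> rdfs:label for every description placed in the scheme."""
    root = ET.fromstring(rdf_xml)
    found = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        schemes = [e.get(f"{{{RDF}}}resource")
                   for e in desc.findall(f"{{{SKOS}}}inScheme")]
        if scheme_uri in schemes:
            found[desc.get(f"{{{RDF}}}about")] = desc.findtext(f"{{{RDFS}}}label")
    return found


if __name__ == "__main__":
    for uri, label in concepts_in_scheme(DOC, SCHEME).items():
        print(label, uri)
```

The resulting URI-to-label map is the kind of thing a generated form could offer as the allowed values for the property.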
A FAIR Profile: return to the very top of our FAIR Profile
● Follow the ExtendedAuthorship Class
● Follow one of the properties of the ExtendedAuthorship Class: Author ORCID
● The allowed values of this Property are constrained to be individuals that follow the FAIR Profile Schema “DemoORCIDProfileScheme”
http://biordf.org/DataFairPort/ProfileSchemas/DemoORCIDProfileScheme.rdf
This is parsed in exactly the same way as our original DemoMicroarrayProfileScheme, but is embedded within
it as the value of the author_ORCID property.
…Arbitrary, hierarchical layers of complexity…
So to build an interface (e.g. query or data-capture)
from a FAIR Profile:
[1] Parse all FAIR Profile classes
    Parse the properties of each class
        Determine the target predicate
        Determine the target value-restrictions
            Call [1] if the restriction is a FAIR Profile
        Create a metadata [capture/query] facet with that predicate and that restriction
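The steps above can be sketched as a short recursive function. In this Python illustration, profiles are plain dictionaries held in a registry keyed by profile name; that registry, the dictionary layout, the example.org predicate URIs, and the `orcid_id` property are assumptions made for the sketch. Only the names DemoMicroarrayProfileScheme, DemoORCIDProfileScheme, CoreMicroarrayDistributionMetadata, ExtendedAuthorship, and author_ORCID come from the demo.

```python
# Hypothetical sketch of the interface-building walk over nested FAIR Profiles.
PROFILES = {
    "DemoMicroarrayProfileScheme": {
        "classes": [
            {"label": "CoreMicroarrayDistributionMetadata",
             "properties": [
                 {"label": "format",
                  "predicate": "http://example.org/vocab#format",
                  "allowed_values": "EDAM_Microarray_Data_Format"},
             ]},
            {"label": "ExtendedAuthorship",
             "properties": [
                 {"label": "author_ORCID",
                  "predicate": "http://example.org/vocab#author",
                  "allowed_values": "DemoORCIDProfileScheme"},
             ]},
        ],
    },
    "DemoORCIDProfileScheme": {
        "classes": [
            {"label": "ORCIDRecord",
             "properties": [
                 {"label": "orcid_id",
                  "predicate": "http://example.org/vocab#orcid",
                  "allowed_values": "xsd:string"},
             ]},
        ],
    },
}


def build_facets(profile_name, profiles, _seen=None):
    """Return one facet per property, recursing when a restriction is a profile."""
    seen = set() if _seen is None else _seen
    if profile_name in seen:       # guard against profiles that embed each other
        return []
    seen.add(profile_name)
    facets = []
    for cls in profiles[profile_name]["classes"]:        # [1] parse all classes
        for prop in cls["properties"]:                   # parse each property
            restriction = prop["allowed_values"]         # target value-restriction
            facet = {"label": prop["label"],
                     "predicate": prop["predicate"],     # target predicate
                     "restriction": restriction,
                     "children": []}
            if restriction in profiles:                  # restriction is a FAIR Profile
                facet["children"] = build_facets(restriction, profiles, seen)
            facets.append(facet)
    return facets
```

Calling `build_facets("DemoMicroarrayProfileScheme", PROFILES)` gives two top-level facets, with the author_ORCID facet carrying the facets of the embedded ORCID profile as its children.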
DCAT Profile Class #1
DCAT Profile Class #2
DCAT Profile Class #3
DCAT Profile Class #4 (embedded)
Value constraints
Descriptor-specific labels associated with ontology predicates (if applicable)
“Classes” may be associated with an ontology to allow reasoning, or may just represent an “arbitrary” grouping of properties within the target metadata descriptor
Metadata Descriptor-specific details are captured, e.g. this field is required by this target Metadata Descriptor
Other features of FAIR profiles
● Do not require repository participation
● Provide a purpose-driven, potentially non-comprehensive “view” on a
repository, of which there may be many, according to what the profile
author needs to cross-query
● Profiles of any given repository facet are not required to be identical! e.g.
A different profile might utilize a different controlled vocabulary over any
given facet (e.g. a freetext facet)
● Anybody can define a profile (of course, the profile defined by the
repository owner should be considered “canonical”... the rest are just
purpose-built “best-guesses”)
● FAIR profiles can/should be indexed and shared, to facilitate cross-
repository interoperability and integration
● There is no (obvious) reason why a FAIR profile could not be used to
describe the DATA in the repository, not just the metadata...
Nothin’ ain’t worth nothin’, but it’s free! -- Kris Kristofferson
“All problems in computer science can be solved by another level of indirection... But that usually will create another problem.” -- David Wheeler
The FAIR profile isn’t “a magic bean”!
● It DOES NOT ACCOMPLISH SEMANTIC MAPPING between one field in one repository, and a semantically-related field in another repository
● It does give us a standard way to identify, describe, and meta-link these fields, and a predictable place where a mapping mechanism could be injected.
● ...we don’t inject it (yet!) because that would require invention of yet another “standard”, and we want to avoid that if possible!
There may be some in the audience who, like me,
recognize that this problem is nearly identical to the
problem faced by the WSDL -> SAWSDL community.
I will be looking at their solution for guidance in the next
phase of FAIR Profiles...
… so we still have problems, but at least they are now
re-defined as problems for which there are solutions!
Skunkworks Participants
● Mark Wilkinson
● Michel Dumontier
● Barend Mons
● Tim Clark
● Jun Zhao
● Paolo Ciccarese
● Paul Groth
● Erik van Mulligen
● Luiz Olavo Bonino da Silva Santos
● Matthew Gamble
● Carole Goble
● Joël Kuiper
● Morris Swertz
● Erik Schultes
● Mercè Crosas
● Adrian Garcia
● Philip Durbin
● Jeffrey Grethe
● Katy Wolstencroft
● Sudeshna Das
● M. Emily Merrill
Post-presentation comments
We should look at ISO 11179 -> are we
duplicating those efforts or are we creating
something that is an implementation of those
efforts?
See also Dublin Core’s similar initiative.