KIT Graduiertenkolloquium 11.05.2016
KIT – The Research University in the Helmholtz Association www.kit.edu
Validation Framework for RDF-based Constraint Languages
M.Sc. (TUM) Thomas Hartmann
Graduate Colloquium (Graduiertenkolloquium), 11.05.2016
enthusiasm for Semantic Web (SW) technologies
problem statement
common need for RDF Validation
problem statement
common needs of data practitioners
W3C RDF Validation Workshop
2 international working groups on RDF validation
constraint languages
SPARQL Query Language for RDF
SPARQL Inferencing Notation (SPIN)
Web Ontology Language (OWL)
Shape Expressions (ShEx)
Resource Shapes (ReSh)
Description Set Profiles (DSP)
no clear favorite
RDF validation as research field
problem statement
Which types of research data and related metadata
are not yet representable in RDF and
how to adequately model them
to be able to validate RDF data
against constraints extractable from these vocabularies?
research question 1
RQ1
LDOW (WWW 2013)
SemStats (ISWC 2013)
DC 2012
IASSIST Quarterly, 38(4) & 39(1), 7-16
IASSIST Quarterly, 38(4) & 39(1), 17-24
IASSIST Quarterly, 38(4) & 39(1), 25-37
IASSIST Quarterly, 38(4) & 39(1), 38-46
ESWC 2011 (Poster)
development of 3 RDF vocabularies:
1. DDI-RDF Discovery Vocabulary (DDI-RDF)
to support the discovery of metadata on unit-record data
2. Physical Data Description (PHDD)
to describe data in tabular format and its physical properties
3. The SKOS Extension for Statistics (XKOS)
to describe the structure and textual properties of formal statistical classifications
to describe relations between classifications and concepts and among concepts
contribution 1
research question 2
XML, XML Schema (XSD)
RDF, Web Ontology Language (OWL)
XML Schemas > OWL ontologies
time-consuming work designing domain ontologies from scratch by hand
reuse information contained in XML Schemas
designing OWL domain ontologies
RQ2
How to directly validate XML data
on semantically rich OWL axioms
using common RDF validation tools
when XML Schemas, adequately representing particular domains,
have already been designed?
research question 2
sub-class relationships
OWL hasValue restrictions on data properties
OWL universal restrictions on object properties
semantically rich OWL axioms
<library>
  <book year="February 1890">
    <author>
      <name>Arthur Conan Doyle</name>
    </author>
    <title>The Sign of the Four</title>
  </book>
</library>
Title ⊑ value.string
Year ⊑ value.integer
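The datatype check implied by the two axioms above can be sketched in a few lines of Python. This is only an illustration under assumptions: it reads the axioms as "a title is a string, a year is an integer", it checks the XML directly instead of going through OWL axioms generated from an XML Schema, and the function name `year_violations` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The library example from the slide, kept verbatim.
LIBRARY_XML = """
<library>
  <book year="February 1890">
    <author>
      <name>Arthur Conan Doyle</name>
    </author>
    <title>The Sign of the Four</title>
  </book>
</library>
"""


def year_violations(xml_text):
    """Return (title, year) pairs whose year attribute is not an integer."""
    violations = []
    for book in ET.fromstring(xml_text).iter("book"):
        year = book.get("year", "")
        try:
            int(year)
        except ValueError:
            # The value cannot be read as an integer, so the
            # "Year ⊑ value.integer" reading is violated.
            title = book.findtext("title", default="")
            violations.append((title, year))
    return violations


print(year_violations(LIBRARY_XML))
```

With the data above, the book is reported because "February 1890" is not an integer value.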
transformations based on formal logics
OWL axioms extracted out of XML Schemas
Explicitly
Implicitly
formally underpin transformations
to formally define and model semantics in a semantically correct way
complete extraction of XML Schemas' structural information
XML can directly be validated against semantically rich OWL axioms
any XML Schema is convertible to OWL
minimized effort designing OWL domain ontologies
contributions
IJMSO, 8(3)
DC (ISWC 2012)
ICITST 2011
OCAS (ISWC 2011)
step 1 of the approach
executed generic test cases created out of the XML Schema meta-model
transformed XML Schemas of 6 XML standards
step 2 of the approach
specified SWRL rules for 3 OWL domain ontologies
verified hypothesis
determined effort for traditional manual approach
estimated effort for semi-automatic approach
DDI-RDF serves as OWL domain ontology
The effort and the time needed to deliver high quality domain ontologies from scratch
by reusing information of already existing XML Schemas is much less than
creating domain ontologies completely manually and from the ground up.
evaluation
IJMSO, 8(3)
research question 3
development of constraint languages
http://purl.org/net/rdf-validation
DC 2014
RQ3
Which types of constraints
must be expressible by constraint languages to meet
all collaboratively and comprehensively identified requirements
to formulate constraints and validate RDF data?
research question 3
published 81 constraint types
constraints are instantiated from constraint types
each constraint type corresponds to a specific requirement
types of constraints on RDF data
expressivity of constraint languages
low-level implementation languages vs. high-level constraint languages
OWL 2 is the most expressive high-level constraint language
high-level constraint languages either
lack an implementation or
are based on different implementations
How to consistently validate RDF data
against constraints of any constraint type
expressed in any RDF-based constraint language?
research question 4-1
RQ4
SPIN as basic validation framework
validation environment for RDF-based constraint languages
constraint languages are translated into SPARQL
represented in RDF in form of a SPIN mapping
a SPIN mapping contains one SPIN construct template for each supported constraint type
consistent validation across RDF-based constraint languages
DC 2014
validation process
validation results
CONSTRUCT {
    _:constraintViolation
        a spin:ConstraintViolation ;
        spin:violationRoot ?subject ;
        rdfs:label ?violationMessage ;
        spin:violationSource ?violationSource ;
        :severityLevel ?severityLevel ;
        spin:violationPath ?violationPath ;
        spin:fix ?violationFix }
full implementations for all OWL 2 and DSP language constructs
all constraint types expressible in OWL 2 and DSP
major constraint types representable by ShEx and ReSh
validation environment
http://purl.org/net/rdfval-demo
constraints and constraint language constructs must be representable in RDF
constraint languages and supported constraint types must be expressible in SPARQL
limitations
How to represent constraints of any constraint type and
how to reduce the representation of
constraints of any constraint type
to the absolute minimum?
research question 4-2
abstraction layer
enables the expression of each constraint type
straight-forward mappings from high-level constraint languages
based on formal logics
validation framework for RDF-based constraint languages
conceptual model
DC 2015
minimum qualified cardinality restrictions (R-75)
OWL:
:Publication rdfs:subClassOf
    [ a owl:Restriction ;
      owl:minQualifiedCardinality 1 ;
      owl:onProperty :author ;
      owl:onClass :Person ] .
SHACL:
:PublicationShape
    a sh:Shape ;
    sh:scopeClass :Publication ;
    sh:property [
        sh:predicate :author ;
        sh:valueShape :PersonShape ;
        sh:minCount 1 ; ] .
:PersonShape
    a sh:Shape ;
    sh:scopeClass :Person .
ShEx:
:Publication { :author @:Person{1, } }
ReSh:
:Publication a rs:ResourceShape ; rs:property [
    rs:propertyDefinition :author ;
    rs:valueShape :Person ;
    rs:occurs rs:One-or-many ; ] .
DSP:
[ dsp:resourceClass :Publication ; dsp:statementTemplate [
    dsp:minOccur 1 ;
    dsp:property :author ;
    dsp:nonLiteralConstraint [ dsp:valueClass :Person ] ] ] .
SPARQL and SPIN:
CONSTRUCT { [ a spin:ConstraintViolation ... . ] } WHERE {
    ?subject
        a ?C1 ;
        ?predicate ?object .
    BIND ( qualifiedCardinality( ?subject, ?predicate, ?C2 ) AS ?c ) .
    BIND ( STRDT ( STR ( ?c ), xsd:nonNegativeInteger ) AS ?cardinality ) .
    FILTER ( ?cardinality < ?minimumCardinality ) .
    FILTER ( ?minimumCardinality = 1 ) .
    FILTER ( ?C1 = :Publication ) .
    FILTER ( ?C2 = :Person ) .
    FILTER ( ?predicate = :author ) . }
SELECT ( COUNT ( ?arg1 ) AS ?c )
WHERE { ?arg1 ?arg2 ?object . ?object a ?arg3 . }
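The counting logic behind the query above can be sketched in plain Python over a list of (subject, predicate, object) tuples. This is a hedged illustration, not the SPIN implementation: the function names, the tuple representation, and the sample data are assumptions made for the example.

```python
RDF_TYPE = "rdf:type"


def qualified_cardinality(triples, subject, predicate, cls):
    """Count values of `predicate` on `subject` that are typed as `cls`."""
    typed = {s for (s, p, o) in triples if p == RDF_TYPE and o == cls}
    return sum(
        1 for (s, p, o) in triples
        if s == subject and p == predicate and o in typed
    )


def min_qualified_cardinality_violations(triples, context_class, predicate, cls, minimum):
    """Return instances of `context_class` with fewer than `minimum` qualified values."""
    subjects = sorted(
        {s for (s, p, o) in triples if p == RDF_TYPE and o == context_class}
    )
    return [
        s for s in subjects
        if qualified_cardinality(triples, s, predicate, cls) < minimum
    ]


# Hypothetical sample data: :paper2 has an author that is not typed as :Person.
DATA = [
    (":paper1", RDF_TYPE, ":Publication"),
    (":paper1", ":author", ":alice"),
    (":alice", RDF_TYPE, ":Person"),
    (":paper2", RDF_TYPE, ":Publication"),
    (":paper2", ":author", ":acme"),
]

print(min_qualified_cardinality_violations(DATA, ":Publication", ":author", ":Person", 1))
```

Note that this sketch also reports publications that have no :author statement at all, since it iterates over all instances of the context class.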
minimum qualified cardinality restrictions (R-75):
simple constraints
[ a rdfcv:SimpleConstraint ;
  rdfcv:contextClass :Publication ;
  rdfcv:leftProperties ( :author ) ;
  rdfcv:classes ( :Person ) ;
  rdfcv:constrainingElement "minimum cardinality" ;
  rdfcv:constrainingValue "1" ] .
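A simple constraint in this abstract form can be validated by one generic checker per constraining element. The following Python sketch assumes the constraint is given as a dictionary whose keys mirror the rdfcv properties above; the dictionary layout, the `CHECKERS` registry, and the sample data are hypothetical and only illustrate the idea.

```python
RDF_TYPE = "rdf:type"


def instances(triples, cls):
    """Return all subjects explicitly typed as `cls`."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == cls}


def check_minimum_cardinality(triples, constraint):
    """Return subjects of the context class violating a minimum cardinality."""
    (prop,) = constraint["leftProperties"]  # exactly one property expected
    allowed = set()
    for cls in constraint["classes"]:
        allowed |= instances(triples, cls)
    minimum = int(constraint["constrainingValue"])
    violations = []
    for subject in sorted(instances(triples, constraint["contextClass"])):
        count = sum(
            1 for (s, p, o) in triples
            if s == subject and p == prop and o in allowed
        )
        if count < minimum:
            violations.append(subject)
    return violations


# One checker per constraining element (only one is sketched here).
CHECKERS = {"minimum cardinality": check_minimum_cardinality}


def validate(triples, constraint):
    """Dispatch to the checker registered for the constraining element."""
    return CHECKERS[constraint["constrainingElement"]](triples, constraint)


CONSTRAINT = {
    "contextClass": ":Publication",
    "leftProperties": [":author"],
    "classes": [":Person"],
    "constrainingElement": "minimum cardinality",
    "constrainingValue": "1",
}

# Hypothetical sample data: :book2 has no author typed as :Person.
DATA = [
    (":book1", RDF_TYPE, ":Publication"),
    (":book1", ":author", ":alice"),
    (":alice", RDF_TYPE, ":Person"),
    (":book2", RDF_TYPE, ":Publication"),
]

print(validate(DATA, CONSTRAINT))
```

Because the checker only reads the abstract record, the same code would apply regardless of the constraint language the record was derived from.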
framework is solely based on the abstract definitions of constraint types
just 1 SPIN mapping for each constraint type
How to ensure for any constraint type that
RDF data is consistently validated against
semantically equivalent constraints of the same constraint type
across RDF-based constraint languages?
research question 4-3
mappings from constraint languages to the abstraction layer and back enable…
How to ensure for any constraint type that
semantically equivalent constraints of the same constraint type
can be transformed
from one RDF-based constraint language to another?
research question 4-4
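Transforming a constraint from one language to another via the abstraction layer can be pictured as two small mappings: one into the abstract form and one out of it. The Python sketch below is an assumption-laden illustration for the minimum qualified cardinality example only; the dictionary layouts and the functions `from_owl` and `to_dsp` are hypothetical and are not the mappings defined in the thesis.

```python
def from_owl(restriction):
    """Map an OWL minimum qualified cardinality restriction to the abstract form."""
    return {
        "contextClass": restriction["subClass"],
        "leftProperties": [restriction["owl:onProperty"]],
        "classes": [restriction["owl:onClass"]],
        "constrainingElement": "minimum cardinality",
        "constrainingValue": str(restriction["owl:minQualifiedCardinality"]),
    }


def to_dsp(simple):
    """Map the abstract form to a DSP-like description template."""
    return {
        "dsp:resourceClass": simple["contextClass"],
        "dsp:statementTemplate": {
            "dsp:minOccur": int(simple["constrainingValue"]),
            "dsp:property": simple["leftProperties"][0],
            "dsp:nonLiteralConstraint": {"dsp:valueClass": simple["classes"][0]},
        },
    }


# The OWL restriction from the running example, as a plain dictionary.
OWL_RESTRICTION = {
    "subClass": ":Publication",
    "owl:minQualifiedCardinality": 1,
    "owl:onProperty": ":author",
    "owl:onClass": ":Person",
}

print(to_dsp(from_owl(OWL_RESTRICTION)))
```

With n constraint languages, this design needs mappings to and from the abstract form only, rather than a separate mapping for every pair of languages.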
What is the role reasoning plays in practical data validation?
research question 5-1
RQ5
SEMANTiCS 2015
reasoning resolves redundancy
Publication ⊑ ∃ publicationDate . xsd:date
Book ⊑ Publication
Conference-Proceeding ⊑ Publication
Journal-Article ⊑ Publication
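The effect of the axioms above can be sketched in Python: the constraint is stated once for Publication, and sub-class reasoning makes it apply to books, proceedings, and journal articles. This is a simplified illustration under assumptions: triples are tuples, each class has at most one super-class, only the presence of a publication date is checked (not its xsd:date datatype), and the sample resource is hypothetical.

```python
RDF_TYPE = "rdf:type"

# Sub-class axioms from the slide (sub-class -> super-class).
SUBCLASS_OF = {
    ":Book": ":Publication",
    ":Conference-Proceeding": ":Publication",
    ":Journal-Article": ":Publication",
}


def infer_types(triples):
    """Add the rdf:type triples implied by the sub-class axioms."""
    inferred = set(triples)
    for (s, p, o) in triples:
        cls = o if p == RDF_TYPE else None
        while cls in SUBCLASS_OF:
            cls = SUBCLASS_OF[cls]
            inferred.add((s, RDF_TYPE, cls))
    return inferred


def missing_publication_date(triples):
    """Return publications without any :publicationDate statement."""
    pubs = {s for (s, p, o) in triples if p == RDF_TYPE and o == ":Publication"}
    dated = {s for (s, p, o) in triples if p == ":publicationDate"}
    return sorted(pubs - dated)


# Hypothetical data: a book without a publication date.
DATA = {(":the-hobbit", RDF_TYPE, ":Book")}

print(missing_publication_date(DATA))               # without reasoning
print(missing_publication_date(infer_types(DATA)))  # with reasoning
```

Without reasoning, the book is not recognized as a publication, so the constraint would have to be repeated for every sub-class to catch it.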
For which constraint types may reasoning be performed
prior to validation to enhance data quality?
research question 5-2
> 2/5 of constraint types
property domains (R-25):
constraint types with reasoning
∃ author.⊤ ⊑ Publication
author(Alices-Adventures-In-Wonderland, Lewis-Carroll)
→ rdf:type(Alices-Adventures-In-Wonderland, Publication)
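The property domain example can be read in two ways, which the following Python sketch contrasts: as a constraint (a subject using author must already be typed as Publication) or as an inference rule applied before validation. The tuple representation and the function names are assumptions made for this illustration.

```python
RDF_TYPE = "rdf:type"

# Property domain from the slide: anything with an author is a Publication.
DOMAINS = {":author": ":Publication"}


def apply_domains(triples, domains):
    """Reasoning step: add the rdf:type triples implied by property domains."""
    inferred = set(triples)
    for (s, p, o) in triples:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))
    return inferred


def domain_violations(triples, domains):
    """Constraint reading: subjects using a property without the domain type."""
    return sorted({
        s for (s, p, o) in triples
        if p in domains and (s, RDF_TYPE, domains[p]) not in triples
    })


DATA = {(":Alices-Adventures-In-Wonderland", ":author", ":Lewis-Carroll")}

print(domain_violations(DATA, DOMAINS))                          # before reasoning
print(domain_violations(apply_domains(DATA, DOMAINS), DOMAINS))  # after reasoning
```

Running the reasoning step first adds the missing type, so the violation reported on the raw data no longer occurs.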
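For which constraint types do validation results differ
(1) if the closed-world assumption (CWA) or the open-world assumption (OWA) and
(2) if the unique name assumption (UNA) or the non-unique name assumption (nUNA) is assumed?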
CWA dependent: 56.8%
UNA dependent: 66.6%
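Why results depend on these assumptions can be illustrated with cardinality checks. The Python sketch below is a simplified model, not the analysis from the thesis: it assumes that no explicit sameness or difference statements are present, the minimum cardinality check assumes unique names, and all function names are hypothetical.

```python
def distinct_lower_bound(values, unique_names):
    """Lower bound on the number of distinct individuals named by `values`."""
    names = set(values)
    if unique_names:
        return len(names)       # UNA: different names, different individuals
    return min(1, len(names))   # nUNA: all names might denote one individual


def max_cardinality_violated(values, maximum, unique_names):
    """A violation is certain only if the lower bound exceeds the maximum."""
    return distinct_lower_bound(values, unique_names) > maximum


def min_cardinality_violated(values, minimum, closed_world):
    """Under the OWA, missing values may exist but be unstated."""
    if not closed_world:
        return False
    return len(set(values)) < minimum


print(max_cardinality_violated([":a", ":b"], 1, unique_names=True))
print(max_cardinality_violated([":a", ":b"], 1, unique_names=False))
print(min_cardinality_violated([], 1, closed_world=True))
print(min_cardinality_violated([], 1, closed_world=False))
```

The same data thus yields a violation under one assumption and none under the other, which is the kind of dependency the percentages above quantify.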
research question 5-3
expressivity of constraint languages
collected 115 constraints
from vocabularies or domain experts
on 3 common vocabularies
well-established (QB, SKOS)
under development (DDI-RDF)
classified constraints
implemented constraints
evaluation
ICSC 2016
33 SPARQL endpoints
classification of constraint types
RDFS/OWL based
constraint language based
SPARQL based
classification of constraints
informational
warning
error
evaluation
classification
C (constraints), CV (constraint violations)
values in %
evaluation
main finding
            C      CV
SPARQL      63.2   78.2
CL          34.7   21.8
RDFS/OWL    35.6   21.8
evaluation based on 3 vocabularies
evaluation
limitation
RQ1: future work
publication of RDF vocabularies
DDI Alliance specifications
W3C recommendation for DDI-RDF
DDI-Lifecycle MD (Model-Driven)
new requirements based on experiences with DDI-RDF
international working group: DDI Moving Forward Project
individual contributions
formalize conceptual model (using UML 2)
conceptualize and implement diverse model serializations (e.g., RDFS/OWL)
future work
aligning PHDD and CSV on the Web
overlap in the description of tabular data in CSV format
broader scope of PHDD
description of tabular data with fixed record length
description of tabular data with multiple records per case
evaluation for use in DDI-Lifecycle MD
RQ1: future work
future work
RQ2: future work
bidirectional transformations from models of any meta-model to OWL
generalize from XSD meta-model based unidirectional transformations from XSD models into OWL models
enable the validation of any data against constraints extractable from models of any meta-model using common RDF validation tools
future work
RQ3: future work
maintain and extend RDF validation database
collect case studies and use cases
extract requirements
publish constraint types
future work
RQ4: future work
SPIN mappings for constraint languages not expressible in SPARQL
keep framework and constraining elements in sync
combine the framework with SHACL
derive SHACL extensions with SPARQL bodies
define mappings from SHACL to the abstraction layer and back
synchronize consistent implementations of constraint types
future work
acknowledgements, publications, research data
29 publications: 5 journal articles, 9 conference articles, 3 workshop articles, 2 specifications, 10 technical reports
first author of all (except 1) journal articles, conference articles, and workshop articles
research data
KIT research data repository: http://dx.doi.org/10.5445/BWDD/11
GitHub repository: https://github.com/github-thomas-hartmann/phd-thesis
international working groups
DCMI RDF Application Profiles Task Group
part of the editorial board
RDF Vocabularies Working Group
editor for DDI-RDF and PHDD
W3C RDF Data Shapes Working Group
DDI Moving Forward Project
outlook and summary of main contributions
provide a basis for continued research
incorporate findings of this thesis into the working groups
RDF vocabularies
RDFication of XML
set of constraint types
validation framework for RDF-based constraint languages
role of reasoning for data validation
THANK YOU!
appendix
publications: journal articles
1. Bosch, Thomas & Mathiak, B. (2015). Use Cases Related to an Ontology of the Data Documentation Initiative. IASSIST Quarterly, 38(4) & 39(1), 25–37. http://iassistdata.org/iq/issue/38/4
2. Bosch, Thomas, Olsson, O., Gregory, A., & Wackerow, J. (2015c). DDI-RDF Discovery - A Discovery Model for Microdata. IASSIST Quarterly, 38(4) & 39(1), 17–24. http://iassistdata.org/iq/issue/38/4
3. Bosch, Thomas & Zapilko, B. (2015). Semantic Web Applications for the Social Sciences. IASSIST Quarterly, 38(4) & 39(1), 7–16. http://iassistdata.org/iq/issue/38/4
4. Schaible, J., Zapilko, B., Bosch, Thomas, & Zenk-Möltgen, W. (2015). Linking Study Descriptions to the Linked Open Data Cloud. IASSIST Quarterly, 38(4) & 39(1), 38–46. http://iassistdata.org/iq/issue/38/4
5. Bosch, Thomas & Mathiak, B. (2013b). How to Accelerate the Process of Designing Domain Ontologies based on XML Schemas. International Journal of Metadata, Semantics and Ontologies - Special Issue on Metadata, Semantics and Ontologies for Web Intelligence, 8(3), 254 – 266. http://www.inderscience.com/info/inarticle.php?artid=57760
Please note that in 2015, my last name changed from Bosch to Hartmann.
publications: articles in conference proceedings
1. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2016). Validating RDF Data Quality using Constraints to Direct the Development of Constraint Languages. In Proceedings of the 10th International Conference on Semantic Computing (ICSC 2016) Laguna Hills, California, USA: IEEE. http://www.ieee-icsc.com/
2. Bosch, Thomas & Eckert, K. (2015). Guidance, Please! Towards a Framework for RDF-based Constraint Languages. In Proceedings of the 15th DCMI International Conference on Dublin Core and Metadata Applications (DC 2015) São Paulo, Brazil. http://dcevents.dublincore.org/IntConf/dc-2015/paper/view/386/368
3. Bosch, Thomas, Acar, E., Nolle, A., & Eckert, K. (2015a). The Role of Reasoning for RDF Validation. In Proceedings of the 11th International Conference on Semantic Systems (SEMANTiCS 2015) (pp. 33–40). Vienna, Austria: ACM. http://doi.acm.org/10.1145/2814864.2814867
4. Bosch, Thomas & Eckert, K. (2014a). Requirements on RDF Constraint Formulation and Validation. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/257
5. Bosch, Thomas & Eckert, K. (2014b). Towards Description Set Profiles for RDF using SPARQL as Intermediate Language. In Proceedings of the 14th DCMI International Conference on Dublin Core and Metadata Applications (DC 2014) Austin, Texas, USA. http://dcevents.dublincore.org/IntConf/dc-2014/paper/view/270
6. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2012). Leveraging the DDI Model for Linked Statistical Data in the Social, Behavioural, and Economic Sciences. In Proceedings of the 12th DCMI International Conference on Dublin Core and Metadata Applications (DC 2012) Kuching, Sarawak, Malaysia. http://dcpapers.dublincore.org/pubs/article/view/3654
7. Bosch, Thomas (2012). Reusing XML Schemas’ Information as a Foundation for Designing Domain Ontologies. In P. Cudré-Mauroux, J. Heflin, E. Sirin, T. Tudorache, J. Euzenat, M. Hauswirth, J. Parreira, J. Hendler, G. Schreiber, A. Bernstein, & E. Blomqvist (Eds.), The Semantic Web - ISWC 2012, volume 7650 of Lecture Notes in Computer Science (pp. 437–440). Springer Berlin Heidelberg. http://dx.doi.org/10.1007/978-3-642-35173-0_34
8. Bosch, Thomas & Mathiak, B. (2012). XSLT Transformation Generating OWL Ontologies Automatically Based on XML Schemas. In Proceedings of the 6th International Conference for Internet Technology and Secured Transactions (ICITST 2011), IEEE Xplore Digital Library (pp. 660–667). Abu Dhabi, United Arab Emirates. http://edas.info/web/icitst2011/program.html
9. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2011). Designing an Ontology for the Data Documentation Initiative. In Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011), Poster-Session Heraklion, Greece. http://www.eswc2011.org/content/accepted-posters.html
publications: articles in workshop proceedings
1. Bosch, Thomas, Cyganiak, R., Gregory, A., & Wackerow, J. (2013a). DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In Proceedings of the 6th Workshop on Linked Data on the Web (LDOW 2013), 22nd International World Wide Web Conference (WWW 2013), volume 996 Rio de Janeiro, Brazil. http://ceur-ws.org/Vol-996/
2. Bosch, Thomas, Zapilko, B., Wackerow, J., & Gregory, A. (2013b). Towards the Discovery of Person-Level Data - Reuse of Vocabularies and Related Use Cases. In Proceedings of the 1st International Workshop on Semantic Statistics (SemStats 2013), 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia. http://semstats.github.io/2013/proceedings
3. Bosch, Thomas & Mathiak, B. (2011). Generic Multilevel Approach Designing Domain Ontologies Based on XML Schemas. In Proceedings of the 1st Workshop Ontologies Come of Age in the Semantic Web (OCAS 2011), 10th International Semantic Web Conference (ISWC 2011) (pp. 1–12). Bonn, Germany. http://ceur-ws.org/Vol-809/
publications: specifications
1. Bosch, Thomas, Cyganiak, R., Wackerow, J., & Zapilko, B. (2016). DDI-RDF Discovery Vocabulary: A Vocabulary for Publishing Metadata about Data Sets (Research and Survey Data) into the Web of Linked Data. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/discovery
2. Wackerow, J., Hoyle, L., & Bosch, Thomas (2016). Physical Data Description. DDI Alliance Specification, DDI Alliance. http://rdf-vocabulary.ddialliance.org/phdd.html
publications: technical reports
1. Hartmann, Thomas (2016a). Validation Framework for RDF-based Constraint Languages - PhD Thesis Appendix. Karlsruhe Institute of Technology (KIT), Karlsruhe. http://dx.doi.org/10.5445/IR/1000054062
2. Vompras, J., Gregory, A., Bosch, Thomas, & Wackerow, J. (2015). Scenarios for the DDI-RDF Discovery Vocabulary. DDI Working Paper Series. http://dx.doi.org/10.3886/DDISemanticWeb02
3. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015b). Report on Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/Requirements
4. Alonen, M., Bosch, Thomas, Charles, V., Clayphan, R., Coyle, K., Dröge, E., Isaac, A., Matienzo, M., Pohl, A., Rühle, S., & Svensson, L. (2015a). Report on the Current State: Use Cases and Validation Requirements. DCMI Draft, Dublin Core Metadata Initiative (DCMI). http://wiki.dublincore.org/index.php/RDF_Application_Profiles/UCR_Deliverable
5. Bosch, Thomas, Nolle, A., Acar, E., & Eckert, K. (2015b). RDF Validation Requirements - Evaluation and Logical Underpinning. Computing Research Repository (CoRR), abs/1501.03933. http://arxiv.org/abs/1501.03933
6. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015a). Constraints to Validate RDF Data Quality on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04479. http://arxiv.org/abs/1504.04479
7. Hartmann, Thomas, Zapilko, B., Wackerow, J., & Eckert, K. (2015b). Evaluating the Quality of RDF Data Sets on Common Vocabularies in the Social, Behavioral, and Economic Sciences. Computing Research Repository (CoRR), abs/1504.04478. http://arxiv.org/abs/1504.04478
8. Bosch, Thomas, Wira-Alam, A., & Mathiak, B. (2014). Designing an Ontology for the Data Documentation Initiative. Computing Research Repository (CoRR), abs/1402.3470. http://arxiv.org/abs/1402.3470
9. Bosch, Thomas & Mathiak, B. (2013a). Evaluation of a Generic Approach for Designing Domain Ontologies Based on XML Schemas. Gesis Technical Report 08, Gesis - Leibniz Institute for the Social Sciences, Mannheim, Germany. http://www.gesis.org/publikationen/archiv/gesis-technical-reports/
10. Block, W., Bosch, Thomas, Fitzpatrick, B., Gillman, D., Greenfield, J., Gregory, A., Hebing, M., Hoyle, L., Humphrey, C., Johnson, J., Linnerud, J., Mathiak, B., McEachern, S., Radler, B., Risnes, Ø., Smith, D., Thomas, W., Wackerow, J., Wegener, D., & Zenk-Möltgen, W. (2012). Developing a Model-Driven DDI Specification. DDI Working Paper Series
research questions
1. Which types of research data and related metadata are not yet representable in RDF and how to adequately model them to be able to validate RDF data against constraints extractable from these vocabularies?
2. How to directly validate XML data on semantically rich OWL axioms using common RDF validation tools when XML Schemas, adequately representing particular domains, have already been designed?
3. Which types of constraints must be expressible by constraint languages to meet all collaboratively and comprehensively identified requirements to formulate constraints and validate RDF data?
4. How to ensure for any constraint type that (1) RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages and (2) semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another?
5. What is the role reasoning plays in practical data validation and for which constraint types may reasoning be performed prior to validation to enhance data quality?
summary of contributions
1. Development of three RDF vocabularies (1) to represent all types of research data and related metadata in RDF and (2) to validate RDF data against constraints extractable from these vocabularies
2. Direct validation of XML data using common RDF validation tools against semantically rich OWL axioms extracted from XML Schemas properly describing certain domains
3. Publication of 81 types of constraints that must be expressible by constraint languages to meet all jointly and extensively identified requirements to formulate constraints and validate RDF data against constraints
4.1 Consistent validation across RDF-based constraint languages
4.2 Minimal representation of constraints of any type
4.3 For any constraint type, RDF data is consistently validated against semantically equivalent constraints of the same constraint type across RDF-based constraint languages
4.4 For any constraint type, semantically equivalent constraints of the same constraint type can be transformed from one RDF-based constraint language to another
5. Delineation of the role reasoning plays in practical data validation and investigation, for each constraint type, of (1) whether reasoning may be performed prior to validation to enhance data quality, (2) how efficiently, in terms of runtime, validation is performed with and without reasoning, and (3) whether validation results depend on different underlying semantics
6. Evaluation of the usability of constraint types for assessing RDF data quality
summary of limitations
1. XML Schemas must adequately represent particular domains in a syntactically and semantically correct way
2. Constraints of supported constraint types must be representable in RDF
3. Constraint languages and supported constraint types must be expressible in SPARQL
4. The generality of the findings of the large-scale evaluation has to be proved for all vocabularies