Bringing semantic publishing into TEI: ideas and pointers

Silvio Peroni and Fabio Vitali, Department of Computer Science and Engineering, University of Bologna, Italy


Page 1: Bringing semantic publishing into TEI: ideas and pointers

Bringing semantic publishing into TEI

ideas and pointers

Silvio Peroni, Fabio Vitali

Department of Computer Science and Engineering, University of Bologna

Italy

Page 2: Bringing semantic publishing into TEI: ideas and pointers

Outline

•  Semantic publishing
•  SPAR ontologies and semantic lenses
•  TEI and EARMARK

Page 3: Bringing semantic publishing into TEI: ideas and pointers

Semantic Web / Open Linked Data

Yet another definition of Semantic Web: the evolution of the World Wide Web encompassing the integration of the WWW with formal semantics to:
•  enable visualisation and elaboration of complex data
•  provide languages (e.g., OWL) to formalise the meaning of data (e.g., using description logics)

Yet another definition of Open Linked Data: the incremental implementation of many layers of semantics of data released to the Commons:
•  Structured and semi-structured data
•  Abstraction and conceptualisation of data
•  Inferences on data

Page 4: Bringing semantic publishing into TEI: ideas and pointers

Semantic publishing

« anything that
•  enhances the meaning of a published journal article,
•  facilitates its automated discovery,
•  enables its linking to semantically related articles,
•  provides access to data within the article in actionable form, or
•  facilitates integration of data between papers.
Among other things, it involves enriching the article with appropriate metadata that
•  are amenable to automated processing and analysis,
•  allowing enhanced verifiability of published information and
•  providing the capacity for automated discovery and summarization »

Shotton, D. (2009). Semantic publishing: the coming revolution in scientific journal publishing. Learned Publishing, 22(2): 85–94. DOI: 10.1087/2009202

Page 5: Bringing semantic publishing into TEI: ideas and pointers

Why Semantic Publishing?

•  Increase the intrinsic value of publications
•  Increase the richness of information, understanding and knowledge that can be extracted from publications
•  Enable the development of additional services
•  Integrate information from multiple enhanced articles
•  Provide additional business opportunities for the publishers

Page 6: Bringing semantic publishing into TEI: ideas and pointers

Goals of semantic publishing

•  Evaluating the pertinence of a document to a scientific field
•  Discovering research trends and propagation of research findings
•  Tracking of research activities, institutions and disciplines
•  Analysing quantitative aspects of the output of researchers
•  Evaluating the multi-disciplinarity of the output of scholars
•  Measuring positive/negative citations to a particular work
•  Designing and including algorithms to compute metrics indicators
•  Helping final users to find related materials to a topic and/or article
•  Evaluating the social acceptability of the scientific production
•  Enabling users to annotate documents with related semantic data
•  Querying (semantic) bibliographic data

Page 7: Bringing semantic publishing into TEI: ideas and pointers

SPAR

•  One of the most complete sets of ontologies to describe scholarly objects
•  It uses:
   –  Common vocabulary of terms
   –  External metadata schemas (SKOS, PRISM, DC)
   –  FRBR concepts to distinguish between work, version, edition and copy
   –  Document components
   –  Roles of people, status of documents and publishing workflows
   –  Citations, citation contexts, reference lists

Page 8: Bringing semantic publishing into TEI: ideas and pointers

Semantic lenses

•  Particular points of view on scholarly entities
•  Contextual data:
   –  Research context
   –  Roles and contribution
   –  Publishing context
•  Content data:
   –  Text:
      •  Text structure
      •  Rhetoric
   –  Message:
      •  Argumentation
      •  Citation network
      •  Textual semantics

Page 9: Bringing semantic publishing into TEI: ideas and pointers

An example

The Tempest by William Shakespeare, as available in the Oxford Text Archive

Closed view

:work a fabio:Play ;
    frbr:realization :expression ;
    dcterms:creator [ a foaf:Person ; foaf:name "William Shakespeare" ] .

:expression a fabio:Book ;
    frbr:embodiment :manifestation .

:manifestation a fabio:DigitalManifestation ;
    frbr:exemplar :item ;
    dcterms:format [ a dcterms:MediaType ; dcterms:description "application/tei+xml" ] ;
    dcterms:publisher [ a foaf:Organization ; foaf:name "OUCS" ] .

:item a fabio:ComputerFile ;
    fabio:storedOn fabio:web .

Open (Linked Data) view

dbpedia:The_Tempest a fabio:Play ;
    frbr:realization <http://ota.ox.ac.uk/id/5725> ;
    dcterms:creator dbpedia:William_Shakespeare .

<http://ota.ox.ac.uk/id/5725> a fabio:Book ;
    frbr:embodiment <http://ota.ox.ac.uk/text/5725/xml> .

<http://ota.ox.ac.uk/text/5725/xml> a fabio:DigitalManifestation ;
    frbr:exemplar <http://ota.ox.ac.uk/text/5725.xml> ;
    dcterms:format application:tei+xml ;
    dcterms:publisher dbpedia:Oxford_University_Computing_Services .

<http://ota.ox.ac.uk/text/5725.xml> a fabio:ComputerFile ;
    fabio:storedOn fabio:web .

Page 10: Bringing semantic publishing into TEI: ideas and pointers

Annotating the content

<body>
  ...
  <sp>
    <speaker rend="italic">Ari.</speaker>
    <ab>
      All haile, great Master, graue Sir, haile: I come<lb n="301"/>
      To answer thy best pleasure; be’t to fly,<lb n="302"/>
      To swim, to diue into the fire: to ride<lb n="303"/>
      On the curld clowds: to thy strong bidding,taske<lb n="304"/>
      <hi rend="italic">Ariel,</hi> and all his Qualitie.<lb n="305"/>
    </ab>
  </sp>
  <sp>
    <speaker rend="italic">Pro.</speaker>
    <ab>
      Hast thou, Spirit,<lb n="306"/>
      Performd to point, the Tempest that I
      <seg type="homograph">bad</seg> thee.<lb n="307"/>
    </ab>
  </sp>
  ...
</body>

•  “Ari.”, “Ariel” and “Spirit” refer to the same entity
•  “Master” and “Pro.” refer to the same entity

Both entities are defined in DBpedia! How can I annotate such an XML document without having permission to modify it?
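
One common answer is stand-off annotation: the annotations live outside the document and point into it by character offset. The following is only a minimal sketch of that idea, assuming the source is available as a string; the `annotate` helper and the `example:Ariel` identifier are hypothetical placeholders, and only `dbpedia:Prospero` comes from the slides.

```python
# A fragment of the TEI source, treated as read-only text.
SOURCE = (
    '<sp><speaker rend="italic">Ari.</speaker>'
    '<ab>All haile, great Master, graue Sir, haile: I come<lb n="301"/></ab></sp>'
)


def annotate(text, phrase, refers_to):
    """Build a stand-off annotation for the first occurrence of `phrase`.

    Offsets follow Python slice conventions: `begin` is inclusive,
    `end` is exclusive. The source text itself is never modified.
    """
    begin = text.index(phrase)
    return {"begin": begin, "end": begin + len(phrase), "refers_to": refers_to}


# Annotations are stored separately from the document they describe.
ANNOTATIONS = [
    annotate(SOURCE, "Ari.", "example:Ariel"),      # placeholder identifier
    annotate(SOURCE, "Master", "dbpedia:Prospero"),
]

for a in ANNOTATIONS:
    # The annotated string is recovered from the offsets alone.
    print(SOURCE[a["begin"]:a["end"]], "->", a["refers_to"])
```

Because only offsets are recorded, anyone can publish annotations on a file they cannot edit, as long as the file's content stays stable.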

Page 11: Bringing semantic publishing into TEI: ideas and pointers

•  The Extremely Annotational RDF Markup, a.k.a. EARMARK, is an OWL 2 DL ontology that defines document meta-markup
•  It is an ontologically precise definition of markup that instantiates the markup of a text document as an independent OWL document outside of the text strings it annotates
•  It can define structures such as trees or graphs (i.e. overlapping markup) and can be used to generate validity constraints (including co-constraints currently unavailable in most validation languages)
•  Using the Linguistic Meta-Model, it becomes possible to express and assess facts, constraints and rules about the markup structure as well as about the semantics of the content of the document

URIDocuverse: to define the whole textual content of the document to annotate – in this case the Oxford Text Archive TEI version of the play The Tempest, available at a particular URL
PointerRange: to define textual ranges upon it
LinguisticAct: to represent annotations made on ranges by someone at a certain time
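
To illustrate why ranges defined outside the text can form graphs rather than trees, here is a minimal sketch in Python. It is not the EARMARK ontology or its API: the `Docuverse` and `PointerRange` classes below only borrow the names, the docuverse holds its text inline instead of pointing to a URL, and offsets are assumed to follow Python slice conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Docuverse:
    """The text being annotated (held inline here; an assumption of this sketch)."""
    content: str


@dataclass(frozen=True)
class PointerRange:
    """A span of a docuverse, from `begins` (inclusive) to `ends` (exclusive)."""
    refers_to: Docuverse
    begins: int
    ends: int

    def text(self) -> str:
        return self.refers_to.content[self.begins:self.ends]

    def overlaps(self, other: "PointerRange") -> bool:
        """True when the ranges share characters but neither contains the other."""
        if self.refers_to != other.refers_to:
            return False
        shares = self.begins < other.ends and other.begins < self.ends
        self_contains = self.begins <= other.begins and other.ends <= self.ends
        other_contains = other.begins <= self.begins and self.ends <= other.ends
        return shares and not (self_contains or other_contains)


def range_of(doc: Docuverse, phrase: str) -> PointerRange:
    """Hypothetical helper: range of the first occurrence of `phrase`."""
    begin = doc.content.index(phrase)
    return PointerRange(doc, begin, begin + len(phrase))


doc = Docuverse("All haile, great Master, graue Sir, haile: I come")
address = range_of(doc, "great Master")
titles = range_of(doc, "Master, graue Sir")
name = range_of(doc, "Master")

print(address.overlaps(titles))  # partial overlap: cannot be nested as XML elements
print(address.overlaps(name))    # containment: could be nested as XML elements
```

Two partially overlapping ranges such as `address` and `titles` cannot both be expressed as properly nested elements in a single XML tree, which is the situation stand-off ranges are meant to handle.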

Page 12: Bringing semantic publishing into TEI: ideas and pointers

Multiple interpretations

<ab>
  All haile, great Master, graue Sir, haile: I come<lb n="301"/>
  ...
</ab>

# The textual content of the document to annotate
:content a earmark:URIDocuverse ;
    earmark:hasContent "http://ota.ox.ac.uk/text/5725.xml"^^xsd:anyURI .

# The string "Master"
:master-string a earmark:PointerRange ;
    earmark:refersTo :content ;
    earmark:begins "34023"^^xsd:nonNegativeInteger ;
    earmark:ends "34029"^^xsd:nonNegativeInteger .

# Silvio’s interpretation
:prospero-as-person a la:LinguisticAct ;
    la:hasInformationEntity :master-string ;
    la:hasReference dbpedia:Prospero ;
    la:hasMeaning foaf:Person ;
    prov:wasAttributedTo :silvio ;
    prov:generatedAtTime "2013-06-18T17:23:23Z"^^xsd:dateTime .

# Fabio’s interpretation
:prospero-as-character a la:LinguisticAct ;
    la:hasInformationEntity :master-string ;
    la:hasReference dbpedia:Prospero ;
    la:hasMeaning yago:ShakespeareanCharacters ;
    prov:wasAttributedTo :fabio ;
    prov:generatedAtTime "2013-07-23T17:45:23Z"^^xsd:dateTime .
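
Because each interpretation is a separate linguistic act with its own author and timestamp, competing readings of the same range can coexist and be compared. The following is only a minimal sketch, assuming the two acts above have been copied into plain Python dictionaries; the key names and the `contested` helper are hypothetical and not part of EARMARK, the Linguistic Meta-Model or PROV.

```python
from collections import defaultdict

# The two linguistic acts from the slide, as plain dictionaries.
ACTS = [
    {
        "id": ":prospero-as-person",
        "entity": ":master-string",
        "reference": "dbpedia:Prospero",
        "meaning": "foaf:Person",
        "agent": ":silvio",
        "time": "2013-06-18T17:23:23Z",
    },
    {
        "id": ":prospero-as-character",
        "entity": ":master-string",
        "reference": "dbpedia:Prospero",
        "meaning": "yago:ShakespeareanCharacters",
        "agent": ":fabio",
        "time": "2013-07-23T17:45:23Z",
    },
]


def interpretations_by_entity(acts):
    """Group acts by the range they annotate, oldest first."""
    grouped = defaultdict(list)
    for act in acts:
        grouped[act["entity"]].append(act)
    for entity_acts in grouped.values():
        # UTC timestamps in this fixed ISO 8601 layout sort correctly as strings.
        entity_acts.sort(key=lambda act: act["time"])
    return dict(grouped)


def contested(acts):
    """Return the ranges that were given more than one distinct meaning."""
    result = {}
    for entity, entity_acts in interpretations_by_entity(acts).items():
        meanings = sorted({act["meaning"] for act in entity_acts})
        if len(meanings) > 1:
            result[entity] = meanings
    return result


for entity, meanings in contested(ACTS).items():
    print(entity, "has competing meanings:", ", ".join(meanings))
```

Both acts agree on the reference (`dbpedia:Prospero`) and differ only in meaning, so neither has to overwrite the other.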

Page 13: Bringing semantic publishing into TEI: ideas and pointers

Conclusions

•  Semantic Publishing is a natural and inevitable evolution of the technological advances of the publishing industry

•  Shared ontologies are the only way to provide interoperability of data between publishers

•  SPAR and EARMARK do provide interesting contact points between metadata hidden in XML vocabularies and shared publishing ontologies

•  TEI, which is orthogonal to these languages, can and should work well with them.

Page 14: Bringing semantic publishing into TEI: ideas and pointers

Thank you for your attention

Emails:

[email protected] [email protected]