Metadata for the OpenCitations Corpus

Version 1.5.2, July 7, 2016

Publication date for this document: July 7, 2016
Version number of this document: 1.5.2
Previous version: v1.5.1, published June 28, 2016

Authors
Silvio Peroni, University of Bologna, Italy, [email protected], http://orcid.org/0000-0003-0530-4305
David Shotton, University of Oxford, UK, [email protected], http://orcid.org/0000-0001-5506-523X

License
This document is published under a Creative Commons Attribution 4.0 International license [1].

Citation
Silvio Peroni, David Shotton (2016). Metadata for the OpenCitations Corpus. figshare. https://dx.doi.org/10.6084/m9.figshare.3443876

The Open Citations Corpus
The Open Citations Corpus (herein abbreviated “the corpus” or “OCC”) is an open access corpus of scholarly citation data, namely information about the author-created bibliographic references present in publications that cite other publications. The Open Citations project has a persistent URL at w3id.org, https://w3id.org/oc, which resolves to our OCC server at http://opencitations.net. The OCC stores metadata relevant to these citations in RDF, specifically BibJSON [2] encoded as JSON-LD [3], and makes them available through a SPARQL endpoint and as downloadable datasets.

RDF resources in the Open Citations Corpus

Kinds of metadata
The OCC stores three levels of metadata:

• Corpus metadata
• Bibliographic entity metadata
• Provenance metadata

Within the corpus, different classes of information (different types of entity) are identified and described using unique names and accompanying two-letter abbreviations (“short names”), for example Bibliographic resource (short: br).

[1] https://creativecommons.org/licenses/by/4.0/legalcode
[2] http://okfnlabs.org/bibjson/
[3] https://www.w3.org/TR/json-ld/



Corpus metadata
The Open Citations Corpus is itself a dataset, as are the contents of the individual entity classes within it, for example all the entries within the class Bibliographic resource (short: br). These datasets are described by means of standard vocabularies, such as the Data Catalog Vocabulary [4] and the VoID Vocabulary [5]. Such datasets can have particular distributions.

• Dataset (short: the related entity short name if appropriate, e.g. br for all the bibliographic resources; none in the case of the main OCC dataset, i.e. the corpus itself): a set of collected information about something.

• Distribution (short: di): an accessible form of an OCC dataset, for example a downloadable file.

Bibliographic entity metadata
The following OCC bibliographic entities (short: en) are handled as proper RDF resources:

• Bibliographic resource (short: br): a bibliographic resource that cites/is cited by another bibliographic resource. Subclasses (extracted from CrossRef [6] Types [7]) include:

  o Book
  o Book chapter
  o Book part
  o Book section
  o Book series
  o Book set
  o Book track
  o Component
  o Dataset
  o Dissertation
  o Edited book
  o Journal article
  o Journal Issue
  o Journal Volume
  o Journal
  o Monograph
  o Proceedings article
  o Proceedings
  o Reference book
  o Reference entry
  o Report series
  o Report
  o Standard series
  o Standard

  Those in italics refer to resources that can also be treated as container resources, i.e. those that may contain another cited resource (e.g. a journal containing a cited article, a book containing a cited chapter). Using the Functional Requirements for Bibliographic Records (FRBR) [8] distinction between works, expressions, manifestations and items, these bibliographic resources are expressions of works, which may be manifested in physical (e.g. printed paper) or electronic form.

[4] http://www.w3.org/TR/vocab-dcat/
[5] http://www.w3.org/TR/void/
[6] http://crossref.org/
[7] http://api.crossref.org/types

• Resource embodiment (short: re): the particular physical or digital format in which a bibliographic resource was made available by its publisher. Subclasses:

  o Digital embodiment
  o Print embodiment

• Bibliographic entry (short: be): the particular textual bibliographic entry (“a reference”) occurring in the reference list (or elsewhere) within a bibliographic resource, that references another bibliographic resource.

• Responsible agent (short: ra): the agent having a certain role with respect to a bibliographic resource (e.g. an author of a paper or book, or an editor of a journal).

• Agent role (short: ar): a particular role held by an agent with respect to a bibliographic resource.

Identifiers for bibliographic entities
All the aforementioned bibliographic entities must have a corpus identifier:

• The corpus identifier assigned to the entity upon initial curation into the OCC is the two-letter short name for the class of items (e.g. be for a bibliographic entry) followed by an oblique slash (“/”) and a number assigned to each resource, unique among resources of the same type, which increments for each new entry in that resource class (e.g. “be/537”). Note that this identifier is for internal OCC use only, and is distinct from any “public” Internationalized Resource Identifier (abbreviated IRI) that may be used to identify the entity.

In addition, the bibliographic entity may have one or more other identifiers assigned to it by external third parties:

• Identifier (short: id): an external identifier (e.g. DOI [9], ORCID [10], PubMed ID [11]) associated with the bibliographic entity. Members of this class of OCC metadata are themselves given unique numbers, e.g. “id/129”.
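The corpus identifier format described above (short name, slash, incrementing number) can be checked mechanically. The following is a minimal Python sketch, not part of the OCC software: the function names are hypothetical, and it assumes that numbering starts at 1 and that numbers carry no leading zeros, neither of which is stated in this document.

```python
import re

# Two-letter short names this document lists for bibliographic entities and identifiers.
SHORT_NAMES = {"br", "re", "be", "ra", "ar", "id"}

# Assumption: numbers start at 1 and have no leading zeros.
_CORPUS_ID = re.compile(r"^([a-z]{2})/([1-9][0-9]*)$")


def parse_corpus_id(corpus_id):
    """Split a corpus identifier such as 'be/537' into ('be', 537)."""
    match = _CORPUS_ID.match(corpus_id)
    if match is None or match.group(1) not in SHORT_NAMES:
        raise ValueError(f"not a corpus identifier: {corpus_id!r}")
    return match.group(1), int(match.group(2))


def next_corpus_id(short_name, last_number):
    """Return the identifier that follows the last one assigned in a class."""
    if short_name not in SHORT_NAMES:
        raise ValueError(f"unknown short name: {short_name!r}")
    return f"{short_name}/{last_number + 1}"
```

For instance, `parse_corpus_id("be/537")` yields the pair `("be", 537)`, and `next_corpus_id("be", 537)` yields `"be/538"`.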

Provenance metadata
All the aforementioned OCC bibliographic entities and identifiers must have metadata describing their provenance. These provenance metadata entities are:

• Snapshot of entity metadata (short: se): a particular snapshot recording the metadata associated with an individual entity (either a bibliographic entity or an identifier) at a particular time.

• Curatorial activity (short: ca): a curatorial activity relating to that entity. Possible activities are:

  o Creation: the activity of creating a new entity and of associating new metadata with it, within the corpus;

  o Modification: the activity of modifying (adding/removing) the metadata associated with an existing entity, or even of deprecating the entire entity;

  o Merging: the activity of unifying the metadata relating to two different OCC bibliographic entity descriptions, if they actually represent the same thing. This can result in the deprecation of one of the corpus entries in favour of the other one.

• Provenance agent (short: pa): the agent, such as a person, organisation or process, that creates or modifies entity metadata, or that is used as source provider of those metadata (e.g. CrossRef).

• Curatorial role (short: cr): a particular role held by a provenance agent with respect to a curatorial activity (e.g. OCC curator, metadata source).

[8] http://www.ifla.org/publications/functional-requirements-for-bibliographic-records
[9] https://www.doi.org/
[10] http://orcid.org/
[11] http://www.ncbi.nlm.nih.gov/pubmed

Naming convention for entities and provenance data
In the corpus we distinguish three different kinds of URLs: URLs for datasets and distributions, URLs for bibliographic entities, and URLs for provenance data.

URLs for datasets and distributions
The URL identifying the corpus is the following:

[corpus URL]: [base URL]/corpus/

where the base URL has been chosen to guarantee persistence over time. The Open Citations project has a persistent URL at w3id.org, https://w3id.org/oc. Therefore, the URL of the Open Citations Corpus is:

• https://w3id.org/oc/corpus/

The corpus URL identifies the main aggregated dataset, which is split into several sub-datasets, one for each kind of entity included in the corpus. The URLs of such sub-datasets follow this schema:

[sub-dataset URL]: [corpus URL][entity short name]/

where the entity short name is the two-character abbreviation specified above for each of the entity classes within the corpus. For example, the URL of the dataset of all OCC bibliographic resources is:

• https://w3id.org/oc/corpus/br/

The URL defining one or more distributions of the main dataset (i.e. the entire corpus) is:

[corpus distribution URL]: [corpus URL]di/[iterative number]

where the iterative number is a number assigned to each distribution, unique among distributions of resources of the same type. For example, the first distribution of the entire corpus is:

• https://w3id.org/oc/corpus/di/1

Similarly, the URL defining a distribution of any of the corpus sub-datasets is:

[sub-dataset distribution URL]: [sub-dataset URL]di/[iterative number]

All the distributions of a dataset must be assigned to the relevant distribution dataset (e.g. within the OCC, “https://w3id.org/oc/corpus/di/1” is stored in the dataset graph “https://w3id.org/oc/corpus/di/”).

URLs for bibliographic entities and their identifiers
The URL of each of the bibliographic entities in the corpus is constructed according to a particular naming convention scheme, introduced as follows:

[entity URL]: [corpus URL][entity short name]/[iterative number]

where corpus URL, entity short name, and iterative number are as previously defined. For example, the third entry within the OCC class of bibliographic resources, and the 129th entry within the OCC class of identifiers, have the following URLs respectively:

• https://w3id.org/oc/corpus/br/3
• https://w3id.org/oc/corpus/id/129
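The URL schemas for the corpus, its sub-datasets, their distributions and the individual entities can be expressed as a few string templates. The following Python sketch is illustrative only: the function names are hypothetical, and the rule used in `dataset_graph_of` (strip the final path segment) is an assumption inferred from the examples given in this document.

```python
BASE_URL = "https://w3id.org/oc"          # persistent base URL given in this document
CORPUS_URL = BASE_URL + "/corpus/"        # [corpus URL]: [base URL]/corpus/


def sub_dataset_url(short_name):
    """[sub-dataset URL]: [corpus URL][entity short name]/"""
    return f"{CORPUS_URL}{short_name}/"


def distribution_url(number, short_name=None):
    """Distribution of the whole corpus (short_name=None) or of one sub-dataset."""
    dataset = CORPUS_URL if short_name is None else sub_dataset_url(short_name)
    return f"{dataset}di/{number}"


def entity_url(short_name, number):
    """[entity URL]: [corpus URL][entity short name]/[iterative number]"""
    return f"{CORPUS_URL}{short_name}/{number}"


def dataset_graph_of(url):
    """Graph holding an entity or distribution: its URL minus the final number.

    Assumption: this matches the examples in the text, e.g. .../br/3 -> .../br/.
    """
    return url.rsplit("/", 1)[0] + "/"
```

With these helpers, `entity_url("br", 3)` reproduces the first example URL above, and `dataset_graph_of` applied to it gives the bibliographic resource dataset graph.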

All these entities must be assigned to the dataset related to the entity class (e.g. within the OCC, “https://w3id.org/oc/corpus/br/3” is stored in the bibliographic resource dataset graph “https://w3id.org/oc/corpus/br/”).

URLs for provenance metadata
Each of the OCC bibliographic entities and identifiers has associated with it a particular provenance RDF graph that records information about its creation, modification and/or merging. The URL for such an entity provenance graph has the following structure:

[entity provenance URL]: [entity URL]/prov/

Such a graph contains all the provenance information related to the bibliographic entity/identifier under consideration, except that relating to provenance agents. For example, the URL for the provenance graph for the 15th bibliographic resource in the corpus is:

• https://w3id.org/oc/corpus/br/15/prov/

The only exception to the aforementioned graph URL construction concerns the graph containing provenance information about provenance agents (curators, metadata providers, etc.), since they can be involved in several curatorial activities of different bibliographic entities or identifiers within the OCC and, thus, are not necessarily tied to one specific entity. For this reason, we store provenance agent metadata in the more appropriate general provenance graph, namely:

[corpus provenance URL]: [corpus URL]prov/

OCC provenance metadata entities (i.e. snapshots, curatorial activities, provenance agents and curatorial roles) relating to a particular OCC bibliographic entity or identifier use the following convention for their URLs:

[provenance agent URL]: [corpus provenance URL]pa/[iterative number]

[other provenance metadata entity URL]: [entity provenance URL][provenance metadata entity short name]/[iterative number]

where provenance metadata entity short name and iterative number are assigned as explained for the other entities in the corpus. For example, the second curatorial activity related to the fifteenth bibliographic resource and the third provenance agent involved in that curatorial activity have the following URLs:

• https://w3id.org/oc/corpus/br/15/prov/ca/2
• https://w3id.org/oc/corpus/prov/pa/3
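The two provenance URL conventions, one per entity and one corpus-wide for provenance agents, can be sketched as follows. This is a hedged Python illustration with hypothetical function names; it simply applies the templates given above and treats "pa" as the one short name that never lives under an entity provenance graph.

```python
CORPUS_URL = "https://w3id.org/oc/corpus/"
CORPUS_PROVENANCE_URL = CORPUS_URL + "prov/"   # [corpus provenance URL]


def entity_provenance_graph(entity_url):
    """[entity provenance URL]: [entity URL]/prov/"""
    return entity_url.rstrip("/") + "/prov/"


def provenance_agent_url(number):
    """[provenance agent URL]: [corpus provenance URL]pa/[iterative number]"""
    return f"{CORPUS_PROVENANCE_URL}pa/{number}"


def provenance_entity_url(entity_url, short_name, number):
    """URL of a snapshot (se), curatorial activity (ca) or curatorial role (cr)."""
    if short_name not in {"se", "ca", "cr"}:
        # Provenance agents (pa) use the corpus-wide graph instead.
        raise ValueError(f"not an entity-level provenance class: {short_name!r}")
    return f"{entity_provenance_graph(entity_url)}{short_name}/{number}"
```

Calling `provenance_entity_url("https://w3id.org/oc/corpus/br/15", "ca", 2)` and `provenance_agent_url(3)` reproduces the two example URLs listed above.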

Please note that all the provenance entities (except the information about the provenance agents, such as their names) are assigned to the provenance dataset graph associated with the entity of the corpus for which they provide provenance information (e.g. “https://w3id.org/oc/corpus/br/15/prov/ca/2” is stored in provenance graph “https://w3id.org/oc/corpus/br/15/prov/”). This has been done so as to make it easy to retrieve all the provenance information related to a particular entity simply by accessing all the statements in the relevant provenance graph.
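Because each entity's provenance sits in a single named graph, retrieving it amounts to one graph-scoped SPARQL query. The Python sketch below only builds the query text; the function name is hypothetical, the character check is a precaution added for this example, and no endpoint address is assumed since this document does not state one.

```python
def provenance_graph_query(entity_url):
    """Build a SPARQL query returning every statement in an entity's provenance graph."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in entity_url for ch in '<>"{}|^`\\') or any(ch.isspace() for ch in entity_url):
        raise ValueError(f"unsafe characters in URL: {entity_url!r}")
    graph = entity_url.rstrip("/") + "/prov/"
    return f"SELECT ?s ?p ?o WHERE {{ GRAPH <{graph}> {{ ?s ?p ?o }} }}"
```

The resulting string can be submitted to any SPARQL endpoint that exposes the corpus named graphs.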

Metadata elements associated with OCC datasets and distributions
In this section we introduce all the metadata elements that may be associated with each dataset or distribution.

Metadata elements that may be associated with any OCC dataset (graph: https://w3id.org/oc/corpus/[entity short name]/)

• has title: literal
  The title of the dataset.

• has description: literal
  A short textual description of the content of the dataset.

• has release date: date
  The date of publication of a particular dataset by the OCC.

• has modification date: date
  The date on which the dataset was modified.

• has keyword: literal
  A keyword describing the content of the dataset.

• has subject: concept
  A concept describing the primary subject of the dataset.

• has distribution: distribution
  A distribution of the dataset.

Metadata elements that may be associated with the main OCC dataset (graph: https://w3id.org/oc/corpus/)
All the attributes for datasets defined in the previous section, plus the following ones:

• has landing page: document
  An HTML page (indicated by its URL) representing a browseable page for the corpus.

• has sub-dataset: dataset
  A link to a subset of the whole corpus dataset.

• has SPARQL endpoint: URL
  The link to the SPARQL endpoint for querying the corpus.

Metadata elements that may be associated with a distribution (graph: https://w3id.org/oc/corpus/[entity short name or none for the main corpus]/di/)

• has title: literal
  The title of the distribution.

• has description: literal
  A short textual description of the content of the distribution.

• has release date: date
  The first date of publication of the distribution.

• has license: document
  The resource describing the licence associated with the data in the distribution.

• has download URL: document
  The resource which is the representation of the distribution in a certain format.

• has file type: media type
  The file type of the downloadable representation of the distribution (according to IANA media types).

• has byte size: literal
  The size in bytes of the downloadable distribution.

Metadata elements associated with an individual bibliographic entity
In this section we introduce all the metadata elements that may be associated with each of the following OCC bibliographic entities.

Metadata elements that may be associated with any OCC bibliographic entity

• has identifier: identifier
  In addition to the internal corpus identifier assigned to the entity upon initial curation into the OCC (format: [entity short name]/[iterative number], as specified above), other external third-party identifiers can be specified through this attribute (e.g. DOI, ORCID, PubMed ID).

Metadata elements that may be associated with a bibliographic resource (graph: https://w3id.org/oc/corpus/br/)

• has type: thing
  The type of the bibliographic resource, conforming to those introduced above.

• has title: literal
  The title of the bibliographic resource.

• has subtitle: literal
  The subtitle of the bibliographic resource.

• is part of: bibliographic resource (br)
  The corpus identifier of the bibliographic resource (e.g. issue, volume, journal, conference proceedings) that is a container for the subject bibliographic resource.

• cites: bibliographic resource (br)
  The corpus identifier of the bibliographic resource cited by the subject bibliographic resource.

• has part: bibliographic entry (be)
  The literal text of a reference within the bibliographic resource.

• has publication year: gYear
  The year of publication of the bibliographic resource.

• is embodied as: resource embodiment (re)
  The corpus identifier of the resource embodiment defining the format in which the bibliographic resource has been embodied, which can be either print or digital.

• has number: literal
  The number identifying the bibliographic resource as a particular item within a larger collection (e.g. an article number within a journal issue, a volume number of a journal, a chapter number within a book).

• has edition: literal
  An identifier for one of several alternative editions of a particular bibliographic resource.

• has contributor: agent role (ar)
  The role (e.g. author, editor, or publisher) of one of the contributors of this bibliographic resource.

Metadata elements that may be associated with a responsible agent’s role (graph: https://w3id.org/oc/corpus/ar/)

• has role type: thing
  The specific type of role under consideration (e.g. author, editor or publisher).

• is held by: responsible agent (ra)
  The agent holding this role with respect to a particular bibliographic resource.

• has next: agent role (ar)
  The following role in a sequence of agents’ roles of the same type associated with the same bibliographic resource (so as to define, for instance, its ordered list of authors).
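The "has next" attribute turns a set of agent roles into a linked list, so an ordered author list is recovered by finding the role that nothing points to and following the links. The Python sketch below assumes the links have already been loaded into a plain dictionary; the function name and the identifiers in the usage note are hypothetical.

```python
def ordered_roles(has_next):
    """Order agent roles by following 'has next' links.

    has_next maps each agent role to the role that follows it,
    or to None when it is the last role in the sequence.
    """
    followers = {nxt for nxt in has_next.values() if nxt is not None}
    heads = [role for role in has_next if role not in followers]
    if len(heads) != 1:
        raise ValueError(f"expected exactly one first role, found {len(heads)}")

    order, seen = [], set()
    current = heads[0]
    while current is not None:
        if current in seen:
            raise ValueError("cycle in 'has next' links")
        seen.add(current)
        order.append(current)
        current = has_next.get(current)
    return order
```

For example, `ordered_roles({"ar/2": "ar/3", "ar/1": "ar/2", "ar/3": None})` returns the roles in the order ar/1, ar/2, ar/3, whatever order the links were read in.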

Metadata elements that may be associated with a responsible agent (graph: https://w3id.org/oc/corpus/ra/)

• has name string: literal
  The name of an agent (for people, usually in the format: given name followed by family name, separated by a space).

• has given name: literal
  The given name of an agent, if a person.

• has family name: literal
  The family name of an agent, if a person.

Metadata elements that may be associated with a resource embodiment (graph: https://w3id.org/oc/corpus/re/)

• has type: thing
  The particular type of the embodiment, either digital or print.

• has format: media type
  The IANA media type of the embodiment.

• has first page: literal
  The first page of the bibliographic resource according to the current embodiment.

• has last page: literal
  The last page of the bibliographic resource according to the current embodiment.

• has url: document
  The URL at which the embodiment of the bibliographic resource is available.

Metadata elements that may be associated with a bibliographic entry (graph: https://w3id.org/oc/corpus/be/)

• has bibliographic entry text: literal
  The literal text of a bibliographic entry (i.e. a reference) occurring in the reference list (or elsewhere) within a bibliographic resource, that references another bibliographic resource. The reference text should be recorded “as given” in the citing bibliographic resource, including any errors (e.g. mis-spellings of authors’ names, or changes from “β” in the original published title to “beta” in the reference text) or omissions (e.g. omission of the title of the referenced bibliographic resource, or omission of sixth and subsequent authors’ names, as required by certain publishers), and in whatever format it has been made available. For instance, the reference text can be either plain text or a block of XML.

• references: bibliographic resource (br)
  The corpus identifier of the cited bibliographic resource to which this bibliographic entry relates.


Metadata elements associated with identifiers
In this section we introduce all the metadata elements with which an external third-party identifier of an OCC bibliographic entity may be associated.

Metadata elements associated with an identifier (graph: https://w3id.org/oc/corpus/id/)

• has literal value: literal
  The string representing the identifier (e.g. 10.1987/4567.98).

• has scheme: thing
  The particular identifier scheme to which the identifier belongs (e.g. DOI).

Provenance information
Each of the aforementioned bibliographic entities introduced into the corpus has associated provenance information that documents the curatorial processes that have led to the current OCC description of that resource. In this section we introduce all the provenance metadata elements that constitute the provenance information for a particular OCC bibliographic entity, all of which are stored within the entity’s single provenance graph.

Metadata elements that may be associated with a snapshot of entity metadata (se) (graph: [entity provenance URL])

• has creation date: datetime
  The date on which a particular snapshot of a bibliographic entity’s metadata was created within the OCC.

• has invalidation date: datetime
  The date on which a snapshot of a bibliographic entity’s metadata was invalidated due to an update (e.g. the addition of some metadata that was not specified in the previous snapshot) or a merger with another one.

• is snapshot of: bibliographic entity (en)
  This property is used to link a snapshot of entity metadata to the bibliographic entity in the OCC to which the snapshot refers.

• is derived from: snapshot of entity metadata (se)
  This property is used to identify the immediately previous snapshot of entity metadata associated with the same bibliographic entity.

• has primary source: thing
  This property is used to identify the primary source from which the metadata described in the snapshot are derived (e.g. the result of querying the CrossRef API).

• is generated by: curatorial activity (ca)
  This property is used to specify the curatorial activity whereby the snapshot of entity metadata was generated.

• is invalidated by: curatorial activity (ca)
  This property is used to specify the curatorial activity whereby the snapshot of entity metadata was invalidated, i.e. the reason for the invalidation.
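Taken together, "has invalidation date" and "is derived from" make the snapshots of one entity a chain running from the current state back to the original. The following Python sketch walks that chain. The `Snapshot` class and its field names are hypothetical stand-ins for the attributes above, dates are held as plain strings, and the rule that exactly one snapshot has no invalidation date is an assumption made for this example rather than a statement from this document.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    url: str                            # e.g. .../br/15/prov/se/2
    created: str                        # has creation date
    invalidated: Optional[str] = None   # has invalidation date
    derived_from: Optional[str] = None  # is derived from (URL of previous snapshot)


def current_snapshot(snapshots: List[Snapshot]) -> Snapshot:
    """Return the snapshot that has not been invalidated (assumed to be unique)."""
    live = [s for s in snapshots if s.invalidated is None]
    if len(live) != 1:
        raise ValueError(f"expected exactly one valid snapshot, found {len(live)}")
    return live[0]


def history(snapshots: List[Snapshot]) -> List[Snapshot]:
    """Return the snapshots from oldest to newest by following 'is derived from'."""
    by_url: Dict[str, Snapshot] = {s.url: s for s in snapshots}
    chain: List[Snapshot] = []
    seen = set()
    node: Optional[Snapshot] = current_snapshot(snapshots)
    while node is not None:
        if node.url in seen:
            raise ValueError("cycle in 'is derived from' links")
        seen.add(node.url)
        chain.append(node)
        node = by_url.get(node.derived_from) if node.derived_from else None
    chain.reverse()
    return chain
```

Given two snapshots of the same entity, where the second is derived from the first and the first carries an invalidation date, `history` returns them oldest first and `current_snapshot` returns the second.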

Metadata elements that may be associated with a curatorial activity (ca) (graph: [entity provenance URL])

• has type: thing
  The type of OCC curatorial activity, conforming to one of those defined above (creation, modification or merging).

• has description: literal
  A textual description of the activity and its consequence.

• has update action: thing
  The UPDATE SPARQL query that keeps track of which metadata have been modified as the result of a modification of some of the metadata or the merging of the metadata relating to a particular bibliographic entity.

• involves agent with role: curatorial role (cr)
  The curatorial role of the provenance agent involved in this curatorial activity.

Metadata elements that may be associated with a curatorial role (cr) (graph: [entity provenance URL])

• has role type: thing
  The specific type of role under consideration (e.g. the merging activity of an OCC curator, or an external authority acting as a metadata source).

• held by agent: provenance agent (pa)
  The provenance agent (OCC curator or external authority) holding that curatorial role.

Metadata elements that may be associated with a provenance agent (pa) (graph: [entity provenance URL])

• has name string: literal
  The name of a provenance agent (for people, usually in the format: given name followed by family name, separated by a space).

• has given name: literal
  The given name of a provenance agent, if a person.

• has family name: literal
  The family name of a provenance agent, if a person.

Mapping with OWL
This section maps the entities introduced in the previous sections to OWL ontology definitions.

Mapping entity types
We provide a mapping to RDF of the bibliographic entities used in the Open Citations Corpus using OWL ontologies, in particular the Semantic Publishing and Referencing (SPAR) Ontologies [12], the well-known Web, library and publishing vocabularies Dublin Core [13], FRBR [14], PRISM [15] and RDF [16], and the following additional models: DCAT [17], FOAF [18], Literal Reification [19], OCO [20], PROV-O [21], PROV-DC [22], and VOID [23]. The following prefixes are employed:

biro: http://purl.org/spar/biro/
cito: http://purl.org/spar/cito/
c4o: http://purl.org/spar/c4o/
datacite: http://purl.org/spar/datacite/
dcat: http://www.w3.org/ns/dcat#
dcterms: http://purl.org/dc/terms/
fabio: http://purl.org/spar/fabio/
foaf: http://xmlns.com/foaf/0.1/
frbr: http://purl.org/vocab/frbr/core#
literal: http://www.essepuntato.it/2010/06/literalreification/
oco: https://w3id.org/oc/ontology/
prism: http://prismstandard.org/namespaces/basic/2.0/
pro: http://purl.org/spar/pro/
prov: http://www.w3.org/ns/prov#
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
void: http://rdfs.org/ns/void#
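The prefixed names used in the mappings that follow (for example fabio:Book) abbreviate full IRIs formed by concatenating the namespace and the local name. A small Python sketch of that expansion, using the prefix table above, is given here; the function name `expand` is hypothetical.

```python
# Prefix table copied from this document.
PREFIXES = {
    "biro": "http://purl.org/spar/biro/",
    "cito": "http://purl.org/spar/cito/",
    "c4o": "http://purl.org/spar/c4o/",
    "datacite": "http://purl.org/spar/datacite/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "fabio": "http://purl.org/spar/fabio/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "frbr": "http://purl.org/vocab/frbr/core#",
    "literal": "http://www.essepuntato.it/2010/06/literalreification/",
    "oco": "https://w3id.org/oc/ontology/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
    "pro": "http://purl.org/spar/pro/",
    "prov": "http://www.w3.org/ns/prov#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "void": "http://rdfs.org/ns/void#",
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'fabio:Book' into a full IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local
```

For instance, `expand("fabio:Book")` gives `http://purl.org/spar/fabio/Book`.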

Datasets and distributions

• Dataset: dcat:Dataset
• Distribution: dcat:Distribution

Bibliographic entities

• Bibliographic entry: biro:BibliographicReference
• Responsible agent: foaf:Agent
• Agent role: pro:RoleInTime
• Bibliographic resource: fabio:Expression
  Subclasses:
  o Book: fabio:Book
  o Book chapter: fabio:BookChapter
  o Book part: doco:Part (part of a fabio:Book)
  o Book section: fabio:ExpressionCollection (part of a fabio:Book)
  o Book series: fabio:BookSeries
  o Book set: fabio:BookSet
  o Book track: fabio:Expression (part of a fabio:ExpressionCollection)
  o Component: fabio:Expression
  o Dataset: fabio:DataFile
  o Dissertation: fabio:Thesis
  o Edited book: fabio:Book
  o Journal article: fabio:JournalArticle
  o Journal Issue: fabio:JournalIssue
  o Journal Volume: fabio:JournalVolume
  o Journal: fabio:Journal
  o Monograph: fabio:Book
  o Proceedings article: fabio:ProceedingsPaper
  o Proceedings: fabio:AcademicProceedings
  o Reference book: fabio:ReferenceBook
  o Reference entry: fabio:ReferenceEntry
  o Report series: fabio:Series (of some fabio:ReportDocument)
  o Report: fabio:ReportDocument
  o Standard series: fabio:Series (of some fabio:SpecificationDocument)
  o Standard: fabio:SpecificationDocument
• Resource embodiment: fabio:Manifestation
  Subclasses:
  o Digital embodiment: fabio:DigitalManifestation
  o Print embodiment: fabio:PrintObject

[12] http://www.sparontologies.net
[13] http://dublincore.org/documents/dcmi-terms/
[14] http://www.ifla.org/publications/functional-requirements-for-bibliographic-records
[15] http://www.idealliance.org/specifications/prism-metadata-initiative
[16] https://www.w3.org/TR/rdf11-concepts/
[17] http://www.w3.org/TR/vocab-dcat
[18] http://xmlns.com/foaf/spec/
[19] http://ontologydesignpatterns.org/wiki/Submissions:Literal_Reification
[20] https://w3id.org/oc/ontology
[21] http://www.w3.org/TR/prov-o
[22] http://www.w3.org/TR/prov-dc
[23] http://www.w3.org/TR/void

Identifier

• Identifier: datacite:Identifier

Provenance data

• Snapshot of entity metadata: prov:Entity
• Curatorial activity: prov:Activity
  Subclasses:
  o Creation: prov:Create
  o Modification: prov:Modify
  o Merging: prov:Replace
• Provenance agent: prov:Agent
• Curatorial role: prov:Association

Mapping entity attributes and properties
In this section we map all the attributes and properties to OWL-related entities.

Datasets and distributions

Any dataset:

• has title: dcterms:title
• has subtitle: fabio:hasSubtitle
• has description: dcterms:description
• has publication date: dcterms:issued
• has modification date: dcterms:modified
• has keyword: dcat:keyword
• has subject: dcat:theme
• has distribution: dcat:distribution

Main dataset (all the above, plus the following ones):

• has landing page: dcat:landingPage
• has sub-dataset: void:subset
• has SPARQL endpoint: void:sparqlEndpoint

Distribution:

• has title: dcterms:title
• has description: dcterms:description
• has publication date: dcterms:issued
• has license: dcterms:license
• has download URL: dcat:downloadURL
• has file type: dcat:mediaType
• has byte size: dcat:byteSize

Bibliographic entities

Any of the following resources:

• has identifier: datacite:hasIdentifier

Bibliographic entry:

• has bibliographic entry text: c4o:hasContent
• references: biro:references

Agent role:

• has role type: pro:withRole
• is held by: pro:isHeldBy
• has next: oco:hasNext

Responsible agent:

• has name: foaf:name
• has given name: foaf:givenName
• has family name: foaf:familyName

Bibliographic resource:

• has type: rdf:type
• has title: dcterms:title
• is part of: frbr:partOf
• cites: cito:cites
• has publication year: fabio:hasPublicationYear
• is embodied as: frbr:embodiment
• has number: fabio:hasSequenceIdentifier
• has edition: prism:edition
• has part: frbr:part
• has contributor: pro:isDocumentContextFor

Resource embodiment:

• has type: rdf:type
• has format: dcterms:format
• has first page: prism:startingPage
• has last page: prism:endingPage
• has url: frbr:exemplar

Identifier

Identifier:

• has literal value: literal:hasLiteralValue
• has scheme: datacite:usesIdentifierScheme

Provenance data

Snapshot of entity metadata:

• has creation date: prov:generatedAtTime
• has invalidation date: prov:invalidatedAtTime
• is snapshot of: prov:specializationOf
• is derived from: prov:wasDerivedFrom
• has primary source: prov:hadPrimarySource
• is generated by: prov:wasGeneratedBy
• is invalidated by: prov:wasInvalidatedBy

Curatorial activity:

• has type: rdf:type
• involves agent with role: prov:qualifiedAssociation
• has description: dcterms:description
• has update action: oco:hasUpdateQuery

Curatorial role:

• has role type: prov:hadRole
• held by agent: prov:agent

Provenance agent:

• has name: foaf:name
• has given name: foaf:givenName
• has family name: foaf:familyName

Linearization in BibJSON + JSON-LD
The RDF data included in the OCC is available in a triplestore, accompanied by a SPARQL endpoint, and is stored in JSON-LD format. The BibJSON specification (http://okfnlabs.org/bibjson/) has been adopted, since it provides JSON labels for the description of bibliographic entities. In the following subsections, we introduce the alignment between OCC terms and the IRIs of the ontological entities described in the previous section, and give examples of linearization of some of the aforementioned entities.

ContextThe OCC Context (http://w3id.org/oc/corpus/context.json) is a mapping document thatformallymaps terms used in theOCC’s JSON-LD files to the entities defined in the variousontologiesusedfordescribingOCCdatainRDF.TheOCCContextisdefinedasfollows. { "@context": { "@base": "https://w3id.org/oc/", "gocc": "https://w3id.org/oc/corpus/", "gar": "https://w3id.org/oc/corpus/ar/", "gbe": "https://w3id.org/oc/corpus/be/", "gbr": "https://w3id.org/oc/corpus/br/", "gcr": "https://w3id.org/oc/corpus/cr/", "gid": "https://w3id.org/oc/corpus/id/", "gra": "https://w3id.org/oc/corpus/ra/", "gre": "https://w3id.org/oc/corpus/re/", "gdi": "https://w3id.org/oc/corpus/di/", "application": "https://w3id.org/spar/mediatype/application/", "biro": "http://purl.org/spar/biro/", "c4o": "http://purl.org/spar/c4o/", "cito": "http://purl.org/spar/cito/", "datacite": "http://purl.org/spar/datacite/", "dbr": "http://dbpedia.org/resource/", "dcat": "http://www.w3.org/ns/dcat#", "dcterms": "http://purl.org/dc/terms/", "doco": "http://purl.org/spar/doco/", "fabio": "http://purl.org/spar/fabio/", "foaf": "http://xmlns.com/foaf/0.1/", "frbr": "http://purl.org/vocab/frbr/core#", "literal": "http://www.essepuntato.it/2010/06/literalreification/", "oco": "https://w3id.org/oc/ontology/", "prism": "http://prismstandard.org/namespaces/basic/2.0/", "pro": "http://purl.org/spar/pro/", "prov": "http://www.w3.org/ns/prov#", "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdfs": "http://www.w3.org/2000/01/rdf-schema#",

"text": "https://w3id.org/spar/mediatype/text/", "void": "http://rdfs.org/ns/void#", "xsd": "http://www.w3.org/2001/XMLSchema#", "iri": "@id", "a": "@type", "agent": "foaf:Agent", "article": "fabio:JournalArticle", "book": "fabio:Book", "book_part": "doco:Part", "book_section": "fabio:ExpressionCollection", "book_series": "fabio:BookSeries", "book_set": "fabio:BookSet", "creation": "prov:Create", "curatorial_activity": "prov:Activity", "curatorial_role": "prov:Association", "dataset": "fabio:DataFile", "digital_format": "fabio:DigitalManifestation", "generic_format": "fabio:Manifestation", "entry": "biro:BibliographicReference", "inbook": "fabio:BookChapter", "inproceedings": "fabio:ProceedingsPaper", "merging": "prov:Replace", "metadata_snapshot": "prov:Entity", "document": "fabio:Expression", "occ_dataset": "dcat:Dataset", "occ_distribution": "dcat:Distribution", "patent": "fabio:PatentDocument", "periodical_issue": "fabio:JournalIssue", "periodical_volume": "fabio:JournalVolume", "periodical_journal": "fabio:Journal", "print_format": "fabio:PrintObject", "proceedings": "fabio:AcademicProceedings", "provenance_agent": "prov:Agent", "reference_book": "fabio:ReferenceBook", "reference_entry": "fabio:ReferenceEntry", "role": "pro:RoleInTime", "series": "fabio:Series", "standard": "fabio:SpecificationDocument", "techreport": "fabio:ReportDocument", "thesis": "fabio:Thesis", "web": "fabio:WebContent", "unpublished": "fabio:Preprint", "unique_identifier": "datacite:Identifier", "modification": "prov:Modify", "citation": { "@id": "cito:cites", "@type": "@vocab" }, "contributor": { "@id": "pro:isDocumentContextFor", "@type": "@vocab" }, "crossref": { "@id": "biro:references", "@type": "@vocab"}, "curatorial_role_type": { "@id": "prov:hadRole" , "@type": "@vocab" }, "derived_from": { "@id": "prov:wasDerivedFrom", "@type": "@vocab" }, "distribution": { "@id": "dcat:distribution", "@type": "@vocab" }, "download": { "@id": "dcat:downloadURL", "@type": "@vocab" }, "endpoint": 
{ "@id": "void:sparqlEndpoint", "@type": "@vocab" }, "file_type": { "@id": "dcat:mediaType", "@type": "@vocab" }, "format": { "@id": "frbr:embodiment", "@type": "@vocab" }, "generated_by": { "@id": "prov:wasGeneratedBy", "@type": "@vocab" }, "held_by": { "@id": "prov:agent" , "@type": "@vocab" }, "identifier": { "@id": "datacite:hasIdentifier", "@type": "@vocab" }, "role_of": { "@id": "pro:isHeldBy", "@type": "@vocab" }, "invalidated_by": { "@id": "prov:wasInvalidatedBy", "@type": "@vocab" }, "involved": { "@id": "prov:qualifiedAssociation", "@type": "@vocab" }, "license": { "@id": "dcterms:license", "@type": "@vocab" }, "mime_type": { "@id": "dcterms:format", "@type": "@vocab" }, "next": { "@id": "oco:hasNext", "@type": "@vocab" }, "reference": { "@id": "frbr:part", "@type": "@vocab" }, "part_of": { "@id": "frbr:partOf", "@type": "@vocab" }, "role_type": { "@id": "pro:withRole", "@type": "@vocab" }, "snapshot_of": { "@id": "prov:specializationOf", "@type": "@vocab" }, "source": { "@id": "prov:hadPrimarySource", "@type": "@vocab" }, "subject": { "@id": "dcat:theme", "@type": "@vocab" }, "subset": { "@id": "void:subset", "@type": "@vocab" }, "type": { "@id": "datacite:usesIdentifierScheme", "@type": "@vocab" }, "document_url": { "@id": "frbr:exemplar", "@type": "@vocab" }, "webpage": { "@id": "dcat:landingPage", "@type": "@vocab" }, "byte": { "@id": "dcat:byteSize", "@type": "xsd:decimal" }, "description": "dcterms:description", "edition": "prism:edition",

"fname": "foaf:familyName", "fpage": "prism:startingPage", "generated": { "@id": "prov:generatedAtTime", "@type": "xsd:dateTime" }, "gname": "foaf:givenName", "id": "literal:hasLiteralValue", "invalidated": { "@id": "prov:invalidatedAtTime", "@type": "xsd:dateTime" }, "keyword": "dcat:keyword", "label": "rdfs:label", "lpage": "prism:endingPage", "mod_date": { "@id": "dcterms:modified", "@type": "xsd:dateTime" }, "name": "foaf:name", "number": "fabio:hasSequenceIdentifier", "pub_date": { "@id": "dcterms:issued", "@type": "xsd:dateTime" }, "content": "c4o:hasContent", "title": "dcterms:title", "subtitle": "fabio:hasSubtitle", "year": { "@id": "fabio:hasPublicationYear", "@type": "xsd:gYear" }, "update_action": "oco:hasUpdateQuery", "ark": "datacite:ark", "arxiv": "datacite:arxiv", "author": "pro:author", "bibliographic_database": "dbr:Bibliographic_database", "cc0": "https://creativecommons.org/publicdomain/zero/1.0/legalcode", "ccby": "https://creativecommons.org/licenses/by/4.0/legalcode", "curator": "oco:occ-curator", "dia": "datacite:dia", "docx": "application:vnd.openxmlformats-officedocument.wordprocessingml.document", "doi": "datacite:doi", "ean13": "datacite:ean13", "editor": "pro:editor", "eissn": "datacite:eissn", "fundref": "datacite:fundref", "handle": "datacite:handle", "html": "text:html", "infouri": "datacite:infouri", "isbn": "datacite:isbn", "isni": "datacite:isni", "issn": "datacite:issn", "lissn": "datacite:lissn", "istc": "datacite:istc", "json": "application:json", "jsonld": "application:ld+json", "jst": "datacite:jst", "localfunder": "datacite:local-funder-identifier-scheme", "localpersonal": "datacite:local-personal-identifier-scheme", "localresource": "datacite:local-resource-identifier-scheme", "lsid": "datacite:lsid", "nii": "datacite:nii", "nationalinsurancenumber": "datacite:national-insurance-number", "nihmsid": "datacite:nihmsid", "occ": "datacite:occ", "odt": "application:vnd.oasis.opendocument.text", "open_access": "dbr:Open_access", 
"openid": "datacite:openid", "orcid": "datacite:orcid", "pdf": "application:pdf", "pii": "datacite:pii", "plain": "text:plain", "pmcid": "datacite:pmcid", "pmid": "datacite:pmid", "metadata_provider": "oco:source-metadata-provider", "publisher": "pro:publisher", "purl": "datacite:purl", "rdfxml": "application:rdf+xml", "researcherid": "datacite:researcherid", "scholarly_communication": "dbr:Scholarly_communication", "sici": "datacite:sici", "social_security_number": "datacite:social-security-number", "turtle": "text:turtle", "upc": "datacite:upc", "uri": "datacite:uri", "url": "datacite:url", "urn": "datacite:urn", "viaf": "datacite:viaf", "xhtml": "application:xhtml+xml" } }
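To illustrate how the mapping is meant to be read, the following minimal sketch resolves a term (or a prefixed name such as gbr:1) to a full IRI using a small subset of the OCC Context copied from the listing above. The helper name expand_term is hypothetical, and the logic is a deliberate simplification for illustration, not the expansion algorithm of the JSON-LD specification: it ignores "@base", type coercion and every other context feature.

```python
# A small subset of the OCC Context, copied from the listing above.
CONTEXT = {
    "fabio": "http://purl.org/spar/fabio/",
    "cito": "http://purl.org/spar/cito/",
    "datacite": "http://purl.org/spar/datacite/",
    "dcterms": "http://purl.org/dc/terms/",
    "gbr": "https://w3id.org/oc/corpus/br/",
    "iri": "@id",
    "article": "fabio:JournalArticle",
    "title": "dcterms:title",
    "doi": "datacite:doi",
    "citation": {"@id": "cito:cites", "@type": "@vocab"},
}


def expand_term(term, context):
    """Resolve an OCC term or prefixed name to a full IRI.

    Hypothetical helper, simplified for illustration only.
    """
    # A term is defined either by a plain string or by an object with "@id".
    definition = context.get(term, term)
    if isinstance(definition, dict):
        definition = definition["@id"]
    # JSON-LD keywords such as "@id" and "@type" are returned unchanged.
    if definition.startswith("@"):
        return definition
    # Prefixed names ("prefix:suffix") are resolved through the prefix table.
    prefix, sep, suffix = definition.partition(":")
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    # Anything else (for example an absolute IRI) is left as it is.
    return definition


print(expand_term("article", CONTEXT))   # http://purl.org/spar/fabio/JournalArticle
print(expand_term("citation", CONTEXT))  # http://purl.org/spar/cito/cites
print(expand_term("gbr:1", CONTEXT))     # https://w3id.org/oc/corpus/br/1
```

In practice a conforming JSON-LD processor would perform this resolution, so a helper of this kind is only useful for understanding or spot-checking the mapping.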

Bibliographic resources and their metadata

The following excerpt shows how to linearize the information about a bibliographic resource into JSON-LD according to the aforementioned mapping document (i.e. the OCC Context).

{ "@context": "https://w3id.org/oc/corpus/context.json", "iri": "gbr:1", "a": "article", "identifier": [ { "iri": "gid:1", "a": "unique_identifier", "id": "br/1", "type": "occ" }, { "iri": "gid:2", "a": "unique_identifier", "id": "10.1108/JD-12-2013-0166", "type": "doi" }, { "iri": "gid:3", "a": "unique_identifier", "id": "http://www.emeraldinsight.com/doi/abs/10.1108/JD-12-2013-0166", "type": "url" }, { "iri": "gid:4", "a": "unique_identifier", "id": "http://dx.doi.org/10.1108/JD-12-2013-0166", "type": "url" } ], "title": "Setting our bibliographic references free: towards open citation data", "year": "2015", "format": [ { "iri": "gre:1", "a": "digital_format", "identifier": { "iri": "gid:5", "a": "unique_identifier", "id": "re/1", "type": "occ" }, "mime_type": "pdf", "fpage": "253", "lpage": "277", "document_url": "http://www.emeraldinsight.com/doi/pdfplus/10.1108/JD-12-2013-0166" }, { "iri": "gre:2", "a": "digital_format", "identifier": { "iri": "gid:6", "a": "unique_identifier", "id": "re/2", "type": "occ" }, "mime_type": "html", "document_url": "http://www.emeraldinsight.com/doi/full/10.1108/JD-12-2013-0166" } ], "reference": { "iri": "gbe:1", "a": "entry", "content": "Agarwal, S., Choubey, L. and Yu, H. (2010), “Automatically classifying the role of citations in biomedical articles”, Proceedings of the 2010 AMIA Annual Symposium, pp. 11-15.", "crossref": "gbr:5" }, "part_of": { "iri": "gbr:2", "a": "periodical_issue", "identifier": { "iri": "gid:7",

"a": "unique_identifier", "id": "br/2", "type": "occ" }, "number": "2", "part_of": { "iri": "gbr:3", "a": "periodical_volume", "identifier": { "iri": "gid:8", "a": "unique_identifier", "id": "br/3", "type": "occ" }, "number": "71", "part_of": { "iri": "gbr:4", "a": "periodical_journal", "identifier": [ { "iri": "gid:9", "a": "unique_identifier", "id": "br/4", "type": "occ" }, { "iri": "gid:10", "a": "unique_identifier", "id": "0022-0418", "type": "issn" } ], "title": "Journal of Documentation" } } }, "citation": [ { "iri": "gbr:5", "a": "inproceedings", "identifier": [ { "iri": "gid:11", "a": "unique_identifier", "id": "br/5", "type": "occ" } ], "title": "Automatically classifying the role of citations in biomedical articles", "year": "2010", "format": [ { "iri": "gre:3", "a": "generic_format", "identifier": { "iri": "gid:12", "a": "unique_identifier", "id": "re/3", "type": "occ" }, "fpage": "11", "lpage": "15" } ], "part_of": { "iri": "gbr:10", "a": "proceedings", "identifier": { "iri": "gid:13", "a": "unique_identifier", "id": "br/10", "type": "occ" }, "title": "Proceedings of the 2010 AMIA Annual Symposium" } } ] }
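As a rough illustration of how a consumer might read such a record, the sketch below groups the identifiers of a bibliographic resource by scheme and follows the nested part_of links from the article up to its journal. It uses only the Python standard library and a trimmed copy of the example above (most identifiers, the formats, the reference and the citation are omitted). The helper names as_list, identifiers and containers are hypothetical and not part of any OCC tooling.

```python
import json

# Trimmed copy of the example record above, kept short for illustration.
RECORD = json.loads("""
{
  "iri": "gbr:1",
  "a": "article",
  "identifier": [
    { "iri": "gid:1", "a": "unique_identifier", "id": "br/1", "type": "occ" },
    { "iri": "gid:2", "a": "unique_identifier", "id": "10.1108/JD-12-2013-0166", "type": "doi" }
  ],
  "part_of": {
    "iri": "gbr:2", "a": "periodical_issue", "number": "2",
    "part_of": {
      "iri": "gbr:3", "a": "periodical_volume", "number": "71",
      "part_of": {
        "iri": "gbr:4", "a": "periodical_journal",
        "title": "Journal of Documentation"
      }
    }
  }
}
""")


def as_list(value):
    """A property may hold one value or an array; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def identifiers(entity):
    """Group the literal values of an entity's identifiers by scheme."""
    by_scheme = {}
    for ident in as_list(entity.get("identifier")):
        by_scheme.setdefault(ident["type"], []).append(ident["id"])
    return by_scheme


def containers(entity):
    """Follow nested part_of objects from an entity to its outermost container."""
    chain = []
    parent = entity.get("part_of")
    while isinstance(parent, dict):
        chain.append((parent["a"], parent.get("number") or parent.get("title")))
        parent = parent.get("part_of")
    return chain


print(identifiers(RECORD))
print(containers(RECORD))
```

Grouping by scheme into lists, rather than a single value per scheme, matters here because the full example carries two identifiers of type url for the same resource.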

Datasets and distributions

The following excerpt shows how to linearize the information about the OCC, its distributions and its related sub-datasets into JSON-LD according to the aforementioned mapping document (i.e. the OCC Context).

{ "@context": "https://w3id.org/oc/corpus/context.json", "iri": "gocc:", "a": "occ_dataset", "label": "OCC", "title": "The OpenCitations Corpus", "description": "The OpenCitations Corpus is an open repository of scholarly citation data made available under a Creative Commons public domain dedication, which provides in RDF accurate citation information (bibliographic references) harvested from the scholarly literature (described using the SPAR Ontologies) that others may freely build upon, enhance and reuse for any purpose, without restriction under copyright or database law.", "pub_date": "2016-02-01T00:00:00", "mod_date": "2016-04-01T00:00:00", "keyword": [ "OCC", "OpenCitations", "OpenCitations Corpus", "SPAR Ontologies", "bibliographic references", "citations" ], "subject": [ "scholarly_communication", "bibliographic_database", "open_access", "citations" ], "distribution": [ { "iri": "gdi:1", "a": "occ_distribution", "label": "distribution 1 of OCC [di/1 - OCC]", "title": "The Open Citations Corpus: distribution in Turtle dated 3rd April 2016", "description": "The 3rd April 2016 distribution of the Open Citations Corpus (OCC) stored in Turtle.", "pub_date": "2016-04-03T12:00:00", "license": "cc0", "download": "http://www.opencitations.net/static/distribution/occ-2016-04-03.ttl.zip", "file_type": "turtle", "byte": "14098371" } ], "webpage": "http://opencitations.net/", "subset": [ { "iri": "gbr:", "a": "occ_dataset", "label": "OCC / br", "title": "The Open Citations Corpus: Bibliographic Resource dataset", "description": "The OpenCitations Corpus is an open repository of scholarly citation data made available under a Creative Commons public domain dedication, which provides in RDF accurate citation information (bibliographic references) harvested from the scholarly literature (described using the SPAR Ontologies) that others may freely build upon, enhance and reuse for any purpose, without restriction under copyright or database law. This sub-dataset contains all the 'bibliographic resource' resources.", "pub_date": "2016-02-01T00:00:00", "mod_date": "2016-03-29T00:00:00", "keyword": [ "OCC", "OpenCitations Corpus", "OpenCitations", "SPAR Ontologies", "bibliographic references", "citations", "bibliographic resource" ], "subject": [ "scholarly_communication", "bibliographic_database", "open_access", "citations" ]

} ], "endpoint": "https://w3id.org/oc/corpus/sparql" }
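The dataset description above can be traversed in a similar way. The following sketch, which again assumes only the Python standard library and a trimmed copy of the example, walks the subset hierarchy and summarises each distribution. The function names walk_datasets and distribution_summary are hypothetical, and the byte size is converted to an integer purely for convenience (the OCC Context types it as xsd:decimal).

```python
import json

# Trimmed copy of the dataset description above; titles, descriptions,
# dates, keywords and subjects are omitted.
DATASET = json.loads("""
{
  "iri": "gocc:",
  "a": "occ_dataset",
  "label": "OCC",
  "distribution": [
    {
      "iri": "gdi:1",
      "a": "occ_distribution",
      "license": "cc0",
      "download": "http://www.opencitations.net/static/distribution/occ-2016-04-03.ttl.zip",
      "file_type": "turtle",
      "byte": "14098371"
    }
  ],
  "subset": [
    { "iri": "gbr:", "a": "occ_dataset", "label": "OCC / br" }
  ],
  "endpoint": "https://w3id.org/oc/corpus/sparql"
}
""")


def as_list(value):
    """A property may hold one value or an array; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def walk_datasets(dataset, depth=0):
    """Yield (depth, dataset) for a dataset and, recursively, its sub-datasets."""
    yield depth, dataset
    for child in as_list(dataset.get("subset")):
        yield from walk_datasets(child, depth + 1)


def distribution_summary(dataset):
    """Return one small dictionary per distribution of the given dataset."""
    return [
        {
            "iri": dist["iri"],
            "format": dist["file_type"],
            "license": dist["license"],
            "url": dist["download"],
            "bytes": int(dist["byte"]),
        }
        for dist in as_list(dataset.get("distribution"))
    ]


for depth, ds in walk_datasets(DATASET):
    print("  " * depth + ds["label"], distribution_summary(ds))
```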

Provenance data

The following excerpt shows how to linearize the information about the provenance of a bibliographic entity contained in the OCC into JSON-LD according to the aforementioned mapping document (i.e. the OCC Context).

{ "@context": "https://w3id.org/oc/corpus/context.json", "iri": "gbr:1/prov/se/2", "a": "metadata_snapshot", "label": "snapshot of entity metadata 2 related to bibliographic resource 1 [se/2 -> br/1]", "snapshot_of": "gbr:1", "generated": "2016-04-01T00:00:00", "generated_by": { "iri": "gbr:1/prov/ca/2", "a": ["curatorial_activity", "modification"], "label": "curatorial activity 2 related to bibliographic resource 1 [ca/2 -> br/1]", "involved": { "iri": "gbr:1/prov/cr/3", "a": "curatorial_role", "label": "curatorial role 3 related to bibliographic resource 1 [cr/3 -> br/1]", "curatorial_role_type": "curator", "held_by": { "iri": "gpa:3", "a": "provenance_agent", "name": "Silvio Peroni" } }, "description": "The field 'title' of the entity 'https://w3id.org/oc/corpus/br/1' has been modified.", "update_action": "DELETE DATA { GRAPH <https://w3id.org/oc/corpus/br/> { <https://w3id.org/oc/corpus/br/1> <http://purl.org/dc/terms/title> 'Setting our bibliographic references free: towards open citation data' } }; INSERT DATA { GRAPH <https://w3id.org/oc/corpus/br/> { <https://w3id.org/oc/corpus/br/1> <http://purl.org/dc/terms/title> 'Setting Our Bibliographic References Free: Towards Open Citation Data' } }" }, "derived_from": [ { "iri": "gbr:1/prov/se/1", "a": "metadata_snapshot", "label": "snapshot of entity metadata 1 related to bibliographic resource 1 [se/1 -> br/1]", "snapshot_of": "gbr:1", "generated": "2016-02-01T00:00:00", "generated_by": { "iri": "gbr:1/prov/ca/1", "a": ["curatorial_activity", "creation"], "label": "curatorial activity 1 related to bibliographic resource 1 [ca/1 -> br/1]", "involved": [ { "iri": "gbr:1/prov/cr/1", "a": "curatorial_role", "label": "curatorial role 1 related to bibliographic resource 1 [cr/1 -> br/1]",
"curatorial_role_type": "metadata_provider", "held_by": { "iri": "gpa:1", "a": "provenance_agent", "name": "CrossRef" } }, { "iri": "gbr:1/prov/cr/2", "a": "curatorial_role", "label": "curatorial role 2 related to bibliographic resource 1 [cr/2 -> br/1]", "curatorial_role_type": "curator", "held_by": { "iri": "gpa:2", "a": "provenance_agent", "name": "SPACIN CrossrefProcessor"

} } ], "description": "The entity 'https://w3id.org/oc/corpus/br/1' has been created." }, "source": "http://api.crossref.org/works/10.1108/JD-12-2013-0166", "invalidated": "2016-04-01T00:00:00", "invalidated_by": "gbr:1/prov/ca/2" } ] }
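One way to read the provenance record above is as a chain of snapshots linked by derived_from. The sketch below, a minimal illustration using only the Python standard library and a trimmed copy of the example (labels, roles, agents, descriptions and the update query are omitted), flattens that chain into a chronological history. The function name history is hypothetical, and sorting by the generated timestamp is an assumption that relies on the ISO 8601 strings sharing one format, as they do in the example.

```python
import json

# Trimmed copy of the provenance record above.
SNAPSHOT = json.loads("""
{
  "iri": "gbr:1/prov/se/2",
  "a": "metadata_snapshot",
  "snapshot_of": "gbr:1",
  "generated": "2016-04-01T00:00:00",
  "generated_by": {
    "iri": "gbr:1/prov/ca/2",
    "a": ["curatorial_activity", "modification"]
  },
  "derived_from": [
    {
      "iri": "gbr:1/prov/se/1",
      "a": "metadata_snapshot",
      "snapshot_of": "gbr:1",
      "generated": "2016-02-01T00:00:00",
      "generated_by": {
        "iri": "gbr:1/prov/ca/1",
        "a": ["curatorial_activity", "creation"]
      },
      "invalidated": "2016-04-01T00:00:00",
      "invalidated_by": "gbr:1/prov/ca/2"
    }
  ]
}
""")


def as_list(value):
    """A property may hold one value or an array; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def history(snapshot):
    """Flatten a snapshot and its derived_from ancestors, oldest first."""
    entries = []
    pending = [snapshot]
    while pending:
        current = pending.pop()
        # Keep only the specific activity kinds (creation, modification, ...).
        kinds = [
            kind
            for kind in as_list(current["generated_by"]["a"])
            if kind != "curatorial_activity"
        ]
        entries.append(
            {
                "snapshot": current["iri"],
                "generated": current["generated"],
                "activity": kinds,
                "invalidated": current.get("invalidated"),
            }
        )
        pending.extend(as_list(current.get("derived_from")))
    # Timestamps in one ISO 8601 format sort correctly as plain strings.
    return sorted(entries, key=lambda entry: entry["generated"])


for entry in history(SNAPSHOT):
    print(entry)
```

In this reading, the snapshot whose invalidated value is absent is the current one, which matches the example: se/1 was invalidated at the moment se/2 was generated.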