Vocabulary management and SKOS
Putting Business in the Lead
Jan Voskuil (Taxonic)
September 5th, 2014, Leipzig
SEMANTiCS 2014
Introduction
Jan Voskuil, Taxonic (co-founder)
Consultancy in Semantic Technology
“SKOS is used for findability, but should also be used for vocabulary management in organizations.
Business owns the dictionary, not IT”
• What are dictionaries, and what are they for?
• SKOS: tooling and benefits
• Practicalities
Dienst Justitiële Inrichtingen (DJI)
Custodial Institutions Agency
Ca. 10,000 employees
Ca. 70,000 inmates per year
Ca. 50 facilities
Four groups of detainees
Adult detainees
Juvenile offenders
Patients in forensic care
Foreign nationals
Dictionaries: Benefits
• Knowledge management
• Quality of information
• Manageability
– If your systems contain 100K+ attribute names, then they contain unstructured information (Dave McComb)
• Findability
– Document (DMS)
– Data (DBMS)
• Exchangeability
How many key words are enough?
• Zipf’s Law: a word’s frequency is roughly inversely proportional to its frequency rank, so the second most frequent word occurs about half as often as the most frequent word
• 5,000 words are enough to understand 95% of any corpus. For the other 5%, you need to know the other 200,000 words
Source: Tiberius and Schoonheim, A Frequency Dictionary of Dutch, 2014
Pocket dictionary: 5K
General dictionary: 100K
Lexicographic dictionary: 1M+
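Coverage figures like the one above can be measured for any tokenized corpus: count word frequencies, rank the word types, and compute what share of all running words the top k types account for. A minimal sketch, assuming the corpus is already split into a list of words; the function names are hypothetical, and `zipf_expected` is the idealized law (exponent s = 1 by default), not a fit to real data.

```python
from collections import Counter
from typing import Sequence


def top_k_coverage(tokens: Sequence[str], k: int) -> float:
    """Share of running tokens accounted for by the k most frequent word types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(tokens)


def zipf_expected(top_frequency: float, rank: int, s: float = 1.0) -> float:
    """Idealized Zipf's law: frequency at a given rank is top_frequency / rank**s."""
    return top_frequency / rank ** s


if __name__ == "__main__":
    tokens = "a a a a b b c d".split()
    print(top_k_coverage(tokens, 2))  # 6 of 8 tokens -> 0.75
    print(zipf_expected(4, 2))        # half the top frequency -> 2.0
```

On a real tokenized corpus, `top_k_coverage(tokens, 5000)` gives that corpus's empirical counterpart of the 95% figure cited above.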
The Real World
Dictionary | Owner
Begrippenwoordenboek DJI (“DJI concept dictionary”) | Dept X
Begrippenlijst Project Y (“Project Y term list”) | Project Y
Mega Glossary | ICT-Dept
Information chain dictionaries
Ketenwoordenboek Strafrecht (“criminal law chain dictionary”) | JustID
Ketenwoordenboek Vreemdelingen (“foreign nationals chain dictionary”) | JustID
Justitiethesaurus (“justice thesaurus”) | WODC
Data dictionaries
Gegevenswoordenboek MITS (“MITS data dictionary”) | ICT-Dept
Datadictionary Tulp MIR | ICT-Dept
… It just does not work!
What is the correct definition of x? Who decides this?
My project introduces new terms. How can I get these accepted?
OLD SITUATION | NEW SITUATION
Various lists | Single source of truth
Various versions | Single source of truth
Word documents | Intranet (Internet)
Distribution by e-mail | Intranet (Internet)
Endless discussions | Clear-cut governance
Responsibility of IT dept or project | Ownership by the business
Some How-Tos
• Keep the dictionary lean and mean
– Create a “pocket dictionary”
– Example: 1,200 key words
• Governance: be pragmatic
• Ownership within the business!
• Use clear, explanatory descriptions
– Use the language of the workforce
– Avoid legalese!
• Dictionary maintenance is a continuous process!
– Release cycle
– One major, four minor releases per year
– Major release is approved by senior executives
Why SKOS is so great: just enough semantics
• Semantic relations
– Compare with one-dimensional lists
• A LIMITED number of STANDARDIZED semantic relations
– Broader, Narrower, Related Term
– Semantics is sufficiently vague
• Intuitive, easy to understand
– Ideal for “pidginization”
– Usable far more broadly than class diagrams, ERDs and ontologies
• Only most relevant info
• “GENERALIZED CLASSIFICATION”
[Diagram: example hierarchies using the narrower relation]
Justitiabele (“Detainee”)
– narrower: Adult detainee, Juvenile offender, Foreign national, Patient in forensic care
Criminal Law
– narrower: Penal Institution
Sex
– narrower: Male, Female, Unknown, Undisclosed
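To make the “limited number of standardized relations” concrete: the example hierarchies need only one kind of assertion, skos:broader, because the SKOS specification defines skos:narrower as its inverse, so the narrower view can be derived rather than maintained twice. A minimal sketch in plain Python without an RDF library; the concept IDs and the `ex:` namespace are hypothetical, only the English labels come from the slide.

```python
from collections import defaultdict

# Illustrative data: concept IDs and the ex: namespace are hypothetical;
# the English labels are the ones in the slide examples.
PREF_LABEL = {
    "detainee": "Detainee",
    "adult_detainee": "Adult detainee",
    "juvenile_offender": "Juvenile offender",
    "foreign_national": "Foreign national",
    "patient_forensic_care": "Patient in forensic care",
    "criminal_law": "Criminal Law",
    "penal_institution": "Penal Institution",
    "sex": "Sex",
    "male": "Male",
    "female": "Female",
    "unknown": "Unknown",
    "undisclosed": "Undisclosed",
}

# (narrower concept, broader concept) pairs, i.e. "child skos:broader parent".
# A list of pairs rather than a dict, because SKOS allows several broader concepts.
BROADER = [
    ("adult_detainee", "detainee"),
    ("juvenile_offender", "detainee"),
    ("foreign_national", "detainee"),
    ("patient_forensic_care", "detainee"),
    ("penal_institution", "criminal_law"),
    ("male", "sex"),
    ("female", "sex"),
    ("unknown", "sex"),
    ("undisclosed", "sex"),
]


def narrower_index(pairs):
    """Derive skos:narrower (the inverse of skos:broader) from the broader pairs."""
    index = defaultdict(list)
    for child, parent in pairs:
        index[parent].append(child)
    return dict(index)


def to_turtle(pairs, labels):
    """Serialize concepts, English prefLabels and broader links as Turtle text."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "@prefix ex: <http://example.org/voc/> .",
    ]
    for concept, label in labels.items():
        lines.append(f'ex:{concept} a skos:Concept ; skos:prefLabel "{label}"@en .')
    for child, parent in pairs:
        lines.append(f"ex:{child} skos:broader ex:{parent} .")
    return "\n".join(lines)


if __name__ == "__main__":
    for parent, children in narrower_index(BROADER).items():
        print(PREF_LABEL[parent], "->", ", ".join(PREF_LABEL[c] for c in children))
    print(to_turtle(BROADER, PREF_LABEL))
```

Note that in SKOS, skos:broader itself is not declared transitive; the transitive closure is expressed with skos:broaderTransitive.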
Why SKOS is so great: tooling
Tooling: PoolParty Thesaurus Manager
End User View
SKOS is an Open Standard: Project Linking
http://vocabulary.wolterskluwer.de
[Screenshot: concept page for prefLabel “Unfallverhütung” (“accident prevention”), showing alternative labels, broader, narrower and related terms, plus linked concepts from DBpedia, lod.gesis.org, eurovoc.org, Wolters Kluwer and other thesauri on the web]
DJI and the POLICE have very different meanings for the word ARRESTANT (“arrested person”)
DO:
> RESPECT DIFFERENCES BETWEEN ORGANIZATIONS
> MAKE LEXICOGRAPHIC DIFFERENCES EXPLICIT USING LINKED THESAURI
DON’T:
> TRY TO MAKE ALL ORGANIZATIONS USE EXACTLY THE SAME LANGUAGE
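One way to read “make lexicographic differences explicit”: give each organization its own concept for the same word, each with its own definition, and record how the two relate with a SKOS mapping property instead of merging them. A minimal sketch, assuming skos:relatedMatch as that link (the slide does not name one); the URIs are hypothetical and the definitions are placeholders, since the actual DJI and police definitions are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    uri: str         # hypothetical URI in the owning organization's thesaurus
    scheme: str      # which organization's concept scheme it belongs to
    pref_label: str
    definition: str  # placeholder text, not the real definition


DJI_ARRESTANT = Concept("http://example.org/dji/arrestant", "DJI", "arrestant", "<DJI definition>")
POLICE_ARRESTANT = Concept("http://example.org/police/arrestant", "Police", "arrestant", "<Police definition>")
CONCEPTS = [DJI_ARRESTANT, POLICE_ARRESTANT]

# Mapping triples between the two thesauri. skos:relatedMatch is an assumed choice:
# it links the concepts without claiming they are interchangeable.
MAPPINGS = [(DJI_ARRESTANT.uri, "skos:relatedMatch", POLICE_ARRESTANT.uri)]


def concepts_for_label(label, concepts=CONCEPTS):
    """All concepts, across schemes, whose prefLabel matches `label` (case-insensitive)."""
    return [c for c in concepts if c.pref_label.casefold() == label.casefold()]


def declared_equivalent(a, b, mappings=MAPPINGS):
    """True only if skos:exactMatch between a and b is asserted in either direction."""
    return ((a.uri, "skos:exactMatch", b.uri) in mappings
            or (b.uri, "skos:exactMatch", a.uri) in mappings)


if __name__ == "__main__":
    for c in concepts_for_label("ARRESTANT"):
        print(c.scheme, c.uri, c.definition)
    print(declared_equivalent(DJI_ARRESTANT, POLICE_ARRESTANT))  # False: same word, different concepts
```

A lookup for the word returns both concepts, each with its owner's definition, rather than silently picking one.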
Conclusion and next step:
Linking Thesauri to Data Models
• Data models: not owned by the business
– too detailed
– too complex
– NO ownership at the strategic level
• Thesauri
– Relatively abstract
– Relatively simple
– Ownership by the business
• SKOS bridges the gap
– With data models in RDF, the gap can be bridged!
THESAURUS AND DOMAIN MODELS: SCENARIO 1
DOMAIN MODEL | Data dictionary
:inmate#9818763 :cell "B.23.a"
:inmate#9818763 :isRegisteredAt :pi_Dordrecht
:pi_Dordrecht rdf:type :penitentiaryInstitution
THESAURUS
voc:4862 rdf:type skos:Concept
voc:4862 skos:prefLabel "Penitentiary Institution"
voc:4862 skos:broader (concept with prefLabel "Detention Facility")
voc:4862 skos:exactMatch eurovoc:C877
eurovoc:C877 rdf:type skos:Concept
eurovoc:C877 skos:prefLabel "Penal Institution"@en, "място за лишаване от свобода"@bg
eurovoc:C877 skos:definition "A prison, gaol or jail is a facility in which inmates are forcibly confined and denied a variety of freedoms under the authority of…"
Link between :penitentiaryInstitution and voc:4862: owl:sameAs? skos:exactMatch?
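The Scenario 1 triples can be traversed without any RDF tooling to show how a data record reaches the business vocabulary and, through it, EuroVoc. A minimal sketch in plain Python: triples are tuples, literals are (text, language) pairs, `voc:detentionFacility` is a hypothetical ID for the unnamed “Detention Facility” concept, and the bridge property is set to skos:exactMatch, which is only one of the two options the slide leaves open.

```python
# Scenario 1 sketch: domain model and thesaurus kept apart, joined by one bridge triple.
# Literals are (text, language tag) tuples; None means untagged.
BRIDGE = "skos:exactMatch"  # assumption: the slide also considers owl:sameAs

TRIPLES = [
    # Domain model / data dictionary
    (":inmate#9818763", ":cell", ("B.23.a", None)),
    (":inmate#9818763", ":isRegisteredAt", ":pi_Dordrecht"),
    (":pi_Dordrecht", "rdf:type", ":penitentiaryInstitution"),
    # Bridge from the domain model class to the thesaurus concept
    (":penitentiaryInstitution", BRIDGE, "voc:4862"),
    # The organization's own thesaurus
    ("voc:4862", "rdf:type", "skos:Concept"),
    ("voc:4862", "skos:prefLabel", ("Penitentiary Institution", None)),
    ("voc:4862", "skos:broader", "voc:detentionFacility"),  # hypothetical ID
    ("voc:detentionFacility", "skos:prefLabel", ("Detention Facility", None)),
    ("voc:4862", "skos:exactMatch", "eurovoc:C877"),
    # EuroVoc
    ("eurovoc:C877", "rdf:type", "skos:Concept"),
    ("eurovoc:C877", "skos:prefLabel", ("Penal Institution", "en")),
    ("eurovoc:C877", "skos:prefLabel", ("място за лишаване от свобода", "bg")),
]


def objects(subject, predicate, triples=TRIPLES):
    """Objects of all (subject, predicate, object) triples, in insertion order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def thesaurus_concepts(instance, triples=TRIPLES):
    """Follow instance -> rdf:type -> domain class -> BRIDGE -> thesaurus concept."""
    return [concept
            for cls in objects(instance, "rdf:type", triples)
            for concept in objects(cls, BRIDGE, triples)]


def labels(concept, lang=None, triples=TRIPLES):
    """prefLabel texts of a concept; if lang is given, keep only that language tag."""
    return [text for text, tag in objects(concept, "skos:prefLabel", triples)
            if lang is None or tag == lang]


if __name__ == "__main__":
    concept = thesaurus_concepts(":pi_Dordrecht")[0]
    print(labels(concept))  # ['Penitentiary Institution']
    print([labels(b) for b in objects(concept, "skos:broader")])  # [['Detention Facility']]
    print([labels(m, "en") for m in objects(concept, "skos:exactMatch")])  # [['Penal Institution']]
```

On the open question: with skos:exactMatch, the class and the concept stay two resources that are declared usable interchangeably for retrieval; owl:sameAs would make every statement about one also hold for the other, which is a much stronger commitment.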
THESAURUS AND DOMAIN MODELS: SCENARIO 2
[Diagram: the DOMAIN MODEL | Data dictionary and THESAURUS boxes overlap; there is no separate voc:4862 node]
:inmate#9818763 :cell "B.23.a"
:inmate#9818763 :isRegisteredAt :pi_Dordrecht
:pi_Dordrecht rdf:type :penitentiaryInstitution
:penitentiaryInstitution rdf:type skos:Concept
:penitentiaryInstitution skos:prefLabel "Penitentiary Institution"
:penitentiaryInstitution skos:broader (concept with prefLabel "Detention Facility")
:penitentiaryInstitution skos:exactMatch eurovoc:C877
eurovoc:C877 rdf:type skos:Concept
eurovoc:C877 skos:prefLabel "Penal Institution"@en, "място за лишаване от свобода"@bg
eurovoc:C877 skos:definition "A prison, gaol or jail is a facility in which inmates are forcibly confined and denied a variety of freedoms under the authority of…"
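Reading Scenario 2 as the domain model class itself being typed skos:Concept (the diagram has no separate voc:4862 node), the bridge triple disappears and a record reaches the business vocabulary in one step. A minimal sketch in plain Python with triples as tuples and literals as (text, language) pairs; `voc:detentionFacility` is a hypothetical ID for the “Detention Facility” concept.

```python
# Scenario 2 sketch: the domain model class doubles as the thesaurus concept.
# Literals are (text, language tag) tuples; None means untagged.
TRIPLES = [
    (":inmate#9818763", ":cell", ("B.23.a", None)),
    (":inmate#9818763", ":isRegisteredAt", ":pi_Dordrecht"),
    (":pi_Dordrecht", "rdf:type", ":penitentiaryInstitution"),
    # No bridge triple: the class itself carries the SKOS statements.
    (":penitentiaryInstitution", "rdf:type", "skos:Concept"),
    (":penitentiaryInstitution", "skos:prefLabel", ("Penitentiary Institution", None)),
    (":penitentiaryInstitution", "skos:broader", "voc:detentionFacility"),  # hypothetical ID
    ("voc:detentionFacility", "skos:prefLabel", ("Detention Facility", None)),
    (":penitentiaryInstitution", "skos:exactMatch", "eurovoc:C877"),
    ("eurovoc:C877", "rdf:type", "skos:Concept"),
    ("eurovoc:C877", "skos:prefLabel", ("Penal Institution", "en")),
    ("eurovoc:C877", "skos:prefLabel", ("място за лишаване от свобода", "bg")),
]


def objects(subject, predicate, triples=TRIPLES):
    """Objects of all (subject, predicate, object) triples, in insertion order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def concept_types(instance, triples=TRIPLES):
    """Types of the instance that are themselves typed skos:Concept."""
    return [cls for cls in objects(instance, "rdf:type", triples)
            if "skos:Concept" in objects(cls, "rdf:type", triples)]


def labels(concept, lang=None, triples=TRIPLES):
    """prefLabel texts of a concept; if lang is given, keep only that language tag."""
    return [text for text, tag in objects(concept, "skos:prefLabel", triples)
            if lang is None or tag == lang]


if __name__ == "__main__":
    concept = concept_types(":pi_Dordrecht")[0]
    print(concept, labels(concept))  # :penitentiaryInstitution ['Penitentiary Institution']
    print([labels(m, "en") for m in objects(concept, "skos:exactMatch")])  # [['Penal Institution']]
```

One trade-off to weigh: this removes the owl:sameAs versus skos:exactMatch question, but a single resource is now both a data model class and a business concept, so its two owners can no longer change it independently.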