Terminology Curation with the Semantic MediaWiki
04/18/2007
Terminology and the Semantic MediaWiki
Ecoterm IV – Vienna, 17–18 April 2007
EcoInformatics Initiative
Terminology Curation with the Semantic MediaWiki
Harold Solbrig, Informatics Architect, Apelon, Inc.
The Primary Task
Evaluate the roles, categories and organization of the National Cancer Institute (NCI)’s Cancer Thesaurus with respect to:
– Upper Level Ontological Principles
– ISO TC37 & Related principles
As with Ontology construction, it was understood by all parties that this was a process – not a goal.
Approach
1. Gather appropriate upper level ontologies (BFO, DOLCE, Top Bio, UMLS Semantic Net and OBO Relations Ontology) into a single, readily referenced format
2. Load the NCI Thesaurus into the same format
3. Multiple parties review, annotate, recommend and categorize
4. Publish, analyze and evaluate results
Solution
By using the Semantic MediaWiki (SMW), we were able to accomplish all of the goals in a (very) reasonable period of time
Discussion
We also discovered that, with some extensions, the SMW could be useful for publishing, annotating and cross-referencing terminological (and other…) resources.
Questions?
… just kidding.
Wikis
Community developed
Collaborative
“Organic” – to the very core…
Primary focus (to date) is human consumption
Traceable: provenance automatically recorded, differences, undo and redo
MediaWiki
http://en.wikipedia.org/wiki/Wiki
Base for Wikipedia and many others…
Key characteristics
– Web based editing
– Page links
– Categories
– Templates
MediaWiki
Fully documented using (surprise!) MediaWiki
Rich mechanisms for discussion, curation, export, etc.
Common constructs
[[Train Transport]] – hyperlink to page named “Train_Transport”
''Italic'', '''Bold'''
* Bullet point
[http://www.w3c.org/ The W3C] – hyperlink
… and much more
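The link constructs above can be picked out of a page mechanically. The following is a minimal Python sketch, not MediaWiki's own parser: the regular expressions and the function names `page_name` and `links` are assumptions made for illustration, and only the two link forms shown on the slide are handled.

```python
import re

# [[Page title]] or [[Page title|label]] (internal link); illustrative pattern only.
INTERNAL = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")
# [http://host/path Label] (external link); single brackets only.
EXTERNAL = re.compile(r"(?<!\[)\[(https?://\S+)\s+([^\]]+)\](?!\])")


def page_name(title):
    """Normalize a link title to a page name: spaces become underscores and
    the first letter is upper-cased (MediaWiki's default behavior)."""
    title = "_".join(title.strip().split())
    return title[:1].upper() + title[1:]


def links(wikitext):
    """Return (internal page names, (url, label) pairs) found in wikitext."""
    internal = [page_name(m.group(1)) for m in INTERNAL.finditer(wikitext)]
    external = [(m.group(1), m.group(2)) for m in EXTERNAL.finditer(wikitext)]
    return internal, external


if __name__ == "__main__":
    sample = "See [[Train Transport]] and [http://www.w3c.org/ The W3C]"
    print(links(sample))
```

A real wiki handles many more cases (namespaces, interwiki prefixes, nested templates); the sketch only shows why “Train Transport” resolves to the page “Train_Transport”.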
Templates
Sample Template
(Figure labels: Parameter; Extension call)
Semantic MediaWiki
Semantic MediaWiki
3 Key extensions to MediaWiki
1. Categories == Class
– PageA … [[Category:X]] → pageA rdf:type category:X
– Category:Y … [[Category:X]] → category:Y rdfs:subClassOf category:X
2. Links == Role
– PageA … [[PageB]] → PageA … [[hasPart::PageB]]
3. Attributes == DataProperty
– [[population:=32,154,773]]
– Includes datatypes
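To make the mapping concrete, here is a minimal Python sketch that reads the three annotation forms from page text and emits subject/predicate/object triples. It is an illustration under stated assumptions, not SMW's implementation: the function name `extract_triples`, the regular expression, and the use of plain strings for predicates and values are all hypothetical, and datatype handling is left out.

```python
import re

# Any [[...]] construct; the body is classified below. Illustrative pattern only.
LINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def extract_triples(page, wikitext):
    """Return (subject, predicate, object) triples for category, relation
    and attribute annotations found in the text of `page`."""
    on_category_page = page.startswith("Category:")
    triples = []
    for body in LINK.findall(wikitext):
        target = body.split("|", 1)[0].strip()  # drop any display label
        if ":=" in target:
            # Attribute: [[population:=32,154,773]]
            name, value = target.split(":=", 1)
            triples.append((page, name.strip(), value.strip()))
        elif "::" in target:
            # Relation: [[hasPart::PageB]]
            name, obj = target.split("::", 1)
            triples.append((page, name.strip(), obj.strip()))
        elif target.startswith("Category:"):
            # Category on an ordinary page is a type; on a category page, a superclass.
            predicate = "rdfs:subClassOf" if on_category_page else "rdf:type"
            triples.append((page, predicate, target))
        # Plain [[PageB]] links carry no typed meaning and are skipped.
    return triples


if __name__ == "__main__":
    text = "[[Category:X]] [[hasPart::PageB]] [[population:=32,154,773]] [[PageB]]"
    for triple in extract_triples("PageA", text):
        print(triple)
```

Note that the attribute value stays a string here; the “includes datatypes” point on the slide is exactly the part this sketch does not attempt.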
Categories and Relations
Attributes
Semantic Rendering
Type (or superClass)
Attribute Value
Relation RDF (!)
Thesaurus Content
Templates?
; Gene_Product_Is_Biomarker_Type
: The role is used to designate the type of …
Kind: [[:Category:NCI_Kind]]
'''Semantic Type:''' [[NCI_Semantic_Type::Category:SN_Conceptual_Entity|Conceptual Entity]]
Brittle, not readily changed…
Templates?
{{OntylogDescription|ns=NCI|text=“The role is used to designate…”}}
{{Kind|ns=NCI|target=Kind}}
{{ResourceRef|name=Semantic_Type|ns=NCI|target=Conceptual_Entity|targetns=SN}}
Can readily be updated via template…
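Why the template form is less brittle can be shown with a small Python sketch of template expansion. The template bodies below are hypothetical (the slides do not show the actual definitions of Kind or ResourceRef), and `expand` is a simplified stand-in for MediaWiki's template engine that handles only flat, named parameters.

```python
import re

# Hypothetical template bodies; {{{name}}} marks a parameter slot.
TEMPLATES = {
    "Kind": "Kind: [[:Category:{{{ns}}}_{{{target}}}]]",
    "ResourceRef": (
        "'''{{{name}}}:''' "
        "[[{{{ns}}}_{{{name}}}::Category:{{{targetns}}}_{{{target}}}]]"
    ),
}

CALL = re.compile(r"\{\{([^{}|]+)\|([^{}]*)\}\}")  # {{Name|k=v|k=v}}
SLOT = re.compile(r"\{\{\{(\w+)\}\}\}")            # {{{k}}}


def expand(wikitext, templates=TEMPLATES):
    """Replace each known template call with its body, filling named slots."""
    def render(match):
        name = match.group(1).strip()
        if name not in templates:
            return match.group(0)  # leave unknown templates untouched
        args = dict(
            part.split("=", 1) for part in match.group(2).split("|") if "=" in part
        )
        return SLOT.sub(lambda m: args.get(m.group(1), ""), templates[name])
    return CALL.sub(render, wikitext)


if __name__ == "__main__":
    print(expand("{{Kind|ns=NCI|target=Kind}}"))
    # Changing the template once changes every page that calls it.
    revised = dict(TEMPLATES, Kind="'''Kind:''' [[:Category:{{{ns}}}_{{{target}}}]]")
    print(expand("{{Kind|ns=NCI|target=Kind}}", revised))
```

The page text stays the same in both calls; only the template definition differs, which is the property the hand-written markup on the previous slide lacks.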
Commentary
Link to another NCI comment
Link to external Ontology
Categorization in external Ontology
Computed
How is it Working?
Very well!
What can we do to improve it…
Terminology
Centrally curated
Central to the practice of medicine
– Insurance and reporting
– Regulatory
– Research
– Clinical Practice
– Information Sharing
ICD-9, CPT-4, SNOMED, …
Clinical Terminology
Quality and content are important
– Needs central vetting, integration, QA
Central model doesn’t scale
– Need input from (many) experts
– Need visible, active feedback loop
Terminology Workflow 1995
[Diagram: boxes labeled Controlled Terminology, Curation, Distribution, Books / PDF, Lists and Tables; steps (1) to (4)]
Terminology Workflow 1995
[Diagram: boxes labeled Controlled Terminology ‘B’, Curation, Distribution, Books / PDF, Lists and Tables; steps (1) to (3)]
Terminology Workflow 2008
[Diagram: boxes labeled Controlled Terminology, Curation, Distribution, Common Distribution Model, Online Services; steps (1) to (5)]
Terminology Workflow 2008
[Diagram: boxes labeled Controlled Terminology, Curation, Distribution, Common Distribution Model, Online Services, Controlled Terminology B; steps (1) to (5)]
Common Distribution Model
LexGrid
(a little bit of…) OWL
– NCI Thesaurus & SNOMED CT
– Still requires LexGrid-like additions
– “Pushing the envelope”
UMLS RRF
– Although underspecified as a ‘model’
Online Services
OMG Terminology Query Services
– Not heavily used
– Perceived (incorrectly) as CORBA specific
– Perceived as too complex
– Object oriented and stateful
ANSI Common Terminology Services
– Being adopted
– Necessary but not sufficient
– Stateless
CTS-2
– Co-development beginning w/ HL7 & OMG
Online Services
LexBIG
– LexGrid for the Bio Informatics Grid
– Robust query specification
– Meets many end-user (developer) requirements
– Not simple to implement – it actually adds value
– Not a standard, but will be used to guide CTS-2
Workflow and Feedback
[Diagram: boxes labeled Controlled Terminology, Curation, Distribution, Common Distribution Model, Online Services; steps (1) to (5)]
The Feedback Component
Curation
The Feedback Component
[Diagram: boxes labeled Curation, Semantic MediaWiki (++), Annotations and Change Requests, Community Review, Distribution, Common Distribution Model, Online Services, Version Staging]
Issues and Next Steps
(1) SHARED Semantics
{{Definition|…}}
{{Synonym|…}}
{{References|…}}
{{DLSome|…}}
{{DLAll|…}}
…
12620 anyone?
Issues and Next Steps
(2) Figure out namespaces
– NCI:Activity, AgroVoc:Fish, …
– NCI_Activity, AgroVoc_Fish???
(2a) Identifiers (Activity vs. C12345)
(2b) Versions
(2c) URIs (vs. URLs)
– Internal
– External
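The trade-off between the two naming conventions can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function names, the `style` switch, and the base address `http://example.org/terminology/` are hypothetical and do not describe any resolver mentioned in the talk.

```python
from urllib.parse import quote

BASE = "http://example.org/terminology/"  # hypothetical resolver base


def page_title(ns, local, style="colon"):
    """Build a wiki page title as NS:Local (colon) or NS_Local (underscore)."""
    sep = ":" if style == "colon" else "_"
    return f"{ns}{sep}{local.replace(' ', '_')}"


def split_title(title, style="colon"):
    """Recover (namespace, local name) by splitting on the first separator."""
    sep = ":" if style == "colon" else "_"
    ns, _, local = title.partition(sep)
    return ns, local


def uri(ns, identifier):
    """Build a URI from a namespace and a stable identifier such as C12345."""
    return f"{BASE}{quote(ns)}/{quote(identifier)}"


if __name__ == "__main__":
    print(page_title("NCI", "Activity"))                 # colon style
    print(page_title("AgroVoc", "Fish", "underscore"))   # underscore style
    # Underscore style cannot be split reliably once names contain underscores.
    print(split_title(page_title("My_NS", "X", "underscore"), "underscore"))
    print(uri("NCI", "C12345"))
```

The third call shows one reason for the question marks on the slide: with underscores as both separator and space replacement, the namespace boundary is lost.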
Certification and Sanctioning
Who can edit?
Who can validate?
Who selects updates?
… (see: http://en.citizendium.org/wiki/Main_Page)
Automatic Export
Selecting sets of updates
Formatting update recommendations for target curators, etc…
Synchronization
Changes implemented in terminology
– Update wiki pages
– Say what changed
What changes are incorporated by value? By reference?
Approach and Responsible Parties
Shared Semantics
– Core set based on LexGrid & OWL
– Post on WIKI and link on SMW site
– Assigned to Apelon, Mayo, NCI, ???
– Extend to OBO, SKOS (?), XMDR…
– Connections to 12620
Time Frame and Assignments
URIs, namespaces, naming
– UK NCR (CancerGrid) – looking at unAPI and servers
– (Hopefully) can provide URI resolver svc.
– Short term – use templates / extensions
Content
SNOMED-CT, ICD-9-CM, many, many others are already available via Apelon DTS Services
Available soon
FMA, HL7 Version 3 Terminology, OBO Foundry (GO, PATO, etc.) as time permits
Others as needed (and funded…)
What we’ve got to date
Apelon DTS Server Extension
– Includes both defined and classified view (!)
– Export in RESTful XML (currently Apelon, soon to be LexGrid)
– XMDR Export Format
Protégé (Native and OWL 3.2) prototype
– Done by Mayo
– Both import and export
– Still needs templates
Questions?
This time for real
Note: SMW will be made externally available (w/ simple password) once we get contract specific info cleaned up (NCI will probably publish shortly)… contact: [email protected] for access.