The Lexical Grid
Why LexGrid?
Existing medical records
• Various forms of coding and classification in use since the early 1500’s
• ‘Modern’ records from the 1960’s to present include various forms of codes
• Medical records are still on a “per-institution basis”
Why LexGrid?
Emerging medical records
Multiple factors forcing new levels of interoperability
• Economic
• Regulatory
• Technical
Why LexGrid?
Bioinformatics
• Large volumes of information
• Large cross sections
• Detailed – what is important may not (and cannot) be anticipated
• Interoperability of
  • Medical (Phenomics)
  • Genomics
  • Environmental
  • GeoSpatial
The GAP (In Western Medicine)
“Terminologies”
Coding and Classification
“Ontologies”
Computable DL Frameworks
ICD-9-CM
CPT-4
ICD-10-PCS
MESH
SNOMED-III
SNOMED CT
GO...
Many, many more to come
Countries
Languages
Mime Types
SNOP
FMA
ChEBI
MGED
GMOD
Language and the Communication Process
• Language - a “specification” that enables communication
• Semantics - the association between signs or symbols and their intended “meaning”
• Syntax - the rules for ordering and structuring the signs into phrases and sentences
• Pragmatics - the relationship between signs and symbols and the recipient. Broadly, the shared context.
Ogden’s Semiotic Triangle
[Figure: triangle with vertices Thought or Reference, Symbol and Referent. The Symbol symbolises the Thought or Reference, the Thought or Reference refers to the Referent, and the Symbol stands for the Referent. Example symbols: “Rose”, “ClipArt”.]
C. K. Ogden and I. A. Richards. The Meaning of Meaning.
The Communication Process
[Figure: two semiotic triangles, one per communicating party, around a single Referent. Each has a CONCEPT and a Symbol (“Rose”, “ClipArt”) joined by Symbolises, Refers To and Stands For. The message exchanged is “I see a ClipArt image of a rose”. Successive builds of the slide add the labels Semantics, Syntax, a Context for each party, and a Shared Context between them.]
Shared Context
Impacts how much information can be contained in a symbol.
[Figure: chart of Information / Symbol against the level of shared context. Levels shown: No Shared Context, Shared Universe, Common Language, Shared Species, Shared Planet, Shared Sun, Common Culture, Similar Education, Common Profession, Common Specialty.]
The impact of context on communication
Shared context:
• Allows information to be communicated in larger, more succinct “chunks”.
• Drug, analgesic and NSAID are all “chunks”, yet differ markedly in conceptual complexity.
• Enables specialized symbol sets:
  • Contrast the amount of information contained in the formula E = mc² versus that contained in this presentation...
Contextual Formalism
The degree of formality in a shared context can vary across a wide spectrum:
• Tacit context which is simply presumed
• Contextual negotiation preceding the actual message
• Rigorous and formal rules and documents describing the form and possible meanings behind every message and phrase.
Factors Affecting the Degree of Contextual Formalism
• Number of participating parties
  • Formalism needs to increase as the number of participants increases
• Geographic, cultural and temporal proximity of communicators
  • The further apart communicators are, the less they can assume
• Amount of shared context
  • The more you have, the more important it becomes to be organized
Factors Affecting the Degree of Contextual Formalism
• The cost of imprecise communication
  • Poetry and literature - low cost (some may argue actual gain)
  • Technical and professional - high to very high cost
    • What is the cost of assuming the units of a thrust specification?
    • What is the cost of assuming the dose of a prescription?
    • What is the cost of assuming the century in which the communication originated?
Common Forms of Contextual Formalism
• Dictionaries
• Thesauri
• Textbooks, college courses, etc.
• Operations manuals
• Data dictionaries
• Terminologies
Making Shared Context Explicit
[Figure: the communication diagram (two CONCEPT / Symbol triangles around one Referent, message “I see a ClipArt image of a rose”). Each party's Context is now backed by Terminologies, which together make up a Formal Shared Context.]
Shared Context Least Common Denominator
[Figure: the communication diagram with Terminologies backing each party's Context.]
Reduce the Shared Context...
... increase the symbol complexity
“I see a ClipArt image of a rose” becomes “I see a ClipArt image of a red flower with ...”
Information vs. Symbol
[Figure: the communication diagram, with the message “I see a ClipArt image of a rose” labelled Information and the Symbol vertices labelled Symbol.]
Information – predicate w/ Range of True/False/..
Symbol – predicate w/ Range of “Concept”
Ontologies serve (at least) two roles
Symbol - Definitional
• Concept -> Symbol
• Symbol -> Concept
• Symbol/Symbol translation
• Symbol validation, organization and mapping
• Are axioms – not verifiable
Information - Propositional
• Statements
• True/False/Unknown
• Convey information
• Are verifiable
Sample Description Logic
Symbol              Interpretation
$A, B$              $A^I, B^I$
$C, D$              $C^I, D^I$
$R$                 $R^I \subseteq \Delta^I \times \Delta^I$
$\top$              $\Delta^I$
$\bot$              $\emptyset$
$\neg A$            $\Delta^I \setminus A^I$
$C \sqcap D$        $C^I \cap D^I$
$\forall R.C$       $\{a \in \Delta^I \mid \forall b.\,(a,b) \in R^I \rightarrow b \in C^I\}$
$\exists R.\top$    $\{a \in \Delta^I \mid \exists b.\,(a,b) \in R^I\}$
Interpretations
• An interpretation $I$ satisfies an inclusion $C \sqsubseteq D$ if $C^I \subseteq D^I$, and it satisfies an equality $C \equiv D$ if $C^I = D^I$.
• If $T$ is a set of axioms, then $I$ satisfies $T$ iff $I$ satisfies each element of $T$.
• If $I$ satisfies an axiom (resp. a set of axioms) then we say that it is a model of this axiom (resp. set of axioms).
• Two axioms or two sets of axioms are equivalent if they have the same models.
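The interpretation rules above can be tried out on a small finite domain. The following is a minimal sketch, not part of the original slides: the domain, the individuals (`rose1`, `tulip1`, `thorn1`) and the function names are all invented for illustration, and concepts are modelled simply as Python sets.

```python
# Hypothetical finite interpretation: a domain plus extensions for the
# named concepts and the role. All individuals and names are illustrative.
DOMAIN = frozenset({"rose1", "tulip1", "thorn1"})
FLOWERING_PLANT = frozenset({"rose1", "tulip1"})
THORN = frozenset({"thorn1"})
HAS_RISK = frozenset({("rose1", "thorn1")})  # a role is a set of pairs


def top():
    """Top concept: the whole domain."""
    return DOMAIN


def bottom():
    """Bottom concept: the empty set."""
    return frozenset()


def negation(a):
    """Complement of a concept with respect to the domain."""
    return DOMAIN - a


def conjunction(c, d):
    """Concept conjunction is set intersection."""
    return c & d


def universal(r, c):
    """All r-successors of a are in c (vacuously true with no successors)."""
    return frozenset(a for a in DOMAIN if all(b in c for (x, b) in r if x == a))


def existential(r):
    """a has at least one r-successor."""
    return frozenset(a for a in DOMAIN if any(x == a for (x, _) in r))


def satisfies_inclusion(c, d):
    """The interpretation satisfies 'c included in d' iff c's extension is a subset of d's."""
    return c <= d


# A flowering plant that has some risk.
ROSE_LIKE = conjunction(FLOWERING_PLANT, existential(HAS_RISK))
```

Under this toy interpretation `ROSE_LIKE` contains only `rose1`, and the inclusion of `ROSE_LIKE` in `FLOWERING_PLANT` is satisfied while the converse is not.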
Description Logic
[The Symbol / Interpretation table from the preceding “Sample Description Logic” slide is repeated here.]
Much study (DAML+OIL, OWL, CL, …)
But what of this????
Interpretation and OWL
OWL:AnnotationProperty
• … in OWL DL one cannot define subproperties or domain/range constraints for annotation properties
• Five annotation properties are predefined by OWL:
  owl:versionInfo
  rdfs:label
  rdfs:comment
  rdfs:seeAlso
  rdfs:isDefinedBy
A Rose in OWL?
<owl:Class rdf:ID="Rose">
  <rdfs:subClassOf rdf:resource="#FloweringPlant"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasRisk"/>
      <owl:someValuesFrom rdf:resource="#Thorn"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
A Rose in OWL?
<owl:Class rdf:ID="C">
  <rdfs:subClassOf rdf:resource="#A"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#R"/>
      <owl:someValuesFrom rdf:resource="#D"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
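To see what a machine actually gets from such a class definition, the fragment can be read with a plain XML parser. This is a minimal sketch using only Python's standard library. The wrapping `rdf:RDF` element, the namespace declarations and the helper name `describe_class` are additions made here so the fragment parses on its own, and the helper handles only `someValuesFrom` restrictions of the shape shown on the slide.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# The slide's class, wrapped in an rdf:RDF element (added here) so that
# the namespace prefixes are declared and the text is well-formed XML.
OWL_DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="Rose">
    <rdfs:subClassOf rdf:resource="#FloweringPlant"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasRisk"/>
        <owl:someValuesFrom rdf:resource="#Thorn"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""


def describe_class(xml_text, class_id):
    """Return (named parents, [(property, filler), ...]) for one owl:Class.

    Hypothetical helper: it only understands someValuesFrom restrictions.
    """
    root = ET.fromstring(xml_text)
    for cls in root.iter(f"{{{OWL}}}Class"):
        if cls.get(f"{{{RDF}}}ID") != class_id:
            continue
        parents, restrictions = [], []
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            target = sub.get(f"{{{RDF}}}resource")
            if target is not None:
                parents.append(target.lstrip("#"))
            for restriction in sub.findall(f"{{{OWL}}}Restriction"):
                prop = restriction.find(f"{{{OWL}}}onProperty")
                filler = restriction.find(f"{{{OWL}}}someValuesFrom")
                restrictions.append(
                    (prop.get(f"{{{RDF}}}resource").lstrip("#"),
                     filler.get(f"{{{RDF}}}resource").lstrip("#"))
                )
        return parents, restrictions
    return None


ROSE = describe_class(OWL_DOC, "Rose")
```

Everything the parser recovers is a set of identifiers and their arrangement. Renaming `Rose`, `FloweringPlant`, `hasRisk` and `Thorn` to `C`, `A`, `R` and `D` would give a result of exactly the same shape.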
The Communication Process
[Figure: the communication diagram. One party's Symbol is “rose”; the other party holds the symbols “floweringPlant”, “hasRisk”, “thorn” plus (+) the expression below.]
$\{x \in \Delta^I \mid \exists\, \mathit{thorn}.\,(x, \mathit{thorn}) \in \mathit{hasRisk}^I \wedge x \in \mathit{floweringPlant}^I\}$
The Communication Process
[Figure: the same diagram with opaque codes. One party's Symbol is “A101”; the other party holds the symbols “A102”, “R1”, “A103” plus (+) the expression below.]
$\{x \in \Delta^I \mid \exists\, A102.\,(x, A102) \in R1^I \wedge x \in A103^I\}$
$A101^I$ = “rose”
$A103^I$ = “flower”
$A102^I$ = “sharp spine”
$R1^I$ = “possible misfortune”
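The point of the opaque codes can be sketched in a few lines: the logical structure is carried by the codes alone, and it only becomes readable once a separate table of designations is supplied. The dictionary layout and the `render` helper are hypothetical, and the sketch assumes the code designated “flower” is A103, the code the formula uses for the parent class.

```python
# Designations: the lexical side. Assumes "flower" belongs to A103, the
# code the formula uses for the parent class.
DESIGNATIONS = {
    "A101": "rose",
    "A102": "sharp spine",
    "A103": "flower",
    "R1": "possible misfortune",
}

# Logical side: A101 is an A103 with some R1 relation to an A102.
# The dictionary layout is an illustrative assumption.
DEFINITION = {
    "code": "A101",
    "parents": ["A103"],
    "restrictions": [("R1", "A102")],
}


def render(definition, designations):
    """Spell out a definition, substituting designations where known."""
    def label(code):
        return designations.get(code, code)

    parts = [label(parent) for parent in definition["parents"]]
    parts += [f"{label(role)} some {label(filler)}"
              for role, filler in definition["restrictions"]]
    return f"{label(definition['code'])}: " + " and ".join(parts)


OPAQUE = render(DEFINITION, {})
READABLE = render(DEFINITION, DESIGNATIONS)
```

With an empty designation table the definition renders as bare codes; with the table it renders in words. The logic is identical in both cases.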
Definitions vs. Propositions
Is this:
1. The thing that is defined as a procedure that involves an excision of a structure of lobe of lung? (Axiom)
2. A statement saying “All procedures that involve an excision of the structure of lobe of lung are pulmonary lobectomy”? (Falsifiable proposition)
LexGrid Focus
• Definitional Aspects of Ontologies
• Making sure that the information (axioms) that are the basis of propositions are accurate, complete and reproducible
• Making sure that resulting propositions are verifiable – that the terms that come out match the terms that go in
(Reference) Ontologies
Reference ontologies are not designed to be nice - they are designed to be big, boring and true.
Barry Smith
LexGrid Goal
1) Combine:
• Lexical Semantics
  • Names
  • (Textual) Definitions
  • Comments
  • Other non-classification property
• Context
  • Languages and dialects
  • Communities and specialties
  • Localizations
• Logical Semantics
  • Roles and Relations
LexGrid Goal
2) Use these to integrate, reason about and report:
• Existing data & codes
  • Special contexts
  • Need formalization
• New information
  • New screens
  • Metadata
The LexGrid Goal
Terminology as a commodity resource
• Available whenever and wherever it is needed
  • Online or downloadable
  • Push or pull update mechanism
  • Available 24x7
• Revised and updated in “real-time”
• Cross-linked and indexed
The Heart of the Lexical Grid
The LexGrid Model - a model of terminology that:
1) Explicitly names and defines the things that the LexGrid tools need to reference explicitly
2) Represents “non-semantic” entities as name/value pairs
[Figure: a spectrum running from hyperNormalized to hyperSpecified.]
hyperNormalized Model
+ Incredibly flexible
- Doesn’t say a heck of a lot about a given domain.
• Specialization is possible
  • Entity: “Patient”
  • Attribute: “Name/String”
  • Relationship: hasName
• Many hyperNormalized models already exist
  ER1 / UML / SQL / …
[Figure: the hyperNormalized end of the spectrum.]
hyperSpecified Model
+ Incredibly precise – you know exactly what you’ve got
- Unwieldy and inflexible
- Difficult to understand
[Figure: the hyperSpecified end of the hyperNormalized to hyperSpecified spectrum.]
Modeling Pragmatics
• Make the differences that are important explicit
• Use terminology to carry the rest
The Heart of the Lexical Grid
The LexGrid Model - a formal model of terminology that:
1) Explicitly names and defines the entities and objects used in the LexGrid tooling
2) Supports as many “non-semantic” entities (from the toolkit perspective) as possible via name/value pairs
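The split between explicitly modelled entities and name/value pairs can be sketched as a small data structure. This is an illustration of the idea only: the class and field names (`Designation`, `CodedEntry`, `properties`) are invented here and are not taken from the LexGrid schema, and the sample values come from the cross-reference slide later in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class Designation:
    """A name for a coded entry, modelled explicitly because tools need it."""
    text: str
    language: str = "en"
    preferred: bool = False


@dataclass
class CodedEntry:
    """Hypothetical entry: explicit fields plus open-ended name/value pairs."""
    code: str
    designations: list[Designation] = field(default_factory=list)
    definitions: list[str] = field(default_factory=list)
    # Anything the tooling does not need to interpret rides along here.
    properties: dict[str, str] = field(default_factory=dict)

    def preferred_text(self, language="en"):
        """Return the preferred designation in a language, or None."""
        for designation in self.designations:
            if designation.language == language and designation.preferred:
                return designation.text
        return None


ENTRY = CodedEntry(
    code="C222",
    designations=[Designation("Alkylsulfonate Compound", preferred=True)],
    properties={"Semantic_Type": "SemNet:T123", "UMLS_CUI": "C0002072"},
)
```

Tools can rely on `code` and `designations` always meaning the same thing, while source-specific attributes such as `UMLS_CUI` are carried without the model having to know about them.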
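(Short Rave)
[Figure: Ogden's semiotic triangle (Thought or Reference, Symbol, Referent; Symbolises, Refers to, Stands for; “Rose”, “ClipArt”), with the labels Concept and Symbol added.]
C. K. Ogden and I. A. Richards. The Meaning of Meaning.
Concept, Symbol and Meaning
[Figure: model diagram (“cd Model”) with the classes Concept, Symbol and Meaning. Annotations: Human Being; Human / Symbol Interaction; The focus of LexGrid.]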
Concept vs. Symbol
A thing that is a flower and has thorns
Symbol
Symbolizes a concept
NOT a concept.
(short rave)
• Calling a symbol a concept in a model:
  • Confuses everyone
  • Makes a mess of the resulting model
• Everything is a concept
  • And (almost) everything is NOT in anyone’s database
• Symbols can be modeled, carried in databases, reasoned with, etc.
The LexGrid Model
• Source is currently maintained in XML Schema
• First incarnation was LDAP Schema
• (Semi) automatic transformations available to
  • Unified Modeling Language (UML)
  • XML Metadata Interchange (XMI)
  • Eclipse Modeling Framework (EMF)
  • Java
  • LDAP Schema
The LexGrid Node
• A LexGrid Node is software and a backing data store that represents terminological information in a format semantically faithful to the LexGrid Model
[Figure: a LexGrid Node backed by a Data Store.]
Functionality: Virtual Nodes
[Figure: four LexGrid Nodes, each with its own Data Store, at Mayo, Stanford, UCSF and NCI.]
Functionality: Virtual Nodes
• Virtual Node Toolkit
  • Create and load a local node
  • Publish in web space
  • Node is treated as part of the larger grid
Functionality: Replication / Update
[Figure: the NCI node and its Data Store record each Update in a Change Log. NCI Replica data stores at Mayo and Stanford, each with its own Change Log, Subscribe to the NCI node and receive changes by “Push” or “Pull”.]
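The replication scheme in the figure can be sketched with an ordered change log: the master appends every update to the log, pushes it to subscribers, and lets any other replica pull whatever it has not yet seen. This is an assumption-laden illustration, not the LexGrid implementation; the `Master` and `Replica` classes and their methods are invented for the example.

```python
class Replica:
    """Hypothetical replica: a local store plus the last log entry applied."""

    def __init__(self):
        self.store = {}
        self.last_seq = 0

    def apply(self, entries):
        """Apply log entries in order, skipping any already seen."""
        for seq, code, text in entries:
            if seq > self.last_seq:
                self.store[code] = text
                self.last_seq = seq

    def pull(self, master):
        """'Pull': ask the master for everything after our last entry."""
        self.apply(master.changes_since(self.last_seq))


class Master:
    """Hypothetical master node: data store, change log and subscribers."""

    def __init__(self):
        self.store = {}
        self.change_log = []   # ordered (sequence, code, text) entries
        self.subscribers = []

    def subscribe(self, replica):
        self.subscribers.append(replica)

    def changes_since(self, seq):
        return [entry for entry in self.change_log if entry[0] > seq]

    def update(self, code, text):
        """Record an update in the log, then 'push' it to subscribers."""
        self.store[code] = text
        self.change_log.append((len(self.change_log) + 1, code, text))
        for replica in self.subscribers:
            replica.apply(self.changes_since(replica.last_seq))


master = Master()
pushed, pulled = Replica(), Replica()
master.subscribe(pushed)                      # receives updates by push
master.update("C222", "Alkylsulfonate Compound")
state_before_pull = dict(pulled.store)        # still empty at this point
pulled.pull(master)                           # catches up on demand
```

Because both paths replay the same log, a pushed replica and a pulled replica end up with the same contents; they differ only in when they catch up.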
Functionality: Indices
[Figure: an Index Service and a Reasoning Service each Subscribe to the NCI node; every Update to the NCI Data Store is delivered to both by “Push”.]
Functionality: Cross References
[Figure: three data stores, NCI, UMLS and SemanticNET, linked by cross references.]
NCI entry:
  ConceptCode: C222
  entityDescription: Alkylsulfonate Compound
  Semantic_Type: SemNet:T123
  UMLS_CUI: C0002072
UMLS_CUI = URN:ISO:2.16.840.1.113883.6.56:C0002072
  C0002072 – “Alkanesulfonates”
Semantic_Type = URN:ISO:2.16.840.1.113883.6.56.1:T123
  T123 – “Biologically Active Substance”
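A cross reference of this form can be taken apart mechanically into the identifier of the target terminology and the code within it. The sketch below assumes only the `URN:ISO:<OID>:<code>` layout visible on the slide; the `CodeReference` type and `parse_reference` function are hypothetical names.

```python
from typing import NamedTuple


class CodeReference(NamedTuple):
    """A code qualified by the OID of the terminology it belongs to."""
    oid: str
    code: str


def parse_reference(urn):
    """Split 'URN:ISO:<OID>:<code>' into its OID and code parts."""
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0].upper() != "URN" or parts[1].upper() != "ISO":
        raise ValueError(f"not a URN:ISO reference: {urn!r}")
    return CodeReference(oid=parts[2], code=parts[3])


UMLS_CUI = parse_reference("URN:ISO:2.16.840.1.113883.6.56:C0002072")
SEMANTIC_TYPE = parse_reference("URN:ISO:2.16.840.1.113883.6.56.1:T123")
```

Keeping the OID separate from the code lets a node decide which data store to consult before it looks up the code itself.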
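LexGrid Components
[Figure: component diagram centred on a LexGrid Node with its Data Store and Services. Labels shown around it: Web Clients (Java, .NET, ...); Import; Export; Embed; Browse and Edit; Editors; Browsers; Query Tools; and the formats and tools OWL, RDF, XML, CSV, OBO, UMLS, SKOS, Protege, Protege (custom), ...]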
LexGrid Components
[Figure: the same component diagram (LexGrid Node, Data Store, Services, Web Clients for Java, .NET, ...; Editors; Browsers; Query Tools; OWL, RDF, XML, CSV), now with additional labels: Terminology; MMFIODM…; 20944; XMDR; RDF DB’s; SPARQL; Prolog..; Swoop; Protégé; DagEdit; XMDRp; SKOS; OWL; UMLS…; MDA.]
Different Data Forms, Same Information
A code in a table:
  PT#       Observation
  1110112   F
Tag/Value Pair:
  PT#       Tag      Value
  1110112   Gender   Female
Column Heading:
  PT#       Female
  1110112   TRUE
Table Name:
  Table 17: Female Patients
  PT#
  1110112
Free text:
  PT#       observation
  1110112   “…a middle-aged woman…”
Database Names:
  Female ResearchClinic
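The structured forms above can each be reduced to the same statement about patient 1110112. The following sketch shows one way to do that; the function names, the `(patient, tag, value)` triple, and the `M` to `Male` code mapping are assumptions made for the example, and the free-text and database-name forms are left out because they need more than a simple lookup.

```python
# Hypothetical code table for the "code in a table" form; the slide shows
# only F, so the M entry is an assumption.
GENDER_CODES = {"F": "Female", "M": "Male"}


def from_code_column(row):
    """A code in a table: the meaning sits in the coded value."""
    return (row["PT#"], "Gender", GENDER_CODES[row["Observation"]])


def from_tag_value(row):
    """Tag/value pair: the meaning sits in the tag and the value."""
    return (row["PT#"], row["Tag"], row["Value"])


def from_column_heading(row, heading):
    """Column heading: the meaning sits in the name of a boolean column."""
    return (row["PT#"], "Gender", heading) if row[heading] else None


def from_table_name(row, gender_of_table):
    """Table name: the meaning sits in which table the row appears in."""
    return (row["PT#"], "Gender", gender_of_table)


FACTS = [
    from_code_column({"PT#": "1110112", "Observation": "F"}),
    from_tag_value({"PT#": "1110112", "Tag": "Gender", "Value": "Female"}),
    from_column_heading({"PT#": "1110112", "Female": True}, "Female"),
    from_table_name({"PT#": "1110112"}, "Female"),
]
```

All four calls produce the same triple, which is the sense in which the forms carry the same information even though the terminology lives in a different place in each.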
Different Vocabulary, Same Information
Desired Granularity:
  Code     Designation
  F        Female
Too Coarse:
  Code     Designation
  123.17   Male or Female Adult
Coupled With Other Information:
  Code     Designation
  AA       17-44 Year Old Female with no signs of head injury
Too Fine:
  Code     Designation
  A17      XX
  A13      XX Mosaic
Terminology and the Information Model
Information Model
Terminology
Terminology and structure must be coordinated to achieve consistency and an integrated whole in HL7 standards.
LexGrid Collaborations
• NCI
  • LexBIG – LexGrid for caGRID
• National Center for Biomedical Ontology
  • LexBIO – LexGrid for NCBO
• Health Level Seven (HL7)
  • Tooling
• National Library of Medicine
• ISO JTC1/SC32 (NCITS-L8) - XMDR
Acknowledgements
This work was supported in part by a grant from the US National Library of Medicine: LM07319.