The Lexical Grid

Why LexGrid?

Existing medical records
• Various forms of coding and classification in use since the early 1500s
• ‘Modern’ records from the 1960s to the present include various forms of codes
• Medical records are still kept on a “per-institution basis”

Why LexGrid?

Emerging medical records

Multiple factors are forcing new levels of interoperability:
• Economic
• Regulatory
• Technical

Why LexGrid?

Bioinformatics
• Large volumes of information
• Large cross sections
• Detailed: what is important may not (and cannot) be anticipated
• Interoperability of
  • Medical (Phenomics)
  • Genomics
  • Environmental
  • GeoSpatial

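The GAP (in Western Medicine)

(Figure: a spectrum from “Terminologies” (coding and classification) to “Ontologies” (computable DL frameworks). Systems placed along it include ICD-9-CM, CPT-4, ICD-10-PCS, MeSH, SNOP, SNOMED-III, SNOMED CT, GO, FMA, ChEBI, MGED and GMOD, together with code sets for countries, languages and MIME types, with many, many more to come.)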

The purpose behind LexGrid

Communication

Language and the Communication Process

• Language - a “specification” that enables communication

• Semantics - the association between signs or symbols and their intended “meaning”

• Syntax - the rules for ordering and structuring the signs into phrases and sentences

• Pragmatics - the relationship between signs and symbols and the recipient. Broadly, the shared context.

Ogden’s Semiotic Triangle

(Figure: a Symbol symbolises a Thought or Reference; the Thought or Reference refers to a Referent; the Symbol stands for the Referent. After C. K. Ogden and I. A. Richards, The Meaning of Meaning.)

(The same triangle, with the symbols “Rose” and “ClipArt” at the Symbol vertex.)

The Communication Process

(Figure, built up over several slides: sender and receiver each have their own triangle of CONCEPT, Referent and Symbol. The message “I see a ClipArt image of a rose” carries the symbols “Rose” and “ClipArt” from one to the other. Successive builds label the symbol-to-concept links Semantics, the arrangement of symbols in the message Syntax, and the surroundings of each party Context, the overlap of which is their Shared Context.)
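To make the roles in the figure concrete, here is a minimal Python sketch that reduces each party's context to a plain symbol-to-concept mapping. The `Context` class, the `shared_context` and `communicate` helpers and the concept identifiers are hypothetical illustrations, not LexGrid code. In this toy model a symbol gets through only when both parties map it to the same concept, which is all that "shared context" amounts to here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Context:
    """One party's context, reduced to a symbol -> concept mapping (hypothetical)."""
    owner: str
    meanings: dict[str, str] = field(default_factory=dict)

    def encode(self, concept: str) -> str | None:
        # Concept -> symbol: the first symbol this party uses for the concept.
        return next((s for s, c in self.meanings.items() if c == concept), None)

    def decode(self, symbol: str) -> str | None:
        # Symbol -> concept, as this party understands it.
        return self.meanings.get(symbol)


def shared_context(a: Context, b: Context) -> set[str]:
    """Symbols that both parties map to the same concept."""
    return {s for s, c in a.meanings.items() if b.meanings.get(s) == c}


def communicate(sender: Context, receiver: Context,
                concepts: list[str]) -> list[tuple[str | None, bool]]:
    """Encode each concept with the sender's symbols; check what the receiver recovers."""
    results = []
    for concept in concepts:
        symbol = sender.encode(concept)
        understood = symbol is not None and receiver.decode(symbol) == concept
        results.append((symbol, understood))
    return results


alice = Context("alice", {"Rose": "ROSE", "ClipArt": "CLIPART_IMAGE"})
bob = Context("bob", {"Rose": "ROSE", "ClipArt": "CARTOON"})
```

In this toy exchange “ClipArt” fails because the receiver attaches a different concept to the same word, while “Rose” succeeds because it lies in the shared context.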

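Shared Context

Shared context impacts how much information can be contained in a symbol.

(Figure: information per symbol plotted against the degree of shared context, which runs from no shared context through a shared universe, shared sun, shared planet, shared species, common language, common culture, similar education and common profession to a common specialty; a minimum shared context is marked on the scale.)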

The impact of context on communication

Shared context:
• Allows information to be communicated in larger, more succinct “chunks”.
  • Drug, analgesic and NSAID are all “chunks”, yet differ markedly in conceptual complexity.
• Enables specialized symbol sets:
  • Contrast the amount of information contained in the formula E = mc² with that contained in this presentation...

Contextual Formalism

The degree of formality in a shared context can vary across a wide spectrum:

• Tacit context, which is simply presumed
• Contextual negotiation preceding the actual message
• Rigorous and formal rules and documents describing the form and possible meanings behind every message and phrase

Factors Affecting the Degree of Contextual Formalism

• Number of participating parties
  • Formalism needs to increase as the number of participants increases
• Geographic, cultural and temporal proximity of communicators
  • The further apart communicators are, the less they can assume
• Amount of shared context
  • The more you have, the more important it becomes to be organized

Factors Affecting the Degree of Contextual Formalism

• The cost of imprecise communication
  • Poetry and literature: low cost (some may argue actual gain)
  • Technical and professional: high to very high cost
  • What is the cost of assuming the units of a thrust specification?
  • What is the cost of assuming the dose of a prescription?
  • What is the cost of assuming the century in which the communication originated?

Common Forms of Contextual Formalism

• Dictionaries

• Thesauri

• Textbooks, college courses, etc.

• Operations manuals

• Data dictionaries

• Terminologies


Making Shared Context Explicit

(Figure: the same exchange, “I see a ClipArt image of a rose”, but the shared context between sender and receiver is now a formal shared context, embodied in the terminologies each party holds.)

Shared Context: Least Common Denominator

(Figure: reduce the shared context... and the symbol complexity must increase. With less shared context, “I see a ClipArt image of a rose” becomes “I see a ClipArt image of a red flower with ...”.)
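The least-common-denominator trade-off can be sketched as a rewrite: any symbol outside the receiver's vocabulary is replaced by its definition in simpler symbols, so the message grows as shared context shrinks. The `DEFINITIONS` table and the `express` function are hypothetical, and the sketch assumes the definitions contain no cycles.

```python
from __future__ import annotations

# Hypothetical definitions: each symbol spelled out in simpler symbols.
# Assumed acyclic; a cycle would make express() recurse forever.
DEFINITIONS: dict[str, list[str]] = {
    "rose": ["red", "flower", "with", "thorns"],
    "thorns": ["sharp", "spines"],
}


def express(message: list[str], vocabulary: set[str]) -> list[str] | None:
    """Rewrite `message` so that every symbol is in the receiver's vocabulary.

    Returns None when some symbol is neither shared nor definable in shared terms.
    """
    out: list[str] = []
    for symbol in message:
        if symbol in vocabulary:
            out.append(symbol)                 # shared: send as-is
        elif symbol in DEFINITIONS:
            expanded = express(DEFINITIONS[symbol], vocabulary)
            if expanded is None:
                return None
            out.extend(expanded)               # not shared: send its definition
        else:
            return None                        # no shared way to say it
    return out


MESSAGE = ["I", "see", "a", "ClipArt", "image", "of", "a", "rose"]
BASE = {"I", "see", "a", "ClipArt", "image", "of"}

rich = express(MESSAGE, BASE | {"rose"})
poor = express(MESSAGE, BASE | {"red", "flower", "with", "thorns"})
poorer = express(MESSAGE, BASE | {"red", "flower", "with", "sharp", "spines"})
```

With “rose” shared the message keeps its 8 symbols; each reduction in shared vocabulary lengthens it, and with too little shared context it cannot be said at all.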

Information vs. Symbol

(Figure: the exchange “I see a ClipArt image of a rose”, distinguishing the information conveyed by the message from the individual symbols it is built from.)

• Information: a predicate whose range is True/False/...
• Symbol: a predicate whose range is “Concept”

Ontologies serve (at least) two roles

Symbol - Definitional

• Concept -> Symbol

• Symbol -> Concept

• Symbol/Symbol translation

• Symbol validation, organization and mapping

• Are axioms – not verifiable

Information - Propositional

• Statements

• True/False/Unknown

• Convey information

• Are verifiable
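The two roles can be separated in a small Python sketch, under stated assumptions: the definitional side is a table linking (language, symbol) pairs to concept identifiers, used for lookup in both directions and for translation, while the propositional side evaluates statements to True, False or Unknown against recorded observations. All identifiers and data below (`C-ROSE`, `SYMBOLS`, `OBSERVATIONS` and so on) are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Definitional role: (language, symbol) -> concept identifier.
# Hypothetical data; these links are stipulated (axioms), not tested against the world.
SYMBOLS = {
    ("en", "rose"): "C-ROSE",
    ("fr", "rose"): "C-ROSE",
    ("en", "thorn"): "C-THORN",
    ("fr", "épine"): "C-THORN",
}


def concept_of(lang: str, symbol: str) -> str | None:
    """Symbol -> Concept."""
    return SYMBOLS.get((lang, symbol))


def symbols_for(concept: str, lang: str) -> list[str]:
    """Concept -> Symbol(s) in one language."""
    return sorted(s for (lg, s), c in SYMBOLS.items() if c == concept and lg == lang)


def translate(symbol: str, source: str, target: str) -> list[str]:
    """Symbol/symbol translation by way of the concept both symbols stand for."""
    concept = concept_of(source, symbol)
    return symbols_for(concept, target) if concept else []


# Propositional role: statements about the world are True, False or Unknown.
# Hypothetical observations, keyed by (subject, relation, object) concepts.
OBSERVATIONS = {("C-ROSE", "hasPart", "C-THORN"): True}


def evaluate(subject: str, relation: str, obj: str) -> Truth:
    value = OBSERVATIONS.get((subject, relation, obj))
    if value is None:
        return Truth.UNKNOWN  # nothing recorded either way
    return Truth.TRUE if value else Truth.FALSE
```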

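Sample Description Logic

$$
\begin{array}{ll}
\textbf{Symbol} & \textbf{Interpretation} \\
\hline
A,\ B & A^{\mathcal{I}},\ B^{\mathcal{I}} \\
C,\ D & C^{\mathcal{I}},\ D^{\mathcal{I}} \\
R & R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}} \\
\top & \Delta^{\mathcal{I}} \\
\bot & \emptyset \\
\neg A & \Delta^{\mathcal{I}} \setminus A^{\mathcal{I}} \\
C \sqcap D & C^{\mathcal{I}} \cap D^{\mathcal{I}} \\
\forall R.C & \{a \in \Delta^{\mathcal{I}} \mid \forall b.\,(a,b) \in R^{\mathcal{I}} \rightarrow b \in C^{\mathcal{I}}\} \\
\exists R.\top & \{a \in \Delta^{\mathcal{I}} \mid \exists b.\,(a,b) \in R^{\mathcal{I}}\}
\end{array}
$$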
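The semantics in the table can be executed directly over a finite interpretation. The sketch below is a hypothetical Python rendering, not part of LexGrid or of any DL reasoner: concept expressions are nested tuples, and `Interpretation.ext` computes the extension $C^{\mathcal{I}}$ by following the table row by row. It evaluates expressions in one given interpretation; it does not reason over all interpretations.

```python
from __future__ import annotations

from dataclasses import dataclass

# Concept expressions (hypothetical encoding):
#   "A"                atomic concept
#   ("top",)           ⊤
#   ("bottom",)        ⊥
#   ("not", "A")       ¬A (atomic negation only, as in the table)
#   ("and", C, D)      C ⊓ D
#   ("all", "R", C)    ∀R.C
#   ("some", "R")      ∃R.⊤


@dataclass
class Interpretation:
    domain: set[str]                        # Δ^I
    concepts: dict[str, set[str]]           # A^I ⊆ Δ^I
    roles: dict[str, set[tuple[str, str]]]  # R^I ⊆ Δ^I × Δ^I

    def ext(self, c) -> set[str]:
        """Return the extension C^I of concept expression c."""
        if isinstance(c, str):
            return set(self.concepts.get(c, set()))
        op = c[0]
        if op == "top":
            return set(self.domain)
        if op == "bottom":
            return set()
        if op == "not":
            if not isinstance(c[1], str):
                raise ValueError("only atomic negation is allowed")
            return self.domain - self.ext(c[1])
        if op == "and":
            return self.ext(c[1]) & self.ext(c[2])
        if op == "all":
            pairs, filler = self.roles.get(c[1], set()), self.ext(c[2])
            # a qualifies if every R-successor b of a is in C^I (vacuously true if none)
            return {a for a in self.domain
                    if all(b in filler for (x, b) in pairs if x == a)}
        if op == "some":
            pairs = self.roles.get(c[1], set())
            # a qualifies if it has at least one R-successor
            return {a for a in self.domain if any(x == a for (x, _) in pairs)}
        raise ValueError(f"unknown constructor: {op!r}")


interp = Interpretation(
    domain={"rose1", "lily1", "thorn1"},
    concepts={"FloweringPlant": {"rose1", "lily1"}, "Thorn": {"thorn1"}},
    roles={"hasRisk": {("rose1", "thorn1")}},
)
```

Note that ∀hasRisk.Thorn also holds of lily1 and thorn1, vacuously, because they have no hasRisk successors.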

Interpretations

• An interpretation $\mathcal{I}$ satisfies an inclusion $C \sqsubseteq D$ if $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$, and it satisfies an equality $C \equiv D$ if $C^{\mathcal{I}} = D^{\mathcal{I}}$.

• If $\mathcal{T}$ is a set of axioms, then $\mathcal{I}$ satisfies $\mathcal{T}$ iff $\mathcal{I}$ satisfies each element of $\mathcal{T}$.

• If $\mathcal{I}$ satisfies an axiom (resp. a set of axioms), then we say that it is a model of this axiom (resp. set of axioms).

• Two axioms or two sets of axioms are equivalent if they have the same models.
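These definitions can be checked mechanically when the domain is small and finite. The hypothetical sketch below handles only atomic concepts and ⊓, decides whether an interpretation satisfies $C \sqsubseteq D$ or $C \equiv D$, and compares two axiom sets by enumerating every interpretation over a fixed finite domain. Agreement there is a sanity check, not a proof of equivalence, since models over other domains are not examined.

```python
from __future__ import annotations

from itertools import product

# Hypothetical minimal language: atomic names and ("and", C, D).
# An interpretation maps each atomic name to a frozenset of domain elements.


def ext(c, interp):
    """Extension C^I of a concept expression."""
    if isinstance(c, str):
        return interp[c]
    if c[0] == "and":
        return ext(c[1], interp) & ext(c[2], interp)
    raise ValueError(f"unsupported expression: {c!r}")


def satisfies(interp, axiom) -> bool:
    kind, c, d = axiom
    if kind == "sub":                       # C ⊑ D holds iff C^I ⊆ D^I
        return ext(c, interp) <= ext(d, interp)
    if kind == "eq":                        # C ≡ D holds iff C^I = D^I
        return ext(c, interp) == ext(d, interp)
    raise ValueError(f"unknown axiom kind: {kind!r}")


def is_model(interp, axioms) -> bool:
    """I satisfies a set of axioms iff it satisfies each of them."""
    return all(satisfies(interp, ax) for ax in axioms)


def subsets(domain):
    items = sorted(domain)
    for keep in product((False, True), repeat=len(items)):
        yield frozenset(x for x, k in zip(items, keep) if k)


def interpretations(domain, names):
    """Every assignment of a subset of the domain to each atomic name."""
    choices = list(subsets(domain))
    for exts in product(choices, repeat=len(names)):
        yield dict(zip(names, exts))


def same_models(t1, t2, domain, names) -> bool:
    """True if t1 and t2 have exactly the same models over this finite domain."""
    return all(is_model(i, t1) == is_model(i, t2)
               for i in interpretations(domain, names))


T1 = [("sub", "A", "B")]               # A ⊑ B
T2 = [("eq", "A", ("and", "A", "B"))]  # A ≡ A ⊓ B
T3 = [("sub", "B", "A")]               # B ⊑ A
```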

Description Logic

The interpretation column of the table above has been the object of much study (DAML+OIL, OWL, CL, …).

But what of the symbols themselves?

Interpretation and OWL

owl:AnnotationProperty
• … in OWL DL one cannot define subproperties or domain/range constraints for annotation properties
• Five annotation properties are predefined by OWL:
  • owl:versionInfo
  • rdfs:label
  • rdfs:comment
  • rdfs:seeAlso
  • rdfs:isDefinedBy

A Rose in OWL?

    <owl:Class rdf:ID="Rose">
      <rdfs:subClassOf rdf:resource="#FloweringPlant"/>
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#hasRisk"/>
          <owl:someValuesFrom rdf:resource="#Thorn"/>
        </owl:Restriction>
      </rdfs:subClassOf>
    </owl:Class>

A Rose in OWL?

    <owl:Class rdf:ID="C">
      <rdfs:subClassOf rdf:resource="#A"/>
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#R"/>
          <owl:someValuesFrom rdf:resource="#D"/>
        </owl:Restriction>
      </rdfs:subClassOf>
    </owl:Class>
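A small Python sketch can make the point of these two slides concrete: it reads the rose fragment with the standard library's `xml.etree.ElementTree`, renders it in DL notation, and shows that swapping the meaningful names for opaque ones leaves the logical structure unchanged. The namespace wrapper and the `to_dl` and `rename` helpers are hypothetical, and only the two patterns on these slides (a named superclass and an `owl:someValuesFrom` restriction) are handled.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

ROSE = """
<owl:Class rdf:ID="Rose">
  <rdfs:subClassOf rdf:resource="#FloweringPlant"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasRisk"/>
      <owl:someValuesFrom rdf:resource="#Thorn"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
"""

# The second slide: identical structure, opaque names.
NAMES = {"Rose": "C", "FloweringPlant": "A", "hasRisk": "R", "Thorn": "D"}
CODED = ROSE
for old, new in NAMES.items():
    CODED = CODED.replace(f'"{old}"', f'"{new}"').replace(f'"#{old}"', f'"#{new}"')


def q(ns: str, local: str) -> str:
    """ElementTree's {namespace}local form of a qualified name."""
    return f"{{{ns}}}{local}"


def ref(elem: ET.Element) -> str:
    """Local name from an rdf:resource="#Name" attribute."""
    return elem.get(q(RDF, "resource"), "").lstrip("#")


def to_dl(fragment: str) -> str:
    """Render each owl:Class as 'Name ⊑ Super ⊓ ∃prop.Filler' (slide patterns only)."""
    wrapped = (f'<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" '
               f'xmlns:owl="{OWL}">{fragment}</rdf:RDF>')
    root = ET.fromstring(wrapped)
    lines = []
    for cls in root.iter(q(OWL, "Class")):
        parts = []
        for sup in cls.findall(q(RDFS, "subClassOf")):
            restriction = sup.find(q(OWL, "Restriction"))
            if restriction is None:
                parts.append(ref(sup))  # named superclass
            else:
                prop = ref(restriction.find(q(OWL, "onProperty")))
                filler = ref(restriction.find(q(OWL, "someValuesFrom")))
                parts.append(f"∃{prop}.{filler}")
        lines.append(f"{cls.get(q(RDF, 'ID'))} ⊑ " + " ⊓ ".join(parts))
    return "\n".join(lines)


def rename(expr: str, mapping: dict[str, str]) -> str:
    """Substitute whole identifiers; the logical structure is untouched."""
    return re.sub(r"\w+", lambda m: mapping.get(m.group(0), m.group(0)), expr)
```

Here `to_dl(ROSE)` gives `Rose ⊑ FloweringPlant ⊓ ∃hasRisk.Thorn` and `to_dl(CODED)` gives `C ⊑ A ⊓ ∃R.D`: nothing in the logic distinguishes “Rose” from “C”.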

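The Communication Process

(Figure: on one side the concept is carried by the single symbol “rose”; on the other, by the symbols “floweringPlant”, “hasRisk” and “thorn”, combined in a formal definition and interpreted within each party's context:)

$$\{x \in \Delta^{\mathcal{I}} \mid \exists t.\,(x, t) \in \mathit{hasRisk}^{\mathcal{I}} \wedge t \in \mathit{thorn}^{\mathcal{I}} \wedge x \in \mathit{floweringPlant}^{\mathcal{I}}\}$$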

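The Communication Process

(Figure: the same exchange with the symbols replaced by codes: “A101” on one side; “A102”, “R1” and “A103” on the other.)

$$\{x \in \Delta^{\mathcal{I}} \mid \exists t.\,(x, t) \in \mathit{R1}^{\mathcal{I}} \wedge t \in \mathit{A102}^{\mathcal{I}} \wedge x \in \mathit{A103}^{\mathcal{I}}\}$$

$\mathit{A101}^{\mathcal{I}}$ = “rose”

$\mathit{A103}^{\mathcal{I}}$ = “flower”

$\mathit{A102}^{\mathcal{I}}$ = “sharp spine”

$\mathit{R1}^{\mathcal{I}}$ = “possible misfortune”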

Definitions vs. Propositions

Is this:
1. The thing that is defined as a procedure that involves an excision of a structure of lobe of lung? (Axiom)
2. A statement saying “All procedures that involve an excision of the structure of lobe of lung are pulmonary lobectomy”? (Falsifiable proposition)

LexGrid Focus

• Definitional Aspects of Ontologies

• Making sure that the information (axioms) on which propositions are based is accurate, complete and reproducible

• Making sure that resulting propositions are verifiable: that the terms that come out match the terms that go in

(Reference) Ontologies

Reference ontologies are not designed to be nice - they are designed to be big, boring and true.

Barry Smith

LexGrid Goal

1) Combine:
• Lexical Semantics
  • Names
  • (Textual) Definitions
  • Comments
  • Other non-classification properties
• Context
  • Languages and dialects
  • Communities and specialties
  • Localizations
• Logical Semantics
  • Roles and Relations

LexGrid Goal

2) Use these to integrate, reason about and report:
• Existing data & codes
  • Special contexts
  • Need formalization
• New information
  • New screens
  • Metadata

The LexGrid Goal

Terminology as a commodity resource
• Available whenever and wherever it is needed
  • Online or downloadable
  • Push or pull update mechanism
  • Available 24x7
• Revised and updated in “real-time”
• Cross-linked and indexed

LexGrid Three-Pronged Approach

LexGrid Model

The Heart of the Lexical Grid

The LexGrid Model: a model of terminology that
1) Explicitly names and defines the things that the LexGrid tools need to reference explicitly
2) Represents “non-semantic” entities as name/value pairs

Modeling Extremes

(Figure: a spectrum running from hyperNormalized to hyperSpecified models.)

hyperNormalized Model
+ Incredibly flexible
- Doesn’t say a heck of a lot about a given domain
• Specialization is possible:
  • Entity: “Patient”
  • Attribute: “Name/String”
  • Relationship: hasName
• Many hyperNormalized models already exist: ER / UML / SQL / …

hyperSpecified Model
+ Incredibly precise: you know exactly what you’ve got
- Unwieldy and inflexible
- Difficult to understand
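The two extremes can be contrasted in a few lines of Python. The triples and the `Patient` class below are hypothetical illustrations (the patient number is borrowed from a later slide; the name is invented): the generic triple list accepts any domain, including a misspelled attribute, while the hyperSpecified class knows exactly what a patient has and rejects anything else.

```python
from __future__ import annotations

from dataclasses import dataclass

# hyperNormalized: one generic (entity, attribute/relationship, value) structure
# can hold any domain, much like an entity-attribute-value table in SQL.
Triple = tuple[str, str, str]

facts: list[Triple] = [
    ("pt:1110112", "isA", "Patient"),
    ("pt:1110112", "hasName", "Jane Doe"),
    ("pt:1110112", "hasNmae", "J. Doe"),  # misspelled attribute: accepted silently
]


def lookup(triples: list[Triple], entity: str, attribute: str) -> list[str]:
    """All values recorded for an entity/attribute pair."""
    return [v for (e, a, v) in triples if e == entity and a == attribute]


# hyperSpecified: the domain is written into the model itself.
@dataclass(frozen=True)
class Patient:
    patient_number: str
    name: str


jane = Patient(patient_number="1110112", name="Jane Doe")
```

Constructing `Patient(patient_number="1110112", nmae="J. Doe")` raises a `TypeError`, whereas the triple store happily records the typo.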

Modeling Pragmatics

• Make the differences that are important explicit

• Use terminology to carry the rest

The Heart of the Lexical Grid

The LexGrid Model: a formal model of terminology that
1) Explicitly names and defines the entities and objects used in the LexGrid tooling
2) Supports as many “non-semantic” entities (from the toolkit perspective) as possible via name/value pairs
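As a rough illustration of the two clauses, here is a hypothetical Python rendering (it is not the LexGrid XML Schema): the coding scheme, code and description are named fields because tools must address them, while everything else, such as the semantic type and UMLS identifier from the cross-reference example later in the deck, is carried as name/value pairs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Property:
    """A name/value pair that the toolkit stores and returns but does not interpret."""
    name: str
    value: str


@dataclass
class Entity:
    """Hypothetical LexGrid-style entry (not the actual LexGrid schema).

    Fields the tooling must reference explicitly are named attributes;
    everything else travels as name/value pairs.
    """
    coding_scheme: str
    code: str
    description: str
    properties: list[Property] = field(default_factory=list)

    def values(self, name: str) -> list[str]:
        """Every value of a named property, in order (names may repeat)."""
        return [p.value for p in self.properties if p.name == name]


# Values from the cross-reference example later in this deck.
c222 = Entity(
    coding_scheme="NCI",
    code="C222",
    description="Alkylsulfonate Compound",
    properties=[
        Property("Semantic_Type", "SemNet:T123"),
        Property("UMLS_CUI", "C0002072"),
    ],
)
```

A new kind of property needs no change to the `Entity` class, which is the flexibility the second clause buys; the price is that the class itself says nothing about what `UMLS_CUI` means.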

The LexGrid Model

(Figure: diagram of the model, with the labels “Interpretation” and “Computation”.)

The LexGrid Model

(Short Rave) This is not a model of a concept!!!

It is a model of a symbol!!!

(Short Rave)

(Figure: Ogden’s Semiotic Triangle again, after C. K. Ogden and I. A. Richards, The Meaning of Meaning, with the Thought or Reference vertex labelled Concept and the symbols “Rose”, “ClipArt” labelled Symbol.)

Concept, Symbol and Meaning

(Figure: a class diagram, “cd Model”, relating Human Being, Concept, Symbol and Meaning, with the Human / Symbol interaction marked as the focus of LexGrid.)

Concept vs. Symbol

Concept: a thing that is a flower and has thorns

Symbol: symbolizes a concept; it is NOT a concept.

(short rave)

• Calling a symbol a concept in a model:
  • Confuses everyone
  • Makes a mess of the resulting model
• Everything is a concept
  • And (almost) everything is NOT in anyone’s database
• Symbols can be modeled, carried in databases, reasoned with, etc.

(end rave)

The LexGrid Model

• Source is currently maintained in XML Schema

• First incarnation was LDAP Schema

• (Semi-)automatic transformations are available to
  • Unified Modeling Language (UML)
  • XML Metadata Interchange (XMI)
  • Eclipse Modeling Framework (EMF)
  • Java
  • LDAP Schema

The LexGrid Node

• A LexGrid Node is software and a backing data store that represents terminological information in a format semantically faithful to the LexGrid Model

(Figure: a LexGrid Node, backed by its data store.)

Functionality: Virtual Nodes

(Figure: LexGrid nodes, each with its own data store, at Mayo, Stanford, UCSF and NCI.)

• Virtual Node Toolkit
  • Create and load a local node
  • Publish in web space
  • Node is treated as part of the larger grid

Functionality: Virtual Nodes, Cross-Node Search

(Figure: a search spanning nodes that hold ICD-9, FMA and MeSH.)
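A cross-node search fans one query out to every node and merges the answers. The Python sketch below assumes, hypothetically, that each node exposes a simple substring `search` over its own store; the `Node` class, the sites and the codes are made up for illustration, and a real grid would make remote calls rather than local method calls.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

Hit = tuple[str, str, str]  # (coding scheme, code, designation)


@dataclass
class Node:
    """Hypothetical stand-in for a LexGrid node: a local store for one scheme."""
    site: str
    scheme: str
    store: dict[str, str]  # code -> designation

    def search(self, text: str) -> list[Hit]:
        """Case-insensitive substring match against designations."""
        needle = text.lower()
        return [(self.scheme, code, term)
                for code, term in self.store.items() if needle in term.lower()]


def cross_node_search(nodes: list[Node], text: str) -> list[Hit]:
    """Query every node concurrently and merge the hits into one sorted list."""
    with ThreadPoolExecutor() as pool:
        per_node = list(pool.map(lambda node: node.search(text), nodes))
    return sorted(hit for hits in per_node for hit in hits)


# Made-up codes and terms; real nodes would hold ICD-9, FMA and MeSH content.
grid = [
    Node("site-a", "ICD-9", {"demo-1": "Disease of lung"}),
    Node("site-b", "FMA", {"demo-2": "Lung", "demo-3": "Heart"}),
    Node("site-c", "MeSH", {"demo-4": "Lung Neoplasms"}),
]
```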

Functionality: Replication / Update

(Figure: updates are made to the NCI data store at NCI; NCI replica data stores at Mayo and Stanford subscribe to it, and each store keeps a change log. Changes propagate either by “push” or by “pull”.)
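The push and pull paths can share one change log. In this hypothetical Python sketch, a master store numbers each change, pushes it to subscribed replicas as it happens, and lets other replicas pull everything after the last version they hold; a retired code is recorded as a change with no designation. The class names, codes and versioning scheme are assumptions, not LexGrid's replication protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    version: int
    code: str
    designation: str | None  # None records that the code was retired


@dataclass
class Store:
    data: dict[str, str] = field(default_factory=dict)
    version: int = 0

    def apply(self, change: Change) -> None:
        if change.version <= self.version:
            return  # already applied, so replaying a change is harmless
        if change.version != self.version + 1:
            raise ValueError("missing earlier changes; pull first")
        if change.designation is None:
            self.data.pop(change.code, None)
        else:
            self.data[change.code] = change.designation
        self.version = change.version


@dataclass
class Master(Store):
    log: list[Change] = field(default_factory=list)
    subscribers: list[Store] = field(default_factory=list)

    def update(self, code: str, designation: str | None) -> None:
        change = Change(self.version + 1, code, designation)
        self.log.append(change)
        self.apply(change)
        for replica in self.subscribers:  # "push"
            replica.apply(change)

    def changes_since(self, version: int) -> list[Change]:
        return [c for c in self.log if c.version > version]


def pull(replica: Store, master: Master) -> None:
    """'Pull': fetch and apply everything after the replica's current version."""
    for change in master.changes_since(replica.version):
        replica.apply(change)


nci, mayo, stanford = Master(), Store(), Store()
nci.subscribers.append(mayo)          # Mayo receives pushes
nci.update("C1", "First term")
nci.update("C2", "Second term")
nci.update("C1", None)                # retire C1
pull(stanford, nci)                   # Stanford catches up on demand
```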

Functionality: Indices

(Figure: an Index Service and a Reasoning Service subscribe to updates of the NCI data store, which are “pushed” to them.)

Functionality: Cross References

(Figure: an entry in the NCI data store carries cross-references into the UMLS and SemanticNET data stores.)

ConceptCode: C222
entityDescription: Alkylsulfonate Compound
Semantic_Type: SemNet:T123
UMLS_CUI: C0002072

UMLS_CUI = URN:ISO:2.16.840.1.113883.6.56:C0002072 (C0002072: “Alkanesulfonates”)

Semantic_Type = URN:ISO:2.16.840.1.113883.6.56.1:T123 (T123: “Biologically Active Substance”)
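Resolving such a cross-reference amounts to splitting the URN into an OID, which identifies the target code system, and a code within it, then looking the pair up in that system. The Python sketch below assumes, hypothetically, that the URN always has the form `URN:ISO:<oid>:<code>` and uses a local table holding only the two targets shown on the slide; a real node would query the target data store instead.

```python
from __future__ import annotations

# Local lookup table holding only the two targets on the slide, keyed by (OID, code).
TERMS: dict[tuple[str, str], str] = {
    ("2.16.840.1.113883.6.56", "C0002072"): "Alkanesulfonates",
    ("2.16.840.1.113883.6.56.1", "T123"): "Biologically Active Substance",
}


def parse_urn(urn: str) -> tuple[str, str]:
    """Split 'URN:ISO:<oid>:<code>' into (oid, code)."""
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0].upper() != "URN" or parts[1].upper() != "ISO":
        raise ValueError(f"not a URN:ISO identifier: {urn!r}")
    return parts[2], parts[3]


def resolve(urn: str) -> str | None:
    """Follow a cross-reference to its designation, if the target is known."""
    return TERMS.get(parse_urn(urn))


# The NCI entry from the slide, with its cross-references written as URNs.
c222 = {
    "ConceptCode": "C222",
    "entityDescription": "Alkylsulfonate Compound",
    "UMLS_CUI": "URN:ISO:2.16.840.1.113883.6.56:C0002072",
    "Semantic_Type": "URN:ISO:2.16.840.1.113883.6.56.1:T123",
}
```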

Functionality: Node Directory

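LexGrid Components

(Figure: a LexGrid Node and its data store, exposed through services to web, Java and .NET clients (...) and to editors, browsers and query tools. Content is imported from OWL, RDF/XML, CSV, OBO, UMLS, SKOS, ...; browsed and edited through OWL tooling, Protégé and a custom Protégé; and exported or embedded elsewhere.)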

LexGrid Components

(Figure: the same node, services, clients, editors, browsers and query tools, now set among a wider landscape of terminology and metadata work: MMF, IODM…, 20944, XMDR, RDF DBs, SPARQL, Prolog, …, Swoop, Protégé, DagEdit, XMDRp, SKOS, OWL, UMLS…, and MDA.)

LexGrid and Metadata

Different Data Forms, Same Information

A code in a table:
  PT#       Observation
  1110112   F

Tag/Value Pair:
  PT#       Tag      Value
  1110112   Gender   Female

Column Heading:
  PT#       Female
  1110112   TRUE

Table Name (“Table 17: Female Patients”):
  PT#
  1110112

Free text:
  PT#       observation
  1110112   “…a middle-aged woman…”

Database Names: Female Research Clinic
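One way to see that the table-shaped forms above carry the same information is to normalize each into a single (patient, property, value) fact. In this hypothetical Python sketch, small metadata tables record what a code, a column heading and a table name mean; the free-text and database-name forms are left out because recovering them takes more than a lookup. All names here are illustrative assumptions.

```python
from __future__ import annotations

Fact = tuple[str, str, str]  # (patient number, property, value)

# Hypothetical metadata saying what each local form means.
OBSERVATION_CODES = {"F": ("Gender", "Female")}
COLUMN_MEANINGS = {"Female": ("Gender", "Female")}
TABLE_MEANINGS = {"Table 17: Female Patients": ("Gender", "Female")}


def from_code_table(rows: list[tuple[str, str]]) -> set[Fact]:
    """PT# | Observation, where the observation is a code."""
    return {(pt, *OBSERVATION_CODES[code]) for pt, code in rows
            if code in OBSERVATION_CODES}


def from_tag_value(rows: list[tuple[str, str, str]]) -> set[Fact]:
    """PT# | Tag | Value: already in the target shape."""
    return set(rows)


def from_column_heading(column: str, rows: list[tuple[str, str]]) -> set[Fact]:
    """PT# | <column>, where TRUE means the heading applies to the patient."""
    prop, value = COLUMN_MEANINGS[column]
    return {(pt, prop, value) for pt, flag in rows if flag == "TRUE"}


def from_table_name(table: str, rows: list[tuple[str]]) -> set[Fact]:
    """Membership in the table is itself the statement."""
    prop, value = TABLE_MEANINGS[table]
    return {(pt, prop, value) for (pt,) in rows}
```

Each of the four forms on the slide reduces to the same fact, `("1110112", "Gender", "Female")`, once the metadata is available.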

Different Vocabulary, Same Information

Desired granularity:
  Code     Designation
  F        Female

Too coarse:
  Code     Designation
  123.17   Male or Female Adult

Coupled with other information:
  Code     Designation
  AA       17-44 Year Old Female with no signs of head injury

Too fine:
  Code     Designation
  A17      XX
  A13      XX Mosaic

Terminology and the Information Model

(Figure: an information model and a terminology, with a question mark over how they relate.)

Terminology and structure must be coordinated to achieve consistency and an integrated whole in HL7 standards.

Active Application Work

• SNOMED CMWG

• HL7 Terminfo

LexGrid Collaborations

• NCI
  • LexBIG: LexGrid for caGrid
• National Center for Biomedical Ontology
  • LexBIO: LexGrid for NCBO
• Health Level Seven (HL7)
  • Tooling
• National Library of Medicine
• ISO JTC1/SC32 (NCITS-L8): XMDR

Acknowledgements

This work was supported in part by a grant from the US National Library of Medicine: LM07319.