Accessing Tacit Knowledge and Linking it to the Peer-Reviewed Literature

Post on 30-Dec-2015


Michael Shepherd

Web Information Filtering Lab

Faculty of Computer Science

Dalhousie University

Research Team

• Students
  – Qiufen Qiu (MD and MCS)
  – Zhixin Chen (MHI and BSc)

• Computer Science Faculty
  – Michael Shepherd
  – Qigang Gao
  – Syed Sibte Raza Abidi

• Anaesthesia & Psychology
  – G. Allen Finley

Overview

• Introduction

• Research Program

• Results to Date

• Summary

Pediatric Pain Discussion List

• Clinical discussion on pediatric pain

• Informal email-based discussion among professionals

• Initiated in 1993

• Over 700 subscribers world-wide

• More than 10,000 messages

Date: Wed, 04 Jan 1995 16:54:48 -0500 (EST)
From: poster
Subject: opioids and meningitis

X is a 13 month (9.8kg) old boy suffering from acute meningitis (pneumococcus) treated with IV cefotaxime; at day three, I have been called as pediatric pain consultant to assess X; I have discovered an extreme painful state: one could not handle or touch him without producing screaming. The child was unable to move spontaneously; he looked paralysed by pain and hypertonia; he also presented a neurological complication: ptosis at the right side. The pain treatment was IV acetaminophen. The first day I have prescribed IV Nalbuphine (weak opioid μ antagonist and agonist) 11mg/24h after a loading dose of 1.4 mg; pain at rest has been successfully relieved but not the mobilisation pain; the dose has been increased at 14 mg/day without relieving the pain associated with moving; he has moved spontaneously limbs 2 days later; nalbuphine has been stopped 4 days later. Neurological examination and CT scan have been still normal (except ptosis) during this period. No opioid's side effects have been observed.

What do you think of this case?

Have you any experience with opioids and acute meningitis?

Dr Poster, Pediatric pain unit, Poster Hospital

Date: Wed, 04 Jan 1995 17:27:25 -0500 (EST)
From: first reply
Subject: re: opioids and meningitis

Is there any periosteal involvement? If so an NSAID (ibuprofen or naproxen) may be much more effective than even opioid.

-------------------------

Date: Wed, 04 Jan 1995 19:06:32 -0400
From: second reply
Subject: Re: opioids and meningitis

Poster writes:
> X is a 13 month (9.8kg) old boy suffering from acute meningitis...
> extreme painful state: one could not handle or touch him without
> producing screaming....
> The first day I have prescribed IV Nalbuphine ...
> successfully relieved but not the mobilisation pain;...
> has moved spontaneously limbs 2 days later; nalbuphine has been stopped 4
> days later. Neurological examination and CT scan have been still normal...

I have used IV morphine for similar severe meningitis pain, with success. I wouldn't hesitate to use a pure opioid agonist (in conjunction with acetaminophen, NSAID, and/or tricyclics). However, it sounds like you have the situation under control.

Second Reply, Associate Professor, Dept and University

-------------------------

Date: Thu, 05 Jan 1995 18:58:32 -0800 (PST)
From: Third Reply
Subject: Re: opioids and meningitis

I wonder if the problem is not due to severe arachnoiditis that is secondary to the inflammation. I would suggest a trial of steroids in this patient, perhaps in combination with a benzodiazepine to reduce the spasm. Narcotics may reduce the pain but I would not like to keep X on them for too long.

Good luck

Third Reply

-------------------------  

Tacit and Explicit Knowledge

• Tacit knowledge is what the knower knows and is derived from experience

• Explicit knowledge is represented by some artifact such as a document or journal article

[Figure: Knowledge Transformation Processes. A cycle linking Tacit Knowledge and Explicit Knowledge through Socialization, Externalization, Combination and Internalization.]

Knowledge Transformation Processes

Socialization
  – Tacit to tacit
  – Face to face meetings
  – Synchronous

Externalization
  – Tacit to explicit
  – Respond to question
  – List-server discussion
  – Asynchronous

Internalization
  – Explicit to tacit
  – Access to organized explicit knowledge
  – Structure helps user internalize knowledge

Combination
  – Explicit to explicit
  – Organization into categories
  – Reflects knowledge of domain

Research Questions

1. Externalization: How can we capture the tacit knowledge in such a discussion list and transform it into explicit knowledge?

2. Combination: How can we organize this explicit knowledge?

3. Internalization: How do we provide access to this explicit knowledge so that users can internalize this knowledge?

4. Linking Tacit Knowledge to Best Evidence: How do we map this transformed tacit knowledge to the appropriate best evidence literature?

Mapping Tacit Knowledge to Explicit Knowledge in Medical Literature

[Figure: pipeline diagram with the elements PPML, Data Cleaning, Thread Creation, Thread Clusters, MeSH Terminology Map, PubMed Articles and Access, arranged under the stages Externalization, Combination, Internalization and Linking.]

Data Cleaning

• Remove duplicate messages (subject & time stamp)

• Remove responses that were generated automatically by “vacation” mail programs

• Remove other “junk” e-mails

• Remove unnecessary content from the messages themselves: non-textual material such as images, which would not be used in the clustering process, and included original messages more than ten lines long, which would skew the clustering process.

• The initial stage of this cleaning was done manually until patterns were recognized and then programs were written to clean the data based on these patterns.
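The cleaning steps above can be sketched roughly as follows. This is only an illustration, not the programs the team wrote: the message fields (`subject`, `date`, `body`), the auto-reply pattern and the use of a leading ">" to detect included originals are all assumptions.

```python
import re

# Hypothetical pattern for "vacation" auto-replies; the slides do not list the real ones.
AUTO_REPLY = re.compile(r"(out of (the )?office|vacation|auto[- ]?reply)", re.IGNORECASE)

def strip_long_quotes(body, max_quoted_lines=10):
    """Drop runs of quoted lines ('>' prefix) that are longer than max_quoted_lines."""
    kept, run = [], []
    for line in body.splitlines() + [None]:      # the trailing None flushes the last run
        if line is not None and line.lstrip().startswith(">"):
            run.append(line)
            continue
        if len(run) <= max_quoted_lines:
            kept.extend(run)                     # short quotes are left in place
        run = []
        if line is not None:
            kept.append(line)
    return "\n".join(kept)

def clean(messages):
    """Remove duplicates (subject and time stamp) and auto-replies, then trim long quotes."""
    seen, cleaned = set(), []
    for msg in messages:
        key = (msg["subject"].strip().lower(), msg["date"])
        if key in seen or AUTO_REPLY.search(msg["subject"]):
            continue
        seen.add(key)
        cleaned.append({**msg, "body": strip_long_quotes(msg["body"])})
    return cleaned

messages = [
    {"subject": "opioids and meningitis", "date": "1995-01-04 16:54:48", "body": "What do you think?"},
    {"subject": "opioids and meningitis", "date": "1995-01-04 16:54:48", "body": "What do you think?"},
    {"subject": "Vacation auto-reply", "date": "1995-01-05 09:00:00", "body": "I am away."},
    {"subject": "re: opioids and meningitis", "date": "1995-01-04 17:27:25",
     "body": "\n".join("> quoted line %d" % i for i in range(12)) + "\nAn NSAID may help."},
]
cleaned = clean(messages)
print(len(cleaned), "messages kept")
```

In this toy input the exact duplicate and the auto-reply are dropped, and the twelve-line included original is stripped from the reply.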

Externalization: Creating Threads

• Messages were threaded based on time stamps and subject headings.

• Those messages that had a blank subject field were processed based on the included original messages to which they had replied.
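A minimal sketch of threading by subject heading and time stamp is given below. It assumes each message is a dict with `subject`, `date` and `sender` fields and that dates have already been normalised to UTC ISO strings; the function names are hypothetical. The fallback for blank subject fields (matching on the included original message) is not shown.

```python
import re
from collections import defaultdict

def normalise_subject(subject):
    """Strip leading 're:' / 'fwd:' markers and lower-case the subject heading."""
    stripped = re.sub(r"^((re|fwd?)\s*:\s*)+", "", subject.strip(), flags=re.IGNORECASE)
    return stripped.lower()

def build_threads(messages):
    """Group messages by normalised subject, ordered by time stamp within a thread."""
    threads = defaultdict(list)
    for msg in sorted(messages, key=lambda m: m["date"]):
        threads[normalise_subject(msg["subject"])].append(msg)
    return dict(threads)

# The four messages of the example discussion, deliberately out of order.
messages = [
    {"sender": "third reply", "date": "1995-01-06T02:58:32", "subject": "Re: opioids and meningitis"},
    {"sender": "poster", "date": "1995-01-04T21:54:48", "subject": "opioids and meningitis"},
    {"sender": "second reply", "date": "1995-01-04T23:06:32", "subject": "Re: opioids and meningitis"},
    {"sender": "first reply", "date": "1995-01-04T22:27:25", "subject": "re: opioids and meningitis"},
]
threads = build_threads(messages)
for subject, msgs in threads.items():
    print(subject, [m["sender"] for m in msgs])
```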

Thread Representation

• Each thread is treated as though it were a contiguous document

• The original messages that are embedded in the reply messages are removed.

• Stop words are removed

• If not on the stop list, they are matched against a synonym dictionary manually created by a pediatric pain specialist.

• The remaining terms are stemmed

• The stemmed terms are assigned tf.idf weights
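The representation steps above might look like the sketch below. The stop list, the synonym dictionary and the suffix-stripping stemmer are tiny hypothetical stand-ins (the real dictionary was built by a specialist, and the slides do not name the stemmer), and the weighting tf × log(N/df) is one common tf.idf variant assumed here for illustration.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "for", "is", "in", "with"}          # hypothetical
SYNONYMS = {"tylenol": "acetaminophen", "paracetamol": "acetaminophen"}   # hypothetical

def stem(term):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term

def terms(thread_text):
    """Tokenise, drop stop words, map synonyms to one form, then stem."""
    tokens = re.findall(r"[a-z]+", thread_text.lower())
    return [stem(SYNONYMS.get(t, t)) for t in tokens if t not in STOP_WORDS]

def tfidf(threads):
    """Return one {term: weight} dict per thread, with weight = tf * log(N / df)."""
    bags = [Counter(terms(text)) for text in threads]
    df = Counter(term for bag in bags for term in bag)   # number of threads containing each term
    n = len(bags)
    return [{t: tf * math.log(n / df[t]) for t, tf in bag.items()} for bag in bags]

threads = [
    "Morphine for meningitis pain",
    "Paracetamol and Tylenol dosing for pain",
    "Assessment tools for procedures",
]
weights = tfidf(threads)
print(weights[1])
```

Terms that occur in many threads (here "pain") receive lower weights than terms confined to one thread, which is the point of the idf factor.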

Data Set

• An archived sample of 6939 messages from 1993-1999

• After cleaning 4033 messages

• After threading 1289 threads

• Each thread is represented by a vector of 4111 term weights

            term1     term2     ...   term4111
thread1     w1,1      w1,2      ...   w1,4111
thread2     w2,1      w2,2      ...   w2,4111
...
thread1289  w1289,1   w1289,2   ...   w1289,4111

Thread-Term Matrix

Combination: Organizing the Threads

• Text clustering
  – unsupervised learning process
  – groups documents into clusters so that the documents within a cluster have high similarity with one another, but are very dissimilar to the documents in the other clusters

• Text classification or categorization
  – supervised learning process
  – assigns documents to pre-defined classes or categories

k-means clustering with k=2

[Figure: three successive slides showing seven points, numbered 1 to 7, grouped by k-means with k=2.]
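For readers without the figure, here is a minimal k-means sketch with k=2. The seven 2-D points and the two starting centroids are made up for illustration; the study itself clustered tf.idf thread vectors and used different seed centroids on each run.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # An empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:   # assignments have stabilised
            break
        centroids = new_centroids
    return clusters, centroids

# Seven hypothetical points; the poor starting centroids are both in the lower-left group.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
clusters, centroids = kmeans(points, centroids=[(1, 1), (2, 1)])
print(clusters)
print(centroids)
```

Even from this poor start, the second centroid is dragged towards the upper-right group on the first pass and the assignments settle into the two visible groups.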

Evaluation of Clustering

• Performed a study in which 100 randomly selected threads were presented to two experts for clustering and to our clustering algorithm

• Agreement between the two experts' clusterings was measured

• Agreement between each expert's clustering and the system's clustering was measured

Cluster ID Label Number of Threads

1 adverse effects of a treatment or medication, monitoring requirements 6

2 advice on medications or treatment technique for a particular CONDITION 33

3 announcement of a publication or event 9

4 assessment methods for a particular condition 2

5 availability and benefits of a nonstandard drug compound 3

6 availability and validation of a particular assessment tool 11

7 contact information or other information about a specific person 3

8 dosage or adverse effects or technique for a medication or other treatment 9

9 information on a condition: description, etiology, prognosis 4

10 job description, posting of job or fellowship 3

11 miscellaneous 3

12 other newsgroups and listservs 2

13 policies, guidelines, protocols, algorithms, quality assurance, supervision, competency 12

Clusters and labels created by expert 1 – a psychologist

Clusters and labels created by expert 2 – a medical doctor

Cluster ID Label Number of Threads

1 Assessment 10

2 Musculoskeletal 2

3 sedation & procedures 4

4 oral drugs 5

5 miscellaneous /irrelevant/out-of-date 21

6 neuropathic pain 9

7 regional analgesia 10

8 postoperative pain 5

9 intravenous opioids 11

10 psychology 5

11 visceral pain, bowel function, etc 2

12 topical analgesia, EMLA 2

13 everyday pain 1

14 NMDA antagonists, ketamine 1

15 resources 8

16 administration 3

17 burns 1

Inter-Rater Reliability

The Redundancy(X, Y) is the proportion of uncertainty about X that is removed by knowing Y

In this instance, X and Y represent the two sets of clusters generated by the experts.

The measure is asymmetrical and the calculated redundancy measures are:

R(Expert-1, Expert-2) = 0.51

R(Expert-2, Expert-1) = 0.44
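One way to compute such a measure is sketched below, assuming that "uncertainty" means Shannon entropy, so that R(X, Y) = (H(X) − H(X|Y)) / H(X). The slides do not give the formula, so this is an interpretation, and the two toy label lists are invented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of cluster labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def redundancy(x, y):
    """R(X, Y) = (H(X) - H(X|Y)) / H(X): share of uncertainty about X removed by Y."""
    h_x = entropy(x)
    h_joint = entropy(list(zip(x, y)))     # H(X, Y)
    h_x_given_y = h_joint - entropy(y)     # H(X|Y) = H(X, Y) - H(Y)
    return (h_x - h_x_given_y) / h_x

# Hypothetical cluster assignments of six threads by two raters.
rater_a = [1, 1, 2, 2, 3, 3]   # three clusters
rater_b = [1, 1, 1, 1, 2, 2]   # two clusters, a coarsening of rater_a's
print(round(redundancy(rater_a, rater_b), 2))
print(round(redundancy(rater_b, rater_a), 2))
```

The toy example shows why the measure is asymmetrical: knowing the finer clustering removes all uncertainty about the coarser one, but not the other way round.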

Evaluation of the Automatically Generated Clustering

• Assume each manually created cluster is correct

• Compare the manually created cluster against an automatically created cluster

• Recall – the proportion of those items in the manually created cluster that appear together in the same automatically generated cluster

• Precision – the proportion of those items in an automatically created cluster that appear together in the same manually created cluster

• F–measure = 2PR / (P+R)

Hierarchy – k=2

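[Figure: hierarchy of automatically generated clusters produced by repeated splitting with k=2, with nodes C-1,1 at the top, C-2,1 and C-2,2 below it, then C-3,1 to C-3,4, then C-4,1 and C-4,2. The expert clusters E-1, E-2, ..., E-n are compared against this hierarchy.]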

F-measure for a classification

The overall F-measure is used to reflect the quality of the whole hierarchy. The overall F-measure is the average weighted F-measure for all the clusters in a humanly generated clustering and is defined to be:

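$$\text{Overall F-measure} = \frac{\sum_{T \in S} |T| \cdot F(T)}{\sum_{T \in S} |T|}$$

where S is the humanly generated clustering, |T| is the number of threads in cluster T, and F(T) is the F-measure for cluster T.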
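The overall F-measure can be computed along the following lines. The sketch assumes that F(T) is the best F-measure that manual cluster T achieves against any cluster in the automatically generated hierarchy, a common convention that the slides do not state explicitly. Clusters are represented as sets of thread identifiers, and the example data are invented.

```python
def f_measure(manual, auto):
    """F = 2PR / (P + R) for one manual cluster against one automatic cluster."""
    overlap = len(manual & auto)
    if overlap == 0:
        return 0.0
    precision = overlap / len(auto)     # share of the automatic cluster that belongs to T
    recall = overlap / len(manual)      # share of T recovered by the automatic cluster
    return 2 * precision * recall / (precision + recall)

def overall_f(manual_clusters, auto_clusters):
    """Size-weighted average over manual clusters of their best-matching F-measure."""
    total = sum(len(t) for t in manual_clusters)
    weighted = sum(
        len(t) * max(f_measure(t, c) for c in auto_clusters)
        for t in manual_clusters
    )
    return weighted / total

# Hypothetical clusterings of seven threads.
manual = [{1, 2, 3}, {4, 5, 6, 7}]
auto = [{1, 2}, {3, 4, 5, 6, 7}]
print(round(overall_f(manual, auto), 4))
```

Here the first manual cluster scores F = 0.8 and the second F = 8/9, so the weighted average is (3 × 0.8 + 4 × 8/9) / 7.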

Evaluation of Clustering

• Each expert’s set of clusters was compared to the automatically generated hierarchical clustering. The hierarchy was generated ten times using different seed centroids for each run.

• The results of the paired-samples t tests (p=0.05) show that there was no significant difference between the two sets of manually generated clusters when used to evaluate the automatically generated clustering (k = 6).

            E-1     E-2
k-means     0.47    0.48

Evaluation of k-means Clustering

• We now have 3 different clusterings with inter-rater reliability of < .50

• k-means generated a large number of term representatives for each cluster with no elegant way of mapping the terms into MeSH.

• Therefore, the k-means clustering algorithm was replaced with a SOM in the expectation that the clustering results would be better and that a smaller set of term representatives for each cluster might be identified.

SOM – Self Organizing Maps

• Invented by Teuvo Kohonen

• Provide a way of representing multidimensional data in much lower dimensional spaces, usually one or two dimensions.

• Create a network that stores information in such a way that any topological relationships within the training set are maintained
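A bare-bones SOM, written only to illustrate the idea, is sketched below. The lattice size, learning rate, decay schedule and Gaussian neighbourhood are generic textbook choices, not the settings used in this study. The first two colours are the ones shown on the colour-mapping slides; the other two are made up.

```python
import math
import random

def best_matching_unit(grid, vec):
    """Return the lattice position whose weight vector is closest to vec."""
    return min(grid, key=lambda n: sum((w - v) ** 2 for w, v in zip(grid[n], vec)))

def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map on equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        decay = math.exp(-epoch / epochs)        # learning rate and neighbourhood both shrink
        lr, radius = lr0 * decay, radius0 * decay
        for vec in rng.sample(data, len(data)):  # present the inputs in random order
            bmu = best_matching_unit(grid, vec)
            for node, weights in grid.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-d2 / (2 * radius ** 2))
                for i in range(dim):
                    weights[i] += lr * influence * (vec[i] - weights[i])
    return grid

# RGB colours scaled to [0, 1].
colours = [(240, 89, 48), (37, 202, 219), (20, 30, 120), (250, 220, 40)]
data = [tuple(v / 255 for v in rgb) for rgb in colours]
som = train_som(data, rows=4, cols=4)
for rgb, vec in zip(colours, data):
    print(rgb, "->", best_matching_unit(som, vec))
```

Each colour ends up at a node of the 4 x 4 lattice; because neighbouring nodes are updated together, similar inputs tend to land near each other, which is the topology-preserving property described above.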

Example 2-D Lattice of Nodes

[Figure: each input is a three-component colour vector (R, G, B), for example (240, 89, 48) and (37, 202, 219).]

Mapping 3 Dimensional Colour Vectors Into 2 Dimensions

Notice that in addition to clustering the colours into distinct regions, regions of similar properties are usually found adjacent to each other.

SOM Neighbourhood Decreases



Principal Component Analysis for Feature Length Reduction

[Figure: plot of Eigen Values against PCA Vectors.]

SOM – Vector Length 150

Growing Hierarchical SOM

SOM Results

Method | Features | Map Size | Number of Clusters | Best F-Measure (SOM), Expert 1 | Best F-Measure (SOM), Expert 2
SOM | 150 | 8*6 | 48 | 0.2968 | 0.4043
GHSOM | 500 | 5 layers, 2*2 | 53 | 0.4235 | 0.4466
SOM-k | 150 | 10*5 | 13 | 0.2783 | 0.3896

Average F-Measure (k-means), for comparison: Expert 1 = 0.47, Expert 2 = 0.48

Problems

[Figure: the pipeline diagram repeated, with the elements PPML, Data Cleaning, Thread Creation, Thread Clusters, MeSH Terminology Map and PubMed Articles, under the stages Externalization, Combination, Internalization and Linking.]

Mapping Tacit Knowledge to Explicit Knowledge in Medical Literature

[Figure: revised pipeline diagram with the elements PPML, Data Cleaning, Thread Creation, Threads, MeSH and UMLS, PubMed Articles and Access, under the stages Externalization, Combination, Internalization and Linking.]


MetaMap Transfer (MMTx)

• Discovers UMLS Metathesaurus concepts in text

• Text is parsed into components including sentences, paragraphs, phrases, lexical elements and tokens. Produces a shallow syntactic analysis with part-of-speech tagging.

• Variants are generated from the resulting phrases. Includes acronyms, abbreviations and synonyms.

• Candidate concepts from the UMLS Metathesaurus are retrieved and evaluated against the phrases.

• The best of the candidates are organized into a final mapping in such a way as to best cover the text.

Metathesaurus Candidates

The word "discharge" returns

Semantic Group: Anatomy
  Discharge, Body Substance (C0012621) - [Body Substance]
  Discharge, Body Substance, Sample (C0600083) - [Body Substance]

Semantic Group: Procedures
  Patient Discharge (C0030685) - [Health Care Activity]

from the UMLS Knowledge Server

Metathesaurus Candidates

"He is to be discharged home..."

Phrase: "discharged"
Meta Candidates (3)
  966 C0030685:Discharge <1> (Patient Discharge) [Health Care Activity]
  966 C0600083:Discharge <3> (Discharge, Body Substance, Sample) [Body Substance]
  966 C0012621:Discharge, NOS (Discharge, Body Substance) [Body Substance]

Phrase: "home"
Meta Candidates (3)
  1000 C0442517:Home [Manufactured Object]
  928 C0237154:homeless <1> (Homelessness) [Finding]
  928 C0019863:homeless <2> (Homeless persons) [Population Group]

Fields in each candidate line: MMTx Score, MeSH Concept Number, MeSH Concept Term, UMLS Semantic Type

Using the MMTx Results

• The MMTx results were used in three different ways:
  – Organize the PPML threads according to the UMLS Semantic Groups (134 semantic types in 15 semantic groups)
  – Organize the PPML threads according to the MeSH Hierarchy (15 MeSH trees)
  – Select terms that can be used as queries to PubMed

Organization by UMLS Semantic Group

Semantic Groups Semantic Types Terms Retained

Activities & Behaviors ACTI Activity, Behavior, Event, Machine Activity … NO

Anatomy ANAT Anatomical structure, Body location … YES

Chemicals & Drugs CHEM Amino Acid, Antibiotic, Chemical … YES

Concepts & Ideas CONC Classification, Concept Entity … NO

Devices DEVI Medical Device, Research Device … NO

Disorders DISO Acquired Abnormality, Disease … YES

Genes & Molecular Sequence GENE Amino Acid Sequence, Gene or Genome, Molecular Sequence … NO

Geographic Areas GEOG Geographic Area NO

Living Beings LIVB Age group, Alga, Animal … NO (except age group)

Objects OBJC Entity, Food, Manufactured Object … NO

Occupations OCCU Biomedical Occupation … NO

Organization ORGA Organization, Professional Society … NO

Phenomena PHEN Biologic Function, Test Result … NO

Physiology PHYS Cell Function, Clinical Attribute … NO

Procedures PROC Diagnostic procedure … NO

Organization by MeSH Tree

• There are 15 MeSH trees

• It was determined to keep only two trees:
  – The C tree (Diseases), as the PPML largely deals with disorders and diseases
  – The D tree (Chemicals and Drugs), as the PPML contains discussions on drugs, hence it was deemed important to retain drug-related terminology

Filtering to Generate PubMed Queries

• Filtering approach operates at the semantic/conceptual level as opposed to the term level

• UMLS semantic types associated with each MeSH term are used as the basis for term filtering

• Working at the semantic level we can
  – Establish a medical context for the thread which can assist in subsequent search for corresponding literature
  – Characterize the entirety of medical terms into a small number of medical concepts
  – Design filtering rules that apply to broad semantic types as opposed to focused individual terms

Filtering UMLS Concepts Associated with MeSH Terms Found in Subject Line

If mapping score = 1000 then retain the MeSH term.

If semantic type = Age group (T100) then retain the MeSH term.

If semantic group = CHEM | DISO | ANAT AND (mapping score > 800) then retain the MeSH term.

If semantic type = Diagnostic Procedure (T060) | Therapeutic or Preventive Procedure (T061) | Laboratory or Test Result (T034) AND (mapping score > 800) then retain the MeSH term.

Generating a PubMed Query

Concept Name | Score | Semantic Group | Semantic Type | Retain
Year | 694 | CONC | Temporal Concept | No
Old | 861 | CONC | Temporal Concept | No
Feline osteogenesis imperfecta | 1000 | DISO | Disease or Syndrome | Yes
Adolescent | 694 | LIVB | Age Group | Yes
Osteoporosis | 861 | DISO | Disease or Syndrome | Yes

Query Terms: Feline osteogenesis imperfecta, Adolescent, Osteoporosis
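The four subject-line filtering rules can be written as a small predicate and checked against the example concepts above. This is a sketch only: the dict layout and function name are hypothetical, semantic types are matched by their UMLS names, and joining the retained terms with AND is an assumption, since the slides do not say how the query terms were combined.

```python
RETAINED_GROUPS = {"CHEM", "DISO", "ANAT"}
# UMLS names of the procedure and test-result semantic types covered by the fourth rule.
RETAINED_TYPES = {
    "Diagnostic Procedure",
    "Therapeutic or Preventive Procedure",
    "Laboratory or Test Result",
}

def retain(concept):
    """Apply the four subject-line filtering rules to one MMTx concept."""
    score, group, sem_type = concept["score"], concept["group"], concept["type"]
    if score == 1000:                                   # rule 1: perfect mapping score
        return True
    if sem_type == "Age Group":                         # rule 2: age groups are always kept
        return True
    if group in RETAINED_GROUPS and score > 800:        # rule 3: CHEM, DISO or ANAT
        return True
    if sem_type in RETAINED_TYPES and score > 800:      # rule 4: procedures and test results
        return True
    return False

concepts = [
    {"name": "Year", "score": 694, "group": "CONC", "type": "Temporal Concept"},
    {"name": "Old", "score": 861, "group": "CONC", "type": "Temporal Concept"},
    {"name": "Feline osteogenesis imperfecta", "score": 1000, "group": "DISO", "type": "Disease or Syndrome"},
    {"name": "Adolescent", "score": 694, "group": "LIVB", "type": "Age Group"},
    {"name": "Osteoporosis", "score": 861, "group": "DISO", "type": "Disease or Syndrome"},
]
query_terms = [c["name"] for c in concepts if retain(c)]
query = " AND ".join(query_terms)   # assumed way of combining the terms
print(query_terms)
print(query)
```

Applied to the five example concepts, the rules retain the same three query terms as listed above.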

Summary

• We have various hierarchical organizations of the PPML threads that can be browsed by the user

• We have linked the PPML to the best-evidence literature via PubMed


Future Research

• Improve the filters

• Link from medical literature to PPML

• Evaluate the overall system with respect to the users:
  – Is it useful?
  – Is it helpful?
  – Does it improve outcomes?

Thank You

Web Information Filtering Lab

http://www.cs.dal.ca/wifl/

Closeness of Document Vectors

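        information  science
Doc0    0            0
Doc1    0            1
Doc2    1            0
Doc3    1            1

[Figure: the documents plotted as vectors from the origin (0,0) on the axes "information" and "science", with Doc1 and Doc2 on the axes and Doc3 at (1,1). Closeness is measured by cos θ, where θ is the angle between two document vectors: cos 0° = 1 and cos 90° = 0.]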
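The cosine measure for these four document vectors can be computed as in the sketch below. Returning 0.0 for the all-zero vector Doc0 is a convention assumed here, since the cosine is undefined when a vector has zero length.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Term order: (information, science)
docs = {"Doc0": (0, 0), "Doc1": (0, 1), "Doc2": (1, 0), "Doc3": (1, 1)}

for a, b in [("Doc1", "Doc2"), ("Doc1", "Doc3"), ("Doc3", "Doc3")]:
    print(a, b, round(cosine(docs[a], docs[b]), 3))
```

Doc1 and Doc2 share no terms, so the angle between them is 90° and the cosine is 0; Doc3 lies at 45° to each of them.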