Evaluation of Semantic Similarity Metrics Applied to the Automatic


Expert Systems With Applications 44 (2016) 386–399

Contents lists available at ScienceDirect

Expert Systems With Applications

journal homepage: www.elsevier.com/locate/eswa

Evaluation of semantic similarity metrics applied to the automatic retrieval of medical documents: An UMLS approach

Israel Alonso, David Contreras

Department of Telematics and Computer Science, Comillas Pontifical University, C/ Alberto Aguilera, 25, 28015 Madrid, Spain

Article info

Keywords:

Semantic similarity

Information retrieval

Electronic Health Record

UMLS

Abstract

One promise of current information retrieval systems is the capability to identify risk groups for certain diseases and pathologies based on the automatic analysis of vast repositories of Electronic Medical Records. However, the complexity and the degree of specialization of the language used by experts in this context make this task challenging. In this work, we introduce a novel experimental study to evaluate the performance of two semantic similarity metrics (Path and Intrinsic IC-Path, both widely accepted in the literature) in a real-life information retrieval situation. To achieve this goal, and given the lack of methodologies for this context in the literature, we propose a straightforward information retrieval system for the biomedical field based on the UMLS Metathesaurus and on semantic similarity metrics. In contrast with previous studies, which focus on testbeds with limited and controlled sets of concepts, we use a large amount of information (101,712 medical documents extracted from the TREC Medical Records Track 2011). Our results show that in real-life cases both metrics display similar performance: Path (F-Measure = 0.430) and Intrinsic IC-Path (F-Measure = 0.427). We therefore suggest that the use of Intrinsic IC-Path is not justified in real scenarios.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

The exponential growth, in recent times, of the amount of biomedical information stored on purely electronic supports (Electronic Health Records, or EHR, spring promptly to mind) has turned them into an element of undeniable relevance to the most diverse fields of scientific research (Hoffman, 2010; Prokosch & Ganslandt, 2009).

One of these fields is Information Retrieval, with its traditional challenge of identifying the records that most efficiently answer a user's immediate information needs. To accomplish this task, it is critical first to recognize patterns in medical histories that would ultimately permit the early detection of epidemic outbreaks, the prevention of disease, or the identification of cohort groups (Roque et al., 2011). The main difficulty in undertaking this task arises from Natural Language Processing, as natural language is not only complex but also highly context-sensitive. In a broad field such as the English language, for instance, it becomes necessary to draw upon resources and ontologies like WordNet to aid representation (Fellbaum, 1998).

Corresponding author. Tel.: +34 915422800; fax: +34 91 559 65 69.

E-mail addresses: [email protected] (I. Alonso), [email protected] (D. Contreras).

Unfortunately, these tools are of limited use to more specialized disciplines such as biomedicine, whose technical jargon is often as complex as it is ambiguous; the parsing of biomedical information calls for very specific terminology (Friedman, Kra, & Rzhetsky, 2002) and, hence, for new search strategies designed from the outset for the particular demands of this branch of science (Alpi, 2005). In such cases, one must resort to specialist resources, that is, dictionaries and thesauri like UMLS (McCray et al., 1993), to give a semantic value to relevant information.

Our present work aims to bridge this gap by supporting information retrieval systems based on Electronic Health Records according to their semantic content; in a nutshell, by interpreting the information needs of any given query and consequently selecting the medical documents most relevant in terms of semantic proximity. We believe this endeavor is much needed for the correct identification of patients in cohort studies, given the complexity, variability, and lack of structure of the information traditionally contained in such records. It requires defining and representing, through biomedical concepts, the information contained in both health records and medical queries, in order to establish the semantic proximity between them. Using the semantic relationships between these concepts in this fashion closely emulates the analogous process by which the human mind establishes similarity between two given terms (Miller & Charles, 1991; Rubenstein & Goodenough, 1965). It should be pointed out beforehand that previous works have shown interest in establishing metrics for determining the degree of semantic similarity between two terms (Collins & Loftus, 1975) in a more general context like the English language, based on the WordNet infrastructure (Meng, Huang, & Gu, 2013). Unfortunately, these approaches do not always yield satisfactory results when applied in the biomedical domain, since WordNet's coverage of this domain is rather limited (Burgun & Bodenreider, 2001).

http://dx.doi.org/10.1016/j.eswa.2015.09.028

0957-4174/© 2015 Elsevier Ltd. All rights reserved.

Later works have attempted to solve this by incorporating specific resources and ontologies (MeSH and SNOMED CT) in the study of similarity metrics in the field of biomedicine, always in a theoretical context and a controlled environment (Al-Mubaid & Nguyen, 2006; Batet, Sánchez, & Valls, 2011; Caviedes & Cimino, 2004; Nguyen & Al-Mubaid, 2006; Pedersen, Pakhomov, Patwardhan, & Chute, 2007). These works show that it is necessary to resort to a specialised infrastructure, namely UMLS, if we are to determine the similarity between two concepts in the field of biomedicine with the degree of precision that a human expert would expect to achieve.

In this work we propose an experimental study to evaluate the performance of two semantic similarity metrics (Path and Intrinsic IC-Path, both widely accepted in the literature) in a real-life information retrieval context. To perform this assessment, and given the lack of methodologies for this context in the literature, we deploy a straightforward information retrieval system for the biomedical field based on the UMLS Metathesaurus and on semantic similarity metrics.

Our paper is structured as follows.

In Section 2, we describe the main components and characteristics of UMLS. In Section 3, we outline the current state of the art, focusing on the tools and strategies currently used in the retrieval of biomedical information, as well as the metrics used to calculate the semantic similarity between two concepts in this particular field. Then, in Section 4, we define our proposal, along with the materials used in our work. In Section 5, we study the inner workings of the different sources and relationships contained in UMLS, and how they are reflected in the results obtained by semantic similarity metrics in a purely theoretical context; we later use, as our reference, the two main metrics based on the Intrinsic IC and Path finding approaches for their study and their application to a real-life context. Section 6 describes the procedures involved in our proposal for an ad hoc and straightforward concept-based medical document retrieval system, and evaluates the efficacy of the two main semantic similarity metrics when applied to a real-life context (reflected in TREC 2011). Section 7 covers the analysis and interpretation of the results obtained. Lastly, Section 8 comments on the conclusions derived from all the tests conducted, the contributions obtained from their results, and the future lines of research that would give continuity to our work.

2. UMLS

UMLS¹ (Unified Medical Language System) is an ongoing project started in 1986 by the National Library of Medicine. It was envisioned as a common environment for the access and treatment of biomedical information (Bodenreider, 2004; Humphreys, Lindberg, Schoolman, & Barnett, 1998; Lindberg, Humphreys, & McCray, 1993). To this end, it structures this information as a series of concepts with a set of relationships between them. At its core, UMLS is made up of three components, all of which undergo regular updates and revision: a Metathesaurus, a Semantic Network, and a Specialist Lexicon (lexical information and tools for natural language processing). Of these elements, the Metathesaurus and the Semantic Network are of particular interest to our work: the former for the concepts, sources and relationships it contains, and the latter for the semantic types it offers.

¹ http://www.nlm.nih.gov/research/umls/.

Table 1
Representation structure of UMLS concept C0018787.

CUI       LUI       SUI       AUI        Source  String
C0018787  L0018787  S0047194  A0066368   MeSH    Heart
C0018787  L0018787  S0047194  A16757661  NCI     Heart
C0018787  L0018787  S0047194  A2882201   SNOMED  Heart
C0018787  L0018787  S0375948  A16766657  NCI     HEART
C0018787  L0018787  S0419735  A0480532   CSP     heart
C0018787  L0018787  S0419735  A18628913  CHV     heart
C0018787  L0248647  S0324326  A12802806  NCI     Cardiac
C0018787  L0248647  S1344787  A1304355   CSP     Cardiac
C0018787  L0248647  S1344787  A18647556  CHV     Cardiac

The Metathesaurus is, in essence, a vast multipurpose and multilingual database covering more than one million concepts, all of them represented under a common framework and drawn from over a hundred different sources. These sources are grouped into several distinct perspectives of the biomedical environment, such as scientific information (MeSH-CRISP), clinical terminology (SNOMED-CT), administrative terminology (ICD-9-CM, CPT-4), or data exchange (HL7, LOINC), as well as general or specific thesauri covering anatomy (UWDA, NeuroNames), drugs (RxNorm, First Data Bank), medical devices (UMD, SPN), nursing (NIC, NOC, NANDA), oncology (PDG), adverse reactions (COSTART, WHO) or gene products (Gene Ontology, GO), to name a few.

The data compiled from these various sources is organized in the Metathesaurus following a unique identifier structure, with a hierarchy of four significance levels (Concepts, Terms, Strings, and Atoms), in this order:

CUI (Concept Unique Identifier): Each concept represents a distinct meaning, which encompasses, within a unique code, all its synonym terms.

LUI (Lexical Unique Identifier): Identifies each of the known lexical variations or terms for any given concept.

SUI (String Unique Identifier): Represents each descriptive string associated with a given term. One of them is designated as its name, or preferred term. All predicted variations in the character sequence of the string (upper and lower case, punctuation) are covered by separate identifiers.

AUI (Atom Unique Identifier): Corresponds to each individual occurrence of a given string in a specific source.

Hence, for instance, the concept (C0018787), which represents the muscular organ that keeps blood circulating, is associated with a number of descriptive strings, a few of which are shown in Table 1 for the sake of example.
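The four-level identifier hierarchy described above can be sketched as a nested mapping. This is only an illustration: the rows are copied from Table 1, while the function name `build_hierarchy` and the nesting layout are assumptions of this sketch, not part of UMLS or of the authors' system.

```python
from collections import defaultdict

# Rows copied from Table 1: (CUI, LUI, SUI, AUI, source, string).
ROWS = [
    ("C0018787", "L0018787", "S0047194", "A0066368", "MeSH", "Heart"),
    ("C0018787", "L0018787", "S0047194", "A16757661", "NCI", "Heart"),
    ("C0018787", "L0018787", "S0047194", "A2882201", "SNOMED", "Heart"),
    ("C0018787", "L0018787", "S0375948", "A16766657", "NCI", "HEART"),
    ("C0018787", "L0018787", "S0419735", "A0480532", "CSP", "heart"),
    ("C0018787", "L0018787", "S0419735", "A18628913", "CHV", "heart"),
    ("C0018787", "L0248647", "S0324326", "A12802806", "NCI", "Cardiac"),
    ("C0018787", "L0248647", "S1344787", "A1304355", "CSP", "Cardiac"),
    ("C0018787", "L0248647", "S1344787", "A18647556", "CHV", "Cardiac"),
]


def build_hierarchy(rows):
    """Nest atoms as CUI -> LUI -> SUI -> list of (AUI, source, string)."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for cui, lui, sui, aui, source, string in rows:
        tree[cui][lui][sui].append((aui, source, string))
    return tree


hierarchy = build_hierarchy(ROWS)

# One concept, two terms (LUIs), several strings (SUIs), one atom per source.
for lui, strings in hierarchy["C0018787"].items():
    for sui, atoms in strings.items():
        print(lui, sui, [source for _, source, _ in atoms])
```

Grouping by CUI first mirrors the order of the significance levels: every atom ends up under exactly one string, one term and one concept.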

We must keep in mind that a given descriptive string (SUI) may be referenced in one or many concept identifiers (CUIs). For example, the string Heart identifies the preferred term for concept (C0018787), but it is also one of the synonym terms for concept (C1281570), Entire heart. We now show the series of descriptive strings for both these concepts, as well as the semantic type they belong to:

CUI: C0018787
SUI (Preferred term): Heart
Other SUIs (string terms): Hearts; Cardiac; coronary; cardiac structure; heart structure; structure of heart, unspecified; corazón; estructura cardiaca; Cuore; herzen; Hart; etc.
Semantic Type: (bpoc) - Body Part, Organ, or Organ Component.

CUI: C1281570
SUI (Preferred term): Entire heart
Other SUIs (string terms): Heart; Entire heart (body structure); corazón; etc.
Semantic Type: (bpoc) - Body Part, Organ, or Organ Component.

The Metathesaurus includes different kinds of relationships between concepts, whether they are found in the same source (intra-source) or in two different ones (inter-source). These relationships may be hierarchical or non-hierarchical (Burgun & Bodenreider, 2001). Hierarchical relationships cover either direct synonymical relations of the Parent/Child (PAR/CHD) type or indirect ones of the Broader/Narrower (RB/RN) type. In turn, non-hierarchical relationships may belong to the Siblings (SIB), Other (RO), Similar (RL), Source Asserted Synonymy (SY), Possible Synonymy (RQ), Allowed Qualifier (AQ), or Can Be Qualifier (QB) types. One advantage of the UMLS Metathesaurus is that it encompasses all known biomedical sources, which can then be used as a whole or independently.

As for the Semantic Network, it categorizes all concepts in UMLS into 133 semantic types, between which 54 relationships are established. This is done through a tree structure stemming from two main hierarchies: Entity and Event (Bodenreider, 2001; Bodenreider & McCray, 2003; Erdogan, Erdem, & Bodenreider, 2010). Each concept (CUI) in the Metathesaurus belongs to at least one semantic type; the deeper in the structure these types lie (that is, the closer to the tree leaves), the more specific they are. Thus, the concept (C0497327), associated with the term Dementia, belongs to the semantic type Mental or Behavioral Dysfunction (mobd), which in turn is encompassed by the semantic group Disorders (DISO), which is ultimately contained in the hierarchy Event.

The representation of biomedical knowledge thus achieved in UMLS is the basis for its application in a variety of strategies and tools for natural language processing, and for the computation of semantic similarity between concepts.

3. Related work

Information Retrieval Systems (IRS) based on natural language processing have been thoroughly studied in the context of biomedical literature and, more recently, in that of clinical documentation. This interdisciplinary research field, dubbed biomedical informatics (Jiang et al., 2013), is among the fastest-growing in recent years.

3.1. Information Retrieval Systems (IRS) in the biomedical environment

IRS attempt to meet the information needs that users express through queries. These queries intrinsically contain their search objectives, whose sensitivity proves to be a crucial factor in the development of search algorithms and the retrieval of their results (Rose & Levinson, 2004). The main difficulty in reaching these results is the lack of precision in the queries themselves, a problem only made worse by the inherent complexity of the language and by the kind of information at hand. To address this problem by completing and improving the information presented in the query, query expansion techniques were developed (Efthimiadis, 1996). These techniques, widely used in IRS to improve performance, focus on adding new terms to the original query to narrow down results. Broadly, query expansion techniques can be separated into two general approaches.

A first group of techniques is based on the analysis of vast collections of documents, which are grouped through co-occurrence vectors (Xu, Zhu, Zhang, Hu, & Song, 2006; Zhu, Wu, Carterette, & Liu, 2014) and probabilistic models (Qi & Laquerre, 2012) of the most relevant terms. More recent works apply semantic distribution models to the linguistic elements in these collections in order to automatically extract synonyms and abbreviations (Henriksson, Moen, Skeppstedt, Daudaravicius, & Duneld, 2014; Zeng, Redd, Rindflesch, & Nebeker, 2012). Lastly, other works in this group analyze the application of specific semantic similarity metrics to structures defined beforehand as containing the elements to be evaluated, such as the Vector Space Model (Turney & Pantel, 2010) and the comparison of histogram distance or cross-bin distances (Kurtz, Beaulieu, Napel, & Rubin, 2014).

A second group focuses instead on the use and analysis of structures based on existing knowledge of the field of biomedicine, such as UMLS. The use of these resources calls for the disambiguation of the original query's terms, so that they point at unique concepts within the ontology (Bhogal, Macfarlane, & Smith, 2007; Voorhees, 1994). In this manner, the concepts that are related to the original search terms are used to expand the query.

One tool that, alongside the UMLS Metathesaurus, allows us to identify the concepts referred to in a given text is Metamap (Aronson, 2001; Aronson & Lang, 2010). This tool provides the foundation for the development of various query expansion techniques (Aronson & Rindflesch, 1997) through the exploitation of the semantic relations contained in UMLS. Within this approach, different efforts use defined UMLS structures to develop solutions stemming from, for instance: the representation of texts from clinical documents via semantic graphs based on concepts and relationships (Plaza & Díaz, 2010); query expansion through random walks based on the UMLS structure (Martinez, Otegi, Soroa, & Agirre, 2014); query expansion through the creation of an ontology of the query itself, associated with closely related concepts (Babashzadeh, Huang, & Daoud, 2013); and the use of relationships between concepts to reflect the semantic distance between patients from stored information (Melton et al., 2006).

As a direct consequence of the need to improve techniques for exploiting the semantics of various biomedical sources, several works have focused on evaluating the semantic similarity between any given concepts in the field of biomedicine.

3.2. Semantic similarity

Over time, a large variety of metrics have been defined, analyzed, and implemented for the computation of semantic similarity between concepts contained in biomedical sources such as SNOMED-CT, MeSH, UMLS, etc. These metrics can be categorized according to two major strategies:

Based on the estimation of the semantic similarity between two terms, on account of the distance between the links relating them within the ontology (Path finding).

Based on the semantic similarity between two concepts according to the information they contain (Information Content).

3.2.1. Path finding similarity measures on taxonomical structure

These metrics attempt to measure semantic information across concepts, based on the hierarchical relationships defined between them in biomedical sources. The most important among them are explained below.

The first metric defines the semantic similarity (sim) between two concepts as the shortest path between them (sp), according to their interrelationships. This metric, known as Concept Distance (CDist), is defined by Rada, Mili, Bicknell, and Blettner (1989) as the number of nodes in the shortest path between two concepts, c1 and c2, and is applied with (RB/RN) relationships on the MeSH vocabulary. Caviedes and Cimino (2004) later evaluate it with (PAR/CHD) relationships on the MeSH, SNMI and ICD9-CM resources in the field of biomedicine.

$$\mathrm{sim}_{CDist}(c_1, c_2) = sp(c_1, c_2), \quad \text{where } sp \text{ is the shortest path between } c_1 \text{ and } c_2 \tag{1}$$

A later variation, called Path Measure or Path Length (Path), was defined by Pedersen et al. (2007) and applied to is-a type relationships in SNOMED-CT. This variation corresponds to the inverse of the distance between two concepts (CDist), hence normalising the similarity result to a value ranging from 0 to 1.

$$\mathrm{sim}_{Path}(c_1, c_2) = \frac{1}{sp(c_1, c_2)} \tag{2}$$
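Equations (1) and (2) can be illustrated with a breadth-first search over a small is-a hierarchy. This is a minimal sketch under stated assumptions: the toy taxonomy and its concept names are hypothetical (they are not UMLS data), and sp is counted in nodes, following the definition by Rada et al. quoted above, so that identical concepts yield sp = 1.

```python
from collections import deque

# Hypothetical is-a taxonomy (child -> parents); names are illustrative only.
PARENTS = {
    "disease": [],
    "heart_disease": ["disease"],
    "lung_disease": ["disease"],
    "myocarditis": ["heart_disease"],
    "pneumonia": ["lung_disease"],
}


def neighbours(parents):
    """Build an undirected adjacency map from the child -> parents map."""
    adjacency = {concept: set(ps) for concept, ps in parents.items()}
    for child, ps in parents.items():
        for parent in ps:
            adjacency.setdefault(parent, set()).add(child)
    return adjacency


def shortest_path_nodes(parents, c1, c2):
    """Number of nodes on the shortest path, counting both end points."""
    adjacency = neighbours(parents)
    seen = {c1}
    queue = deque([(c1, 1)])
    while queue:
        node, count = queue.popleft()
        if node == c2:
            return count
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, count + 1))
    raise ValueError("concepts are not connected")


def sim_cdist(parents, c1, c2):
    """Eq. (1): the shortest path itself, a distance rather than a similarity."""
    return shortest_path_nodes(parents, c1, c2)


def sim_path(parents, c1, c2):
    """Eq. (2): inverse of the shortest path, a value in (0, 1]."""
    return 1.0 / shortest_path_nodes(parents, c1, c2)


print(sim_cdist(PARENTS, "myocarditis", "pneumonia"))
print(sim_path(PARENTS, "myocarditis", "pneumonia"))
```

Counting nodes rather than edges keeps the denominator of Eq. (2) strictly positive, which is what bounds Path to the unit interval.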

Later metrics introduce certain characteristics of the structure of the taxonomy that had not been explored before, such as its depth or size, or the location of different concepts within it (Fig. 1).


Fig. 1. Example of hierarchical relationships between concepts in UMLS Metathesaurus. The terms depth, path, and LCS are represented.

For instance, Leacock and Chodorow (1998) (lch) consider the shortest path (sp) between two concepts (c1, c2), scaling it logarithmically to the total depth of the taxonomy (D). Thus, the deeper the taxonomy (that is, the more complex and thorough), the larger the relative value of semantic proximity between two terms. A proposal for the normalization of this metric to the unit interval can be found in Garla and Brandt (2012).

$$\mathrm{sim}_{lch}(c_1, c_2) = 1 - \frac{\log(sp)}{\log(2D)} \tag{3}$$
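A short numerical sketch of Eq. (3) shows how the logarithmic scaling behaves. It assumes the normalized form, with sp counted in nodes and D the depth of the taxonomy; the function name `sim_lch` and the sample values are illustrative assumptions only.

```python
import math


def sim_lch(sp, depth):
    """Eq. (3): 1 - log(sp) / log(2 * D), with sp >= 1 counted in nodes."""
    if sp < 1 or depth < 1:
        raise ValueError("sp and depth must be at least 1")
    return 1.0 - math.log(sp) / math.log(2 * depth)


# Identical concepts (sp = 1) give the maximum similarity; the value
# decreases as the path grows towards its upper bound of 2 * D.
for sp in (1, 2, 5, 6):
    print(sp, round(sim_lch(sp, depth=3), 4))
```

For a fixed path length, a deeper taxonomy yields a larger value, which matches the behaviour described in the preceding paragraph.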

Other approaches introduce a new element in the hierarchy of both concepts, corresponding to their closest common ancestor. The depth of both concepts is established according to the depth of their Least Common Ancestor (LCA), also known as Least Common Subsumer (LCS) (Fig. 1).

Wu and Palmer (1994) (wup) apply a measurement of the similarity between two concepts obtained by scaling the depth (depth) of their Least Common Ancestor (LCS) to the depth of each of the two concepts from the root of the taxonomy by way of their LCS. Garla and Brandt (2012) introduce a change that includes the shortest path (sp) in the definition, to avoid the case (c1 = c2), which would result in simwup(c1, c2) [...] 0 and >0 are contribution factors of two features and k is a constant.

3.2.2. Information Content (IC) similarity measures

The following approaches to calculating semantic similarity are based on Shannon's Information Theory (Shannon, 2001), by which the similarity between two concepts must be measured according to the amount of information content they provide.

² http://search.cpan.org/dist/UMLS-Similarity/.

Information Content (IC) may be obtained from the distribution of a concept within a text corpus alongside a taxonomy (Corpus IC) (Resnik, 1995), or from the structure of a taxonomy alone (Intrinsic IC) (Sánchez, Batet, & Isern, 2011; Seco, Veale, & Hayes, 2004; Zhou, Wang, & Gu, 2008).

The Corpus IC of any concept c is defined as the negative of the log of the concept's frequency, where this frequency is the probability of the concept occurring in a given corpus C, fq(c, C). The number of times that the children concepts (cs) of c appear within the corpus is also added, so that the more frequently the concept occurs, the lower its information content will be (Resnik, 1995).

$$IC_{Corpus}(c) = -\log(fq(c)), \qquad fq(c) = fq(c, C) + \sum_{cs \in children(c)} fq(cs) \tag{8}$$
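The recursive accumulation in Eq. (8) can be sketched as follows. The taxonomy, the raw counts and the choice of dividing by the root's cumulative count to obtain a probability are all assumptions of this illustration, not data or code from the paper.

```python
import math

# Hypothetical taxonomy (parent -> children) and raw corpus counts.
CHILDREN = {
    "disease": ["heart_disease", "lung_disease"],
    "heart_disease": ["myocarditis"],
    "lung_disease": [],
    "myocarditis": [],
}
COUNTS = {"disease": 2, "heart_disease": 3, "lung_disease": 4, "myocarditis": 1}


def cumulative_frequency(concept):
    """Own occurrences plus those of every descendant, as in Eq. (8)."""
    own = COUNTS.get(concept, 0)
    return own + sum(cumulative_frequency(c) for c in CHILDREN.get(concept, []))


def ic_corpus(concept, root="disease"):
    """Negative log of the concept's probability; raises if it never occurs."""
    probability = cumulative_frequency(concept) / cumulative_frequency(root)
    return -math.log(probability)


for name in CHILDREN:
    print(name, cumulative_frequency(name), round(ic_corpus(name), 4))
```

Because counts propagate upwards, the root always has probability 1 and therefore zero information content, while rarely mentioned leaves carry the most.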

On the other hand, the Intrinsic IC of a concept c is a proposal defined in various works (Seco et al., 2004; Zhou et al., 2008) and later adapted to the biomedical context (Batet et al., 2011). The Intrinsic IC of a concept c is defined from the ratio of the number of its terminal concepts (leaves(c)) to the number of its associated ancestors (subsumers(c)) (Sánchez & Batet, 2011). This ratio is then normalized to the interval [0, 1] by the total number of leaves in the taxonomy (max_leaves). Thus, the more terminal elements a concept has relative to its number of ancestors, the lower its information content will be.

$$IC_{Intrinsic}(c) = -\log\left(\frac{\dfrac{|leaves(c)|}{|subsumers(c)|} + 1}{max\_leaves + 1}\right) \tag{9}$$
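Eq. (9) needs only three counts per concept, which the following sketch takes as plain numbers. It assumes that subsumers(c) includes c itself (so the count is never zero) and that a leaf has no leaves below it; both conventions, the function name and the sample values are assumptions of this illustration.

```python
import math


def ic_intrinsic(n_leaves, n_subsumers, max_leaves):
    """Eq. (9): -log(((leaves / subsumers) + 1) / (max_leaves + 1))."""
    if n_subsumers < 1:
        raise ValueError("a concept subsumes at least itself")
    ratio = n_leaves / n_subsumers
    return -math.log((ratio + 1) / (max_leaves + 1))


MAX_LEAVES = 10  # hypothetical number of leaves in the whole taxonomy

# Root: every leaf lies below it and it has a single subsumer (itself).
print(ic_intrinsic(MAX_LEAVES, 1, MAX_LEAVES))
# Leaf at depth 4: no leaves below it, four subsumers counting itself.
print(ic_intrinsic(0, 4, MAX_LEAVES))
```

The two extremes show the intended ordering: the root carries no information, while a leaf reaches the maximum value log(max_leaves + 1).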

These Information Content approaches lead to the semantic similarity measures defined by Lin (1998) and Jiang and Conrath (1997). Lin's definition proposes a ratio of the common information content of a given pair of concepts, IC(LCS(c1, c2)), to the information content that describes each concept separately, IC(ci). In this approach, the higher the IC value of the LCS (that is, the more specific it is), the greater the similarity between the concepts thus compared.

$$\mathrm{sim}_{Lin}(c_1, c_2) = \frac{2 \cdot IC(LCS(c_1, c_2))}{IC(c_1) + IC(c_2)} \tag{10}$$

Jiang and Conrath, in turn, propose an analogous measure (opposite to similarity) based on the distance between two concepts, which is evaluated as the difference between the information content of the two concepts, IC(ci), and that of their common ancestor, IC(LCS(c1, c2)).

$$\mathrm{Dist}_{JC}(c_1, c_2) = IC(c_1) + IC(c_2) - 2 \cdot IC(LCS(c_1, c_2)) \tag{11}$$
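Given the IC values of two concepts and of their LCS, Eqs. (10) and (11) reduce to simple arithmetic, as the sketch below shows. The IC values are made-up numbers, and returning 1.0 when both concepts have zero IC is a convention chosen for this sketch rather than something stated in the source.

```python
def sim_lin(ic_c1, ic_c2, ic_lcs):
    """Eq. (10): shared information relative to the total information."""
    total = ic_c1 + ic_c2
    if total == 0:
        return 1.0  # assumption of this sketch: two zero-IC concepts match
    return 2 * ic_lcs / total


def dist_jc(ic_c1, ic_c2, ic_lcs):
    """Eq. (11): information not shared by the two concepts (a distance)."""
    return ic_c1 + ic_c2 - 2 * ic_lcs


# Hypothetical IC values: two specific concepts under a fairly general LCS.
print(sim_lin(3.0, 2.0, 1.0), dist_jc(3.0, 2.0, 1.0))
# A concept compared with itself is its own LCS.
print(sim_lin(2.0, 2.0, 2.0), dist_jc(2.0, 2.0, 2.0))
```

The two measures move in opposite directions: a more specific LCS raises Lin's similarity and lowers the Jiang and Conrath distance.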

Metrics based on the Path finding approach have been redefined (Batet et al., 2011) in terms of Information Content, and implemented for their evaluation (Garla & Brandt, 2012). To this end, the shortest path (sp) between two concepts is redefined as the semantic distance (as proposed by Jiang and Conrath), and the maximum depth as the maximum Information Content of any concept (icmax).

All told, the metric (lch) based on Intrinsic IC (Intrinsic IC-lch) is redefined as:

$$\mathrm{sim}_{Intrinsic\ IC\text{-}lch}(c_1, c_2) = 1 - \frac{\log(\mathrm{Dist}_{JC}(c_1, c_2) + 1)}{\log(2 \cdot ic_{max} + 1)} \tag{12}$$

and the metric (Path) based on Intrinsic IC (Intrinsic IC-Path) as:

$$\mathrm{sim}_{Intrinsic\ IC\text{-}Path}(c_1, c_2) = \frac{1}{\mathrm{Dist}_{JC}(c_1, c_2) + 1} \tag{13}$$
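Eqs. (12) and (13) substitute the Jiang and Conrath distance for the shortest path, which the following sketch makes explicit. The function names and the sample values of the distance and of icmax are illustrative assumptions.

```python
import math


def sim_intrinsic_ic_lch(dist_jc, ic_max):
    """Eq. (12): lch with sp replaced by Dist_JC + 1 and D by ic_max."""
    return 1.0 - math.log(dist_jc + 1) / math.log(2 * ic_max + 1)


def sim_intrinsic_ic_path(dist_jc):
    """Eq. (13): Path with sp replaced by Dist_JC + 1."""
    return 1.0 / (dist_jc + 1)


IC_MAX = 2.0  # hypothetical maximum IC of any concept in the taxonomy

# Dist_JC ranges from 0 (identical concepts) to 2 * ic_max (nothing shared).
for distance in (0.0, 1.0, 3.0, 4.0):
    print(distance,
          round(sim_intrinsic_ic_lch(distance, IC_MAX), 4),
          round(sim_intrinsic_ic_path(distance), 4))
```

The added 1 plays the role of counting nodes in the path-based versions: it keeps the logarithm and the division well defined when the distance is zero.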

These metrics have been evaluated in various works and on different test benchmarks (Batet et al., 2011; Garla & Brandt, 2012; Pedersen et al., 2007). These works reveal a better performance of metrics based on Intrinsic IC over those based on Path finding.

4. Proposal and materials

As described in the previous section, metrics based on Intrinsic IC perform better than those based on Path finding (Batet et al., 2011; Garla & Brandt, 2012) when working on testbeds with limited and controlled sets. For this reason, our experimental study focuses on assessing the performance, in a real-life context, of the Intrinsic IC-Path metric itself and of the simplest of the distance-based metrics, Path (chosen for its lower computational cost). In order to perform this assessment, we have deployed an information retrieval system for the biomedical field based on the UMLS Metathesaurus and on semantic similarity metrics.

As we covered in the previous section, some earlier works focused on defining retrieval systems and language processing supported by the UMLS resource (McCray et al., 1993), and others on the application of semantic similarity metrics to defined structures independent from UMLS, such as the comparison of histogram distance (Kurtz et al., 2014). None of these works integrates the use of the UMLS resource with semantic similarity metrics into information retrieval systems for the context of biomedical information.

Later on, in Section 5, we assess the performance of the UMLS Metathesaurus at calculating the semantic similarity between concepts from a theoretical perspective. To this end, we use previous works as reference (Batet et al., 2011; Garla & Brandt, 2012; McInnes et al., 2009; Pedersen et al., 2007), and compare their results with the ones attained in our own work, in order to validate our framework. These works evaluate the semantic similarity of several lists of paired concepts, using different metrics, and compare the results to those proposed by a team of medical coders and physicians. We also analyze the impact on those results of using different versions of UMLS and new types of relationships. Lastly, we highlight the great diversity observed in the results of previous studies, which is due to the lack of a single correlation coefficient.

In Section 6, we analyze the results of the Path and Intrinsic IC-Path metrics in a real-life information retrieval context based on semantic similarity. In this part of our paper, we use the test dataset from the 2011 Text Retrieval Conference (TREC) (Voorhees & Tong, 2011). This test dataset is made up of three elements: a corpus of 101,712 de-identified documents or health records, comprising 17,265 visits or medical episodes of various patients (each visit can have between 1 and 415 documents or reports); 35 queries representing information needs or inclusion criteria that must be fulfilled by the retrieval of the most relevant visits or episodes; and lastly, a series of relevance judgements defined by a team of experts, in which each individual visit is deemed relevant or not relevant according to the information needs of each search query in a real-life context.
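The relevance judgements described above are what make a score such as the F-Measure reported in the abstract computable: the visits retrieved for a query are compared with the visits judged relevant. The sketch below assumes the balanced F-Measure (F1) and uses made-up visit identifiers; it does not reproduce the TREC file formats or the authors' evaluation code.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and balanced F-Measure for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical visit identifiers for a single query.
retrieved_visits = ["v1", "v2", "v3", "v4"]
relevant_visits = ["v2", "v4", "v5"]

precision, recall, f1 = precision_recall_f1(retrieved_visits, relevant_visits)
print(precision, recall, f1)
```

Averaging the per-query values over the 35 queries would give a single figure per metric; whether the authors aggregate in this way is not stated in this section.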

For the development of the solution proposed in this paper, we have used a range of different tools: the UMLS Metathesaurus, versions 2010AB and 2011AB, as the base for medical knowledge; Metamap2013³ for concept-based representation (this version of Metamap allows for the identification of negative statements, and the classification of concepts by any semantic type they may possess); and two open-source tools for the computation of semantic similarity between concepts. The first (McInnes et al., 2009) is a framework composed of two packages (UMLS-Similarity⁴ and UMLS-Interface⁵) based on Perl modules available in CPAN (The Comprehensive Perl Archive Network), and the second (Garla & Brandt, 2012) is one of the components of the Ytex⁶ framework.

³ http://metamap.nlm.nih.gov/Docs/MM_2013_ReleaseNotes.pdf.

Table 2
Semantic similarity correlation values using the Path metric, with PAR/CHD relationships, for SNOMED-CT (UMLS versions 2008AB and 2010AB). Spearman correlations based on minimum rank values (results reproduced from Pedersen and McInnes (McInnes et al., 2009; Pedersen et al., 2007) for version 2008AB) and on average values (used in this work).

             SNOMED-CT 2008AB                  SNOMED-CT 2010AB
             Minimum values   Average values   Minimum values   Average values
Physicians   0.3500           0.3170           0.3134           0.2744
Coders       0.5000           0.4500           0.4596           0.4160

Finally, it is worth noting that the health records processed by the system are a series of XML files. Although each document is thus structured in XML, the element containing the most important information (the document itself) is written in natural language.

5. Evaluation of UMLS

Firstly, we analyze the characteristics of the main tool used in this work, the UMLS Metathesaurus, that can improve semantic similarity computation. These are the sources evaluated and the types of relationship between concepts present in the different UMLS versions used.

5.1. Versions of UMLS Metathesaurus

UMLS compiles the knowledge of the biomedical domain, and is thus undergoing constant evolution and improvement. Small changes between versions, affecting concepts or relationships, can have a noticeable impact on the results obtained in a tightly defined context, such as Pedersen's benchmark (29 pairs of concepts) (McInnes et al., 2009; Pedersen et al., 2007).

To reflect this, we have reproduced the results obtained by Pedersen et al. (2007) and McInnes et al. (2009) on the SNOMED-CT source of UMLS version 2008AB (used in their work), and compared them to those of version 2010AB (Table 2).

This table shows Spearman's rank correlation coefficients for the Path metric, compared to the estimates of physicians and medical coders. On one side, we show the correlation coefficients based on the minimum rank values for groups of repeated (tied) similarity values (as used in McInnes et al., 2009; Pedersen et al., 2007); on the other, the correlation coefficients based on the average values of said ranks (employed in this paper, as we consider it to be the most adequate approach in this context).
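The difference between the two tie-handling conventions can be sketched as follows. This is a minimal illustration, assuming that Spearman's coefficient is computed as the Pearson correlation of the ranks; the function names and the sample values are hypothetical and are not taken from the benchmark.

```python
from math import sqrt

def ranks(values, ties="average"):
    """Rank values in ascending order (1-based); tied values share one rank.

    ties="min" gives every member of a tied group the lowest rank of the group;
    ties="average" gives the mean of the ranks the group occupies.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        rank = pos + 1 if ties == "min" else (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            result[order[k]] = float(rank)
        pos = end + 1
    return result

def pearson(x, y):
    """Pearson linear correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y, ties="average"):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x, ties), ranks(y, ties))

# Hypothetical similarity values with one tied pair (0.5 appears twice).
sample = [0.5, 1.0, 0.5, 0.25]
min_ranks = ranks(sample, ties="min")
avg_ranks = ranks(sample, ties="average")
```

With tied similarity values, which the Path metric produces often, the two conventions assign different ranks to the tied group and can therefore give different coefficients for the same data.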

As we can see, the obtained correlations (using Spearman's coefficient) vary significantly (from 6% to 13%) between versions of SNOMED-CT. These results show some refinement of the relationships between concepts in version 2010AB, which leads to lower values (similarity relationships found in the earlier version are no longer found). We must, then, bear this in mind when we compare the results of different studies, since many of them may be comparing the performance of metrics run on different versions of the UMLS Metathesaurus.

In these results, and in others obtained throughout this work, we

can observe that the metrics are better adjusted to the similarity cri-

teria defined by medical coders than to those set by physicians.

4 http://search.cpan.org/dist/UMLS-Similarity/.

5 http://search.cpan.org/dist/UMLS-Interface/.

6 https://code.google.com/p/ytex/.

Table 3

Semantic similarity correlation values, using Spearman and Pearson, for a number of metrics based on Path finding with PAR/CHD relationships, for SNOMED-CT with UMLS 2010AB.

                        Path      lch       wup       nam
Spearman   Physicians   0.2744    0.2744    0.3377    0.4063
           Coders       0.4160    0.4156    0.4190    0.5578
Pearson    Physicians   0.5451    0.3348    0.3372    0.4301
           Coders       0.7170    0.4566    0.3840    0.4456

5.2. Impact of correlations used

Analyzing the results obtained, and comparing them with the results of previous works, we observe the lack of a standard criterion for the coefficient used. For instance, some works use Spearman's correlation coefficient (Garla & Brandt, 2012; Pedersen et al., 2007) while others use Pearson's linear coefficient (Batet et al., 2011). This led us to study and interpret both kinds of correlation in the analysis of various semantic similarity metrics. Pedersen himself, who uses Spearman's coefficient in his results (Pedersen et al., 2007), points to a maximum Pearson correlation of 0.85 between the estimates of the evaluating experts (medical coders and physicians).

For this reason, and for the sake of a better interpretation, we calculate similarity for the 29 paired concepts (McInnes et al., 2009) with the main metrics based on Path finding, and observe that there is significant variation in the results depending on the correlation coefficient used (Table 3).

As in previous works (McInnes et al., 2009; Nguyen & Al-Mubaid, 2006), the nam metric (Nguyen & Al-Mubaid), applied to SNOMED-CT sources, yields the best correlation values for the Spearman coefficient. Pearson's correlation, however, gives its best results for the Path metric (Table 3).

Far from joining the discussion over the kind of correlation that should be used (Pearson correlates similarity values, while Spearman correlates their order), our study reveals that the results of various works are simply not comparable with each other, as was already pointed out by Garla and Brandt (2012). For this reason, and to further clarify the matter, from here on we show the results obtained with both correlation coefficients.

5.3. Study of UMLS relationships and resources

The semantic similarity calculations in the previous works by Pedersen et al. (2007) and McInnes et al. (2009) were done through direct hierarchical relationships (PAR/CHD), also described as "is-a" semantic relationships, on a single source. Later works, such as Garla and Brandt (2012) and Batet et al. (2011), do not specify the kind of relationships used in the calculation of semantic similarity, so it is not possible to determine the implications of their results.

For this reason, in the first part of our work, we have also evaluated the impact that different kinds of relationships between concepts can have on the calculation of semantic similarity. The kinds of relationships we evaluate are: direct hierarchical relationships (PAR/CHD), indirect hierarchical relationships (RB/RN), and the non-hierarchical relationships existing in the UMLS Metathesaurus (SIB, RO, RL, SY, RQ, AQ, and QB).

Firstly, we run the similarity calculations for Pedersen's benchmark, using the Path metric applied to the sources and relationships contained in UMLS 2010AB. As shown in Table 4, there is a significant improvement in the correlation coefficients for hierarchical relationships. However, the combined use of all relationships (both hierarchical and non-hierarchical) degrades the results considerably. This is due to the fact that these non-hierarchical relationships generate cycles that do not represent parent/child or sibling relationships between concepts (synonymy) (Bodenreider, 2001; Erdogan et al., 2010); we therefore do not recommend using them, as they add


Table 4

Semantic similarity correlations for the Path metric, with relationships PAR/CHD, PAR/CHD+RB/RN, and ALL relationships for sources existing in UMLS 2010AB.

                    PAR/CHD    PAR/CHD + RB/RN    ALL
Spearman   Phys.    0.6382     0.5761             0.4788
           Coders   0.6422     0.6495             0.4338
Pearson    Phys.    0.7059     0.6740             0.6168
           Coders   0.7982     0.8012             0.7046

Table 5

Table summarising results, extracted from Garla's work for the UMLS release 2011AB concept graph (Garla & Brandt, 2012).

Benchmarks                    Knowledge Based
                              Path Finding            Intrinsic IC
                              wup      Path / lch     Path
Pedersen Combined (N = 29)    0.70     0.61           0.70
Mayo (N = 101)                0.38     0.30           0.41
UMN relatedness (N = 430)     0.33     0.36           0.36
UMN similarity (N = 566)      0.39     0.40           0.43
UMN relatedness (N = 587)     0.32     0.34           0.35

noise to the results. We can also observe how the application of the

entirety of the knowledge offered by the sources within the UMLS

Metathesaurus improves results as well.

The previous tests were also conducted on version 2011AB of the

UMLS Metathesaurus, obtaining similar results.

5.4. Comparison of Path and Intrinsic IC-Path metrics

Although many works have evaluated the performance of metrics with different sets of paired concepts, Garla offers a definitive view on various existing frameworks and on the various metrics defined (Garla & Brandt, 2012). As we can see in Table 5 (a summary of the results reached by Garla), the best overall results are given by the Intrinsic IC-Path metric.

For this reason, our work evaluates the performance of the metric yielding the best results (Intrinsic IC-Path) and of the computationally simplest metric (Path) in a real information retrieval scenario, that is, working on large volumes of information.

6. Evaluation of metrics in a real information retrieval context

Now that we have shown the importance of using the latest versions of the UMLS Metathesaurus (for an updated knowledge of the biomedical domain) and of applying the right relationships (to reduce noise), we focus in this section on the application of these conclusions to a real environment. Also, in contrast with earlier works, we evaluate the impact of using the Path and Intrinsic IC-Path metrics in this real environment, namely the set of electronic medical records found in TREC Medical Records Track 2011 (Voorhees & Tong, 2011).

In order to perform this evaluation, each of the medical reports making up each visit for a given patient, along with the search topics, will be represented via concepts contained within UMLS. This representation will allow us to relate the topic's concepts semantically with the contents of each report; the semantic similarity between these will determine the relevance of each visit.

For the calculation of metrics of semantic similarity between con-

cepts, we have used Ytex, developed by Garla and Brandt (Garla, &

Brandt, 2012).

6.1. Processing the information to be used

In order to represent, treat, and evaluate the semantic similarity

described above, we must extract UMLS concepts from the search

topic, as well as from the report. That done, the semantic similarity

between the concepts extracted from both is calculated. Lastly, these

results will be aggregated into a single similarity value, which will

determine the relevance (or irrelevance) of the document for a given

search topic. We will now detail the process.

Pre-processing of reports and search topics: reports taken from the Text Retrieval Conference (TREC) are in XML format, and contain a series of headers, footers, codes, and labels that must be removed before processing. Hence, in this stage, we remove the document's XML tags, as well as any information that is not relevant to this study, such as the report's checksum code (which identifies the visit it belongs to), its signatures, and its ICD-9 codes. The result is a plain-text version of the report, written in natural language with no codes or labels.

Topics, on the other hand, require no such processing, as they al-

ready are a mere text string.
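The pre-processing step can be sketched in a few lines. This is only an illustration under stated assumptions: the tag names (`report_text`, `checksum`, `signature`, `icd9`) are hypothetical, since the actual schema of the TREC reports is not given in the text, and the function name is ours.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names: the real TREC report schema is not described here.
DROP_TAGS = {"checksum", "signature", "icd9"}

def report_to_plain_text(xml_string, text_tag="report_text"):
    """Return the natural-language body of a report with codes and tags removed."""
    root = ET.fromstring(xml_string)
    # Remove every element whose content is not relevant to the study.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in DROP_TAGS:
                parent.remove(child)
    node = root.find(f".//{text_tag}")
    text = "".join(node.itertext()) if node is not None else ""
    # Collapse line breaks and repeated spaces into single spaces.
    return " ".join(text.split())

example = (
    "<report><checksum>abc123</checksum>"
    "<report_text>HARD OF\n HEARING. <icd9>389.9</icd9></report_text></report>"
)
plain = report_to_plain_text(example)
```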

Processing of search topics: the topics are processed using the MetaMap tool, which breaks them into simple strings termed "phrases" that represent symptoms, parts of the body, illnesses, etc. We then obtain the CUIs of each of the resulting phrases. Some of these phrases may generate more than one CUI (as described in Section 2), in which case we combine these CUIs, giving each phrase a number of "sub-phrases" and hence expanding the query.

As an example of this method, we now describe the processing of Topic 104, which defines the search criterion "Patients diagnosed with localized prostate cancer and treated with robotic surgery". The phrases that make up this topic are:

1. Patients.

2. diagnosed with localized prostate cancer.

3. treated with robotic surgery.

Following this, we extract the UMLS concepts (CUIs) associated to each topic phrase, obtaining the 11 sub-phrases shown in Table 6. For instance, phrase 3 ("treated with robotic surgery") generates sub-phrases 1009, 1010, and 1011, while phrase 1 ("Patients") generates only sub-phrase 1001.

When processing a single search criterion, for example Topic 101 ("Patients with hearing loss"), only one phrase is generated, with the concept sub-phrases 1001, 1002, and 1003, as seen in Table 7.

In both examples, we can see how phrases given more than one

sub-phrase implicitly expand the original query, through the varia-

tions in the concepts (CUIs) they contain; all of them carry a meaning

that is unique, but common to that of the original query.
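The expansion of a phrase into sub-phrases can be read as a combination of the candidate CUIs found for each concept of the phrase. The sketch below makes that reading explicit for phrase 2 of Topic 104, using the CUIs listed in Table 6. Grouping the candidates into two alternative mappings is our assumption about how the MetaMap output is organised, and the function name is hypothetical.

```python
from itertools import product

def expand(mappings):
    """Build sub-phrases from alternative mappings of one phrase.

    Each mapping is a list of slots, and each slot holds the candidate CUIs
    for one concept; a sub-phrase picks exactly one CUI per slot.
    """
    subphrases = []
    for slots in mappings:
        subphrases.extend(product(*slots))
    return subphrases

# Phrase 2 of Topic 104 ("diagnosed with localized prostate cancer").
phrase_2 = [
    # Diagnosis + localized neoplasm/carcinoma + prostate
    [["C0011900"], ["C0796563", "C1334407"], ["C0033572", "C1278980"]],
    # Diagnosis + Localized + prostate cancer concept
    [["C0011900"], ["C0392752"], ["C0376358", "C0600139", "C2984325"]],
]
subphrases = expand(phrase_2)
```

Under this assumption the phrase yields 4 + 3 = 7 sub-phrases, which matches sub-phrases 1002 to 1008 in Table 6.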

Processing of medical reports: reports are processed in a similar fashion to topics, identifying the UMLS concepts corresponding to each phrase in the document, and generating all the possible sub-phrases from the combination of the CUIs of its different contextual phrases.

As an example, we show a brief excerpt from a report (Fig. 2), after being pre-processed in this stage to generate the corresponding phrases. These phrases are expanded into different sub-phrases through variations in the concepts (CUIs) that represent them (Table 8). This way, we will be able to combine and match them with each of the sub-phrases defining the topic, and obtain the maximum semantic proximity between topic and report (as will be explained in detail in Section 6.3). It is also worth noting that those phrases containing a negation (assigned code 1) will be eliminated from the similarity calculation process. In both cases, topic and report have been conceptually expanded from the sub-phrases generated in both processes.




CONGESTIVE HEART FAILURE. CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE OF THE URINARY BLADDER WHICH

CAUSES MASS EFFECT ON THE URINARY BLADDER AND ADJACENT TO UTERUS, DETECTED ON CT OF THE ABDO-

MEN. NO CHANGE IN 7 X 8 CM FOCAL CYSTIC STRUCTURE. HARD OF HEARING. IRON DEFICIENCY ANEMIA.

Fig. 2. Example pre-processed excerpt of a report (Report 90230).

Table 6

Phrase table (Topic 104).

Topic 104: "Patients diagnosed with localized prostate cancer and treated with robotic surgery"

SUBPHRASE  PHRASE  CONCEPTS
1001       1       CUI1 = (C0030705) : podg : "Patients"
1002       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C0796563) : neop : "Localized Malignant Neoplasm"
                   CUI3 = (C0033572) : bpoc : "Prostate"
1003       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C0796563) : neop : "Localized Malignant Neoplasm"
                   CUI3 = (C1278980) : bpoc : "Entire prostate"
1004       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C1334407) : neop : "Localized Carcinoma"
                   CUI3 = (C0033572) : bpoc : "Prostate"
1005       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C1334407) : neop : "Localized Carcinoma"
                   CUI3 = (C1278980) : bpoc : "Entire prostate"
1006       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C0392752) : spco : "Localized"
                   CUI3 = (C0376358) : neop : "Malignant neoplasm of prostate"
1007       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C0392752) : spco : "Localized"
                   CUI3 = (C0600139) : neop : "Prostate carcinoma"
1008       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C0392752) : spco : "Localized"
                   CUI3 = (C2984325) : ftcn : "Prostate Cancer Pathway"
1009       3       CUI1 = (C0332293) : topp : "Treated with"
                   CUI2 = (C0035785) : ocdi : "Robotics"
                   CUI3 = (C0038894) : bmod : "Surgery specialty"
1010       3       CUI1 = (C0332293) : topp : "Treated with"
                   CUI2 = (C0035785) : ocdi : "Robotics"
                   CUI3 = (C0038895) : ftcn : "Surgical aspects"
1011       3       CUI1 = (C0332293) : topp : "Treated with"
                   CUI2 = (C0035785) : ocdi : "Robotics"
                   CUI3 = (C0543467) : diap : "Operative Surgical Procedures"

Table 7

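Phrase table (Topic 101).

Topic 101: "Patients with hearing loss"

SUBPHRASE  PHRASE  CONCEPTS
1001       1       CUI1 = (C0030705) : podg : "Patients"
                   CUI2 = (C0011053) : dsyn : "Deafness"
1002       1       CUI1 = (C0030705) : podg : "Patients"
                   CUI2 = (C0018772) : fndg : "Hearing Loss, Partial"
1003       1       CUI1 = (C0030705) : podg : "Patients"
                   CUI2 = (C1384666) : fndg : "hearing impairment"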

6.2. Filtering by topic semantic types

The query expansion conducted in the previous section enhances the information retrieval process, as it unveils new relationships between concepts. Still, this expansion may generate relationships between concepts belonging to semantic types with little semantic specialization or specificity (Bodenreider, 2001; Bodenreider & McCray, 2003; Erdogan et al., 2010; Plaza & Díaz, 2010). These relationships may skew the accuracy of similarity results for those concepts of greater semantic relevance to our current context.

For this reason, we have gone on to classify semantic types by their importance, dividing them into "generic" and "specific" types. Specific semantic types group concepts that carry more importance in the biomedical domain, such as diseases, symptoms, procedures, and

Table 8

Example processed excerpt of a report (report90230).

SUBPHRASE  PHRASE  Negation  Excerpt report90230
190        254     0         C0018802  dsyn  CONGESTIVE HEART FAILURE.
191        255     0         C0010709  dsyn  CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE
191        255     0         C0678594  spco  CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE
192        255     0         C0010709  dsyn  CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE
192        255     0         C0205095  spco  CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE
193        256     0         C0577559  fndg  MASS EFFECT ON THE URINARY BLADDER
193        256     0         C1280500  qlco  MASS EFFECT ON THE URINARY BLADDER
193        256     0         C0005682  bpoc  MASS EFFECT ON THE URINARY BLADDER
...
195        256     0         C0042027  bpoc  MASS EFFECT ON THE URINARY BLADDER
...
199        257     0         C0442726  fndg  DETECTED ON CT
200        258     1         C0205234  spco  NO CHANGE IN 7 X 8 CM FOCAL CYSTIC STRUCTURE
200        258     1         C1511605  fndg  NO CHANGE IN 7 X 8 CM FOCAL CYSTIC STRUCTURE
200        258     1         C0678594  spco  NO CHANGE IN 7 X 8 CM FOCAL CYSTIC STRUCTURE
201        259     0         C0018772  fndg  HARD OF HEARING
202        259     0         C1384666  fndg  HARD OF HEARING
203        260     0         C0162316  dsyn  IRON DEFICIENCY ANEMIA.




medication. We now show, as an example, the semantic types which

appear in Topics 104 and 101:

- Generic

spco - Spatial Concept (CONC)

podg - Patient or Disabled Group (LIVB)

ftcn - Functional Concept (CONC)

- Specific

dsyn - Disease or Syndrome (DISO)

diap - Diagnostic Procedure (PROC)

neop - Neoplastic Process (DISO)

fndg - Finding (DISO)

bpoc - Body Part, Organ, or Organ Component (ANAT)

topp - Therapeutic or Preventive Procedure (PROC)

bmod - Biomedical Occupation or Discipline (OCCU)

ocdi - Occupation or Discipline (OCCU)

Concepts associated to generic semantic types will be eliminated from the phrase table generated in the previous step. For example, in Tables 6 and 7, we show eliminated concepts in grey, for Topics 104 and 101. This way, we can identify the concepts that are not relevant in the context of the specific phrase.

Note how, from sub-phrase 1008, the concepts "Localized" and "Prostate Cancer Pathway" are eliminated, as they belong to the generic types spco and ftcn respectively. Note also how, in Topic 104, phrase 1 is completely eliminated.
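The filtering step amounts to dropping every concept whose semantic type is classed as generic. A minimal sketch follows, assuming the generic set contains only the three types listed above for Topics 104 and 101 (the complete classification is not reproduced in this section); the function name is hypothetical.

```python
# Generic semantic types shown for Topics 104 and 101; the full list used in
# the study may be longer (assumption).
GENERIC_TYPES = {"spco", "podg", "ftcn"}

def filter_subphrase(concepts):
    """Keep only (CUI, semantic type) pairs whose type is not generic."""
    return [(cui, stype) for cui, stype in concepts if stype not in GENERIC_TYPES]

# Sub-phrase 1008 and phrase 1 of Topic 104, as listed in Table 6.
subphrase_1008 = [("C0011900", "fndg"), ("C0392752", "spco"), ("C2984325", "ftcn")]
phrase_1 = [("C0030705", "podg")]

kept_1008 = filter_subphrase(subphrase_1008)
kept_phrase_1 = filter_subphrase(phrase_1)
```

Only "Diagnosis" survives in sub-phrase 1008, and phrase 1 is left empty, in line with the description above.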

6.3. Maximum semantic similarity matrix (topic vs report) and relevance computation

In order to assess the semantic similarity between each topic and each report, we perform a similarity evaluation at several aggregation levels: CUIs, sub-phrases, and phrases. Similarity computation (Sim) is achieved by a matrix that pairs the topic's CUIs with the report's CUIs, for both of our chosen metrics: Path and Intrinsic IC-Path. Afterwards, we select, for every CUI in the topic sub-phrase, the paired concepts (topic-report) with the highest similarity value. This process is then repeated for every topic sub-phrase within a phrase.

\[
\mathrm{Sim\_subphrase}_{subphrase_i,\,cui_j} = \max_{k}\, \mathrm{Sim}\left(CUI_{subphrase_{ij}},\, CUI_{report_k}\right) \tag{15}
\]

where i is each of the topic sub-phrases, j each of the sub-phrase's CUIs, and k each of the report's CUIs.

Later, for each individual phrase, we select the maximum similarity value of each CUI present in its topic sub-phrases. In the individual case of Topic 104 and phrase 2 (Table 6), we obtain the maximum similarity value of CUI1, CUI2, and CUI3.

\[
\mathrm{Sim\_max\_phrase}_{cui_j} = \max_{i}\, \mathrm{Sim\_subphrase}_{subphrase_i,\,cui_j} \tag{16}
\]

This done, we average their values, obtaining a single similarity value per phrase. In our example for Topic 104 and phrase 2, Sim_avg_phrase = (CUI1 + CUI2 + CUI3)/3.

\[
\mathrm{Sim\_avg\_phrase}_i = \sum_{j=1}^{num\_cuis\_phrase} \frac{\mathrm{Sim\_max\_phrase}_{cui_j}}{num\_cuis\_phrase} \tag{17}
\]

Lastly, we average the similarity values of all the phrases in the search, which gives the final relevance of the report with respect to the topic. In the case of Topic 104, Sim_topicvsreport = (Sim_avg_phrase1 + Sim_avg_phrase2 + Sim_avg_phrase3)/3.

It is interesting to point out that, in the particular case of Topic 104, the final relevance value is determined by the average similarity value of the last two phrases. Phrase 1 ("Patients") is completely eliminated from the result, since all the concepts (CUIs) that make it up are associated to generic semantic types (podg).

\[
\mathrm{Sim\_topicvsreport} = \sum_{i=1}^{num\_phrases} \frac{\mathrm{Sim\_avg\_phrase}_i}{num\_phrases} \tag{18}
\]
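The four aggregation levels of Eqs. (15) to (18) can be sketched as a single function. This is an illustration under stated assumptions, not the system's implementation: `sim` stands in for the Path or Intrinsic IC-Path metric (computed with Ytex in this work), each sub-phrase is represented as a dictionary from CUI position j to CUI (our choice of data structure), and `toy_sim` is a hypothetical lookup that knows only one value, taken from Table 9.

```python
def topic_vs_report(phrases, report_cuis, sim):
    """Aggregate CUI-level similarities into one topic-vs-report value.

    phrases: list of phrases; each phrase is a list of sub-phrases; each
             sub-phrase maps a CUI position j to a CUI (after filtering).
    report_cuis: list of CUIs found in the report.
    sim: function (topic_cui, report_cui) -> similarity in [0, 1].
    """
    phrase_scores = []
    for subphrases in phrases:
        best = {}  # position j -> Sim_max_phrase (Eq. 16)
        for sub in subphrases:
            for j, cui in sub.items():
                # Eq. 15: best match of this topic CUI against the report.
                s = max((sim(cui, r) for r in report_cuis), default=0.0)
                best[j] = max(best.get(j, 0.0), s)
        if best:  # phrases emptied by the filtering step are skipped
            phrase_scores.append(sum(best.values()) / len(best))  # Eq. 17
    if not phrase_scores:
        return 0.0
    return sum(phrase_scores) / len(phrase_scores)  # Eq. 18

# Toy similarity: 1.0 for identical CUIs, one Path value from Table 9,
# and 0.0 for any pair not listed (assumption for illustration only).
PATH_SIM = {("C0011053", "C0018772"): 0.5}

def toy_sim(a, b):
    return 1.0 if a == b else PATH_SIM.get((a, b), 0.0)

# Topic 101 after removing the generic concept "Patients" (podg).
topic_101 = [[{2: "C0011053"}, {2: "C0018772"}, {2: "C1384666"}]]
report_cuis = ["C0018772", "C1384666"]

score = topic_vs_report(topic_101, report_cuis, toy_sim)
score_deafness_only = topic_vs_report([[{2: "C0011053"}]], report_cuis, toy_sim)
```

The second call shows the effect of the expansion: with only the "Deafness" sub-phrase the score stays at 0.5, whereas the expanded topic reaches 1.0 because one of its sub-phrases matches a report CUI exactly.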

We can then say that the final value (Sim_topicvsreport) of the maximum similarity matrix of a report in relation to a topic will determine whether or not it is relevant for the terms defined by said topic. The lower extreme (value 0) indicates maximum non-relevance, and the upper extreme (value 1) indicates maximum relevance.

In order to compare the final value obtained by the semantic similarity matrix to the relevance criteria offered by experts in each case, it will be necessary to establish a cut-off value (within the range [0,1]), which will determine whether a certain report is relevant or not to a given topic. This will be studied and defined in the next section.

Since a medical visit may be made up of more than one report, the visit's relevance will be determined by the maximum similarity value of its reports.

This method tries to preserve the informational uniqueness and completeness of the query (topic) for its automated treatment, without any input needed from the user. For this, it is necessary to include each of the topic components by a process of aggregation of the average of the maximum similarity values of the different phrases. In this way, each sub-phrase, which is expanded from the phrases that make up the topic, is measured with the same precision when the aggregation of their averages takes place. However, what will determine, in the end, the relevance of each component will be the maximum semantic similarity of the topic concepts in relation to the report, along with the semantic type they belong to.

Through this straightforward example (Table 9), we can observe the importance of concept-based expansion, both of the topic and of the report, between whose concepts we can establish maximum similarity relationships even when the terms or strings themselves are different. So, for example, the terms associated to the CUIs of the topic ("Deafness"; "Hearing Loss, Partial"; "hearing impairment") are different from the terms associated to the CUIs of the report ("Hard of Hearing"), and yet we obtain the maximum possible similarity.

Tables 9 and 10 are composed of the following elements: the first two columns (topic and report) contain the sub-phrase id., phrase id., CUI, semantic type, and phrase string of the topic and the report respectively. The last two columns correspond to the maximum similarity for each metric between pairs of topic-report CUIs.

7. Result analysis

In this section, we analyze the results obtained after evaluating topics matched to reports, by the procedure described in the previous section.

In order to contrast the relevance criteria set by the experts with the results of the retrieval system we propose in this paper, we have generated a histogram (Fig. 3) which reflects the similarity of each visit (the report with the highest similarity value in each) to a search topic. These reports are distributed along the X axis according to their degree of relevance (0 being "Not relevant", and 1 "Relevant"). Lastly, to ease the understanding of the histogram, we highlight in black those reports which were deemed "Relevant" by the experts, and in ochre those deemed "Not relevant".

7.1. Justification of topic semantic type filtering

Firstly, we have carried out a series of experiments to validate filtering by concepts associated to "specific" topic semantic types. Thus, in Fig. 3, we show the results of evaluating the reports matched to Topic 107 ("Patients with ductal carcinoma in situ (DCIS)"), both filtered by semantic types (Fig. 3b) and unfiltered (Fig. 3a). We can easily see




Table 9

Example maximum similarity values matrix for each sub-phrase Sim_subphrase from Topic101 - Report90230.

Topic 101 | Report90230 | Path Max. Sim | IC-Path Max. Sim
1001  1  C0030705  podg  Patients | 168  52  C0030705  podg  the patient on consultation | 1.0000 | 1.0000
1001  1  C0011053  dsyn  Deafness | 201  73  C0018772  fndg  HARD OF HEARING. | 0.5000 | 0.8042
1002  1  C0030705  podg  Patients | 113  11  C0030705  podg  the patient in consultation | 1.0000 | 1.0000
1002  1  C0018772  fndg  Hearing Loss, Partial | 201  73  C0018772  fndg  HARD OF HEARING. | 1.0000 | 1.0000
1003  1  C0030705  podg  Patients | 111  9  C0030705  podg  The patient apparently | 1.0000 | 1.0000
1003  1  C1384666  fndg  hearing impairment | 202  73  C1384666  fndg  HARD OF HEARING. | 1.0000 | 1.0000

Table 10

Example maximum similarity values matrix of each sub-phrase Sim_subphrase from Topic104 - Report51139.

Topic 104 | Report51139 | Path Max. Sim | IC-Path Max. Sim
1001  1  C0030705  podg  Patients | 43  15  C0030705  podg  he patient | 1.0000 | 1.0000
1002  2  C0011900  fndg  Diagnosis | 75  28  C0543467  diap  DESCRIPTION OF OPERATION | 0.3333 | 0.7172
1002  2  C0796563  neop  Localized Malignant Neoplasm | 28  12  C0796563  neop  LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1002  2  C0033572  bpoc  Prostate | 61  18  C0033572  bpoc  now for removal of his prostate | 1.0000 | 1.0000
1003  2  C0011900  fndg  Diagnosis | 40  14  C0376358  neop  LOCALIZED PROSTATE CANCER. | 0.3333 | 0.7172
1003  2  C0796563  neop  Localized Malignant Neoplasm | 28  12  C0796563  neop  LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1003  2  C1278980  bpoc  Entire prostate | 380  19  C1278980  bpoc  The prostate | 1.0000 | 1.0000
1004  2  C1334407  neop  Localized Carcinoma | 30  12  C1334407  neop  LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1004  2  C0033572  bpoc  Prostate | 171  75  C0033572  bpoc  at the prostate. | 1.0000 | 1.0000
1005  2  C0011900  fndg  Diagnosis | 63  19  C0184661  diap  benefits of the procedure | 0.3333 | 0.7172
1005  2  C1334407  neop  Localized Carcinoma | 30  12  C1334407  neop  LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1005  2  C1278980  bpoc  Entire prostate | 236  113  C1278980  bpoc  sharp dissection until the prostate | 1.0000 | 1.0000
1006  2  C0011900  fndg  Diagnosis | 15  7  C0184661  diap  PROCEDURE | 0.3333 | 0.7172
1006  2  C0392752  spco  Localized | 53  16  C0392752  spco  50s-year-old male with localized adenocarcinoma of | 1.0000 | 1.0000
1006  2  C0376358  neop  Malignant neoplasm of prostate | 32  12  C0376358  neop  LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1007  2  C0011900  fndg  Diagnosis | 14  6  C0543467  diap  SURGERY DATE | 0.3333 | 0.7172
1007  2  C0392752  spco  Localized | 44  16  C0392752  spco  50s-year-old male with localized adenocarcinoma of | 1.0000 | 1.0000
1007  2  C0600139  neop  Prostate carcinoma | 33  12  C0600139  neop  LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1008  2  C0392752  spco  Localized | 44  16  C0392752  spco  50s-year-old male with localized adenocarcinoma of | 1.0000 | 1.0000
1008  2  C2984325  ftcn  Prostate Cancer Pathway | 42  14  C2984325  ftcn  LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1009  3  C0332293  topp  Treated with | 523  24  C0444667  qnco  present for the entire procedure. | 0.0000 | 0.0000
1009  3  C0035785  ocdi  Robotics | 17  8  C0035785  ocdi  ROBOTIC-ASSISTED LAPAROSCOPIC RADICAL PROSTATECTOMY | 1.0000 | 1.0000
1009  3  C0038894  bmod  Surgery specialty | 9  6  C0038894  bmod  SURGERY DATE | 1.0000 | 1.0000
1010  3  C0332293  topp  Treated with | 522  24  C0450011  topp  present for the entire procedure. | 0.0000 | 0.0000
1010  3  C0035785  ocdi  Robotics | 19  8  C0035785  ocdi  ROBOTIC-ASSISTED LAPAROSCOPIC RADICAL PROSTATECTOMY | 1.0000 | 1.0000
1010  3  C0038895  ftcn  Surgical aspects | 11  6  C0038895  ftcn  SURGERY DATE | 1.0000 | 1.0000
1011  3  C0332293  topp  Treated with | 522  24  C0450011  topp  present for the entire procedure. | 0.0000 | 0.0000
1011  3  C0035785  ocdi  Robotics | 19  8  C0035785  ocdi  ROBOTIC-ASSISTED LAPAROSCOPIC RADICAL PROSTATECTOMY | 1.0000 | 1.0000
1011  3  C0543467  diap  Operative Surgical Procedures | 75  28  C0543467  diap  DESCRIPTION OF OPERATION | 1.0000 | 1.0000

how, after filtering, the most significant reports deemed "Not relevant" (ochre) and "Relevant" (black) are displaced towards areas of lower and higher relevance respectively.

These results highlight the need to perform query expansion by "specific" semantic types only, which yields more accurate results at a lower computational cost (as we eliminate the need to calculate similarity for "generic" semantic types).

7.2. Behavior of Path and Intrinsic IC-Path metrics

To comparatively evaluate the performance (in terms of semantic similarity) of the Path and Intrinsic IC-Path metrics in a real-life context, we show a preliminary experiment on two search criteria. One is a simple topic, Topic 101 ("Patients with hearing loss"), applied to 4073 reports grouped in 249 visits. The other is a complex topic, Topic 104 ("Patients diagnosed with localized prostate cancer and treated with robotic surgery"), applied to 3439 reports grouped in 196 visits.

The results obtained from applying the Path metric to a simple topic (Topic 101) show a discrete distribution of results, derived from its definition, which is based on the inverse of the distances (Fig. 4a). This creates uncertainty zones, since some reports are located at similarity values between 0.45 and 0.50 (27 non-relevant reports, and 9 relevant).
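The discrete character of the Path metric follows directly from its definition as the inverse of the path length between two concepts, counted in nodes, so that only the values 1, 1/2, 1/3, and so on are possible. A minimal sketch on a hypothetical three-node hierarchy (the node names and the graph are invented for illustration and are not taken from UMLS):

```python
from collections import deque

def path_similarity(graph, a, b):
    """Return 1 / (number of nodes on the shortest path from a to b).

    Identical concepts give 1.0, directly linked concepts give 0.5,
    and unreachable concepts give 0.0.
    """
    seen, queue = {a}, deque([(a, 1)])
    while queue:
        node, n_nodes = queue.popleft()
        if node == b:
            return 1.0 / n_nodes
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, n_nodes + 1))
    return 0.0

# Hypothetical hierarchy: A is the parent of B and C (links stored both ways).
toy_graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}

same = path_similarity(toy_graph, "B", "B")
parent_child = path_similarity(toy_graph, "B", "A")
siblings = path_similarity(toy_graph, "B", "C")
```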

In the case of the Intrinsic IC-Path metric, the internal nature of

its calculation does away with this discrete character (Fig. 4b). The

global results compared to those of the Path metric are similar, but

distributed in a smoother fashion, more evenly distributed towards

both extremes.

Conversely, when processing complex topics (with multiple phrases) such as Topic 104, calculations based on aggregated averages of the maximum similarity values obtained (Section 6.3) counter the discrete character of the Path metric. Also, for both metrics, the similarity values of the reports tend to spread following a normal distribution function (Fig. 5a and b), which removes the previously mentioned discrepancies.

7.3. Choosing the cut-off value

From the report similarity distributions generated for each search criterion, as shown in the previous section (Figs. 4 and 5), we must establish a cut-off value to determine whether the report is relevant. Based on that value, reports with an estimated similarity greater than or equal to it will be deemed relevant by the system, and the rest not relevant. By doing this with reports that have already been assessed by experts as "Relevant" or "Not relevant" for each topic, we can estimate the accuracy of the retrieval system we propose in this work.




Fig. 3. (a) Histogram for Topic 107 without semantic type filtering. (b) Histogram for Topic 107 with semantic type filtering.

Fig. 4. Path vs Intrinsic IC-Path for a simple search topic (Topic 101).

Fig. 5. Path vs Intrinsic IC-Path for a complex search topic (Topic 104).

As the previous section shows, it is easy to determine the cut-off point for simple topics, because their values are distributed towards the extremes. However, when working with complex searches, the decision is more difficult, as well as more critical for the performance of the system. For all this, to define the cut-off value, we will adhere to the following premises:

- The value must be common to both metrics and lie between 0 and 1. It must be greater than 0.5, as this value represents a synonymy relationship between concepts under the Path metric, but is not sufficient in itself to establish relevance in complex searches.




Fig. 6. Documents evaluated by the proposed system for Topic 104, using Path and Intrinsic IC-Path metrics.

Table 11

Final relevance values for the examples in Section 6.3.

                                Path      Intrinsic IC-Path
Sim_topic101vsreport90230 =     1.000     1.000
Sim_topic104vsreport51139 =     0.7149    0.7783

- It must show a balance in classifying documents by relevance; that is, the higher the cut-off value is, the more documents it will classify as not relevant, to the detriment of relevant results.

From the stated premises, and for a simple maximum similarity matrix of just two concepts, a report would only be deemed relevant to a topic if the two concepts had a similarity value of 1.0 (distance equals 1, representing the same concept) or 0.5 (distance equals 2, representing a synonym). In a real-life context, with complex phrases made of multiple pairs of concepts, applying an average value to all similarities introduces variance errors that distort the final results. For this reason, two additional requirements are needed to ensure the proper application of the average value: at least one of the pairs of concepts must have a similarity value of 1.0, and at most one of the pairs may have a value lower than 0.5 (values lower than 0.5 represent a distant synonymy between concepts). If these two additional criteria are not met, the report is deemed Not Relevant.

Once this correction was applied, a test group of 1000 reports

showed that all Relevant documents presented values equal to or

greater than 0.6.

For this reason, we have established 0.6 as the cut-off value, since it is the minimum value that meets all the requirements above.

Hence, the final result of the maximum similarity matrix (Sim_topicvsreport) will reflect the relevance of a report in relation to a topic in the following manner:

- If the value of Sim_topicvsreport is within the range [0.0; 0.6), the report is Not Relevant to the topic.

- If the value of Sim_topicvsreport is within the range [0.6; 1.0], the report is Relevant to the topic.
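The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the system's actual implementation: the function name `is_relevant` and the input format (a flat list holding the maximum similarity of each concept pair) are assumptions, and Sim_topicvsreport is taken here to be the plain average of those values.

```python
from statistics import mean

CUT_OFF = 0.6  # cut-off value established in the text


def is_relevant(pair_similarities, cut_off=CUT_OFF):
    """Classify a report from the maximum similarity of each concept pair.

    `pair_similarities` is a hypothetical input format: one value in
    [0, 1] per concept pair of the maximum similarity matrix.
    """
    if not pair_similarities:
        return False
    # Requirement 1: at least one pair has a similarity value of 1.0.
    has_exact_match = any(s == 1.0 for s in pair_similarities)
    # Requirement 2: at most one pair has a value lower than 0.5.
    weak_pairs = sum(1 for s in pair_similarities if s < 0.5)
    if not has_exact_match or weak_pairs > 1:
        return False
    # Assumption: Sim_topicvsreport is the average of the pair values.
    return mean(pair_similarities) >= cut_off


print(is_relevant([1.0, 0.5]))         # True: average 0.75
print(is_relevant([0.5, 0.5]))         # False: no pair reaches 1.0
print(is_relevant([1.0, 0.25, 0.25]))  # False: two pairs below 0.5
```

Checking the two additional requirements before averaging mirrors the order in the text: a report that fails either of them is Not Relevant regardless of its average.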

In this way, the examples shown in Section 6.3 (Tables 9 and 10) correspond to two reports that were deemed Relevant to the topics they were evaluated against, for both metrics (Table 11).

Using Topic 104 as an example, for the Path metric with the proposed cut-off value, we can observe (Fig. 6) how 9 reports assessed by experts as not relevant are tagged as relevant by the system, while 1 deemed relevant by the experts turns out as not relevant. In the case of the Intrinsic IC-Path metric, 4 reports tagged not relevant by experts are seen as relevant by the system, and 2 relevant as not relevant.

All told, in the specific case of Topic 104, the results obtained for the Path metric are: Precision = 44.4%; Recall = 88.9%; F-Measure = 59.3%. For the Intrinsic IC-Path metric they are: Precision = 63.6%; Recall = 77.8%; F-Measure = 70.0%. In this case, the Intrinsic IC-Path metric shows a better performance than the Path metric.

Table 12

Aggregated results.

              Path      Intrinsic IC-Path

Recall        0.753     0.639

Precision     0.364     0.392

F-Measure     0.430     0.427

7.4. Evaluation of Path and Intrinsic IC-Path metrics with the TREC dataset

In this part, we will evaluate the performance of the two metrics analyzed in our work (Path and Intrinsic IC-Path) in a real-life information retrieval scenario. To do so, we will use the 35 topics (or

search criteria) proposed in TREC 2011, with an information source of

101,712 reports (grouped into 17,265 visits).

The metrics used in this evaluation are the standard ones in the

field of information retrieval: Precision, Recall, and F-Measure. The

latter is best at reflecting a balance between the first two, since it is

defined as:

\[
\text{F-Measure} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{19}
\]
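As a worked illustration of Eq. (19), the following sketch recomputes the Topic 104 values reported for the Intrinsic IC-Path metric. The function name `precision_recall_f` is hypothetical. The text states 4 false positives and 2 false negatives; the count of 7 true positives is not given explicitly and is an assumption inferred from the reported Recall of 77.8% (7 of 9 relevant reports).

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-Measure) from raw retrieval counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Eq. (19): harmonic mean of Precision and Recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Topic 104, Intrinsic IC-Path: tp=7 is inferred (assumption), fp and fn
# are the counts given in the text.
p, r, f = precision_recall_f(tp=7, fp=4, fn=2)
print(f"{p:.3f} {r:.3f} {f:.3f}")  # prints 0.636 0.778 0.700
```

These values agree with the 63.6%, 77.8% and 70.0% reported for this topic.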

Table 12 shows the average of all the results obtained in the retrieval of relevant reports for each of the proposed search topics.

As we can see, the F-Measure value of both metrics is very similar

(Path = 0.430, Intrinsic IC-Path = 0.427), with a slight edge for the

Path metric. Although these results suggest that, in terms of Recall,

Path is the superior metric, with Intrinsic IC-Path having the upper

hand in Precision, we cannot consider them to be conclusive, as both

indicators are complementary.
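When reading Table 12, it may help to note that an F-Measure averaged over topics is generally not the value that Eq. (19) yields from the averaged Precision and Recall. The sketch below illustrates this under the assumption that the aggregated figures are per-topic averages, as the text describes them; the function names and the two-topic data are hypothetical and serve only as an illustration.

```python
def f_measure(precision, recall):
    """Eq. (19): harmonic mean of Precision and Recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(per_topic):
    """Average Precision, Recall and F-Measure over (precision, recall) pairs."""
    n = len(per_topic)
    avg_p = sum(p for p, _ in per_topic) / n
    avg_r = sum(r for _, r in per_topic) / n
    avg_f = sum(f_measure(p, r) for p, r in per_topic) / n
    return avg_p, avg_r, avg_f


# Applying Eq. (19) directly to the averaged values in Table 12:
print(round(f_measure(0.364, 0.753), 3))  # 0.491 (Path)
print(round(f_measure(0.392, 0.639), 3))  # 0.486 (Intrinsic IC-Path)

# Hypothetical two-topic data: the average of per-topic F-Measures (0.59)
# differs from the F-Measure of the averaged Precision and Recall.
avg_p, avg_r, avg_f = macro_average([(1.0, 1.0), (0.1, 0.9)])
print(round(avg_f, 3), round(f_measure(avg_p, avg_r), 3))
```

Both values obtained from the averaged Precision and Recall are higher than the averaged F-Measures of 0.430 and 0.427 in Table 12, which is consistent with the F-Measure having been computed per topic before averaging.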

Digging deeper into the results and analyzing their dispersal pattern, Figs. 7 and 8 show the detailed values of the previous indicators for all the search topics, for each of the two metrics studied in this work. These figures reveal the complexity of a number of topics (such as 116, 123, 124, 125, 126, 130, 133, or 134) for which the results, in terms of F-Measure, lie below 20% for both metrics; good examples illustrating the level of complexity of these topics would be Topic 123 (Diabetic patients who received diabetic education in the hospital) or Topic 133 (Patients admitted for care who take herbal products for osteoarthritis).

Fig. 7. Results using Path for the 35 topics. Recall, Precision, and F-Measure shown.

Fig. 8. Results using Intrinsic IC-Path for the 35 topics. Recall, Precision, and F-Measure shown.

Topics 123 and 134 produce a completely anomalous result, due

to an error detected in the UMLS relationships for two particular

concepts. These concepts, C0241863 Diabetic for Topic 123 and C1148454 Seizure activity for Topic 134, offer no similarity distance and are particularly important for said topics.

8. Conclusions

The extraction of information through natural language process-

ing in biomedical documents is both important and complex enough

to deserve very particular attention. For this reason, many works have

been published that address the matter by dealing with similarity

metrics in a theoretical context, using the UMLS resource; however, none of them manages to fulfil the actual need for information retrieval from medical documents.

It is for this reason that, in this paper, we have proposed a novel

experimental study for assessing the performance of the Intrinsic IC-Path and Path metrics in a real-life context, that is, with real medical reports. Also, in order to perform that study, we have deployed an ad hoc framework to formalize the use of the UMLS Metathesaurus for

the retrieval of medical information from these actual reports (TREC

Medical Records Track 2011) through maximum semantic similarity

matrices.

The conclusion drawn from our work is that, in a real-life context, both assessed metrics display similar performance: Path (F-Measure = 0.430) and Intrinsic IC-Path (F-Measure = 0.427). Therefore, the variations in performance obtained in theoretical contexts disappear when the amount of data is increased and real visits and reports are used. These results thus do not justify the use of more complex variations of the Path metric, with their associated high computational cost, such as Intrinsic IC-Path in this case. The justification for these results lies in the fact that, unlike the comparison between isolated pairs of concepts conducted in previous works, the information contained within a report or topic is interrelated, extensive, and expressed in natural language.

The results of this work are applicable to any similarity search process conducted on biomedical documents (patient histories, clinical

reports, diagnostic tests like CT scans, X-Rays, etc.) as long as they are

contained in text files.

Once we have determined that the improved performance of these similarity metrics has no impact in a real-life context, it becomes necessary to improve, in the future, the straightforward retrieval system we have proposed to perform this assessment. In this sense, it may prove beneficial to eliminate those sub-phrases within a topic which, although syntactically correct, are not semantically related to its meaning. Furthermore, the reports dealt with are frequently ambiguous, as they refer to disparate subjects (symptoms or illnesses) for the same patient, making automatic retrieval more difficult. It would be appropriate to filter or separate these documents so that each report covers one subject exclusively. By relating the report's subject more closely with the search topic, we could exclude secondary subjects from the results, which merely add noise and increase the computational cost of the query.

References

Alpi, K. M. (2005). Expert searching in public health. Journal of the Medical Library As-sociation, 93(1), 97103.

Al-Mubaid, H., & Nguyen, H. (2006). A cluster-based approach for semantic similar-ity in the biomedical domain. In Engineering in Medicine and Biology Society, 2006.EMBS06. 28th Annual International Conference of the IEEE(pp. 27132717).

Aronson, A. R. (2001). Effective mapping of biomedical text to the UMLS Metathe-saurus: The MetaMap program. In Proceedings of the American Medical Informatics

Association Symposium 2001(pp. 1721).Aronson, A. R., & Lang, F. M. (2010). An overview of MetaMap: Historical perspective

and recent advances.Journal of the American Medical Informatics Association, 17(3),229236.

Aronson, A. R., & Rindflesch, T. C. (1997). Query expansion using the UMLS Metathe-saurus. In Proceedings of the American Medical Informatics Association Annual FallSymposium(p. 485).

Babashzadeh, A., Huang, J., & Daoud, M. (2013, July). Exploiting semantics for improv-ing clinical information retrieval. In Proceedingsof the 36thinternational Association

for Computing Machinerys Special Interest Group on Information Retrieval Confer-ence on Research and development in information retrieval (pp. 801804). ACM SIGIR2013.

Batet, M., Snchez, D., & Valls, A. (2011). An ontology-based measure to compute se-mantic similarity in biomedicine. Journal of biomedical informatics, 44(1), 118125.

Bhogal, J., Macfarlane, A., & Smith, P. (2007). A review of ontology based query expan-sion.Information processing & management, 43(4), 866886.

Bodenreider, O. (2001). Circular hierarchical relationships in the UMLS: Etiology, di-agnosis, treatment, complications and prevention. In Proceedings of the AmericanMedical Informatics Association Symposium(p. 57).

Bodenreider, O. (2004). The unified medical language system (UMLS): Integratingbiomedical terminology.Nucleic acids research, 32(suppl 1), D267D270.

Bodenreider, O., & McCray, A. T. (2003). Exploring semantic groups through visual ap-proaches.Journal of biomedical informatics, 36(6), 414432.

Burgun, A., & Bodenreider, O. (2001). Comparing terms, concepts and semantic classesin WordNet and the Unified Medical Language System. InProceedings of the North

American Chapter of th e Association for Computational Linguistics 2001; WorkshopWordNet and Other Lexical Resources: Applications, Extensions and Customiza-tions (pp. 7782).

Caviedes,J. E., & Cimino, J. J. (2004).Towards thedevelopment of a conceptual distance

metric for the UMLS.Journal of biomedical informatics, 37(2), 7785.