An ontology-based fuzzy decision support system for multiple sclerosis

Engineering Applications of Artificial Intelligence 24 (2011) 1340–1354


journal homepage: www.elsevier.com/locate/engappai

Massimo Esposito*, Giuseppe De Pietro

National Research Council of Italy (CNR), Institute for High Performance Computing and Networking (ICAR), Via P. Castellino 111, 80131 Naples, Italy

Article info

Available online 25 September 2011

Keywords:

Multiple sclerosis

Decision support system

Knowledge Representation

Knowledge Reasoning

Ontology

Fuzzy logic

0952-1976/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.

doi:10.1016/j.engappai.2011.02.002

* Corresponding author. Tel.: +39 081 6139512; fax: +39 081 6139531.

E-mail address: [email protected] (M. Esposito).

Abstract

The use of Magnetic Resonance (MR) as a supporting tool in the diagnosis and monitoring of multiple sclerosis (MS) and in the assessment of treatment effects requires the accurate determination of cerebral white matter lesion (WML) volumes. In order to automatically support neuroradiologists in the classification of WMLs, an ontology-based fuzzy decision support system (DSS) has been devised and implemented. The DSS encodes high-level, specialized medical knowledge in terms of ontologies and fuzzy rules and applies this knowledge in conjunction with a fuzzy inference engine to classify WMLs and to obtain a measure of their volumes. The performance of the DSS has been quantitatively evaluated on 120 patients affected by MS. Specifically, binary classification results have first been obtained by applying thresholds to the fuzzy outputs and then evaluated, by means of ROC curves, in terms of the trade-off between sensitivity and specificity. Similarity measures of WMLs have also been computed for a further quantitative analysis. Moreover, a statistical analysis has been carried out to assess the influence of the DSS on the diagnostic tasks of physicians. The evaluation has shown that the DSS offers an innovative and valuable way to perform automated WML classification in real clinical settings.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Multiple sclerosis (MS) is an autoimmune inflammatory disease of the Central Nervous System that causes the damage or destruction of the myelin surrounding nerve fibers. In more detail, the disease is characterized by multiple demyelinated lesions involving the brain and spinal cord, which interrupt communication between the nerves and the rest of the body (Compston and Coles, 2008).

According to the National Institute of Neurological Disorders and Stroke, about 250,000–350,000 people in the United States have been diagnosed with MS. Worldwide, the incidence of MS is approximately 0.1%. Northern Europe, the northern United States, southern Australia and New Zealand have the highest prevalence, with more than 30 cases per 100,000 people.

MS has an unpredictable clinical course and is more common in women and in Caucasians. The average age of onset is between 20 and 40, but the disorder may develop at any age (Rosati, 2001). It produces neurological dysfunctions such as numbness, impaired vision, loss of balance, weakness, bladder dysfunction and psychological changes. Many MS cases evolve over a long period (20–30 years) with remissions and exacerbations but, in almost half of all cases, the disease relentlessly progresses to severe disability and premature death (Kidd, 2001).


Diagnosis of MS is based on the principle of dissemination in both time and space. Recent criteria state that patients should experience two attacks of such dysfunctions, occurring at different points in time and affecting different parts of the Central Nervous System. Many years may elapse between the first attack and the second one, and not all patients who experience a first attack develop MS. Moreover, such attacks are extremely variable and often quite subtle; hence, they can raise a suspicion of the disease but, in many cases, are not sufficient on their own for the diagnosis. For this reason, Magnetic Resonance Imaging (MRI) has recently been applied as a supporting tool in MS diagnosis, enabling the visualization of cerebral MS lesions both in clinically suspected cases and in silent ones (Miller et al., 2004).

Furthermore, the lack of laboratory markers for MS activity, progression and remission has drawn much interest to the application of MRI, especially as a monitoring tool both in the course of MS and in the assessment of treatment effects (Miller, 1994; Miller et al., 1998; Filippi et al., 1995). Indeed, brain Magnetic Resonance (MR) images allow MS lesions to be characterized in both space and time. They provide information about the number, size and spatial distribution of lesions for every single study and, moreover, highlight changes among studies performed at different times. The use of MR images as an MS marker requires expert knowledge and intervention to classify MS lesions. However, manual classification is a very difficult and time-consuming task, due to the huge number of MR images to be examined and the variable number, size and spatial distribution of MS lesions per image.


M. Esposito, G. De Pietro / Engineering Applications of Artificial Intelligence 24 (2011) 1340–1354 1341

Operator-assisted techniques, such as local thresholding, have been successfully employed (Filippi et al., 1996). However, they are time-consuming and require operator intervention since they are monoparametric, i.e. based only on MR signal intensity data. As a consequence, they are inevitably affected by some degree of variability due to operator intervention. In this respect, an automated system supporting neuroradiologists in detecting MS lesions would be clearly advantageous. Successful multiparametric methods have been developed in the last decade, achieving automated classification of MS lesions with high specificity and sensitivity (Alfano et al., 2000; Akselrod-Ballin et al., 2009; Anbeek et al., 2004; Freifeld et al., 2009; Khayati et al., 2008; Sajja et al., 2006; Wei et al., 2002; Wels et al., 2008; Zijdenbos et al., 2002).

Although all these methods have a high classification accuracy, they suffer from two important limitations. First, most of them do not reproduce the real decision-making process performed by neuroradiologists. Instead, they implement advanced algorithms that are often too complex to be understood by a non-technical audience such as physicians. Indeed, their goal is to maximize classification accuracy without providing physicians with comprehensible outcomes, i.e. a clear and semantic description of the generated results. As a result, physicians regard them as black boxes that simply generate answers with no further explanation. By contrast, neuroradiologists would like some insight into how a classification method works and derives its outputs.

Second, many of the methods in the literature rely on mathematical models based on thresholding to classify MS lesions. Hence, they neither take into account the fuzziness of the input data nor reproduce the expert decision-making process applied in a domain as laden with vagueness as medicine. Indeed, the decision-making model every trustworthy physician has in mind when performing heuristic diagnosis is often pervaded by uncertainty and vagueness. Expert knowledge therefore abounds with imprecise formulations that are not a consequence of rhetorical inability, but an intrinsic part of expert knowledge acquired through laborious experience (Steimann and Adlassnig, 2000). Any formalism disallowing uncertainty, such as crisp mathematical models, is therefore unable to capture this knowledge (Adlassnig, 1998; Steimann and Adlassnig, 2000). Such a formalism may represent an unrealistic oversimplification of reality, leading to possible misinterpretations when compared with direct observation.

Therefore, a system able to detect MS lesions by handling fuzziness and providing an interpretation of the classification outputs would be of great clinical value.

To this end, an ontology-based fuzzy decision support system (DSS) has been developed in order to automatically support neuroradiologists in the classification of one type of MS lesion, namely the white matter lesion (WML). The DSS encodes high-level medical knowledge elicited from experts in terms of ontologies and fuzzy rules. It applies this knowledge in conjunction with a fuzzy inference engine to classify WMLs and to obtain a measure of their volumes. Specifically, ontologies are used to represent the semantic structure of the expert's knowledge and to provide a comprehensible formulation of the generated outcomes. Fuzzy logic is used to handle the fuzziness of the input data and to reproduce the expert's decision-making process for classifying WMLs, with the further capability of attributing a confidence measure to the output results. The performance of the DSS is quantitatively evaluated on 120 patients affected by MS. Moreover, a statistical analysis is carried out to assess the extent to which the DSS influences the diagnostic tasks of physicians, and whether this influence can be quantified.

2. Background

The proposed DSS relies on a knowledge-based approach integrating two knowledge-representation techniques, namely ontologies and fuzzy logic, which are briefly outlined in the following sub-sections.

2.1. Ontology

The philosophical concept of ontology, as “the branch of metaphysics that deals with the nature of being”, has been used in many areas of science and literature. As co-opted by computer specialists, the concept denotes a vocabulary and a set of terms and relations that define, with the needed accuracy, a set of entities, enabling the definition of classes, hierarchies and other relations among them (Gruber, 1995; Guarino, 1995).

Ontology modeling defines the terms used to describe and represent a knowledge domain. As such, an ontology of a domain is a computer-processable representation of knowledge about a part of an abstract or real world. In general, an ontology model $O$ can be presented in the form of a set $C$ of concepts and a finite family of ontological models $M_k$, $k = 1, 2, \ldots, K$, defined as relationships described on selected subsets of $C$. The relationships may be of various kinds, e.g. binary relationships, named roles. However, taxonomies $T_i$, $i = 1, 2, \ldots, I$, of the concepts are mandatory elements of the ontology.
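The general definition can be mirrored in a small data structure. The Python sketch below illustrates that definition only and is not part of the DSS. The class `OntologyModel`, its fields and the example concept names are hypothetical. It stores the concept set C and the taxonomies T_i as parent-to-children maps. For simplicity, each relationship M_k is restricted to a named set of concept pairs, i.e. a binary role. The sketch then checks that everything is defined on concepts of C.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyModel:
    """Hypothetical container for an ontology model O = (C, {M_k}, {T_i})."""

    concepts: set  # the concept set C
    # Relationships M_k (binary roles here): name -> set of (subject, object) pairs.
    relations: dict = field(default_factory=dict)
    # Taxonomies T_i: each one maps a parent concept to its child concepts.
    taxonomies: list = field(default_factory=list)

    def is_well_formed(self) -> bool:
        """Check that every M_k and every T_i is defined on concepts of C."""
        for pairs in self.relations.values():
            for subj, obj in pairs:
                if subj not in self.concepts or obj not in self.concepts:
                    return False
        for taxonomy in self.taxonomies:
            for parent, children in taxonomy.items():
                if parent not in self.concepts or not set(children) <= self.concepts:
                    return False
        return True


example = OntologyModel(
    concepts={"Lesion", "WhiteMatterLesion", "Volume"},
    relations={"hasVolume": {("Lesion", "Volume")}},
    taxonomies=[{"Lesion": {"WhiteMatterLesion"}}],
)
print(example.is_well_formed())  # True
```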

The current standard language for formalizing ontologies is the Web Ontology Language (OWL), a Semantic Web tool based on Description Logics that represents knowledge and supports a wide range of reasoning facilities (Patel-Schneider et al., 2004). Description Logics (DLs) (Baader et al., 2003) are a family of concept-based knowledge representation formalisms equipped with well-defined model-theoretic semantics (Tarski, 1956). They are characterized by the use of various constructors to build complex concept descriptions from simpler ones. They also provide sound, complete and empirically tractable reasoning services, e.g. to ensure logical consistency or to infer new knowledge (Horrocks et al., 2000).

Both the well-defined semantics and the powerful reasoning tools that exist for DLs make ontologies ideal for encoding knowledge in a plethora of applications. Nevertheless, ontology languages and, more generally, Description Logics provide considerable expressive power but are limited in their ability to represent vague knowledge (Lukasiewicz and Straccia, 2008; Stoilos et al., 2007), i.e. knowledge characterized by boundaries between concepts that are not clear-cut. As a result, this work proposes a hybrid approach combining ontology and fuzzy logic formalisms. Its final aim is to provide both the ability to describe a domain of interest semantically and the ability to reason with uncertainty and/or vagueness.

2.2. Fuzzy logic

Fuzzy logic resembles human reasoning in its use of vague information to generate decisions (Zadeh, 1965). Unlike classical logic, which requires a deep understanding of a system, exact equations and precise numeric values, fuzzy logic incorporates an alternative way of thinking. It allows complex systems to be modeled at a higher level of abstraction originating from knowledge and experience (Siagian, 2003).

A fuzzy variable (also called a linguistic variable) is characterized by its name tag, a set of fuzzy values (also known as linguistic values) and the membership functions $\mu_F$ of these values. Each fuzzy value $F$ assigns a membership degree to the elements $u$ of some predefined range $U$ (known as the universe of discourse) in the following way:

$$F = \{(u, \mu_F(u)) \mid u \in U\}, \qquad \mu_F : U \to [0, 1]$$

The goal of fuzzy variables is to allow a gradual transition between states so that, unlike crisp variables, they have a natural capacity to express vagueness in measurements.
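A linguistic variable can be sketched with trapezoidal membership functions; a triangular function is the special case b == c. The fuzzy value names "weak" and "strong" are taken from Table 2, but the universe [0, 1] and the breakpoints are invented for this example. They are not the membership functions used in the DSS. The gradual transition shows at u = 0.5, where the crisp input belongs to both fuzzy values to degree 0.5.

```python
def trapezoid(a: float, b: float, c: float, d: float):
    """Build a trapezoidal membership function mu: U -> [0, 1].

    mu rises linearly on [a, b], equals 1 on [b, c] and falls linearly on [c, d];
    a triangular function is the special case b == c.
    """
    def mu(u: float) -> float:
        if u < a or u > d:
            return 0.0
        if b <= u <= c:
            return 1.0
        if u < b:
            return (u - a) / (b - a)
        return (d - u) / (d - c)
    return mu


# Hypothetical linguistic variable "compactness" on U = [0, 1]; breakpoints invented.
compactness = {
    "weak": trapezoid(0.0, 0.0, 0.25, 0.75),
    "strong": trapezoid(0.25, 0.75, 1.0, 1.0),
}


def fuzzify(variable: dict, u: float) -> dict:
    """Map a crisp value u to its membership degree in each fuzzy value."""
    return {label: mu(u) for label, mu in variable.items()}


print(fuzzify(compactness, 0.5))  # {'weak': 0.5, 'strong': 0.5}
print(fuzzify(compactness, 0.9))  # {'weak': 0.0, 'strong': 1.0}
```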

Fuzzy logic provides an inference morphology that enables the approximate human reasoning capability of drawing conclusions from existing data: specifically, it makes it possible to infer new truths from old ones. In more detail, fuzzy inference relies on fuzzy rules, defined as conditional statements of the following form:

“if antecedent then consequent”

where antecedent is a fuzzy-logic expression composed of one or more simple fuzzy expressions connected by fuzzy operators, and consequent is an expression that assigns fuzzy values to the output variables.

Fuzzy reasoning proceeds in four steps, namely fuzzification of input variables, rule evaluation, aggregation of rule outputs and, finally, defuzzification.

Fuzzification of input variables translates crisp (real-valued) inputs into fuzzy values. Rule evaluation applies the fuzzified inputs to the antecedents of the fuzzy rules. If a given fuzzy rule has multiple antecedents, a fuzzy operator is used to obtain a single number that represents the degree of activation of the rule. The latter is used to determine the conclusion of the rule: this operation is called implication. There are several implication operators, but the most common is the “minimum”, also known as clipping. This operator cuts off the values of the membership function of the rule consequent that are higher than the degree of activation of the rule. Aggregation is the process that unifies the rule outputs: the previously clipped membership functions of all rule consequents are combined into a single fuzzy set. Finally, defuzzification is the process that generates a single crisp value from the aggregated output fuzzy set.
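The four steps can be traced in a compact Mamdani-style sketch:

- min for AND in rule evaluation;
- min implication (clipping);
- max aggregation;
- center-of-area defuzzification, computed on a discretized output universe.

The two rules, the variable names and all membership breakpoints are invented for illustration, although the labels echo Table 2. This is not the rule base of the DSS.

```python
def trapezoid(a, b, c, d):
    """Trapezoidal membership function (triangular when b == c)."""
    def mu(u):
        if u < a or u > d:
            return 0.0
        if b <= u <= c:
            return 1.0
        return (u - a) / (b - a) if u < b else (d - u) / (d - c)
    return mu


# Hypothetical fuzzy values on the universe [0, 1] (breakpoints invented).
LOW, HIGH = trapezoid(0, 0, 0.25, 0.75), trapezoid(0.25, 0.75, 1, 1)
INPUTS = {"contrast": {"little": LOW, "great": HIGH},
          "compactness": {"weak": LOW, "strong": HIGH}}
OUTPUT = {"normal": LOW, "abnormal": HIGH}

# Hypothetical rules: (antecedents joined by AND, consequent label).
RULES = [([("contrast", "great"), ("compactness", "strong")], "abnormal"),
         ([("contrast", "little")], "normal")]


def mamdani(crisp, steps=101):
    # 1. Fuzzification of every crisp input.
    fuzzy = {var: {lab: mu(crisp[var]) for lab, mu in mfs.items()}
             for var, mfs in INPUTS.items()}
    xs = [i / (steps - 1) for i in range(steps)]  # discretized output universe
    aggregated = [0.0] * steps
    for antecedents, label in RULES:
        # 2. Rule evaluation: AND as min yields the degree of activation.
        w = min(fuzzy[var][lab] for var, lab in antecedents)
        for i, x in enumerate(xs):
            # Implication by min (clipping), then 3. aggregation by max.
            aggregated[i] = max(aggregated[i], min(w, OUTPUT[label](x)))
    # 4. Defuzzification by center of area over the discretized universe.
    area = sum(aggregated)
    if area == 0.0:
        return None  # no rule is activated
    return sum(x * m for x, m in zip(xs, aggregated)) / area


print(mamdani({"contrast": 0.9, "compactness": 0.9}))  # above 0.5: "abnormal" dominates
print(mamdani({"contrast": 0.1, "compactness": 0.9}))  # below 0.5: "normal" dominates
```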

In fuzzy modeling, parameters are usually predefined by the designer based on experience and problem characteristics. The number of rules, the antecedents and consequents of rules, the linguistic variables and the membership functions may be constructed using knowledge from a human expert.

Typical choices for the reasoning mechanism are Mamdani-type and Sugeno-type inference (Yager and Filev, 1994). Common fuzzy operators are min, max, product, probabilistic sum and bounded sum. The most common membership functions are triangular and trapezoidal. Several defuzzification methods have been proposed. The most popular are the center of area method for Mamdani-type and the weighted average method for Sugeno-type fuzzy inference systems (Yager and Filev, 1994).
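For comparison with the center-of-area method, a zero-order Sugeno-type system attaches a constant to each rule consequent. It then defuzzifies by the weighted average of those constants, using the degrees of activation as weights. The constants and activation degrees below are invented for illustration.

```python
def weighted_average(activations, outputs):
    """Sugeno-style defuzzification: sum(w_i * z_i) / sum(w_i).

    activations: degree of activation w_i of each rule, in [0, 1].
    outputs: constant consequent z_i of each rule (zero-order Sugeno).
    """
    total = sum(activations)
    if total == 0:
        raise ValueError("no rule is activated")
    return sum(w * z for w, z in zip(activations, outputs)) / total


# Two hypothetical rules whose consequents are "normal" -> 0.0 and "abnormal" -> 1.0.
print(weighted_average([0.25, 0.75], [0.0, 1.0]))  # 0.75
```

Because no output fuzzy set has to be built and integrated, this method is cheaper than the center of area, which is one reason it is paired with Sugeno-type systems.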

Table 1
Summary of features calculated from a lesion or its context.

#   Feature                     Type         Units
1   Bounding box dimensions     Lesion       pixel
2   Voxel spacing (x, y, z)     Lesion       mm
3   Volume                      Lesion       cm³, voxel
4   Sphericity                  Lesion       dimensionless
5   Compactness                 Lesion       dimensionless
6   Tissue contrast             Contextual   percentage (%)
7   Surrounding white matter    Contextual   percentage (%)

3. Materials

3.1. Data description

The dataset used in this study includes 120 patients with clinically definite MS, aged between 20 and 63 years. The clinical data for this study were collected, in anonymized form, at the Department of Bio-Morphological and Functional Sciences of the University of Naples “Federico II”.

MR brain images were acquired on a 1.5 T Philips Achieva scanner. All patients underwent the same MR protocol, consisting of two spin-echo sequences with the following parameters: (1) spin-echo sequence: repetition time (TR)/echo time (TE) = 640/30 ms; (2) dual-echo sequence: TR/TE = 2200/30 ms and TR/TE = 2200/90 ms. All scans were performed with the following geometry:

• 5 mm slice thickness;
• 15 mm slice gap;
• 32 non-interleaved slices covering the entire brain;
• 230 mm × 230 mm field of view;
• 256 × 256 image matrix.

3.2. Image segmentation

Signal intensity of normal and abnormal tissues in MR images is affected by multiple physical parameters, the most important being the longitudinal (T1) and transverse (T2) relaxation times and the proton density (PD). In a conventional brain MRI study, a comparative evaluation of different sets of images, including T1-, T2- and PD-weighted images, is generally required.

In this study, the post-processing technique named Quantitative Magnetic Color Imaging (QMCI), proposed in Alfano et al. (1995), has been applied. This technique combines into a single color-coded, multispectral, standardized image the information obtained with conventional spin-echo sequences (i.e. the R1 (1/T1) and R2 (1/T2) relaxation rates and PD). The multiparametric segmentation procedure proposed in Alfano et al. (2000) is then applied to the generated QMCI images of the whole dataset. It identifies either normal brain tissues, namely gray matter, white matter (WM) and cerebrospinal fluid, or clusters of potentially abnormal white matter voxels, labeled as White Matter Potential Lesions (WMPLs). A voxel is a volumetric pixel, i.e. a value on a regular grid in three-dimensional space.

For each WMPL, the set of features enumerated in Table 1 is calculated. They can be divided into two groups:

• features computed directly from the candidate lesion;
• contextual features extracted from the lesion's surroundings.

3.2.1. Features computed from WMPL

This group includes size and shape features. Size features are the following: (i) the x, y and z dimensions of the bounding box (B) enclosing the lesion, given in number of pixels; (ii) the x, y and z voxel spacing, i.e. the spacing between adjacent voxels in the x, y and z directions, expressed in millimeters; (iii) the lesion volume, given in terms of the number of voxels (N) and calculated in cubic centimeters as follows:

$$V = \sum_{i=1}^{N} \mathit{x\_voxel\_spacing} \times \mathit{y\_voxel\_spacing} \times \mathit{z\_voxel\_spacing} \qquad (1)$$

It is worth noting that the 3D MR images used in this study have different voxel spacing values in the x, y and z directions. In particular, the voxel spacing is about 0.08984 cm in the x and y directions and 0.5 cm in the z direction.
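Because every voxel has the same spacing, the sum in Eq. (1) reduces to N times the volume of a single voxel. The sketch below applies this to the spacing values reported above. It also checks that the in-plane spacing equals the 230 mm field of view divided by the 256-pixel matrix of Section 3.1. The function name and the 100-voxel example lesion are hypothetical.

```python
def lesion_volume_cm3(n_voxels, spacing_cm=(0.08984, 0.08984, 0.5)):
    """Eq. (1): sum of x * y * z voxel spacing over the N voxels of a lesion.

    Every voxel has the same spacing, so the sum equals N times the volume
    of a single voxel. The default spacing is the one reported in the text.
    """
    sx, sy, sz = spacing_cm
    return n_voxels * sx * sy * sz


# In-plane spacing check: 230 mm field of view / 256-pixel matrix, in cm.
in_plane_cm = 230 / 256 / 10
print(round(in_plane_cm, 5))             # 0.08984
# A hypothetical WMPL made of 100 voxels.
print(round(lesion_volume_cm3(100), 3))  # 0.404
```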

Shape features are sphericity and compactness. Sphericity evaluates how spherical a lesion is: the more elongated a lesion is, and the more it deviates from a sphere, the lower its sphericity. The measure of discrete sphericity for a 3D object (Sph3D) applied in this study resembles the one proposed by Brown et al. (2003). Essentially, it is defined as the ratio between the volume V of the lesion and the volume of the smallest sphere enclosing the bounding box B of the lesion, as follows:

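$$\mathit{Sph3D} = \frac{3V}{4\pi r^3} \qquad (2)$$

$$r = 0.5 \times \max_{(x_1, y_1, z_1), (x_2, y_2, z_2) \in B} \left\{ (x_2 - x_1) \times \mathit{x\_voxel\_spacing},\; (y_2 - y_1) \times \mathit{y\_voxel\_spacing},\; (z_2 - z_1) \times \mathit{z\_voxel\_spacing} \right\}$$

where $(x_2 - x_1)$, $(y_2 - y_1)$ and $(z_2 - z_1)$ are the bounding box dimensions, expressed in terms of spatial coordinates.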
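The sketch below evaluates Eq. (2) literally. The radius r is half of the largest physical extent of the bounding box, and the lesion volume is compared with the volume of the sphere of that radius. It assumes that the bounding box extents (x2 − x1, y2 − y1, z2 − z1) are given in voxels and that the spacing is in cm. The function name and the test shapes are hypothetical. With unit spacing, a sphere of diameter 2 gives Sph3D = 1. An object of the same volume stretched to an extent of 4 gives 1/8.

```python
import math


def sphericity(volume_cm3, bbox_extent_voxels, spacing_cm=(0.08984, 0.08984, 0.5)):
    """Eq. (2): Sph3D = 3V / (4 pi r^3), with r half the largest physical bbox extent.

    bbox_extent_voxels: (x2 - x1, y2 - y1, z2 - z1) in voxel units.
    spacing_cm: x, y and z voxel spacing in cm.
    """
    r = 0.5 * max(e * s for e, s in zip(bbox_extent_voxels, spacing_cm))
    return 3 * volume_cm3 / (4 * math.pi * r ** 3)


unit = (1.0, 1.0, 1.0)
sphere_volume = 4 / 3 * math.pi  # volume of a sphere of radius 1
print(round(sphericity(sphere_volume, (2, 2, 2), unit), 6))  # 1.0
print(round(sphericity(sphere_volume, (4, 1, 1), unit), 6))  # 0.125
```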

Compactness, on the other hand, evaluates how compact a lesion is. The measure of discrete compactness for a 3D object (Comp3D) relates the area of the enclosing surface, composed of the neighboring voxels of the object, to its volume. The area of the enclosing surface and the volume are both measured in the same unit, i.e. the voxel. For a given shape, Comp3D increases as the volume grows or as the enclosing surface shrinks; a high value therefore means that the object is strongly compact.

In summary, compactness is calculated as follows:

$$\mathit{Comp3D} = \frac{N}{N_b} \qquad (3)$$

where $N_b$ is the number of voxels neighboring the lesion.
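Eq. (3) requires a definition of which voxels count as neighbors, and the paper does not state the connectivity used. For illustration only, the sketch assumes 6-connectivity (face-adjacent voxels) and counts each neighboring voxel once. The function names are hypothetical. A 2 × 2 × 2 cube comes out more compact than a straight line of the same 8 voxels, as the definition implies.

```python
def neighbors_6(voxel):
    """Face-adjacent (6-connected) neighbors of a voxel (x, y, z); an assumption."""
    x, y, z = voxel
    return [(x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
            (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)]


def boundary_neighbors(lesion):
    """Voxels outside the lesion that touch it; their count is Nb."""
    lesion = set(lesion)
    return {n for v in lesion for n in neighbors_6(v)} - lesion


def compactness(lesion):
    """Eq. (3): Comp3D = N / Nb."""
    lesion = set(lesion)
    return len(lesion) / len(boundary_neighbors(lesion))


cube = {(x, y, z) for x in range(2) for y in range(2) for z in range(2)}
line = {(x, 0, 0) for x in range(8)}
print(round(compactness(cube), 3))  # 0.333 (8 / 24)
print(round(compactness(line), 3))  # 0.235 (8 / 34)
```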

3.2.2. Contextual features

This group includes two intensity-level features, namely tissue contrast and surrounding WM.

Tissue contrast evaluates the minimum color contrast needed to detect a WML in QMCI images. In the present work, tissue contrast is measured as a distance factor (DFp) that appraises the distance in the multiparametric space necessary to detect a WML; it is expressed as a percentage.

Surrounding WM evaluates the amount of WM enclosing a lesion. The measure of surrounding WM (SWMp) is calculated as the ratio between the surrounding WM and the number $N_b$ of voxels neighboring the lesion, and it is expressed as a percentage.

For the 120 patients, a total of 2844 WMPLs has been detected by the segmentation procedure. The features just described, calculated for each WMPL of this set, represent the actual input data for the proposed DSS.
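Surrounding WM can be sketched as follows. Among the Nb voxels neighboring the lesion, count those labeled as WM by the segmentation and express the ratio as a percentage. The label map, the 6-connectivity and the function name are assumptions made for this example. The tissue-contrast factor DFp is not sketched, because its computation in the multiparametric space is not detailed here.

```python
def surrounding_wm_percent(lesion, tissue_labels):
    """SWMp = 100 * (WM voxels among the lesion's neighbors) / Nb.

    lesion: iterable of (x, y, z) voxels of the candidate lesion.
    tissue_labels: hypothetical dict mapping a voxel to its segmentation
    label, e.g. "WM", "GM" or "CSF"; unlisted voxels count as non-WM.
    """
    lesion = set(lesion)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    # Assumed 6-connected neighborhood; its size is Nb.
    neighbors = {(x + dx, y + dy, z + dz)
                 for (x, y, z) in lesion for (dx, dy, dz) in steps} - lesion
    wm = sum(1 for v in neighbors if tissue_labels.get(v) == "WM")
    return 100.0 * wm / len(neighbors)


# A single-voxel lesion with WM on 3 of its 6 faces.
labels = {(1, 0, 0): "WM", (-1, 0, 0): "WM", (0, 1, 0): "WM", (0, -1, 0): "CSF"}
print(surrounding_wm_percent({(0, 0, 0)}, labels))  # 50.0
```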

4. Method

The methodology proposed in this work relies on the construction of a knowledge-based DSS responsible for (i) encoding the high-level, specialized medical knowledge elicited from clinical experts; (ii) making inferences by applying this knowledge to the features described in Section 3.2; and (iii) drawing conclusions, with the final aim of supporting the clinicians' everyday practice.

Essentially, the methodology can be described in terms of three stages, namely Knowledge Elicitation, Knowledge Representation and Knowledge Reasoning. They are, respectively, in charge of eliciting knowledge from the experts, representing this expertise by means of knowledge modeling techniques and, finally, constructing the reasoning engine that processes the formalized know-how. These three stages are described in detail below.

4.1. Knowledge Elicitation

The aim of Knowledge Elicitation is to obtain a body of knowledge that is as complete, consistent and correct as possible (Fox et al., 1985). The most common methods of Knowledge Elicitation involve drawing out information from human experts. They range from informal or semi-structured interviews and observations to more structured methods, such as the transcription and analysis of verbal reports, or conceptual techniques such as laddering, hierarchical sorting and graph construction (Cooke, 1994). Some of these elicitation techniques can be conducted through the interaction between human experts and purpose-built computer tools (Khan and Hoffmann, 2003).

In the medical context, these techniques aim at making the implicit body of medical knowledge, known as tacit knowledge, more explicit and, at the same time, at encouraging physicians to explain their actions. Generally speaking, tacit knowledge refers to the unarticulated knowledge that physicians apply in daily tasks but cannot easily describe in words.

The main goal of the Knowledge Elicitation procedure applied in this work has been to gain an understanding of the terminology and the reasoning path applied by neuroradiologists when making decisions about WM lesions.

An exploratory approach has been followed at this stage and think-aloud protocols have been analyzed. In more detail, a team of five neuroradiologists employed at the Department of Bio-Morphological and Functional Sciences of the University of Naples “Federico II” has taken part by interpreting WM lesions from 10 symptomatic patients: 5 with reported WMPLs but no actual WM lesion and 5 with reported WMPLs that included many actual WM lesions.

The Knowledge Elicitation procedure has been conducted individually with each radiologist at his/her own workplace. The participants were presented with the 10 sets of MR studies, one set at a time, displayed by means of a conventional medical image viewer. All radiologists saw the same MR studies, though in different sequences. The participants were asked to read each MR study as they would in a normal clinical situation and to “think aloud”, reporting everything that went through their minds.

Representing medical knowledge in terms of rules is considered appropriate and is generally accepted (Kong et al., 2008; Straszecka, 2004), because rules are easy to use and understand even for a non-technical audience such as clinicians. For this reason, we instructed the participants to analyze all the features outlined in Section 3.2 on the MR images, and to formalize a tentative diagnosis in terms of heuristic rules. As a result, the physicians formulated rules at a higher level of abstraction, expressing their knowledge in the form of qualitative linguistic labels associated with the features defined in Section 3.2.

These verbal reports were then recorded on audio tape. The resulting think-aloud recordings were transcribed, and an overall textual report containing all the contributions provided by the participating radiologists was finally generated.

Subsequent analyses have then been carried out with the final aim of identifying the linguistic labels used by the radiologists to describe the reported WM lesions, starting from the features defined in Section 3.2. The most commonly used linguistic labels for each feature and for the expected diagnostic outcome, as indicated by the participating radiologists, are summarized in Table 2. The table includes those linguistic labels noted by three or more participants.

Further analyses have been performed to identify the reasoning path followed by the radiologists to provide a tentative diagnosis for the reported WMPLs, starting from the linguistic labels shown in Table 2. The outcome of these analyses is summarized, in natural language, as follows.

The tissue composing a WMPL is abnormal when the lesion is somewhat surrounded by WM, strongly compact and greatly contrasted in the multiparametric space (that is, it borders on different kinds of cerebral tissue). The sphericity is moderate or high in small lesions, whereas it decreases progressively as their volume increases.

Fig. 1. Knowledge Representation approach to model descriptive and procedural knowledge.

Table 2
Linguistic labels identified by the participating radiologists.

Feature                     Linguistic labels
Volume                      Small, medium, large
Sphericity                  Low, moderate, high
Surrounding white matter    Bit, partially, almost completely, completely
Compactness                 Weak, strong
Tissue contrast             Little, great
Tissue structure            Normal, abnormal

Finally, a correlation was identified between the volume, the shape features and the percentage of surrounding white matter of a WMPL: as the volume increases and the sphericity decreases, a lesion can be surrounded by a gradually decreasing amount of WM while its compactness remains high.

The reasoning path just described includes only the heuristic rules formulated by three or more participants. The linguistic labels listed in Table 2 and this reasoning path represent the outcomes of the Knowledge Elicitation procedure.
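To show how the elicited reasoning path maps onto the “if antecedent then consequent” form of Section 2.2, the first statement can be written as a single rule over the labels of Table 2. Its degree of activation is obtained with the min operator. This is an illustrative sketch, not the rule base actually encoded in the DSS. Mapping “somewhat surrounded” to the label partially is an assumption, as are the use of min for AND and the membership degrees passed in.

```python
# Hypothetical encoding of the first elicited statement as a fuzzy rule.
# Mapping "somewhat surrounded" to the Table 2 label "partially" is an assumption.
RULE = {
    "if": [("surrounding white matter", "partially"),
           ("compactness", "strong"),
           ("tissue contrast", "great")],
    "then": ("tissue structure", "abnormal"),
}


def activation(rule, degrees):
    """Degree of activation: min over the membership degrees of the antecedents.

    degrees maps (feature, label) to the membership degree of the fuzzified input.
    """
    return min(degrees[(feat, lab)] for feat, lab in rule["if"])


def to_text(rule):
    """Render the rule in the "if antecedent then consequent" form."""
    antecedent = " and ".join(f"{feat} is {lab}" for feat, lab in rule["if"])
    feat, lab = rule["then"]
    return f"if {antecedent} then {feat} is {lab}"


# Invented membership degrees for one WMPL.
degrees = {("surrounding white matter", "partially"): 0.75,
           ("compactness", "strong"): 1.0,
           ("tissue contrast", "great"): 0.5}
print(to_text(RULE))
print(activation(RULE, degrees))  # 0.5
```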

4.2. Knowledge Representation

The elicited knowledge implicitly includes both the structure of the domain knowledge, called descriptive knowledge, and the processing that takes place throughout the whole course of the decision-making activity, called procedural knowledge. In particular, descriptive knowledge concerns the compositional elements of the domain, such as raw and abstract concepts, their properties and inter-relations. Procedural knowledge, on the other hand, captures the behavioral logic and provides more explicit information about the actions to be taken or the conclusions to be drawn from descriptive knowledge.

In this work, ontology modeling techniques have been used to represent descriptive knowledge, whereas procedural knowledge has been modeled by means of fuzzy logic techniques, as summarized in Fig. 1 and described in detail in the next sub-sections.

4.2.1. Ontology modeling

In this scenario, descriptive knowledge includes both the terminological structure identified in Table 2 and the numeric measures associated with the features defined in Section 3.2.

In detail, a middle-out approach has been employed to identify the basic terms composing the domain structure. This strategy makes it possible to identify the most fundamental terms in the domain of interest before moving on to more abstract and more specific terms. In this way, it strikes a balance in terms of the level of detail. Indeed, detail arises only as necessary, by specializing the basic terms, so some effort is avoided and the higher-level terms are more likely to be stable. This, in turn, leads to less rework and less overall effort (Uschold and Gruninger, 1996).

Next, the properties of the terms, i.e. attributes, and the relationships between these terms have been specified. The result so far has been a summary containing all the terms, relationships and attributes used afterwards to build the ontology described below, in accordance with the following design criteria:

�
Definition of primitive and defined elements. A primitive elementhas been considered the essential factor to the representationof the specialty in the ontology, since it cannot be expressed interms of other elements. On the other hand, a defined elementhas been built by means of a closed-form description. Bybasing the ontology on primitive elements, it has been possibleto remove the circularity found in other types of informationsources, such as dictionaries, where terms can be defined withrespect to other ones. All other definitions of elements withinthe ontology can be traced back to the primitive ones. � Clear distinction between concepts and individuals. Terms pre-
liminarily identified have been modeled in the ontology asconcepts or individuals, each of them having been assigned aspecified role: individuals have been considered as primitiveelements, whereas concepts have been built as defined ele-ments. In particular, defined concepts can be specified as:� a class of all individuals that satisfy a particular restrictions

formulated on attributes and/or relationships;� enumeration of individuals. In other words, the meaning of

each concept Ci has been formalized in the ontology byspecifying all its possible individuals aj

Ci ¼ aij

n oj ¼ 1,...,mi

, i¼ 1, . . ., nc

where aij belongs to Ci, nc is the number of enumerated

concepts of the ontology and mi is the number of indivi-duals for the ith concept.

Restricted usage of the possible types of relationship. The onlykind of relationship adopted in the ontology is the role, i.e. abinary relationship between concepts; whereas n-ary relation-ships, with n greater than two, have not been used. Moreover,the is–a relationship has not been utilized and, as a result, noform of inheritance has been considered. Both these choiceshave been carried out since the aim of this ontology is toprovide only a very simple and intuitive definitional descrip-tion of terms to be used in the rules for assembling theirantecedent and consequent parts. In such a sense, the use of aminimal ontology with a limited expressiveness, very similarto a dictionary of terms, can simplify undoubtedly the writingof rules and, more in general, their final structure and makethem easy to be written and understood by a non-technicalaudience, e.g. physicians.
� Unambiguous and consistent usage of attributes and relation-
ships. Attributes and relationships between concepts havebeen characterized by a well-specified semantics throughproperties and restrictions. In particular, relationships havebeen defined in terms of domain, i.e. the admissible subjectconcept, and range, i.e. the admissible object concept. Attri-butes have been similarly characterized by a domain and arange, but, in this case, the range identifies a value type ratherthan a set of possible object concepts.
• Specification of a user-defined vocabulary. The ontology is essentially “an application ontology” (Guarino, 1997), describing concepts that depend on a particular domain and task: the domain is neuroradiology applied to multiple sclerosis, and the task is the identification of WMLs in previously segmented MR brain images. For this reason, the vocabulary of this ontology has been constructed as a collection of terms strictly related to this specific application, drawn up in cooperation with the team of neuroradiologists involved in the study.

The ontology is organized as follows. The expert knowledge, expressed in the form of the qualitative linguistic labels reported in Table 2, is formalized by means of concepts and individuals. In detail, the features volume, sphericity, surrounding white matter, compactness and tissue contrast are modeled as concepts with a one-to-one mapping, whereas the nature of the tissue composing a WMPL, which represents the expected diagnostic outcome, is modeled with the concept tissue structure. These concepts are formalized as defined concepts built as collections of individuals, each representing a possible linguistic label identified by the experts in Table 2.

The core concept of the ontology, named potential lesion, represents a possible lesion identified in the white matter; it can be associated with the other concepts through the roles listed in Table 3. Moreover, it is formalized as a defined concept built by applying a collection of restrictions on these roles.

Finally, a set of attributes has been defined, as listed in Table 4, so as to associate the numerical values calculated for the features reported in Section 3.2 with the concept potential lesion. In addition, the attribute Confidence Degree has been defined to associate with a WMPL a confidence measure indicating the likelihood that it corresponds to an actual lesion. The notion of confidence is explained in Section 4.3.

Table 3
Roles of the ontology.

Role                        Domain            Range
hasVolume                   Potential lesion  Volume
hasSphericity               Potential lesion  Sphericity
hasCompactness              Potential lesion  Compactness
hasTissueContrast           Potential lesion  Tissue contrast
isSurroundedByWhiteMatter   Potential lesion  Surrounding white matter
hasTissueStructure          Potential lesion  Tissue structure

Table 4
Attributes of the ontology.

Attribute             Domain            Data type
hasVoxelNumber        Potential lesion  Integer
hasSph3D              Potential lesion  Real
hasComp3D             Potential lesion  Real
hasDFp                Potential lesion  Integer
hasSMWp               Potential lesion  Integer
hasConfidenceDegree   Potential lesion  Real
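As an illustration of how the domain/range constraints of Tables 3 and 4 can be checked, the following is a minimal sketch in plain Python, not the authors' implementation. The names `ROLES`, `ATTRIBUTES`, `CONCEPTS`, `check_role` and `check_attribute` are hypothetical, and the enumerated concepts list only the individuals that appear in the rules of Fig. 2; Table 2 may define further labels.

```python
# Hypothetical plain-Python encoding of Tables 3 and 4 (illustrative only).

# Roles (Table 3): role name -> (domain concept, range concept)
ROLES = {
    "hasVolume": ("PotentialLesion", "Volume"),
    "hasSphericity": ("PotentialLesion", "Sphericity"),
    "hasCompactness": ("PotentialLesion", "Compactness"),
    "hasTissueContrast": ("PotentialLesion", "TissueContrast"),
    "isSurroundedByWhiteMatter": ("PotentialLesion", "SurroundingWhiteMatter"),
    "hasTissueStructure": ("PotentialLesion", "TissueStructure"),
}

# Attributes (Table 4): attribute name -> (domain concept, value type)
ATTRIBUTES = {
    "hasVoxelNumber": ("PotentialLesion", int),
    "hasSph3D": ("PotentialLesion", float),
    "hasComp3D": ("PotentialLesion", float),
    "hasDFp": ("PotentialLesion", int),
    "hasSMWp": ("PotentialLesion", int),
    "hasConfidenceDegree": ("PotentialLesion", float),
}

# Enumerated concepts C_i = {a_ij}: only individuals that appear in Fig. 2
CONCEPTS = {
    "Volume": {"Small", "Medium", "Large"},
    "Sphericity": {"Moderate", "High"},
    "Compactness": {"Strong"},
    "TissueContrast": {"Great"},
    "SurroundingWhiteMatter": {"Partially", "AlmostCompletely", "Completely"},
    "TissueStructure": {"Normal", "Abnormal"},
}


def check_role(subject_concept, role, individual):
    """True if a role assertion respects the role's domain and range."""
    domain, range_concept = ROLES[role]
    return subject_concept == domain and individual in CONCEPTS[range_concept]


def check_attribute(subject_concept, attribute, value):
    """True if an attribute assertion respects its domain and data type."""
    domain, value_type = ATTRIBUTES[attribute]
    return subject_concept == domain and isinstance(value, value_type)


print(check_role("PotentialLesion", "hasVolume", "Small"))   # True
print(check_role("PotentialLesion", "hasVolume", "Great"))   # False
print(check_attribute("PotentialLesion", "hasSph3D", 0.08))  # True
```

Keeping roles and attributes as flat lookup tables mirrors the deliberately minimal, dictionary-like ontology described above: there is no is-a hierarchy to traverse.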

(1) IF [PotentialLesion hasSphericity (Moderate OR High)] AND [PotentialLesion hasCompactness Strong] AND [PotentialLesion hasVolume Small] AND [PotentialLesion hasTissueContrast Great] AND [PotentialLesion isSurroundedByWhiteMatter Completely] THEN [PotentialLesion hasTissueStructure Abnormal]

(2) IF [PotentialLesion hasSphericity Moderate] AND [PotentialLesion hasCompactness Strong] AND [PotentialLesion hasVolume Medium] AND [PotentialLesion hasTissueContrast Great] AND [PotentialLesion isSurroundedByWhiteMatter (AlmostCompletely OR Completely)] THEN [PotentialLesion hasTissueStructure Abnormal]

(3) IF [PotentialLesion hasCompactness Strong] AND [PotentialLesion hasVolume Large] AND [PotentialLesion hasTissueContrast Great] AND [PotentialLesion isSurroundedByWhiteMatter (Partially OR AlmostCompletely OR Completely)] THEN [PotentialLesion hasTissueStructure Abnormal]

(4) ELSE [PotentialLesion hasTissueStructure Normal]

Fig. 2. Procedural knowledge elicited from experts and formalized in rules on top of the ontology.
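To make the structure of the Fig. 2 rules explicit, the sketch below encodes them as plain data: the individuals listed for one role are OR-ed, and the clauses of different roles are AND-ed. It is a deliberately crisp simplification, assuming every role has exactly one label, whereas the DSS evaluates the same rules fuzzily as described in Section 4.3. The names `RULES`, `ELSE_CONSEQUENT` and `classify_crisp` are hypothetical.

```python
# Hypothetical encoding of the rules in Fig. 2: within a role the listed
# individuals are OR-ed, across roles the clauses are AND-ed.
RULES = [
    ({"hasSphericity": {"Moderate", "High"}, "hasCompactness": {"Strong"},
      "hasVolume": {"Small"}, "hasTissueContrast": {"Great"},
      "isSurroundedByWhiteMatter": {"Completely"}}, "Abnormal"),
    ({"hasSphericity": {"Moderate"}, "hasCompactness": {"Strong"},
      "hasVolume": {"Medium"}, "hasTissueContrast": {"Great"},
      "isSurroundedByWhiteMatter": {"AlmostCompletely", "Completely"}}, "Abnormal"),
    ({"hasCompactness": {"Strong"}, "hasVolume": {"Large"},
      "hasTissueContrast": {"Great"},
      "isSurroundedByWhiteMatter": {"Partially", "AlmostCompletely", "Completely"}},
     "Abnormal"),
]
ELSE_CONSEQUENT = "Normal"  # rule (4)


def classify_crisp(lesion):
    """Crisp reading of the rules: `lesion` maps each role to one individual."""
    for antecedent, consequent in RULES:
        if all(lesion.get(role) in allowed for role, allowed in antecedent.items()):
            return consequent
    return ELSE_CONSEQUENT


example = {"hasSphericity": "Moderate", "hasCompactness": "Strong",
           "hasVolume": "Small", "hasTissueContrast": "Great",
           "isSurroundedByWhiteMatter": "Completely"}
print(classify_crisp(example))                                # Abnormal (rule 1)
print(classify_crisp({**example, "hasVolume": "Large"}))      # Abnormal (rule 3)
print(classify_crisp({**example, "hasCompactness": None}))    # Normal (ELSE)
```

A role with no assigned individual (`None` above) satisfies no clause, so only the ELSE rule applies.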

4.2.2. Fuzzy modeling

The procedural knowledge relying on this descriptive structure can be extracted from the reasoning path reported in Section 4.1 and expressed as if-then rules built on top of the concepts and properties defined in the ontology. In detail, three “if-then rules” identifying the cases in which a potential lesion is an actual one have been formulated, as outlined in Fig. 2.

In addition, an “else rule” has been added to cover all the remaining situations, i.e. all the potential abnormal cases that are not actual lesions. Since the neuroradiologists’ expertise has been formulated at a high level of abstraction, without clear-cut boundaries between concepts, the rules have been formalized in accordance with fuzzy set theory. In particular, linguistic variables, linguistic values and membership functions have been defined according to the following set of criteria, with the final aim of guaranteeing interpretability:

• Semantics: linguistic variables and their possible values have a semantic meaning well defined by the ontology. Fundamentally, each linguistic variable is one of the concepts defined in the ontology, and its possible linguistic values are the admissible individuals of the corresponding concept.
• Distinguishability: the universe of discourse of each linguistic variable is determined by the range of values that the corresponding feature described in Section 3.2 can assume in the input dataset. Since all linguistic values have a semantic meaning, the corresponding fuzzy sets are distinguishable, i.e. well disjoint, with well-defined ranges in the same universe of discourse, so that they represent distinct concepts and can be associated with metaphorically different linguistic values (Mencar et al., 2007). This makes it possible to clearly assign, starting from a numerical input, a value to a linguistic variable (i.e. an admissible individual to a concept).
• Coverage: every element of the universe of discourse belongs to at least one of the fuzzy sets defined for the linguistic values.
• Normalization: since all linguistic values have a semantic meaning, for each of them at least one element of the universe of discourse has a membership value equal to one.
• Orthogonality: for each element of the universe of discourse, the sum of all its membership values is equal to one.

The linguistic variables, linguistic values and corresponding membership functions, defined and iteratively tuned in collaboration with the physicians, are reported in Fig. 3.

In more detail, each linguistic variable is described by a set of curves representing the membership functions of its linguistic values and by a table giving a numerical description of the curve shapes. For the input linguistic variables, each membership function associated with a linguistic value has a trapezoidal shape and is specified in a row of the corresponding table by an ordered set of four real values: the first is the starting value of the leading edge, the second its ending value, the third the starting value of the trailing edge and the fourth its final value. The membership function of the output variable tissue structure is defined by two singleton spikes, located at 1 for the value normal and at 2 for the value abnormal.

Fig. 3. Membership functions for the linguistic variables.

Fig. 4. Knowledge Reasoning architecture.
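The four-value description of a trapezoid and the interpretability criteria listed above can be sketched together. The following minimal Python example assumes hypothetical breakpoints for the variable volume (the tuned values of Fig. 3 are not reproduced here) and checks coverage, normalization and orthogonality only on a sampled grid, so it approximates rather than proves them. `trapezoid`, `VOLUME`, `OUTPUT_SPIKES` and `check_partition` are illustrative names.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership from the four values of a table row: leading edge
    from a to b, plateau (grade 1) from b to c, trailing edge from c to d.
    a == b or c == d yields a vertical edge (a shoulder at the universe border)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Hypothetical rows for the linguistic variable "volume" (in voxels); the
# actual values are those tuned with the physicians in Fig. 3.
VOLUME = {
    "Small": (0, 0, 8, 12),
    "Medium": (8, 12, 40, 60),
    "Large": (40, 60, 1000, 1000),
}

# Output variable "tissue structure": two singleton spikes.
OUTPUT_SPIKES = {"Normal": 1.0, "Abnormal": 2.0}


def check_partition(sets, lo, hi, step=0.5, tol=1e-9):
    """Sample [lo, hi] and test coverage, normalization and orthogonality."""
    xs = [lo + k * step for k in range(int((hi - lo) / step) + 1)]
    grades = [[trapezoid(x, *row) for row in sets.values()] for x in xs]
    coverage = all(max(g) > 0 for g in grades)
    normalization = all(
        any(abs(g[j] - 1.0) < tol for g in grades) for j in range(len(sets)))
    orthogonality = all(abs(sum(g) - 1.0) < tol for g in grades)
    return coverage, normalization, orthogonality


print(trapezoid(10, *VOLUME["Small"]), trapezoid(10, *VOLUME["Medium"]))  # 0.5 0.5
print(check_partition(VOLUME, 0, 1000))  # (True, True, True)
```

Orthogonality follows here because each trailing edge of one label coincides with the leading edge of the next one, so the two linear pieces always sum to one.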

4.3. Knowledge Reasoning

Knowledge Reasoning is the process of combining the knowledge formalized in the ontology with the fuzzy logic inference mechanism, so as to reproduce the human capability of approximate reasoning, i.e. drawing conclusions from vague data. The architecture shown in Fig. 4 has been devised to execute this task and consists of a fuzzy inference engine and a knowledge base.

4.3.1. The fuzzy inference engine

The fuzzy engine implements a zero-order Sugeno-type fuzzy inference method that resembles the Fuzzy Inference Ruled by Else-Action (FIRE) method proposed by Russo and Ramponi (1995) for image filtering. The basic elements of the proposed method are described in the following.

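Definition 1. Given M fuzzy rules, N continuous input variables $v_1, v_2, \ldots, v_N$ and one discrete output variable $s$, the Sugeno-type fuzzy rules are formulated as follows:

$$
\begin{aligned}
&\text{IF } (v_1, A_{11}) \text{ AND/OR } (v_2, A_{12}) \ldots (v_N, A_{1N}) \text{ THEN } (s, B_1)\\
&\text{IF } (v_1, A_{21}) \text{ AND/OR } (v_2, A_{22}) \ldots (v_N, A_{2N}) \text{ THEN } (s, B_2)\\
&\quad\vdots\\
&\text{IF } (v_1, A_{M1}) \text{ AND/OR } (v_2, A_{M2}) \ldots (v_N, A_{MN}) \text{ THEN } (s, B_M)\\
&\text{ELSE } (s, B_E)
\end{aligned}
$$

$$A_{ij} = (a_{ij}) \text{ AND/OR } (b_{ij}) \ldots \text{ AND/OR } (z_{ij}) \tag{4}$$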


where $A_{ij}$ is a logical expression associated with rule $i$ and calculated for variable $j$ from its linguistic values $a_{ij}, b_{ij}, \ldots, z_{ij}$, whereas $B_i$ and $B_E$ are the singleton spikes of the output variable $s$ associated with rule $i$ and with the ELSE rule, respectively. This formulation models a disjunctive system of rules in which at least one rule must be satisfied, i.e. the rules are linked by OR connectives. The ELSE rule is activated when the other rules are weakly satisfied or not satisfied at all.

The logical expression $A_{ij}$ is introduced in this formulation because different AND and OR operators are needed depending on whether they connect fuzzy values of different linguistic variables or fuzzy values of the same linguistic variable. In fuzzy set theory, AND and OR operators are often modeled in the rules as fuzzy-set intersection and union. As a consequence, the disjunction of two contiguous and orthogonal fuzzy sets, modeled as the fuzzy-set union, can produce scarcely intuitive behavior: in the area in which the two fuzzy sets partially overlap, the result is simply the greater of the two membership values. According to common sense, instead, the disjunction in the overlapping area should behave additively, i.e. the membership values of both fuzzy sets in that area should be summed.

As a result, we have chosen to model AND and OR operators as fuzzy-set intersection and union when fuzzy values of different linguistic variables have to be logically connected, and as proposed in Łukasiewicz logic (Giles, 1975), in line with the common-sense behavior described above, when fuzzy values of the same linguistic variable have to be logically connected. The operators used are formally described in the following definition.
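A two-line comparison makes the motivation concrete. The sketch below assumes an input lying halfway through the overlap of two orthogonal, contiguous labels such as Moderate and High, so both grades are 0.5. The function names are illustrative.

```python
def fuzzy_union(mu_a, mu_b):
    """Standard fuzzy-set union (max), used across different variables."""
    return max(mu_a, mu_b)


def lukasiewicz_or(mu_a, mu_b):
    """Lukasiewicz disjunction (bounded sum), used within one variable."""
    return min(1.0, mu_a + mu_b)


# Input halfway through the overlap of two orthogonal labels (assumed grades)
mu_moderate, mu_high = 0.5, 0.5
print(fuzzy_union(mu_moderate, mu_high))     # 0.5
print(lukasiewicz_or(mu_moderate, mu_high))  # 1.0
```

With the union, a clause such as (Moderate OR High) would be only half satisfied even though the value certainly falls in one of the two labels. Under orthogonality the bounded sum gives full satisfaction, matching the additive behavior described above.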

Definition 2. The strength level $l_i$ of rule $i$ is calculated as the grade of membership of its antecedent clause. In particular, when the antecedents are connected by the AND operator, the strength level is formulated as

$$l_i = \min\{\mu_{A_{ij}}(v_j);\ j = 1, 2, \ldots, N\} \tag{5}$$

whereas, when the antecedents are connected by the OR operator, it is defined as

$$l_i = \max\{\mu_{A_{ij}}(v_j);\ j = 1, 2, \ldots, N\} \tag{6}$$

The logical expression $A_{ij}$ associated with each rule $i$ is calculated for variable $j$ as the grade of membership of its linguistic values $a_{ij}, b_{ij}, \ldots, z_{ij}$ in the antecedent clause. In particular, when the linguistic values of the variable in the antecedent clause are connected by the AND operator, $A_{ij}$ is calculated as

$$A_{ij} = \max\Big\{0,\ \sum_{y = a_{ij}, b_{ij}, \ldots, z_{ij}} \mu_y(v_j) - (N - 1)\Big\} \tag{7}$$

whereas, when they are connected by the OR operator, it is calculated as

$$A_{ij} = \min\Big\{1,\ \sum_{y = a_{ij}, b_{ij}, \ldots, z_{ij}} \mu_y(v_j)\Big\} \tag{8}$$
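Eqs. (5)–(8) can be combined into a single rule-strength function. The sketch below is an illustration under stated assumptions, not the authors' code. Eq. (7) subtracts (N − 1); here that term is read as one less than the number of linguistic values being combined, which is the standard n-ary form of the Łukasiewicz t-norm. The grades reused below are the fuzzified values of PotentialLesion_1 from the example report in Section 4.3.2. Function names and the clause format are hypothetical.

```python
def lukasiewicz_or(grades):
    """Eq. (8): bounded sum of the grades of values of the same variable."""
    return min(1.0, sum(grades))


def lukasiewicz_and(grades):
    """Eq. (7) read as the n-ary Lukasiewicz t-norm: subtract (k - 1), with k
    the number of grades combined (an interpretation of the (N - 1) term)."""
    return max(0.0, sum(grades) - (len(grades) - 1))


def rule_strength(clauses, fuzzified, across="AND"):
    """clauses: variable -> (op, [linguistic values]) with op in {"AND", "OR"};
    fuzzified: variable -> {linguistic value: membership grade}.
    Within a variable use Eqs. (7)-(8); across variables min (Eq. 5) or max (Eq. 6)."""
    per_variable = []
    for var, (op, values) in clauses.items():
        grades = [fuzzified[var].get(v, 0.0) for v in values]
        per_variable.append(lukasiewicz_or(grades) if op == "OR" else lukasiewicz_and(grades))
    return min(per_variable) if across == "AND" else max(per_variable)


# Fuzzified values of PotentialLesion_1 from the example report
fuzzified = {
    "hasSphericity": {"Moderate": 1.0},
    "hasCompactness": {"Strong": 1.0},
    "hasVolume": {"Small": 0.887, "Medium": 0.113},
    "hasTissueContrast": {"Great": 1.0},
    "isSurroundedByWhiteMatter": {"Completely": 1.0},
}
rule_1 = {
    "hasSphericity": ("OR", ["Moderate", "High"]),
    "hasCompactness": ("OR", ["Strong"]),
    "hasVolume": ("OR", ["Small"]),
    "hasTissueContrast": ("OR", ["Great"]),
    "isSurroundedByWhiteMatter": ("OR", ["Completely"]),
}
print(rule_strength(rule_1, fuzzified))  # 0.887
```

For a clause with a single linguistic value, both Łukasiewicz operators reduce to that value's grade, so the within-variable step only matters for compound clauses such as (Moderate OR High).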

Definition 3. The strength level $l_E$ of the ELSE rule is calculated by applying a NOT operator to the strength levels $l_i$ of the other rules, since the ELSE rule is activated when all the other rules are partially or completely unsatisfied. Moreover, since the IF-THEN rules are connected by OR connectives, the results obtained by applying the NOT operator to the strength levels $l_i$ are combined conjunctively, i.e. by means of AND connectives.

In summary, the strength level $l_E$ is defined as

$$l_E = \min\{1 - l_i;\ i = 1, 2, \ldots, M\} \tag{9}$$
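Eq. (9) is a one-liner: NOT is the complement 1 − l_i and the conjunction is the minimum. The following minimal sketch uses a hypothetical function name.

```python
def else_strength(strengths):
    """Eq. (9): complement each IF-THEN strength (1 - l_i), then AND them with min."""
    return min(1.0 - l for l in strengths)


print(else_strength([0.0, 0.0, 0.0]))    # 1.0: no rule fires, ELSE fully active
print(else_strength([0.75, 0.25, 0.0]))  # 0.25
```

The strongest IF-THEN rule therefore bounds the ELSE rule from above: as soon as one rule is fully satisfied, the ELSE strength drops to zero.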

Definition 4. The implication and aggregation operations for all the M rules are jointly calculated as a weighted average of the strength levels, using the numerical values associated with the singleton spikes defined in the output universe. In detail, the weighted average is formulated as

$$WA(s) = \frac{\sum_{i=1}^{M} l_i\, x_{B_i} + l_E\, x_{B_E}}{\sum_{i=1}^{M} l_i + l_E} \tag{10}$$

where $x_{B_i}$ and $x_{B_E}$ denote the numerical values of the singleton spikes $B_i$ and $B_E$.
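Eq. (10) can be sketched directly, assuming, as in this DSS, that the IF-THEN rules point to the abnormal spike at 2 and the ELSE rule to the normal spike at 1. The function and argument names are illustrative.

```python
def weighted_average(strengths, spikes, l_else, x_else):
    """Eq. (10): spike positions x_Bi weighted by the rule strengths l_i,
    plus the ELSE spike x_BE weighted by l_E."""
    numerator = sum(l * x for l, x in zip(strengths, spikes)) + l_else * x_else
    denominator = sum(strengths) + l_else
    # With l_else from Eq. (9) the denominator is positive: either some l_i > 0,
    # or every l_i is 0 and then l_else = 1.
    return numerator / denominator


# Three IF-THEN rules pointing at "abnormal" (2.0), ELSE pointing at "normal" (1.0)
print(weighted_average([0.75, 0.25, 0.0], [2.0, 2.0, 2.0], 0.25, 1.0))  # 1.8
```

The appraisal value always lies between the two spike positions, which is what makes the distance-based confidence of Definition 5 meaningful.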

Definition 5. The result of the weighted average is a continuous appraisal value of the classification produced by the whole inference process. The final discrete output is produced by identifying the singleton spike whose numerical value is nearest to the appraisal value, provided that their distance is below a fixed threshold value.

Moreover, a confidence measure is calculated as a function of the distance between the appraisal value and the numerical value of the final discrete output. This measure represents the system’s trustworthiness in its outcomes, expressed on an ordinal scale: the larger the distance, the less confident the system is in the generated results.

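In summary, the final discrete output and the associated confidence are defined as

$$
\begin{aligned}
&\text{if } |WA(s) - x_{B_i}| < \text{threshold}: \quad s = B_i,\ \ C = 1 - |WA(s) - x_{B_i}|, \qquad i = 1, 2, \ldots, M\\
&\text{if } |WA(s) - x_{B_E}| < \text{threshold}: \quad s = B_E,\ \ C = 1 - |WA(s) - x_{B_E}|
\end{aligned} \tag{11}
$$

where the threshold lies between 0 and 1.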
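The discretization of Eq. (11) can be sketched as follows. The excerpt does not say what happens when no spike lies within the threshold, so returning `None` in that case is an assumption of this sketch, as are the names `discretize` and `SPIKES`.

```python
def discretize(wa, spikes, threshold):
    """Eq. (11): pick the spike nearest to WA(s) if it lies within `threshold`
    and return (label, confidence C = 1 - distance); otherwise return None."""
    label, position = min(spikes.items(), key=lambda item: abs(wa - item[1]))
    distance = abs(wa - position)
    if distance < threshold:
        return label, 1.0 - distance
    return None  # assumption: this case is not specified in Eq. (11)


SPIKES = {"Normal": 1.0, "Abnormal": 2.0}
print(discretize(1.75, SPIKES, 0.5))  # ('Abnormal', 0.75)
print(discretize(1.5, SPIKES, 0.25))  # None
```

Because the spikes sit one unit apart, the confidence C ranges from 1 (appraisal value exactly on a spike) down to 0.5 (appraisal value halfway between them).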

4.3.2. The knowledge base and the reasoning process

The knowledge base consists of the following components:

• T-Box: the terminological box contains the intensional part of the formalized knowledge, i.e. the ontology, the rules built in terms of concepts and roles and, for each concept equivalently formalized as a linguistic variable, the corresponding membership functions.
• A-Box: the assertional box contains the extensional part of the formalized knowledge, i.e. the individuals (instances of concepts) with the corresponding instances of roles and attributes. It is dynamically populated starting from the features extracted by the preliminary multiparametric segmentation procedure.

With these components in mind, the Knowledge Reasoning process can be outlined as follows. The input data extracted by the preliminary multiparametric segmentation procedure mainly include all the WMPLs and their distinctive features. Starting from these input data, a corresponding number of individuals of the concept potential lesion is instantiated. Moreover, the numerical values calculated for the input features are associated with the corresponding attributes of each individual.
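A minimal sketch of the A-Box population step follows, assuming that the segmentation procedure delivers one feature dictionary per WMPL keyed by the Table 4 attribute names. That input format, the class `PotentialLesion` and the function `populate_abox` are hypothetical. The feature values are those of PotentialLesion_1 in the example report of this section.

```python
from dataclasses import dataclass, field


@dataclass
class PotentialLesion:
    """A-Box individual of the concept potential lesion (hypothetical class)."""
    name: str
    attributes: dict = field(default_factory=dict)  # Table 4 attribute values
    roles: dict = field(default_factory=dict)       # role -> {individual: grade}


def populate_abox(wmpls):
    """One individual per WMPL; `wmpls` is an assumed list of feature dicts."""
    return [PotentialLesion(f"PotentialLesion_{k}", dict(features))
            for k, features in enumerate(wmpls, start=1)]


abox = populate_abox([
    {"hasVoxelNumber": 10, "hasSph3D": 0.08, "hasComp3D": 0.45, "hasDFp": 1, "hasSMWp": 1},
])
print(abox[0].name, abox[0].attributes["hasVoxelNumber"])  # PotentialLesion_1 10
```

The `roles` dictionary starts empty; in this sketch it would be filled with graded role assertions (e.g. hasVolume Small 0.887) once the engine fuzzifies the crisp attributes.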

All the individuals are then stored in the A-Box as statements to be matched against the IF part of the rules stored in the T-Box of the knowledge base. In particular, the fuzzy inference engine is responsible for linking the rules in the T-Box with the data in the A-Box and for performing the decision-making process.

In more detail, the engine compares the individuals in the A-Box with the membership functions in the rule antecedent clauses, so as to obtain the membership value of each linguistic value. Then, it calculates the strength level of each rule in accordance with the method formalized in Definitions 1, 2 and 3. After that, the qualified rule consequences are calculated and aggregated into a unified output by applying the procedure described in Definition 4. Finally, the discrete output and the associated confidence measure are resolved as described in Definition 5.

The output produced by the Knowledge Reasoning process is a textual report containing the dichotomous (normal/abnormal) diagnosis, its confidence expressed on an ordinal scale and an explanation of it. This textual report is also linked back to the MR images, where the actual WMLs are highlighted. The explanation of the diagnosis summarizes the reasoning path executed and the strength level calculated for each rule fired. Moreover, the crisp values of the input dataset and their fuzzified values are also reported, so that physicians can evaluate the automatically generated suggestion by also referring to the linked MR images. All this information is expressed in terms of ontology concepts, roles and attributes, with the final aim of providing a semantic description that is simple and intuitive to read and understand, also for a non-technical audience. An example of output generated by the Knowledge Reasoning process is reported below:

Classification results
PotentialLesion_1
  hasTissueStructure Abnormal
  hasConfidenceDegree 0.898

Reasoning path executed
Rule 1: Strength level 0.8876471350267597
  IF [PotentialLesion hasSphericity (Moderate OR High)] AND [PotentialLesion hasCompactness Strong] AND [PotentialLesion hasVolume Small] AND [PotentialLesion hasTissueContrast Great] AND [PotentialLesion isSurroundedByWhiteMatter Completely] THEN [PotentialLesion hasTissueStructure Abnormal]
Rule 2: Strength level 0.11235286497324028
  IF [PotentialLesion hasSphericity Moderate] AND [PotentialLesion hasCompactness Strong] AND [PotentialLesion hasVolume Medium] AND [PotentialLesion hasTissueContrast Great] AND [PotentialLesion isSurroundedByWhiteMatter (AlmostCompletely OR Completely)] THEN [PotentialLesion hasTissueStructure Abnormal]
Rule Else: Strength level 0.11235286497324026
  ELSE [PotentialLesion hasTissueStructure Normal]

Crisp values characterizing the lesion    Fuzzy values characterizing the lesion
PotentialLesion_1                         PotentialLesion_1
hasVoxelNumber 10                         hasVolume Small 0.887 Medium 0.113
hasSph3D 0.08                             hasSphericity Moderate 1.0
hasComp3D 0.45                            hasCompactness Strong 1.0
hasDFp 1.00                               hasTissueContrast Great 1.0
hasSMWp 1.00                              isSurroundedByWhiteMatter Completely 1.0
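As a consistency check, the numbers in this report can be recomputed from Definitions 2–5. The sketch below starts from the rounded fuzzified values printed above, not from the full-precision ones, and assumes the IF-THEN rules point to the abnormal spike (2) and the ELSE rule to the normal spike (1). Rule 3, whose volume clause (Large) has zero membership here, gets strength 0 and does not affect the result. With these inputs the confidence comes out at about 0.898, in line with the reported hasConfidenceDegree.

```python
# Fuzzified values of PotentialLesion_1, copied (rounded) from the report above
fuzzified = {
    "hasVolume": {"Small": 0.887, "Medium": 0.113},
    "hasSphericity": {"Moderate": 1.0},
    "hasCompactness": {"Strong": 1.0},
    "hasTissueContrast": {"Great": 1.0},
    "isSurroundedByWhiteMatter": {"Completely": 1.0},
}

# Rules of Fig. 2: role -> admissible values (OR within a role, AND across roles)
rules = [
    {"hasSphericity": ["Moderate", "High"], "hasCompactness": ["Strong"],
     "hasVolume": ["Small"], "hasTissueContrast": ["Great"],
     "isSurroundedByWhiteMatter": ["Completely"]},
    {"hasSphericity": ["Moderate"], "hasCompactness": ["Strong"],
     "hasVolume": ["Medium"], "hasTissueContrast": ["Great"],
     "isSurroundedByWhiteMatter": ["AlmostCompletely", "Completely"]},
    {"hasCompactness": ["Strong"], "hasVolume": ["Large"],
     "hasTissueContrast": ["Great"],
     "isSurroundedByWhiteMatter": ["Partially", "AlmostCompletely", "Completely"]},
]
X_ABNORMAL, X_NORMAL = 2.0, 1.0  # singleton spikes of tissue structure


def strength(rule):
    # Eq. (8) within each role (bounded sum), Eq. (5) across roles (min)
    return min(min(1.0, sum(fuzzified[role].get(v, 0.0) for v in values))
               for role, values in rule.items())


levels = [strength(r) for r in rules]     # rules 1-3
l_else = min(1.0 - l for l in levels)     # Eq. (9)
wa = ((sum(l * X_ABNORMAL for l in levels) + l_else * X_NORMAL)
      / (sum(levels) + l_else))           # Eq. (10)
confidence = 1.0 - abs(wa - X_ABNORMAL)   # Eq. (11): nearest spike is Abnormal
print([round(l, 3) for l in levels], round(l_else, 3), round(confidence, 3))
# [0.887, 0.113, 0.0] 0.113 0.898
```

The reported ELSE strength equals 1 minus the Rule 1 strength because Rule 1 is the strongest IF-THEN rule, so its complement is the minimum in Eq. (9).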

5. Evaluation

Review papers (Trivedi et al., 2002; Kaplan, 2001; Tierney, 2001; Johnston et al., 1994) point out that the vast majority of DSSs presented in the literature are evaluated in a somewhat artificial context, i.e. as stand-alone software systems designed to operate in parallel to, but not necessarily in support of, the physician.

Moreover, they are predominantly evaluated on the basis of their diagnostic performance, in terms of accuracy, sensitivity, specificity or complete ROC curves, while a wide array of issues raised by the use of these systems in a real clinical environment is sidestepped. These issues mainly pertain to the role of the system in relation to the physician and, in particular, concern the extent to which DSSs influence the diagnostic activities of physicians, and whether this influence can be quantified (Berner et al., 1999; Hunt et al., 1998).

The literature that addresses these and similar issues is much smaller (Haug et al., 2003; McCowan et al., 2001), but it is highly relevant to assessing the role and potential benefits of DSSs in real clinical settings.

Therefore, to address these issues, the proposed DSS has been evaluated by first analyzing quantitative performance measures, namely ROC curves and similarity measures, and then looking directly at the effect that the system has on physicians, i.e. finding out how the system can influence the diagnostic opinion of physicians, regardless of the diagnostic performance measured.

5.1. Performance analysis

In the absence of histological confirmation, the true WMLs are not known. Therefore, the expertise of the team of neuroradiologists participating in this study, employed at the Department of Bio-Morphological and Functional Sciences of the University of Naples “Federico II”, has been used to establish the true lesions among the identified WMPLs. In particular, only 1905 out of 2844 detected WMPLs have been classified as proven lesions by the team of neuroradiologists. This result is considered the gold standard in the study.

Moreover, according to the WML load, patients have been arbitrarily grouped into four categories:

1. All patients (n = 120).
2. Patients with small lesion load (n = 50).
3. Patients with moderate lesion load (n = 41).
4. Patients with large lesion load (n = 29).
Patients have been assigned to categories 2, 3 and 4, respectively, if the total lesion volume in the whole brain was (i) less than 3 cm³, (ii) greater than or equal to 3 cm³ and less than or equal to 10 cm³, and (iii) greater than 10 cm³.
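The category boundaries, including which side of 3 cm³ and 10 cm³ is inclusive, can be stated precisely as a small function. The name `lesion_load_category` is illustrative.

```python
def lesion_load_category(total_volume_cm3):
    """Categories 2-4: small (< 3 cm^3), moderate (3-10 cm^3 inclusive), large (> 10 cm^3)."""
    if total_volume_cm3 < 3.0:
        return "small"
    if total_volume_cm3 <= 10.0:
        return "moderate"
    return "large"


print([lesion_load_category(v) for v in (2.9, 3.0, 10.0, 10.1)])
# ['small', 'moderate', 'moderate', 'large']
```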

Binary classifications of WM lesions were produced by applying different thresholds in order to obtain discrete outputs from the continuous appraisal values generated by the fuzzy engine.


These classifications were compared with the gold standard by counting the correctly classified voxels, i.e. the true positives (TP) and true negatives (TN), as well as the false positives (FP) and false negatives (FN). The true positive rate (TPR), i.e. the sensitivity, and the false positive rate (FPR), i.e. 1 − specificity, were calculated for thresholds running from 0 to 1. They are defined by

$$\mathrm{TPR} = \frac{TP}{TP + FN} \tag{12}$$

$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{13}$$
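Eqs. (12) and (13), the threshold sweep and the area under the curve can be sketched together. This is an illustration only. It assumes a generic per-voxel score in [0, 1] where higher means more lesion-like, and a voxel is predicted as lesion when its score reaches the threshold. How the DSS appraisal value is mapped to such a score is not detailed in this excerpt, and the toy labels and scores are not study data. The AUC uses the trapezoidal rule.

```python
def rates(y_true, y_pred):
    """Eqs. (12)-(13): (TPR, FPR) from binary labels and predictions (1 = lesion)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp / (tp + fn), fp / (fp + tn)


def roc_points(y_true, scores, thresholds):
    """(FPR, TPR) points obtained by predicting 'lesion' when score >= threshold."""
    points = {(0.0, 0.0), (1.0, 1.0)}  # anchor both ends of the curve
    for t in thresholds:
        tpr, fpr = rates(y_true, [s >= t for s in scores])
        points.add((fpr, tpr))
    return sorted(points)


def auc(points):
    """Area under the ROC curve with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy voxel labels (gold standard) and scores; not data from the study
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
pts = roc_points(y_true, scores, [k / 10 for k in range(11)])
print(round(auc(pts), 4))  # 0.8889
```

Sorting the points by FPR and then TPR is valid because raising the threshold only removes predicted positives, so the curve is monotone. `rates` assumes both classes are present in the labels.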

The TPR was plotted in an ROC curve as a function of the FPR for all the patient categories listed above.

Furthermore, the binary classifications were evaluated by three different similarity measures proposed by Anbeek et al. (2004): the similarity index (SI), the overlap fraction (OF) and the extra fraction (EF).

Fig. 5. Classification of a patient with a small lesion load by applying the thresholds of 0.25, 0.50 and 0.75 (panels include: QMCI; Preliminary multiparametric segmentation; Manual segmentation (the Gold Standard); DSS Classification (threshold = 0.75)).

The SI measures the correctly classified lesion volume relative to the total WML volume in both the gold standard (reference) and the DSS outcome. The OF measures the correctly classified lesion volume relative to the WML volume in the reference. The EF measures the volume falsely classified as lesion relative to the WML volume in the reference. These similarity measures are formally defined as follows:

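$$\mathrm{SI} = \frac{2\,(\mathrm{Vol}_{REF} \cap \mathrm{Vol}_{DSS})}{\mathrm{Vol}_{REF} + \mathrm{Vol}_{DSS}} \tag{14}$$

$$\mathrm{OF} = \frac{\mathrm{Vol}_{REF} \cap \mathrm{Vol}_{DSS}}{\mathrm{Vol}_{REF}} \tag{15}$$

$$\mathrm{EF} = \frac{\overline{\mathrm{Vol}_{REF}} \cap \mathrm{Vol}_{DSS}}{\mathrm{Vol}_{REF}} \tag{16}$$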
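Eqs. (14)–(16) translate directly into set operations when each classification is represented as the set of voxel coordinates labelled as lesion. Ratios of voxel counts equal ratios of volumes for a fixed voxel size. The function name and the toy coordinates below are illustrative, not study data.

```python
def similarity_measures(ref, dss):
    """SI, OF and EF (Eqs. 14-16) from sets of lesion voxel coordinates."""
    overlap = len(ref & dss)                  # intersection of REF and DSS
    si = 2 * overlap / (len(ref) + len(dss))  # Eq. (14)
    of = overlap / len(ref)                   # Eq. (15)
    ef = len(dss - ref) / len(ref)            # Eq. (16): DSS voxels outside REF
    return si, of, ef


# Toy 3-D voxel coordinates (not data from the study)
ref = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
dss = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
print(similarity_measures(ref, dss))  # (0.75, 0.75, 0.25)
```

Using sets enforces the spatial correspondence required by the measures: a voxel counts as correctly classified only if the same coordinate is labelled as lesion in both classifications.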

In these definitions, $\mathrm{Vol}_{REF}$ and $\mathrm{Vol}_{DSS}$ denote the WML volumes estimated through the manual expert classification and through the classification automatically carried out by the DSS, respectively. The intersection of $\mathrm{Vol}_{REF}$ and $\mathrm{Vol}_{DSS}$ represents the volume of the correctly classified voxels, while the volume $\overline{\mathrm{Vol}_{REF}} \cap \mathrm{Vol}_{DSS}$ corresponds to the false positives. In all these measures, the spatial correspondence between the automatic and manual classifications was taken into account.

If SI and OF are close to 1 and EF is close to 0, the WML volume determined by the automatic classification of the DSS is similar to the manual outcome obtained by a physician. The similarity measures thus allow the binary classifications performed by the DSS to be evaluated in a quantitative and objective way. The SI was plotted as a function of the threshold, running from 0 to 1, and all the similarity measures were calculated for all the patient categories.

Binary classifications of WM lesions have been performed on the whole dataset of 120 patients. Figs. 5–7 show example images of the classification results for patients with a small, moderate and large lesion load.

Fig. 6. Classification of a patient with a moderate lesion load by applying different thresholds (panel: DSS Classification, threshold = 0.25).

The images demonstrate that the choice of the threshold used to obtain discrete outputs from the continuous appraisal values generated by the fuzzy engine has a large influence on the binary classifications. A higher threshold increases the specificity of the result, but has a negative effect on the sensitivity.






Fig. 7. Classification of a patient with a large lesion load by applying different thresholds.

Fig. 8. ROC curves of binary classifications over all the patient categories with threshold running from 0 to 1 (ROC Curve; x-axis: 1 − Specificity; y-axis: Sensitivity; series: All Patients, Small Lesion Load, Moderate Lesion Load, Large Lesion Load).

5.1.1. ROC analysis

The ROC curves were calculated for the classifications of all the patient categories. Fig. 8 shows a detail of the ROC curves with thresholds running from 0 to 1. The areas under the curves (AUC) have been calculated and are presented in Table 5.

The ROC curves show that the classifications performed by the DSS with variable thresholds produce better results in terms of specificity for patients with a large lesion load than for patients with a moderate or small lesion load. Indeed, the ROC curve for patients with a large lesion load follows the left-hand border of the ROC space more closely than the curves for the other patient categories. This means that, for a given threshold, the number of false positives is lower in patients with a large lesion load. In particular, this ROC curve initially follows the left-hand border and then the top border of the ROC space more closely than the other curves. For values of specificity between 0.2 and 0.7, this trend is inverted and the curve follows the top border of the ROC space less closely than the other curves. Finally, all the curves approximately overlap.

Consistently with these considerations about the curve trends, better results in terms of sensitivity are instead obtained for patients with a moderate lesion load. Indeed, the ROC curve for patients with a moderate lesion load follows the top border of the ROC space more closely than the curves for the other patient categories. This means that, for a given threshold, the number of false negatives is lower in patients with a moderate lesion load.

Finally, the areas under the ROC curves also confirm that the classifications performed by the DSS produce more accurate results for patients with a large lesion load than for patients with a moderate or small lesion load.

Table 5
Area under ROC curve and similarity measures.

Measure  Threshold  All patients  Small lesion load  Moderate lesion load  Large lesion load
AUC      –          0.8515        0.8276             0.8473                0.8749
SI       0.25       0.9424        0.8049             0.9312                0.9714
         0.50       0.9482        0.8347             0.9421                0.9691
         0.75       0.9054        0.7276             0.8936                0.9365
OF       0.25       0.9695        0.9287             0.9780                0.9707
         0.50       0.9538        0.8843             0.9669                0.9568
         0.75       0.8633        0.6711             0.8613                0.8918
EF       0.25       0.0879        0.3787             0.1225                0.0278
         0.50       0.0580        0.2345             0.0857                0.0178
         0.75       0.0435        0.1733             0.0662                0.0127

Fig. 9. Similarity index graph of binary classifications over all the patient categories with threshold running from 0 to 1 (Similarity Index Graph; x-axis: Threshold; y-axis: Similarity Index; series: All Patients, Small Lesion Load, Moderate Lesion Load, Large Lesion Load).

Table 6
2×2 contingency table DSS suggestion–physician opinion.

                                          Physician opinion
DSS suggestion                            Change    No change
DSS diagnosis                             10        59
DSS diagnosis+confidence+explanation      27        42

5.1.2. Similarity measures

Fig. 9 presents the SI for the binary classifications of all the patient categories with the threshold running from 0 to 1. Similarly to the ROC curves, the SI graph shows that the best performance is reached for patients with a large lesion load.

Fig. 9 also shows that the optimal threshold for the binary classification in all the patient categories is approximately 0.5. Table 5 reports the SI, OF and EF of the binary classifications with three different thresholds (including the optimal one) for all the patient categories.

On the one hand, the SI improves markedly and the EF becomes very small as the lesion load increases. On the other hand, the OF shows no notable differences between patients with a large and a moderate lesion load, but it decreases appreciably in patients with a small lesion load.

Finally, Table 5 also suggests that the optimal threshold identified by means of the SI graph yields the best trade-off among all the similarity measures when they are evaluated jointly.

5.2. Statistical analysis on the effect of DSS on physician opinion

The statistical analysis has been aimed at evaluating the extent to which the proposed DSS influences the diagnostic tasks of physicians, and whether this influence can be quantified.

To this end, a study was arranged involving 10 volunteer students specializing in Radiology at the Faculty of Medicine of the University of Naples “Federico II”. Under the pretense of measuring their diagnostic performance, each participant was presented with a sequence of 3 MR studies containing in total 25 WMPLs. After seeing a WMPL, the participants had to give a dichotomous normal/abnormal diagnosis. The system then provided its own suggestion.

Since the dichotomous diagnoses were collected, it was possible to quantify the effect of a contradiction, i.e. of a case in which the DSS diagnosis differs from the student’s opinion. The correct diagnosis (gold standard) was known for all 25 WMPLs; of these WMPLs, only 14 were actual lesions. The WMPLs were chosen to cover a wide range of diagnostic difficulty. Moreover, to investigate the effect of a disagreement, the proposed DSS was given an ad-hoc configuration in order to contradict the radiology students on purpose and without their knowledge. In particular, by exploiting the gold standard known for the 25 WMPLs, the DSS was forced to give a wrong diagnosis for 8 of the 25 WMPLs.

This setup was necessary because, otherwise, there might never be a disagreement between the system, which was built with the expertise of a team of neuroradiologists, and the participating radiology students. The participants did not know that some of the DSS answers they were given were false. In this way, it was possible to determine how physicians reacted when the DSS provided concordant and discordant suggestions. In particular, the system has been evaluated with respect to the following motivating question:

Do physicians change their opinion when contradicted by the DSS more often when the DSS output is not only a dichotomous suggestion but also contains both a confidence and a semantically defined explanation of it?

In particular, the students’ responses have been analyzed to see whether there is any difference in their willingness to follow the DSS outputs when these consist of the dichotomous diagnosis alone (i.e. normal/abnormal tissue) or when they also include the confidence of the diagnosis on an ordinal scale and an explanation of it that is semantically formalized and, for this reason, simple and intuitive to understand, also for a non-technical audience.

To this end, the behavior of the participants was investigated by applying a chi-square analysis. The dataset for the analysis was built as follows. Each participant was presented with the sequence of 3 MR studies containing 25 WMPLs. For each WMPL, the participants gave their dichotomous normal/abnormal diagnosis and then the DSS provided its own diagnosis, linked to the pertinent MR images. Over all 10 participants and 25 WMPLs, the students’ initial dichotomous diagnosis disagreed with that of the DSS in 69 cases. In 10 of these cases (14.49%), the students changed their opinion to follow the DSS suggestion. After this first step, the system provided its own diagnosis again for all the cases and, in addition, also reported both the confidence and the explanation of its diagnosis. This time, in 27 of the 69 cases (39.13%), the students changed their opinion to follow the DSS suggestion.

Table 7
DSS suggestion–physician opinion cross-tabulation.

                                                         Physician opinion
DSS suggestion                                           Change   No change   Total
DSS diagnosis                         Count              10       59          69
                                      Expected count     18.5     50.5        69.0
DSS diagnosis+confidence+explanation  Count              27       42          69
                                      Expected count     18.5     50.5        69.0
Total                                 Count              37       101         138
                                      Expected count     37.0     101.0       138.0

From this material, a 2×2 contingency table (shown in Table 6) involving the variables DSS suggestion (DSS diagnosis; DSS diagnosis+confidence+explanation) and physician opinion (change; no change) has been constructed with all the 69 cases in which the students’ and the DSS’s diagnoses disagree.

It has also been hypothesized that the 10 opinion changes observed when the system provided only its own diagnosis (out of the 69 times the system advised against the initial diagnosis) are significantly different from the 27 observed when the system also provided its confidence and explanation. Accordingly, the null hypothesis to be tested with the chi-square analysis was the following:

H0: there is no pattern of relationship between the students’ changes of opinion and the type of suggestion produced by the DSS.

As a first step of the analysis, the cross-tabulation reported in Table 7 has been calculated, showing that the expected count in each of the four cells generated by the factorial combination of DSS suggestion and physician opinion is greater than 5. This means that the analysis does not violate the main assumption underlying the chi-square test, i.e. that the test should only be applied when at least 80% of the cells have expected frequencies of five or more. Applying the analysis when fewer cells reach this minimum expected frequency can lead to inaccurate results.
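Under the null hypothesis of independence, the expected frequency of each cell is its row total times its column total divided by the grand total. The following minimal sketch computes these values for the observed counts and checks the "at least 80% of cells with expected frequency of five or more" rule stated above; the helper names are hypothetical.

```python
def expected_counts(table):
    """Expected frequencies under independence: E_ij = row_i * col_j / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_expected_count_rule(expected, minimum=5.0, share=0.8):
    """True if at least `share` of the cells have expected frequency >= minimum."""
    cells = [e for row in expected for e in row]
    return sum(e >= minimum for e in cells) / len(cells) >= share


# Rows: DSS diagnosis only; DSS diagnosis+confidence+explanation.
# Columns: change, no change.
observed = [[10, 59], [27, 42]]
expected = expected_counts(observed)
print(expected)                             # [[18.5, 50.5], [18.5, 50.5]]
print(meets_expected_count_rule(expected))  # True
```

Because both rows have the same total (69), the two rows of expected counts coincide.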

After that, the Pearson chi-square test has been performed in order to test the validity of the null hypothesis. According to the results obtained, χ²(df = 1) = 10.672, p ≈ 0.001, it has been possible to reject the null hypothesis and conclude that there is a relationship between the values of the variable DSS suggestion and the values of the variable physician opinion in the population represented by the sample. In other words, the students' opinion changes when the system provided only its own diagnosis are significantly different from those observed when the system also provided the linked confidence and explanation. In particular, as reported in the cross-tabulation, the students who were given the DSS diagnosis together with its confidence and explanation showed a significantly greater willingness to follow the DSS outputs and change their initial opinion.
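The reported statistic can be reproduced from the counts in Table 7 with the uncorrected Pearson formula, i.e. the sum over cells of (observed − expected)² / expected. The sketch below assumes no Yates continuity correction (with the correction the statistic would be about 9.45 instead) and uses the closed form of the chi-square survival function for one degree of freedom, P(X > x) = erfc(√(x/2)), from Python's standard `math` module.

```python
import math


def pearson_chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and p-value, df = 1."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count under H0
            chi2 += (obs - exp) ** 2 / exp
    # Survival function of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p = pearson_chi_square_2x2([[10, 59], [27, 42]])
print(f"chi2(1) = {chi2:.3f}, p = {p:.4f}")
```

The erfc shortcut is only valid for a 2×2 table (df = 1); larger tables would need the general chi-square distribution.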

6. Discussion and conclusions

The ontology-based fuzzy decision support system provides a knowledge-based method to automatically support neuroradiologists in the WML classification with high sensitivity and specificity for all the patient categories.

The DSS encodes high-level medical knowledge elicited from experts in terms of ontologies and fuzzy rules, and applies this knowledge in conjunction with the FIRE inference engine so as to classify WMLs and determine their volumes. Its output is a textual report containing a dichotomous (normal/abnormal) diagnosis and the confidence of this diagnosis, expressed on an ordinal scale; moreover, the reasoning path executed and the strength level calculated for each rule fired are also returned in order to provide an explanation of the diagnosis produced. Ontologies are used to represent the semantic structure of the expert's knowledge and to formulate the DSS outcomes in a way that is simple and intuitive to read and understand, even for a non-technical audience. Fuzzy logic is used to handle the fuzziness of the input dataset and to reproduce the expert's decision-making process in classifying WMLs, with the further capability of attributing a confidence measure to the diagnosis produced.
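To make the structure of such an output concrete, the following is a minimal sketch of a fuzzy rule evaluation that returns a dichotomous diagnosis, an ordinal confidence and the list of fired rules with their strengths. Everything in it is a hypothetical assumption rather than the paper's knowledge base or the FIRE engine's semantics: the feature names, the triangular membership functions, the two rules, the use of the minimum t-norm for conjunction and of the maximum for aggregation, the 0.5 decision threshold and the ordinal cut points.

```python
from dataclasses import dataclass


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with support (a, c) and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


@dataclass
class Rule:
    name: str
    # Hypothetical antecedents: feature name -> (a, b, c) triangular parameters.
    antecedents: dict[str, tuple[float, float, float]]

    def strength(self, features: dict[str, float]) -> float:
        # Conjunction of the antecedents via the minimum t-norm.
        return min(triangular(features[f], *p) for f, p in self.antecedents.items())


def classify(features, rules, threshold=0.5):
    """Return (diagnosis, degree, ordinal confidence, fired rules with strengths)."""
    fired = [(r.name, r.strength(features)) for r in rules]
    fired = [(name, s) for name, s in fired if s > 0.0]
    degree = max((s for _, s in fired), default=0.0)  # max aggregation
    diagnosis = "abnormal" if degree >= threshold else "normal"
    # Hypothetical ordinal scale based on the distance from the threshold.
    margin = abs(degree - threshold)
    confidence = "high" if margin >= 0.3 else "medium" if margin >= 0.1 else "low"
    return diagnosis, degree, confidence, fired


rules = [
    Rule("R1: hyperintense AND mostly surrounded by WM",
         {"t2_intensity": (0.4, 1.0, 1.6), "surrounding_wm": (0.3, 1.0, 1.7)}),
    Rule("R2: moderately hyperintense", {"t2_intensity": (0.2, 0.5, 0.8)}),
]
features = {"t2_intensity": 0.85, "surrounding_wm": 0.9}  # normalized, made-up values
diagnosis, degree, confidence, fired = classify(features, rules)
print(diagnosis, f"{degree:.2f}", confidence)
for name, s in fired:
    print(f"  fired {name} with strength {s:.2f}")  # the explanation part
```

Returning the fired rules alongside the degree mirrors, in simplified form, the idea of reporting the reasoning path and the strength of each rule as an explanation.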

The proposed DSS produces binary classifications with high sensitivity and specificity. The ROC curve for all the patients shows that, at the optimal threshold, the overall sensitivity is 0.8787, with a specificity of 0.7562.
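Each point of such an ROC curve is obtained by thresholding the continuous fuzzy outputs and comparing the resulting binary labels with the reference segmentation. A minimal voxel-wise sketch follows; the toy scores and labels are made up for illustration and are not data from the study.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Voxel-wise sensitivity and specificity after thresholding fuzzy outputs.

    scores: continuous appraisal values in [0, 1];
    labels: 1 = lesion in the reference segmentation, 0 = non-lesion.
    Assumes both classes are present in `labels`.
    """
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores, labels, thresholds):
    """(1 - specificity, sensitivity) pairs, one per threshold."""
    points = []
    for t in thresholds:
        sens, spec = sensitivity_specificity(scores, labels, t)
        points.append((1.0 - spec, sens))
    return points


# Toy data: four lesion voxels followed by four non-lesion voxels.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.7, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(sensitivity_specificity(scores, labels, 0.5))
print(roc_points(scores, labels, [0.0, 0.5, 1.01]))
```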

In addition, similarity measures relative to the lesion volume are used to provide better information on the quality of the classification. In this sense, the similarity measures (SI, OF and EF) make it possible to evaluate the binary classifications in a quantitative and objective way. For the optimal threshold, the values of SI and OF are higher than 0.94 for the categories of all patients and of patients with moderate and large lesion load. An SI value greater than 0.7 indicates excellent agreement, as stated in Bartko (1991). For the class of patients with large lesion load, these values even exceed 0.95.
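As an illustration, the three measures can be computed from the voxel sets of the automatic and manual (reference) segmentations. The sketch assumes the voxel-count definitions commonly used in lesion segmentation work, SI = 2TP/(2TP + FP + FN), OF = TP/(TP + FN) and EF = FP/(TP + FN); the masks in the example are made up.

```python
def similarity_measures(automatic: set, manual: set):
    """SI, OF and EF between an automatic and a manual (reference) lesion mask.

    Masks are sets of voxel coordinates. Assumed definitions:
    SI = 2TP/(2TP+FP+FN), OF = TP/(TP+FN), EF = FP/(TP+FN).
    """
    tp = len(automatic & manual)   # voxels labeled lesion by both
    fp = len(automatic - manual)   # extra voxels in the automatic mask
    fn = len(manual - automatic)   # reference voxels missed
    si = 2 * tp / (2 * tp + fp + fn)
    of = tp / (tp + fn)
    ef = fp / (tp + fn)
    return si, of, ef


# Toy masks along one image row: 100 reference voxels; the automatic mask
# misses the first 5 and adds 10 extra voxels at the end.
manual = {(x, 0, 0) for x in range(100)}
automatic = {(x, 0, 0) for x in range(5, 110)}
si, of, ef = similarity_measures(automatic, manual)
print(f"SI = {si:.3f}, OF = {of:.3f}, EF = {ef:.3f}")
```

OF and EF separate the two error types that SI combines: OF drops with missed lesion voxels, while EF grows with extra ones.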

Both the ROC curves and the similarity measures also suggest that the proposed DSS produces better results for patients with a large lesion load than for those with a small lesion load. From a theoretical point of view, this can be caused by a somewhat lower sensitivity in the detection of small subcortical lesions with a low percentage of surrounding WM, and by a somewhat lower specificity in the classification of small symmetrical brain tissues, since topological features are not taken into account in this study. More precisely, these tissues, although very similar to WM lesions in terms of the evaluated features, should not be classified as actual lesions, since they are positioned symmetrically in the MR image. From a more practical point of view, this can be explained by the fact that small errors have a relatively larger effect on a smaller reference volume.
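The last point can be illustrated with a worked example. Assuming the usual voxel-count definition of SI and a hypothetical case in which a reference lesion volume of V voxels is segmented with e missed voxels and no false positives (TP = V − e, FN = e, FP = 0), the same absolute error of 10 voxels lowers SI far more for the smaller volume:

```latex
\begin{align}
\mathrm{SI} &= \frac{2\,TP}{2\,TP + FP + FN}
             = \frac{2(V - e)}{2(V - e) + e}
             = \frac{2(V - e)}{2V - e} \\
V = 100,\ e = 10 &: \quad \mathrm{SI} = \frac{180}{190} \approx 0.947 \\
V = 1000,\ e = 10 &: \quad \mathrm{SI} = \frac{1980}{1990} \approx 0.995
\end{align}
```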

Moreover, the ROC curve for patients with a large lesion load first follows both the left-hand and top borders of the ROC space more closely than the other curves, and then inverts this trend. It is worth noting that this does not indicate incorrect behavior: it can be explained by the fact that there is less uncertainty in the classification of large lesions than in the other situations, since borderline lesions are very limited in this scenario.

As a matter of fact, the SI graph for patients with a large lesion load is remarkably flat, indicating that the classification response remains approximately the same even as the threshold runs from 0 to 1. As a result, varying the threshold does not produce an appreciable improvement in sensitivity for patients with a large lesion load.

The SI is used not only as a measure of the accuracy of the classification, but also to determine the optimal threshold for generating discrete outputs from the continuous appraisal values obtained by the fuzzy engine. For all the patient categories, this threshold is approximately 0.5. The SI graphs are considerably flat, which indicates the robustness of the optimal threshold. However, in many cases the choice of the threshold may also be an expert decision that depends on the acceptable ratio between false positive and false negative classified voxels.
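The threshold selection described above can be sketched as a sweep over a grid of candidate thresholds, keeping the one that maximizes SI; the full SI curve is returned too, so that its flatness can be inspected. The grid resolution, the tie-breaking rule (first maximum wins) and the toy data are assumptions made for illustration only.

```python
def similarity_index(scores, labels, threshold):
    """SI = 2TP / (2TP + FP + FN) after thresholding the fuzzy outputs."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def optimal_threshold(scores, labels, steps=100):
    """Grid threshold in [0, 1] maximizing SI, plus the whole SI curve."""
    grid = [i / steps for i in range(steps + 1)]
    curve = [(t, similarity_index(scores, labels, t)) for t in grid]
    best_t, best_si = max(curve, key=lambda pair: pair[1])  # first maximum wins
    return best_t, best_si, curve


# Toy data: four lesion voxels followed by four non-lesion voxels.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.7, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
best_t, best_si, curve = optimal_threshold(scores, labels)
print(f"optimal threshold {best_t:.2f}, SI {best_si:.3f}")
```

On real data the width of the plateau around the maximum of `curve` is a direct measure of the robustness discussed in the text; an expert may still prefer a different point on it to trade false positives against false negatives.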

At present, many studies on MS brain lesion segmentation have been published that describe and evaluate their results in quantitative terms, by comparing lesion volumes or by analyzing the number of correctly classified or misclassified lesions (Alfano et al., 2000; Akselrod-Ballin et al., 2009; Anbeek et al., 2004; Freifeld et al., 2009; Khayati et al., 2008; Sajja et al., 2006; Wei et al., 2002; Wels et al., 2008; Zijdenbos et al., 2002).

A comparison with these studies highlights that the present DSS achieves a lower classification accuracy and determines a smaller lesion volume in some cases. This is due to the fact that the proposed DSS has been designed not only to obtain a sufficiently high classification accuracy, but also to be consulted in depth and easily understood, with the final aim of being most useful in difficult cases, where physicians might not be sure about their diagnoses.

By contrast, the systems proposed in the cited studies are predominantly devised to maximize diagnostic performance alone, without considering whether their outcomes suit the physicians' needs and, thus, whether they can support physicians in a real clinical environment. Indeed, most of them provide only a dichotomous diagnosis without reporting any simple insight into how they work, i.e. any clear and direct interpretation of their diagnostic results. This is the reason why they are scarcely appealing and trustworthy for physicians, and so few of them are now in routine clinical use.

This consideration is supported by the results of the statistical analysis reported in this work, which revealed that radiology students who were given not only a dichotomous diagnosis, but also further information aimed at explaining the diagnosis and measuring its trustworthiness, showed a greater willingness to follow the DSS outputs and change their initial opinion (the proportion of opinion changes rose from 14.49% to 39.13%, an increase of 24.64 percentage points).

As a result, the strength of the proposed DSS with respect to the other existing systems lies in its ability to aid physicians in arriving at a diagnosis by providing as a suggestion not only a dichotomous diagnosis, but also the confidence of the diagnosis, expressed on an ordinal scale, and the linked explanation, semantically formalized by means of ontological concepts and properties. This is why the present DSS is simple and intuitive to understand, even for a non-technical audience, and can actually support physicians in a real clinical environment.

In conclusion, the presented DSS offers an innovative and valuable way to perform automated WML classification in real clinical environments. Moreover, since both the knowledge-based model and the fuzzy reasoning method have a general basis, the system is in principle applicable to many other classification problems, for instance the classification of gray matter or cerebrospinal fluid. Finally, the encouraging results of the experimental evaluation suggest that the system could be profitably used on routine MR diagnostic scans or in large longitudinal population studies, with the aim of improving diagnosis and prognosis for patients affected by MS.

Acknowledgments

The authors are deeply grateful to the Department of Bio-Morphological and Functional Sciences of the University of Naples “Federico II” for providing the input dataset. They are also thankful to all the neuroradiologists involved in the study for cooperating both in the definition of the domain knowledge and in the manual segmentation of the input dataset. Finally, special acknowledgement is due to Dr. Bruno Alfano, director of the Institute of Biostructure and Bioimaging (IBB) of the Italian National Research Council (CNR), for allowing his multiparametric segmentation procedure to be applied in this work and for his valuable support of this research.

References

Adlassnig, K.-P., 1998. A fuzzy logical model of computer assisted medical diagnosis. Methods Inf. Med. 19, 141–148.

Akselrod-Ballin, A., Galun, M., Gomori, J.M., Filippi, M., Valsasina, P., Basri, R., Brandt, A., 2009. Automatic segmentation and classification of multiple sclerosis in multichannel MRI. IEEE Trans. Biomed. Eng. 56, 2461–2469.

Alfano, B., Brunetti, A., Arpaia, M., Ciarmiello, A., Covelli, E.M., Salvatore, M., 1995. Multiparametric display of spin-echo data from MR studies of brain. J. Magn. Reson. Imaging 5, 217–225.

Alfano, B., Brunetti, A., Larobina, M., Quarantelli, M., Tedeschi, E., Ciarmiello, A., Covelli, E.M., Salvatore, M., 2000. Automated segmentation and measurement of global white matter lesion volume in patients with multiple sclerosis. J. Magn. Reson. Imaging 12, 799–807.

Anbeek, P., Vincken, K.L., van Osch, M.J.P., Bisschops, R.H.C., van der Grond, J., 2004. Probabilistic segmentation of white matter lesions in MR imaging. NeuroImage 21, 1037–1044.

Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (Eds.), 2003. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press.

Bartko, J.J., 1991. Measurement and reliability: statistical thinking considerations. Schizophr. Bull. 17, 483–489.

Berner, E., Maisiak, R., Cobbs, C., Taunton, O., 1999. Effects of a decision support system on physicians' diagnostic performance. J. Am. Med. Inf. Assoc. 6, 420–427.

Brown, M.S., Goldin, J.G., Suh, R.D., McNitt-Gray, M.F., Sayre, J.W., Aberle, D.R., 2003. Lung micronodules: automated method for detection at thin-section CT: initial experience. Radiology 226, 256–262.

Cooke, N.J., 1994. Varieties of knowledge elicitation techniques. Int. J. Hum. Comput. Stud. 41, 801–849.

Compston, A., Coles, A., 2008. Multiple sclerosis. Lancet 372, 1502–1517.

Filippi, M., Horsfield, M.A., Tofts, P.S., Barkhof, F., Thompson, A.J., Miller, D.H., 1995. Quantitative assessment of MRI lesion load in monitoring the evolution of multiple sclerosis. Brain 118, 1601–1612.

Filippi, M., Rovaris, M., Campi, A., Pereira, C., Comi, G., 1996. Semi-automated thresholding technique for measuring lesion volumes in multiple sclerosis: effects of the change of the threshold on the computed lesion loads. Acta Neurol. Scand. 93, 30–34.

Fox, J., Myers, C.D., Greaves, M.F., Pegram, S., 1985. Knowledge acquisition for expert systems: experience in leukaemia diagnosis. Methods Inf. Med. 24, 65–72.

Freifeld, O., Greenspan, H., Goldberger, J., 2009. Multiple sclerosis lesion detection using constrained GMM and curve evolution. Int. J. Biomed. Imaging 2009 (Article ID 715124, 13 p.).

Gilesa, R., 1975. Lukasiewicz logic and fuzzy set theory. Int. J. Man–Mach. Stud. 8, 313–327.

Gruber, T., 1995. Towards principles for the design of ontologies used for knowledge sharing. Int. J. Hum. Comput. Stud. 43, 907–928.

Guarino, N., 1995. Formal ontology, conceptual analysis and knowledge representation. Int. J. Hum.-Comput. Stud. 43, 625–640.

Guarino, N., 1997. Understanding and building, using ontologies. Int. J. Hum.-Comput. Stud. 46, 293–310.

Haug, P., Rocha, B., Evans, R., 2003. Decision support in medicine: lessons from the HELP system. Int. J. Med. Inform. 69, 273–284.

Horrocks, I., Sattler, U., Tobies, S., 2000. Practical reasoning for very expressive description logics. Logic J. IGPL 8, 239–264.

Hunt, D., Haynes, R., Hanna, S., Smith, K., 1998. Effects of computer-based clinical decision support systems on physician performance and patient outcomes: a systematic review. J. Am. Med. Inform. Assoc. 280, 1339–1346.

Johnston, M., Langton, K., Haynes, R., Mathieu, A., 1994. Effects of computer-based clinical decision support systems on clinician performance and patient outcome. A critical appraisal of research. Ann. Intern. Med. 120, 135–142.

Kaplan, B., 2001. Evaluating informatics applications: clinical decision support systems literature review. Int. J. Med. Inf. 64, 15–37.

Khan, A.S., Hoffmann, A., 2003. Building a case-based diet recommendation system without a knowledge engineer. Artif. Intell. Med. 27, 155–179.

Khayati, R., Vafadust, M., Towhidkhah, F., Nabavi, M., 2008. Fully automatic segmentation of multiple sclerosis lesions in brain MR FLAIR images using adaptive mixtures method and Markov random field model. Comput. Biol. Med. 38, 379–390.

Kidd, P., 2001. Multiple sclerosis, an autoimmune inflammatory disease: prospects for its integrative management. Alternative Med. Rev. 6, 540–566.

Kong, G., Xu, D., Yang, J., 2008. Clinical decision support systems: a review on knowledge representation and inference under uncertainties. Int. J. Comput. Intell. Syst. 1, 159–167.

Lukasiewicz, T., Straccia, U., 2008. Managing uncertainty and vagueness in description logics for the semantic web. J. Web Semantics 6, 291–308.

McCowan, C., Neville, R., Ricketts, I., Warner, F., Hoskins, G., Thomas, G., 2001. Lessons from a randomized controlled trial designed to evaluate computer decision support software to improve the management of asthma. Med. Inf. Internet Med. 26, 191–201.

Mencar, C., Castellano, G., Fanelli, A.M., 2007. Distinguishability quantification of fuzzy sets. Inf. Sci. 177, 130–149.

Miller, D.H., 1994. Magnetic resonance in monitoring the treatment of multiple sclerosis. Ann. Neurol. 36 (Suppl.), S91–S94.

Miller, D.H., Filippi, M., Fazekas, F., Frederiksen, J.L., Matthews, P.M., Montalban, X., Polman, C.H., 2004. Role of magnetic resonance imaging within diagnostic criteria for multiple sclerosis. Ann. Neurol. 56, 273–278.

Miller, D.H., Grossman, R.I., Reingold, S.C., McFarland, H.F., 1998. The role of magnetic resonance techniques in understanding and managing multiple sclerosis. Brain 121, 3–24.

Patel-Schneider, P.F., Hayes, P., Horrocks, I., 2004. OWL Web Ontology Language Semantics and Abstract Syntax. Technical report, W3C Recommendation.

Rosati, G., 2001. The prevalence of multiple sclerosis in the world: an update. Neurol. Sci. 22, 117–139.

Russo, F., Ramponi, G., 1995. A fuzzy operator for the enhancement of blurred and noisy images. IEEE Trans. Image Process. 4, 1169–1174.

Sajja, B.R., Datta, S., He, R., Mehta, M., Gupta, R.K., Wolinsky, J.S., Narayana, P.A., 2006. Unified approach for multiple sclerosis lesion segmentation on brain MRI. Ann. Biomed. Eng. 34, 142–151.

Siagian, R., 2003. Introducing study fuzziness in complexity research. J. Soc. Complexity 1, 38–46.

Steimann, F., Adlassnig, K.-P., 2000. Fuzzy Medical Diagnosis [online]. Available: http://www.citeseer.nj.nec.com/160037.html.

Straszecka, E., 2004. Medical knowledge representation in terms of IF–THEN rules and the Dempster–Shafer theory. Artif. Intell. Soft Comput., Lecture Notes Comput. Sci. 3070, 1056–1061.

Stoilos, G., Stamou, G., Pan, J.Z., Tzouvaras, V., Horrocks, I., 2007. Reasoning with very expressive fuzzy description logics. J. Artif. Intell. Res. 30, 273–320.

Tarski, A., 1956. Logic, Semantics, Metamathematics: Papers from 1923 to 1938. Oxford University Press.

Tierney, W., 2001. Improving clinical decisions and outcomes with information: a review. Int. J. Med. Inf. 62, 1–9.

Trivedi, M.H., Kern, J.K., Marcee, A., Grannemann, B., Kleiber, B., Bettinger, T., Altshuler, K.Z., McClelland, A., 2002. Development and implementation of computerized clinical guidelines: barriers and solutions. Methods Inf. Med. 41, 435–442.

Uschold, M., Gruninger, M., 1996. Ontologies: principles, methods and applications. Knowl. Eng. Rev. 11, 93–136.

Wei, X., Warfield, S.K., Zou, K.H., Wu, Y., Li, X., Guimond, A., Mugler III, J.P., Benson, R.R., Wolfson, L., Weiner, H.L., Guttmann, C.R.G., 2002. Quantitative analysis of MRI signal abnormalities of brain white matter with high reproducibility and accuracy. J. Magn. Reson. Imaging 15, 203–209.

Wels, M., Huber, M., Hornegger, J., 2008. Fully automated segmentation of multiple sclerosis lesions in multispectral MRI. Pattern Recognition Image Anal. 18, 347–350.

Yager, R.R., Filev, D.P., 1994. Essentials of Fuzzy Modeling and Control. Wiley-Interscience.

Zadeh, L., 1965. Fuzzy sets. Inf. Control 8, 338–353.

Zijdenbos, A.P., Forghani, R., Evans, A.C., 2002. Automatic “pipeline” analysis of 3-D MRI data for clinical trials: application to multiple sclerosis. IEEE Trans. Med. Imaging 21, 1280–1291.
