
Web Based Medical Expert System with a Self Training Heuristic Rule Induction Algorithm

Ivan Chorbev, Dragan Mihajlov
Computer Science and Computer Engineering Dept.
Faculty of Electrical Engineering and Information Technology, Ss. Cyril and Methodius University
Skopje, R. of Macedonia
[email protected], [email protected]

Ilija Jolevski
Dept. of Computer Sciences and Engineering
Faculty of Technical Sciences, University "St. Clement of Ohrid"
Bitola, R. of Macedonia
[email protected]

Abstract—This paper presents a web based medical expert system that performs self training using a heuristic rule induction algorithm. The data inserted by medical personnel while using the expert system is subsequently used for additional learning. The system is trained using a hybrid heuristic algorithm for induction of classification rules that we previously developed. The SA Tabu Miner algorithm (Simulated Annealing and Tabu Search based Data Miner) is inspired by both research on heuristic optimization algorithms and rule induction data mining concepts and principles. In this paper we compare the performance of SA Tabu Miner with other rule induction algorithms for classification on public domain data sets.

Keywords-medical expert system, rule induction, SA Tabu Miner, data mining, Simulated Annealing, Tabu Search

I. INTRODUCTION

Medical expert systems have been an area of research and implementation since the 1970s. A major part of the research in Artificial Intelligence has focused on expert systems. Programming languages such as LISP and Prolog were introduced for declarative programming and knowledge representation, and were later used to develop expert systems. Famous medical expert systems include MYCIN, designed to identify bacteria causing severe infections and to recommend antibiotics; CADUCEUS, covering all of internal medicine; the Spacecraft Health Inference Engine (SHINE); and STD Wizard. There are two fundamental approaches to knowledge base construction: knowledge acquisition from a human expert, and empirical induction of knowledge from collections of training samples. In our research we chose the latter: we used a rule induction algorithm that we developed to create a medical expert system.

Besides constructing expert systems, data mining techniques have also been used to validate them, as in the case of PERFEX [28]. The authors applied data mining to validate the confidences (certainty factors) of the heuristic rules in the expert system and to improve some of the rules. The authors of [2] propose a method for monitoring the domain database for "significant" concept changes and upgrading the rules accordingly.

Rule induction is a type of data mining that aims to extract knowledge in the form of classification rules from data. It is an interdisciplinary field combining machine learning, statistics, information theory and databases. In our expert system we used a rule induction algorithm that we developed, called SA Tabu Miner [15] (Simulated Annealing (SA) and Tabu Search (TS) Data Miner).

In SA Tabu Miner, SA [5] and short-term TS [6], [7] are used to develop an algorithm for the classification task of data mining. The goal is to assign each case (record) to one of a set of predefined classes, based on the values of some of its attributes. In classification, discovered knowledge is often expressed in the form of IF-THEN rules (IF <conditions> THEN <class>).
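For illustration, the following minimal Python sketch (ours, not part of the original system; the attribute and class names are hypothetical) shows how such an IF-THEN rule can be encoded and applied to a case:

    # A rule is the IF part (attribute -> required value) plus the THEN part (a class).
    rule = {"conditions": {"fever": "high", "rash": "present"}, "class": "measles"}

    def covers(rule, case):
        """True if the case satisfies every term in the rule's IF part."""
        return all(case.get(attr) == val for attr, val in rule["conditions"].items())

    case = {"fever": "high", "rash": "present", "age": "child"}
    prediction = rule["class"] if covers(rule, case) else None  # -> "measles"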

Different rule induction algorithms have been used for the development of expert systems. Mingers [1] uses the ID3 [8] algorithm to derive the rules in his system. Other authors [2] combine the data mining approach with human verification of the derived rules. In such cases, besides the accuracy of the discovered knowledge, it is equally important for the derived rules to be comprehensible to the user [3], [4]. Hence, our goal in the development of SA Tabu Miner was to achieve high predictive accuracy while discovering fewer rules, each containing fewer terms in the "if" part.

The rule discovery problem is NP-hard. Therefore, the optimal solution cannot be guaranteed without an exhaustive search of the solution space, which is impossible within reasonable time limits. Commonly used algorithms (ID3 [8], C4.5 [9], CN2 [10]) suffer from becoming trapped in locally optimal solutions on large-scale problems, where the number of attributes in the dataset is large (>100). For instance, certain training datasets derived by feature extraction from images have a large number of attributes. Since these algorithms search a graph that they develop up front, their memory demands and execution time grow rapidly with the number of attributes, as well as with the number of training instances. Modern iterative heuristics such as SA, TS and genetic algorithms have proven effective in tackling this category of problems, which have an exponential and noisy search space with numerous local optima. They perform a heuristic local search instead of an exhaustive one, saving time and memory and achieving satisfactory results with fewer resources.


II. THE WEB MEDICAL EXPERT SYSTEM

The web expert system is a part of an integrated system for e-medicine that we are constructing for a developing information society [11], [12]. The main concepts our system is based on include: creation of the necessary Medical Information Systems (MISs) where hospitals had none; creation of a framework and interfaces through which various multiplatform MISs can interconnect into an integrated MIS; and use of modern telecommunication technologies to build the integrated MIS and provide advanced medical services (sharing knowledge, experience and expertise, and enabling better remote patient-doctor communication).

Figure 1. The web expert system.

The goal of the web medical expert subsystem presented here is to serve as a consultant to physicians when setting a diagnosis. The physician logs into the system and chooses the input form that suits the test data that he/she is going to enter. We previously collected training data from various patient histories from hospitals and from training datasets in the UCI machine learning repository [13]. Using the training data, we trained classifiers with the SA Tabu Miner algorithm. For every classifier we developed input forms matching the training data attributes. After the doctor enters the patient's data and submits the form, the classifier assigns a class (disease) to the patient. The classifier also reports the percentage of its predictive certainty, hence the certainty of its diagnosis.

The data entered by the physician is stored. Later, when the diagnosis is confirmed and the entered data is rechecked, the confirmed records are added to the training data for a new training cycle of the classifiers, each time with more data.

The training cycle is repeated at different intervals for different classifiers, since not all diagnostic forms are used with the same frequency. The tendency is to rerun the training once the new training cases exceed 5% of the previous number of training records.

The training process is based on the well-known 10-fold cross-validation procedure [14]. Each data set is divided into 10 mutually exclusive and exhaustive partitions and the algorithm is run once for each partition. Each time, a different partition is used as the test set and the other nine partitions are grouped together and used as the training set. The predictive accuracies (on the test set) of the 10 runs are then averaged. Eventually the training is performed on the entire dataset, so that the generated rules are based on all the available knowledge. These final rules have significantly more impact in the decision process when the system is in use. The rules generated in each iteration are stored. When deciding, the different groups of rules vote for the final classification of the new case with different impact, dependent on their predictive accuracy. The system adopts the weighted majority vote approach to combine the decisions of the rule groups. The average predictive accuracy is necessary to estimate the reliability of the system when it is used as a diagnostic consultant.
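The weighted majority vote can be sketched as follows (a minimal Python illustration, assuming each rule group contributes its predicted class for the new case together with a weight derived from its measured predictive accuracy; the exact weighting of the final full-data rules is an implementation detail of the system):

    from collections import defaultdict

    def weighted_majority_vote(group_predictions):
        """group_predictions: list of (predicted_class, weight) pairs, one per
        group of rules. Returns the class with the largest accumulated weight."""
        votes = defaultdict(float)
        for predicted_class, weight in group_predictions:
            votes[predicted_class] += weight
        return max(votes, key=votes.get)

    # Example: three CV rule groups plus the full-data rules with a larger weight.
    print(weighted_majority_vote([("flu", 0.82), ("cold", 0.75), ("flu", 0.79), ("flu", 1.5)]))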

The system has a modular design, so new classification forms can be added easily when training data for a new disease is obtained. Modern web technologies (AJAX) are used to achieve a fast and intuitive user interface.

III. RULE INDUCTION ALGORITHMS

Several methods have been proposed for the rule induction process, such as ID3 [8], C4.5 [9], CN2 [10], CART [26], AQ15 [27] and Ant Miner [19]. All of these algorithms fall into two broad categories: sequential covering algorithms and simultaneous covering algorithms. Simultaneous covering algorithms like ID3 and C4.5 generate the entire rule set at once, while sequential covering algorithms like AQ15 and CN2 learn the rule set incrementally.

The ID3 (Iterative Dichotomiser 3) algorithm generates a decision tree. The algorithm favors smaller decision trees (simpler theories). ID3 evaluates the entropy of all unused attributes, chooses the attribute whose entropy is minimal, and creates a node containing that attribute. C4.5 is an extended version of ID3.
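The attribute selection step can be illustrated with a short Python sketch (our toy example, not the authors' code), choosing the attribute whose split yields the lowest expected class entropy:

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def expected_entropy(rows, labels, attr):
        """Weighted class entropy of the branches created by splitting on attr."""
        total = 0.0
        for value in set(r[attr] for r in rows):
            branch = [l for r, l in zip(rows, labels) if r[attr] == value]
            total += len(branch) / len(rows) * entropy(branch)
        return total

    rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "rain", "windy": "yes"},
            {"outlook": "sunny", "windy": "yes"}, {"outlook": "rain", "windy": "no"}]
    labels = ["play", "stay", "stay", "play"]
    best = min(["outlook", "windy"], key=lambda a: expected_entropy(rows, labels, a))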

C4.5 implements a "divide-and-conquer" strategy to create a decision tree through recursive partitioning of a training dataset. The final tree is transformed into an equivalent set of rules, one rule for each path from the root to a leaf. Creating decision trees requires a great deal of memory, which grows exponentially as the number of attributes and classes increases.

CN2 works by finding the most influential rule that accounts for part of the training data, adding the rule to the induced rule set, removing the data it covers, and then iterating this process until no training instance remains. The most influential rule is discovered by a beam search, a search algorithm that uses a heuristic function to evaluate the promise of each node it examines. The Laplace error estimate is used as the heuristic criterion to evaluate the rules. Always using the most influential rule can lead to becoming trapped in local optima.

Recently, Ant Colony Optimization (ACO) has been successfully used for rule induction by Parpinelli et al. [19]. They developed an ACO based algorithm that derives considerably simpler and more comprehensible classification rules, with higher predictive accuracy than CN2.


IV. SA TABU MINER

Some general uses of Simulated Annealing and Tabu Search in data mining applications can be found in the literature. Recently, Zhang et al. [16] and Tahir et al. [17] have used TS to solve the optimal feature selection problem for data mining datasets. TS has also been used [18] to develop a TS enhanced Markov Blanket (TS/MB) procedure for learning from data sets with many discrete variables and relatively few cases, as often arise in health care.

Swarm intelligence heuristic combinatorial optimization algorithms have also recently been used for rule induction by Parpinelli et al. [19], who developed an Ant Colony Optimization (ACO) based rule induction algorithm. The algorithm also uses entropy measurements of attributes to guide the otherwise heuristic local search for the induction rules. The idea of swarm intelligence is used to diversify the search over a wider area of the state space.

Unlike Simulated Annealing and Tabu Search, genetic algorithms are found in connection with classification more often. Bojarczuk et al. [20] use genetic programming for knowledge discovery in chest pain diagnosis. Weise et al. [21] developed a GA based classification system that they used to participate in the 2007 Data-Mining-Cup contest, showing that combinatorial optimization heuristics are emerging as an important tool in data mining. Other algorithms for deriving classification rules using genetic algorithms are referenced in [22], [23], [24], [25].

The SA Tabu Miner algorithm uses the Simulated Annealing and Tabu Search combinatorial optimization techniques to create classification rules. It incrementally constructs and modifies a classification rule of the form:

IF <term1 AND term2 AND ...> THEN <class>

Each term is a triple <attribute, operator "=", value>. Since the operator element in the triple is always "=", continuous (real-valued) attributes are discretized in a preprocessing step using the C4.5 algorithm [9]. The use of continuous attributes in decision trees has been researched by other authors [29], but the preprocessing discretization step provided sufficient precision.
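As a rough illustration of this preprocessing step (our simplification of what C4.5-style discretization does, not the authors' code), a continuous attribute can be split at the cut point that minimizes the weighted class entropy of the two resulting intervals:

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def best_binary_split(values, labels):
        """Cut point minimizing the weighted entropy of the two intervals;
        values <= cut become one discrete value, the rest another."""
        pairs = sorted(zip(values, labels))
        best_cut, best_h = None, float("inf")
        for i in range(1, len(pairs)):
            left = [l for _, l in pairs[:i]]
            right = [l for _, l in pairs[i:]]
            h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
            if h < best_h:
                best_cut, best_h = (pairs[i - 1][0] + pairs[i][0]) / 2, h
        return best_cut

    print(best_binary_split([36.5, 37.0, 38.9, 39.4], ["healthy", "healthy", "ill", "ill"]))  # ~37.95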

The algorithm creates rules incrementally, performing a sequential process to discover a list of classification rules that covers as many training cases as possible, with as high quality as possible. It uses a combination of Simulated Annealing and Tabu Search to search for the optimal rule.

At the beginning, the list of discovered rules is empty and the training set consists of all the training cases. Each iteration of the outer WHILE loop of SA Tabu Miner, corresponding to a number of executions of the inner WHILE loop, discovers one classification rule. After the rule is completed, it is pruned to exclude terms that were wrongfully added during construction. The created rule is added to the list of discovered rules, and the training cases covered by this rule are removed from the training set. This process is performed iteratively while the number of uncovered training cases is greater than a user-specified threshold called Max_uncovered_cases, usually 5% of all cases.

A high-level description of the SA Tabu Miner algorithm is given in the following pseudocode:

    TrainingSet = all training cases;
    DiscoveredRuleList = [];  /* initialized with an empty list */
    WHILE (TrainingSet > Max_uncovered_cases)
        Start with an initial feasible solution S ∈ Ω.
        Initialize temperature
        WHILE (temperature > MinTemp)
            Generate neighborhood solutions V* ⊆ N(S).
            Update tabu timeouts of recently used terms
            Sort the solutions S* ∈ V* by (quality / tabu order), descending
            S* = the first solution ∈ V*
            WHILE (move is not accepted AND V* is not exhausted)
                IF metrop(Quality(S) - Quality(S*)) THEN
                    Accept the move and update the best solution.
                    Update the tabu timeout of the used term
                    break
                END IF
                S* = next solution ∈ V*
            END WHILE
            Decrease temperature
        END WHILE
        Prune rule S
        Add the discovered rule S to DiscoveredRuleList
        TrainingSet = TrainingSet - cases covered by S;
    END WHILE

where: (i) Ω is the set of feasible solutions, (ii) S is the current solution, (iii) S* is the best admissible solution, (iv) Quality(S) is the objective function, (v) N(S) is the neighborhood of solution S, and (vi) V* is the sample of neighborhood solutions.

The selection of the term to be added to the current partial rule depends on a problem-dependent heuristic function (an entropy based probability), the tabu timeouts of the recently used attribute values, and the Metropolis probability function based on the Boltzmann distribution. The algorithm keeps adding one term at a time to its partial rule until one of the following two stopping criteria is met:

• Any term to be added to the rule would make the rule cover a number of cases smaller than a user-specified threshold, called Min_cases_per_rule.

• The control parameter “temperature” has reached its lowest value.

Every time an attribute value is used in a term added to the rule, its tabu timeout is reset to the number of values of that attribute. At the same time, the tabu timeouts of all other values of that attribute are decreased. This enforces the use of various values rather than only the most probable one, since the difference in probability between the most probable value and the others is often insignificant.
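A minimal Python sketch of this bookkeeping (the data layout is our assumption, not taken from the paper):

    def update_tabu_timeouts(tabu, attribute, used_value, domains):
        """Reset the timeout of the value just used to the size of the attribute's
        domain; decrement the timeouts of the attribute's other values."""
        for value in domains[attribute]:
            if value == used_value:
                tabu[(attribute, value)] = len(domains[attribute])
            else:
                tabu[(attribute, value)] = max(0, tabu.get((attribute, value), 0) - 1)

    domains = {"fever": ["none", "mild", "high"]}
    tabu = {}
    update_tabu_timeouts(tabu, "fever", "high", domains)
    # tabu == {("fever", "none"): 0, ("fever", "mild"): 0, ("fever", "high"): 3}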


The entropy based probability guides and intensifies the search in promising areas (attribute values that have more significance for the classification). The tabu timeouts given to recently used attribute values discourage their repeated use, thereby diversifying the search. The Metropolis probability function controls the search, allowing greater diversification and a broader search at the beginning, while the control parameter temperature is high; later in the process, when the temperature is low, it intensifies the search in promising regions only.

V. HEURISTIC FUNCTIONS

An important step in the rule construction process is the neighborhood function that generates the set V* of neighborhood solutions of the current rule S. The probability of use of the members of the neighborhood is what guides the search. The neighborhood consists of all rules generated by adding one more term to the current rule at every attribute position, as well as by changing the attribute values of the attributes already included in the rule. The neighborhood therefore contains as many rule proposals as there are attribute values in the dataset. Let termij be a rule condition of the form Ai = Vij, where Ai is the i-th attribute and Vij is the j-th value of the domain of Ai. The selection of the attribute value placed in a term depends on the probability given as follows:

P_{ij} = \frac{\varphi_{ij}}{tabu\_timeout_{ij}}    (1)

where

\varphi_{ij} = \frac{\log_2 k - H_{ij}}{\sum_{j=1}^{b_i} (\log_2 k - H_{ij})}    (2)

where:
- bi is the number of values of attribute i,
- k is the number of classes,
- Hij is the entropy H(W|Ai=Vij).

For each termij that can be added to the current rule, SA Tabu Miner computes the value Hij of a heuristic function that estimates the quality of the term with respect to its ability to improve the predictive accuracy of the rule. This heuristic function is based on information theory [5]. More precisely, the value of Hij for termij involves a measure of the entropy (or amount of information) associated with that term. For each termij (of the form Ai=Vij), its entropy is:

H_{ij} \equiv H(W \mid A_i = V_{ij}) = -\sum_{w=1}^{k} P(w \mid A_i = V_{ij}) \cdot \log_2 P(w \mid A_i = V_{ij})    (3)

where:
- W is the class attribute (i.e., the attribute whose domain consists of the classes to be predicted),
- P(w|Ai=Vij) is the empirical probability of observing class w conditional on having observed Ai=Vij.

The higher the value of the entropy H(W|Ai=Vij), the more uniformly distributed the classes are, and the smaller the probability that termij would be part of the new solution.

The smaller the entropy and the smaller the tabu timeout, the more likely the attribute value is to be used. For a given dataset, H(W|Ai=Vij) of each termij is constant. Therefore, to save computational time, the H(W|Ai=Vij) of all termij is computed as a preprocessing step before every while loop.

There are two special cases. First, if the value Vij of attribute Ai does not occur in the training set, then H(W|Ai=Vij) is set to its maximum value of log2 k. This corresponds to assigning the lowest possible predictive power to termij. Second, if all covered cases belong to the same class, then H(W|Ai=Vij) is set to 0, which corresponds to assigning the highest possible predictive power to termij. This heuristic function used by SA Tabu Miner, the entropy measure, is the same kind of heuristic function used by decision-tree algorithms such as C4.5. The main difference between decision trees and SA Tabu Miner, with respect to the heuristic function, is that in decision trees the entropy is computed for an attribute as a whole, since an entire attribute is chosen to expand the tree, whereas in SA Tabu Miner the entropy is computed for an attribute-value pair only, since an attribute-value pair is chosen to expand the rule. The tabu timeout given to recently used attribute values serves as a diversifier of the search, forcing the use of unused attribute values.
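Equations (1)-(3) and the two special cases can be combined in a short Python sketch (our reading of the formulas; the names and the guard against a zero timeout are our assumptions):

    import math
    from collections import Counter

    def term_entropy(cases, classes, attr, value, k):
        """H(W | Ai = Vij) per equation (3), with the two special cases."""
        covered = [cls for case, cls in zip(cases, classes) if case.get(attr) == value]
        if not covered:
            return math.log2(k)      # value unseen: lowest predictive power
        counts = Counter(covered)
        if len(counts) == 1:
            return 0.0               # single class: highest predictive power
        n = len(covered)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def term_probability(h_ij, h_values_of_attr, k, tabu_timeout):
        """phi_ij per equation (2), then P_ij per equation (1).
        Assumes at least one value of the attribute is informative."""
        phi = (math.log2(k) - h_ij) / sum(math.log2(k) - h for h in h_values_of_attr)
        return phi / max(tabu_timeout, 1)  # guard: avoid dividing by a zero timeout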

Once a solution proposal is constructed, it is evaluated using the quality measure of the rule. The quality of a rule, denoted by Q, is computed as Q = sensitivity · specificity, defined by:

Q = \frac{TP}{TP + FN} \cdot \frac{TN}{FP + TN}    (4)

where:
- TP (true positives) is the number of cases covered by the rule that have the class predicted by the rule,
- FP (false positives) is the number of cases covered by the rule that have a class different from the class predicted by the rule,
- FN (false negatives) is the number of cases not covered by the rule but having the class predicted by the rule,
- TN (true negatives) is the number of cases not covered by the rule that do not have the class predicted by the rule.

Q's value lies within the range 0 ≤ Q ≤ 1, and the larger the value of Q, the higher the quality of the rule.
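Equation (4) translates directly into code; the following sketch reuses the rule encoding from the earlier sketches (the guard for empty denominators is our addition):

    def rule_quality(rule, cases, classes):
        """Q = sensitivity * specificity, per equation (4)."""
        tp = fp = fn = tn = 0
        for case, cls in zip(cases, classes):
            covered = all(case.get(a) == v for a, v in rule["conditions"].items())
            if covered and cls == rule["class"]:
                tp += 1
            elif covered:
                fp += 1
            elif cls == rule["class"]:
                fn += 1
            else:
                tn += 1
        if tp + fn == 0 or fp + tn == 0:
            return 0.0  # degenerate rule: our guard, not part of equation (4)
        return (tp / (tp + fn)) * (tn / (fp + tn))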

The quality of the rule is the "energy" parameter of the SA Metropolis function that decides whether the new solution proposal will be accepted as the next solution. As usual in SA, while the temperature is high even worse solutions have a chance of being accepted, diversifying the search. As the temperature drops, only solutions with better quality have any chance of being accepted, intensifying the search in the promising region.
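The acceptance test is the standard Metropolis criterion; a minimal sketch with rule quality as the "energy" (the cooling schedule and constants are implementation details not given in the paper):

    import math
    import random

    def metropolis_accept(quality_current, quality_candidate, temperature):
        """Always accept improvements; accept worse candidates with a probability
        that shrinks as the temperature drops."""
        delta = quality_candidate - quality_current
        if delta >= 0:
            return True
        return random.random() < math.exp(delta / temperature)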

As soon as our algorithm completes the construction of a rule, the rule pruning procedure is invoked. Rule pruning is an often used technique in data mining. Its aim is to remove irrelevant terms that might have been unduly included in the rule because of the stochastic heuristic local search. Rule pruning improves both the simplicity and the predictive power of the rule.

The basic idea of rule pruning is to iteratively remove one term at a time from the rule, as long as this improves the quality of the rule. The first iteration starts with the full rule. In each iteration, the term whose removal most improves the quality of the rule is removed. This process is repeated until there is no term whose removal would improve the quality of the rule.
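A minimal sketch of this greedy pruning loop, reusing the rule encoding and the rule_quality function sketched earlier:

    def prune_rule(rule, cases, classes, quality):
        """Repeatedly remove the term whose removal most improves rule quality."""
        improved = True
        while improved and rule["conditions"]:
            improved = False
            best_q = quality(rule, cases, classes)
            best_attr = None
            for attr in list(rule["conditions"]):
                trimmed = {"conditions": {a: v for a, v in rule["conditions"].items()
                                          if a != attr},
                           "class": rule["class"]}
                q = quality(trimmed, cases, classes)
                if q > best_q:
                    best_attr, best_q = attr, q
            if best_attr is not None:
                del rule["conditions"][best_attr]
                improved = True
        return rule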

VI. EXPERIMENTAL RESULTS AND DISCUSSION

We compared the performance and results of our SA Tabu Miner with the results of CN2 and of Ant Miner [19], developed by Parpinelli.

The comparison was performed on two criteria: the predictive accuracy of the discovered rule lists and their simplicity (hence comprehensibility). The comparison of predictive accuracies after the 10-fold cross-validation procedure is given in Table I.

Predictive accuracy was measured by the well-known ten-fold cross-validation procedure [14]. Each data set is divided into 10 mutually exclusive and exhaustive partitions and the algorithm is run once for each partition. Each time, a different partition is used as the test set and the other nine partitions are grouped together and used as the training set. The predictive accuracies (on the test set) of the 10 runs are then averaged and reported.
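A minimal sketch of the protocol (the train/accuracy pair is assumed to be provided by the classifier under evaluation; round-robin fold assignment is our simplification):

    def ten_fold_accuracy(cases, classes, train, accuracy, folds=10):
        """Average test-set accuracy over `folds` mutually exclusive partitions."""
        scores = []
        for f in range(folds):
            train_x = [c for i, c in enumerate(cases) if i % folds != f]
            train_y = [y for i, y in enumerate(classes) if i % folds != f]
            test_x = [c for i, c in enumerate(cases) if i % folds == f]
            test_y = [y for i, y in enumerate(classes) if i % folds == f]
            model = train(train_x, train_y)
            scores.append(accuracy(model, test_x, test_y))
        return sum(scores) / folds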

TABLE I. PREDICTIVE ACCURACY OF SA TABU MINER COMPARED WITH CN2 AND ANT MINER

Data set                 | SA Tabu Miner (%) | CN2 (%) | Ant Miner (%)
Ljubljana breast cancer  | 65.1              | 67.69   | 75.28
Wisconsin breast cancer  | 90.3              | 94.88   | 96.04
tic-tac-toe              | 84.7              | 97.38   | 73.04
Dermatology              | 91.3              | 90.38   | 94.29
Hepatitis                | 89.2              | 90.00   | 90.00

As shown in Table I, SA Tabu Miner achieved better predictive accuracy than CN2 on the Dermatology dataset, and better accuracy than Ant Miner on the tic-tac-toe set. In the other cases, its predictive accuracy is almost equal to or slightly lower than that of the other two algorithms. Despite the marginally lower accuracy on some of the datasets, the advantage of SA Tabu Miner lies in the robustness of its search and its independence from the dataset size and the number of attributes. While other algorithms might need exponential time or extremely large amounts of memory, SA Tabu Miner can perform its search quickly, using modest memory resources.

An important feature of a classification algorithm is the simplicity of the discovered rule list, measured by the number of discovered rules and the average number of terms (conditions) per rule. Simplicity is very important for any use of the rules by humans, for easier verification and implementation in practice. The results comparing the simplicity of the rule lists discovered by SA Tabu Miner, CN2 and Ant Miner are reported in Table II.

TABLE II. SIMPLICITY OF THE RULES DISCOVERED BY THE COMPARED ALGORITHMS (NUMBER OF RULES; AVERAGE TERMS PER RULE)

Data set                 | SA Tabu Miner | CN2         | Ant Miner
Ljubljana breast cancer  | 8.55; 1.70    | 55.40; 2.21 | 7.10; 1.28
Wisconsin breast cancer  | 6.10; 2.15    | 18.60; 2.39 | 6.20; 1.97
tic-tac-toe              | 8.62; 1.30    | 39.70; 2.90 | 8.50; 1.18
Dermatology              | 6.92; 4.08    | 18.50; 2.47 | 7.30; 3.16
Hepatitis                | 3.21; 2.54    | 7.20; 1.58  | 3.40; 2.41

SA Tabu Miner discovered a significantly smaller number of simpler rules than CN2 on all datasets, and derived simpler rules than Ant Miner on the Wisconsin breast cancer, Dermatology and Hepatitis datasets. On the Ljubljana breast cancer and tic-tac-toe datasets, the simplicity is very similar to that achieved by Ant Miner.

Taking the predictive accuracy of the discovered rules into account, SA Tabu Miner achieved comparable, and in some cases better, results than the other two algorithms. In terms of rule simplicity, however, SA Tabu Miner produces considerably simpler and more comprehensible rules than the other two.

VII. CONCLUSION

This paper presents a web based medical expert system that performs self training using a heuristic rule induction algorithm. The system has a self training component, since data inserted by medical personnel while using the expert system is subsequently used for additional learning. Early test use of the system in a public hospital generated positive feedback from the physicians using it. The self training function is expected to fine-tune the classification and to adapt it to health issues that are endemic to the region of use.

For the purpose of training, the system uses a hybrid heuristic algorithm for induction of classification rules that we previously developed. The SA Tabu Miner is inspired by both research on heuristic optimization algorithms and rule induction data mining concepts and principles.

We have compared the performance of SA Tabu Miner with the CN2 and Ant Miner algorithms on public domain data sets. The results showed that, concerning predictive accuracy, SA Tabu Miner obtained similar and often better results than the other approaches.

Since comprehensibility is important whenever discovered knowledge will be used to support a decision made by a human user, and SA Tabu Miner often discovered simpler rule lists, it seems particularly advantageous in such settings. Furthermore, while CN2 and C4.5 have their limitations when large datasets with a large number of attributes are in question, SA Tabu Miner remains applicable and obtains good results thanks to its heuristic local search.

Important directions for future research include extending SA Tabu Miner to cope with continuous attributes directly, rather than requiring that such attributes be discretized in a preprocessing step.

REFERENCES

[1] Mingers J., Expert Systems - Experiments with Rule Induction, The Journal of the Operational Research Society, vol. 37, no. 11, Nov. 1986, pp. 1031-1037.

[2] Holmes G., Cunningham S. J., Using data mining to support the construction and maintenance of expert systems, in Proc. First New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems, Nov. 1993, pp. 156-159.

[3] U. M. Fayyad, G. Piatetsky-Shapiro and P. Smyth, From data mining to knowledge discovery: an overview, In: Advances in Knowledge Discovery & Data Mining, Cambridge, MA: AAAI/MIT, pp. 1-34, 1996.

[4] A. A. Freitas and S. H. Lavington, Mining Very Large Databases with Parallel Processing, London, UK: Kluwer, 1998.

[5] Kirkpatrick S., Gelatt C. D. Jr., Vecchi M. P., Optimization by Simulated Annealing, Technical Report RC 9335, IBM Thomas J. Watson Center, Yorktown Heights, N.Y., 1982.

[6] H. Zhang and G. Sun, “Feature selection using tabu search method,” Pattern Recognition, vol. 35, no. 3, pp. 701–711, 2002.

[7] F. Glover, “Tabu search I,” ORSA Journal on Computing, vol. 1, no. 3, pp. 190–206, 1989.

[8] Quinlan J.R. Induction of decision trees. In Machine learning, vol. 1, p. 81-106. Kluwer Academic Publishers, 1986.

[9] Quinlan J. R., C4.5: Programs for Machine Learning, San Francisco, CA: Morgan Kaufmann, 1993.

[10] Clark P, Boswell R, "Rule Induction with CN2: Some Recent Improvements". In: Yves Kodratoff editor, Proceedings of the Fifth European Conference on Machine Learning, pages 151-163. Berlin, Springer-Verlag, 1991.

[11] I. Chorbev, M. Mihajlov, Wireless Telemedicine Services as part of an Integrated System for E-Medicine, IEEE MELECON 2008, The 14th IEEE Mediterranean Electrotechnical Conference, Ajaccio, France, May 2008, pp. 264-269.

[12] Chorbev I., Mihajlov D., Integrated system for e-medicine in a developing information society, Information Society 2007, 10th International Multiconference, Ljubljana, Slovenia, October 2007, p. 15.

[13] UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets.html

[14] S. M. Weiss and C. A. Kulikowski, Computer Systems that Learn, San Francisco: Morgan Kaufmann, 1991.

[15] Chorbev I., Mihajlov D., Madzarov G., Rule induction in medical data with hybrid heuristic algorithms, IS2008, 11th International Conference, Ljubljana, Slovenia, October 2008, in print.

[16] H. Zhang and G. Sun, “Feature selection using tabu search method,” Pattern Recognition, vol. 35, no. 3, pp. 701–711, 2002.

[17] Tahir M. A., Bouridane A., Kurugollu F., Amira A., A Novel Prostate Cancer Classification Technique Using Intermediate Memory Tabu Search, in EURASIP Journal on Applied Signal Processing 2005:14, 2241–2249.

[18] Bai X., Tabu Search Enhanced Markov Blanket Classifier for High Dimensional Data Sets, in The Next Wave in Computing, Optimization, and Decision Technologies, vol 29, January 2005, ISBN 978-0-387-23528-8, p. 337-354.

[19] Parpinelli R. S., Lopes H. S., Freitas A. A., An Ant Colony Algorithm for Classification Rule Discovery, in Data Mining: A Heuristic Approach, H. A. Abbass, R. Sarker, and C. Newton (Eds.), Idea Group Publishing, 2002.

[20] Bojarczuk C. C., Lopes H. S., Freitas A. A., Genetic programming for knowledge discovery in chest pain diagnosis, IEEE Engineering in Medicine and Biology Magazine 19(4), 2000, pp. 38-44.

[21] Weise T., Achler S., Göb M., Voigtmann C., Zapf M., Evolving Classifiers - Evolutionary Algorithms in Data Mining, Kasseler Informatikschriften (KIS) vol. 2007, University of Kassel, September 28, 2007, pp. 1-20.

[22] Gopalan J., Alhajj R., Barker K., Discovering Accurate and Interesting Classification Rules Using Genetic Algorithm. In Proceedings of the 2006 International Conference on Data Mining, DMIN 2006, pages 389–395. CSREA Press, June 2006, Las Vegas, Nevada, USA. ISBN 1-60132-004-3.

[23] Otero F. E. B., Silva M. M. S., Freitas A. A., Nievola J. C., Genetic Programming for Attribute Construction in Data Mining. In Genetic Programming: Proc. 6th European Conference (EuroGP-2003), pages 384–393, 2003.

[24] Yang Y. F., Lohmann P., Heipke C.: Genetic algorithms for multi-spectral image classification In: Schiewe, J., Michel, U. (Eds.): Geoinformatics paves the Highway to Digital Earth, Festschrift zum 60. Geburtstag von Prof. M. Ehlers (2008), Nr. 8, p. 153-161.

[25] V. Podgorelec, P. Kokol, M. Molan Stiglic, M. Heričko, I. Rozman, Knowledge Discovery with Classification Rules in a Cardiovascular Database, Computer Methods and Programs in Biomedicine, Elsevier, vol. 80, suppl. 1, pp. S39-S49, 2005.

[26] Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. Classification and Regression Trees. Belmont, CA: Wadsworth, 1984.

[27] J. H. Holland. “Escaping brittleness: the possibilities of general purpose algorithms applied to parallel rule-based systems,” in Machine Learning, an AI Approach, R. S. Michalski, J. G. Carbonell, and T. M. Mitchell (Eds.), San Mateo, California: Morgan Kaufmann, Vol. 2, pp. 593-623, 1986.

[28] Cooke C. D., Santana C. A., Morris T. I., DeBraal L., Ordonez C., Omiecinski E., Ezquerra N. F., Garcia E. V., Validating expert system rule confidences using data mining of myocardial perfusion SPECT databases, Computers in Cardiology 2000, pp. 785-788.

[29] S. Morasca, A Proposal for Using Continuous Attributes in Classification Trees, SEKE '02, July 15-19, 2002, Ischia, Italy
