Mining Fuzzy Association Rules in Quantitative Databases
Yiming Bai, Xianyao Meng, Xinjie Han
Room 303, Information Science and Technology Building B, Dalian Maritime University, China,[email protected], [email protected], [email protected]
Keywords: Fuzzy set, association rule, data mining, support degree.
Abstract. In this paper, we introduce a novel technique for mining fuzzy association rules in
quantitative databases. Unlike data mining techniques that can only discover association rules
among discrete values, the algorithm reveals the relationships among different quantitative values by
traversing the partition grids and producing the corresponding Fuzzy Association Rules. Fuzzy
Association Rules employ linguistic terms to represent the regularities and exceptions revealed in
quantitative databases. After the fuzzy rule base is built, we use the definition of Support Degree from
data mining to reduce the number of rules and keep the useful ones. Throughout this paper, we use a
set of real data from a wine database to demonstrate the ideas and test the models.
Introduction
Mining association rules, defined as searching for interesting relationships among different
attributes in a database, is an important task in data mining research. Recently, in order to deal with
quantitative databases, a more common data type in practice, mining association rules in numerical
data has attracted attention in data mining. Some existing algorithms first discretize the domains of
the quantitative attributes and then recombine neighboring partitions, so that the task is converted into
mining association rules over Boolean variables. However, the drawback of this method is obvious: it
is easy to ignore or over-emphasize the data points near the boundaries of the partitions. Therefore,
we introduce fuzzy set theory to solve this problem. Fuzzy set theory employs linguistic
terms to reveal the regularities in a quantitative database, which is well suited to the boundary
problem. Moreover, the linguistic representation is more natural for human experts to understand.
As the definition of the linguistic terms is based on fuzzy set theory, we call the rules containing these
terms Fuzzy Association Rules.
Problem Descriptions
Fuzzy set theory has been recognized as a suitable tool for modeling the patterns in quantitative
data [3]. In order to present our ideas and methods more clearly and to show the application results,
we use the following real database of Italian wine throughout this paper.
The database consists of chemical and physical analyses of Italian wines. We consider the
physical attributes, color intensity and hue, as the input (condition) variables and the chemical
attribute, alcohol, as the output (conclusion) variable. Our task in this paper is to extract a set of strong
fuzzy association rules that describe the relationship between the physical attributes and the chemical
attribute based on the training data, and to use the extracted rules to predict the alcohol of the test data.
The database contains a set D of N=178 sample data pairs, where each data pair represents one sample
of wine. We divide the N input-output data pairs into two groups: a training set Dtrain containing N1
data pairs, and a test set Dtest containing the remaining N2=N-N1 data pairs. N1/N2=7/3 is a
reasonable choice for the division.
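The 7:3 division described above can be sketched as follows. This is only an illustration: the paper does not say how the pairs are assigned to each set, so the random shuffle, the fixed seed, the rounding of N1, and the function name `split_train_test` are all assumptions.

```python
import random


def split_train_test(pairs, train_fraction=0.7, seed=0):
    """Shuffle the data pairs and cut them into a training and a test set.

    ASSUMPTION: random assignment with a fixed seed; the paper only
    states the ratio N1/N2 = 7/3.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in for the N = 178 wine samples (indices only, no real data).
samples = list(range(178))
train, test = split_train_test(samples)
print(len(train), len(test))  # 125 53
```

With N=178, rounding gives N1=125 and N2=53, which is close to but not exactly 7/3.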
Applied Mechanics and Materials Vols. 182-183 (2012) pp 2003-2007. Online available since 2012/Jun/04 at www.scientific.net. © (2012) Trans Tech Publications, Switzerland. doi:10.4028/www.scientific.net/AMM.182-183.2003
Mining fuzzy association rules
Model the quantitative attributes with linguistic terms
The extracted rules can take different forms. In our method, we use fuzzy
IF-THEN rules to describe the relationship between the physical attributes and the chemical attribute.
As the data are stored as quantitative values in the wine database, we can obtain the interval of
each wine attribute. In this paper, x1 and x2 are defined as the input attributes color
intensity and hue; y is defined as the output attribute alcohol. The intervals are Color Intensity x1:
[1.28, 10.80], Hue x2: [0.48, 1.51], Alcohol y: [11.03, 14.83]. Following fuzzy logic, we
build the fuzzy model over these domain intervals. (The domain interval of a variable is the interval
in which the variable will most probably lie.) The partitions of the input and output variables are shown
in fig. 1.
Fig.1 Divisions of the input and output spaces into fuzzy regions and the corresponding membership
functions: a. the partition of color intensity; b. the partition of hue; c. the partition of alcohol
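The numerical data pairs in the quantitative database can therefore be transformed into fuzzy data
pairs by the triangular membership function shown in fig.2, defined as:
$$m(x; x_c, \sigma) = \frac{\sigma - |x - x_c|}{\sigma}, \quad x \in [x_c - \sigma,\ x_c + \sigma] \tag{1}$$
$$m(x; x_c, \sigma) = 0, \quad x \in R - [x_c - \sigma,\ x_c + \sigma] \tag{2}$$
Here $x_c$ is the center of the fuzzy set and $\sigma$ is the half-width of its triangle.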
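As a minimal sketch, Eq. (1) and Eq. (2) can be written as one Python function. The function name and the numbers in the calls are illustrative assumptions; they are not the partition parameters of fig.1, which are not recoverable from the text.

```python
def triangular_membership(x, center, sigma):
    """Triangular membership: 1 at `center`, falling linearly to 0 at center +/- sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    distance = abs(x - center)
    if distance > sigma:
        return 0.0  # Eq. (2): outside [center - sigma, center + sigma]
    return (sigma - distance) / sigma  # Eq. (1)


# Hypothetical fuzzy set with center 2.0 and half-width 1.0.
print(triangular_membership(2.0, 2.0, 1.0))  # 1.0
print(triangular_membership(2.5, 2.0, 1.0))  # 0.5
print(triangular_membership(3.5, 2.0, 1.0))  # 0.0
```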
We take one of the sample data pairs, Pi(x1, x2; y): (2.2, 0.7; 13.25), as an illustration.
First, calculate the membership function values of each attribute:
x1(Color) = 0.4/S2(dark) + 0.6/S1(little dark) + 0/CE(moderate) + 0/B1(little bright) + 0/B2(bright)
x2(Hue) = 0/S2(cold) + 1/S1(little cold) + 0/CE(moderate) + 0/B1(little warm) + 0/B2(warm)
y(Alcohol) = 0/S2(low) + 0/S1(little low) + 0.75/CE(moderate) + 0.25/B1(little high) + 0/B2(high)
According to the max degree principle, each attribute is assigned the fuzzy set in which it has the
largest membership value. The Fuzzy Association Rule generated from this sample data pair can
therefore be described as: IF the color intensity is little dark and the hue is little cold, THEN the
alcohol is moderate, i.e. IF x1=S1 and x2=S1 THEN y=CE. (This rule is not yet a final Fuzzy
Association Rule.)
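The max degree principle can be illustrated with the membership values listed above for the sample (2.2, 0.7; 13.25). The dictionary layout and the helper name `max_degree_label` are assumptions made for this sketch.

```python
# Membership values of the sample pair, copied from the worked example above.
memberships = {
    "x1": {"S2": 0.4, "S1": 0.6, "CE": 0.0, "B1": 0.0, "B2": 0.0},
    "x2": {"S2": 0.0, "S1": 1.0, "CE": 0.0, "B1": 0.0, "B2": 0.0},
    "y": {"S2": 0.0, "S1": 0.0, "CE": 0.75, "B1": 0.25, "B2": 0.0},
}


def max_degree_label(degrees):
    """Return the fuzzy-set label with the largest membership value."""
    return max(degrees, key=degrees.get)


rule = {attr: max_degree_label(degrees) for attr, degrees in memberships.items()}
print(f"IF x1={rule['x1']} and x2={rule['x2']} THEN y={rule['y']}")
# IF x1=S1 and x2=S1 THEN y=CE
```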
Fig.2 The triangular membership function
Fig.3 The fuzzy rule base
Generating a fuzzy rule base with a traversal algorithm
A rule base with 5×4=20 fuzzy rules is built according to the partition of the input spaces. As
shown in fig.3, each grid is one combination of the IF antecedents and can generate one fuzzy
association rule.
Step 1. Evaluate all the sample data pairs against a grid of the rule base. Calculate the membership
function values of each sample in the input fuzzy sets of that grid, and set the product of the input
membership function values as the weight of this sample.
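[Fig.3 grid layout: the horizontal axis X1 carries the fuzzy sets S2, S1, CE, B1, B2 and the vertical axis X2 carries S2, S1, B1, B2 (bottom to top). The 20 grids are numbered row by row from the bottom: 1-5 (X2 = S2), 6-10 (X2 = S1), 11-15 (X2 = B1), 16-20 (X2 = B2). The legend for the output Y lists S2, S1, CE, B1, B2.]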
2004 Applied Mechanics and Mechatronics Automation
Here is an example of the weight calculation in grid two. Suppose that the ith sample data pair in
the wine database is defined as Pi(xi1, xi2, yi). Then the membership function values of
the input attributes and the weight of this data pair are:
Color Intensity: $m^{S1}(x_{i1})$
Hue: $m^{S2}(x_{i2})$
Weight: $\omega(P_i) = m^{S1}(x_{i1})\, m^{S2}(x_{i2})$
Step 2. Determine the conclusion of the THEN part by computing the weighted average of the
output membership function values. This step generates the THEN part of the rule in grid two. The
output space has been partitioned into five fuzzy sets, and we need to calculate the strength of each
output fuzzy set. For example, the strength of the low-alcohol output fuzzy set is calculated as follows:
$$D^{S2} = \frac{\sum_{i=1}^{N} m^{S2}(y_i)\,\omega(P_i)}{\sum_{i=1}^{N} \omega(P_i)} \tag{3}$$
where N is the total number of data pairs, $m^{S2}(y_i)$ is the membership function value of the
output attribute in the low-alcohol fuzzy set S2, and $\omega(P_i)$ is the weight computed from the input
attributes. We select the output fuzzy set with the maximum strength as the conclusion part of this
rule, i.e. $\max(D^{S2}, D^{S1}, D^{CE}, D^{B1}, D^{B2})$. If $\sum_{i=1}^{N} \omega(P_i) = 0$ in some grid, no rule is
generated for that grid. Assuming that the strength $D^{S2}$ is the largest one for the output attribute in
grid two, the final rule is: IF x1=S1 and x2=S2 THEN y=S2, which can be described in linguistic
terms as: IF the color intensity is little dark and the hue is cold, THEN the wine alcohol is low. The
fuzzy rule base is complete once the sample data pairs have traversed all the input grids from the
first to the last.
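For a single grid, Eq. (3) and the selection of the strongest output fuzzy set can be sketched as below. The function name, the data layout, and the toy weights and membership values are assumptions chosen for illustration; they are not taken from the wine database.

```python
def output_strengths(weights, output_memberships):
    """Eq. (3): weighted average of the output memberships for each output fuzzy set.

    weights[i] is the weight of sample i in the current grid;
    output_memberships[label][i] is the membership of y_i in fuzzy set `label`.
    Returns None when all weights are zero (no rule is generated).
    """
    total = sum(weights)
    if total == 0:
        return None
    return {
        label: sum(m * w for m, w in zip(values, weights)) / total
        for label, values in output_memberships.items()
    }


# Toy numbers for three samples and two output fuzzy sets.
weights = [0.5, 0.25, 0.0]
output_memberships = {"S2": [1.0, 0.5, 0.0], "S1": [0.0, 0.5, 1.0]}

strengths = output_strengths(weights, output_memberships)
conclusion = max(strengths, key=strengths.get)
print(conclusion)  # S2
```

Note that the third sample has zero weight in this grid, so its output membership does not influence the conclusion.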
The algorithm is as follows:
Input: the sample data pairs DATA, the ranges of the input and output attributes RANGES, the
number of fuzzy sets in each partition FUZSETS.
Output: a fuzzy rule base RULE.
1) Check whether the sample data pairs have traversed all the input partition grids. If the
final grid has been traversed, the algorithm ends.
2) Otherwise, compute the weights of the sample data pairs in the current grid.
3) Using the weights from step 2), compute the strength of each output fuzzy set and select the
fuzzy set with the maximum strength as the conclusion of the current rule.
4) Save the rule generated in step 3) to the rule base, and return to step 1).
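The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `uniform_partition` assumes evenly spaced triangular sets whose half-width equals the spacing (the actual partitions of fig.1 may differ), fuzzy sets are identified by index rather than by label, and the toy data are made up.

```python
from itertools import product


def tri(x, center, sigma):
    """Triangular membership of Eq. (1)-(2)."""
    return max(0.0, 1.0 - abs(x - center) / sigma)


def uniform_partition(lo, hi, k):
    """ASSUMPTION: k evenly spaced triangles whose half-width equals the spacing."""
    step = (hi - lo) / (k - 1)
    return [(lo + j * step, step) for j in range(k)]


def build_rule_base(data, ranges, fuzsets):
    """data: tuples (x_1, ..., x_n, y); ranges and fuzsets list inputs first, output last.

    Returns {input grid (tuple of fuzzy-set indices): index of the concluding output set}.
    """
    parts = [uniform_partition(lo, hi, k) for (lo, hi), k in zip(ranges, fuzsets)]
    in_parts, out_part = parts[:-1], parts[-1]
    rules = {}
    # Step 1: the loop ends once the final input grid has been visited.
    for grid in product(*(range(len(p)) for p in in_parts)):
        # Step 2: weight of every sample in the current grid.
        weights = []
        for sample in data:
            w = 1.0
            for x, part, g in zip(sample[:-1], in_parts, grid):
                w *= tri(x, *part[g])
            weights.append(w)
        total = sum(weights)
        if total == 0:
            continue  # no sample supports this grid, so no rule is generated
        # Step 3: strength of each output fuzzy set, Eq. (3); keep the strongest.
        strengths = [
            sum(tri(s[-1], c, sg) * w for s, w in zip(data, weights)) / total
            for c, sg in out_part
        ]
        # Step 4: save the rule.
        rules[grid] = max(range(len(out_part)), key=strengths.__getitem__)
    return rules


# Toy data (made up, not the wine database): two inputs and one output.
toy = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 2.0, 4.0)]
rule_base = build_rule_base(toy, ranges=[(0, 4), (0, 3), (0, 4)], fuzsets=[5, 4, 5])
print(rule_base)  # {(1, 0): 0, (3, 2): 4}
```

With five sets per attribute, indices 0 to 4 would correspond to S2, S1, CE, B1, B2; the 5×4 input partition used here yields at most 20 rules, as in the rule base described above.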
Reduce the rule number
In order to assess the potential usefulness of a fuzzy association rule, we introduce the definition of
Support Degree from data mining. The support of an itemset describes the proportion of
transactions in the data set that contain this itemset. The support degree of the association rule
“A→B” in data mining is defined as:
$$\mathrm{sup}(A \rightarrow B) = \frac{|\{T : A \cup B \subseteq T,\ T \in D\}|}{|D|} = \mathrm{sup}(A \cup B) \tag{4}$$
where D is the set of all transactions and T is a transaction that contains the items of both A and B.
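A minimal sketch of Eq. (4) for crisp (Boolean) transactions is given below. The function names and the toy transaction set are illustrative assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        raise ValueError("the transaction set must not be empty")
    items = set(itemset)
    count = sum(1 for t in transactions if items <= set(t))
    return count / len(transactions)


def rule_support(transactions, antecedent, consequent):
    """Eq. (4): sup(A -> B) equals the support of the union of A and B."""
    return support(transactions, set(antecedent) | set(consequent))


# Toy transaction set D with four transactions.
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
print(rule_support(D, {"a"}, {"b"}))  # 0.5
```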
In this paper we introduce fuzzy logic by using fuzzy membership function values to
describe the quantitative variables in the database. The support degree of rule i (i∈{1, 2, 3, …, P},
where P is the total number of rules) is then calculated as follows:
$$\mathrm{sup}(X \Rightarrow y) = \frac{1}{N} \sum_{j=1}^{N} \Big( m_{iy}(y_j) \prod_{k=1}^{n} m_{ix_k}(x_{jk}) \Big) \tag{5}$$
where N is the number of samples, n is the dimension of the input attributes, $m_{iy}(y_j)$ is the
membership function value for the conclusion fuzzy set in rule i, and $m_{ix_k}(x_{jk})$ is the membership
function value for the kth condition fuzzy set in rule i.
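Eq. (5) can be sketched as below for one rule, given membership values that have already been computed. The function name, the argument layout, and the toy membership values are assumptions for illustration only.

```python
def fuzzy_support(antecedent_memberships, consequent_memberships):
    """Eq. (5): mean over samples of (conclusion membership x product of condition memberships).

    antecedent_memberships[j][k] is the membership of sample j in the kth condition fuzzy set;
    consequent_memberships[j] is the membership of y_j in the conclusion fuzzy set.
    """
    n_samples = len(consequent_memberships)
    if n_samples == 0:
        raise ValueError("at least one sample is required")
    total = 0.0
    for conditions, conclusion in zip(antecedent_memberships, consequent_memberships):
        term = conclusion
        for m in conditions:
            term *= m
        total += term
    return total / n_samples


# Toy memberships for four samples and a rule with two conditions.
antecedents = [[0.5, 1.0], [1.0, 0.5], [0.0, 1.0], [1.0, 1.0]]
consequents = [1.0, 0.5, 1.0, 0.0]
print(fuzzy_support(antecedents, consequents))  # 0.1875
```

A sample contributes to the support only when it belongs, to some degree, to every condition fuzzy set and to the conclusion fuzzy set of the rule.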
We set a threshold τ according to the actual conditions. The rules whose support degree meets the
threshold τ are strong fuzzy association rules. If we discard the rules with a support degree less
than $10^{-2}$, 13 fuzzy association rules are kept, as shown in fig.3. If we discard the rules
with a support degree less than $10^{-1}$, 3 fuzzy association rules are kept:
IF the color intensity is moderate and the hue is little warm, THEN the wine alcohol is little high.
IF the color intensity is little dark and the hue is little warm, THEN the wine alcohol is little low.
IF the color intensity is little bright and the hue is little cold, THEN the wine alcohol is moderate.
Fuzzy association rule validation
In order to test the validity of the fuzzy association rules, a model with 13 fuzzy association
rules is built to predict the wine alcohol. The rule model is shown in fig.4.
The prediction model can be viewed as a multiple-input single-output mapping. Here the inputs are
the physical attributes and the output is the corresponding wine alcohol. We use the following
centroid defuzzification formula to determine the output y for a given input x:
$$y = \frac{\sum_{i=1}^{P} y_c^{i}\,\omega^{i}}{\sum_{i=1}^{P} \omega^{i}} \tag{6}$$
where P is the number of fuzzy rules in the fuzzy rule base, $\omega^{i} = \prod_{j=1}^{n} m_j^{i}(x_j)$ is the weight of the ith
rule, n is the dimension of the input attributes, and $y_c^{i}$ is the center of the conclusion fuzzy set of the
ith rule.
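The prediction step of Eq. (6) can be sketched as follows. The rule representation (a list of antecedent triangles plus the center of the conclusion set) and all numeric values are hypothetical; they are not the 13 rules of fig.4.

```python
def tri(x, center, sigma):
    """Triangular membership of Eq. (1)-(2)."""
    return max(0.0, 1.0 - abs(x - center) / sigma)


def predict(x, rules):
    """Eq. (6): weighted average of the conclusion centers.

    Each rule is (antecedents, y_center), where antecedents holds one
    (center, sigma) pair per input attribute. Returns None if no rule fires.
    """
    numerator = 0.0
    denominator = 0.0
    for antecedents, y_center in rules:
        weight = 1.0
        for x_j, (center, sigma) in zip(x, antecedents):
            weight *= tri(x_j, center, sigma)
        numerator += y_center * weight
        denominator += weight
    if denominator == 0:
        return None
    return numerator / denominator


# Two hypothetical rules over two inputs.
rules = [
    ([(1.0, 1.0), (1.0, 1.0)], 12.0),
    ([(2.0, 1.0), (1.0, 1.0)], 14.0),
]
print(predict((1.5, 1.0), rules))  # 13.0
```

An input halfway between the two antecedent centers fires both rules equally, so the prediction is the midpoint of the two conclusion centers.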
Table 1 lists the prediction results for the sample data pairs in the little-high alcohol output region.
The predicted values agree closely with the actual values: each alcohol error is less than 0.1,
which is consistent with the fuzzy association rules.
Table 1 The predicting results
Fig.4 The fuzzy rules and its model
Conclusions
We use fuzzy sets to describe the quantitative sample data and generate a fuzzy association rule base
by traversing the input partition grids. The definition of support degree is then employed to
select strong fuzzy association rules from the original fuzzy rule base.
The algorithm has the following advantages:
1) Regularization. The quantitative data are converted into membership function values, and the
membership function values lie in the interval [0, 1], which can improve the
robustness of the system [6].
2) Completeness. All the training sample data go through all the grid regions, i.e. every rule in our
rule base is elected by all the sample data, not just by the sample data in one grid region. The rule
base is therefore complete in the sense that it covers all the sample information.
3) Interpretability. Each fuzzy set has a corresponding linguistic label which can be interpreted
by experts. This makes it easy to compare rules and encourages knowledge discovery.
References
[1] R. Srikant and R. Agrawal, “Mining Quantitative Association Rules in Large Relational Tables,”
in Proc. of the ACM SIGMOD Int’l Conf., Montreal, Canada, June 1996, pp. 1-12.
[2] Wai-Ho Au and Keith C.C. Chan, “FARM: A Data Mining System for Discovering Fuzzy
Association Rules,” in Proc. of 1999 IEEE International Fuzzy Systems Conf., Seoul, Korea,
August 1999, pp. 1217-1222.
[3] Miguel Delgado, Nicolás Marín, Daniel Sánchez and María-Amparo Vila, “Fuzzy Association
Rules: General Model and Applications,” IEEE Trans. Fuzzy Syst., vol. 11, pp. 214-225, April
2003.
[4] J. Han and M. Kamber, “Data Mining: Concepts and Techniques,” San Francisco: Morgan
Kaufmann Publishers, 2005.
[5] S. Guillaume, “Designing fuzzy inference systems from data: An interpretability-oriented
review,” IEEE Trans. Fuzzy Syst., vol. 9, pp. 426-443, June 2001.
[6] Y. F. Wang, D. H. Wang and T. Y. Chai, “Extraction of Fuzzy Rules with Completeness and
Robustness,” Acta Automatica Sinica, vol. 36, no. 9, September 2010. (In Chinese)
[7] L. X. Wang and J. M. Mendel, “Generating fuzzy rules by learning from examples,” IEEE Trans.
Systems, Man and Cybernetics, vol. 22, no. 6, pp. 1414-1427, November/December 1992.