Mining Fuzzy Association Rules in Quantitative Databases


Yiming Bai, Xianyao Meng, Xinjie Han

Room 303, Information Science and Technology Building B, Dalian Maritime University, China, [email protected], [email protected], [email protected]

Keywords: Fuzzy set, association rule, data mining, support degree.

Abstract. In this paper, we introduce a novel technique for mining fuzzy association rules in quantitative databases. Unlike other data mining techniques, which can only discover association rules over discrete values, the algorithm reveals the relationships among different quantitative values by traversing the partition grids and produces the corresponding fuzzy association rules. Fuzzy association rules employ linguistic terms to represent the regularities and exceptions revealed in quantitative databases. After the fuzzy rule base is built, we use the definition of support degree from data mining to reduce the number of rules and keep the useful ones. Throughout this paper, we use a set of real data from a wine database to demonstrate the ideas and test the models.

Introduction

Mining association rules, defined as searching for interesting relationships among different attributes in a database, is an important task in data mining research. Recently, in order to deal with quantitative databases, a more common data type in practice, mining association rules in numerical data has attracted attention in data mining. Some existing algorithms first discretize the domains of the quantitative attributes and then re-combine neighboring partitions, so that the task is converted into mining association rules over Boolean variables. However, this method has an obvious drawback: data points near the partition boundaries are easily ignored or over-emphasized. Therefore, we introduce fuzzy set theory to solve this problem. Fuzzy set theory employs linguistic terms to reveal the regularities in a quantitative database, which handles the boundary problem well because a value near a boundary belongs partially to both neighboring sets. Moreover, the linguistic representation is more natural for human experts to understand. As the definition of the linguistic terms is based on fuzzy set theory, we call the rules containing these terms fuzzy association rules.

Problem Descriptions

Fuzzy set theory has been recognized as a suitable tool for modeling patterns in quantitative data [3]. In order to present our ideas and methods more clearly and to show the application results, we use the following real database of Italian wine throughout this paper.

The database consists of chemical and physical analyses of Italian wines. We consider the physical attributes, color intensity and hue, as the input (condition) variables and the chemical attribute, alcohol, as the output (conclusion) variable. Our task in this paper is to extract a set of strong fuzzy association rules describing the relationship between the physical attributes and the chemical attribute from the training data, and to use the extracted rules to predict the alcohol of the test data. The database contains a set D of N=178 sample data pairs, where each data pair represents one sample of wine. We divide the N input-output data pairs into two groups: a training set Dtrain containing N1 data pairs, and a test set Dtest containing the remaining N2=N-N1 data pairs. N1/N2=7/3 is a reasonable choice for the division.
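
A 7/3 division of the 178 data pairs can be sketched as follows. The paper does not say how the pairs are assigned to each set or how N1 is rounded, so the random shuffle, the fixed seed, the rounding rule and the name `split_train_test` are all assumptions made for illustration.

```python
import random

def split_train_test(pairs, train_fraction=0.7, seed=0):
    """Shuffle the data pairs and cut them into a training and a test set.

    Assumption: a seeded random shuffle and round-to-nearest for N1.
    """
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_train = round(len(pairs) * train_fraction)
    return pairs[:n_train], pairs[n_train:]

# Stand-in for the N=178 wine samples: here just their indices.
train, test = split_train_test(range(178))
print(len(train), len(test))  # 125 53
```

With this rounding, N1=125 and N2=53, which is as close to 7/3 as 178 samples allow.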

Applied Mechanics and Materials Vols. 182-183 (2012) pp 2003-2007. Online available since 2012/Jun/04 at www.scientific.net. © (2012) Trans Tech Publications, Switzerland. doi:10.4028/www.scientific.net/AMM.182-183.2003


Mining fuzzy association rules

Model the quantitative attributes with linguistic terms

The extracted rules can take different forms. In our method, we use fuzzy IF–THEN rules to describe the relationship between the physical attributes and the chemical attribute. As the data are stored as quantitative values in the wine database, we can obtain the interval of each wine attribute. In this paper, x1 and x2 denote the input attributes color intensity and hue, and y denotes the output attribute alcohol. The intervals are color intensity x1: [1.28, 10.80], hue x2: [0.48, 1.51], and alcohol y: [11.03, 14.83]. Following fuzzy logic, we build the fuzzy model on these domain intervals. (The domain interval of a variable is the interval in which the variable will most probably lie.) The partitions of the input and output variables are shown in Fig.1.

a. The partition of color intensity; b. The partition of hue; c. The partition of alcohol

Fig.1 Divisions of the input and output spaces into fuzzy regions and the corresponding membership functions

The numerical data pairs in the quantitative database can therefore be transformed into fuzzy data pairs by the triangular membership function shown in Fig.2, defined as:

$$ m(x; x_c, \sigma) = \frac{\sigma - |x - x_c|}{\sigma}, \quad x \in [x_c - \sigma,\ x_c + \sigma] \tag{1} $$

$$ m(x; x_c, \sigma) = 0, \quad x \in R - [x_c - \sigma,\ x_c + \sigma] \tag{2} $$

Here we take one of the sample data pairs, Pi(x1, x2; y): (2.2, 0.7; 13.25), as an illustration. First, calculate the membership function values of each attribute:

x1(Color)=0.4/S2(dark)+0.6/S1(little dark)+0/CE(moderate)+0/B1(little bright)+0/B2(bright)

x2(Hue)=0/S2(cold)+1/S1(little cold)+0/CE(moderate)+0/B1(little warm)+0/B2(warm)

y(Alcohol)=0/S2(low)+0/S1(little low)+0.75/CE(moderate)+0.25/B1(little high)+0/B2(high)

According to the max degree principle, the fuzzy association rule generated from this sample data pair is: IF the color intensity is little dark and the hue is little cold, THEN the alcohol is moderate, i.e. IF x1=S1 and x2=S1 THEN y=CE. (This rule is not yet a final fuzzy association rule.)
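
The fuzzification and max degree steps described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the evenly spaced centers on [0, 4] are an assumption that does not reproduce the partitions of Fig.1, so the degrees below differ from those of the wine example.

```python
def tri_membership(x, center, sigma):
    """Triangular membership: 1 at `center`, falling linearly to 0 at
    center - sigma and center + sigma, and 0 outside that interval."""
    return max(0.0, (sigma - abs(x - center)) / sigma)

def fuzzify(x, centers, sigma, labels):
    """Return {label: membership degree} of the value x in every fuzzy set."""
    return {lab: tri_membership(x, c, sigma) for lab, c in zip(labels, centers)}

def max_degree_label(degrees):
    """Max degree principle: keep the label with the largest membership."""
    return max(degrees, key=degrees.get)

LABELS = ["S2", "S1", "CE", "B1", "B2"]
# Assumed partition: evenly spaced centers, half-width equal to the spacing.
CENTERS = [0.0, 1.0, 2.0, 3.0, 4.0]
SIGMA = 1.0

degrees = fuzzify(1.25, CENTERS, SIGMA, LABELS)
print(degrees)                    # S1 gets 0.75, CE gets 0.25, the rest 0.0
print(max_degree_label(degrees))  # S1
```

Applying `max_degree_label` to each attribute of a data pair gives the antecedent and conclusion labels of the rule generated from that pair.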

Fig.2 The triangular membership function Fig.3 The fuzzy rule base

Generating a fuzzy rule base with a traversal algorithm

A rule base with 5×4=20 fuzzy rules is built according to the partition of the input spaces. As shown in Fig.3, each grid is one combination of the IF antecedents and can generate one fuzzy association rule.

Step 1. Put all the sample data pairs into a given grid of the rule base. Calculate the membership function values of each sample and take the product of its input membership function values as the weight of this sample.

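[Fig.3: rule-base grid with x1 on the horizontal axis (S2, S1, CE, B1, B2) and x2 on the vertical axis (S2, S1, B1, B2); the grids are numbered 1 to 20 row by row, starting from the row x2=S2; the legend for the output y lists S2, S1, CE, B1, B2.]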


Here is an example of the weight calculation in grid two. Suppose that the ith sample data pair in the wine database is Pi(xi1, xi2, yi). Then the membership function values of the input attributes and the weight for this data pair are:

Color intensity: $m_{S1}(x_{i1})$, Hue: $m_{S2}(x_{i2})$, Weight: $\omega(P_i) = m_{S1}(x_{i1})\, m_{S2}(x_{i2})$.

Step 2. Determine the conclusion of the THEN part from the weighted average membership function values. This step generates the THEN part of the rule in grid two. The output space has been partitioned into five fuzzy sets, and we calculate the strength of each output fuzzy set in turn. For example, the strength of the low alcohol output fuzzy set is calculated as follows:

$$ D_{S2} = \frac{\sum_{i=1}^{N} m_{S2}(y_i)\, \omega(P_i)}{\sum_{i=1}^{N} \omega(P_i)} \tag{3} $$

where N is the total number of data pairs, $m_{S2}(y_i)$ is the membership function value of the output attribute in the low alcohol fuzzy set S2, and $\omega(P_i)$ is the weight from the input attributes. We select the output fuzzy set with the maximum strength as the conclusion part of the rule, i.e. $\max(D_{S2}, D_{S1}, D_{CE}, D_{B1}, D_{B2})$. If $\sum_{i=1}^{N} \omega(P_i) = 0$ in some grid, no rule is generated for that grid. Assuming that the strength $D_{S2}$ is the maximum one for the output attribute in grid two, the final rule is: IF x1=S1 and x2=S2 THEN y=S2, which in linguistic terms reads: IF the color intensity is little dark and the hue is cold, THEN the wine alcohol is low. The fuzzy rule base is complete once the sample data pairs have traversed all the input grids from the first to the last.

The algorithm is as follows:

Input: the sample data pairs DATA, the ranges of the input and output attributes RANGES, the number of fuzzy sets per partition FUZSETS.

Output: a fuzzy rule base RULE.

1) Check whether the sample data pairs have traversed all the input partition grids. If the final grid has been traversed, the algorithm ends.

2) Otherwise, compute the weights of the sample data pairs in the current grid.

3) Using the weights from step 2), compute the strength of each output fuzzy set and select the fuzzy set with the maximum strength as the conclusion of the current rule.

4) Save the rule generated in step 3) to the rule base, and return to step 1).
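
The four steps above can be sketched in Python as below. This is an illustrative sketch under stated assumptions rather than the authors' code: the partitions are passed directly as `{label: (center, sigma)}` dictionaries instead of being derived from RANGES and FUZSETS, and the names `tri` and `build_rule_base` as well as the toy data are hypothetical.

```python
from itertools import product

def tri(x, center, sigma):
    """Triangular membership: 1 at `center`, 0 beyond center +/- sigma."""
    return max(0.0, (sigma - abs(x - center)) / sigma)

def build_rule_base(data, in_parts, out_part):
    """data: list of (x1, ..., xn, y) tuples.
    in_parts: one {label: (center, sigma)} dict per input attribute.
    out_part: {label: (center, sigma)} dict for the output attribute.
    Returns {(input labels...): output label}."""
    rules = {}
    # Step 1): visit every grid, i.e. every combination of input labels.
    for grid in product(*(part.keys() for part in in_parts)):
        # Step 2): weight of a sample = product of its input memberships.
        weights = []
        for *xs, _y in data:
            w = 1.0
            for x, part, label in zip(xs, in_parts, grid):
                w *= tri(x, *part[label])
            weights.append(w)
        total = sum(weights)
        if total == 0:
            continue  # no sample supports this grid, so no rule is generated
        # Step 3): weighted strength of each output set; keep the strongest.
        strength = {
            label: sum(tri(s[-1], c, sg) * w for s, w in zip(data, weights)) / total
            for label, (c, sg) in out_part.items()
        }
        # Step 4): save the rule for this grid.
        rules[grid] = max(strength, key=strength.get)
    return rules

# Hypothetical toy problem: two inputs and one output, two fuzzy sets each.
part = {"S": (0.0, 1.0), "B": (1.0, 1.0)}
out = {"lo": (0.0, 1.0), "hi": (1.0, 1.0)}
data = [(0.0, 0.0, 0.1), (1.0, 1.0, 0.9)]
rules = build_rule_base(data, [part, part], out)
print(rules)  # {('S', 'S'): 'lo', ('B', 'B'): 'hi'}
```

The grids ('S', 'B') and ('B', 'S') receive zero total weight from these two samples, so they produce no rule, matching the zero-weight case described in the text.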

Reduce the rule number

In order to judge the potential usefulness of a fuzzy association rule, we introduce the definition of support degree from data mining. The support of an itemset is the proportion of transactions in the data set that contain this itemset. The support degree of the association rule “A→B” in data mining is defined as:

$$ \sup(A \rightarrow B) = \frac{|\{T : A \cup B \subseteq T,\ T \in D\}|}{|D|} = \sup(A \cup B) \tag{4} $$

where D is the set of all transactions and T is a transaction that contains the items of both A and B.
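
As a worked example of the crisp support degree, the following sketch counts the transactions that contain every item of A∪B. The function name `support` and the four toy transactions are hypothetical and serve only to illustrate the definition.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

# Hypothetical transaction set D.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Support of the rule {bread} -> {milk} is the support of {bread, milk}.
print(support(D, {"bread", "milk"}))  # 0.5 (2 of the 4 transactions)
```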

We introduce fuzzy logic in this paper by using fuzzy membership function values to describe the quantitative variables in the database. The support degree of rule i (i∈{1, 2, 3, …, P}, where P is the total number of rules) can then be calculated as follows:

$$ \sup(X \Rightarrow y) = \frac{1}{N} \sum_{j=1}^{N} \left( m_{iy}(y_j) \prod_{k=1}^{n} m_{ix_k}(x_{jk}) \right) \tag{5} $$

where N is the number of samples, n is the dimension of the input attributes, $m_{iy}(y_j)$ is the membership function value of the conclusion fuzzy set in rule i, and $m_{ix_k}(x_{jk})$ is the membership function value of the kth condition fuzzy set in rule i.


We set a threshold τ according to the actual conditions. The rules whose support degree meets the threshold τ are strong fuzzy association rules. If we discard the rules with a support degree less than $10^{-2}$, 13 fuzzy association rules are kept, as shown in Fig.3. If we discard the rules with a support degree less than $10^{-1}$, 3 fuzzy association rules are kept:

IF the color intensity is moderate and the hue is little warm, THEN the wine alcohol is little high.

IF the color intensity is little dark and the hue is little warm, THEN the wine alcohol is little low.

IF the color intensity is little bright and the hue is little cold, THEN the wine alcohol is moderate.
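
The fuzzy support degree of Eq. (5) and the threshold filter can be sketched as follows. The sketch assumes triangular fuzzy sets given as (center, sigma) pairs; the names `fuzzy_support`, `LOW`, `HIGH`, the toy data and the two rules are hypothetical and are not taken from the wine database.

```python
from math import prod

def tri(x, center, sigma):
    """Triangular membership: 1 at `center`, 0 beyond center +/- sigma."""
    return max(0.0, (sigma - abs(x - center)) / sigma)

def fuzzy_support(data, antecedent, consequent):
    """Mean over all samples of the consequent membership times the
    product of the antecedent memberships.
    antecedent: list of (center, sigma), one per input attribute.
    consequent: (center, sigma) of the output fuzzy set."""
    total = 0.0
    for *xs, y in data:
        total += tri(y, *consequent) * prod(
            tri(x, c, s) for x, (c, s) in zip(xs, antecedent)
        )
    return total / len(data)

# Hypothetical toy data (x1, x2, y) and two hypothetical rules.
data = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.5, 0.5, 0.5), (0.0, 0.0, 1.0)]
LOW, HIGH = (0.0, 1.0), (1.0, 1.0)  # (center, sigma)
rules = {
    "IF x1=LOW and x2=LOW THEN y=LOW": ([LOW, LOW], LOW),
    "IF x1=HIGH and x2=HIGH THEN y=LOW": ([HIGH, HIGH], LOW),
}

tau = 0.1  # assumed threshold
supports = {name: fuzzy_support(data, a, c) for name, (a, c) in rules.items()}
strong = [name for name, s in supports.items() if s >= tau]
print(supports)  # 0.28125 for the first rule, 0.03125 for the second
print(strong)    # only the first rule meets the threshold
```

Raising or lowering `tau` trades rule-base size against coverage, which is the effect reported above when the threshold moves from $10^{-2}$ to $10^{-1}$.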

Fuzzy association rule validation

In order to validate the fuzzy association rules, a model with 13 fuzzy association rules is built to predict the wine alcohol. The rule model is shown in Fig.4.

The prediction model can be viewed as a multiple-input single-output mapping. Here the inputs are the physical attributes and the output is the corresponding wine alcohol. We use the following centroid defuzzification formula to determine the output y for a given input x:

$$ y = \frac{\sum_{i=1}^{P} y_c^i\, \omega^i}{\sum_{i=1}^{P} \omega^i} \tag{6} $$

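where P is the number of fuzzy rules in the fuzzy rule base, $\omega^i = \prod_{j=1}^{n} m^i(x_j)$ is the weight of the ith rule (the product of the membership values of the inputs in the condition fuzzy sets of that rule), n is the dimension of the input attributes, and $y_c^i$ is the center of the conclusion fuzzy set of the ith rule.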
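
The centroid defuzzification step can be sketched as below: each rule fires with a weight equal to the product of its input memberships, and the prediction is the weighted mean of the conclusion centers. The rule representation, the name `predict`, the two toy rules and the choice to return `None` when no rule fires are assumptions for illustration; the paper does not specify that last case.

```python
from math import prod

def tri(x, center, sigma):
    """Triangular membership: 1 at `center`, 0 beyond center +/- sigma."""
    return max(0.0, (sigma - abs(x - center)) / sigma)

def predict(x, rules):
    """x: tuple of input values.
    rules: list of (antecedent, y_center), where antecedent is a list of
    (center, sigma) pairs, one per input, and y_center is the center of
    the rule's conclusion fuzzy set."""
    num = den = 0.0
    for antecedent, y_center in rules:
        w = prod(tri(xj, c, s) for xj, (c, s) in zip(x, antecedent))
        num += y_center * w
        den += w
    if den == 0:
        return None  # assumption: no prediction when no rule fires
    return num / den

# Two hypothetical rules with conclusion centers 12.0 and 14.0.
rules = [
    ([(0.0, 1.0), (0.0, 1.0)], 12.0),
    ([(1.0, 1.0), (1.0, 1.0)], 14.0),
]
print(predict((0.0, 0.0), rules))  # 12.0, only the first rule fires
print(predict((0.5, 0.5), rules))  # 13.0, both rules fire equally
print(predict((3.0, 3.0), rules))  # None, outside every fuzzy set
```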

Table 1 lists the prediction results for the sample data pairs in the little high alcohol output region. The predicted values and the actual values are nearly the same: each alcohol error is less than 0.1, which is consistent with the fuzzy association rules.

Table 1 The predicting results

Fig.4 The fuzzy rules and its model

Conclusions

We use fuzzy sets to describe the quantitative sample data and generate a fuzzy association rule base by traversing the input partition grids. The definition of support degree is then employed to select strong fuzzy association rules from the original fuzzy rule base.

The algorithm has the following advantages:

1) Regularization. The quantitative data are converted into membership function values, which are normalized to the interval [0, 1]; this can improve the robustness of the system [6].

2) Completeness. All the training sample data go through all the grid regions, i.e. every rule in the rule base is elected by all the sample data, not just by the sample data in one grid region. The rule base is therefore complete in the sense that it covers all the sample information.

3) Interpretability. Each fuzzy set has a corresponding linguistic label that can be interpreted by experts. This makes it easy to compare rules and encourages knowledge discovery.


References

[1] R. Srikant and R. Agrawal, “Mining Quantitative Association Rules in Large Relational Tables,” in Proc. of the ACM SIGMOD Int’l Conf., Montreal, Canada, June 1996, pp. 1-12.

[2] Wai-Ho Au and Keith C.C. Chan, “FARM: A Data Mining System for Discovering Fuzzy Association Rules,” in Proc. of 1999 IEEE International Fuzzy Systems Conf., Seoul, Korea, August 1999, pp. 1217-1222.

[3] Miguel Delgado, Nicolás Marín, Daniel Sánchez and María-Amparo Vila, “Fuzzy Association Rules: General Model and Applications,” IEEE Trans. Fuzzy Syst., vol. 11, pp. 214-225, April 2003.

[4] J. Han and M. Kamber, “Data Mining: Concepts and Techniques,” San Francisco: Morgan Kaufmann Publishers, 2005.

[5] S. Guillaume, “Designing fuzzy inference systems from data: An interpretability-oriented review,” IEEE Trans. Fuzzy Syst., vol. 9, pp. 426–443, June 2001.

[6] Y. F. Wang, D. H. Wang and T. Y. Chai, “Extraction of Fuzzy Rules with Completeness and Robustness,” Acta Automatica Sinica, vol. 36, no. 9, September 2010. In Chinese.

[7] L. X. Wang and J. M. Mendel, “Generating fuzzy rules by learning from examples,” IEEE Trans. Systems, Man and Cybernetics, vol. 22, no. 6, pp. 1414–1427, November/December 1992.
