Mining for Patterns Based on Contingency Tables by KL-Miner
First Experience
Jan Rauch, Milan Šimůnek (PhD. student), Václav Lín (student)
University of Economics, Prague
FDM 2003 2
… KL-Miner, First Experience
KL-Miner Basic features
Application example
Implementation principles
Scalability
Concluding remarks
KL-Miner -- Data and Patterns
Data: data matrix M
M    A1   A2   …   AP
o1   2    12   …   1
o2   1    5    …   4
…    …    …    …   …
on   3    9    …   2
Patterns, i.e. KL-hypotheses: R ∼ C / γ
R … row attribute, R ∈ {A1, …, AP}, with possible values (categories) r1, …, rK
C … column attribute, C ∈ {A1, …, AP}, with possible values (categories) c1, …, cL
γ … Boolean attribute (condition) derived from the other attributes A1, …, AP
∼ … KL-quantifier, i.e. a condition imposed on the contingency table of R and C computed on the objects satisfying γ
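The contingency table that a KL-quantifier is evaluated on can be sketched directly from a data matrix. The following is a minimal illustration, not the KL-Miner implementation: the function name `contingency_table`, the dictionary-based rows, and the toy values are all assumptions made for this example. Categories are taken from the values observed under the condition and sorted, which is one possible way to order ordinal categories.

```python
from collections import Counter

def contingency_table(rows, r, c, condition=lambda row: True):
    """Count pairs (category of r, category of c) over the rows
    (objects) that satisfy the Boolean condition."""
    counts = Counter((row[r], row[c]) for row in rows if condition(row))
    r_cats = sorted({k for k, _ in counts})
    c_cats = sorted({l for _, l in counts})
    # table[k][l] = number of objects with R = r_cats[k] and C = c_cats[l]
    table = [[counts[(k, l)] for l in c_cats] for k in r_cats]
    return r_cats, c_cats, table

# Toy data matrix with invented values (hypothetical, not STULONG data).
data = [
    {"A1": 1, "A2": 1, "A3": "yes"},
    {"A1": 1, "A2": 2, "A3": "yes"},
    {"A1": 2, "A2": 2, "A3": "yes"},
    {"A1": 2, "A2": 2, "A3": "no"},
    {"A1": 3, "A2": 3, "A3": "yes"},
]

# Row attribute A1, column attribute A2, condition A3(yes).
r_cats, c_cats, table = contingency_table(
    data, "A1", "A2", condition=lambda row: row["A3"] == "yes"
)
print(table)
```

Here the fourth object is excluded by the condition, so the table counts only four objects.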
KL – quantifiers
Contingency table of R and C: entry nk,l is the number of objects with R = rk and C = cl
Examples of quantifiers:
Simple aggregate function:
Kendall’s quantifier: e.g. |τb| ≥ p
Kendall’s quantifier
|τb| ∈ ⟨0; 1⟩
τb > 0 … positive ordinal dependence
τb < 0 … negative ordinal dependence
τb = 0 … ordinal independence
|τb| = 1 … C is a function of R
Kendall’s quantifier: e.g. |τb| ≥ p or |τb| ≤ p
Kendall’s coefficient:
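The formula on this slide did not survive extraction. As a hedged sketch, the function below computes the standard Kendall tau-b for a K×L contingency table, τb = 2(P − Q) / sqrt((n² − Σ row totals²)(n² − Σ column totals²)), where P and Q count concordant and discordant pairs of objects. It is assumed, not confirmed by the slides, that KL-Miner uses this usual definition; the function name `kendall_tau_b` is illustrative.

```python
from math import sqrt

def kendall_tau_b(table):
    """Kendall's tau-b of a contingency table given as a list of rows,
    with rows and columns ordered by their ordinal categories."""
    K, L = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    P = Q = 0  # numbers of concordant and discordant pairs
    for k in range(K):
        for l in range(L):
            for i in range(k + 1, K):
                for j in range(L):
                    if j > l:
                        P += table[k][l] * table[i][j]
                    elif j < l:
                        Q += table[k][l] * table[i][j]
    row_sq = sum(sum(row) ** 2 for row in table)
    col_sq = sum(sum(table[k][l] for k in range(K)) ** 2 for l in range(L))
    denom = sqrt((n * n - row_sq) * (n * n - col_sq))
    if denom == 0:
        # All objects fall into a single row or column: tau-b is undefined.
        raise ValueError("tau-b is undefined for a degenerate table")
    return 2 * (P - Q) / denom

print(kendall_tau_b([[5, 0], [0, 5]]))  # perfectly increasing relation
print(kendall_tau_b([[0, 5], [5, 0]]))  # perfectly decreasing relation
print(kendall_tau_b([[1, 1], [1, 1]]))  # no ordinal dependence
```

A Kendall's quantifier such as |τb| ≥ p then amounts to comparing `abs(kendall_tau_b(table))` with the threshold p.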
KL-Miner application example
STULONG Project, 1419 patients, entry examination
See http://euromise.vse.cz
STULONG attributes examples (1)
Systolic blood pressure
Smoking
Group of patients
STULONG attributes examples (2)
Skinfold above musculus triceps (mm)
Beer – amount / day
219 attributes total
38 ordinal attributes
We use 17 ordinal attributes
Example – analytic question
Are there any ordinal dependencies among attributes under some conditions?
at least 50 patients
|τb| ≥ 0.75
relevant conditions:
Example – relevant condition specification (1)
Group of patients (normal), Group of patients (risk), …
Beer 10(yes), Beer 12(yes), …, Beer 10(yes) ∧ Beer 12(yes)
Sliding windows …
Example – relevant condition specification (2)
[Figure: the sequence of categories 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, ....., 43, 44, 45, 46, 47, 48, 49, 50, repeated in several rows to show successive positions of the sliding window]
Sliding window
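The idea of a sliding window over an ordered list of categories can be sketched as follows. The window length and the helper name `sliding_windows` are assumptions for illustration; the slides do not state which lengths were used.

```python
def sliding_windows(categories, length):
    """Return all runs of `length` consecutive categories,
    moving the window one category at a time."""
    return [
        categories[i:i + length]
        for i in range(len(categories) - length + 1)
    ]

# First few categories from the slide; a window length of 3 is hypothetical.
categories = [4, 5, 6, 7, 9, 10]
windows = sliding_windows(categories, 3)
print(windows)

# Each window defines a Boolean attribute: "value falls into the window".
in_first_window = [value in windows[0] for value in [4, 9, 6]]
print(in_first_window)
```

Each window can then serve as one candidate condition, alongside single categories and conjunctions.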
Example – output overview
2 min 1 sec
550 310 verifications
25 hypotheses
3.06 GHz
512 MB DDR SDRAM
Example – output detail (1)
τb = 0.82 (i.e. strong positive ordinal dependence)
Example – output detail (2)
τb = 0.78 (i.e. strong positive ordinal dependence)
Implementation principles (1)
     Attributes          Cards of categories of A1
M    A1   A2   …   AP    A1[1]   A1[2]   A1[3]
o1   2    12   …   1     0       1       0
o2   1    5    …   4     1       0       0
…    …    …    …   …     …       …       …
on   3    9    …   2     0       0       1
Attributes are represented by cards of categories, i.e. strings of bits
Implementation principles (2)
CARD[γ] = bit-string representation of the Boolean attribute γ
CARD[ Group of patients (normal) ∧ Beer 10(yes) ∧ Beer 12(yes) ]
= Group of patients [normal] ∧ Beer 10[yes] ∧ Beer 12[yes]
Count() – number of “1” bits in the bit string
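A minimal sketch of the bit-string idea, using Python integers as bit strings (bit i stands for object i). The helper names `card` and `count` and the toy attribute values are assumptions for illustration only; the actual KL-Miner data structures are not described in the slides.

```python
def card(values, category):
    """Bit string (stored as an int) with bit i set
    iff object i has the given category."""
    bits = 0
    for i, value in enumerate(values):
        if value == category:
            bits |= 1 << i
    return bits

def count(bits):
    """Number of "1" bits in the bit string."""
    return bin(bits).count("1")

# Invented values for five objects (hypothetical, not STULONG data).
group = ["normal", "risk", "normal", "normal", "risk"]
beer10 = ["yes", "yes", "no", "yes", "yes"]
beer12 = ["yes", "no", "no", "yes", "yes"]

# The card of a conjunction is the bitwise AND of the cards of its parts.
cond = card(group, "normal") & card(beer10, "yes") & card(beer12, "yes")
print(bin(cond), count(cond))
```

Only objects 0 and 3 satisfy all three basic conditions, so two bits remain set after the AND.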
Implementation principles (3)
n1,1 = Count( R[r1] ∧ C[c1] ∧ CARD[γ] )
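The same step extends to every cell of the contingency table: each frequency is the number of "1" bits in the AND of one row card, one column card, and the card of the condition. The sketch below assumes Python integers as bit strings; `cards_of` and `table_from_cards` are hypothetical helper names, and the data are invented.

```python
def cards_of(values):
    """Map each category to its card: an int whose bit i is set
    iff object i has that category."""
    cards = {}
    for i, value in enumerate(values):
        cards[value] = cards.get(value, 0) | (1 << i)
    return cards

def table_from_cards(r_cards, c_cards, cond_card):
    """Contingency table whose cell (k, l) is
    Count(R[r_k] AND C[c_l] AND condition card)."""
    return [
        [bin(r_cards[r] & c_cards[c] & cond_card).count("1")
         for c in sorted(c_cards)]
        for r in sorted(r_cards)
    ]

# Invented values of R and C for five objects (object 0 is the lowest bit).
R = [1, 1, 2, 2, 3]
C = [1, 2, 2, 2, 3]
cond_card = 0b10111  # the condition holds for objects 0, 1, 2 and 4

table = table_from_cards(cards_of(R), cards_of(C), cond_card)
print(table)
```

Because the cards of R and C are built once, verifying a new condition only requires new AND and count operations, not another pass over the data matrix.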
Scalability
75 000 verifications
approximately linear
Concluding remarks
KL-Miner gives practically interesting results
Suitable for interactive work
Further quantifiers
Combinations with further mining procedures