Mining for Patterns Based on Contingency Tables by KL-Miner: First Experience
Jan Rauch, Milan Šimůnek (PhD student), Václav Lín (student), University of Economics, Prague

Mining for Patterns Based on Contingency Tables by KL-Miner

First Experience

Jan Rauch, Milan Šimůnek (PhD student)

Václav Lín (student), University of Economics, Prague

FDM 2003 2

… KL-Miner, First Experience

KL-Miner: basic features

Application example

Implementation principles

Scalability

Concluding remarks

KL-Miner: Data and Patterns

Data: data matrix M

M    A1   A2   …   AP
o1   2    12   …   1
o2   1    5    …   4
…    …    …    …   …
on   3    9    …   2

Patterns, i.e. KL-hypotheses: $R \sim C / \gamma$

row attribute $R \in \{A_1, \dots, A_P\}$, possible values (categories) $r_1, \dots, r_K$

column attribute $C \in \{A_1, \dots, A_P\}$, possible values (categories) $c_1, \dots, c_L$

$\gamma$ … Boolean attribute (condition) derived from the other attributes $A_1, \dots, A_P$

$\sim$ … KL-quantifier, a condition imposed on the contingency table of R and C
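The definition above can be made concrete with a minimal Python sketch of what verifying one KL-hypothesis $R \sim C / \gamma$ means: count the contingency table of R and C over the objects that satisfy γ, then let the KL-quantifier accept or reject that table. The function names (`contingency_table`, `kl_hypothesis_holds`), the row-as-dict data layout and the toy data are hypothetical illustrations, not KL-Miner's interface.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, int]       # hypothetical layout: one object o_i, attribute name -> category
Table = List[List[int]]    # n[k][l] for row category r_k and column category c_l


def contingency_table(data: Sequence[Row], r: str, c: str,
                      r_cats: Sequence[int], c_cats: Sequence[int],
                      gamma: Callable[[Row], bool]) -> Table:
    """n[k][l] = number of objects satisfying gamma with R = r_k and C = c_l."""
    r_index = {v: k for k, v in enumerate(r_cats)}
    c_index = {v: l for l, v in enumerate(c_cats)}
    n = [[0] * len(c_cats) for _ in r_cats]
    for row in data:
        if gamma(row):  # only objects satisfying the condition gamma are counted
            n[r_index[row[r]]][c_index[row[c]]] += 1
    return n


def kl_hypothesis_holds(data: Sequence[Row], r: str, c: str,
                        r_cats: Sequence[int], c_cats: Sequence[int],
                        gamma: Callable[[Row], bool],
                        quantifier: Callable[[Table], bool]) -> bool:
    """R ~ C / gamma is true iff the KL-quantifier accepts the contingency table."""
    return quantifier(contingency_table(data, r, c, r_cats, c_cats, gamma))


# Toy data matrix with three objects; the condition A3 == 1 plays the role of gamma.
data = [{"A1": 2, "A2": 1, "A3": 1},
        {"A1": 1, "A2": 1, "A3": 0},
        {"A1": 2, "A2": 2, "A3": 1}]
table = contingency_table(data, "A1", "A2", [1, 2], [1, 2], lambda o: o["A3"] == 1)
print(table)  # [[0, 0], [1, 1]]
```

Any predicate on the table can serve as the quantifier here, for example `lambda n: sum(map(sum, n)) >= 2`, which the toy table satisfies.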

KL-quantifiers

Contingency table of R and C (objects satisfying $\gamma$):

        c1      …   cL
r1      n1,1    …   n1,L
…       …       …   …
rK      nK,1    …   nK,L

Examples of quantifiers:

Simple aggregate function:

Kendall's quantifier: e.g. $|\tau_b| \ge p$

Kendall’s quantifier

$\tau_b \in \langle -1; 1 \rangle$

$\tau_b > 0$ … positive ordinal dependence

$\tau_b < 0$ … negative ordinal dependence

$\tau_b = 0$ … ordinal independence

$|\tau_b| = 1$ … C is a function of R

Kendall's quantifier: e.g. $|\tau_b| \ge p$ or $|\tau_b| \le p$

$\tau_b$: Kendall's coefficient
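The slide's formula for Kendall's coefficient did not survive extraction. For reference, the standard definition of Kendall's $\tau_b$ for a $K \times L$ contingency table with ordered categories is given below in terms of the frequencies $n_{k,l}$; the slide may have used different but equivalent notation.

```latex
\tau_b = \frac{2(P - Q)}{\sqrt{\bigl(n^2 - \sum_{k} n_{k,*}^2\bigr)\bigl(n^2 - \sum_{l} n_{*,l}^2\bigr)}},
\qquad
P = \sum_{k,l} n_{k,l} \sum_{i>k}\sum_{j>l} n_{i,j},
\qquad
Q = \sum_{k,l} n_{k,l} \sum_{i>k}\sum_{j<l} n_{i,j}
```

Here $n_{k,*} = \sum_l n_{k,l}$, $n_{*,l} = \sum_k n_{k,l}$ and $n = \sum_k \sum_l n_{k,l}$. P counts concordant pairs of objects, Q counts discordant pairs, and pairs tied in R or in C contribute to neither.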

KL-Miner application example: STULONG project, 1419 patients, entry examination

See http://euromise.vse.cz

STULONG attribute examples (1)

Systolic blood pressure

Smoking

Group of patients

STULONG attribute examples (2)

Skinfold above musculus triceps (mm)

Beer: amount per day

219 attributes in total

38 ordinal attributes

We use 17 ordinal attributes

Example: analytic question. Are there any ordinal dependencies among attributes under some conditions?

at least 50 patients

$|\tau_b| \ge 0.75$

relevant conditions:
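The quantifier in this analytic question combines a simple aggregate condition (at least 50 patients, read here as: the sum of all $n_{k,l}$ is at least 50) with Kendall's quantifier $|\tau_b| \ge 0.75$. A minimal Python sketch of such a combined quantifier follows, assuming the contingency table is a list of rows with categories in their natural order. `kendall_tau_b` and `example_quantifier` are hypothetical names, and the computation follows the standard $\tau_b$ formula rather than KL-Miner's own code.

```python
import math
from typing import List

Table = List[List[int]]  # n[k][l], rows r_1..r_K and columns c_1..c_L in their natural order


def kendall_tau_b(n: Table) -> float:
    """Kendall's tau-b of a K x L contingency table; NaN if a margin is degenerate."""
    K, L = len(n), len(n[0])
    total = sum(map(sum, n))
    row_sums = [sum(row) for row in n]
    col_sums = [sum(n[k][l] for k in range(K)) for l in range(L)]
    concordant = discordant = 0
    for k in range(K):
        for l in range(L):
            for i in range(k + 1, K):  # objects in a strictly higher row category
                for j in range(L):
                    if j > l:
                        concordant += n[k][l] * n[i][j]
                    elif j < l:
                        discordant += n[k][l] * n[i][j]
    denom = math.sqrt((total ** 2 - sum(x * x for x in row_sums)) *
                      (total ** 2 - sum(x * x for x in col_sums)))
    if denom == 0:
        return math.nan  # all objects share one row category or one column category
    return 2 * (concordant - discordant) / denom


def example_quantifier(n: Table, min_patients: int = 50, p: float = 0.75) -> bool:
    """At least `min_patients` objects in the table and |tau_b| >= p (NaN never passes)."""
    return sum(map(sum, n)) >= min_patients and abs(kendall_tau_b(n)) >= p


print(kendall_tau_b([[30, 0], [0, 30]]))         # 1.0
print(example_quantifier([[15, 15], [15, 15]]))  # False
```

The quadruple loop is the direct transcription of P and Q; for the small tables of ordinal attributes it is cheap, and it keeps the correspondence with the formula easy to check.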

Example – relevant condition specification (1)

Group of patients (normal), Group of patients (risk), …

Beer 10(yes), Beer 12(yes), …, Beer 10(yes) ∧ Beer 12(yes)

Sliding windows …
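The condition specification above combines literals such as Group of patients (normal) with conjunctions of beer literals such as Beer 10(yes) ∧ Beer 12(yes). A minimal sketch of how such a set of conditions could be enumerated follows, assuming (as in the CARD example on the implementation slides) that a condition conjoins one group literal with one non-empty conjunction of beer literals. The function names and the length limit `max_beer_len` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Condition = Tuple[str, ...]  # a conjunction of literals, e.g. ("Beer 10(yes)", "Beer 12(yes)")


def conjunctions(literals: Sequence[str], max_len: int) -> List[Condition]:
    """All conjunctions of 1..max_len distinct literals, shortest first."""
    result: List[Condition] = []
    for size in range(1, max_len + 1):
        result.extend(combinations(literals, size))
    return result


def relevant_conditions(groups: Sequence[str], beers: Sequence[str],
                        max_beer_len: int) -> List[Condition]:
    """One group literal conjoined with one non-empty conjunction of beer literals."""
    return [(g,) + b for g in groups for b in conjunctions(beers, max_beer_len)]


groups = ["Group of patients (normal)", "Group of patients (risk)"]
beers = ["Beer 10(yes)", "Beer 12(yes)"]
conditions = relevant_conditions(groups, beers, 2)
print(len(conditions))  # 6
```

With two group literals and three beer conjunctions this yields 2 × 3 = 6 conditions, including Group of patients (normal) ∧ Beer 10(yes) ∧ Beer 12(yes).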

Example – relevant condition specification (2)

[Figure: sliding window over the ordered categories 4, 5, 6, …, 50]
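A sliding window turns an ordinal attribute into a family of Boolean conditions, each true for the objects whose value falls into a run of consecutive categories. The sketch below assumes the categories 4, …, 50 from the figure and a window length of 5; the length and the helper names `sliding_windows` and `window_condition` are illustrative assumptions, since the slide does not give them.

```python
from typing import Callable, List, Sequence


def sliding_windows(categories: Sequence[int], length: int) -> List[List[int]]:
    """Every run of `length` consecutive categories, shifting by one category."""
    return [list(categories[i:i + length])
            for i in range(len(categories) - length + 1)]


def window_condition(window: Sequence[int]) -> Callable[[int], bool]:
    """Boolean attribute A(window): true iff the attribute value lies in the window."""
    allowed = frozenset(window)
    return lambda value: value in allowed


categories = list(range(4, 51))  # assumed ordered categories 4, 5, ..., 50
windows = sliding_windows(categories, 5)
in_first = window_condition(windows[0])
print(len(windows), windows[0])  # 43 [4, 5, 6, 7, 8]
```

Shifting by one category gives 47 − 5 + 1 = 43 windows; a longer window or a larger step would trade finer coverage for fewer conditions to verify.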

Example – output overview

Run time 2 min 1 s

550 310 verifications

25 hypotheses

3.06 GHz processor

512 MB DDR SDRAM
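From the figures above, the average cost of one verification on this machine works out as follows (simple arithmetic on the reported numbers, not a separate measurement):

```latex
2\,\text{min}\;1\,\text{s} = 121\,\text{s},
\qquad
\frac{550\,310}{121\,\text{s}} \approx 4548\ \text{verifications/s},
\qquad
\frac{121\,\text{s}}{550\,310} \approx 0.22\,\text{ms per verification}
```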

Example – output detail (1)

$\tau_b = 0.82$ (i.e. strong positive ordinal dependence)

Example – output detail (2)

$\tau_b = 0.78$ (i.e. strong positive ordinal dependence)

Implementation principles (1)

Attributes A1, …, AP and cards of categories of A1:

M    A1   A2   …   AP   A1[1]   A1[2]   A1[3]
o1   2    12   …   1    0       1       0
o2   1    5    …   4    1       0       0
…    …    …    …   …    …       …       …
on   3    9    …   2    0       0       1

Attributes are represented by cards of categories, i.e. strings of bits.
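A card of a category can be held as an arbitrary-length bit string. The Python sketch below uses plain `int` values as bit strings, with bit i standing for the (i+1)-th object; this representation, the helper names and the three-row toy column (the rows o1, o2, on of the table above) are assumptions for illustration, not KL-Miner's actual data structures.

```python
from typing import Dict, Sequence


def cards_of_categories(column: Sequence[int], categories: Sequence[int]) -> Dict[int, int]:
    """Card A[v] for each category v: bit i is 1 iff object i has value v."""
    cards = {v: 0 for v in categories}
    for i, value in enumerate(column):
        cards[value] |= 1 << i  # raises KeyError if value is not a listed category
    return cards


def as_column(card: int, n_rows: int) -> str:
    """Bits listed in row order (first object first), as in the table above."""
    return "".join("1" if (card >> i) & 1 else "0" for i in range(n_rows))


a1 = [2, 1, 3]  # values of A1 for the rows o1, o2, on shown above
cards = cards_of_categories(a1, [1, 2, 3])
print({v: as_column(card, len(a1)) for v, card in cards.items()})
# {1: '010', 2: '100', 3: '001'}
```

The printed columns match A1[1], A1[2] and A1[3] in the table: exactly one card has a 1 in each row, because every object has exactly one category of A1.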

Implementation principles (2)

$\mathrm{CARD}[\gamma]$ = bit-string representation of the Boolean attribute $\gamma$

CARD[Group of patients (normal) ∧ Beer 10(yes) ∧ Beer 12(yes)]

= Group of patients[normal] ∧ Beer 10[yes] ∧ Beer 12[yes] (bit-wise conjunction of the cards)

Count(·): number of 1s in the bit string
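With cards stored as bit strings, the card of a conjunction is the bitwise AND of the cards of its conjuncts, and Count is a population count. A minimal Python sketch, again assuming `int` bit strings; the eight-object cards below are made-up values, not STULONG data.

```python
from functools import reduce
from operator import and_
from typing import Sequence


def card_of_conjunction(cards: Sequence[int]) -> int:
    """CARD of a conjunction: bitwise AND of the conjuncts' cards (needs at least one card)."""
    return reduce(and_, cards)


def count(card: int) -> int:
    """Count(): number of 1 bits in a (non-negative) card."""
    return bin(card).count("1")


# Hypothetical cards over 8 objects (bit i = object i).
group_normal = 0b11110000
beer_10_yes = 0b10110110
beer_12_yes = 0b10011110
gamma_card = card_of_conjunction([group_normal, beer_10_yes, beer_12_yes])
print(bin(gamma_card), count(gamma_card))  # 0b10010000 2
```

On Python 3.10 and later, `int.bit_count()` gives the same result as `bin(card).count("1")` without building a string.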

Implementation principles (3)

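$n_{1,1} = \mathrm{Count}(R[r_1] \wedge C[c_1] \wedge \mathrm{CARD}[\gamma])$, and in general $n_{k,l} = \mathrm{Count}(R[r_k] \wedge C[c_l] \wedge \mathrm{CARD}[\gamma])$ for $k = 1, \dots, K$ and $l = 1, \dots, L$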
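Applying this formula to every pair $(r_k, c_l)$ yields the whole contingency table from the K + L category cards and one card of γ. The sketch below assumes `int` bit strings and hoists the AND of $R[r_k]$ with $\mathrm{CARD}[\gamma]$ out of the inner loop; that hoisting is our own small optimization, not something the slide states about KL-Miner, and the helper names are hypothetical.

```python
from typing import List, Sequence


def count(card: int) -> int:
    """Count(): number of 1 bits in a (non-negative) card."""
    return bin(card).count("1")


def contingency_from_cards(r_cards: Sequence[int], c_cards: Sequence[int],
                           gamma_card: int) -> List[List[int]]:
    """n[k][l] = Count(R[r_k] AND C[c_l] AND CARD[gamma])."""
    table = []
    for r_card in r_cards:
        r_and_gamma = r_card & gamma_card  # computed once per row category
        table.append([count(r_and_gamma & c_card) for c_card in c_cards])
    return table


# Four hypothetical objects (bit i = object i): R = 1, 1, 2, 2 and C = 1, 2, 2, 2;
# gamma holds for objects 0, 1 and 3.
r_cards = [0b0011, 0b1100]
c_cards = [0b0001, 0b1110]
print(contingency_from_cards(r_cards, c_cards, 0b1011))  # [[1, 1], [0, 1]]
```

Object 2 is excluded by γ, so it does not appear in the table; with γ true for all objects the last cell would be 2 instead of 1.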

Scalability

75 000 verifications

approximately linear
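A rough cost model helps explain why near-linear behaviour is plausible under the bit-string scheme of the implementation slides. With n objects and machine words of w bits, each of the K·L table cells needs bitwise ANDs and one population count over ⌈n/w⌉ words, after which evaluating the quantifier depends only on K and L. This is a back-of-the-envelope estimate under those assumptions, not a measurement from the slides, and the slide does not say which quantity the run time was plotted against.

```latex
T_{\text{verification}} \approx c_1 \, K L \left\lceil \frac{n}{w} \right\rceil + c_2 \, f(K, L)
```

Here $c_1, c_2$ are machine-dependent constants and $f(K, L)$ is the cost of evaluating the quantifier on the $K \times L$ table (for example $O(K^2 L^2)$ for a direct evaluation of $\tau_b$). For fixed K and L the first term grows linearly in n.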

Concluding remarks

KL-Miner yields practically interesting results

Suitable for interactive work

Further quantifiers

Combinations with further mining procedures