Extracting Decisional Correlation Rules


Transcript of Extracting Decisional Correlation Rules

Page 1: Extracting Decisional Correlation Rules

Extracting Decisional Correlation Rules

Alain Casali

Christian Ernst

Page 2: Extracting Decisional Correlation Rules

Dexa'09 - Extracting Decision Correlation Rules

Industrial Problem

Given a supply chain (in microelectronics), we want to find links between the values of some parameters and the values of a specific attribute of the supply chain (the yield).

The use of positive (and/or negative) association rules is not suitable in our context.

We use correlation tests because:
it is a statistically more significant measure;
the measure takes into account not only the presence but also the absence of the items;
the measure is non-directional, and can thus highlight more complex links than a "simple" implication.

Page 3: Extracting Decisional Correlation Rules


Outline

Preliminaries
Decision Correlation Rules
Contingency Vectors
LHS-χ2 algorithm
Experimental Analysis
Conclusion

Page 4: Extracting Decisional Correlation Rules


Literal Set

A literal set $X\overline{Y}$ is composed of:
a positive part ($X$);
a negative part ($\overline{Y}$).

The variation of a literal set $X\overline{Y}$ encompasses all the combinations that can be obtained from it by making each item positive or negative. Ex: $Var(AB) = \{AB, A\overline{B}, \overline{A}B, \overline{A}\,\overline{B}\}$

The support of a literal set is the number of transactions that contain its positive part and contain no 1-item of its negative part.
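As an illustration, the variation and the support can be sketched in Python on the ten transactions of the running example used in these slides. The names `variation` and `support`, and the encoding of a literal set as a pair (positive part, negative part), are assumptions made for this sketch only.

```python
from itertools import product

# Transactions of the running example (Tid -> items); the target attribute is left out.
TRANSACTIONS = {
    1: {"B", "C", "F"}, 2: {"B", "C", "F"}, 3: {"B", "C", "E"}, 4: {"F"},
    5: {"B", "D", "F"}, 6: {"B", "F"}, 7: {"B", "C", "F"}, 8: {"A", "E"},
    9: {"B", "C", "F"}, 10: {"B", "F"},
}

def variation(pattern):
    """Every literal set obtained by making each item positive or negative."""
    items = sorted(pattern)
    result = []
    for signs in product((True, False), repeat=len(items)):
        positive = frozenset(i for i, s in zip(items, signs) if s)
        negative = frozenset(i for i, s in zip(items, signs) if not s)
        result.append((positive, negative))
    return result

def support(positive, negative, transactions=TRANSACTIONS):
    """Transactions containing all of `positive` and no item of `negative`."""
    return sum(
        1 for items in transactions.values()
        if positive <= items and not (negative & items)
    )

for positive, negative in variation("BF"):
    print(sorted(positive), sorted(negative), support(positive, negative))
```

The supports of the four literal sets of a variation always add up to the number of transactions, since every transaction matches exactly one of them.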

Page 5: Extracting Decisional Correlation Rules


Correlation rule and χ2 (1)

Contingency table

Expected Value

Tid   Item     Target
1     B C F    T1
2     B C F    T1
3     B C E    T1
4     F        T1
5     B D F    T2
6     B F
7     B C F
8     A E
9     B C F
10    B F

Each cell of the contingency table (CT) of a pattern X contains the support of one literal set $Y\overline{Z}$ of its variation:

CT(BF)            $B$   $\overline{B}$   ∑ line
$F$               7     1                8
$\overline{F}$    1     1                2
∑ column          8     2                10
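The contingency table of BF can be recomputed from the transactions with a few lines of Python. This is only a sketch: the function name `contingency_table` and the encoding of a cell as a pair of booleans are assumptions.

```python
from collections import Counter

# Transactions of the running example (Tid -> items).
TRANSACTIONS = {
    1: {"B", "C", "F"}, 2: {"B", "C", "F"}, 3: {"B", "C", "E"}, 4: {"F"},
    5: {"B", "D", "F"}, 6: {"B", "F"}, 7: {"B", "C", "F"}, 8: {"A", "E"},
    9: {"B", "C", "F"}, 10: {"B", "F"},
}

def contingency_table(x, y, transactions=TRANSACTIONS):
    """Count the transactions by (x present?, y present?)."""
    return Counter((x in items, y in items) for items in transactions.values())

ct = contingency_table("B", "F")
line_f = ct[(True, True)] + ct[(False, True)]    # line total for F
column_b = ct[(True, True)] + ct[(True, False)]  # column total for B
print(ct, line_f, column_b)
```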

Page 6: Extracting Decisional Correlation Rules

Correlation rule and χ2 (2)

Computation of χ2 (Brin'97)

Makes the link between real support and theoretical support (expected value):

$$\chi^2(X) = \sum_{Y\overline{Z} \in CT(X)} \frac{\left(Supp(Y\overline{Z}) - E(Y\overline{Z})\right)^2}{E(Y\overline{Z})}$$

⇒ χ2(BF) ≈ 1.67

Correlation rate

Use of a table giving the centile values of the χ2 distribution with a single degree of freedom (existence of a bijection).

Correlation(BF) ≈ 85%
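The χ2 sum over the cells of a 2×2 contingency table can be sketched as follows. The expected value is not spelled out in the text; the sketch assumes the usual independence estimate (line total × column total / number of transactions), so its results need not match the figures quoted above. The function name `chi2_2x2` is hypothetical.

```python
def chi2_2x2(both, only_x, only_y, neither):
    """Chi-squared of a 2x2 contingency table.

    Assumption: E(cell) = line total * column total / n (independence estimate).
    """
    n = both + only_x + only_y + neither
    x_total = both + only_x
    y_total = both + only_y
    cells = [
        (both, x_total, y_total),
        (only_x, x_total, n - y_total),
        (only_y, n - x_total, y_total),
        (neither, n - x_total, n - y_total),
    ]
    total = 0.0
    for observed, line, column in cells:
        expected = line * column / n
        total += (observed - expected) ** 2 / expected
    return total

# Cells of CT(BF): BF, B without F, F without B, neither.
print(chi2_2x2(7, 1, 1, 1))
```

The function divides by the expected value of each cell, so it only applies to tables in which every item is both present and absent at least once.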

Page 7: Extracting Decisional Correlation Rules


Related Constraints

Anti-monotone constraint (Cochran criteria):
no cell of the CT may have a null value;
at least p% of the CT's cells must have a support greater than or equal to MinSup.

Monotone constraint:
X symbolizes a valid correlation rule: χ2(X) ≥ MinCor
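Both constraints translate into short predicates. The sketch below assumes that `cells` holds the supports of all the cells of a contingency table and that `p` is given as a fraction between 0 and 1; the function names are hypothetical.

```python
def satisfies_cochran(cells, min_sup, p):
    """Anti-monotone test: no null cell, and at least a fraction p of the
    cells with a support greater than or equal to min_sup."""
    if any(c == 0 for c in cells):
        return False
    frequent = sum(1 for c in cells if c >= min_sup)
    return frequent >= p * len(cells)

def is_valid_rule(chi2_value, min_cor):
    """Monotone test: the chi-squared value reaches the MinCor threshold."""
    return chi2_value >= min_cor

# Cells of CT(BF) from the running example.
print(satisfies_cochran([7, 1, 1, 1], min_sup=1, p=1.0))
print(satisfies_cochran([7, 1, 1, 1], min_sup=2, p=0.5))
```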

Page 8: Extracting Decisional Correlation Rules


Browsing the search space

Use of levelwise algorithms to browse the search space.

Levelwise algorithms are suitable when:
the relation is stored on disk;
we have anti-monotone constraints.

Problem: memory requirement for the contingency tables:

$$O\left(\sum_{i=1}^{n} \binom{n}{i} \cdot 2^{i}\right)$$

Example with |I| = 1000:

Level   Memory requirement
2       4 MB
3       2.5 GB
4       1.3 TB
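The order of magnitude of these figures can be checked with a short computation. The sketch assumes that level i stores one contingency table of 2^i cells per i-itemset, and that each cell takes 2 bytes; the cell size is a guess that gives sizes of the same order as the table, not a value stated in the source.

```python
from math import comb

BYTES_PER_CELL = 2  # assumption: not stated in the source

def ct_cells(n_items, level):
    """Number of contingency-table cells needed at a given level."""
    return comb(n_items, level) * 2 ** level

for level in (2, 3, 4):
    size = ct_cells(1000, level) * BYTES_PER_CELL
    print(f"level {level}: {size:,} bytes")
```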

Page 9: Extracting Decisional Correlation Rules

DEXA - Sept. 2006 9

Lectic Order & Lectic Search (LS)

Goal: enumerate the combinations (powerset lattice) with a balanced tree.

Start point: 2 vectors; the 1st one is empty, the 2nd one contains the list of the items.

Create 2 branches:
left: prune the last element of the 2nd vector (recursive call);
right: move the last element of the 2nd vector to the 1st one (recursive call).

Stop: when the 2nd vector is empty, output the 1st vector.

(,ABC)
    left:  (,AB)
        left:  (,A)
            left:  (,)
            right: (A,)
        right: (B,A)
            left:  (B,)
            right: (AB,)
    right: (C,AB)
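The enumeration can be sketched as a short recursive function. For brevity the sketch assumes that each item is a single character and that both vectors are plain strings; the name `lectic_search` is hypothetical.

```python
def lectic_search(items):
    """Enumerate every subset of `items` in the order of the tree above."""
    output = []

    def search(first, second):
        if not second:                # stop: the 2nd vector is empty
            output.append(first)
            return
        last, rest = second[-1], second[:-1]
        search(first, rest)           # left: prune the last element
        search(last + first, rest)    # right: move it to the 1st vector

    search("", "".join(items))
    return output

print(lectic_search("ABC"))
# ['', 'A', 'B', 'AB', 'C', 'AC', 'BC', 'ABC']
```

Each pattern is produced exactly once, and a pattern is always output after all of its subsets.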

Page 10: Extracting Decisional Correlation Rules


Outline

Preliminaries
Decision Correlation Rules
Contingency Vectors
LHS-χ2 algorithm
Experimental Analysis
Conclusion

Page 11: Extracting Decisional Correlation Rules


Decision Correlation Rules

We are interested in rules satisfying both constraints:
χ2(X) ≥ MinCor;
X contains 1 value of the target attribute.

Problem: there is no function f such that

χ2(X ∪ A) = f(χ2(X), supp(A))

Page 12: Extracting Decisional Correlation Rules


Outline

Preliminaries
Decision Correlation Rules
Contingency Vectors
LHS-χ2 algorithm
Experimental Analysis
Conclusion

Page 13: Extracting Decisional Correlation Rules


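Contingency Vector (1)

Equivalence class associated with a literal set:

$$[Y\overline{Z}] = \{\, i \in Tid(r) \mid Y \subseteq Tid(i) \text{ and } Z \cap Tid(i) = \emptyset \,\}$$

Ex: $[B\overline{F}] = \{3\}$

Contingency vector of a pattern X: set of the equivalence classes of the variation of X.

Ex: $CV(BF) = \{[\overline{B}\,\overline{F}], [\overline{B}F], [B\overline{F}], [BF]\} = \{\{8\}, \{4\}, \{3\}, \{1,2,5,6,7,9,10\}\}$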

Tid   Item     Target
1     B C F    T1
2     B C F    T1
3     B C E    T1
4     F        T1
5     B D F    T2
6     B F
7     B C F
8     A E
9     B C F
10    B F
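The equivalence classes and the contingency vector of BF can be recomputed directly from the definition. This is a sketch in which items are single characters; the names `equivalence_class` and `contingency_vector` are assumptions.

```python
# Transactions of the running example (Tid -> items).
TRANSACTIONS = {
    1: {"B", "C", "F"}, 2: {"B", "C", "F"}, 3: {"B", "C", "E"}, 4: {"F"},
    5: {"B", "D", "F"}, 6: {"B", "F"}, 7: {"B", "C", "F"}, 8: {"A", "E"},
    9: {"B", "C", "F"}, 10: {"B", "F"},
}

def equivalence_class(positive, negative, transactions=TRANSACTIONS):
    """Tids whose transaction contains `positive` and no item of `negative`."""
    return {
        tid for tid, items in transactions.items()
        if set(positive) <= items and not (set(negative) & items)
    }

def contingency_vector(x, y):
    """Equivalence classes of the variation of the 2-item pattern xy."""
    return [
        equivalence_class("", x + y),   # neither x nor y
        equivalence_class(y, x),        # y without x
        equivalence_class(x, y),        # x without y
        equivalence_class(x + y, ""),   # both x and y
    ]

print(contingency_vector("B", "F"))
```

Since every transaction falls into exactly one class, the classes form a partition of the Tids.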

Page 14: Extracting Decisional Correlation Rules


Contingency Vector (2)

The contingency vector is a partition of the Tids.

Recurrence relation:

$$CV(X \cup A) = (CV(X) \cap [A]) \cup (CV(X) \cap [\overline{A}])$$

In practice (additions in binary logic):

Tid                       1    2    3    4    5    6    7    8    9    10
CV(B)                     1    1    1    0    1    1    1    0    1    1
CV(F)                     1    1    0    1    1    1    1    0    1    1
CV(B) + CV(F) = CV(BF)    11   11   10   01   11   11   11   00   11   11
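One way to read the "addition in binary logic" is as a shift-and-or on a per-Tid code: the code of X is shifted by one bit and the presence bit of the new item is appended. The sketch below follows that reading; the name `extend` is hypothetical.

```python
# Presence vectors of the 1-items B and F over Tids 1..10.
CV_B = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
CV_F = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]

def extend(cv_x, cv_a):
    """Contingency vector of X ∪ A from the vector of X and the bits of A."""
    return [(code << 1) | bit for code, bit in zip(cv_x, cv_a)]

cv_bf = extend(CV_B, CV_F)
print([format(code, "02b") for code in cv_bf])
# ['11', '11', '10', '01', '11', '11', '11', '00', '11', '11']
```

Each extension costs one pass over the Tids, and no access to the original relation is needed.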

Page 15: Extracting Decisional Correlation Rules


Contingency Vector (3)

Computation of the contingency table

Tid                       1    2    3    4    5    6    7    8    9    10
CV(B) + CV(F) = CV(BF)    11   11   10   01   11   11   11   00   11   11

«Distribution»   $\overline{B}\,\overline{F}$   $\overline{B}F$   $B\overline{F}$   $BF$
CT(BF)           1                              1                 1                 7
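Counting how often each code occurs in the contingency vector yields the cells of the contingency table. A minimal sketch, with the codes kept as bit strings and a hypothetical function name:

```python
from collections import Counter

# Contingency vector of BF over Tids 1..10, one 2-bit code per Tid.
CV_BF = ["11", "11", "10", "01", "11", "11", "11", "00", "11", "11"]

def table_from_vector(cv, k):
    """Support of each of the 2**k cells, keyed by its bit code."""
    counts = Counter(cv)
    codes = [format(c, f"0{k}b") for c in range(2 ** k)]
    return {code: counts[code] for code in codes}

print(table_from_vector(CV_BF, 2))
# {'00': 1, '01': 1, '10': 1, '11': 7}
```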

Page 16: Extracting Decisional Correlation Rules


Outline

Preliminaries
Decision Correlation Rules
Contingency Vectors
LHS-χ2 algorithm
Experimental Analysis
Conclusion

Page 17: Extracting Decisional Correlation Rules


LHS-χ2 Algorithm

Modification of LS in order to include the contingency vectors.

If we are on a node:
Call to the left branch: we do nothing;
Before calling the right branch:
    Computation of the new contingency vector;
    Test of the anti-monotone constraints;
    [Add current pattern to the positive border]
    Test of the monotone constraints;
    Computation of the χ2.

If all tests are OK, then output the pattern and its χ2.
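The steps above can be assembled into a rough Python sketch. It is not the authors' implementation and rests on several assumptions: a failed anti-monotone test prunes the right branch; a rule must contain exactly one target value plus at least one other item; the expected value is the usual independence estimate; the positive border is not maintained. All names are hypothetical.

```python
from collections import Counter

# Running example: items and target value of each transaction (Tids 1..10).
ROWS = [
    {"B", "C", "F", "T1"}, {"B", "C", "F", "T1"}, {"B", "C", "E", "T1"},
    {"F", "T1"}, {"B", "D", "F", "T2"}, {"B", "F"}, {"B", "C", "F"},
    {"A", "E"}, {"B", "C", "F"}, {"B", "F"},
]
ITEMS = ["A", "B", "C", "D", "E", "F", "T1", "T2"]
TARGETS = {"T1", "T2"}
BITS = {x: [int(x in row) for row in ROWS] for x in ITEMS}  # 1-item vectors

def lhs_chi2(items, targets, bits, n, min_cor, min_sup=1, p=1.0):
    results = []

    def cochran_ok(k, codes):
        # Anti-monotone test on the cells derived from the contingency vector.
        observed = Counter(codes)
        cells = [observed[c] for c in range(2 ** k)]
        if 0 in cells:
            return False
        return sum(c >= min_sup for c in cells) >= p * len(cells)

    def chi2(pattern, codes):
        # Assumption: E(cell) = n * product of the marginal frequencies.
        k = len(pattern)
        observed = Counter(codes)
        freq = [sum(bits[x]) / n for x in pattern]
        total = 0.0
        for cell in range(2 ** k):
            expected = n
            for j, f in enumerate(freq):
                present = (cell >> (k - 1 - j)) & 1
                expected *= f if present else 1 - f
            total += (observed[cell] - expected) ** 2 / expected
        return total

    def search(pattern, codes, remaining):
        if not remaining:
            return
        last, rest = remaining[-1], remaining[:-1]
        search(pattern, codes, rest)             # left branch: nothing to do
        new_pattern = pattern + [last]
        new_codes = [(c << 1) | b for c, b in zip(codes, bits[last])]
        if not cochran_ok(len(new_pattern), new_codes):
            return                               # prune every superset
        one_target = sum(x in targets for x in new_pattern) == 1
        if one_target and len(new_pattern) > 1:
            value = chi2(new_pattern, new_codes)
            if value >= min_cor:                 # monotone test
                results.append((tuple(new_pattern), value))
        search(new_pattern, new_codes, rest)     # right branch

    search([], [0] * n, list(items))
    return results

rules = lhs_chi2(ITEMS, TARGETS, BITS, len(ROWS), min_cor=0.0)
for pattern, value in rules:
    print(pattern, round(value, 3))
```

Because the Cochran test rejects any table with a null cell, every item of a surviving pattern is both present and absent at least once, so the expected values used in `chi2` are never zero.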

Page 18: Extracting Decisional Correlation Rules


Memory Requirements

What is the storage requirement?

Contingency vectors of the 1-items: |I|*|r| bits.

Current contingency vectors (including the previous ones, due to the recursive calls):
|I|*|I|*|r| bits in theory;
|I|*|r| bytes in practice, since we never handle patterns longer than 8 items.

Finally we need: |r|*(|I|+|I|/8) bytes.

This result has to be compared with:

$$O\left(\sum_{i=1}^{n} \binom{n}{i} \cdot 2^{i}\right)$$
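As a worked example of the final formula, the sketch below evaluates |r|*(|I|+|I|/8) for the two data files described in the experimental analysis (492 transactions and 3384 items, 426 transactions and 1136 items). The function name is hypothetical.

```python
def lhs_memory_bytes(n_transactions, n_items):
    """Storage needed by the contingency vectors, in bytes."""
    one_item_vectors = n_items * n_transactions / 8  # |I|*|r| bits
    current_vectors = n_items * n_transactions       # |I|*|r| bytes
    return one_item_vectors + current_vectors

print(lhs_memory_bytes(492, 3384))   # STMicroelectronics file
print(lhs_memory_bytes(426, 1136))   # ATMEL file
```

Both results stay in the megabyte range, whereas the contingency tables of a levelwise search grow combinatorially with the level.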

Page 19: Extracting Decisional Correlation Rules


Outline

Preliminaries
Decision Correlation Rules
Contingency Vectors
LHS-χ2 algorithm
Experimental Analysis
Conclusion

Page 20: Extracting Decisional Correlation Rules


Experimental Analysis (1)

Experiments were run on a PC with a 1.8 GHz processor and 2 GB of RAM.

Files are provided by 2 manufacturers (STMicroelectronics and ATMEL):

                 STMicroelectronics   ATMEL
# transactions   492                  426
# items          3384                 1136

Page 21: Extracting Decisional Correlation Rules


Experimental Analysis (2)

Page 22: Extracting Decisional Correlation Rules


Experimental Analysis (2)

Page 23: Extracting Decisional Correlation Rules


Outline

Preliminaries
Decision Correlation Rules
Contingency Vectors
LHS-χ2 algorithm
Experimental Analysis
Conclusion

Page 24: Extracting Decisional Correlation Rules


Conclusion

We have discovered new parameters having an influence on the yield (more than 25% of them were not known before).

Response times are 30 to 70% better with LHS-χ2 than with a levelwise algorithm.

Perspectives:
use of a "divide and conquer" strategy for better performance;
"cleaning" / transformation of the original data;
generalization of the rules by integrating literal sets.