Outline. Conceptualization of diversity with unbalanced hierarchies.

Page 1: Outline. Conceptualization of diversity with unbalanced hierarchies.

Outline

Introduction

Concept of diverse-frequent patterns.

Extracting diverse-frequent patterns using balanced concept hierarchies.

Extracting diverse-frequent patterns using unbalanced concept hierarchies.

Experimental Results.

Related Work

Conclusions and Future Work.

Page 2: Outline. Conceptualization of diversity with unbalanced hierarchies.

Conceptualization of diversity with unbalanced hierarchies

In a balanced concept hierarchy

o Items in the patterns start from the same lowest level of the hierarchy.

In an unbalanced concept hierarchy

o Traversal of a frequent pattern starts from the level where the deepest item in the pattern lies.

o Different starting positions for different frequent patterns make it difficult to differentiate patterns using an unbalanced concept hierarchy.

Maximum Diversity: In an unbalanced concept hierarchy, a frequent pattern with n items has the maximum diversity when all n items lie at the level equal to the height of the concept hierarchy and have only “Root” as their common parent.

The value of maximum diversity is fixed to 1.

Every other frequent pattern with n items is given a diversity score relative to this pattern.

Page 3: Outline. Conceptualization of diversity with unbalanced hierarchies.

Conceptualization of diversity with unbalanced hierarchies

Balanced Frequent Pattern:

Consider a frequent pattern Y = {i1, i2, ..., in} with n items and a concept hierarchy of height h. Y is a balanced frequent pattern when all the items in the pattern have the same height, i.e., ∀j ∈ {1, 2, ..., n} : h(ij) = k, where k is a constant (0 ≤ k ≤ h).

Unbalanced Frequent Pattern:

Consider a frequent pattern Y = {i1, i2, ..., in} with n items and a concept hierarchy of height h. Y is an unbalanced frequent pattern when at least two items in the pattern have different heights, i.e., when ∃ p, q such that h(ip) ≠ h(iq), where 1 ≤ p, q ≤ n.
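The two definitions above can be checked mechanically. The following is a minimal sketch, assuming the height h(i) of every item is already known and supplied as a dictionary; the function name `is_balanced` and the sample heights are illustrative and do not come from the slides.

```python
def is_balanced(pattern, height):
    """Return True when every item in `pattern` has the same height.

    `height` maps an item to its height h(i) in the concept hierarchy.
    """
    return len({height[item] for item in pattern}) <= 1


# Hypothetical heights, chosen only for illustration.
height = {"coke": 4, "pepsi": 4, "whole milk": 5, "shampoo": 3}

balanced = is_balanced(["coke", "pepsi"], height)
unbalanced = not is_balanced(["coke", "whole milk", "shampoo"], height)
```

A pattern is unbalanced exactly when the set of its item heights has more than one element.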

Page 4: Outline. Conceptualization of diversity with unbalanced hierarchies.

Conceptualization of diversity with unbalanced hierarchies

Lower-bound Balanced Frequent Pattern: Consider an unbalanced frequent pattern Y = {i1, i2, ..., in} with n items and a concept hierarchy of height h, and let h(i1), h(i2), ..., h(in) be the heights of the corresponding items. Let h(ij) be the least value among the heights of the items. The lower-bound balanced frequent pattern is obtained when all the items in Y are brought to the same level h(ij).

Construction: Given an unbalanced frequent pattern, first calculate the height of the item which lies highest in the concept hierarchy. Let this height be h(l). Remove all the extra edges below the level h(l). The items which are below level h(l) are replaced by their corresponding parents at level h(l) (with duplicates removed).
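The construction above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each item is described by its root-to-item path (a list whose index is the level of the node), node names are assumed to be unique, and the function name and sample paths are hypothetical.

```python
def lower_bound_pattern(paths):
    """Replace every item by its ancestor at the level of the shallowest item.

    `paths` maps each item to its root-to-item path; the level of a node
    is its index in that list. Duplicates are removed, order is preserved.
    """
    shallowest = min(len(path) - 1 for path in paths.values())
    generalized = []
    for path in paths.values():
        ancestor = path[shallowest]  # the item itself if it is already at that level
        if ancestor not in generalized:
            generalized.append(ancestor)
    return generalized


# Hypothetical paths, chosen only for illustration.
paths = {
    "coke": ["root", "Drink", "Soft Drink", "Cola", "coke"],
    "pepsi": ["root", "Drink", "Soft Drink", "Cola", "pepsi"],
    "shampoo": ["root", "Beauty", "hair", "shampoo"],
}
lower = lower_bound_pattern(paths)
```

Here the shallowest item is `shampoo` at level 3, so `coke` and `pepsi` are both replaced by their common parent at level 3 and the duplicate is dropped.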

Page 5: Outline. Conceptualization of diversity with unbalanced hierarchies.

Conceptualization of diversity with unbalanced hierarchies

Upper-bound Balanced Frequent Pattern: Consider an unbalanced frequent pattern Y = {i1, i2, ..., in} with n items and a concept hierarchy of height h, and let h(i1), h(i2), ..., h(in) be the heights of the corresponding items. Let h(ij) be the highest value among the heights of the items. The upper-bound balanced frequent pattern is obtained when all the items in Y are brought down to the same level h(ij).

Construction:

• First calculate the heights of the items which lie highest and deepest in the concept hierarchy.

• Let these heights be h(il) and h(id) respectively.

• Starting from the level h(il), keep adding one dummy edge for all the imbalanced nodes until the level h(id) is reached.
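A minimal sketch of this padding step, assuming each item is given by its root-to-item path. The function name and the `dummy:<item>:<level>` naming scheme are hypothetical and serve only to keep the added nodes distinct.

```python
def upper_bound_paths(paths):
    """Pad every path with dummy nodes down to the level of the deepest item.

    `paths` maps each item to its root-to-item path. One dummy edge is
    added per missing level, so every padded path ends at the same level.
    """
    deepest = max(len(path) - 1 for path in paths.values())
    padded = {}
    for item, path in paths.items():
        extended = list(path)
        while len(extended) - 1 < deepest:
            # The new node sits at level len(extended).
            extended.append(f"dummy:{item}:{len(extended)}")
        padded[item] = extended
    return padded


# Hypothetical paths, chosen only for illustration.
paths = {
    "coke": ["root", "Drink", "Soft Drink", "Cola", "coke"],
    "shampoo": ["root", "Beauty", "hair", "shampoo"],
}
padded = upper_bound_paths(paths)
```

Items already at the deepest level are left unchanged; only the shallower ones receive dummy descendants.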

Page 6: Outline. Conceptualization of diversity with unbalanced hierarchies.

Conceptualization of diversity with unbalanced hierarchies

Figure: Unbalanced frequent pattern and the corresponding lower-bound and upper-bound balanced frequent patterns. The set {i1, i2} in (b) denotes the parent of items i1 and i2 at level 1.

Page 8: Outline. Conceptualization of diversity with unbalanced hierarchies.

Computing diversity

Diversity of an unbalanced frequent pattern: the diversity of the balanced part and of the remaining part.

To compute the diversity of an unbalanced frequent pattern:

Conversion to lower-bound balanced frequent pattern.

o DR(Y) = DRB(Y’) + Cost of removing edges.

o DR(Y) > DRB(Y’)

o Y’ = corresponding lower-bound balanced frequent pattern.

Conversion to upper-bound balanced frequent pattern.

o DR(Y) = DRB(Y’’) – Cost of adding edges.

o DR(Y) < DRB(Y’’)

o Y’’ = corresponding upper-bound balanced frequent pattern.

Page 9: Outline. Conceptualization of diversity with unbalanced hierarchies.

Computing diversity

• Generalization for the case where certain items are missing.

• Diversity of a balanced frequent pattern depends on:

o Merging Factor (MF)

o Level Factor (LF)

• Contribution of MF at every level, as some items are already at a generalized level.

• Notion of Adjustment Factor (AF) to calculate the contribution of MF at each level.

Page 10: Outline. Conceptualization of diversity with unbalanced hierarchies.

Adjustment Factor (AF)

AF depends on how many edges are missing at level l in the unbalanced frequent pattern, relative to the number of edges at the same level l in the upper-bound balanced frequent pattern.

It is the ratio of the number of edges in Y to the number of edges in the corresponding upper-bound balanced frequent pattern of Y at level l:

AF(Y, l) = |EUFP(Y, l)| / |EUBFP(Y, l)|

where |EUFP(Y, l)| is the number of edges in the unbalanced frequent pattern Y at level l, |EUBFP(Y, l)| is the number of edges in the corresponding upper-bound balanced frequent pattern of Y at level l, and 0 ≤ l < h.

Page 11: Outline. Conceptualization of diversity with unbalanced hierarchies.

Example of Adjustment Factor (AF)

Y = {whole milk, pepsi, coke, shampoo}

Y’’ = {whole milk, node2, node3, node4} (the corresponding upper-bound balanced pattern)

Height of the hierarchy = 5

Height of the item which lies deepest = 5

Number of edges in Y at level 4: |EUFP(Y, 4)| = 1

Number of edges in the upper-bound balanced pattern of Y at level 4: |EUBFP(Y, 4)| = 4

AF(Y, 4) = 1/4 = 0.25

Similarly, AF(Y, 3) = 3/4 = 0.75

Figure: concept hierarchy for the example. Node labels recovered from the figure, top to bottom: root Drink Beauty milk Soft Drink hair Original Cola shampoo / fat coke pepsi node1 / whole milk node2 node3 node4.
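The example can be reproduced with a short sketch. The tree layout used here is an assumption: it is one reading of the flattened figure labels (in particular, "Original" is taken to be the level-3 node under "milk"), and the helper names are hypothetical. An edge "at level l" is taken to join a node at level l to a node at level l + 1.

```python
def pad_to_depth(path, depth):
    """Append dummy nodes to a root-to-item path until it spans `depth` edges."""
    padded = list(path)
    while len(padded) - 1 < depth:
        padded.append(f"dummy:{path[-1]}:{len(padded)}")
    return padded


def edges_at_level(paths, level):
    """Distinct edges that join a node at `level` to a node at `level + 1`."""
    return {(p[level], p[level + 1]) for p in paths if len(p) > level + 1}


def adjustment_factor(paths, level):
    """Ratio of edges in the unbalanced pattern to edges in its upper bound."""
    deepest = max(len(p) - 1 for p in paths)
    upper_bound = [pad_to_depth(p, deepest) for p in paths]
    return len(edges_at_level(paths, level)) / len(edges_at_level(upper_bound, level))


# Assumed root-to-item paths for Y = {whole milk, pepsi, coke, shampoo}.
y_paths = [
    ["root", "Drink", "milk", "Original", "fat", "whole milk"],
    ["root", "Drink", "Soft Drink", "Cola", "pepsi"],
    ["root", "Drink", "Soft Drink", "Cola", "coke"],
    ["root", "Beauty", "hair", "shampoo"],
]

af_4 = adjustment_factor(y_paths, 4)
af_3 = adjustment_factor(y_paths, 3)
```

Under this assumed layout the sketch gives AF(Y, 4) = 0.25 and AF(Y, 3) = 0.75, matching the values on the slide.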

Page 12: Outline. Conceptualization of diversity with unbalanced hierarchies.

DiverseRank (DRB)

DR is the summation of the products of MF, AF and LF at every level.

The level where the length of GFP becomes 1 is s.

h is the height of the balanced concept hierarchy.

Range: 0 ≤ DR(Y) ≤ 1.

DRB(Y)=0 denotes that all the items are at the same level and have the same immediate common parent.

DRB(Y)=1 denotes that all the items are at the lowest level of hierarchy and have “root” as the only common parent between them.

Page 14: Outline. Conceptualization of diversity with unbalanced hierarchies.

Algorithm

Input

List of encoded frequent patterns.

Height of the unbalanced concept hierarchy.

User-specified minimum diversity threshold, minDiv.

Output

List of diverse-frequent patterns.

Algorithm

1. Calculate the PLF for each level of the concept hierarchy.

2. Repeat the following steps for every encoded frequent pattern Y:

   a. Set DiverseRank(Y) to zero.

   b. Set hd as the height of the item in Y which lies deepest in the hierarchy.

   c. Set l = (hd − 1).

   d. Repeat the following steps until the length of GFP(Y, l) becomes 1:

      i. Generate GFP(Y, l).

      ii. Calculate AF(Y, l).

      iii. Calculate the DiverseRank value at that level.

      iv. Set l = l − 1.

   e. If DiverseRank(Y) > minDiv, output Y as a diverse-frequent pattern.
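The loop can be sketched as below. Several points are assumptions rather than the authors' method: the slides do not define MF or the level factor, so both are passed in as caller-supplied functions (`merging_factor` and `level_factor` are hypothetical names); items are given as root-to-item paths; and the level at which GFP(Y, l) shrinks to a single node is taken not to contribute to the sum.

```python
def diverse_rank(paths, merging_factor, level_factor):
    """Sum MF * AF * LF over the levels visited, from hd - 1 upwards."""
    deepest = max(len(p) - 1 for p in paths)
    # Upper-bound balanced paths: pad shallower items with dummy nodes.
    upper = [p + [f"dummy:{p[-1]}:{k}" for k in range(len(p), deepest + 1)]
             for p in paths]
    rank = 0.0
    level = deepest - 1
    while level >= 0:
        gfp = {p[level] for p in upper}  # GFP(Y, l): generalized pattern at level l
        if len(gfp) == 1:                # all items have merged: stop
            break
        real = {(p[level], p[level + 1]) for p in paths if len(p) > level + 1}
        full = {(p[level], p[level + 1]) for p in upper}
        af = len(real) / len(full)       # AF(Y, l)
        rank += merging_factor(gfp, level) * af * level_factor(level)
        level -= 1
    return rank


def diverse_frequent(patterns, merging_factor, level_factor, min_div):
    """Keep the patterns whose DiverseRank exceeds the threshold minDiv."""
    return [paths for paths in patterns
            if diverse_rank(paths, merging_factor, level_factor) > min_div]


# Assumed paths and placeholder factors, used only to exercise the loop.
y_paths = [
    ["root", "Drink", "milk", "Original", "fat", "whole milk"],
    ["root", "Drink", "Soft Drink", "Cola", "pepsi"],
    ["root", "Drink", "Soft Drink", "Cola", "coke"],
    ["root", "Beauty", "hair", "shampoo"],
]
rank = diverse_rank(y_paths, lambda gfp, level: 1.0, lambda level: 0.25)
```

The constant factors in the usage lines are placeholders, so the resulting number only shows that levels 4 down to 1 are visited and that the loop stops at the root; it is not a real DiverseRank value.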

Page 16: Outline. Conceptualization of diversity with unbalanced hierarchies.

Experimental Analysis

The experiments were carried out on the classical R “groceries” market basket analysis data set.

The groceries data set contains 30 days of point-of-sale transaction data from a typical local grocery outlet.

Page 17: Outline. Conceptualization of diversity with unbalanced hierarchies.

Generating Concept hierarchy

To generate a concept hierarchy for the items, a web-based Grocery API provided by Tesco (a United Kingdom grocery chain) is used.

Items that were not listed in the Tesco concept hierarchy were added manually in consultation with domain experts.

The concept hierarchy has 220 nodes in total and a height of 4.

Some of the paths stored for items in the concept hierarchy are given below.

• soda : groceries/drinks/soft drinks/adult drinks & mixers/soda

• whole milk : groceries/fresh food/dairy, eggs & cheese/whole milk

• shopping bags : groceries/household/food storage/bags/shopping bags

• rolls-buns : groceries/bakery/wraps, pitta & naan/rolls-buns

• newspapers: groceries/home & ents/magazines, newspapers & tobacconis /daily/newspapers
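The paths listed above already show that the hierarchy is unbalanced. A minimal sketch, assuming the height of an item is the number of edges between the root ("groceries") and the item; the function name `height_of` is illustrative.

```python
def height_of(path):
    """Height of an item: number of edges between the root and the item."""
    return len(path.split("/")) - 1


# Paths copied from the list above.
paths = {
    "soda": "groceries/drinks/soft drinks/adult drinks & mixers/soda",
    "whole milk": "groceries/fresh food/dairy, eggs & cheese/whole milk",
    "rolls-buns": "groceries/bakery/wraps, pitta & naan/rolls-buns",
}
heights = {item: height_of(path) for item, path in paths.items()}
```

With these paths, `soda` lies at height 4 while `whole milk` and `rolls-buns` lie at height 3, so a pattern containing both `soda` and `whole milk` is unbalanced.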

Page 18: Outline. Conceptualization of diversity with unbalanced hierarchies.

Number of diverse-frequent patterns vs. minDiv

As minDiv increases, the number of diverse-frequent patterns decreases irrespective of the minSup threshold. This is because the items in many of the frequent patterns belong to one or a few categories.

Page 19: Outline. Conceptualization of diversity with unbalanced hierarchies.

Top 10 frequent patterns w.r.t. support

The top 10 frequent patterns of size 3 w.r.t. support were extracted. The highest support for a pattern is 2.3%.

Page 20: Outline. Conceptualization of diversity with unbalanced hierarchies.

Top 10 diverse-frequent patterns

Extracted top 10 diverse-frequent patterns of size 3. The highest DiverseRank value is 1.

Page 21: Outline. Conceptualization of diversity with unbalanced hierarchies.

Experimental Observations

Frequent patterns having the highest DiverseRank value may not be the patterns with the highest support.

Similarly, the frequent patterns with the highest support may not have the highest value of DiverseRank.

Page 22: Outline. Conceptualization of diversity with unbalanced hierarchies.

Experimental Results

To observe a significant difference between the DiverseRank of the upper-bound balanced pattern and the DiverseRank of the unbalanced frequent pattern, we simulated a concept hierarchy that is highly unbalanced.

We pushed some of the items deeper in the hierarchy.

For each such item, a random number is chosen from the list {1, 1, 1, 1, 2, 2, 2, 9, 9, 9, 10, 10, 10}, and that many edges are added to increase the height of the item.

This list ensures that, within a frequent pattern, the height difference between the highest and the deepest item is large.
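The deepening step can be sketched as follows. This is an illustration under assumptions: the slides do not say which items were selected or how the extra edges were represented, so the sketch deepens every item it is given by inserting dummy nodes just above the item; the names `deepen` and `EXTRA_EDGES` and the sample paths are hypothetical.

```python
import random

# Candidate numbers of extra edges, as listed on the slide.
EXTRA_EDGES = [1, 1, 1, 1, 2, 2, 2, 9, 9, 9, 10, 10, 10]


def deepen(paths, rng):
    """Push each given item deeper by a randomly chosen number of edges.

    `paths` maps an item to its root-to-item path; `rng` is a
    random.Random instance so that runs can be reproduced.
    """
    deepened = {}
    for item, path in paths.items():
        extra = rng.choice(EXTRA_EDGES)
        dummies = [f"dummy:{item}:{k}" for k in range(extra)]
        deepened[item] = path[:-1] + dummies + [path[-1]]
    return deepened


# Hypothetical paths, chosen only for illustration.
paths = {
    "soda": ["groceries", "drinks", "soft drinks", "adult drinks & mixers", "soda"],
    "whole milk": ["groceries", "fresh food", "dairy, eggs & cheese", "whole milk"],
}
deepened = deepen(paths, random.Random(0))
```

Because the list mixes small values (1, 2) with large ones (9, 10), two items in the same pattern are likely to end up at very different heights. An item at height 4 that receives 10 extra edges ends at height 14, which agrees with the height reported for the simulated hierarchy.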

Page 23: Outline. Conceptualization of diversity with unbalanced hierarchies.

Experimental Results

• Height of the simulated concept hierarchy: 14.

• Distribution of items in the simulated concept hierarchy is as follows:

Page 24: Outline. Conceptualization of diversity with unbalanced hierarchies.

Experimental Results

• After simulating the hierarchy, the difference between the DR of the unbalanced pattern and the DR value of upper-bound balanced frequent pattern is as high as 0.4.

Page 26: Outline. Conceptualization of diversity with unbalanced hierarchies.

Related Work

Frequent patterns were first introduced by Rakesh Agrawal in 1993.

Several interestingness measures have been proposed along with support to filter out uninteresting frequent patterns and association rules.

The interestingness measures can be objective or subjective.

• The objective measures are statistical by nature and do not consider any domain knowledge.

• Examples: relative support, all-confidence, any-confidence, bond, lift, etc.

• Subjective measures assess the interestingness of a pattern using the user’s existing concepts and domain knowledge.

• Examples : general impressions, fuzzy rules and hard-soft beliefs.

Page 27: Outline. Conceptualization of diversity with unbalanced hierarchies.

Concept Hierarchies in Data Mining

Concept hierarchies have been used to discover generalized association rules (Srikant and Agrawal, 1995).

• Capture interesting rules at all levels of multiple hierarchies.

They have also been used to discover multiple-level association rules (Han and Fu, 1999).

• A top-down progressive deepening technique is used to extract association rules at different levels of the concept hierarchy.

A keyword suggestion approach based on the concept hierarchy has been proposed to facilitate the user’s web search (Chen, Xue and Yu, 2008).

Page 28: Outline. Conceptualization of diversity with unbalanced hierarchies.

Related Work

Diversity has been widely studied in the literature to assess the interestingness of summaries.

An effort has been made to extend the diversity-based measures to assess the interestingness of data sets using diverse association rules (Huebner et al., 2009).

There, the notion of diversity is defined as the variation in the items’ frequencies and not according to the categories of the items.

Page 30: Outline. Conceptualization of diversity with unbalanced hierarchies.

Conclusions and Future Work

A new class of user-interest based patterns, called diverse-frequent patterns, has been proposed.

An efficient algorithm that extracts diverse-frequent patterns using the item-encoding technique has been presented.

The experiments on a real-world data set show that the diverse-frequent patterns provide knowledge different from that of frequent patterns.

Future work

Extend the notion of diverse-frequent patterns to improve the performance of clustering, classification and recommendation algorithms.

Another future research problem could be to extend the notion of the diversity to the association rules and investigate how the definition of diversity changes for the association rules.

Page 31: Outline. Conceptualization of diversity with unbalanced hierarchies.

References

R. J. Bayardo Jr. Efficiently mining long patterns from databases. In SIGMOD Conference, pages 85–93, 1998.

M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Finding interesting rules from large sets of discovered association rules. In CIKM, pages 401–407, 1994.

H.-P. Kriegel, M. Schubert, and A. Zimek. Angle-based outlier detection in high-dimensional data. In KDD, pages 444–452, 2008.

B. Liu, W. Hsu, and Y. Ma. Mining association rules with multiple minimum supports. In KDD, pages 337–341, 1999.

Tesco. Grocery API, https://secure.techfortesco.com/tescoapiweb/.