Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding window
1
Yun Chi, Haixun Wang, Philip S. Yu, Richard R. Muntz, ICDM 2004.
Adviser: Jia-Ling Koh
Speaker: Shu-Ning Shin
Date: 2005.5.6
2
Introduction
• Algorithm Moment: mines closed frequent itemsets in the most recent N transactions of a data stream (a sliding window).
• Data structure: the closed enumeration tree (CET), which maintains:
– the closed frequent itemsets, and
– the boundary between the closed frequent itemsets and the rest of the itemsets.
3
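Problem
• Lexicographic order ≺ over itemsets (for example, ABC ≺ AC).
• Closed frequent itemset: a frequent itemset none of whose proper supersets has the same support.
• Items Σ = {A, B, C, D}, window size N = 4, minimum support s = ½ (an itemset must appear in at least 2 of the 4 transactions).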
Example window (N = 4):
tid  items
1    C, D
2    A, B
3    A, B, C
4    A, B, C
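To make the definition concrete, the Python sketch below enumerates every itemset of the example window by brute force and keeps the closed frequent ones. It is not the Moment algorithm, which exists precisely to avoid full enumeration. The window contents (tid 1: C, D; tid 2: A, B; tids 3 and 4: A, B, C) are reconstructed from the support counts on the later slides, and the function names are hypothetical.

```python
from itertools import combinations

# Example window (tid -> items), reconstructed from the slides; N = 4.
WINDOW = {1: "CD", 2: "AB", 3: "ABC", 4: "ABC"}
ITEMS = "ABCD"
MIN_SUPPORT = 2  # s = 1/2 of N = 4 transactions


def support(itemset, window):
    """Number of transactions in the window that contain every item of itemset."""
    return sum(1 for items in window.values() if set(itemset) <= set(items))


def closed_frequent(window, items, min_support):
    """Brute force: enumerate all itemsets, keep the frequent ones that have
    no proper superset with the same support."""
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(combo, window)
            if s >= min_support:
                frequent["".join(combo)] = s
    # A superset with equal support would itself be frequent, so comparing
    # against the frequent itemsets is enough.
    return {
        itemset: s
        for itemset, s in frequent.items()
        if not any(set(itemset) < set(other) and s == s_other
                   for other, s_other in frequent.items())
    }


print(closed_frequent(WINDOW, ITEMS, MIN_SUPPORT))  # {'C': 3, 'AB': 3, 'ABC': 2}
```

A and B are frequent but not closed because AB has the same support (3); AC and BC are not closed because ABC has the same support (2).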
4
CET (1)
• Four types of itemset nodes:
– Infrequent:
• Infrequent gateway node (dashed circle), e.g., D.
– Frequent but not closed:
• Unpromising gateway node (dashed rectangle), e.g., AC.
• Intermediate node, e.g., A.
– Closed:
• Closed node (solid rectangle), e.g., ABC.
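The node types can be checked on the example with a small classifier. This Python sketch assumes that the itemset is already a node of the CET (so an infrequent node is a gateway), that the lexicographic order ≺ is ordinary tuple order on sorted items, and that the window is the reconstructed example (tid 1: C, D; tid 2: A, B; tids 3 and 4: A, B, C). It relies on a standard fact: the only closed superset of I with the same support is the closure of I, the intersection of all transactions containing I. `closure` and `node_type` are hypothetical names.

```python
# Example window (tid -> items), reconstructed from the slides; minimum support 2.
WINDOW = {1: "CD", 2: "AB", 3: "ABC", 4: "ABC"}
MIN_SUPPORT = 2


def tidset(itemset, window):
    """Ids of the transactions that contain every item of itemset."""
    return [tid for tid, items in window.items() if set(itemset) <= set(items)]


def closure(itemset, window):
    """Intersection of all transactions containing itemset, i.e. its largest
    superset with the same support. Requires support >= 1."""
    tids = tidset(itemset, window)
    common = set(window[tids[0]])
    for tid in tids[1:]:
        common &= set(window[tid])
    return "".join(sorted(common))


def node_type(itemset, window, min_support):
    """Hypothetical classifier for an itemset (sorted string, e.g. "AC")
    that is assumed to be a CET node already."""
    if len(tidset(itemset, window)) < min_support:
        return "infrequent gateway"
    c = closure(itemset, window)
    if c == itemset:
        return "closed"
    # closure(I) is the only closed superset of I with the same support,
    # so I is unpromising exactly when closure(I) precedes I.
    if tuple(c) < tuple(itemset):
        return "unpromising gateway"
    return "intermediate"


for node in ["D", "AC", "A", "ABC"]:
    print(node, "->", node_type(node, WINDOW, MIN_SUPPORT))
```

This prints one line per node: D is an infrequent gateway, AC an unpromising gateway (its closure ABC precedes it), A intermediate (its closure AB follows it), and ABC closed, matching the examples on the slide.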
5
CET (2)
• Property 1: if nI is an infrequent gateway node, then any node nJ where J ⊃ I represents an infrequent itemset.
• Property 2: if nI is an unpromising gateway node, then nI is not closed, and none of nI’s descendants is closed.
• Property 3: if nI is an intermediate node, then nI is not closed and nI has closed descendants.
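The three properties can be sanity-checked on the example by brute force. The Python sketch below classifies every non-empty itemset of the reconstructed window (tid 1: C, D; tid 2: A, B; tids 3 and 4: A, B, C), treats CET descendants of I as I extended by items after its last item, and asserts each property. For Property 1 it checks the stronger statement that every superset of any infrequent itemset is infrequent. All helper names are hypothetical, and ≺ is assumed to be tuple order on sorted items.

```python
from itertools import combinations

# Example window (tid -> items), reconstructed from the slides; minimum support 2.
WINDOW = {1: "CD", 2: "AB", 3: "ABC", 4: "ABC"}
ITEMS = "ABCD"
MIN_SUPPORT = 2


def support(itemset):
    return sum(1 for items in WINDOW.values() if set(itemset) <= set(items))


def closure(itemset):
    """Intersection of all transactions containing itemset."""
    common = set(ITEMS)
    for items in WINDOW.values():
        if set(itemset) <= set(items):
            common &= set(items)
    return "".join(sorted(common))


def is_closed_frequent(itemset):
    s = support(itemset)
    # If any superset had support s, some one-item extension would too.
    return s >= MIN_SUPPORT and all(
        support(itemset + x) < s for x in ITEMS if x not in itemset)


def node_type(itemset):
    if support(itemset) < MIN_SUPPORT:
        return "infrequent"
    c = closure(itemset)
    if c == itemset:
        return "closed"
    return "unpromising" if tuple(c) < tuple(itemset) else "intermediate"


def extensions(itemset, candidates):
    """All itemsets formed by adding a non-empty subset of candidates."""
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            yield "".join(sorted(itemset + "".join(combo)))


def check_properties():
    for k in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, k):
            i = "".join(combo)
            others = [x for x in ITEMS if x not in i]
            tail = [x for x in ITEMS if x > i[-1]]  # CET descendants extend I to the right
            kind = node_type(i)
            if kind == "infrequent":  # Property 1
                assert all(support(j) < MIN_SUPPORT for j in extensions(i, others))
            elif kind == "unpromising":  # Property 2
                assert not is_closed_frequent(i)
                assert not any(is_closed_frequent(j) for j in extensions(i, tail))
            elif kind == "intermediate":  # Property 3
                assert not is_closed_frequent(i)
                assert any(is_closed_frequent(j) for j in extensions(i, tail))
    return True


print(check_properties())  # True
```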
6
Moment: Build CET (1)
• Node nI stores:
– itemset I, node type, support, and tid_sum (the sum of the ids of the transactions that contain I).
• Hash table:
– stores all closed frequent itemsets;
– is used to check whether nI is an unpromising gateway node, i.e., whether there exists a closed nJ with J ⊃ I, J ≺ I, and the same support as nI;
– is keyed on the (support, tid_sum) of nI.
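A minimal sketch of this hash table, assuming a Python dict stands in for it and using the reconstructed example window (tid 1: C, D; tid 2: A, B; tids 3 and 4: A, B, C). Keying on (support, tid_sum) loses nothing: if J ⊃ I and both have the same support, they occur in exactly the same transactions, so their tid_sum values are equal too. Different itemsets can still collide on a key, so the superset and order conditions are checked explicitly. `ClosedIndex`, `insert`, and `left_check` are hypothetical names.

```python
from collections import defaultdict

# Example window (tid -> items), reconstructed from the slides.
WINDOW = {1: "CD", 2: "AB", 3: "ABC", 4: "ABC"}


def stats(itemset):
    """(support, tid_sum) of an itemset in the window."""
    tids = [tid for tid, items in WINDOW.items() if set(itemset) <= set(items)]
    return len(tids), sum(tids)


class ClosedIndex:
    """Hypothetical hash table of closed frequent itemsets keyed on (support, tid_sum)."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def insert(self, itemset):
        self.buckets[stats(itemset)].append(itemset)

    def left_check(self, itemset):
        """True if a stored closed J has J ⊃ I and J ≺ I (same key implies same support)."""
        for j in self.buckets.get(stats(itemset), []):
            if set(itemset) < set(j) and tuple(j) < tuple(itemset):
                return True
        return False


index = ClosedIndex()
for itemset in ["C", "AB", "ABC"]:  # closed frequent itemsets of the example window
    index.insert(itemset)
print(index.left_check("AC"), index.left_check("A"), index.left_check("B"))  # True False True
```

AC and B are flagged as unpromising gateway nodes (ABC and AB precede them with equal support), while A is not, because AB does not precede A.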
7
Moment: Build CET (2)
8
Moment: Build CET (3)
• Items Σ = {A, B, C, D}: call Explore(n{i}) for each i in Σ.
[Figure: root node ψ with child nodes A, B, C, D, each annotated 0]
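The Explore pseudocode itself (slide Build CET (2)) is not preserved in this transcript, so the Python version below is a hypothetical reconstruction from the node-type definitions and properties on the earlier slides, not the authors' procedure. A node is classified as an infrequent gateway, an unpromising gateway (via the hash-table left check), intermediate, or closed; only frequent, promising nodes get children, formed by joining I with each frequent right sibling, and children are explored depth-first from left to right. The window is the reconstructed example (tid 1: C, D; tid 2: A, B; tids 3 and 4: A, B, C), and `make_node`, `explore`, and `build_cet` are illustrative names.

```python
from dataclasses import dataclass, field

# Example window (tid -> items), reconstructed from the slides; minimum support 2.
WINDOW = {1: "CD", 2: "AB", 3: "ABC", 4: "ABC"}
ITEMS = "ABCD"
MIN_SUPPORT = 2


@dataclass
class Node:
    itemset: str
    support: int = 0
    tid_sum: int = 0
    kind: str = ""
    children: list = field(default_factory=list)


def make_node(itemset):
    """Create a node and count its support and tid_sum in the window."""
    tids = [t for t, items in WINDOW.items() if set(itemset) <= set(items)]
    return Node(itemset, len(tids), sum(tids))


def explore(node, right_siblings, closed):
    """Sketch of Explore(n_I); `closed` maps (support, tid_sum) -> closed itemsets."""
    if node.support < MIN_SUPPORT:
        node.kind = "infrequent gateway"
        return
    key = (node.support, node.tid_sum)
    # Left check: a closed J with J ⊃ I, J ≺ I and the same support is already hashed.
    if any(set(node.itemset) < set(j) and tuple(j) < tuple(node.itemset)
           for j in closed.get(key, [])):
        node.kind = "unpromising gateway"
        return
    # Join I with each frequent right sibling to create the children.
    for sib in right_siblings:
        if sib.support >= MIN_SUPPORT:
            node.children.append(make_node(node.itemset + sib.itemset[-1]))
    # Explore children depth-first, left to right.
    for idx, child in enumerate(node.children):
        explore(child, node.children[idx + 1:], closed)
    if any(child.support == node.support for child in node.children):
        node.kind = "intermediate"
    else:
        node.kind = "closed"
        closed.setdefault(key, []).append(node.itemset)


def build_cet():
    root = Node("")  # the root ψ
    root.children = [make_node(i) for i in ITEMS]
    closed = {}
    for idx, child in enumerate(root.children):
        explore(child, root.children[idx + 1:], closed)
    return root


def node_kinds(node, out=None):
    """Collect itemset -> node type for every node below `node`."""
    out = {} if out is None else out
    for child in node.children:
        out[child.itemset] = child.kind
        node_kinds(child, out)
    return out


print(node_kinds(build_cet()))
```

On the example this marks A intermediate, AB and ABC closed, AC and B unpromising gateways, C closed, and D an infrequent gateway, consistent with the node types on slide CET (1). No AD or CD node is created at this stage because D is infrequent, which agrees with those nodes appearing as new nodes on the Add CET (2) slide.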
9
Moment: Add CET (1)
10
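Moment: Add CET (2)
• Adding the transaction with tid 5, t5 = {A, C, D}:
• Call Addition(nψ, t5, D, minsup).
[Figure: updated CET under root ψ; supports A 4, C 4, D 2, AC 3; new nodes AD and CD created with count 0, then counted as AD 1 and CD 2; F = {D}]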
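The counts in this example can be verified with a brute-force recount. This is not Moment's Addition procedure (its pseudocode, on slide Add CET (1), is not preserved here); it only shows that itemsets contained in the new transaction are the ones affected, each gaining exactly one occurrence. It assumes the reconstructed window before the addition (tid 1: C, D; tid 2: A, B; tids 3 and 4: A, B, C) and assumes F is the set of root children (single items) that switch from infrequent to frequent. `add_transaction` is a hypothetical name.

```python
from itertools import combinations

# Window before the addition (tid -> items), reconstructed from the slides.
WINDOW = {1: "CD", 2: "AB", 3: "ABC", 4: "ABC"}
MIN_SUPPORT = 2


def support(itemset, window):
    return sum(1 for items in window.values() if set(itemset) <= set(items))


def add_transaction(window, tid, items, min_support):
    """Brute-force recount after adding a transaction.

    Returns the enlarged window, the (old, new) support of every itemset
    contained in the new transaction, and the single items that move
    from infrequent to frequent (assumed meaning of F)."""
    after = {**window, tid: items}
    changes = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            s = "".join(combo)
            changes[s] = (support(s, window), support(s, after))
    newly_frequent = sorted(s for s, (old, new) in changes.items()
                            if len(s) == 1 and old < min_support <= new)
    return after, changes, newly_frequent


window, changes, f = add_transaction(WINDOW, 5, "ACD", MIN_SUPPORT)
print(changes)
print(f)  # ['D']
```

The recount gives A 4, C 4, D 2, AC 3, AD 1, and CD 2, the values shown in the figure; only D crosses the threshold among the single items.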
11
Moment: Delete CET (1)
12
Moment: Delete CET (2)
• Deleting the transaction with tid 1:
[Figure: updated supports C 3 and D 1; F = {D}]
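Deletion mirrors addition: only nodes whose itemset is contained in the deleted transaction change, each losing one from its support and the deleted tid from its tid_sum. The sketch below keeps (support, tid_sum) for a fixed set of tracked itemsets standing in for CET nodes. It does not reproduce Moment's Deletion procedure (slide Delete CET (1) is not preserved). It assumes the reconstructed transactions (tid 1: C, D; tid 2: A, B; tids 3 and 4: A, B, C) plus t5 = {A, C, D} from the previous slide. `NodeCounts` is hypothetical.

```python
class NodeCounts:
    """Hypothetical incremental counter: (support, tid_sum) per tracked itemset."""

    def __init__(self, tracked, min_support):
        self.min_support = min_support
        self.stats = {itemset: [0, 0] for itemset in tracked}

    def _update(self, tid, items, sign):
        for itemset, st in self.stats.items():
            if set(itemset) <= set(items):  # node is relevant to this transaction
                st[0] += sign        # support
                st[1] += sign * tid  # tid_sum

    def add(self, tid, items):
        self._update(tid, items, +1)

    def delete(self, tid, items):
        """Remove a transaction; return the tracked itemsets that become infrequent."""
        was_frequent = [i for i, st in self.stats.items() if st[0] >= self.min_support]
        self._update(tid, items, -1)
        return sorted(i for i in was_frequent if self.stats[i][0] < self.min_support)


TRANSACTIONS = {1: "CD", 2: "AB", 3: "ABC", 4: "ABC", 5: "ACD"}
counts = NodeCounts(["A", "B", "C", "D", "AB", "AC", "AD", "CD", "ABC"], min_support=2)
for tid, items in TRANSACTIONS.items():
    counts.add(tid, items)
dropped = counts.delete(1, TRANSACTIONS[1])
print(counts.stats["C"], counts.stats["D"], dropped)  # [3, 12] [1, 5] ['CD', 'D']
```

C falls to support 3 and D to support 1, as in the figure. Among the root's children only D becomes infrequent, which presumably is why F = {D}; the node CD drops below the threshold along with it.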
13
Moment: Update CET (3)
• Deleting the transaction with tid 2:
[Figure: updated supports A 3, AB 2, B 2]
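Deleting a transaction can change which itemsets are closed, not only their counts. The brute-force Python sketch below compares the closed frequent itemsets before and after this deletion. It assumes the window before the deletion is tids 2 to 5 (t5 = {A, C, D} added and t1 removed on the previous slides), which is consistent with the counts shown (A 4 to 3, AB 3 to 2, B 3 to 2). It is a check of the outcome, not Moment's update procedure, and `closed_frequent` is a hypothetical name.

```python
from itertools import combinations

ITEMS = "ABCD"
MIN_SUPPORT = 2
# Assumed window before this deletion: t5 added, t1 already removed.
BEFORE = {2: "AB", 3: "ABC", 4: "ABC", 5: "ACD"}


def support(itemset, window):
    return sum(1 for items in window.values() if set(itemset) <= set(items))


def closed_frequent(window):
    """Brute force: frequent itemsets whose one-item extensions all have lower support."""
    result = {}
    for k in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, k):
            itemset = "".join(combo)
            s = support(itemset, window)
            if s >= MIN_SUPPORT and all(
                    support(itemset + x, window) < s
                    for x in ITEMS if x not in itemset):
                result[itemset] = s
    return result


AFTER = {tid: items for tid, items in BEFORE.items() if tid != 2}
print(closed_frequent(BEFORE))  # {'A': 4, 'AB': 3, 'AC': 3, 'ABC': 2}
print(closed_frequent(AFTER))   # {'AC': 3, 'ABC': 2}
```

A and AB stop being closed after the deletion because each now has the same support as a superset (A and AC both 3; AB and ABC both 2).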
14
Experiment (1)
• Dataset: T20I4D100K
• Window size N = 100000
15
Experiment (2)
16
Experiment (3)
• Real dataset: BMS-WebView-1
• Items: 497, transactions: 59602
• Window size N = 50000