Copyright 2005, Data Mining Research Lab, The Ohio State University
Cache-conscious Frequent Pattern Mining on a Modern Processor
Amol Ghoting, Gregory Buehrer, and Srinivasan Parthasarathy
Data Mining Research Laboratory, CSE, The Ohio State University
Daehyun Kim, Anthony Nguyen, Yen-Kuang Chen, and Pradeep Dubey
Intel Corporation
Roadmap
• Motivation and Contributions
• Background
• Performance characterization
• Cache-conscious optimizations
• Related work
• Conclusions
Motivation
• Data mining applications
  – Rapidly growing segment in commerce and science
  – Interactive response time is important
  – Compute- and memory-intensive
• Modern architectures
  – Memory wall
  – Instruction level parallelism (ILP)
[Figure: FP-Growth speedup chart reaching saturation, with speedups of 2.4x and 1.6x. Note: experiment conducted on specialized hardware.]
Contributions
• We characterize the performance and memory access behavior of three state-of-the-art frequent pattern mining algorithms
• We improve the performance of the three frequent pattern mining algorithms
  – Cache-conscious prefix tree
    • Spatial locality + hardware pre-fetching
  – Path tiling to improve temporal locality
  – Co-scheduling to improve ILP on a simultaneous multi-threaded (SMT) processor
Frequent pattern mining (1)
• Finds groups of items that co-occur frequently in a transactional data set
• Example:
  Customer 1: milk bread cereal
  Customer 2: milk bread eggs sugar
  Customer 3: milk bread butter
  Customer 4: eggs sugar
• Minimum support = 2
• Frequent patterns:
  – Size 1: (milk), (bread), (eggs), (sugar)
  – Size 2: (milk & bread), (sugar & eggs)
  – Size 3: None
• Search space traversal: breadth-first or depth-first
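The example can be checked with a brute-force sketch (function and variable names are mine, not from the slides; real miners prune the search space instead of enumerating every candidate):

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force frequent itemset mining: count every candidate
    itemset's support and keep those meeting the threshold."""
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= min_support:
                frequent[cand] = support
    return frequent

transactions = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "eggs", "sugar"},
    {"milk", "bread", "butter"},
    {"eggs", "sugar"},
]
result = frequent_itemsets(transactions, min_support=2)
```

With minimum support 2 this yields the four size-1 patterns plus (bread, milk) and (eggs, sugar), and nothing of size 3, matching the slide.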
Frequent pattern mining (2)
• Algorithms under study
  – FP-Growth (based on the pattern-growth method)
    • Winner of the 2003 Frequent Itemset Mining Implementations (FIMI) evaluation
  – Genmax (depth-first search space traversal)
    • Maximal pattern miner
  – Apriori (breadth-first search space traversal)
• All algorithms use a prefix tree as a data set representation
Setup
• Intel Xeon processor
  – At 2GHz with 4GB of main memory
  – 4-way 8KB L1 data cache
  – 8-way 512KB L2 cache on chip
  – 8-way 2MB L3 cache
• Intel VTune Performance Analyzers to collect performance data
• Implementations
  – FIMI repository
    • FP-Growth (Gosta Grahne and Jianfei Zhu)
    • Genmax (Gosta Grahne and Jianfei Zhu)
    • Apriori (Christian Borgelt)
  – Custom memory managers
Execution time breakdown
FP-Growth: Count-FPGrowth() – 61%; Project-FPGrowth() – 31%; Other – 8%
Genmax:    Count-GM() – 91%; Other – 9%
Apriori:   Count-Apriori() – 70%; Candidate-Gen() – 25%; Other – 5%

The dominant counting routines perform support counting in a prefix tree, similar to Count-FPGrowth().
Operation mix
                                         Count-FPGrowth()  Count-GM()  Count-Apriori()
Integer ALU operations per instruction        0.65             0.64          0.34
Memory operations per instruction             0.72             0.69          0.66

Note: Each column need not sum up to 1.
Memory access behavior
                              Count-FPGrowth()  Count-GM()  Count-Apriori()
L1 hit rate                        89%             87%            86%
L2 hit rate                        43%             42%            49%
L3 hit rate                        39%             40%            27%
L3 misses per instruction          0.03            0.03           0.04
CPU utilization                    9%              9%             8%
FP-tree
Minimum support = 3
[Figure: step-by-step FP-tree construction; nodes are labeled item:count (the final tree contains a:4, c:3, f:3, m:2, p:2, and b:1 nodes under root r). Each node stores COUNT, ITEM, NODE POINTER, CHILD POINTERS, and PARENT POINTER fields.]
• Index based on largest common prefix
  – Typically results in a compressed data set representation
• Node pointers allow for fast searching of items
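The node layout the slide describes can be sketched as a simplified FP-tree insert (class and field names are mine; the sample transactions are the classic FP-growth example rather than the exact data behind the slide's figure):

```python
class FPNode:
    """Prefix-tree node with the fields named on the slide: COUNT,
    ITEM, NODE POINTER (next node holding the same item), CHILD
    POINTERS, and PARENT POINTER."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}      # item -> FPNode
        self.node_link = None   # next FPNode with the same item

def insert(root, transaction, header):
    """Insert one frequency-ordered transaction, sharing the largest
    common prefix with paths already in the tree."""
    node = root
    for item in transaction:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, node)
            node.children[item] = child
            # Thread the new node onto its item's header list.
            child.node_link = header.get(item)
            header[item] = child
        child.count += 1
        node = child

root, header = FPNode(None, None), {}
for t in [["f", "c", "a", "m", "p"], ["f", "c", "a", "b", "m"],
          ["f", "b"], ["c", "b", "p"], ["f", "c", "a", "m", "p"]]:
    insert(root, t, header)

# Total support of m, found by walking its node-pointer chain.
m_support, n = 0, header["m"]
while n:
    m_support += n.count
    n = n.node_link
```

The node-pointer chain is what makes per-item lookups fast: all occurrences of an item are reachable without scanning the tree.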
FP-Growth
• Basic step:
  – For each item in the FP-tree, build a conditional FP-tree
  – Recursively mine the conditional FP-tree
[Figure: the FP-tree and the conditional FP-tree for item m (nodes f:3, c:3, a:3 under root r); each node stores COUNT, ITEM, NODE POINTER, CHILD POINTERS, and PARENT POINTER. Items p, f, c, a, and b are processed similarly.]
• Dynamic data structure, and only two node fields are used during the bottom-up traversal
  – Poor spatial locality
• Large data structure
  – Poor temporal locality
• Pointer de-referencing
  – Poor instruction level parallelism (ILP)
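The bottom-up, pointer-chasing step can be sketched as follows (a hand-built fragment of the example tree; dict-based nodes and all names are illustrative only):

```python
# Hand-built fragment of an FP-tree: the two paths that end in item m.
def node(item, count, parent):
    return {"item": item, "count": count, "parent": parent}

root = node(None, 0, None)
f = node("f", 4, root)
c = node("c", 3, f)
a = node("a", 3, c)
m1 = node("m", 2, a)    # m under the f-c-a path
b = node("b", 1, a)
m2 = node("m", 1, b)    # m under the f-c-a-b path

def conditional_patterns(item_nodes):
    """For each node holding the item, chase parent pointers up to the
    root and emit the prefix path, weighted by the node's count. Every
    step is a pointer dereference into a possibly distant cache line,
    which is the locality problem the slide lists."""
    patterns = []
    for n in item_nodes:
        path, p = [], n["parent"]
        while p is not None and p["item"] is not None:
            path.append(p["item"])
            p = p["parent"]
        patterns.append((tuple(reversed(path)), n["count"]))
    return patterns

base = conditional_patterns([m1, m2])
```

Note that the walk touches only the item and parent fields, yet with the full node layout each dereference drags the unused fields into the cache as well.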
Cache-conscious prefix tree
• Improves cache line utilization
  – Smaller node size + DFS allocation
• Allows the use of hardware cache line pre-fetching for bottom-up traversals
[Figure: prefix tree with nodes allocated in DFS order. Nodes keep only the ITEM field; counts move to separate count lists and node pointers to separate header lists, shrinking the node relative to the original COUNT / ITEM / NODE POINTER / CHILD POINTERS / PARENT POINTER layout.]
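A minimal sketch of the DFS, structure-of-arrays allocation (my own simplified form; the paper's actual node layout may differ, and array indices stand in for node addresses):

```python
def dfs_layout(root):
    """Copy a pointer-based tree into parallel arrays in DFS order.
    items[] and parents[] are the only data touched by a bottom-up
    walk; counts[] lives in a separate list, echoing the slide's
    separate count lists."""
    items, counts, parents = [], [], []
    stack = [(root, -1)]            # (node, parent index in new arrays)
    while stack:
        node, pidx = stack.pop()
        idx = len(items)
        items.append(node["item"])
        counts.append(node["count"])
        parents.append(pidx)
        # Reverse so children are visited in their original order.
        for child in reversed(node["children"]):
            stack.append((child, idx))
    return items, counts, parents

tree = {"item": "r", "count": 0, "children": [
    {"item": "f", "count": 4, "children": [
        {"item": "c", "count": 3, "children": []}]},
    {"item": "c", "count": 1, "children": []}]}

items, counts, parents = dfs_layout(tree)
```

After DFS allocation, a bottom-up walk follows monotonically decreasing indices in a contiguous array, which is exactly the access pattern a hardware prefetcher can track.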
Path tiling to improve temporal locality
• DFS order enables breaking the tree into tiles based on node addresses
  – These tiles can partially overlap
• Maximize tree reuse
  – All tree accesses are restructured to operate on a tile-by-tile basis
[Figure: tree rooted at r partitioned into Tile 1 through Tile N.]
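One way to sketch the tiling, assuming nodes live in a DFS-ordered parents array where indices stand in for addresses (function and parameter names are mine):

```python
def tiled_bottom_up(parents, counts, leaves, tile_size):
    """Advance many bottom-up walks tile by tile. A walk only moves
    while its node index lies in (or above) the active tile, so one
    tile's worth of the tree stays cache-resident across all walks.
    In DFS order a parent's index is always below its child's, so
    tiles are processed from the highest index range downward."""
    frontier = list(leaves)
    n = len(parents)
    for lo in range(((n - 1) // tile_size) * tile_size, -1, -tile_size):
        carried = []
        for idx in frontier:
            while idx >= lo:        # stay within/above the active tile
                counts[idx] += 1
                idx = parents[idx]
                if idx < 0:
                    break
            if idx >= 0:
                carried.append(idx)  # resume in a later (lower) tile
        frontier = carried
    return counts

# Tree as a DFS-ordered parents array: 0 is the root; 3 and 6 are leaves.
parents = [-1, 0, 1, 2, 0, 4, 5]
counts = tiled_bottom_up(parents, [0] * 7, leaves=[3, 6], tile_size=4)
```

Instead of each walk sweeping the whole tree, every tile is loaded once and reused by all pending walks before the next tile is touched.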
Improving ILP using SMT
• Simultaneous multi-threading (SMT)
  – Intel hyper-threading (HT) maintains two hardware contexts on chip
  – Improves instruction level parallelism (ILP)
    • When one thread waits, the other thread can use CPU resources
• Identifying independent threads is not good enough
  – Unlikely to hide long cache miss latency
  – Can lead to cache interference (conflicts)
• Solution: restructure the multi-threaded computation to reuse the cache on a tile-by-tile basis
[Figure: Thread 1 and Thread 2 sweep Tile 1 through Tile N together, operating on the same data but performing different computation.]
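The co-scheduling idea can be mimicked with a barrier so two threads stay on the same tile (a toy sketch: Python threads do not map onto HT hardware contexts, and `sum`/`max` stand in for the two real computations):

```python
import threading

def co_scheduled(tiles, work_a, work_b):
    """Two threads sweep the same sequence of tiles in lockstep: a
    barrier keeps them on the current tile together, so they share
    the cached data while performing different computations."""
    results = {"a": [], "b": []}
    barrier = threading.Barrier(2)

    def worker(name, fn):
        for tile in tiles:
            results[name].append(fn(tile))
            barrier.wait()   # both threads finish a tile before moving on

    ta = threading.Thread(target=worker, args=("a", work_a))
    tb = threading.Thread(target=worker, args=("b", work_b))
    ta.start(); tb.start()
    ta.join(); tb.join()
    return results

tiles = [[1, 2, 3], [4, 5], [6]]
res = co_scheduled(tiles, sum, max)
```

Because both contexts read the same tile, they populate the shared cache cooperatively instead of evicting each other's working sets.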
Speedup for FP-Growth (Synthetic data set)
[Figure: speedup of FP-Growth on a synthetic data set at minimum support levels 4000, 4500, 5000, and 5500.]
Speedup for FP-Growth (Real data set)
[Figure: speedup of FP-Growth on a real data set at minimum support levels 50000, 58350, 66650, and 75000.]
For FP-Growth, the L1 hit rate improves from 89% to 94% and the L2 hit rate from 43% to 98%.
Speedup: Genmax – up to 4.5x; Apriori – up to 3.5x
Related work (1)
• Data mining algorithms
  – Characterizations
    • Self-organizing maps – Kim et al. [WWC99]
    • C4.5 – Bradford and Fortes [WWC98]
    • Sequence mining, graph mining, outlier detection, clustering, and decision tree induction – Ghoting et al. [DAMON05]
  – Memory placement techniques for association rule mining
    • Considered the effects of memory pooling and coarse-grained spatial locality on association rule mining algorithms in serial and parallel settings – Parthasarathy et al. [SIGKDD98, KAIS01]
Related work (2)
• Database algorithms
  – DBMS on modern hardware
    • Ailamaki et al. [VLDB99, VLDB2001]
  – Cache-sensitive search trees and B+-trees
    • Rao and Ross [VLDB99, SIGMOD00]
  – Prefetching for B+-trees and Hash-Join
    • Chen et al. [SIGMOD00, ICDE04]
Ongoing and future work
• Algorithm re-design for next-generation architectures
  – e.g., graph mining on multi-core architectures
• Cache-conscious optimizations for other data mining and bioinformatics applications on modern architectures
  – e.g., classification algorithms, graph mining
• Out-of-core algorithm designs
• Microprocessor design targeted at data mining algorithms
Conclusions
• Characterized the performance of three popular frequent pattern mining algorithms
• Proposed a tile-able cache-conscious prefix tree
  – Improves spatial locality and allows for cache line pre-fetching
  – Path tiling improves temporal locality
• Proposed a novel thread-based decomposition for improving ILP by utilizing SMT
  – Overall, up to a 4.8-fold speedup
• Effective algorithm design in data mining needs to take modern architectural designs into account