FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge...

24
FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information Sciences Ohio State University

Transcript of FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge...

Page 1: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

FREERIDE: System Support for High Performance Data Mining

Ruoming Jin Leo Glimcher Xuan Zhang

Ge Yang Gagan Agrawal

Department of Computer and Information SciencesOhio State University

Page 2: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Motivation: Data Mining Problem Datasets available for mining

are often large Our understanding of what

algorithms and parameters will give desired insights is limited

Time required for creating scalable implementations of different algorithms and running them with different parameters on large datasets slows down the data mining process

Page 3: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Project Overview

FREERIDE (Framework for Rapid Implementation of datamining engines) as the base system

Demonstrated for a variety of standard mining algorithms

Page 4: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

FREERIDE offers: The ability to rapidly prototype a high-performance mining

implementation Distributed memory parallelization Shared memory parallelization Ability to process large and disk-resident datasets Only modest modifications to a sequential implementation for

the above three

Page 5: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Key Observation from Mining Algorithms

Popular algorithms have a common canonical loop

Can be used as the basis for supporting a common middleware

While( ) {

forall( data instances d) {

I = process(d)

R(I) = R(I) op d

}

…….

}

Page 6: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Shared Memory Parallelization Techniques

Full Replication: create a copy of the reduction object for each thread

Full Locking: associate a lock with each element

Optimized Full Locking: put the element and corresponding lock on the same cache block

Fixed Locking: use a fixed number of locks Cache Sensitive Locking: one lock for all

elements in a cache block

Page 7: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Memory Layout for Various Locking Schemes

Full Locking Fixed Locking

Optimized Full Locking Cache-Sensitive Locking

Lock Reduction Element

Page 8: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Trade-offs between Techniques

Memory requirements: high memory requirements can cause memory thrashing

Contention: if the number of reduction elements is small, contention for locks can be a significant factor

Coherence cache misses and false sharing: more likely with a small number of reduction elements

Page 9: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Combining Shared Memory and Distributed Memory Parallelization

Distributed memory parallelization by replication of reduction object

Naturally combines with full replication on shared memory

For locking with non-trivial memory layouts, two options

Communicate locks also Copy reduction elements to a separate buffer

Page 10: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Apriori Association Mining

Relative Performance of Full Replication, Optimized Full Locking and Cache-Sensitive Locking

0

20000

4000060000

80000

100000

0.10% 0.05% 0.03% 0.02%

Support Levels

Tim

e(s)

fr

ofl

csl

500MB dataset, N2000,L20, 4 threads

Page 11: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

K-means Shared Memory Parallelization

0

200

400

600

800

1000

1200

1400

1600

1thread

4threads

16threads

full repl

opt full locks

cache sens.Locks

Page 12: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Performance on Cluster of SMPs

0

10000

20000

30000

40000

50000

60000

70000

1 node 2nodes

4nodes

8nodes

1 thread 2 threads 3 threads

Apriori Association Mining

Page 13: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Results from EM Clustering Algorithm

EM is a popular data mining algorithm

Can we parallelize it using the same support that worked for other clustering algo (k-means) and algo for other mining tasks

0

500

1000

1500

2000

2500

3000

1 2 4 8nodes

1 thread 2 threads 3 threads 4 threads

Page 14: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Results from FP-tree

0

10

20

30

40

50

60

1 2 4 8

1 thread 2 threads 3 threads

FPtree: 800 MB dataset 20 frequent itemsets

Page 15: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

A Case Study: Decision Tree Construction

• Question: can we parallelize decision tree construction using the same framework ?

• Most existing parallel algorithms have a fairly different structure (sorting, writing back …)

• Being able to support decision tree construction will significantly add to the usefulness of the framework

Page 16: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Approach

Implemented RainForest framework (Gehrke) Currently focus on RF-read Overview of the algorithm

While the stop condition not satisfied read the data build the AVC-group for nodes choose the splitting attributes to split nodes select a new set of node to process as long as the main

memory could hold it

Page 17: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Parallelization Strategies

Pure approach: only apply one of full replication, optimized full locking and cache-sensitive locking

Vertical approach: use replication at top levels, locking at lower

Horizontal: use replication for attributes with a small number of distinct values, locking otherwise

Mixed approach: combine the above two

Page 18: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Results

0

500

1000

1500

2000

2500

3000

3500

1 2 4 8

No. of Nodes

Tim

e(s

) fr

ofl

csl

Performance of pure versions, 1.3GB dataset with 32 million records in the training set, function 7, the depth of decision tree = 16.

Page 19: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Results

Combining full replication and full locking

0

500

1000

1500

2000

2500

3000

1 2 4 8

No. of Nodes

Tim

e(s

) horizontal

vertical

mixed

Page 20: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Results

Combining full replication and cache-sensitive locking

0

500

1000

1500

2000

2500

3000

1 2 4 8

No. of Nodes

Tim

e(s

) horizontal

vertical

mixed

Page 21: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Combining Distributed Memory and Shared Memory Parallelization for Decision Tree

The key problem: large size of AVC groups means very high communication volume

Results in very limited speedups Can we modify the algorithm to reduce

communication volume ?

Page 22: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

SPIES On (a) FREERIDE Developed a new

communication efficient decision tree construction algorithm – Statistical Pruning of Intervals for Enhanced Scalability (SPIES)

Combines RainForest with statistical pruning of intervals of numerical attributes to reduce memory requirements and communication volume

Does not require sorting of data, or partitioning and writing-back of records

Paper in SDM regular program

0

1000

2000

3000

4000

5000

6000

7000

1node

8nodes

1thread 2threads 3threads

Page 23: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Applying FREERIDE for Scientific Data Mining

Joint work with Machiraju and Parthasarathy

Focusing on feature extraction, tracking, and mining approach developed by Machiraju et al.

A feature is a region of interest in a dataset

A suite of algorithms for extracting and tracking features

Page 24: FREERIDE: System Support for High Performance Data Mining Ruoming Jin Leo Glimcher Xuan Zhang Ge Yang Gagan Agrawal Department of Computer and Information.

Summary

Demonstrated a common framework for parallelization of a wide range of mining algos

Association mining – apriori and fp-tree Clustering – k-means and EM Decision tree construction Nearest neighbor search

Both shared memory and distributed memory parallelism

A number of advantages Ease parallelization Support higher-level interfaces