Parallel C3M

Aylin Tokuç, Erkan Okuyan, Özlem Gür

Date posted: 22-Dec-2015

Outline

• Basics of Parallel computing

• Sequential C3M

• Parallel C3M

Parallel Computation

Decomposition: The process of dividing a computation into smaller parts.

Task: A programmer-defined unit of computation into which the main computation is subdivided by means of decomposition.
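As a small illustration of these two definitions, the sketch below decomposes a row-sum computation into two tasks and runs them on a thread pool. The helper names (`decompose`, `row_sum_task`) and the toy matrix are hypothetical and only serve the example; they are not part of the original slides.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(rows, num_tasks):
    """Split the rows into at most num_tasks contiguous chunks (the tasks)."""
    size = -(-len(rows) // num_tasks)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def row_sum_task(rows):
    """One task: compute the sum of each row in its chunk."""
    return [sum(row) for row in rows]


matrix = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
tasks = decompose(matrix, num_tasks=2)

# Each chunk is an independent task, so the tasks can run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_results = list(pool.map(row_sum_task, tasks))

# Concatenate the per-task results in task order.
row_sums = [s for part in partial_results for s in part]
```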

Parallel Computation - Primary Considerations

• Load Balancing

• Minimizing Communication

• Task Dependency Optimization

Parallel Computation - Load Balancing

Parallel Computation - Minimizing Communication

Parallel Computation - Task Dependency Optimization

C3M Algorithm

1- Determine the cluster seeds of the database.

2- For each document d_i that is not a cluster seed, find the cluster seed (if any) that maximally covers d_i.

3- If there remain unclustered documents, group them into a ragbag cluster.
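The three steps above can be sketched in Python, taking the cover-coefficient matrix C as input (the values below are the sample C matrix that appears later in the deck). Several details are assumptions taken from the cover-coefficient methodology rather than from this slide: δ_i = c_ii, the number of clusters is Σ δ_i rounded to the nearest integer, and the binary-case seed power is P_i = δ_i (1 − δ_i) × (number of terms in d_i). The function name `c3m_cluster` and the tie-breaking toward the lower document index are hypothetical choices.

```python
def c3m_cluster(C, term_counts):
    """Cluster documents from a cover-coefficient matrix C (m x m).

    term_counts[i] is the number of terms in document i (binary D assumed).
    Returns (clusters, ragbag); clusters maps seed index -> member indices.
    """
    m = len(C)
    delta = [C[i][i] for i in range(m)]       # decoupling coefficients
    n_c = max(1, round(sum(delta)))           # assumed rounding rule
    # Assumed binary-case seed power: delta_i * (1 - delta_i) * (# terms in d_i)
    power = [delta[i] * (1.0 - delta[i]) * term_counts[i] for i in range(m)]

    # Step 1: the n_c documents with the highest seed power become seeds.
    seeds = sorted(range(m), key=lambda i: power[i], reverse=True)[:n_c]
    clusters = {s: [s] for s in seeds}
    ragbag = []

    # Step 2: attach each non-seed document to the seed that covers it most.
    for i in range(m):
        if i in clusters:
            continue
        best = max(seeds, key=lambda s: C[i][s])
        if C[i][best] > 0.0:
            clusters[best].append(i)
        else:
            # Step 3: documents covered by no seed go to the ragbag cluster.
            ragbag.append(i)
    return clusters, ragbag


C = [
    [0.375, 0.0, 0.125, 0.375, 0.125],
    [0.0, 0.417, 0.417, 0.0, 0.167],
    [0.083, 0.277, 0.361, 0.083, 0.194],
    [0.188, 0.0, 0.063, 0.563, 0.188],
    [0.083, 0.111, 0.194, 0.25, 0.361],
]
term_counts = [2, 2, 3, 4, 3]  # non-zeros per row of the sample D matrix
clusters, ragbag = c3m_cluster(C, term_counts)
```

With zero-based document indices, this groups the five sample documents around two seeds and leaves the ragbag empty.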

C3M Formulas

C3M – Sample Matrices

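$$
D = \begin{bmatrix}
0 & 0 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 1 & 1 & 1 \\
1 & 0 & 1 & 0 & 0 & 1
\end{bmatrix}
$$

$$
C = \begin{bmatrix}
0.375 & 0.000 & 0.125 & 0.375 & 0.125 \\
0.000 & 0.417 & 0.417 & 0.000 & 0.167 \\
0.083 & 0.277 & 0.361 & 0.083 & 0.194 \\
0.188 & 0.000 & 0.063 & 0.563 & 0.188 \\
0.083 & 0.111 & 0.194 & 0.250 & 0.361
\end{bmatrix}
$$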
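The relation between the two sample matrices can be checked with a short script. This is a sketch that assumes the standard cover-coefficient definition, c_ij = α_i Σ_k d_ik β_k d_jk, with α_i the reciprocal of the i-th row sum and β_k the reciprocal of the k-th column sum; the formula itself is not legible in this transcript, and the function name `cover_coefficients` is hypothetical.

```python
def cover_coefficients(D):
    """Return the m x m cover-coefficient matrix C for a document-term matrix D."""
    m, n = len(D), len(D[0])
    alpha = [1.0 / sum(row) for row in D]                              # assumes no empty row
    beta = [1.0 / sum(D[i][k] for i in range(m)) for k in range(n)]    # assumes no empty column
    return [
        [alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(n))
         for j in range(m)]
        for i in range(m)
    ]


D = [
    [0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1],
]
C = cover_coefficients(D)

for row in C:
    print("  ".join(f"{value:.3f}" for value in row))
```

Under this assumption the computed entries agree with the sample C matrix to within rounding, and every row of C sums to 1.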

Parallel C3M - Distribution

Distribute rows among processors

Load balancing by cyclic block distribution
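A minimal sketch of a cyclic block distribution of rows is given below. The slides do not state the block size or the mapping rule, so the round-robin assignment of fixed-size row blocks and the function name `block_cyclic_rows` are assumptions made for illustration.

```python
def block_cyclic_rows(num_rows, num_procs, block_size):
    """Return, for each processor, the list of row indices it owns.

    Rows are grouped into consecutive blocks of block_size rows, and the
    blocks are dealt out to processors in round-robin order.
    """
    owned = [[] for _ in range(num_procs)]
    for row in range(num_rows):
        block = row // block_size
        owned[block % num_procs].append(row)
    return owned


assignment = block_cyclic_rows(num_rows=10, num_procs=3, block_size=2)
# Processor 0 gets blocks 0 and 3, processor 1 gets blocks 1 and 4,
# and processor 2 gets block 2.
```

Dealing blocks out cyclically spreads dense and sparse regions of the matrix over all processors, which is the usual motivation for this layout when row lengths vary.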

Local Calculations

All processors calculate α, partial β and P_i

Current method for weighted matrix: too costly

It needs column vectors (but the matrix is partitioned row-wise)

Seed Powers P_i

• Seed power P_i should be small for a document whose terms appear in too many or too few documents.

• Seed power P_i should be larger for a document whose terms appear in a moderate number of documents.

Minimize Communication - Proposed Heuristic

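$$\beta'_i = \sum_{k=1}^{m} \min(1, d_{ki})$$

β'_i: # of non-zeros in column i

$$P_i = \delta_i \, \psi_i \sum_{j=1}^{n} d_{ij} \, \frac{\beta'_j}{m} \left(1 - \frac{\beta'_j}{m}\right)$$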

All processors calculate α, partial β and β’
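The sketch below shows how the heuristic could be evaluated from row-wise partitioned data. It assumes that β'_j is the number of non-zero entries in column j, that the heuristic seed power is P_i = δ_i ψ_i Σ_j d_ij (β'_j / m)(1 − β'_j / m), and that ψ_i = 1 − δ_i. The function names, the small weighted matrix and the value δ_i = 0.5 are hypothetical.

```python
def partial_nonzero_counts(local_rows, n):
    """Partial beta' for one processor: non-zero entries per column in its rows."""
    return [sum(1 for row in local_rows if row[j] != 0) for j in range(n)]


def heuristic_seed_power(row, delta_i, beta_prime, m):
    """P_i = delta_i * psi_i * sum_j d_ij * (beta'_j / m) * (1 - beta'_j / m)."""
    psi_i = 1.0 - delta_i  # assumed coupling coefficient
    spread = sum(d * (b / m) * (1.0 - b / m) for d, b in zip(row, beta_prime))
    return delta_i * psi_i * spread


# Hypothetical weighted 4 x 3 matrix, split row-wise over two processors.
rows_p0 = [[2, 0, 1], [1, 3, 0]]
rows_p1 = [[0, 0, 4], [5, 0, 0]]
m, n = 4, 3

# Each processor counts non-zeros in its own rows only.
partials = [partial_nonzero_counts(rows, n) for rows in (rows_p0, rows_p1)]
# Summing the partial vectors element-wise gives the final beta'.
beta_prime = [sum(counts) for counts in zip(*partials)]

# Seed power of the first document, using a made-up delta_i of 0.5.
power = heuristic_seed_power(rows_p0[0], delta_i=0.5, beta_prime=beta_prime, m=m)
```

The factor (β'_j / m)(1 − β'_j / m) is largest when a term occurs in half of the documents and vanishes for terms that occur in none or all of them, which matches the behaviour asked of the seed power.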

Effectiveness of Heuristic

• A MATLAB script was written to evaluate the effectiveness of the proposed heuristic.

• Correlation coefficient = 0.95

Communication between Processors

• Partial β and β’ vectors are exchanged between processors to calculate the final β and β’ vectors.

• Then, all processors calculate c_ii = δ_i
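The exchange can be simulated in plain Python, as sketched below on the sample D matrix from the deck. The sketch assumes that a "partial β" is a processor's partial column sum, that the final β_k is the reciprocal of the summed column total, and that δ_i = c_ii = α_i Σ_k d_ik β_k d_ik. `allreduce_sum` is a hypothetical stand-in for a real all-reduce operation, and β’ would be combined in the same way.

```python
def allreduce_sum(partials):
    """Element-wise sum of per-processor vectors (stand-in for a real all-reduce)."""
    return [sum(values) for values in zip(*partials)]


def local_deltas(local_rows, beta):
    """delta_i = c_ii = alpha_i * sum_k d_ik * beta_k * d_ik for each local row."""
    deltas = []
    for row in local_rows:
        alpha_i = 1.0 / sum(row)  # assumes every document has at least one term
        deltas.append(alpha_i * sum(d * b * d for d, b in zip(row, beta)))
    return deltas


# Sample D, split row-wise: processor 0 holds three rows, processor 1 holds two.
rows_by_proc = [
    [[0, 0, 0, 1, 0, 1], [1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1]],
    [[0, 0, 1, 1, 1, 1], [1, 0, 1, 0, 0, 1]],
]

# Local step: each processor sums the columns of its own rows.
partial_sums = [[sum(col) for col in zip(*rows)] for rows in rows_by_proc]
# Exchange step: after the reduction every processor holds the same totals.
column_sums = allreduce_sum(partial_sums)
beta = [1.0 / total for total in column_sums]  # assumes no all-zero column
# Local step again: each processor computes delta for its own documents.
delta = [local_deltas(rows, beta) for rows in rows_by_proc]
```

Only vectors of length n are exchanged, so the communication volume is independent of the number of documents each processor holds.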

Number of Clusters

• Processors exchange local δ

• All processors calculate n_c

$$n_c = \sum_{i=1}^{m} \delta_i$$
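As a worked instance, take δ_i = c_ii from the diagonal of the sample C matrix shown earlier and assume that n_c is the sum of the δ_i, as in the cover-coefficient methodology:

$$n_c = 0.375 + 0.417 + 0.361 + 0.563 + 0.361 = 2.077 \approx 2$$

Under this assumption the five sample documents would be grouped around two cluster heads. Rounding to the nearest integer is an assumption here; the slide does not say how a fractional n_c is handled.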

Cluster-head Selection

• Calculate the seed powers of local documents.

• Exchange the largest n_c seed powers.

• Select the largest n_c seed powers among all P_i to find the cluster heads.

$$P_i = \delta_i \, \psi_i \sum_{j=1}^{n} d_{ij} \, \frac{\beta'_j}{m} \left(1 - \frac{\beta'_j}{m}\right)$$
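The selection can be sketched as a merge of per-processor candidate lists: the global top n_c must be contained in the union of each processor's local top n_c, so no processor needs to send more than n_c values. The function names, the `(power, doc_id)` pair layout, the illustrative seed powers and the tie-breaking toward the smaller document id are all assumptions.

```python
import heapq


def rank_key(pair):
    """Order by seed power; on ties prefer the smaller document id (an assumption)."""
    power, doc_id = pair
    return (power, -doc_id)


def local_top(pairs, n_c):
    """Largest n_c (power, doc_id) pairs held by one processor."""
    return heapq.nlargest(n_c, pairs, key=rank_key)


def select_cluster_heads(pairs_by_proc, n_c):
    """Merge every processor's local candidates and keep the global top n_c."""
    candidates = [pair for pairs in pairs_by_proc for pair in local_top(pairs, n_c)]
    return [doc_id for _, doc_id in heapq.nlargest(n_c, candidates, key=rank_key)]


# Illustrative seed powers for five documents on two processors.
pairs_by_proc = [
    [(0.469, 0), (0.486, 1), (0.692, 2)],
    [(0.984, 3), (0.692, 4)],
]
heads = select_cluster_heads(pairs_by_proc, n_c=2)
```

Every processor can run the same merge on the exchanged candidates, so all of them arrive at the same list of cluster heads without a further round of communication.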

Clustering Non-seed Docs

• Exchange seed documents

• Cluster non-seed documents (as in sequential C3M) in each processor.
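Once every processor holds the seed documents and the final β vector, the assignment of its own non-seed documents needs no further communication. The sketch below assumes the cover coefficient c_ij = α_i Σ_k d_ik β_k d_jk and uses the sample D matrix with documents 2 and 3 (zero-based) as seeds; the function names and the dictionary layout are hypothetical.

```python
def cover(row_i, row_j, beta):
    """c_ij: extent to which document i is covered by document j."""
    alpha_i = 1.0 / sum(row_i)  # assumes a non-empty document
    return alpha_i * sum(a * b * c for a, b, c in zip(row_i, beta, row_j))


def assign_local_docs(local_docs, seed_docs, beta):
    """Map each local non-seed doc id to its best seed id, or None for the ragbag."""
    assignment = {}
    for doc_id, row in local_docs.items():
        if doc_id in seed_docs:
            continue
        best = max(seed_docs, key=lambda s: cover(row, seed_docs[s], beta))
        covered = cover(row, seed_docs[best], beta) > 0.0
        assignment[doc_id] = best if covered else None
    return assignment


beta = [1 / 3, 1 / 2, 1 / 2, 1 / 2, 1.0, 1 / 4]  # reciprocal column sums of sample D
seed_docs = {2: [1, 1, 0, 0, 0, 1], 3: [0, 0, 1, 1, 1, 1]}

# Processor 0 holds documents 0 to 2, processor 1 holds documents 3 and 4.
docs_p0 = {0: [0, 0, 0, 1, 0, 1], 1: [1, 1, 0, 0, 0, 0], 2: [1, 1, 0, 0, 0, 1]}
docs_p1 = {3: [0, 0, 1, 1, 1, 1], 4: [1, 0, 1, 0, 0, 1]}

result_p0 = assign_local_docs(docs_p0, seed_docs, beta)
result_p1 = assign_local_docs(docs_p1, seed_docs, beta)
```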

Future Work

• Term Based Clustering

• Overlapping Clusters

C3M Summary

• Load balancing with cyclic block distribution

• Communication minimization by a new heuristic

• Task dependency minimized with block distribution & heuristic

$$P_i = \delta_i \, \psi_i \sum_{j=1}^{n} d_{ij} \, \frac{\beta'_j}{m} \left(1 - \frac{\beta'_j}{m}\right)$$

Questions?

The End

Thank you for your patience
