Parallel Apriori Algorithm Using MPI Congressional Voting Records Çankaya University Computer Engineering Department Ahmet Artu YILDIRIM January 2010

Parallel Apriori Algorithm Using MPI

Congressional Voting Records

Çankaya University

Computer Engineering Department

Ahmet Artu YILDIRIM

January 2010

Efficient Association Rules Mining Using MPI

Overview

• The Apriori algorithm is used to discover association rules

• Computation time is the major issue when the dataset is large

• The aim is to reduce the running time of the mining process by distributing the computation across several computers working in parallel

Apriori Algorithm (Example)

• Confidence({5}→{2,3}) = support({2,3,5}) / support({5}) = 2/3 ≈ 0.67

• Min support = 50%

• Min support count = 0.5 × 4 = 2

• Min confidence = 0.50
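The arithmetic on this slide can be reproduced with a short Python sketch. The four transactions below are an assumption, since the slide's transaction table did not survive; they were chosen only because they give the same counts as the slide (4 transactions, {5} in 3 of them, {2,3,5} in 2 of them). The helper names are illustrative and are not taken from the original C program.

```python
# Toy transaction database. ASSUMPTION: the slide's table was not preserved;
# these four transactions merely reproduce the counts quoted on the slide.
TRANSACTIONS = [
    {1, 3, 4},
    {2, 3, 5},
    {1, 2, 3, 5},
    {2, 5},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)


min_support_count = 0.5 * len(TRANSACTIONS)    # 0.5 x 4 = 2
conf = confidence({5}, {2, 3}, TRANSACTIONS)   # 2 / 3

print(min_support_count, round(conf, 2))
print(conf >= 0.50)  # does the rule {5} -> {2,3} meet the minimum confidence?
```

With a minimum confidence of 0.50, the rule {5}→{2,3} is kept because 2/3 is above the threshold.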

Technology and Methodology

• Platform: GNU/Linux 2.6.20.7 i386

• Programming language: ISO C99

• Cross-platform APIs: MPICH for the MPI implementation and GLib as the utility library

• Compiler suite: GNU toolchain

• Division Methodology:

1. Dataset division

2. Large frequent itemset division

• Dataset division is the methodology used in this work

Data Division (Merging Local Support)
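The idea named in this slide title can be sketched as follows: each process counts candidate itemsets on its own share of the dataset, and the local counts are summed into global support counts. The Python below simulates the processes with plain lists, so it is only a sketch of the technique; the function names and the round-robin split are assumptions, not details of the original program. In an MPI program, an element-wise sum of this kind is what a reduction such as MPI_Allreduce with MPI_SUM provides; the slides do not say which call the original uses.

```python
from collections import Counter


def split_dataset(transactions, n_parts):
    """Deal the transactions out round-robin, one part per simulated process."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def local_support(candidates, part):
    """Support counts of the candidates within one part of the dataset."""
    counts = Counter()
    for t in part:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def merge_support(local_counts):
    """Sum the per-process counts into global support counts."""
    total = Counter()
    for counts in local_counts:
        total.update(counts)  # Counter.update adds counts together
    return total


# Hypothetical toy data, for illustration only.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
candidates = [frozenset({2, 5}), frozenset({1, 3})]

parts = split_dataset(transactions, 2)
merged = merge_support(local_support(candidates, p) for p in parts)
print(dict(merged))
```

Because every transaction lands in exactly one part, the merged counts equal the counts a single process would obtain on the whole dataset.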

Parallel Apriori Algorithm Flowchart
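The flowchart itself is not preserved in this transcript. As a rough stand-in, the sketch below shows one common way to structure level-wise Apriori with dataset division: count candidates on each part, merge the counts, keep the frequent itemsets, and generate the next level. It is an assumption about the general shape of such an algorithm, not a reconstruction of the author's flowchart, and all names and the toy data are hypothetical.

```python
from itertools import combinations


def count_local(candidates, part):
    """Support count of each candidate within one part of the dataset."""
    return {c: sum(1 for t in part if c <= t) for c in candidates}


def generate_candidates(frequent, k):
    """k-itemsets whose every (k-1)-subset is frequent (Apriori pruning)."""
    items = sorted({i for s in frequent for i in s})
    return [
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(s) in frequent for s in combinations(c, k - 1))
    ]


def parallel_apriori(transactions, n_parts, min_count):
    """Level-wise Apriori; the 'processes' are simulated by list slices."""
    parts = [transactions[i::n_parts] for i in range(n_parts)]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    result = {}
    k = 1
    while candidates:
        local = [count_local(candidates, p) for p in parts]          # per process
        merged = {c: sum(l[c] for l in local) for c in candidates}   # global sum
        frequent = {c for c, n in merged.items() if n >= min_count}
        result.update({c: merged[c] for c in frequent})
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


# Hypothetical toy data, for illustration only.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = parallel_apriori(transactions, n_parts=2, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```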

Dataset

• 1984 United States congressional voting records

• Attribute information: party (democrat or republican) plus a yes/no vote on each of the following: handicapped infants, water project cost sharing, adoption of the budget resolution, physician fee freeze, El Salvador aid, religious groups in schools, aid to Nicaraguan contras, MX missile, immigration, synfuels corporation cutback, education spending, superfund right to sue, crime, duty-free exports, Export Administration Act South Africa

Preprocessing of Dataset

• A data transformation is applied before processing

• Each attribute value is mapped to an integer item ID, for example democrat = 1, republican = 2, handicapped infants yes = 3, handicapped infants no = 4, water project cost sharing yes = 5 …
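The numbering above can be sketched in Python. Only the five IDs listed on the slide come from the source; the sketch assumes the same pattern continues for the remaining votes (yes gets the next odd ID, no the following even ID), which the slide leaves elided. It also assumes that any answer other than "y" or "n" is skipped, which the slides do not address. Function and variable names are hypothetical.

```python
PARTY_IDS = {"democrat": 1, "republican": 2}  # from the slide


def item_id(vote_index, answer):
    """Item ID for a vote.

    ASSUMPTION: the slide's pattern continues, so vote number `vote_index`
    (0-based) maps 'y' to 3 + 2 * vote_index and 'n' to the next ID.
    """
    base = 3 + 2 * vote_index
    return base if answer == "y" else base + 1


def encode(party, answers):
    """Turn one voting record into a transaction of integer item IDs."""
    items = [PARTY_IDS[party]]
    for i, answer in enumerate(answers):
        if answer in ("y", "n"):  # ASSUMPTION: other values are skipped
            items.append(item_id(i, answer))
    return items


print(item_id(0, "y"), item_id(0, "n"), item_id(1, "y"))
print(encode("democrat", ["y", "n"]))
```

Under this assumed pattern, 16 votes would give a highest item ID of 34, which is consistent with the attributecount=34 setting in the config file.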

Config File and Run Command

Config File:

attributecount=34

transactioncount=435

minsupportpercent=50

minconfidencepercent=80

Command:

mpirun -np x -machinefile machines ./aprioriparallel
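To make the config values concrete, here is a minimal Python sketch that parses the key=value lines shown above and derives the thresholds from them. The parser and the choice to round the minimum support count up are assumptions for illustration; the slides do not show how the original C program reads the file or rounds the count.

```python
import math

# The four settings exactly as listed on the slide.
CONFIG_TEXT = """\
attributecount=34
transactioncount=435
minsupportpercent=50
minconfidencepercent=80
"""


def parse_config(text):
    """Parse simple key=value lines into a dict of integers."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = int(value)
    return config


config = parse_config(CONFIG_TEXT)

# ASSUMPTION: a fractional count is rounded up (50% of 435 is 217.5).
min_support_count = math.ceil(
    config["minsupportpercent"] / 100 * config["transactioncount"]
)
min_confidence = config["minconfidencepercent"] / 100

print(min_support_count, min_confidence)
```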

Program Output

Rules

Rules found at the 80% confidence threshold:

• Democrats support:

  • Adoption of the budget resolution

  • Aid to Nicaraguan contras

• Democrats do NOT support:

  • Physician fee freeze

Rules (cont.)

Rules found at the 80% confidence threshold:

• Those who do not support the physician fee freeze support adoption of the budget resolution

• Those who support adoption of the budget resolution do not support the physician fee freeze

Parallel Computation Speed Up

• Run on the Çankaya University wee cluster

• Processor specs: 600 MHz CPU, 250 MB RAM

• Speedup = ts / tp, where ts is the serial running time and tp is the parallel running time
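As a small illustration of the speedup formula, the Python sketch below computes ts / tp. The timings are hypothetical and are not measurements from the cluster; the slide's measured results are not preserved in this transcript.

```python
def speedup(t_serial, t_parallel):
    """Speedup = ts / tp, the ratio of serial to parallel running time."""
    if t_parallel <= 0:
        raise ValueError("parallel running time must be positive")
    return t_serial / t_parallel


# HYPOTHETICAL timings in seconds, for illustration only.
print(speedup(120.0, 40.0))
```

A value above 1 means the parallel run finished faster than the serial one.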

Conclusion

• The parallel version of the Apriori algorithm reduces running time on large datasets

• It scales by adding nodes (computers) or memory, without modifying the code

• It offers a good price-performance ratio by using less powerful computers

Thank You

Questions?