Parallel Apriori Algorithm Using MPI
Congressional Voting Records
Çankaya University
Computer Engineering Department
Ahmet Artu YILDIRIM
January 2010
Efficient Association Rules Mining Using MPI
Overview
• The Apriori algorithm is used to discover association rules
• Computation time is the major issue when the dataset is large
• The aim is to reduce the running time of the mining process by distributing the computation across multiple computers
Apriori Algorithm (Example)
• Min support = 50%
• Min support count = 0.5 × 4 = 2
• Min confidence = 0.50
• Confidence({5}→{2,3}) = support({2,3,5}) / support({5}) = 2/3 ≈ 0.67
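The arithmetic in this example can be checked with a short Python sketch. The four-transaction dataset below is an assumption: the slide's transaction table is not part of the transcript, so the transactions were chosen only so that the support counts match the numbers above (4 transactions, {5} in 3 of them, {2,3,5} in 2). The function names are hypothetical.

```python
# Assumed toy dataset (not shown in the transcript): four transactions,
# chosen so that the counts agree with the slide's numbers.
transactions = [
    {1, 3, 4},
    {2, 3, 5},
    {1, 2, 3, 5},
    {2, 5},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """support(antecedent ∪ consequent) / support(antecedent)."""
    joint = support_count(antecedent | consequent, transactions)
    return joint / support_count(antecedent, transactions)


# Min support of 50% over 4 transactions gives a count threshold of 2.
min_support_count = 0.5 * len(transactions)

# Confidence of the rule {5} -> {2,3}.
conf = confidence({5}, {2, 3}, transactions)

# {2,3,5} is frequent because its count reaches the threshold.
is_frequent = support_count({2, 3, 5}, transactions) >= min_support_count
```

With this data the rule {5}→{2,3} passes both thresholds: its itemset reaches the minimum support count and its confidence is above 0.50.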
Technology and Methodology
• Platform: GNU/Linux 2.6.20.7 i386
• Programming language: ISO C99
• Cross-platform APIs: MPICH (MPI implementation) and GLib (utility library)
• Compiler suite: GNU toolchain
• Division Methodology:
1. Dataset division
2. Large frequent itemset division
• The dataset division methodology is the one used here
Data Division (Merging Local Support)
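One common reading of dataset division with merged local support is count distribution: each process counts the candidate itemsets in its own slice of the transactions, and the local counts are summed into global counts. The sketch below simulates that in plain Python on a single machine, so it is an illustration of the idea and not the presentation's C/MPI code. All function names and the sample data are hypothetical. In an MPI program the merge step would typically be a sum reduction (for example `MPI_Reduce` or `MPI_Allreduce` with `MPI_SUM`), though the slides do not say which call was used.

```python
from collections import Counter


def split_dataset(transactions, n_parts):
    """Split the transactions into n_parts nearly equal contiguous blocks."""
    size, extra = divmod(len(transactions), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(transactions[start:end])
        start = end
    return parts


def local_support(candidates, part):
    """Support counts of the candidate itemsets within one partition."""
    counts = Counter()
    for t in part:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def merge_support(local_counts):
    """Sum the per-partition counts into global support counts."""
    total = Counter()
    for counts in local_counts:
        total.update(counts)
    return total


# Assumed sample data; each partition stands in for one process.
transactions = [frozenset(t) for t in ({1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5})]
candidates = [frozenset({2, 5}), frozenset({3, 5}), frozenset({1, 2})]

parts = split_dataset(transactions, 2)
global_counts = merge_support(local_support(candidates, p) for p in parts)
```

Because support counts are additive over disjoint partitions, the merged counts equal the counts a single process would obtain over the whole dataset.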
Parallel Apriori Algorithm Flowchart
Dataset
• 1984 United States congressional voting records
• Attribute information: party (democrat, republican) and a yes/no vote on each of: handicapped infants, water project cost sharing, adoption of the budget resolution, physician fee freeze, El Salvador aid, religious groups in schools, aid to Nicaraguan contras, MX missile, immigration, synfuels corporation cutback, education spending, superfund right to sue, crime, duty-free exports, Export Administration Act South Africa
Preprocessing of Dataset
• A data transformation is applied before processing
• Each attribute value is assigned an integer item number, e.g. democrat = 1, republican = 2, handicapped infants yes = 3, handicapped infants no = 4, water project cost sharing yes = 5 …
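The numbering above follows a regular pattern, which a small Python sketch can make explicit. The formula is inferred from the five values the slide lists and is an assumption for the remaining attributes. The function name, the `"y"`/`"n"`/`"?"` input encoding, and the choice to skip `"?"` (no recorded vote) are also assumptions for illustration.

```python
# Party items as given on the slide.
PARTY_ITEM = {"democrat": 1, "republican": 2}


def encode_record(party, votes):
    """Map one voting record to a list of integer item numbers.

    votes: sequence of "y", "n" or "?" in attribute order.
    Vote j (0-based) maps to 3 + 2*j for yes and 4 + 2*j for no,
    which reproduces the slide's values 3, 4 and 5.
    """
    items = [PARTY_ITEM[party]]
    for j, vote in enumerate(votes):
        if vote == "y":
            items.append(3 + 2 * j)
        elif vote == "n":
            items.append(4 + 2 * j)
        # "?" adds no item (assumption: missing votes are skipped)
    return items


example = encode_record("democrat", ["y", "n", "?"])
```

Under this scheme a record with 16 votes would use item numbers up to 34.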
Config File and Run Command
Config File:
attributecount=34
transactioncount=435
minsupportpercent=50
minconfidencepercent=80
Command:
mpirun -np x -machinefile machines ./aprioriparallel
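As a rough illustration of how these settings translate into thresholds, the Python sketch below parses the four `key=value` lines and derives a minimum support count and a minimum confidence. The parser, the variable names, and the choice to round the support count up are assumptions; the slides do not show how the program reads the file or rounds.

```python
import math

# The config values shown on the slide.
CONFIG_TEXT = """\
attributecount=34
transactioncount=435
minsupportpercent=50
minconfidencepercent=80
"""


def parse_config(text):
    """Parse key=value lines into a dict of integers."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = int(value)
    return config


config = parse_config(CONFIG_TEXT)

# 50% of 435 transactions is 217.5; rounding up is an assumption.
min_support_count = math.ceil(
    config["transactioncount"] * config["minsupportpercent"] / 100
)
min_confidence = config["minconfidencepercent"] / 100
```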
Program Output
Rules
Rules found at the 80% confidence threshold:
• Democrats support:
  • Adoption of the budget resolution
  • Aid to Nicaraguan contras
• Democrats do NOT support:
  • Physician fee freeze
Rules (cont.)
Rules found at the 80% confidence threshold:
• Those who do not support the physician fee freeze support adoption of the budget resolution
• Those who support adoption of the budget resolution do not support the physician fee freeze
Parallel Computation Speed Up
• Run on the Çankaya University wee cluster
• Processor specs: 600 MHz CPU, 250 MB RAM
• Speedup = ts / tp, where ts is the serial running time and tp is the parallel running time
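The speedup ratio, and the parallel efficiency often reported next to it, can be computed as in the Python sketch below. The timings are hypothetical placeholders and not measurements from the presentation, whose results chart is not part of the transcript. Efficiency (speedup divided by the number of processes) is a standard companion metric added here for illustration and does not appear on the slide.

```python
def speedup(t_serial, t_parallel):
    """Speedup = ts / tp."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Speedup divided by the number of processes."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical timings in seconds, NOT results from the slides.
t_s, t_p, n_procs = 12.0, 4.0, 4

s = speedup(t_s, t_p)
e = efficiency(t_s, t_p, n_procs)
```

A speedup equal to the number of processes (efficiency of 1.0) would be ideal linear scaling; communication and merging overhead usually keep the measured value below that.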
Conclusion
• The parallel version of the Apriori algorithm reduces running time on large datasets
• Scalability is gained by adding nodes (computers) or memory without modifying the code
• A good price-performance ratio is achieved by using less powerful computers
Thank You
Questions?