Optimizing data mining process using graphic processors
Uploaded by gurupad-hegde
![Page 1: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/1.jpg)
Optimizing Data Mining Process Using Graphic Processors
![Page 2: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/2.jpg)
![Page 3: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/3.jpg)
Data Mining: An interdisciplinary field
“Extracting Knowledge from the Data”
• Machine Learning
• Database Systems
• Statistics
• Information Science
• Pattern Recognition
![Page 4: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/4.jpg)
CRISP-DM: CRoss Industry Standard Process for Data Mining
http://www.crisp-dm.org/ (founded in 1996)
Six phases
![Page 5: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/5.jpg)
Financial data analysis
Telecommunications
Retail Industry
Healthcare and biomedical research
Web Data Mining
![Page 6: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/6.jpg)
Scalability
Dimensionality
Complex Data
Data Quality
Data Ownership
![Page 7: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/7.jpg)
![Page 8: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/8.jpg)
Architecture difference between GPU and CPU
• A GPU devotes more transistors to data processing
• A GPU is many-core (hundreds of cores)
![Page 9: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/9.jpg)
General-purpose computation using the GPU in applications “other than 3D graphics”
• Flexible and programmable: it fully supports vectorized floating-point operations at IEEE single precision
• Additional levels of programmability are emerging with every generation of GPU (about every 18 months)
• An attractive platform for general-purpose computation
![Page 10: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/10.jpg)
![Page 11: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/11.jpg)
Thread block: “a batch of threads that can cooperate together by efficiently sharing data through some fast shared memory and synchronizing their execution to coordinate memory accesses.”
Example of block ID: a block (x, y) of a grid of dimension (X, Y) has block ID x + y × X
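The block ID formula can be checked with a small host-side sketch. This is plain Python rather than device code, and the names `block_id`, `dim_x`, and `dim_y` are chosen here for illustration only; they do not come from the slides.

```python
def block_id(x: int, y: int, dim_x: int, dim_y: int) -> int:
    """Linear ID of block (x, y) in a grid of dim_x by dim_y blocks."""
    if not (0 <= x < dim_x and 0 <= y < dim_y):
        raise ValueError("block index lies outside the grid")
    # Rows are laid out one after another, so each full row adds dim_x IDs.
    return x + y * dim_x


# Enumerate a 3 x 2 grid row by row (y is the row, x is the column).
grid_ids = [[block_id(x, y, 3, 2) for x in range(3)] for y in range(2)]
print(grid_ids)  # [[0, 1, 2], [3, 4, 5]]
```

Every block receives a distinct ID in the range 0 to X × Y − 1, which is what makes the formula usable as an index into a flat array.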
![Page 12: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/12.jpg)
![Page 13: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/13.jpg)
![Page 14: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/14.jpg)
GPU Miner http://code.google.com/p/gpuminer/
SVM for Estimation of Aqueous Solubility
Data Mining on Cloud (Nov 22nd ‘10)
![Page 15: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/15.jpg)
![Page 16: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/16.jpg)
An itemset is frequent if its support is not less than a threshold specified by the user.
Thresholds:
• Minimum Confidence (in %): strength of the bond between the items of an itemset
• Minimum Support Count (a number): how many times an itemset must occur in the database
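The two thresholds can be illustrated on a toy database. The sketch below is an assumption-laden example, not taken from the slides: the transactions are made up, the function names are hypothetical, and confidence is computed with the standard rule definition, support(A ∪ B) / support(A), which the slide does not spell out.

```python
# Toy transaction database (invented for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def is_frequent(itemset, transactions, min_support_count):
    """Frequent means the support is not less than the threshold."""
    return support_count(itemset, transactions) >= min_support_count


def confidence(antecedent, consequent, transactions):
    """Standard rule confidence: support(A and B together) / support(A)."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    base = support_count(antecedent, transactions)
    return both / base if base else 0.0


print(support_count({"bread", "milk"}, transactions))                # 2
print(is_frequent({"bread", "butter", "milk"}, transactions, 2))      # False
```

With a minimum support count of 2, every pair of items in this toy database is frequent, while the full three-item set occurs only once and is not.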
![Page 17: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/17.jpg)
“If an itemset is not frequent, none of its supersets can be frequent.”
Apriori: an influential algorithm for mining frequent itemsets for association rules.
Proposed by Agrawal & Srikant @ VLDB’94
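The pruning property above is what keeps the level-wise search tractable. The following is a minimal CPU-side sketch of that idea, assuming transactions are given as sets of items; it is a simplified textbook-style version written for this transcript, not the code from the original paper or from the presenters, and the function names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {frequent itemset: support count} using level-wise search."""
    transactions = [frozenset(t) for t in transactions]

    def support_count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def keep_frequent(candidates):
        counted = {c: support_count(c) for c in candidates}
        return {c: n for c, n in counted.items() if n >= min_support_count}

    items = {item for t in transactions for item in t}
    level = keep_frequent(frozenset([item]) for item in items)
    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset.
        pruned = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = keep_frequent(pruned)
    return frequent


db = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
result = apriori(db, min_support_count=2)
print(len(result))  # 6: three single items and three pairs
```

The prune step never touches the database: a candidate is discarded purely because one of its subsets already failed the threshold at the previous level.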
![Page 18: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/18.jpg)
![Page 19: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/19.jpg)
![Page 20: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/20.jpg)
![Page 21: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/21.jpg)
Horizontal data layout
Vertical data layout
Bitmap Representation
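The three representations can be compared on one small database. The slide only names the layouts, so the exact encodings below are assumptions made for illustration: horizontal maps each transaction ID to its items, vertical maps each item to the transaction IDs containing it, and the bitmap stores one 0/1 flag per item and transaction.

```python
# Horizontal layout: transaction ID -> items (toy data, invented here).
horizontal = {
    0: ["a", "b"],
    1: ["a", "c"],
    2: ["a", "b", "c"],
    3: ["b"],
}


def to_vertical(horizontal):
    """Vertical layout: item -> list of transaction IDs that contain it."""
    vertical = {}
    for tid, items in horizontal.items():
        for item in items:
            vertical.setdefault(item, []).append(tid)
    return vertical


def to_bitmap(horizontal):
    """Bitmap: item -> one 0/1 flag per transaction, in transaction-ID order."""
    tids = sorted(horizontal)
    items = sorted({item for row in horizontal.values() for item in row})
    return {
        item: [1 if item in horizontal[tid] else 0 for tid in tids]
        for item in items
    }


vertical = to_vertical(horizontal)
bitmap = to_bitmap(horizontal)
print(vertical["b"])  # [0, 2, 3]
print(bitmap["b"])    # [1, 0, 1, 1]

# With bitmaps, the support of {a, b} is a bitwise AND followed by a count.
support_ab = sum(x & y for x, y in zip(bitmap["a"], bitmap["b"]))
print(support_ab)     # 2
```

The bitmap form turns support counting into fixed-width, branch-free operations on regular arrays, which is the property that suits data-parallel hardware.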
![Page 22: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/22.jpg)
![Page 23: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/23.jpg)
Agrawal & Srikant @ VLDB’94
![Page 24: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/24.jpg)
![Page 25: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/25.jpg)
• We have presented a GPU-based implementation of the Apriori algorithm for frequent itemset mining.
• This implementation employs a bitmap data structure to encode the transaction database on the GPU and utilizes the GPU's SIMD parallelism for support counting.
• Our implementation stores the itemsets in a bitmap and runs entirely on the GPU.
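To make the bitmap-based support counting concrete, here is a sequential Python sketch of the data-parallel pattern. It is not the presenters' GPU kernel: the 32-bit word size, the helper names, and the mapping of one word to one thread are all assumptions made for illustration.

```python
WORD_BITS = 32  # assumed word width; transaction i maps to bit i % 32 of word i // 32


def pack_bits(bits):
    """Pack a list of 0/1 flags (one per transaction) into 32-bit words."""
    words = [0] * ((len(bits) + WORD_BITS - 1) // WORD_BITS)
    for i, bit in enumerate(bits):
        if bit:
            words[i // WORD_BITS] |= 1 << (i % WORD_BITS)
    return words


def support_count(item_bitmaps):
    """AND the bitmaps of all items word by word, then count the set bits."""
    n_words = len(item_bitmaps[0])
    partial = []
    for w in range(n_words):  # on a GPU, each word could be handled by its own thread
        word = 0xFFFFFFFF
        for bitmap in item_bitmaps:
            word &= bitmap[w]
        partial.append(bin(word).count("1"))  # population count of one word
    return sum(partial)  # stands in for a parallel reduction


# 40 toy transactions: "a" is in all, "b" in every 2nd, "c" in every 4th.
a = pack_bits([1] * 40)
b = pack_bits([1 if i % 2 == 0 else 0 for i in range(40)])
c = pack_bits([1 if i % 4 == 0 else 0 for i in range(40)])
print(support_count([a, b]))     # 20
print(support_count([a, b, c]))  # 10
```

Each loop iteration over `w` is independent of the others, so the same work can be spread across many threads, with only the final sum requiring coordination.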
![Page 26: Optimizing data mining process using graphic processors](https://reader033.fdocuments.us/reader033/viewer/2022060206/55a23db61a28ab1f6e8b4642/html5/thumbnails/26.jpg)