Building k-nn Graphs From Large Text Data
Uploaded by thibault-debatty
Transcript of Building k-nn Graphs From Large Text Data
- 1. IEEE BigData 2014: Building k-nn Graphs From Large Text Data. Thibault Debatty, Pietro Michiardi, Olivier Thonnard & Wim Mees
- 2. The context: TRIAGE
- 3. The problem: the subject of a spam is more than a set of keywords
  - Rep|ica Watches For Sale: cRolex
  - Rep1ica Watches For Sale: R0lex
  - Repilca Watches For Sale: Rolex
- 4. The problem: how to build a k-nn graph from large text data using an arbitrary similarity metric?
  - Naive
  - Index
  - Locality-sensitive hashing (LSH)
  - nn-descent
- 5. NNCTPH
  - Diagram (MapReduce): SPAM 1 → CTPH* → Sig 1; SPAM 2 → CTPH* → Sig 2; then nn-descent
- 6. Experimental results
  - Dataset: 200k to 800k spam subjects
  - Tests: stages; buckets; comparison with MR (MapReduce) nn-descent; scalability
  - Measures: speed; recall
- 7. Experimental results: stages
- 8. Experimental results: buckets
- 9. Experimental results: nn-descent
- 10. Experimental results: scalability
- 11. Conclusions & future work
  - 10x faster than MR nn-descent
  - Speedup increases with the size of the dataset
  - Limited recall
  - Future: improve recall? Quality of the graph? Influence of graph quality? Compare with a bag-of-words model
- 12. Thank you!
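Slides 3 and 4 frame the task as building a k-nn graph under an arbitrary similarity metric, with the naive approach listed first. Below is a minimal Python sketch of that naive baseline, assuming a normalised Levenshtein similarity (chosen here for illustration; the slides do not name the metric) and reusing the spam subjects from slide 3. Every item is compared with every other item, so the cost is n(n-1) similarity calls: for 800k subjects that is roughly 6.4 × 10^11 comparisons.

```python
import heapq
from typing import Callable, Dict, List, Tuple

def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

Graph = Dict[int, List[Tuple[int, float]]]

def naive_knn_graph(items: List[str], k: int,
                    sim: Callable[[str, str], float] = similarity) -> Graph:
    """Exact k-nn graph: compare every pair and keep the k most similar items per node."""
    graph: Graph = {}
    for i, a in enumerate(items):
        candidates = ((j, sim(a, b)) for j, b in enumerate(items) if j != i)
        graph[i] = heapq.nlargest(k, candidates, key=lambda t: t[1])
    return graph

subjects = [
    "Rep|ica Watches For Sale: cRolex",
    "Rep1ica Watches For Sale: R0lex",
    "Repilca Watches For Sale: Rolex",
    "Cheap meds online",
]
graph = naive_knn_graph(subjects, k=2)
for node, neighbours in graph.items():
    print(subjects[node], "->", [subjects[j] for j, _ in neighbours])
```

The three obfuscated "Rolex" subjects end up as each other's neighbours even though they share few exact keywords, which is the point slide 3 makes.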
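nn-descent, listed on slide 4 and used again in the NNCTPH pipeline on slide 5, refines an approximate k-nn graph on the idea that a neighbour of a neighbour is likely to also be a neighbour. The sketch below is a simplified, single-machine version of the basic NN-Descent loop: it omits the sampling and new/old flags of the full algorithm and the MapReduce distribution that the slides call MR nn-descent. The names `nn_descent`, `max_iters` and `seed` are illustrative only.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def nn_descent(items: Sequence[T], k: int, sim: Callable[[T, T], float],
               max_iters: int = 20, seed: int = 0) -> Dict[int, List[Tuple[int, float]]]:
    """Basic NN-Descent: refine a random k-nn graph by exploring neighbours of neighbours."""
    rng = random.Random(seed)
    n = len(items)
    # knn[i] maps neighbour id -> similarity; start from k random neighbours per node.
    knn: Dict[int, Dict[int, float]] = {}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        knn[i] = {j: sim(items[i], items[j]) for j in rng.sample(others, min(k, len(others)))}

    for _ in range(max_iters):
        # General neighbours: forward links (i lists j) plus reverse links (j lists i).
        general = {i: set(knn[i]) for i in range(n)}
        for i in range(n):
            for j in knn[i]:
                general[j].add(i)
        updates = 0
        for i in range(n):
            for j in general[i]:
                for c in general[j]:                     # c is a neighbour of a neighbour
                    if c == i or c in knn[i]:
                        continue
                    s = sim(items[i], items[c])
                    if len(knn[i]) < k:
                        knn[i][c] = s
                        updates += 1
                    else:
                        worst = min(knn[i], key=knn[i].__getitem__)
                        if s > knn[i][worst]:            # replace the least similar neighbour
                            del knn[i][worst]
                            knn[i][c] = s
                            updates += 1
        if updates == 0:                                 # converged: no list changed
            break
    return {i: sorted(knn[i].items(), key=lambda t: -t[1]) for i in range(n)}

points = list(range(30))
approx = nn_descent(points, k=3, sim=lambda a, b: -abs(a - b))
```

Like the naive baseline, it only needs a similarity function, which is why it fits the "arbitrary similarity metric" requirement; unlike the naive baseline, it gives no guarantee that the result is exact.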
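Slide 5 shows each spam subject being hashed with CTPH* (a variant of context triggered piecewise hashing) into a signature inside MapReduce, followed by nn-descent, and slides 7 and 8 vary the number of stages and buckets. The slides do not spell out CTPH*, so the following is a hypothetical sketch, not the authors' algorithm. `ctph_like_signature` cuts the text at content-defined trigger points, hashes each piece with 32-bit FNV-1a, and keeps one bucket id per stage. `nnctph_like_graph` then groups items by the s-th signature value at each stage, builds a graph inside each bucket, and merges the stages by keeping the k best candidates per node. Brute force stands in for nn-descent inside the buckets, `difflib.SequenceMatcher.ratio` is an assumed similarity, and everything runs on one machine rather than on MapReduce.

```python
import heapq
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash of a byte string."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h

def ctph_like_signature(text: str, stages: int, buckets: int, window: int = 3) -> List[int]:
    """Hypothetical fixed-length CTPH-style signature: one bucket id per stage."""
    data = text.encode("utf-8")
    block = max(1, len(data) // stages)                     # aim for roughly `stages` pieces
    pieces: List[int] = []
    start = 0
    for i in range(len(data)):
        rolling = sum(data[max(0, i - window + 1): i + 1])  # hash of the last `window` bytes
        if rolling % block == block - 1:                    # content-defined trigger point
            pieces.append(fnv1a_32(data[start:i + 1]) % buckets)
            start = i + 1
    if start < len(data) or not pieces:                     # hash the trailing piece
        pieces.append(fnv1a_32(data[start:]) % buckets)
    sig = pieces[:stages]
    while len(sig) < stages:                                # pad by repeating the last id
        sig.append(sig[-1])
    return sig

def ratio(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def nnctph_like_graph(items: List[str], k: int, stages: int, buckets: int,
                      sim: Callable[[str, str], float] = ratio
                      ) -> Dict[int, List[Tuple[int, float]]]:
    sigs = [ctph_like_signature(t, stages, buckets) for t in items]
    candidates: Dict[int, Dict[int, float]] = {i: {} for i in range(len(items))}
    for s in range(stages):
        groups: Dict[int, List[int]] = defaultdict(list)
        for i, sig in enumerate(sigs):
            groups[sig[s]].append(i)                        # stage s: bucket by s-th value
        for members in groups.values():
            # Brute force inside each bucket; the slides use nn-descent at this step.
            for i in members:
                for j in members:
                    if i != j and j not in candidates[i]:
                        candidates[i][j] = sim(items[i], items[j])
    # Merge the stages: keep the k most similar candidates seen for each node.
    return {i: heapq.nlargest(k, c.items(), key=lambda t: t[1]) for i, c in candidates.items()}

items = [
    "Rep1ica Watches For Sale: R0lex",
    "Rep1ica Watches For Sale: R0lex",
    "Cheap meds online now",
    "Meeting notes for Monday",
]
graph = nnctph_like_graph(items, k=1, stages=4, buckets=8)
```

Because comparisons only happen inside a bucket, an item that shares no bucket with its true neighbours never meets them, and a node can even end up with fewer than k neighbours. More stages give more chances to co-occur at the cost of more comparisons, which is one way to read the trade-off behind the stage and bucket tests and the limited recall noted in the conclusions.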
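Slide 6 lists recall as one of the two measures, but the slides do not define it. A common definition for approximate k-nn graphs, assumed here, is the fraction of edges of the exact graph that the approximate graph also contains. The helper `graph_recall` is a hypothetical name; graphs are plain mappings from a node id to its neighbour ids.

```python
from typing import Iterable, Mapping

def graph_recall(approx: Mapping[int, Iterable[int]],
                 exact: Mapping[int, Iterable[int]]) -> float:
    """Fraction of the exact k-nn edges that also appear in the approximate graph."""
    found = total = 0
    for node, true_neighbours in exact.items():
        truth = set(true_neighbours)
        total += len(truth)
        found += len(truth & set(approx.get(node, ())))  # missing nodes count as no edges
    return found / total if total else 1.0

exact = {0: [1, 2], 1: [0, 2], 2: [1, 0]}
approx = {0: [1, 3], 1: [2, 0]}   # node 2 has no neighbours in the approximate graph
print(graph_recall(approx, exact))  # 3 of the 6 exact edges are found: 0.5
```

Computing the exact graph for the reference requires the quadratic naive method, so in practice recall is typically measured on a sample of nodes.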