Building k-nn Graphs From Large Text Data




Presentation given at the 2014 IEEE International Conference on Big Data, 28 October 2014

Transcript of Building k-nn Graphs From Large Text Data

  • 1. IEEE BigData 2014: Building k-nn Graphs From Large Text Data. Thibault Debatty, Pietro Michiardi, Olivier Thonnard & Wim Mees

2. The context: TRIAGE

3. The problem
   The subject of a spam is more than a set of keywords:
   Rep|ica Watches For Sale: cRolex
   Rep1ica Watches For Sale: R0lex
   Repilca Watches For Sale: Rolex

4. The problem
   How to build a k-nn graph from large text data using an arbitrary similarity metric?
   Candidate approaches: naive, index, locality-sensitive hashing (LSH), nn-descent

5. NNCTPH
   [Diagram, MapReduce: SPAM 1 → CTPH* → Sig 1; SPAM 2 → CTPH* → Sig 2; signatures → nn-descent]

6. Experimental results
   Dataset: 200k to 800k spam subjects
   Tests: stages, buckets, comparison with MapReduce (MR) nn-descent, scalability
   Measures: speed, recall

7. Experimental results: stages

8. Experimental results: buckets

9. Experimental results: nn-descent

10. Experimental results: scalability

11. Conclusions & future work
   10x faster than MR nn-descent
   Speedup increases with the size of the dataset
   Limited recall
   Future work: improve recall? Quality of the graph? Influence of graph quality? Compare with a bag-of-words model

12. Thank you!
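Slide 3's point can be made concrete with a quick check. This is an illustrative sketch only. It uses Levenshtein (edit) distance as one example of a metric that tolerates character-level obfuscation. The slides do not say which similarity metric the authors used. If the three subjects are split into keywords, they share only the generic words "Watches", "For" and "Sale:". Compared character by character, however, each pair is at most three edits apart.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions turning a into b (dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to every prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


# Obfuscated spam subjects from slide 3
subjects = [
    "Rep|ica Watches For Sale: cRolex",
    "Rep1ica Watches For Sale: R0lex",
    "Repilca Watches For Sale: Rolex",
]

# Keyword view: the words shared by all three are only the generic ones
shared_keywords = set.intersection(*(set(s.split()) for s in subjects))
print("shared keywords:", sorted(shared_keywords))

# Character view: every pair stays within a few edits
for i in range(len(subjects)):
    for j in range(i + 1, len(subjects)):
        print(i, j, levenshtein(subjects[i], subjects[j]))
```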
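The "naive" approach listed on slide 4 is the exact baseline. For each item, it compares the item with every other item using the chosen similarity function and keeps the k most similar. The following is a minimal Python sketch. It assumes the similarity is any function of two items where a larger value means more similar. The names `naive_knn_graph` and `string_similarity` are hypothetical, and difflib's ratio is only an example metric.

```python
import difflib
import heapq
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def naive_knn_graph(items: Sequence[T], k: int,
                    similarity: Callable[[T, T], float]) -> Dict[int, List[int]]:
    """Exact k-nn graph by brute force.

    Each node is compared with every other node, so the cost is
    n * (n - 1) similarity evaluations: quadratic in the number of items.
    """
    graph: Dict[int, List[int]] = {}
    for i, x in enumerate(items):
        scored = ((similarity(x, y), j) for j, y in enumerate(items) if j != i)
        # keep the k highest-scoring neighbours, best first
        graph[i] = [j for _, j in heapq.nlargest(k, scored)]
    return graph


def string_similarity(a: str, b: str) -> float:
    """Example metric: difflib's ratio in [0, 1] (1.0 means identical)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


subjects = ["Rep|ica Watches For Sale: cRolex",
            "Rep1ica Watches For Sale: R0lex",
            "Cheap meds online",
            "Cheap med5 online now"]
print(naive_knn_graph(subjects, k=1, similarity=string_similarity))
```

At the top of the dataset range used in the experiments (800k subjects), this means roughly 3.2 × 10^11 distinct pairs. That count is why the approximate approaches on the same slide are of interest.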
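Slides 4 and 5 rely on nn-descent (Dong, Charikar and Li, WWW 2011). The algorithm starts from a random k-nn graph and refines it repeatedly, assuming that a neighbour of a neighbour is likely to be a neighbour. The following is a simplified, single-machine sketch built around the "local join" step. It omits the sampling and new/old flags of the published algorithm. It stops when an iteration makes no update, whereas the published version uses a small threshold. It is not the MapReduce implementation the slides compare against. The function name and parameters are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def nn_descent(items: Sequence[T], k: int,
               similarity: Callable[[T, T], float],
               max_iters: int = 20, seed: int = 0) -> Dict[int, List[int]]:
    """Simplified nn-descent: approximate k-nn graph, neighbours best first."""
    n = len(items)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < len(items)")
    rng = random.Random(seed)

    # Start from a random graph: neighbours[i] maps neighbour id -> similarity
    neighbours: List[Dict[int, float]] = []
    for i in range(n):
        sample = rng.sample([j for j in range(n) if j != i], k)
        neighbours.append({j: similarity(items[i], items[j]) for j in sample})

    def update(i: int, j: int) -> int:
        """Put j in i's list if it beats the current worst; return 1 if changed."""
        if i == j or j in neighbours[i]:
            return 0
        s = similarity(items[i], items[j])
        worst = min(neighbours[i], key=neighbours[i].__getitem__)
        if s <= neighbours[i][worst]:
            return 0
        del neighbours[i][worst]
        neighbours[i][j] = s
        return 1

    for _ in range(max_iters):
        # Reverse neighbours: u is a reverse neighbour of v if v is in u's list
        reverse: List[List[int]] = [[] for _ in range(n)]
        for u in range(n):
            for v in neighbours[u]:
                reverse[v].append(u)
        changes = 0
        for v in range(n):
            # Local join: items close to v are probably close to each other
            local = set(neighbours[v]) | set(reverse[v])
            for a in local:
                for b in local:
                    if a < b:
                        changes += update(a, b) + update(b, a)
        if changes == 0:  # converged: no list improved in this pass
            break

    return {i: sorted(nb, key=nb.__getitem__, reverse=True)
            for i, nb in enumerate(neighbours)}


if __name__ == "__main__":
    gen = random.Random(1)
    points = [gen.uniform(0, 100) for _ in range(200)]
    graph = nn_descent(points, k=5, similarity=lambda a, b: -abs(a - b))
    print(graph[0])
```

Each update replaces a list's worst entry with a strictly better one, so the lists only improve. The result is approximate, however: nothing guarantees that every true neighbour is found.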
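Slide 5 shows only two things. First, each spam subject is turned into a signature by CTPH*, a modification the slides do not define. CTPH is context-triggered piecewise hashing, the technique behind ssdeep. Second, the signatures feed nn-descent running on MapReduce. Slides 7 and 8 test "stages" and "buckets", so one plausible reading is that signature symbols serve as bucket ids. Under that reading, nn-descent would only run inside small groups. The sketch below illustrates that reading only, under explicit assumptions. The rolling window sum, the chunk hash and the functions `ctph_like_signature` and `bucket_by_stage` are hypothetical simplifications, not the authors' CTPH*.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def ctph_like_signature(text: str, block_size: int = 4, window: int = 3,
                        buckets: int = 8) -> List[int]:
    """Hypothetical CTPH-style signature (a simplification, not ssdeep or CTPH*).

    A rolling value over the last `window` characters decides where chunks
    end: a trigger fires when value % block_size == block_size - 1, the same
    trigger rule ssdeep applies to its rolling hash. Each chunk is hashed to
    one symbol in range(buckets), so a local edit only alters the symbols of
    nearby chunks, although it can shift the positions of later symbols if it
    adds or removes a trigger.
    """
    signature: List[int] = []
    chunk_hash, chunk_len = 0, 0
    for i, ch in enumerate(text):
        chunk_hash = (chunk_hash * 31 + ord(ch)) % 1_000_003  # hash of current chunk
        chunk_len += 1
        rolling = sum(ord(c) for c in text[max(0, i - window + 1):i + 1])
        if rolling % block_size == block_size - 1:  # context trigger: close chunk
            signature.append(chunk_hash % buckets)
            chunk_hash, chunk_len = 0, 0
    if chunk_len:  # trailing chunk that ended without a trigger
        signature.append(chunk_hash % buckets)
    return signature


def bucket_by_stage(texts: Sequence[str], stages: int = 2,
                    **sig_args: int) -> List[Dict[int, List[int]]]:
    """Hypothetical partitioning: at stage s, texts whose s-th signature symbol
    is equal share a bucket. A k-nn algorithm (e.g. nn-descent) would then run
    inside each bucket, and the per-stage results would be merged."""
    signatures = [ctph_like_signature(t, **sig_args) for t in texts]
    partitions: List[Dict[int, List[int]]] = []
    for s in range(stages):
        groups: Dict[int, List[int]] = defaultdict(list)
        for idx, sig in enumerate(signatures):
            if s < len(sig):  # very short texts may have fewer symbols
                groups[sig[s]].append(idx)
        partitions.append(dict(groups))
    return partitions


subjects = ["Rep|ica Watches For Sale: cRolex",
            "Rep1ica Watches For Sale: R0lex",
            "Repilca Watches For Sale: Rolex"]
for stage, groups in enumerate(bucket_by_stage(subjects, stages=2)):
    print("stage", stage, groups)
```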
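Slide 6 lists recall as one of the two measures, but the slides do not give its formula. This sketch assumes the usual definition for an approximate k-nn graph: the fraction of edges in the exact graph that the approximate graph also contains. The function name `graph_recall` is hypothetical. Both graphs are given as node -> list of neighbours.

```python
from typing import Dict, List


def graph_recall(approx: Dict[int, List[int]],
                 exact: Dict[int, List[int]]) -> float:
    """Fraction of true k-nn edges (from `exact`) also present in `approx`."""
    found = total = 0
    for node, true_neighbours in exact.items():
        truth = set(true_neighbours)
        found += len(truth & set(approx.get(node, [])))
        total += len(truth)
    return found / total if total else 1.0


exact = {0: [1, 2], 1: [0, 2], 2: [1, 0]}
approx = {0: [1, 3], 1: [0, 2], 2: [3, 0]}
print(graph_recall(approx, exact))  # 4 of the 6 exact edges are recovered
```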