Investigating the Effectiveness of E-mail Spam Image Data for Phone Spam Image Detection Using Scale Invariant Feature Transform Image Descriptor
So Yeon Kim, Yenewondim Biadgie, Kyung-Ah Sohn
Department of Information and Computer Engineering, Ajou University
Motivation
Image spam on mobile phones
Image spam is rapidly increasing, replacing text spam
Image spam detection is also needed on mobile phones!
Challenges
Data (mobile phone)
66 spam images, 405 non-spam images
Training data: 377 images (80%)
Test data: 94 images (20%)
The dataset is too small
Hard to train a model
Data (E-mail)
1. Image Spam Hunter (2008): 929 spam, 810 non-spam
2. Dredze et al. (2007): 3,299 spam, 2,021 non-spam
3. TREC Spam Track (2005): 60,339 spam, 165,954 non-spam (TREC 06 and TREC 07 are also available)
It would be better to use these large datasets
How to use e-mail image data?
They are very different from phone spam images
How to use e-mail image data?
But some images look similar to phone spam images
How to use e-mail image data?
Similarity measure
Methods
Data acquisition
Smart phone spam images
E-mail spam images
Phone images: 66 spam images
E-mail images: 929 spam images
RGB histogram feature vector for each image
Distance similarity matrix between phone images and e-mail images
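The similarity step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slides do not state the number of histogram bins or the distance function, so the 4 bins per channel and the Euclidean distance used here are assumptions, and the function names are hypothetical.

```python
import math

def rgb_histogram(pixels, bins_per_channel=4):
    """Normalized per-channel RGB histogram (bin count is an assumption)."""
    hist = [0.0] * (3 * bins_per_channel)
    bin_width = 256 / bins_per_channel
    for r, g, b in pixels:
        for channel, value in enumerate((r, g, b)):
            idx = min(int(value / bin_width), bins_per_channel - 1)
            hist[channel * bins_per_channel + idx] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]

def euclidean(u, v):
    """Euclidean distance (assumed; the slides only say 'distance')."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(phone_feats, email_feats):
    """Rows index phone images, columns index e-mail images."""
    return [[euclidean(p, e) for e in email_feats] for p in phone_feats]

# Toy images given as lists of (R, G, B) pixels.
red = [(255, 0, 0)] * 4
dark_red = [(200, 10, 10)] * 4
blue = [(0, 0, 255)] * 4

phone = [rgb_histogram(red)]
email = [rgb_histogram(dark_red), rgb_histogram(blue)]
D = distance_matrix(phone, email)
```

In this toy case the dark red e-mail image falls into the same bins as the red phone image, so its distance is zero, while the blue image ends up far away.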
Data acquisition
K-means clustering
Most similar e-mail images: 353 images
Phone images + most similar e-mail images → Phone + Email Spam Image Dataset
Total 419 spam images
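One possible reading of this selection step is sketched below. The slides do not say how the clusters are used to pick the 353 e-mail images, so the rule here (cluster the e-mail feature vectors, then keep the cluster whose centroid lies closest to the mean phone feature vector) is an assumption, as are the function names, k = 2, and the deterministic initialization.

```python
import math

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: recompute each non-empty cluster's centroid.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = mean(members)
    return centroids, labels

def select_similar(email_feats, phone_feats, k=2):
    """Indices of e-mail images in the cluster nearest the phone images."""
    centroids, labels = kmeans(email_feats, k)
    phone_center = mean(phone_feats)
    best = min(range(k), key=lambda c: dist(centroids[c], phone_center))
    return [i for i, l in enumerate(labels) if l == best]

# Toy feature vectors standing in for RGB histograms.
email_feats = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0]]
phone_feats = [[1.0, 0.9], [0.95, 1.0]]
selected = select_similar(email_feats, phone_feats)
```

The selected e-mail images would then be merged with the phone spam images to form the combined spam set.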
Data acquisition
                          Spam   Non-spam
Phone Spam Dataset          66        405
Image Spam Hunter (08)     353          -
Total                      419        405
Feature Extraction
Input image → PHOW feature extraction → K-means clustering → 500 visual word dictionary construction
• Dense grayscale SIFT
• Much faster than SIFT
Visual dictionary
Visual word
VLFeat library is used for implementation
Feature Extraction
Spatial histogram
KD-tree vector quantization
Histogram(bag) of visual words
VLFeat library is used for implementation
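The quantization and histogram steps can be illustrated with a small sketch. It is not the VLFeat code path: a brute-force nearest-neighbour search stands in for the KD-tree, the spatial histogram is left out, and the 3-word dictionary and 2-dimensional descriptors are toy values (the slides use a 500-word dictionary built from PHOW descriptors). The function names are hypothetical.

```python
def nearest_word(descriptor, dictionary):
    """Index of the closest visual word (brute force instead of a KD-tree)."""
    return min(
        range(len(dictionary)),
        key=lambda w: sum((a - b) ** 2
                          for a, b in zip(descriptor, dictionary[w])),
    )

def bag_of_words(descriptors, dictionary):
    """Normalized histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(dictionary)
    for d in descriptors:
        hist[nearest_word(d, dictionary)] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

# Toy dictionary of 3 visual words and 4 local descriptors from one image.
dictionary = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.0, 0.1]]
hist = bag_of_words(descriptors, dictionary)
```

The resulting histogram is the fixed-length image descriptor that a classifier can consume, whatever the number of local descriptors found in the image.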
Image classification
Image descriptor
SVM classification
spam
non-spam
VLFeat library is used for implementation
Evaluation
                          Spam   Non-spam
Phone Spam Dataset          66        405
Image Spam Hunter (08)     353          -
Total                      419        405
Training set
Test set
E-mail Phone
5-fold cross validation
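A minimal sketch of a 5-fold split over the 471 phone images (66 spam + 405 non-spam) follows. The slides do not say whether the images were shuffled or stratified, or exactly how the e-mail images enter each fold, so this only shows the index bookkeeping; the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Return k (train_indices, test_indices) pairs over range(n_samples)."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

folds = k_fold_indices(471, k=5)
test_sizes = [len(test) for _, test in folds]
```

Each fold holds out roughly 20% of the phone images, which matches the 377 / 94 split quoted earlier.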
Results
               k-means   random (50%)
Accuracy        96.39%         95.12%
Sensitivity     94.07%         89.45%
Specificity     96.79%         96.05%
F-measure       87.94%         83.80%
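For reference, the four measures can be computed from confusion-matrix counts as sketched below, using the standard definitions with spam as the positive class. The counts in the example are hypothetical and are not taken from the experiment; the function name is also an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics; assumes every denominator is non-zero."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall on the spam class
    specificity = tn / (tn + fp)   # recall on the non-spam class
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }

# Hypothetical counts for illustration only.
m = classification_metrics(tp=8, fp=2, tn=88, fn=2)
```

With an imbalanced test set such as this one, accuracy and specificity can stay high while the F-measure is noticeably lower, because the F-measure depends only on how the spam class is handled.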
Results
False positives False negatives
Conclusion
We addressed the problem of data acquisition for phone spam image classification by using an e-mail image dataset.
E-mail spam images selected by a similarity measure are quite effective for phone spam image classification.
As the e-mail dataset grows, it will contain many kinds of feature groups.
→ A more precise clustering algorithm could be useful in future work.
Thank you!