Investigating the Effectiveness of E-mail Spam Image Data for Phone Spam Image Detection Using Scale...

Investigating the Effectiveness of E-mail Spam Image Data for Phone Spam Image Detection Using Scale Invariant Feature Transform Image Descriptor So Yeon Kim, Yenewondim Biadgie, Kyung-Ah Sohn Department of Information and Computer Engineering, Ajou University


Page 1

Investigating the Effectiveness of E-mail Spam Image Data for Phone Spam Image Detection Using Scale Invariant Feature Transform Image Descriptor

So Yeon Kim, Yenewondim Biadgie, Kyung-Ah Sohn

Department of Information and Computer Engineering, Ajou University

Page 2

Motivation

Page 3

Image spam on mobile phones

Image spam is rapidly increasing, taking the place of text spam

Image spam detection is also needed on mobile phones!

Page 4

Challenges

Page 5

Data (mobile phone)

66 spam images, 405 non-spam images

Training data: 377 images (80%)

Test data: 94 images (20%)

The dataset is too small

It is hard to train a model

Page 6

Data (E-mail)

1. Image Spam Hunter (2008): 929 spam, 810 non-spam

2. Dredze et al. (2007): 3,299 spam, 2,021 non-spam

3. TREC Spam Track (2005): 60,339 spam, 165,954 non-spam. TREC 06 and TREC 07 are also available.

It would be better to use these large datasets

Page 7

How to use e-mail image data?

They are very different from phone spam images

Page 8

How to use e-mail image data?

But some images look similar to phone spam images

Page 9

How to use e-mail image data?

Similarity measure

Page 10

Methods

Page 11

Data acquisition

Smartphone spam images: 66 spam images

E-mail spam images: 929 spam images

Each image is represented by an RGB histogram feature vector

Distance similarity matrix (phone image × e-mail image)
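
The RGB histogram features and the phone-by-e-mail distance matrix named on this slide can be sketched as follows. The slides do not give the bin count or the distance function, so the 4 bins per channel and the Euclidean distance below are assumptions, and the function names are illustrative only.

```python
import math

def rgb_histogram(pixels, bins=4):
    """Normalized, concatenated R/G/B histogram for a list of (r, g, b) pixels (0-255)."""
    hist = [0.0] * (3 * bins)
    width = 256 // bins  # assumes bins divides 256 evenly
    for r, g, b in pixels:
        for channel, value in enumerate((r, g, b)):
            hist[channel * bins + value // width] += 1.0
    total = len(pixels)
    return [count / total for count in hist]  # each channel block sums to 1

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(phone_features, email_features):
    """Rows index phone images, columns index e-mail images."""
    return [[euclidean(p, e) for e in email_features] for p in phone_features]

# Toy "images" given as flat pixel lists (assumed data, not from the slides)
red_image = [(255, 0, 0)] * 4
dark_red_image = [(200, 10, 10)] * 4
blue_image = [(0, 0, 255)] * 4

phone = [rgb_histogram(red_image)]
email = [rgb_histogram(dark_red_image), rgb_histogram(blue_image)]
D = distance_matrix(phone, email)
```

With coarse bins the two reddish images fall into the same bins, so their distance is zero, while the blue image is far from the red one.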

Page 12

Data acquisition

K-means clustering

Most similar e-mail images: 353 images

Phone images + most similar e-mail images → Phone + E-mail Spam Image Dataset

Total: 419 spam images
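
The slides do not say how the k-means output is turned into the set of "most similar" e-mail images. One possible reading, shown here purely as an assumption, is to summarize each e-mail image by its mean distance to the phone images, cluster those values into two groups, and keep the group with the smaller center. The helper names and the toy distances are hypothetical.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Plain Lloyd's k-means on scalars (k >= 2); centers start evenly spaced between min and max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return labels, centers

def select_similar_emails(distance_matrix):
    """distance_matrix[i][j] = distance from phone image i to e-mail image j."""
    n_phone = len(distance_matrix)
    n_email = len(distance_matrix[0])
    mean_dist = [sum(row[j] for row in distance_matrix) / n_phone for j in range(n_email)]
    labels, centers = kmeans_1d(mean_dist, k=2)
    nearest_cluster = min(range(2), key=lambda c: centers[c])
    return [j for j, lab in enumerate(labels) if lab == nearest_cluster]

# Two phone images, five e-mail images (assumed toy distances)
D = [
    [0.1, 0.9, 0.2, 1.0, 0.15],
    [0.2, 1.1, 0.1, 0.8, 0.25],
]
selected = select_similar_emails(D)
```

Here e-mail images 0, 2 and 4 sit close to the phone images and end up in the retained cluster.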

Page 13

Data acquisition

                          Spam   Non-spam
Phone Spam Dataset          66        405
Image Spam Hunter (08)     353          -
Total                      419        405

Page 14

Feature Extraction

Input image → PHOW feature extraction → K-means clustering → 500-visual-word dictionary construction

PHOW features:

• Dense grayscale SIFT

• Much faster than SIFT

Visual dictionary, visual word

VLFeat library is used for implementation

Page 15

Feature Extraction

Spatial histogram

KD-tree vector quantization

Histogram (bag) of visual words

VLFeat library is used for implementation
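
The quantization and spatial histogram steps can be illustrated without VLFeat. The sketch below is a pure-Python stand-in, not the VLFeat API: it uses a brute-force nearest-word search where the slides use a KD-tree (which speeds up the same lookup), toy 2-D descriptors with a 3-word dictionary instead of SIFT descriptors with 500 words, and an assumed 2×2 spatial grid.

```python
def nearest_word(descriptor, dictionary):
    """Index of the closest visual word (squared Euclidean distance, brute force)."""
    def sq_dist(word):
        return sum((a - b) ** 2 for a, b in zip(descriptor, word))
    return min(range(len(dictionary)), key=lambda i: sq_dist(dictionary[i]))

def spatial_histogram(keypoints, dictionary, grid=2):
    """keypoints: list of (x, y, descriptor) with x, y normalized to [0, 1).

    Returns one bag-of-visual-words histogram per grid cell,
    concatenated and L1-normalized over the whole vector.
    """
    n_words = len(dictionary)
    hist = [0.0] * (grid * grid * n_words)
    for x, y, descriptor in keypoints:
        cell = int(y * grid) * grid + int(x * grid)  # row-major cell index
        hist[cell * n_words + nearest_word(descriptor, dictionary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Assumed toy dictionary and keypoints
dictionary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
keypoints = [
    (0.1, 0.1, (0.1, 0.0)),  # top-left cell, closest to word 0
    (0.2, 0.3, (0.9, 0.1)),  # top-left cell, closest to word 1
    (0.7, 0.8, (0.1, 0.8)),  # bottom-right cell, closest to word 2
    (0.9, 0.6, (0.0, 0.9)),  # bottom-right cell, closest to word 2
]
descriptor_vector = spatial_histogram(keypoints, dictionary)
```

The resulting vector has one block of word counts per grid cell, so it keeps coarse layout information that a single global histogram would lose.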

Page 16

Image classification

Image descriptor

SVM classification

spam

non-spam

VLFeat library is used for implementation
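
The slides only state that VLFeat's SVM was used; the kernel, solver and parameters are not given. As a rough illustration of the classification step, here is a minimal linear SVM trained with Pegasos-style subgradient descent on the hinge loss. The regularization value, the cyclic pass over the data and the toy 2-D descriptors are all assumptions.

```python
def train_linear_svm(samples, labels, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the regularized hinge loss.

    samples: list of feature sequences; labels: +1 (spam) or -1 (non-spam).
    A constant 1.0 is appended to each sample so the last weight acts as a bias.
    """
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(data, labels):  # cyclic pass instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regularization)
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, sample):
    score = sum(wi * xi for wi, xi in zip(w, list(sample) + [1.0]))
    return 1 if score >= 0 else -1

# Assumed toy image descriptors: +1 = spam, -1 = non-spam
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
predictions = [predict(w, x) for x in X]
```

In the pipeline on these slides, the inputs would be the histogram-of-visual-words descriptors rather than 2-D points.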

Page 17

Evaluation

                          Spam   Non-spam
Phone Spam Dataset          66        405
Image Spam Hunter (08)     353          -
Total                      419        405

Training set

Test set

E-mail Phone

5-fold cross validation
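
A 5-fold split can be sketched as below. The slides do not spell out how the folds were drawn or how the e-mail images were allocated between training and test sets, so this shows only a plain contiguous split over the 471 phone images; shuffling or stratifying beforehand would be a reasonable addition. The function name is illustrative.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds of near-equal size."""
    # The first (n_samples % k) folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# 471 = 66 spam + 405 non-spam phone images
folds = list(k_fold_indices(471, k=5))
test_sizes = [len(test) for _, test in folds]
```

Each image appears in exactly one test fold, and every fold holds roughly 20% of the data, in line with the 80%/20% split quoted earlier for the phone dataset.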

Page 18

Results

               k-means   random (50%)
Accuracy        96.39%         95.12%
Sensitivity     94.07%         89.45%
Specificity     96.79%         96.05%
F-measure       87.94%         83.80%
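
For reference, the four reported measures can be computed from confusion-matrix counts as follows. The sketch assumes spam is the positive class and that F-measure means the F1 score; the counts in the example are made up for illustration and are not taken from the slides.

```python
def classification_metrics(tp, fp, tn, fn):
    """Metrics from confusion-matrix counts, with spam as the positive class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # fraction of spam images detected
    specificity = tn / (tn + fp)  # fraction of non-spam images kept
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }

# Hypothetical counts, not the study's results
m = classification_metrics(tp=8, fp=2, tn=88, fn=2)
```

Because non-spam images greatly outnumber spam images in the phone dataset, accuracy and specificity can stay high even when a noticeable share of spam is missed, which is why sensitivity and F-measure are reported alongside them.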

Page 19

Results

Example images: false positives and false negatives

Page 20

Conclusion

We tried to solve the problem of data acquisition for phone spam image classification using an e-mail image dataset.

Using e-mail spam image data selected by a similarity measure is quite effective for phone spam image classification.

As the e-mail dataset grows, it will contain many kinds of feature groups.

→ A more precise clustering algorithm could be useful in future work.

Page 21

Thank you!