HoneySpam 2.0: Profiling Web Spambot Behaviour
Pedram Hayati, Kevin Chai, Vidyasagar Potdar, Alex Talevski
Prof. Tharam Dillon, Prof. Elizabeth Chang
Digital Ecosystem and Business Intelligence Institute (DEBII)
Agenda
• Introduction
• Background
  – Taxonomy of Spam 2.0 and Web Spambots
  – Current Literature Techniques
• HoneySpam 2.0 Architecture
  – Navigation Component
  – Form Tracking Component
  – Deploying HoneySpam 2.0
• Experimental Results
• Related Work
• Conclusion and Future Work
Web Spambot
• A type of web robot (Internet bot)
• Distributes spam content in Web 2.0 applications
• Scope
  – Application-specific
  – Website-specific
Countermeasures
• CAPTCHA
• Hashcash
• Form variation
• Nonce (see the sketch after the notes below)
1. Decreases user convenience and increases the complexity of human–computer interaction.
2. As programs become better at deciphering CAPTCHAs, the images may become difficult for humans to decipher.
3. As computers become more powerful, they will eventually be able to decipher CAPTCHAs better than humans.
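The slides list a nonce only by name, so the following is a minimal, non-authoritative Python sketch of the general idea: the server embeds a signed, time-limited token in each rendered form and rejects submissions that do not echo a valid token back. All names here (SECRET, issue_nonce, verify_nonce) are hypothetical, not the authors' implementation.

```python
import hashlib
import hmac
import os
import time

SECRET = os.urandom(32)  # hypothetical per-deployment secret key

def issue_nonce() -> str:
    """Return a value to embed as a hidden form field when the page is rendered."""
    ts = str(int(time.time()))
    sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"

def verify_nonce(nonce: str, max_age: int = 600) -> bool:
    """Accept a submission only if its nonce is genuine and recent.

    A production implementation would also store issued nonces and
    invalidate each one after a single use.
    """
    try:
        ts, sig = nonce.split(":")
        age = time.time() - int(ts)
    except ValueError:
        return False
    expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and 0 <= age <= max_age
```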
Web 2.0 Submission Workflow
HoneySpam 2.0
• Monitors and tracks web spambots
• Based on the idea of honeypots
• Implicitly tracks (server-side sketch below):
  – Click-stream
  – Page navigation
  – Keyboard activity
  – Mouse movement
  – Page scrolling
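The deck does not include the tracking code itself (the published tool was integrated into PHP web applications), so the following is only an illustrative Python/Flask sketch of the server-side navigation-tracking idea: every page request is logged against a session identifier. The name navigation_log and the record fields are assumptions.

```python
import uuid
from datetime import datetime, timezone

from flask import Flask, request, session

app = Flask(__name__)
app.secret_key = "replace-me"   # required for session cookies
navigation_log = []             # a real deployment would write to a database table

@app.before_request
def track_navigation():
    """Record one click-stream entry per page request, keyed by session."""
    if "sid" not in session:
        session["sid"] = uuid.uuid4().hex  # new visitor: assign a tracking id
    navigation_log.append({
        "session": session["sid"],
        "ip": request.remote_addr,
        "path": request.path,
        "user_agent": request.headers.get("User-Agent", ""),
        "time": datetime.now(timezone.utc).isoformat(),
    })
```

Keyboard activity, mouse movement, and scrolling would additionally need a small client-side script that posts those events back to a logging endpoint; that part is omitted here.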
HoneySpam 2.0 in Action!
[Charts: No. of Posts vs. Date; No. of Users vs. Date; No. of Online Users vs. Date; No. of Spambots vs. Hits]
HoneySpam 2.0 in Action!
[Charts: No. of Sessions vs. Dwell (Visit) Time; No. of Spambots vs. Return Visits]
Web Spambot Behaviour
• Use search engines to find target websites
• Create numerous user accounts
• Low webpage hit and revisit rates
• Distribute spam content in a short period of time
• No web form interaction
• Randomly generated usernames
(A sketch of summarising these behaviours as per-session features follows.)
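As one way to make these observations concrete, here is a hypothetical Python sketch that summarises a navigation log of the kind sketched earlier into per-session behavioural features (page hits, dwell time, form use). The record layout and function name are assumptions, not the authors' implementation.

```python
from collections import defaultdict
from datetime import datetime

def extract_session_features(navigation_log, form_events):
    """Summarise tracked sessions into behavioural features for later classification.

    navigation_log: list of dicts with "session", "path" and ISO-8601 "time" keys.
    form_events:    list of dicts with a "session" key for every observed
                    keyboard/mouse interaction with a web form.
    """
    by_session = defaultdict(list)
    for entry in navigation_log:
        by_session[entry["session"]].append(entry)
    interacted = {event["session"] for event in form_events}

    features = {}
    for sid, entries in by_session.items():
        times = sorted(datetime.fromisoformat(e["time"]) for e in entries)
        features[sid] = {
            "page_hits": len(entries),
            "distinct_pages": len({e["path"] for e in entries}),
            "dwell_time_s": (times[-1] - times[0]).total_seconds(),
            "used_web_form": sid in interacted,
        }
    return features
```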
Conclusion
• HoneySpam 2.0 is a framework to monitor and track web spambot behaviour
• Integrated into popular open-source web applications
• Web spambots:
  – use search engines to find target websites,
  – create numerous user accounts,
  – distribute spam content in a short amount of time,
  – do not revisit the website,
  – do not interact with web forms,
  – and register with randomly generated usernames
Future work: apply machine learning, e.g. a Self-Organising Map (SOM) neural network, to the extracted behavioural features to classify web spambots (see the sketch below).
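The slides only name a Self-Organising Map as future work; as a non-authoritative illustration, the sketch below trains a tiny SOM from scratch (NumPy only) on a matrix of per-session feature vectors, so that sessions mapping to the same map unit can be inspected as candidate spambot clusters. The grid size and learning schedule are arbitrary assumptions.

```python
import numpy as np

def train_som(data, grid=(5, 5), epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small Self-Organising Map on a (n_sessions, n_features) array."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    for t in range(n_steps):
        x = data[rng.integers(len(data))]
        # best-matching unit for this sample
        bmu = np.unravel_index(np.argmin(np.linalg.norm(weights - x, axis=-1)), (h, w))
        # linearly decayed learning rate and neighbourhood radius
        frac = t / n_steps
        lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 1e-3
        # Gaussian neighbourhood pulls units near the BMU toward the sample
        dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
        weights += lr * np.exp(-dist2 / (2 * sigma**2))[..., None] * (x - weights)
    return weights

def map_units(weights, data):
    """Return the best-matching unit (row, col) for each feature vector."""
    return [np.unravel_index(np.argmin(np.linalg.norm(weights - x, axis=-1)),
                             weights.shape[:2]) for x in data]
```

Map units whose assigned sessions show spambot-like values (many accounts, short posting bursts, no form interaction) could then be flagged for review.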