Rule-Based On-the-fly Web Spambot Detection Using Action Strings
RULE-BASED ON-THE-FLY WEB SPAMBOT DETECTION USING ACTION STRINGS
Pedram Hayati ([email protected]), Vidyasagar Potdar, Alex Talevski & William F. Smyth
What is Spam 2.0
“Propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications”
www.antispamresearchlab.com
How does Spam 2.0 work
Web spambot (spambot): a web crawler or tool that navigates the WWW with the sole purpose of planting unsolicited content on external Web 2.0 applications
How is Spam 2.0 currently managed
Flood control, nonce, Hash-Cash, email validation
Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA)
Problem
CAPTCHA
  Decreases human users’ convenience
  Computers are becoming powerful enough to decipher it
Content-based solutions (opinion spam, social spam, video spam, etc.)
  Focused on one particular form of spam
  Do not give satisfactory results
Solution Idea
Main assumption: human web usage behaviour is intrinsically different from spambot behaviour.
Web usage data
  User click-stream
  Widely used
  Two additional attributes
    Session ID
    Username
Solution: Action
Action
  Models web usage data as a behavioural model
  A set of user efforts to achieve a certain purpose
  A suitable discriminative feature for modelling user behaviour
  Extendible to many other Web 2.0 platforms
Example: the "register a new user account" action
  1. User navigates to the registration page
  2. User fills in the registration form fields
  3. User clicks the submit button
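The idea of turning click-stream records into actions can be sketched as follows. This is only an illustration under stated assumptions: the rule table `ACTION_RULES`, the request paths, and the record fields (`session_id`, `username`, `method`, `path`) are hypothetical and do not come from the slides, and the three user steps are approximated by the two requests a server would actually log (loading the form, then submitting it).

```python
from collections import defaultdict

# Hypothetical action definitions: each action is an ordered list of
# (HTTP method, path) requests. The paths are illustrative only.
ACTION_RULES = {
    "register_account": [("GET", "/register"), ("POST", "/register")],
    "post_reply": [("GET", "/reply"), ("POST", "/reply")],
}


def group_by_session(clickstream):
    """Group click-stream records by (session ID, username), keeping order."""
    sessions = defaultdict(list)
    for record in clickstream:
        key = (record["session_id"], record["username"])
        sessions[key].append((record["method"], record["path"]))
    return dict(sessions)


def extract_actions(requests):
    """Scan one session's requests left to right and emit each matched action."""
    actions = []
    i = 0
    while i < len(requests):
        for name, steps in ACTION_RULES.items():
            if requests[i:i + len(steps)] == steps:
                actions.append(name)
                i += len(steps)
                break
        else:
            i += 1  # request is not part of any known action; skip it
    return actions


clickstream = [
    {"session_id": "s1", "username": "alice", "method": "GET", "path": "/register"},
    {"session_id": "s1", "username": "alice", "method": "POST", "path": "/register"},
    {"session_id": "s1", "username": "alice", "method": "GET", "path": "/index"},
    {"session_id": "s1", "username": "alice", "method": "GET", "path": "/reply"},
    {"session_id": "s1", "username": "alice", "method": "POST", "path": "/reply"},
]
sessions = group_by_session(clickstream)
session_actions = extract_actions(sessions[("s1", "alice")])
```

Grouping on session ID and username reflects the two additional attributes the slides attach to the click-stream; the matching strategy itself is a simplification.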
Solution: Action String
Action string: a sequence of actions written in alphabetical format, with each action represented by a letter
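A minimal sketch of the encoding step, assuming one uppercase letter per distinct action. The action names and the helper functions `build_alphabet` and `to_action_string` are hypothetical; the slides do not say how letters are assigned.

```python
import string


def build_alphabet(action_names):
    """Assign one uppercase letter to each distinct action (sorted for stability)."""
    names = sorted(set(action_names))
    if len(names) > len(string.ascii_uppercase):
        raise ValueError("more than 26 distinct actions; a larger alphabet is needed")
    return {name: letter for name, letter in zip(names, string.ascii_uppercase)}


def to_action_string(actions, alphabet):
    """Concatenate the letters of a session's actions in the order they occurred."""
    return "".join(alphabet[action] for action in actions)


# Illustrative action names, not taken from the slides.
alphabet = build_alphabet(["view_topic", "register_account", "post_reply"])
action_string = to_action_string(
    ["register_account", "view_topic", "post_reply", "post_reply"], alphabet
)
```

With this assignment, `post_reply` maps to `A`, `register_account` to `B`, and `view_topic` to `C`, so the session above becomes `BCAA`.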
Solution: Trie
Trie
  A tree-structured way to store and retrieve information
  Ease of updating and handling
  Shorter access time
  Removes redundancies
We store action strings in a trie data structure
  Fast on-the-fly pattern matching
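A trie over action strings might look like the sketch below. It assumes each node counts how many spambot and human training strings pass through it; that bookkeeping, and the names `TrieNode`, `ActionTrie`, `insert`, and `lookup`, are assumptions for illustration rather than the authors' design.

```python
class TrieNode:
    """One letter position in the trie, with per-class visit counts (assumed)."""

    def __init__(self):
        self.children = {}
        self.spambot_count = 0
        self.human_count = 0


class ActionTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, action_string, is_spambot):
        """Add a labelled training string, updating counts along its path."""
        node = self.root
        for letter in action_string:
            node = node.children.setdefault(letter, TrieNode())
            if is_spambot:
                node.spambot_count += 1
            else:
                node.human_count += 1

    def lookup(self, prefix):
        """Return (spambot_count, human_count) for a prefix, or None if unseen."""
        node = self.root
        for letter in prefix:
            node = node.children.get(letter)
            if node is None:
                return None
        return node.spambot_count, node.human_count


trie = ActionTrie()
trie.insert("ABC", is_spambot=True)
trie.insert("ABD", is_spambot=True)
trie.insert("AC", is_spambot=False)
```

Shared prefixes such as `AB` are stored once, which is the redundancy removal the slide mentions, and a lookup costs one step per letter of the prefix.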
Solution: Framework
Performance Measurement
Matthews Correlation Coefficient (MCC)
  Among the best performance measures for binary classification
  Considers true and false positives and negatives and returns a value between -1 and +1
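For reference, the standard MCC definition can be computed from the four confusion-matrix counts as sketched below. Returning 0 when the denominator is zero is a common convention and an assumption here; the slides do not say how that case is handled.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column total is zero
    return numerator / denominator
```

A perfect classifier scores +1, a classifier that is always wrong scores -1, and one that does no better than chance scores 0.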
Experiment
Data set
  No publicly available collection
  Spambot data from our HoneySpam 2.0 project
  Human data from an active forum
  16594 entries
    11039 spambot records
    5555 human records
Test
  Five random datasets (DS1 to DS5)
  2/3 for building the trie structure
  1/3 for testing
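One possible reading of the test setup is five independent random shuffles of the records, each split 2/3 for building the trie and 1/3 for testing. The sketch below follows that reading; the seeding scheme and the helper names `split_two_thirds` and `make_datasets` are assumptions, and the integer records are placeholders for real entries.

```python
import random


def split_two_thirds(records, seed):
    """Shuffle a copy of the records and split it 2/3 (build) and 1/3 (test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def make_datasets(records, count=5):
    """Create DS1..DSn, each from a differently seeded shuffle (assumed scheme)."""
    return {f"DS{i}": split_two_thirds(records, seed=i) for i in range(1, count + 1)}


# Placeholder records standing in for the 16594 entries.
records = list(range(16594))
datasets = make_datasets(records)
build_part, test_part = datasets["DS1"]
```

With 16594 entries, each split gives 11062 records for building the trie and 5532 for testing.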
Experiment: On-The-Fly Detection
Simulate real-world practice, where user action strings grow over time
The system creates action strings as the actions happen
Make a window over the test action strings
  Run our classifier
  Increase the window’s size
Aim: identify a spambot in the fewest possible actions
[Figure: window over the action string A B C D E F G]
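The growing-window procedure can be sketched as a loop that classifies ever longer prefixes and stops at the first decision. Several things here are assumptions: the window is taken to be anchored at the start of the string, the default sizes 2 to 10 follow the range reported in the results, and `classify` is a hypothetical callback that returns a label or `None` when it cannot decide yet.

```python
def detect_on_the_fly(action_string, classify, min_window=2, max_window=10):
    """Grow a window from the start of the string until the classifier decides.

    Returns (label, window_size); label is None if no window produced a decision.
    """
    upper = min(max_window, len(action_string))
    for size in range(min_window, upper + 1):
        label = classify(action_string[:size])
        if label is not None:
            return label, size
    return None, upper


def toy_classify(prefix):
    """Stand-in classifier with made-up rules, for illustration only."""
    known = {"ABC": "spambot", "AD": "human"}
    return known.get(prefix)


result = detect_on_the_fly("ABCDEFG", toy_classify)
```

In this toy run the window `AB` gives no decision and `ABC` is flagged, so the call returns `("spambot", 3)`: the user is identified after three actions.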
Experiment: Results
Window size ranges from 2 to 10 characters
Threshold ranges from -0.05 to 0.05
Experiment: Discussion
The system predicts better the longer a user uses it.
Performance stays the same beyond a certain window size
  Datasets are randomly selected
  The same happens for the accuracy of the results
Conclusion
Quite a young area of research.
Current work
  Focused on one particular type of spam.
Our aim: detect web spambots as a source of spam problems on the Web 2.0 platform.
  Based on web usage behaviour
  Formulated into actions => action strings
  On-the-fly detection: using a trie
Result: average accuracy of 93%
THANK YOU!
Pedram Hayati ([email protected])
Vidyasagar Potdar ([email protected])
Alex Talevski ([email protected])
William F. Smyth ([email protected])
http://www.antispamresearchlab.com
http://debii.curtin.edu.au
http://www.curtin.edu.au
Appendix: Related Works
Tan et al.: web robot navigational patterns, such as session length and the set of visited webpages, are different from those of humans.
Park et al.: malicious web robot detection based on the types of requests for web objects and the existence of mouse/keyboard activity.
Göbel et al.: interaction with spam botnet controllers.
Yu et al. and Yiqun et al.: distinguish spam webpages from legitimate webpages by employing user web access logs.
Appendix: Future Works
Compare different performance measurement techniques.
Develop an adaptive solution
Experiment on different platforms (e.g. datasets)