Rule-Based On-the-fly Web Spambot Detection Using Action Strings

RULE-BASED ON-THE-FLY WEB SPAMBOT DETECTION USING ACTION STRINGS Pedram Hayati ([email protected]) Vidyasagar Potdar, Alex Talevski & William F. Smyth

Page 1: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

RULE-BASED ON-THE-FLY WEB SPAMBOT DETECTION USING ACTION STRINGS
Pedram Hayati ([email protected])
Vidyasagar Potdar, Alex Talevski & William F. Smyth

Page 2: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

What is Spam 2.0

“Propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications”

www.antispamresearchlab.com

Page 3: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

How does Spam 2.0 work

Web Spambots (Spambot)
  A web crawler/tool that navigates the WWW with the sole purpose of planting unsolicited content on external Web 2.0 applications

Page 4: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

How is Spam 2.0 currently managed

Flood control, Nonce, Hash-Cash, Email validation

Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA)

Page 5: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Problem

CAPTCHA
  Decreases human users’ convenience
  Computers are getting more powerful at deciphering it

Content-based solutions (Opinion Spam, Social Spam, Video Spam etc.)
  Focussed on one particular form of spam
  Do not produce satisfactory results

Page 6: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Solution Idea

Main assumption: human web usage behaviour is intrinsically different from spambot behaviour.

Web usage data
  User click-stream
  Widely used
  Two additional attributes: Session ID, Username

Page 7: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Solution: Action

Action
  Models web usage data as a behavioural model
  A set of user efforts to achieve a certain purpose
  A suitable discriminative feature for modelling user behaviour
  Extendible to many other Web 2.0 platforms

Example: the "register a new user account" action
  1. User navigates to the registration page
  2. User fills in the registration form fields
  3. User clicks the submit button
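As a rough illustration of how an action could be recognised in click-stream data, the sketch below checks whether one session's requests contain the steps of the registration example in order. The `Request` record, the `REGISTER_ACTION` pattern and the URL paths are hypothetical and not taken from the slides. Filling in form fields happens in the browser, so this sketch assumes only the page visit and the form submission are visible in the server-side click-stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """One click-stream entry (hypothetical fields)."""
    session_id: str
    username: str
    method: str
    path: str


# Hypothetical pattern: visit the registration page, then submit the form.
REGISTER_ACTION = [("GET", "/register"), ("POST", "/register")]


def contains_action(requests, pattern):
    """Return True if the requests contain every step of the pattern, in order."""
    steps = iter((r.method, r.path) for r in requests)
    # `step in steps` advances the iterator, so matches must appear in order.
    return all(step in steps for step in pattern)


session = [
    Request("s1", "guest", "GET", "/index"),
    Request("s1", "guest", "GET", "/register"),
    Request("s1", "guest", "POST", "/register"),
]
print(contains_action(session, REGISTER_ACTION))
```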

Page 8: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Solution: Action String

Action String
  A sequence of actions in alphabetical format (each action is written as a letter)
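A minimal sketch of one possible encoding, assuming each distinct action is simply given the next unused capital letter. The action names and the first-seen letter assignment are illustrative assumptions; the slides do not say how letters are assigned.

```python
import string


def build_alphabet(action_names):
    """Map each distinct action name to a letter (A, B, C, ...) in first-seen order."""
    alphabet = {}
    for name in action_names:
        if name not in alphabet:
            if len(alphabet) >= len(string.ascii_uppercase):
                raise ValueError("more than 26 distinct actions")
            alphabet[name] = string.ascii_uppercase[len(alphabet)]
    return alphabet


def to_action_string(session_actions, alphabet):
    """Encode one session's ordered actions as a string of letters."""
    return "".join(alphabet[name] for name in session_actions)


# Hypothetical session: the action names are made up for illustration.
session_actions = ["register", "login", "post_topic", "login", "post_topic"]
alphabet = build_alphabet(session_actions)
print(to_action_string(session_actions, alphabet))  # prints ABCBC
```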

Page 9: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Solution: Trie

Trie
  A way to store and retrieve information in the form of a tree structure
  Ease of updating and handling
  Shorter access time
  Removes redundancies

We build a Trie from the action strings for fast on-the-fly pattern matching
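The following sketch shows how action strings might be stored in a trie so that any prefix can be looked up in time proportional to its length. Keeping per-node spambot and human counts is an assumption made here for illustration; the slides do not state what each node holds, and `ActionTrie` is a hypothetical name.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # letter -> TrieNode
        self.spam = 0       # training strings from spambots passing through this node
        self.human = 0      # training strings from humans passing through this node


class ActionTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, action_string, is_spam):
        """Add one labelled action string, updating counts along its path."""
        node = self.root
        for letter in action_string:
            node = node.children.setdefault(letter, TrieNode())
            if is_spam:
                node.spam += 1
            else:
                node.human += 1

    def lookup(self, prefix):
        """Return (spam, human) counts for a prefix, or None if it was never seen."""
        node = self.root
        for letter in prefix:
            node = node.children.get(letter)
            if node is None:
                return None
        return node.spam, node.human


trie = ActionTrie()
trie.insert("ABC", is_spam=True)
trie.insert("ABD", is_spam=True)
trie.insert("AEF", is_spam=False)
print(trie.lookup("AB"))  # prints (2, 0)
```

Shared prefixes such as "AB" are stored once, which is the redundancy removal the slide refers to.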

Page 10: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Solution: Framework

Page 11: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Solution: Framework

Page 12: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Performance Measurement

Matthews Correlation Coefficient (MCC)
  One of the best performance measures for binary classification
  Considers true and false positives and negatives and returns a value between -1 and +1.
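For reference, MCC is computed from the four confusion-matrix counts as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A small sketch follows; returning 0 when the denominator is zero is a common convention and an assumption here, not something stated in the slides.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Undefined case (an empty row or column); 0 is used by convention.
        return 0.0
    return (tp * tn - fp * fn) / denominator


print(mcc(tp=10, tn=10, fp=0, fn=0))   # perfect prediction: 1.0
print(mcc(tp=0, tn=0, fp=10, fn=10))   # total disagreement: -1.0
```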

Page 13: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Experiment

Data Set
  No publicly available collection
  Spambot data from our HoneySpam 2.0 project
  Human data from an active forum
  16594 entries: 11039 spambot records, 5555 human records

Test
  Five random datasets (DS1 to DS5)
  2/3 for building up the Trie structure
  1/3 for testing
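One way such a split could be produced is sketched below. The slides do not say how the five datasets were drawn, so the use of five independent seeded shuffles, the function name `make_split` and the placeholder records are all assumptions.

```python
import random


def make_split(records, seed):
    """Shuffle a copy of the records and split it 2/3 (Trie building) and 1/3 (test)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


# Placeholder records standing in for the 16594 labelled entries.
records = list(range(16594))
datasets = [make_split(records, seed) for seed in range(5)]  # DS1 to DS5
train, test = datasets[0]
print(len(train), len(test))  # prints 11062 5532
```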

Page 14: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Experiment: On-The-Fly Detection

Simulate real-world practice, where user action strings grow over time
  System creates action strings as they happen
  Make a window over the test action strings
  Run our classifier
  Increase the window’s size

Aim: identify a spambot in the least number of actions

[Figure: a window growing over the action string A B C D E F G]
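The growing-window procedure can be sketched as follows, assuming the window always starts at the first action and grows one action at a time. The `is_spambot` callback is a hypothetical stand-in for the classifier, and the default bounds of 2 and 10 follow the window sizes reported on the results slide.

```python
def first_detection(action_string, is_spambot, min_window=2, max_window=10):
    """Return the smallest window size at which the classifier flags a spambot.

    Returns None if no window up to max_window (or the string length) is flagged.
    """
    upper = min(max_window, len(action_string))
    for size in range(min_window, upper + 1):
        window = action_string[:size]  # the actions observed so far
        if is_spambot(window):
            return size
    return None


# Toy classifier for illustration only: flags any window that starts with "ABC".
def toy_classifier(window):
    return window.startswith("ABC")


print(first_detection("ABCDEFG", toy_classifier))  # prints 3
```

A smaller return value means the spambot was identified after fewer actions, which is the stated aim.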

Page 15: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Experiment: Results

Window size ranges from 2 to 10 characters
Threshold ranges from -0.05 to 0.05

Page 16: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Experiment: Results

Page 17: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Experiment: Discussion

System can predict better as the user uses the system over time.

Performance remains the same beyond a certain window size
  Datasets are randomly selected
  The same happens for the accuracy of results

Page 18: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Conclusion

Quite young area of research.

Current work: focussed on one particular type of spam.

Our aim: detect web spambots as a source of spam problems on the Web 2.0 platform.
  Based on web usage behaviour
  Formulated into Actions => Action String
  On-the-fly detection: using Trie

Result: average accuracy of 93%

Page 19: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

THANK YOU!

Pedram Hayati ([email protected])
Vidyasagar Potdar ([email protected])
Alex Talevski ([email protected])
William F. Smyth ([email protected])

http://www.antispamresearchlab.com
http://debii.curtin.edu.au
http://www.curtin.edu.au

Page 20: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Appendix: Related Works

Tan et al.: web robot navigational patterns, such as session length and the set of visited webpages, are different from those of humans.

Park et al.: malicious web robot detection based on types of requests for web objects and existence of mouse/keyboard activity.

Göbel et al.: interaction with spam botnet controllers.

Yu et al. and Yiqun et al.: distinguish spam webpages from legitimate webpages by employing user web access logs.

Page 21: Rule-Based On-the-fly Web Spambot Detection Using Action Strings

Appendix: Future Works

Compare different performance measurement techniques.

Develop an adaptive solution

Experiment on different platforms (e.g. datasets)