PhishDef: URL Names Say It All

Post on 19-Feb-2016





PhishDef: URL Names Say It All

Anh Le, Athina Markopoulou

University of California, Irvine, USA

Michalis Faloutsos

University of California, Riverside, USA


What is Phishing?

Anh Le - UC Irvine - PhishDef 2

• Social engineering and technical means to steal consumers’ personal identity, data, etc.

• Causes billions of dollars in losses annually

Most Targeted Industry Sectors, 2nd Quarter ’10

• Financial: 33.1%

• Payment Services (percentage missing)

• Classifieds: 6.6%

• Auction: 5.5%

• Gaming: 4.6%

• Social Networking: 2.8%

• Government: 1.3%

• ISP: 1.2%

• Other: 3.4%

Example of a Phishing Site


Current Protection


• Google Safe Browsing

• Microsoft Smart Screen

• Third-Party

Current Protection Model


Motivation: Blacklist-based protection is reactive; it cannot protect against zero-day phishing

Google Safe Browsing

Outline

o Phishing Background

o Motivation

o Our proposal
    o New Protection Model
    o Learning Algorithms
    o Dataset
    o Feature Selection
    o Evaluation Results

o Concluding Remarks

Our Proposed Protection Model

• Main challenges: accuracy and classification latency

• Which classification algorithm works best?

• Which set of features works best?

Prior Work

o Whittaker et al. [NDSS ’10]
    o Google Safe Browsing

o Ma et al. [SIGKDD ’09]
    o Batch-based Classification

o Ma et al. [ICML ’09]
    o Batch-based vs. Online Learning

Server-Side Classification

Main Contributions

o New Protection Model:
    o Client-side classification

o Propose using Adaptive Regularization of Weights (AROW)
    o High accuracy
    o Resilient to noise

o Set of Lexical Features
    o Fast to extract at client side
    o Obfuscation resistant

Machine Learning Algorithms

• Batch-based Support Vector Machine

• Online Perceptron

• Confidence-Weighted (CW) [Dredze et al., ICML 2008]

• Adaptive Regularization of Weights (AROW) [Crammer et al., NIPS 2009]

Online Classification

• Maintain a weight vector and use it for classification

• Online Perceptron

[Equation annotations: "Trained Beforehand", "Extracted in Real Time"; the "Client Side" and "Server Side" formulas are missing from the transcript]
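The idea of keeping a weight vector and using it to classify can be sketched with a minimal online perceptron. This is only an illustration under assumptions: features are sparse dictionaries, labels are +1 (phishing) and -1 (benign), and the names `predict`, `perceptron_update`, and the feature keys are hypothetical, not taken from the slides.

```python
from collections import defaultdict


def predict(weights, features):
    """Classify a sparse feature vector: +1 (phishing) or -1 (benign)."""
    score = sum(weights[name] * value for name, value in features.items())
    return 1 if score >= 0 else -1


def perceptron_update(weights, features, label):
    """Standard perceptron rule: change the weights only after a mistake."""
    if predict(weights, features) != label:
        for name, value in features.items():
            weights[name] += label * value
        return True   # a mistake was made and corrected
    return False


# Toy stream of (features, label) pairs; the feature names are hypothetical.
stream = [
    ({"token:paypal": 1.0, "host_has_ip": 1.0}, 1),
    ({"token:wikipedia": 1.0}, -1),
    ({"token:paypal": 1.0, "host_has_ip": 1.0}, 1),
]

weights = defaultdict(float)
mistakes = sum(perceptron_update(weights, x, y) for x, y in stream)
```

In this sketch `predict` is the cheap step that could run for every URL, while `perceptron_update` is the training step that adjusts the weights whenever a labeled example is misclassified.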

Online Classification

• Confidence-Weighted (CW): minimum change, enough to correct last mistake

• Adaptive Regularization of Weights (AROW): minimum change, penalty for mistake, increasing confidence
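A rough sketch of an AROW learner may help make the three ingredients concrete. It follows the closed-form update as published by Crammer et al. (NIPS 2009), written here from memory with dense lists and a full covariance matrix; the class name `AROW`, the parameter default `r=1.0`, and the toy inputs are assumptions for illustration, not the authors' implementation.

```python
class AROW:
    """Adaptive Regularization of Weights with a full covariance matrix."""

    def __init__(self, dim, r=1.0):
        self.r = r                     # regularization parameter, r > 0
        self.mu = [0.0] * dim          # mean weight vector
        # Covariance starts as the identity: equal uncertainty everywhere.
        self.sigma = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]

    def margin(self, x):
        return sum(m * xi for m, xi in zip(self.mu, x))

    def predict(self, x):
        return 1 if self.margin(x) >= 0 else -1

    def update(self, x, y):
        """Process one labeled example (x, y) with y in {+1, -1}."""
        m = self.margin(x)
        if m * y >= 1.0:
            return                     # no hinge loss, so no change
        n = len(x)
        sigma_x = [sum(row[j] * x[j] for j in range(n)) for row in self.sigma]
        v = sum(xi * sx for xi, sx in zip(x, sigma_x))   # x^T Sigma x
        beta = 1.0 / (v + self.r)
        alpha = max(0.0, 1.0 - y * m) * beta
        # Mean update: move just enough to reduce the loss on this example.
        self.mu = [mu_i + alpha * y * sx
                   for mu_i, sx in zip(self.mu, sigma_x)]
        # Covariance update: shrink uncertainty along the direction of x.
        self.sigma = [[self.sigma[i][j] - beta * sigma_x[i] * sigma_x[j]
                       for j in range(n)] for i in range(n)]


model = AROW(dim=2)
model.update([1.0, 0.0], 1)
model.update([0.0, 1.0], -1)
```

The mean update corresponds to the "minimum change" and "penalty for mistake" terms, and the shrinking covariance to "increasing confidence". Because the loss is a soft penalty governed by `r` rather than a hard constraint, a single mislabeled example cannot force an arbitrarily large correction.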

Dataset

o Phishing URLs
    o PhishTank (4,082)
    o MalwarePatrol (2,001)

o Benign URLs
    o Open directory (4,012)
    o Yahoo directory (4,143)

o Time period: June 2010

Feature Selection

o Lexical Features

o External Features
    o Country, AS number, registration date, registrant, registrar, etc.
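To illustrate why lexical features are fast to extract, here is a small sketch that derives features from the URL string alone, with no network lookups. The slides do not list the exact features used, so every feature below (lengths, dot count, IP-address host, "@" sign, bag-of-words tokens) and the name `lexical_features` are assumptions chosen for illustration; the example URL is made up.

```python
import re
from urllib.parse import urlparse


def lexical_features(url):
    """Extract simple lexical features from a URL string (no network access)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    path = parsed.path or ""
    features = {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "host_dot_count": float(host.count(".")),
        # Host written as a dotted-quad IPv4 address instead of a name.
        "host_is_ip": 1.0 if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) else 0.0,
        "has_at_sign": 1.0 if "@" in url else 0.0,
    }
    # Bag-of-words tokens, kept separate for hostname and path.
    for token in re.split(r"[^A-Za-z0-9]+", host):
        if token:
            features["host_token:" + token.lower()] = 1.0
    for token in re.split(r"[^A-Za-z0-9]+", path):
        if token:
            features["path_token:" + token.lower()] = 1.0
    return features


example = lexical_features("http://192.168.0.1/paypal/login.php")
```

External features such as registration date or AS number would instead require queries to remote services, which is where the extra latency comes from.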


Evaluation Results: Lexical vs. Full Features

Lexical features alone are better suited than full features for client-side phishing classification

Trade-offs of full features:

(+) ~1% gain

(-) Dependency on remote server

(-) Avg. latency: 1.64 s

Evaluation Results: CW vs. AROW

AROW is more resilient to noise than CW
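The slides do not say how noise was introduced. One common way to probe noise resilience, assumed here purely for illustration, is to flip a fixed fraction of the training labels before learning and then compare error rates. The helper name `flip_labels`, the 10% rate, and the toy label list are hypothetical.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of labels (+1/-1) with a fraction noise_rate flipped."""
    rng = random.Random(seed)
    n_flip = int(round(noise_rate * len(labels)))
    # Pick distinct positions so exactly n_flip labels change.
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [-y if i in flip_idx else y for i, y in enumerate(labels)]


clean = [1] * 50 + [-1] * 50
noisy = flip_labels(clean, noise_rate=0.10)
changed = sum(a != b for a, b in zip(clean, noisy))
```

Training each learner on `noisy` while measuring mistakes against `clean` would give one way to compare how much each algorithm degrades.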


Conclusion: PhishDef

o Client-side phishing classification system
    o Proactive, on-the-fly classification of zero-day phishing URLs

o Low delay on the client side (ms), high accuracy (97%)

o Resilient to noisy data

o Future Work:
    o Develop an add-on for Firefox



Evaluation Results: Batch-Based vs. Online Learning

Online learning outperforms batch-based learning for phishing classification

Chrome 11 > Firefox 4
