Flipping 419 Cybercrime Scams: Targeting the Weak and the Vulnerable

Gibson Mba*, Jeremiah Onaolapo#, Gianluca Stringhini#, Lorenzo Cavallaro*

*Royal Holloway, University of London
#University College London

WWW2017 CyberSafety Workshop // Perth, Australia // April 4, 2017

Sounds legit, right?

Source -- http://www.geekculture.com/joyoftech/joyarchives/898.html

419 scams

● Advance Fee Fraud

● “419” refers to the section of Nigeria’s Criminal Code that outlaws such scams

● Been around for some time

● Most previous work focuses on cybercrime targeting the US, EU, and Asia [1, 2, 3]

● What about cybercrime targeting Africa?

○ Little attention

○ Hence our study

[1] R. Anderson et al., Measuring the Cost of Cybercrime, WEIS, 2012.

[2] N. Christin et al., Dissecting one click frauds, CCS, 2010.

[3] B. Stone-Gross et al., Your botnet is my botnet: Analysis of a botnet takeover, CCS, 2009.

Contributions

● Highlight a unique form of scam targeting vulnerable Nigerian students, secondary school leavers, and unemployed persons, among others

● Provide insight into common themes around which fraudsters build their scam schemes

○ We rely on Machine Learning (ML) techniques to achieve this

○ Themes -- Academic, Employment, Spirituality, Dating, Other

Our roadmap

● Dataset description

● Ground truth extraction

● Automatic data classification

● Validation

● Clustering

Data sources

Our goal -- Collect and analyze data to understand scam schemes

● Topix.com’s Nigeria forum

● 2005 -- posts on news and current affairs

● 2012 onwards -- scam posts show up and grow

● Sheds light on 419 scams perpetrated against Nigerians

● Hosts posts promoting different types of scam services

○ Also contact information (mostly phone numbers) to reach the fraudsters

More data

Supplementary data sourced from

● http://www.adsafrica.com

● http://www.123nigeria.com/

● http://forumng.com/

Data collection (Jan. 2012 -- Nov. 2013)

Total posts 711,861

Posts with phone numbers 598,572

Total unique posts 589,956

Total unique authors 37,948

Distinct locations 613

Distinct phone numbers 12,425

Example post

Growth of posts

Increase

● 218 posts in Jan. 2012

● 142,344 posts in Sep. 2013

Sharp drop

● After Sep. 2013

● Corresponds to the time lecturers called off a six-month strike

● Students resort to scams because they have nothing else to do?

Dataset preparation

Our goal -- Determine if a post is a scam or not

● We selected 663 posts from the total of 711,861 posts

○ Random sampling without replacement

○ Confidence level 95%

○ Error rate 5%

● We augmented the data with an additional 372 random non-scam samples

○ To address the “Imbalanced Dataset” problem [1]

○ I.e., we used the over-sampling approach to balance the dataset

[1] N. V. Chawla et al., SMOTE: Synthetic Minority Over-sampling Technique, JAIR, 2002.
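The two sampling steps on this slide (a simple random sample without replacement, then a top-up of the under-represented class) can be sketched as follows. This is a minimal illustration, not the authors' code: the toy corpus, the labels, the seeds, and the helper names `draw_sample` and `top_up` are all assumptions made for the example.

```python
import random

def draw_sample(items, k, seed=0):
    """Simple random sample of k items without replacement."""
    return random.Random(seed).sample(items, k)

def top_up(sample, pool, label_of, minority, extra, seed=1):
    """Append `extra` minority-class items from the pool that are not already sampled."""
    taken = set(sample)
    candidates = [x for x in pool if label_of[x] == minority and x not in taken]
    return sample + random.Random(seed).sample(candidates, extra)

# Hypothetical toy corpus: 1,000 post ids, 5% of them labelled not_scam.
label_of = {i: ("not_scam" if i % 20 == 0 else "is_scam") for i in range(1000)}
pool = list(label_of)

sample = draw_sample(pool, 100)                        # initial random sample
balanced = top_up(sample, pool, label_of, "not_scam", extra=30)  # add minority posts
```

In practice the labels would only be known after manual inspection; the dictionary here stands in for that step.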

Heuristics to build ground truth

Our goal -- Pick scam/not-scam posts to build ground truth data for our classifier

● Any post offering to give assistance to any candidate to gain admission into any institution of learning is a scam

● Any post offering any form of fun services, e.g., sex for money, is a fraud

● Any post offering any form of assistance for jobs or employment is a scam

● Offers of spiritual/religious assistance, e.g., prayers, illuminati membership, magical powers, healing, are scams

● And more heuristics

Automatic data classification

Our goal -- Identify scam posts on the forum

Obstacle -- Too many posts; we can’t identify all crime posts manually

Solution -- Rely on supervised ML techniques

● Binary classification task {is_scam, not_scam}

● Trained a Support Vector Machine (SVM) [1] using the ground truth dataset

● Evaluation -- 5-fold cross validation (Accuracy 95.17%)

Pipeline: Posts → TF-IDF [2] → Features → SVM → {is_scam, not_scam}

[1] T. Joachims, Text categorization with Support Vector Machines: Learning with many relevant features, ECML, 1998.

[2] G. Salton and C. Buckley, Term-weighting approaches in automatic text retrieval, IPM Journal, 1988.
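The feature step of this pipeline, TF-IDF weighting, can be sketched in a few lines of standard-library Python. The slides do not say which TF-IDF variant or tokenizer was used, so the formula below (term frequency normalised by document length, times log(N / document frequency)), the whitespace tokenizer, and the three example posts are assumptions for illustration only.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Made-up example posts, not taken from the dataset.
posts = [
    "call now for admission assistance",
    "call now for job assistance",
    "news about the election today",
]
vectors = tf_idf(posts)
```

Terms that occur in fewer posts (here "admission") receive a higher weight than terms shared across posts (here "call"), which is what makes the resulting vectors useful as classifier input.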

Results of automatic data classification

● Applied SVM model on entire dataset

○ 711,861 minus the 1,035 posts used for training (710,826 posts)

● 679,222 (95.55% of the posts) -- YES (in other words, is_scam)

● 31,604 (4.45% of the posts) -- NO (in other words, not_scam)

● Conclusion -- The forum is a crime hub used by scammers to advertise schemes to deceive and exploit their victims

Validation

5-fold validation

Metric SVM

Accuracy 95.17%

Precision 96.54%

Recall 95.80%

Specificity 94.14%

F1 96.16%

Error 4.83%
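For reference, the six metrics reported for the binary classifier follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts passed in are hypothetical and are not taken from the slides, which report only the final percentages.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
        "error": (fp + fn) / total,  # equals 1 - accuracy
    }

# Hypothetical counts for illustration only.
m = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

With 5-fold cross validation, these values are typically computed per fold and then averaged, so the reported figures need not correspond to a single set of integer counts.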

Automatic data classification

Our goal -- Identify the theme of each scam post

Obstacle -- Too many posts (679,222)

Solution -- Rely on supervised ML techniques

● We manually checked 655 scam posts (training set) and identified five themes

● Multi-class classification task {Academic, Employment, Spirituality, Dating, Other}

Pipeline: Posts → TF-IDF → Features → SVM → {A, E, S, D, O}

Results of multi-class classification

Class Posts % Scam

Academic 464,069 68.32%

Employment 129,811 19.11%

Other 48,228 7.10%

Dating 22,897 3.37%

Spirituality 14,217 2.09%

Total 679,222 100.00%

● Academic and Employment scams are quite common

● Traceable to dwindling academic performance

○ As reported by examination bodies

● Unemployment is also an issue

○ 23.9% as of 2011, according to Nigeria’s National Bureau of Statistics

● Themes are very important

○ Key contribution

Confusion matrix (5-fold validation)

                Academic  Employment  Spirituality  Other  Dating  Total  Correct predictions
Academic             413           7             1      9       1    431               95.82%
Employment             2         136             0      2       0    140               97.14%
Spirituality           0           0            16      1       0     17               94.12%
Other                  1           5             1     23       5     35               65.71%
Dating                 1           0             0      0      31     32               96.88%
Total                417         148            18     35      37    655
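The totals and the "correct predictions" column can be recomputed from the matrix cells. The sketch below assumes that rows are the true classes and columns the predicted classes, which is consistent with each percentage being the diagonal cell divided by its row total; the slides do not state the orientation explicitly.

```python
classes = ["Academic", "Employment", "Spirituality", "Other", "Dating"]

# Cell values copied from the confusion matrix above.
matrix = [
    [413, 7, 1, 9, 1],
    [2, 136, 0, 2, 0],
    [0, 0, 16, 1, 0],
    [1, 5, 1, 23, 5],
    [1, 0, 0, 0, 31],
]

row_totals = [sum(row) for row in matrix]
col_totals = [sum(col) for col in zip(*matrix)]

# Share of each row that lands on the diagonal, as a percentage.
per_class = {
    name: 100 * matrix[i][i] / row_totals[i]
    for i, name in enumerate(classes)
}

# Overall share of correctly classified posts across all classes.
correct = sum(matrix[i][i] for i in range(len(classes)))
overall = 100 * correct / sum(row_totals)

for name in classes:
    print(f"{name:<13}{per_class[name]:6.2f}%")
print(f"{'Overall':<13}{overall:6.2f}%")
```

Summing the diagonal gives 619 correctly classified posts out of 655. The "Other" class stands out with the lowest per-class rate.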

Clustering

Goal -- Identify clusters of entities in the dataset, for instance, groups of related scammers

Why? Could indicate coordination among fraudsters/ existence of criminal gangs

● We selected 16,194 posts from 679,222 crime posts

○ Random sampling without replacement

○ Confidence level 99%

○ Error rate 1%

● Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [1]

[1] M. Ester et al., A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, KDD, 1996.
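A sample size for a given confidence level and error rate is commonly derived with Cochran's formula plus a finite population correction. The slides do not name the formula used, so the sketch below is an assumption; the z value of 2.576 (the usual two-sided value for 99% confidence) and the worst-case proportion p = 0.5 are likewise assumed.

```python
import math

def sample_size(population, z, margin, p=0.5):
    """Cochran's sample size with finite population correction, rounded up."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # size for an infinite population
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# 99% confidence (z assumed to be 2.576), 1% error, 679,222 scam posts.
n = sample_size(679_222, z=2.576, margin=0.01)
print(n)  # 16194
```

Under these assumptions the result matches the 16,194 posts selected for clustering, which suggests, but does not confirm, that a calculation of this kind was used.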

Visualization of clusters

● DBSCAN computed 197 clusters from our data

● We fed some clusters through Gephi [1]
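To make the clustering step concrete, here is a compact, unoptimised sketch of the DBSCAN procedure in plain Python (it needs Python 3.8+ for `math.dist`). The slides do not state which features, distance measure, or parameters were used, so the 2-D toy points, the Euclidean distance, and the `eps` and `min_pts` values are hypothetical.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)          # None means not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may later become a border point
            continue
        cluster += 1                       # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels

# Hypothetical toy data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, DBSCAN does not need the number of clusters in advance and leaves isolated points unassigned, which suits a setting where the number of scammer groups is unknown.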

Case study -- same topic, multiple phone numbers

● Indicates coordination of activities among scammers

● Could also be because some fraudsters try to copy the post topics of other scammers

[1] M. Bastian et al., Gephi: An Open Source Software for Exploring and Manipulating Networks, ICWSM, 2009.

Case study -- a cluster of clusters

● An elaborate scamming scheme

● Emphasis on cluster

○ Single phone number node

○ Multiple topics

SIM card registration: A countermeasure?

● 2011 -- SIM registration policy by the Nigerian Communications Commission (NCC)

○ Registration involves recording some personal data and biometric information about subscribers

○ Key objective was to assist law enforcement agencies during criminal investigations

● Did SIM registration help to reduce cybercrime on the forum?

○ Posts containing phone numbers actually increased after SIM registration was introduced

○ Overall number of posts on the forum also increased (i.e., posting activity increased)

● Did SIM registration encourage the growth of criminal activity on the forum?

○ No. The absence of a cybercrime law until 2014 and weak investigation/prosecution capabilities on the part of law enforcement agencies are more likely reasons

○ Telecommunication firms were also not fully compliant with the SIM registration policy

Takeaways

● Despite the massive coverage of 419 scams, some types are still understudied

● We highlight a unique form of scam targeting specific Nigerian demographics

● Law enforcement agencies may find the cluster analysis approach useful

○ To identify and take down key nodes in sophisticated scam schemes

● The SIM card registration policy is not sufficient for tackling online scams involving phone numbers

● Future work could involve studying whether certain demographics are more susceptible to the types of scams we highlighted

Questions?

Call for papers

Submission link: https://scienceinpublic.org/science-in-public-2017/

Panel: Phishing and Pharming Passwords: What Are the Real World Effects of Information Theft on People?

Format: Short "paper proposals" (Word document, 300 words maximum)

Venue: Sheffield, UK

Submission deadline: April 18, 2017

Thanks!

Contact info

Email: j.onaolapo [*AT*] cs.ucl.ac.uk

Twitter: @jerryola