McAfee/RIM Confidential
Exploring in the Wild:
A Big Data Approach to Application Security Research (and Exploit Detection)
Haifei Li
Chong Xu
CanSecWest, March 2014, Vancouver
About Us: Haifei
• Security Researcher at McAfee Labs
• Previously: Microsoft, Fortinet
• Works on two questions:
  1) How to find vulnerabilities?
  2) How to exploit them?
  At McAfee, my interests have extended to a third:
  3) How to detect exploits, using the answers to the first two?
  Let’s help the world!
• Presented at BlackHat Europe 2010, CanSecWest 2011, REcon 2012, SyScan360 2012
• Living around here.
About Us: Chong
• Ph.D. from Duke University
• Director @ McAfee Labs IPS team
• Focus:
  • Advanced (0-day) exploit and malware defense
  • APT detection
  • Computer networking
  • Network and host security
• NIPS, HIPS, NGFW
Agenda
1. The Situation We Are In
2. The Idea
3. Exploit Detection and CVE-2013-3906
4. Exploring in the Wild
5. Summary
Some Background
• This is a practice we have been working on for more than a year
• Not a particularly deep technical dive
  • But the idea behind it is cool
  • And, most important, it has long-term benefits
  • It extends our view of applications and the threat landscape
The Situation We Are In
• In essence, security research is:
  • Understanding “how it works”
  • Not just finding or exploiting bugs
[Figure: the most popular client-side apps today]
The Situation We Are In (cont.)
• Applications have become so “rich”
  • So many features, so many “things” that probably no one knows about yet
• Interoperability:
  • IE can run Flash, Java, Reader, and Office code
  • Chrome can run Flash, Java, and Reader code
  • Office can run Flash, and perhaps Java
  • Some OS-delivered features can also be triggered by applications (e.g., .NET, DirectX)
  • …
• This is a complex “system” rather than a 90s server-side application
Vendors vs. Researchers
• Apps are developed by software giants
  • Microsoft, Google, Oracle, Adobe
  • Thousands of engineers fully committed to delivering new features day by day
  • And don’t forget: most of the apps are closed-source
• What do security researchers have?
  • A small, inattentive community
  • Researchers in different orgs, with different purposes and different interests
  • Most of us also have daily, non-research duties
Usual Research Approaches
• Usually, a nice vulnerability or an innovative exploitation technique is inspired by:
  • A crash found by fuzzing
    • A large number of methodologies/frameworks have been discussed in recent years, too many to name
  • Digging into new features (which brings new vulns/ideas)
    • Flash JIT => JIT spraying bypasses ASLR+DEP [1]
    • Custom heap management in Reader and Flash => reliable heap spray & the Flash “Vector” exploitation [2]
    • HTML5 => Canvas object “ImageData” heap spray [3]
    • PDF XFA feature => the PDF 0-day (CVE-2013-0640) [4]
    • Office ActiveX interoperability => non-scriptable heap spray in Office (CVE-2013-3906) [5]
• Currently, recognizing “new features” is done manually or randomly
The Limitations
• Fuzzing is cool
  • Maximizes testing of the code for specific features
• But it doesn’t help much with understanding the application
  • Looks only at a point, not a surface, from the application level
  • Needs a “template” that triggers the features first
  • Can’t find interoperability-related, logic, or info-leak bugs, etc.
[Figure: Application vs. Fuzzing]
Adam: “Hey dude, I just found something weird while surfing YouTube.”
Bob: “I smell something fresh. Let’s go deep.”
A Conclusion
• Researchers find it really hard to keep track of all the “features” delivered by vendors
  • The same goes for application behaviors
• Researchers need an interesting entry point for further research
  • An entry point could be anything, not just a crash
• The question: How to get those “entry points”?
Agenda: 2. The Idea
Our Approach
• At McAfee Labs, we own hundreds of millions of samples
  • “Unlimited” PDF, Flash, Java, Office, and HTML (URL) samples
  • New ones arrive every minute
• So we thought we might be able to leverage this huge number of samples for something new
  • For industry purposes, we need to know whether there are “unclassified” exploits or, even worse, zero-day exploits
  • For research, we can leverage the resource to better understand “how it works”
A Simple Idea
• We “execute” every sample in a single environment
  • Sandboxed
  • Opened by the appropriate application
• We record basically everything during the execution of the sample
  • File access
  • Registry access
  • Process activities
  • Network activities
  • Process memory status
  • And more
A “Big Data” Plan
• We “sign” all the information we collect for a single sample in a single environment
  • We call it “DNA”
• We store those DNAs in our DNA database
  • Rather than dropping them after execution
• We “data mine” the DNA database
  • Usually via “DNA comparison”
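To make the “signing” step concrete, here is a minimal Python sketch of how one sample’s recorded behavior in one environment could be turned into a DNA. The `BehaviorRecord` fields, the `category:value` normalization, and SHA-256 over the sorted behavior set are all assumptions for illustration, not the format McAfee Labs used. Note that the hash only tells you whether two DNAs are identical; similarity comparison works on the normalized behavior set, so a database would keep both.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class BehaviorRecord:
    """Hypothetical record of what one sample did in one environment."""
    sample_md5: str
    environment: str  # illustrative label, e.g. "WinXP/Office2007"
    file_access: set = field(default_factory=set)
    registry_access: set = field(default_factory=set)
    processes: set = field(default_factory=set)
    network: set = field(default_factory=set)
    modules: set = field(default_factory=set)


CATEGORIES = ("file_access", "registry_access", "processes", "network", "modules")


def behaviors(record: BehaviorRecord) -> frozenset:
    """Flatten a record into normalized 'category:value' strings.

    Values are lowercased because Windows paths are case-insensitive
    (a simplification: it also lowercases URLs).
    """
    items = set()
    for category in CATEGORIES:
        for value in getattr(record, category):
            items.add(f"{category}:{value.lower()}")
    return frozenset(items)


def dna(record: BehaviorRecord) -> str:
    """Sign the normalized behavior set: identical behavior gives an identical DNA."""
    payload = json.dumps(sorted(behaviors(record)))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting before hashing makes the signature independent of the order in which the sandbox happened to log events.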
“DNA” Comparison
• We compare the similarity of the DNAs
• Fact: most samples we test are normal
• Example:
  • 1 million samples have behavior A; a few samples don’t
  • OR: 1 million samples do NOT have behavior B; a few samples do
• So the unusual ones attract some “interest”
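A minimal sketch of this kind of comparison, assuming each DNA is kept as a set of normalized behavior strings for one environment: count how many samples exhibit each behavior, then flag samples that either have a rare behavior or lack a near-universal one. The function name and both thresholds are hypothetical, not tuned values from the project.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple


def find_unusual(dnas: Dict[str, FrozenSet[str]],
                 rare_ratio: float = 0.001,
                 common_ratio: float = 0.999) -> Dict[str, List[Tuple[str, str]]]:
    """Flag samples that show rare behaviors or lack near-universal ones.

    dnas maps sample id -> set of normalized behaviors (one environment).
    """
    total = len(dnas)
    if total == 0:
        return {}
    # How many samples exhibit each behavior.
    counts = Counter(b for sample_behaviors in dnas.values() for b in sample_behaviors)
    rare = {b for b, n in counts.items() if n / total <= rare_ratio}
    common = {b for b, n in counts.items() if n / total >= common_ratio}

    findings: Dict[str, List[Tuple[str, str]]] = {}
    for sample, sample_behaviors in dnas.items():
        hits = [("has-rare", b) for b in sorted(sample_behaviors & rare)]
        hits += [("missing-common", b) for b in sorted(common - sample_behaviors)]
        if hits:
            findings[sample] = hits
    return findings
```

In the first case of the example, behavior A lands in the common set and the few samples missing it are flagged; in the second, behavior B lands in the rare set and the few samples showing it are flagged.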
The Goal: Finding the Interest
• This is the great mission!
• We collect DNAs
• We compare the DNAs
• We get something interesting (unusual)
• We analyze/research the sample
• We learn things and find new stuff!
• We use the knowledge to improve our “real-time” rules for zero-day detection (discussed later)
• A learning-then-improving cycle
Case Study 1
• We don’t know whether loading C:\Windows\System32\MSCOMCTL.OCX into an Office process is malicious or even interesting
• But we found that only a few samples made it happen
  • ~100 out of half a million Office samples
• Manual research showed that all of these samples are malicious exploits!
  • CVE-2012-0158
  • CVE-2012-1856
  • CVE-2013-3906 (the TIFF zero-day we discovered)
• What we learned: if we see MSCOMCTL.OCX loaded into an Office process, it’s likely an exploit
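The finding can be turned into a simple rule. About 100 out of 500,000 samples is roughly 0.02% of the corpus, which is why the behavior stood out. The sketch below is hypothetical: `mscomctl_in_office` and its process-name list (Word, Excel, PowerPoint only) are assumptions, and a hit is meant as a strong triage indicator, not proof of an exploit.

```python
import ntpath

# Assumed set of Office host processes; extend as needed.
OFFICE_PROCESSES = {"winword.exe", "excel.exe", "powerpnt.exe"}


def mscomctl_in_office(process_image: str, loaded_modules: list) -> bool:
    """Hypothetical rule: flag MSCOMCTL.OCX loaded into an Office process."""
    # ntpath handles Windows-style paths on any host OS.
    proc = ntpath.basename(process_image).lower()
    if proc not in OFFICE_PROCESSES:
        return False
    return any(ntpath.basename(m).lower() == "mscomctl.ocx" for m in loaded_modules)
```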
Case Study 2
• Assume we find that only a few HTML/URL samples cause the IE process to access an unusual location: C:\Windows\AppPatch\EMET.DLL
  • While most others don’t
• This is a pre-exploitation trick to check whether Microsoft EMET is installed, used in the IE10 CVE-2014-0322 exploit (credit: FireEye*)
*http://www.fireeye.com/blog/technical/cyber-exploits/2014/02/operation-snowman-deputydog-actor-compromises-us-veterans-of-foreign-wars-website.html
Case Study 2 (cont.)
• What we learned:
  • Even if a component of an in-the-wild (ITW) exploit is missing (say, the SWF in the CVE-2014-0322 exploit), strictly examining (comparing) the behaviors can still pinpoint the anomaly, which may lead to discovering the whole exploit
• The author believes this trick should be considered a security vulnerability, though currently it is not
  • It allows bad guys to check, from the Internet, whether a local file exists: not just the EMET DLL but also AV products’ files, and it can even help detect a VM environment
  • It works on almost every version of IE, including IE11
The Benefit of DNA Comparison
• We don’t have to know which application behavior is suspicious, malicious, or even interesting
  • We just need to find the unusual (interesting) ones
• We can find most hidden exploits, because this approach surfaces malicious behaviors we did not know about
Agenda: 3. Exploit Detection and CVE-2013-3906
Exploit Detection Isn’t Hard
• Zero-day detection has become a hot topic in the industry
• However, it has never been a “technology problem”
  • How hard is it to build a VM and hook “CreateProcess”?
• Exploit behaviors, including post-exploitation behaviors, are usually very clear
  • There is not much an exploit can do compared with malware, especially on modern systems
  • Little room to maneuver during/after bypassing all the mitigations
• The real challenge is not having seen the exploit in the wild
The Devil Is in the Details
• Lack of sample sources (no sample)
• Poor management (had the sample, didn’t test it)
• Not the right environment (discussed later)
• VM-detecting exploits
  • This would be cool, but we haven’t seen one in the real world yet
  • Our prediction for 2014
• It’s more of an intelligence or management problem!
The Devil Is in the Details (cont.)
• “Watering hole” attacks, like many other online attacks, target one or more specific environments
  • Running Win7? I exploit only XP (too many to name)
  • Running Office 2010 on Win7? I exploit Office 2007/2010 on XP (CVE-2013-3906)
  • Running IE8 or IE11? I exploit only IE10 (CVE-2014-0322)
  • Using English? Sorry, I work only in the CN/JP/KR markets
  • Running the latest Flash? No, I exploit only old versions, even as a zero-day! (CVE-2014-0497) [6]
  • …
• This is the most challenging one
Possible Mitigations
• Run as many environments as possible
  • Good: easy to do
  • Bad: applies only to lab projects; will still miss some, because you can’t install every version
• Hook the “version-checking” code
  • Good: powerful
  • Bad: deep research (RE) required; not easy to do
• Static analysis plus sandboxing
  • Static scanning can find the “suspicious” samples
  • Mark them, and run deep multi-environment tests
  • I guess this would be the most practical way
• Our DNA-comparison approach can help with this!
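The “static analysis plus sandboxing” option could be organized roughly as follows: samples that a static scanner marks as suspicious fan out to every environment in a matrix, while the rest run once in a default environment. This is a sketch under stated assumptions: the axes are illustrative, drawn from the targeting examples in “The Devil Is in the Details (cont.)”, the environment labels are hypothetical, and the static scanner itself is assumed and not shown. Even this small matrix has 2 × 2 × 4 = 16 environments, which shows why you can’t install every version.

```python
from itertools import product

# Illustrative axes only; a real lab would choose what it can maintain.
OS_VERSIONS = ["WinXP", "Win7"]
OFFICE_VERSIONS = ["Office2007", "Office2010"]
LOCALES = ["en-US", "zh-CN", "ja-JP", "ko-KR"]


def environments() -> list:
    """Every OS x Office x locale combination, as hypothetical labels."""
    return [f"{os_}/{office}/{loc}"
            for os_, office, loc in product(OS_VERSIONS, OFFICE_VERSIONS, LOCALES)]


def plan_runs(samples: dict, default_env: str) -> dict:
    """Fan statically suspicious samples out to every environment.

    samples maps sample id -> True if an (assumed) static scan marked it suspicious.
    """
    all_envs = environments()
    return {sid: (all_envs if suspicious else [default_env])
            for sid, suspicious in samples.items()}
```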
Our System & Zero-Day Detection
• Initially, we inserted a set of “strict” rules in the middle of the system to detect zero-day exploits
  • PE dropped? Something downloaded? Process created? And more
• As the system helps us understand application behaviors at a deep level, we keep adding and improving rules
  • A “learning-then-improving” cycle
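A minimal sketch of such strict rules, assuming the sandbox reports written files and downloaded content together with their leading bytes. Treating content that starts with the `MZ` header as a PE is an assumption made here for illustration, and the `Run` type and rule labels are hypothetical. Any child process counts as a hit, which is deliberately strict.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    """Hypothetical summary of one sandboxed run of a document."""
    host_app: str                                                  # e.g. "winword.exe"
    files_written: Dict[str, bytes] = field(default_factory=dict)  # path -> leading bytes
    downloads: Dict[str, bytes] = field(default_factory=dict)      # URL -> leading bytes
    processes_created: List[str] = field(default_factory=list)


def looks_like_pe(head: bytes) -> bool:
    """Crude PE check: Windows executables begin with the 'MZ' DOS header."""
    return head.startswith(b"MZ")


def strict_rule_hits(run: Run) -> List[str]:
    """Return the strict rules a run trips; an empty list means no alert."""
    hits = []
    if any(looks_like_pe(head) for head in run.files_written.values()):
        hits.append("pe-dropped")
    if any(looks_like_pe(head) for head in run.downloads.values()):
        hits.append("pe-downloaded")
    if run.processes_created:
        hits.append("process-created")
    return hits
```

Downloads are checked for PE content rather than flagged outright, since documents that fetch remote images are common.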
Our System & Zero-Day Detection (cont.)
[Figure: system architecture showing Sample Feeder, Core, and DNA Database, with “Rules for exploit detection” and a “Learn and improve” loop]
What Happened on Oct. 31?
• Halloween :P
• Our beta project for Office documents had come online just the night before
  • Got lucky?
• The suspicious sample we detected:
  • MD5: 1FD4F3F063D641F84C5776C2C15E4621
• Strong malicious behaviors:
  • http://flatnet.com/bruce/winword.exe
  • C:\Documents and Settings\<username>\Local Settings\Temp\winword.exe
The Timeline
• 09:21 AM: Detected the issue
• 11:06 AM: I noticed the log and started manual tests
  • Beta running, no alerting component :p
• 12:44 PM: Called for researchers to confirm and analyze
  • Kudos to Bing Sun, Chong Xu, Xiaoning Li, Lijun Chen, and Vinay Karecha
• 01:20 PM: Reported to MSRC
  • It might not be a bad idea to let the vendor know early
• Following days: phone calls from MSRC, internal cooperation, email exchanges, conference calls, etc.
• Nov 5: Coordinated announcements
  • http://blogs.mcafee.com/mcafee-labs/mcafee-labs-detects-zero-day-exploit-targeting-microsoft-office-2
  • http://technet.microsoft.com/en-us/security/advisory/2896666
• Nov 6: Follow-up technical post discussing the DEP mystery in Office
  • http://blogs.mcafee.com/mcafee-labs/solving-the-mystery-of-the-office-zero-day-exploit-and-dep
Some Points: New Heap Spray
• First exploit seen using ActiveX in Office to do a heap spray
  • Non-scriptable, so Office security mitigations won’t block it
  • Uses Embedded Control Persistence Binary Data
• Other methods, such as leveraging Flash ActionScript, are already blocked, e.g., by “Flash Click-to-Play”
  • http://blogs.adobe.com/security/2013/02/click-to-play-for-office-is-here.html
Some Points: OpenXML != Safe
• First exploit seen organized in the Office OpenXML format (.docx)
  • Even though this is a TIFF-parsing vulnerability
• We had seen plenty of doc/ppt/xls/rtf exploits, but no docx/pptx/xlsx ones before
• Previously, the OpenXML format was usually considered “safe”
Some Points: DEP Mystery
• After our announcement of the discovery, several parties analyzed the threat as well, but reported inconsistent DEP/ROP results
  • Many claimed it uses ROP gadgets to bypass DEP
• Our analysis (of the sample we detected), however, showed that it didn’t do any DEP bypass; DEP is not even enabled for Office 2007 on Win7
  • http://blogs.mcafee.com/mcafee-labs/solving-the-mystery-of-the-office-zero-day-exploit-and-dep
[Figure: analyses from Microsoft [7], FireEye [8], and CrowdStrike [9]]
DEP Mystery: What’s Going On?
• Our later analysis showed that it’s actually due to the environments the samples aim to exploit
  • Office 2007 on Windows XP: affected, no DEP, so a simple heap spray works
  • Office 2007 on Windows 7: affected, still no DEP, so a simple heap spray still works
  • Office 2010 on Windows XP: affected, DEP on, so a simple heap spray doesn’t work!
• The sample that runs ROP gadgets targets Office 2010 on XP!
The Attack
• A massive attack
  • We have identified more than 60 unique samples since the first discovery
  • We were not “just lucky” to detect this
• The first sample can be traced back to early July 2013 on VirusTotal
  • In the wild for at least four months
• Someone provided some good insights on Twitter
  • http://pastebin.com/64pBCgbw
• A carefully orchestrated attack
  • Exploits prepared for every vulnerable environment (as seen in the DEP mystery)
• A nice exploit template
  • Bad guys need to modify it only slightly to create a future Office zero-day
Agenda: 4. Exploring in the Wild
I: Document Tracking
• Network activities are important behaviors
• Previously, I thought a document (PDF, Office) wouldn’t connect to third-party addresses. What about you?
• But I was pretty surprised:
  • Office documents connecting to third parties are *common*
  • We found too many samples
  • Mostly embedding remote images
• Not many for PDFs, and those fall into two categories:
  • A patched vuln in the PDF JavaScript API implementation
  • A feature in “rights-managed” PDFs
Office Document Tracking
• Affects all Office file formats, as well as RTF
• Mostly because of pictures embedded via HTTP or UNC paths
• Even if Office sometimes warns users, the request has already been made
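For OpenXML files, one way to spot such references statically is to read the package’s relationship parts (`*.rels`) and list relationships marked `TargetMode="External"`. This is a sketch, not the method used in the project: it covers only .docx/.xlsx/.pptx (binary formats and RTF would need other parsers), it returns whatever `Target` string is stored (an HTTP URL or possibly a UNC path), and it skips hyperlinks on the assumption that those are fetched only when clicked.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OPC relationship parts (*.rels).
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_targets(ooxml_bytes: bytes) -> list:
    """List (rels part, relationship type, target) for external, non-hyperlink relationships."""
    found = []
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as pkg:
        for name in pkg.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(pkg.read(name))
            for rel in root.iter(f"{REL_NS}Relationship"):
                if rel.get("TargetMode") != "External":
                    continue
                # Keep the last path segment of the type URI, e.g. ".../image" -> "image".
                rel_type = rel.get("Type", "").rsplit("/", 1)[-1]
                if rel_type == "hyperlink":
                    continue  # assumed to fire only on click
                found.append((name, rel_type, rel.get("Target")))
    return found
```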
PDF Document Tracking
• In April 2013, we reported a vuln in the PDF JavaScript implementation that enables PDF tracking
  • http://blogs.mcafee.com/mcafee-labs/tracking-pdf-usage-poses-a-security-problem
  • This was considered a security vuln and was patched
• We also found a more complicated one
  • A PDF signed by Adobe LiveCycle Rights Management
  • When a user opens the PDF, it connects to the “server” (defined in the PDF) to check out “policies”
  • Anyone can define the server in the PDF
  • This is a PDF feature: “rights management”
And More...
• Is this a big deal?
  • It depends on you... your privacy!
• Who is exploiting it?
  • A number of “services” are offered online
• Note: all samples were found in the wild; some *may* have malicious intentions (domains registered to the companies offering these services)
• More discussion: http://justhaifei1.blogspot.com/2013/10/document-tracking-what-you-should-know.html
II: Unusual Crashes
• If you are exploring in the wild, anything can happen
• A crash is a strongly suspicious behavior
• Various crashes we have seen:
  • Crashes “on purpose” (Office!)
  • Unconfirmed crashes in Office
  • Stack overflow in IE
  • …
Crash on Purpose
• We have detected a couple hundred samples that crash a fully patched Office
  • mscomctl.ocx 6.1.98.34, offset 0x00054f86 => CVE-2012-0158
  • kernel32.dll, offset 0x00012fd3 => CVE-2013-3906 (integer overflow exception)
Crash on Purpose (cont.)
• They really are “on purpose”!
  • Believe it or not, this is how MS Office patches vulns :P
  • I personally have no idea why
• This gave me a novel idea for detecting previous Office exploits on updated systems
  • Just check where it crashed (offset and stack trace)
  • Highly accurate detection*
*During my analysis, I noticed a lack of VT detections for many previous Office exploits
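The crash-location idea can be sketched as a lookup keyed on module, version, and offset, using the two signatures listed under “Crash on Purpose”. This is a simplification under stated assumptions: the slide also mentions checking the stack trace, which matters especially for the kernel32.dll entry, since a crash location inside a system DLL is plausibly shared by unrelated crashes. No kernel32.dll version was given, so that entry (hypothetically) matches any version.

```python
from typing import Optional

# (module, version or None for "any", crash offset) -> CVE indicated on a patched system.
KNOWN_CRASHES = {
    ("mscomctl.ocx", "6.1.98.34", 0x00054F86): "CVE-2012-0158",
    ("kernel32.dll", None, 0x00012FD3): "CVE-2013-3906",
}


def classify_crash(module: str, version: Optional[str], offset: int) -> Optional[str]:
    """Map a crash location on a patched system to the CVE it indicates, if known."""
    module = module.lower()
    for (mod, ver, off), cve in KNOWN_CRASHES.items():
        if mod == module and off == offset and (ver is None or ver == version):
            return cve
    return None
```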
Crash Not on Purpose
• Sometimes even clean samples cause crashes!
• These need tough, manual research
  • Our strategy is to first rule out the possibility of a zero-day exploit
  • Though sometimes you can’t be 100% sure, especially with Office binary formats
• No time to do deep/full research on every one
  • “Free zero-day time”!
Excel Zero-Day Crash
• Crashes on Office 2007/2013, but not Office 2010
• .XLS
• No malicious content found
• No detection on VT
Excel Zero-Day Crash 2
• Crashes all versions of Office
• .XLS
• No malicious content found
• No detection on VT
IE11 Stack Overflow
• Not a stack buffer overflow
• Not malicious
• Just an unfortunate endless loop
• A small window into IE11’s QA
III: Identifying Unknown Attacks & Tech
• A real example showing how this approach can detect unknown potential attacks & provide “entry points” for further research
• During our project, we saw some web pages trigger the IE process to access unusual local file locations
  • While most others didn’t
• The accessed locations followed the typical Windows search order used when a full path is not provided
Some Background Found
• Further manual research showed that it is triggered by JavaScript code
• The code we detected is hosted at:
  • http://cpro.baidustatic.com/cpro/ui/ci.js
  • A website owned by Baidu.com, a well-known Chinese search giant
  • Not a malicious exploit
  • Used to detect whether the user is running 360 Browser, a browser developed by another well-known Chinese company, 360.cn
• Not a new trick; discussed here:
  • http://segmentfault.com/q/1010000000117437
Some More Thoughts
• An entry point for further research
• What did we learn?
  • Using the “Image()” object & the “res://” protocol lets a page access local files via IE
  • Providing a full path lets the page determine whether a specific local file exists (mostly PE files)
  • Pretty much like the “EMET checking” trick in the recent CVE-2014-0322 IE10 exploit (discussed before)
• Thoughts: should this also be considered a security vulnerability?
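On the defensive side, a crude static heuristic is to look for `res://` URLs that name a local drive path in page source. This is a hypothetical sketch, not how the behavior was detected in the project (that was done through DNA comparison at run time): obfuscated or dynamically built strings will evade it, and internal IE pages such as `res://ieframe.dll/...` (no drive letter) are intentionally not matched.

```python
import re

# res:// followed by a drive letter and a path, e.g. res://c:\dir\file.dll/...
# The path runs until a quote, whitespace, or closing parenthesis.
RES_LOCAL = re.compile(r"""res://[a-z]:[\\/][^"'\s)]*""", re.IGNORECASE)


def res_probes(page_source: str) -> list:
    """Return the res:// local-file references found in HTML/JS source."""
    return RES_LOCAL.findall(page_source)
```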
IV: Example: Identifying a New Feature
• As we’ve discussed, “DNA comparison” helps identify some interesting features
• Just an example, in PDFs:
  • We noticed some little-known DLLs being loaded
  • MD5: 98D3249FE81732805685F538EB57A518
  • Indicates that Adobe Reader can play multimedia natively in PDFs
  • An entry point: fuzzing? REing? More?
Agenda: 5. Summary
Summary and Conclusion
• This “Big Data” approach not only helps (zero-day) exploit detection, but also benefits advanced security research:
  • Provides various “entry points” for further research
  • Gives a “surface” view of the application’s behaviors as well as its features
  • Opens our eyes!
• Gives a sense of the exploit threat landscape
  • Zero-day detection!
  • A way to find most hidden attacks
Challenges
• Manpower
  • Need teamwork to build it (architecting, automation, coding, etc.)
  • Need a talented security research team to analyze the various samples that attract interest, especially in the early stages
• Still, the devil is in the details
  • A research-oriented project
  • Careful automation to collect meaningful data
  • Requires strong security knowledge of applications and the OS
Major References
[1] Dion Blazakis, "Interpreter Exploitation: Pointer Inference and Spraying" [Online]. Available: http://www.semantiscope.com/research/BHDC2010/BHDC-2010-Paper.pdf
[2] Haifei Li, "Smashing the Heap with Vector: Advanced Exploitation Technique in Recent Flash Zero-day Attack" [Online]. Available: https://sites.google.com/site/zerodayresearch/smashing_the_heap_with_vector_Li.pdf
[3] Federico Muttis, "HTML5 Heap Sprays: Pwn all the things" [Online]. Available: http://exploiting.files.wordpress.com/2012/10/html5-heap-spray.pdf
[4] Matthieu Bonetti, "CVE-2013-0640: Adobe Reader XFA oneOfChild Un-initialized memory vulnerability" [Online]. Available: https://labs.portcullis.co.uk/blog/cve-2013-0640-adobe-reader-xfa-oneofchild-un-initialized-memory-vulnerability-part-1
[5] Haifei Li, "McAfee Labs Detects Zero-Day Exploit Targeting Microsoft Office" [Online]. Available: http://blogs.mcafee.com/mcafee-labs/mcafee-labs-detects-zero-day-exploit-targeting-microsoft-office-2
[6] Vyacheslav Zakorzhevsky, "CVE-2014-0497 – a 0-day vulnerability" [Online]. Available: https://www.securelist.com/en/blog/8177/CVE_2014_0497_a_0_day_vulnerability
[7] Elia Florio, "CVE-2013-3906: a graphics vulnerability exploited through Word documents" [Online]. Available: http://blogs.technet.com/b/srd/archive/2013/11/05/cve-2013-3906-a-graphics-vulnerability-exploited-through-word-documents.aspx
[8] Xiaobo Chen, Dan Caselden, and Mike Scott, "The Dual Use Exploit: CVE-2013-3906 Used in Both Targeted Attacks and Crimeware Campaigns" [Online]. Available: http://www.fireeye.com/blog/technical/cyber-exploits/2013/11/the-dual-use-exploit-cve-2013-3906-used-in-both-targeted-attacks-and-crimeware-campaigns.html
[9] Jason Geffner, "Analysis of a CVE-2013-3906 Exploit" [Online]. Available: http://www.crowdstrike.com/blog/analysis-cve-2013-3906-exploit/index.html