A Crawler-based Study of Spyware on the Web Authors: Alexander Moshchuk, Tanya Bragin, Steven...

A Crawler-based Study of Spyware on the Web. Authors: Alexander Moshchuk, Tanya Bragin, Steven D. Gribble, and Henry M. Levy, University of Washington. 13th Annual Network and Distributed System Security Symposium (NDSS 2006). Presented by Hao Cheng, 2006.03


A Crawler-based Study of Spyware on the Web

Authors: Alexander Moshchuk, Tanya Bragin, Steven D. Gribble, and Henry M. Levy
University of Washington
13th Annual Network and Distributed System Security Symposium (NDSS 2006)

Presented by Hao Cheng, 2006.03

What is Spyware?

• Spyware (wiki): “a broad category of malicious software designed to intercept or take partial control of a computer’s operation without the informed consent of that machine’s owner or legitimate user”.

• Does not self-replicate.

• Examples: keylogging, dialer, Trojan downloader, browser hijacker, adware.

(from wiki)

• Two types of spyware:
– spyware-infected executables: piggybacked spyware code is attached to the executable.
– drive-by download: exploits a vulnerability in the user's browser.

Contribution

• A quantitative analysis of the extent of spyware content in the Web.

• Takes an Internet-wide point of view by studying web sites.

• Provides answers to the questions below:


• Crawl web pages
– May 2005: 18.2 million URLs
– Oct 2005: 21.8 million URLs

• Virtual Machine (VM) to sandbox and analyze malicious content

• spyware-infected executables: commercial anti-spyware tools

• Drive-by download: heuristic triggers

Spyware-Infected

• automated solution
– determine whether a web object contains executable software
– download, install, and execute it in a VM
– analyze and identify.


steps

• Finding executables on the web
– HTTP header: Content-Type = application/octet-stream
– URL has an executable extension (.exe, .cab, .msi)
– after downloading, inspect the first bytes of the file to identify its file type.

• Automatic install
– use heuristics to simulate common user interaction during the installation process.
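The three checks for finding executables can be sketched as follows. This is a minimal illustration, not the authors' crawler: the function names are hypothetical, and the signature table uses the well-known leading bytes of PE executables (`MZ`), cabinet archives (`MSCF`), and OLE compound files (the container format used by .msi, which other file types share).

```python
from urllib.parse import urlparse

# Extensions listed on the slide.
EXECUTABLE_EXTENSIONS = (".exe", ".cab", ".msi")

# Well-known leading bytes; the OLE compound signature is shared by
# .msi and other OLE-based files, so a match is only a hint.
MAGIC_PREFIXES = {
    b"MZ": "exe",
    b"MSCF": "cab",
    bytes.fromhex("d0cf11e0a1b11ae1"): "ole-compound (e.g. msi)",
}


def looks_executable_before_download(url, content_type):
    """Cheap checks that need only the URL and the HTTP Content-Type header."""
    path = urlparse(url).path.lower()
    media_type = (content_type or "").split(";")[0].strip().lower()
    return (media_type == "application/octet-stream"
            or path.endswith(EXECUTABLE_EXTENSIONS))


def sniff_file_type(first_bytes):
    """Identify the file type from the first bytes of a downloaded file."""
    for prefix, kind in MAGIC_PREFIXES.items():
        if first_bytes.startswith(prefix):
            return kind
    return None
```

The header and extension checks are cheap enough to run on every crawled URL, while the byte-level check needs the download and is therefore the last of the three.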

steps

• The last step: analyze
– Lavasoft AdAware anti-spyware tool (uses the signatures within its detection database).
– a script launches the installed software and collects the logs generated by the anti-spyware tool.
– identify the functions of the detected spyware.


Drive-by Download

• automated solution
– visit a potentially malicious web page with an unmodified browser in a clean VM
– any attempt to break out of the browser's security sandbox is treated as suspicious
– perform an AdAware scan to detect installed spyware.
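The trigger idea above can be sketched as a filter over events observed in the VM while a page is loaded. Everything here is an assumption for illustration: the slides do not list the concrete triggers, so the event kinds, the `Event` record, and the allowed cache folder are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str    # hypothetical kinds: "process_create", "file_write", "registry_write"
    target: str  # path, registry key, or process name


# Hypothetical folder the browser may legitimately write to (its cache).
ALLOWED_WRITE_PREFIXES = (
    "c:\\documents and settings\\user\\local settings\\temporary internet files\\",
)


def is_trigger(event):
    """Return True if the event looks like an escape from the browser sandbox."""
    if event.kind == "process_create":
        return True
    if event.kind == "file_write":
        return not event.target.lower().startswith(ALLOWED_WRITE_PREFIXES)
    # In this sketch, any other recorded kind (e.g. a registry write) is suspicious.
    return True


def needs_scan(events):
    """Run the expensive anti-spyware scan only when some event is a trigger."""
    return any(is_trigger(e) for e in events)
```

Gating the scan on a trigger keeps the common case cheap: pages that produce no suspicious events never pay for the scan.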


Complex web content

• Complex web content (JavaScript)

• Time-bomb code (runs at some future time): accelerate the OS wall clock 15 times.

• Page-close code: simulate a page close by fetching a clean web page, which causes the code to run.

• Pop-up code: wait for all pop-up windows to finish loading, then close them in order to trigger any potential code.

Browser Configuration

• IE 6.0 on unpatched XP.

• cfg_y: when IE asks for permission, every request is approved.

• cfg_n: every request for permission is refused.

• Most malicious case: simply visiting a web page causes infection.

• Also studied Firefox, which is generally more secure.

System

• 10-node cluster

• dual-processor, 4GB RAM, 80GB disk

• one VM per processor

Performance

• 92 seconds per executable (1st type: spyware-infected executables)
– 1-2 seconds creating a VM
– 55 seconds installing and running the executable
– 35 seconds for the AdAware sweep
– analyzes 18,782 executables per day

• 11.7 seconds per page (2nd type: drive-by downloads)
– 6.3 seconds to restart the browser and load a single web page
– 108 seconds for an AdAware scan on pages that raise a trigger (5%)
– analyzes 14,768 pages per CPU per day
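The arithmetic behind two of these figures can be checked with a short sketch. It assumes 20 VMs run in parallel (10 nodes, two processors each, one VM per processor, as on the System slide); the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def expected_page_time(base_seconds, scan_seconds, trigger_fraction):
    """Average time per page when only a fraction of pages need a scan."""
    return base_seconds + trigger_fraction * scan_seconds


def items_per_day(seconds_per_item, parallel_vms=1):
    """Whole items processed per day by the given number of parallel VMs."""
    return int(SECONDS_PER_DAY / seconds_per_item * parallel_vms)


# 6.3 s for every page, plus a 108 s scan on the 5% of pages with a trigger.
page_time = expected_page_time(6.3, 108, 0.05)

# 92 s per executable, assuming 20 VMs working in parallel.
executables_per_day = items_per_day(92, parallel_vms=20)

print(round(page_time, 1))   # 11.7
print(executables_per_day)   # 18782
```

Under these assumptions the average page time and the daily executable throughput both agree with the numbers on the slide.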

Executable

• over 2,500 web sites

• 8 different categories

• for each web site, crawl to a depth of 3 from the top page.

• Average of 6,577 pages per site.

• Also crawled "randomly selected" web sites.
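A depth-limited crawl of one site can be sketched as a breadth-first traversal. This is an illustration only: it assumes the top page counts as depth 0, and `get_links` is a hypothetical callback standing in for fetching a page and extracting its links.

```python
from collections import deque


def crawl(top_page, get_links, max_depth=3):
    """Visit pages breadth-first, following links at most max_depth hops from top_page."""
    seen = {top_page}
    queue = deque([(top_page, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Toy link graph used in place of real HTTP fetches.
toy_site = {"/": ["/a", "/b"], "/a": ["/c"], "/c": ["/d"], "/d": ["/e"], "/b": []}
pages = crawl("/", lambda url: toy_site.get(url, []), max_depth=3)
print(pages)  # ['/', '/a', '/b', '/c', '/d']
```

In the toy graph, "/e" sits four hops from the top page and is therefore never visited.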


Some spyware has multiple functions.

• Summary
– around 90 distinct spyware programs found in executables.
– instances are spread across 4% of domains.
– 1 out of 20 executables on the web is spyware.
– 2 new spyware executables appear per month.

Drive-by Download

• webpages selected from different categories,


limitation

• Relies heavily on commercial anti-spyware software.

• Many computers are patched and now have fewer vulnerabilities.

Questions?